Skip to main content

Audio Analysis and Features

This section describes the various audio features that can be extracted from an audio recording, including the AudioFeatures interface, AudioAnalysis, and the extractAudioAnalysis function.

AudioAnalysis

The AudioAnalysis interface represents the detailed analysis of an audio signal, including the extracted audio features.

Interface


/**
* Represents the complete data from the audio analysis.
*/
export interface AudioAnalysis {
pointsPerSecond: number // How many consolidated value per second
durationMs: number // Duration of the audio in milliseconds
bitDepth: number // Bit depth of the audio
samples: number // Size of the audio in bytes
numberOfChannels: number // Number of audio channels
sampleRate: number // Sample rate of the audio
dataPoints: DataPoint[] // Array of data points from the analysis.
amplitudeRange: {
min: number
max: number
}
// TODO: speaker detection
speakerChanges?: {
timestamp: number // Timestamp of the speaker change in milliseconds.
speaker: number // Speaker identifier.
}[]
}

AudioFeatures

The AudioFeatures interface represents various audio features that can be extracted from an audio signal.

Interface

export interface AudioFeatures {
energy: number // The infinite integral of the squared signal, representing the overall energy of the audio.
mfcc: number[] // Mel-frequency cepstral coefficients, describing the short-term power spectrum of a sound.
rms: number // Root mean square value, indicating the amplitude of the audio signal.
minAmplitude: number // Minimum amplitude value in the audio signal.
maxAmplitude: number // Maximum amplitude value in the audio signal.
zcr: number // Zero-crossing rate, indicating the rate at which the signal changes sign.
spectralCentroid: number // The center of mass of the spectrum, indicating the brightness of the sound.
spectralFlatness: number // Measure of the flatness of the spectrum, indicating how noise-like the signal is.
spectralRolloff: number // The frequency below which a specified percentage (usually 85%) of the total spectral energy lies.
spectralBandwidth: number // The width of the spectrum, indicating the range of frequencies present.
chromagram: number[] // Chromagram, representing the 12 different pitch classes of the audio.
tempo: number // Estimated tempo of the audio signal, measured in beats per minute (BPM).
hnr: number // Harmonics-to-noise ratio, indicating the proportion of harmonics to noise in the audio signal.
}

DataPoint

The DataPoint interface represents individual data points extracted from an audio signal during analysis.

Interface

/**
* Represents a single data point in the audio analysis.
*/
export interface DataPoint {
id: number
amplitude: number
activeSpeech?: boolean
dB?: number
silent?: boolean
features?: AudioFeatures
startTime?: number
endTime?: number
// start / end position in bytes
startPosition?: number
endPosition?: number
// number of audio samples for this point (samples size depends on bit depth)
samples?: number
// TODO: speaker detection
speaker?: number
}