Recording Configuration
The recording configuration specifies the settings used for audio recording on different platforms. Below are the default settings for Android, iOS, and web platforms:
- On Android: 16kHz sample rate, 16-bit depth, 1 channel.
- On iOS: 48kHz sample rate, 16-bit depth, 1 channel.
- On web: 44.1kHz sample rate, 32-bit depth, 1 channel.
Note on iOS Recording: The library now automatically detects and adapts to the hardware's actual sample rate on both iOS devices and simulators. This means you can specify any supported sample rate (e.g., 16kHz, 44.1kHz, 48kHz) in your configuration, and the library will:
- Capture audio at the hardware's native sample rate (typically 44.1kHz on simulators)
- Perform high-quality resampling to match your requested sample rate
- Deliver the final recording at your specified configuration
This automatic adaptation prevents crashes that previously occurred when the requested sample rate didn't match the hardware capabilities, especially in simulators.
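For example, if you request 16kHz on an iOS simulator whose hardware runs at 44.1kHz, the capture happens at the native rate and is resampled so the final output matches your configuration. A minimal sketch of such a request (the hook usage mirrors the examples later in this document; the values are illustrative):

```typescript
import { useAudioRecorder } from '@siteed/expo-audio-studio'

const { startRecording } = useAudioRecorder()

// Request 16kHz; on a simulator the hardware typically captures at 44.1kHz
// and the library resamples the output to the requested rate.
const result = await startRecording({
    sampleRate: 16000,
    channels: 1,
    encoding: 'pcm_16bit',
})

console.log(result.sampleRate) // expected to report the requested 16000
```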
```typescript
export interface RecordingConfig {
    sampleRate?: SampleRate // Sample rate for recording (16000, 44100, or 48000 Hz)
    channels?: 1 | 2 // Number of audio channels (1 for mono, 2 for stereo)
    encoding?: EncodingType // Encoding type for the recording (pcm_32bit, pcm_16bit, pcm_8bit)
    interval?: number // Interval in milliseconds at which to emit recording data (minimum: 10ms)
    intervalAnalysis?: number // Interval in milliseconds at which to emit analysis data (minimum: 10ms)

    // Device and notification settings
    keepAwake?: boolean // Continue recording when app is in background. On iOS, requires both 'audio' and 'processing' background modes (default is true)
    showNotification?: boolean // Show a notification during recording (default is false)
    showWaveformInNotification?: boolean // Show waveform in the notification (Android only)
    notification?: NotificationConfig // Configuration for the notification
    audioFocusStrategy?: 'background' | 'interactive' | 'communication' | 'none' // Audio focus strategy for handling interruptions (Android)

    // Audio processing settings
    enableProcessing?: boolean // Enable audio processing (default is false)
    pointsPerSecond?: number // Number of data points to extract per second of audio (default is 10)
    algorithm?: AmplitudeAlgorithm // Algorithm to use for amplitude computation (default is "rms")
    features?: AudioFeaturesOptions // Feature options to extract (default is empty)

    // Platform-specific configuration
    ios?: IOSConfig // iOS-specific configuration
    web?: WebConfig // Web-specific configuration

    // Output configuration
    output?: OutputConfig // Control which files are created during recording
    outputDirectory?: string // Custom directory for saving recordings (uses app default if not specified)
    filename?: string // Custom filename for the recording (uses UUID if not specified)

    // Interruption handling
    autoResumeAfterInterruption?: boolean // Whether to automatically resume after interruption
    onRecordingInterrupted?: (_: RecordingInterruptionEvent) => void // Callback for interruption events

    // Callback functions
    onAudioStream?: (_: AudioDataEvent) => Promise<void> // Callback function to handle audio stream
    onAudioAnalysis?: (_: AudioAnalysisEvent) => Promise<void> // Callback function to handle audio features

    // Performance options
    bufferDurationSeconds?: number // Buffer duration in seconds (controls audio buffer size)
}
```
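Most recordings only need a handful of these fields. The sketch below shows a typical configuration; the field names come from the interface above, and the specific values are illustrative:

```typescript
import { useAudioRecorder } from '@siteed/expo-audio-studio'

const { startRecording } = useAudioRecorder()

await startRecording({
    sampleRate: 44100,
    channels: 1,
    encoding: 'pcm_16bit',
    interval: 100, // emit audio data roughly every 100ms
    keepAwake: true,
    onAudioStream: async (event) => {
        // receives each chunk of recorded audio as it becomes available
        console.log('audio data event', event)
    },
})
```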
Platform-Specific Architecture
Web
On the web, the recording utilizes the `AudioWorkletProcessor` for handling audio data. The `AudioWorkletProcessor` allows for real-time audio processing directly in the browser, making it a powerful tool for web-based audio applications.
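The worklet itself is internal to the library, but the general shape of an `AudioWorkletProcessor` that forwards PCM frames to the main thread looks roughly like the following. This is a simplified sketch for orientation, not the library's actual processor; the class and processor names are made up:

```typescript
// recorder-worklet (illustrative only)
class SketchRecorderProcessor extends AudioWorkletProcessor {
    process(inputs: Float32Array[][]): boolean {
        const channelData = inputs[0]?.[0]
        if (channelData) {
            // Copy the current 128-sample render quantum and hand it to the main thread
            this.port.postMessage(channelData.slice(0))
        }
        return true // keep the processor alive
    }
}

registerProcessor('sketch-recorder-processor', SketchRecorderProcessor)
```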
Android
On Android, the recording is managed using Android's native `AudioRecord` API along with `AudioFormat` and `MediaRecorder`. These classes are part of the Android framework and provide low-level access to audio hardware, allowing for high-quality audio recording.
iOS
On iOS, the recording is managed using `AVAudioEngine` and related classes from the `AVFoundation` framework. The implementation uses a sophisticated audio handling approach that:
- Automatically detects and adapts to the hardware's native sample rate
- Handles sample rate mismatches between iOS audio session and actual hardware capabilities
- Performs high-quality resampling to match the requested configuration
- Works reliably on both physical devices and simulators regardless of the requested sample rate
- Supports both 16-bit and 32-bit PCM formats
- Maintains audio quality through intermediate Float32 format when necessary
Event Emission Intervals
The `interval` and `intervalAnalysis` options control how frequently audio data and analysis events are emitted during recording. Both have a minimum value of 10ms to ensure consistent behavior across platforms while preventing excessive CPU usage.
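In practice, `interval` controls how often raw audio data reaches `onAudioStream`, and `intervalAnalysis` controls how often analysis data reaches `onAudioAnalysis`. A small sketch (it assumes `startRecording` comes from `useAudioRecorder` as in the examples below; the values are illustrative):

```typescript
await startRecording({
    interval: 250, // raw audio data roughly every 250ms
    intervalAnalysis: 100, // analysis events roughly every 100ms
    enableProcessing: true,
    onAudioStream: async (event) => {
        // e.g. append the chunk to an upload queue
    },
    onAudioAnalysis: async (event) => {
        // e.g. update a level meter from the computed features
    },
})
```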
Performance Considerations
| Interval | CPU Usage | Battery Impact | Use Case |
|---|---|---|---|
| 10-50ms | High | High | Real-time visualizations, live frequency analysis |
| 50-100ms | Medium | Medium | Responsive UI updates, waveform display |
| 100-500ms | Low | Low | Progress indicators, level meters |
| 500ms+ | Very Low | Minimal | File size monitoring, duration tracking |
Best Practices
- For real-time visualizations: Use `intervalAnalysis: 10` with minimal features enabled
- For general recording: Use `interval: 100` or higher to balance responsiveness and performance
- For battery-sensitive apps: Use intervals of 500ms or higher
- Platform considerations: While both iOS and Android support 10ms intervals, actual performance may vary based on device capabilities
Example configuration for real-time visualization:
```typescript
const realtimeConfig = {
    intervalAnalysis: 10, // 10ms for smooth updates
    interval: 100, // 100ms for data emission
    enableProcessing: true,
    features: {
        fft: true, // Only enable what you need
        energy: false,
        rms: false
    }
};
```
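Conversely, for long, battery-sensitive sessions the intervals can be raised and processing disabled. A sketch with illustrative values:

```typescript
const batteryFriendlyConfig = {
    interval: 1000, // one data event per second
    intervalAnalysis: 1000,
    enableProcessing: false // skip feature extraction when it is not needed
};
```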
Platform Differences
Android and iOS
On Android and iOS, the library attempts to record audio in the specified format. On iOS, the audio is automatically resampled to match the requested configuration using AVAudioConverter, ensuring high-quality output even when the hardware sample rate differs from the target rate.
Web
On the web, the default configuration uses a 44.1kHz sample rate and 32-bit depth, a higher bit depth than the native defaults. This improves fidelity, but it can lead to issues when the audio has to be resampled down to lower settings.
Recording Process
To start recording, you use the `startRecording` function, which accepts a `RecordingConfig` object. The output of this function is a `StartRecordingResult`.
```typescript
export interface StartRecordingResult {
    fileUri: string
    mimeType: string
    channels?: number
    bitDepth?: BitDepth
    sampleRate?: SampleRate
    compression?: {
        compressedFileUri: string
        size: number
        mimeType: string
        bitrate: number
        format: string
    }
}
```
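The result can be inspected as soon as recording starts, for example to surface the target file or the active compression settings. A small sketch using fields from the interface above (it assumes `startRecording` comes from `useAudioRecorder` as in the examples below):

```typescript
const result = await startRecording({
    sampleRate: 44100,
    channels: 1,
    encoding: 'pcm_16bit',
})

console.log(`Recording to ${result.fileUri} (${result.mimeType})`)
if (result.compression) {
    console.log(`Compressed copy: ${result.compression.compressedFileUri} at ${result.compression.bitrate} bps`)
}
```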
Zero-Latency Recording
The library provides a `prepareRecording` method that can significantly reduce the latency between a user action and the actual start of recording. This is particularly useful for time-sensitive applications where any delay in starting audio capture could be problematic.
How it Works
When using the standard `startRecording` function, there's an inherent delay caused by several initialization steps:
- Requesting user permissions (if not already granted)
- Setting up audio sessions
- Allocating memory for audio buffers
- Initializing hardware resources
- Configuring encoders and audio processing pipelines
The `prepareRecording` method decouples these initialization steps from the actual recording start, allowing your application to pre-initialize all necessary resources in advance.
Using prepareRecording
```typescript
import { useEffect } from 'react';
import { useAudioRecorder, useSharedAudioRecorder } from '@siteed/expo-audio-studio';

// With individual recorder hook
const {
    prepareRecording,
    startRecording,
    stopRecording
} = useAudioRecorder();

// Or with shared recorder context
const {
    prepareRecording,
    startRecording,
    stopRecording
} = useSharedAudioRecorder();

// Prepare recording during component mounting or any appropriate initialization phase
useEffect(() => {
    const prepare = async () => {
        await prepareRecording({
            sampleRate: 44100,
            channels: 1,
            encoding: 'pcm_16bit',
            // Add any other recording configuration options
        });
        console.log('Recording resources prepared and ready');
    };
    prepare();
}, []);

// Later, when the user triggers recording, it starts with minimal latency
const handleRecordButton = async () => {
    await startRecording({
        // Use the same configuration as in prepareRecording
        sampleRate: 44100,
        channels: 1,
        encoding: 'pcm_16bit',
    });
};
```
Key Benefits
- Eliminates perceptible lag between user action and recording start
- Improves user experience for time-sensitive applications
- Consistent behavior across all supported platforms
- Maintains audio quality while reducing startup latency
Implementation Notes
- Call `prepareRecording` as early as possible, such as during screen loading
- Use identical configuration for both `prepareRecording` and `startRecording` (see the sketch after this list)
- The preparation state persists until recording starts or the app is terminated
- If `startRecording` is called without prior preparation, it performs normal initialization
- Resources are automatically released when recording starts or when the component is unmounted
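A simple way to keep `prepareRecording` and `startRecording` in sync is to define the configuration once and pass the same object to both calls. A sketch (the constant name is illustrative):

```typescript
const RECORDING_CONFIG = {
    sampleRate: 44100,
    channels: 1,
    encoding: 'pcm_16bit',
} as const;

// During initialization (e.g. in a useEffect)
await prepareRecording(RECORDING_CONFIG);

// Later, on the user action: identical configuration, so the prepared resources are reused
await startRecording(RECORDING_CONFIG);
```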
Example: Capture Time-Critical Audio
This example demonstrates how to implement a voice command system where capturing the beginning of speech is critical:
```tsx
import React, { useEffect } from 'react';
import { Pressable, StyleSheet, Text, View } from 'react-native';
import { useSharedAudioRecorder } from '@siteed/expo-audio-studio';

function VoiceCommandScreen() {
    const {
        prepareRecording,
        startRecording,
        stopRecording,
        isRecording
    } = useSharedAudioRecorder();

    // Prepare audio resources when screen loads
    useEffect(() => {
        const prepareAudio = async () => {
            await prepareRecording({
                sampleRate: 16000, // Optimized for speech
                channels: 1,
                encoding: 'pcm_16bit',
                enableProcessing: true,
                features: {
                    energy: true,
                    rms: true,
                }
            });
        };
        prepareAudio();

        return () => {
            // Clean up if needed
            if (isRecording) {
                stopRecording();
            }
        };
    }, []);

    return (
        <View style={styles.container}>
            <Text style={styles.instructions}>
                Press and hold to capture voice command
            </Text>
            <Pressable
                onPressIn={() => startRecording({
                    sampleRate: 16000,
                    channels: 1,
                    encoding: 'pcm_16bit',
                    enableProcessing: true,
                    features: {
                        energy: true,
                        rms: true,
                    }
                })}
                onPressOut={stopRecording}
                style={({ pressed }) => [
                    styles.recordButton,
                    { backgroundColor: pressed || isRecording ? 'red' : 'blue' }
                ]}
            >
                <Text style={styles.buttonText}>
                    {isRecording ? 'Recording...' : 'Press to Record'}
                </Text>
            </Pressable>
        </View>
    );
}

// Minimal styles so the example is self-contained
const styles = StyleSheet.create({
    container: { flex: 1, alignItems: 'center', justifyContent: 'center' },
    instructions: { marginBottom: 16 },
    recordButton: { paddingHorizontal: 24, paddingVertical: 12, borderRadius: 8 },
    buttonText: { color: 'white' },
});
```
Web Memory Optimization
On the web platform, audio recording can consume significant memory, especially for longer recordings. The library offers an option to reduce this by controlling how uncompressed audio data is stored.
Web Configuration Options
```typescript
interface WebConfig {
    /**
     * Whether to store uncompressed audio data for WAV generation (web only)
     *
     * Default: true (for backward compatibility)
     */
    storeUncompressedAudio?: boolean
}
```
Memory Usage Control
The `storeUncompressedAudio` option lets you control how audio data is handled in memory:
- When true (default): All PCM chunks are stored in memory during recording, enabling WAV file generation when compression is disabled. This provides maximum flexibility but can use significant memory for long recordings.
- When false: Only compressed audio is kept (if compression is enabled), significantly reducing memory usage. This is ideal for long recordings where memory constraints are a concern.
Example Usage
```typescript
const { startRecording } = useAudioRecorder();

// Memory-efficient recording for long sessions
await startRecording({
    sampleRate: 44100,
    channels: 1,
    compression: {
        enabled: true, // Enable compression to ensure audio is captured
        format: 'opus',
        bitrate: 64000
    },
    web: {
        storeUncompressedAudio: false // Only store compressed data
    }
});
```