Create Audio AI Companions
Last updated
The Audio AI Editor lets creators build and customize the behavior of live audio AI Companions through various configuration settings.
Voice Activity Detection determines when the system detects speech in an audio stream.
Duration (Milliseconds): Configure the time window for detecting voice activity, ranging from 200 ms to 2000 ms.
How It Works:
A shorter duration makes VAD more sensitive to brief sounds.
Longer durations are useful for capturing sustained speech while reducing noise interference.
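As an illustrative sketch only (not the editor's actual implementation), a frame-based detector could count consecutive voiced frames and report speech once they span the configured duration window; the function name `detect_voice` and the frame size are assumptions.

```python
# Hypothetical sketch of duration-based voice activity detection.
# Frame energies above a fixed floor count as "voiced"; speech is
# reported only once voiced frames span the configured duration.

def detect_voice(frame_energies, duration_ms, frame_ms=20, energy_floor=0.1):
    """Return True if a voiced run covers at least `duration_ms`.

    `duration_ms` must fall in the editor's 200-2000 ms range.
    """
    if not 200 <= duration_ms <= 2000:
        raise ValueError("duration must be between 200 and 2000 ms")
    needed_frames = duration_ms // frame_ms
    run = 0
    for energy in frame_energies:
        run = run + 1 if energy > energy_floor else 0
        if run >= needed_frames:
            return True
    return False
```

With a short duration, fewer consecutive voiced frames are needed, so brief sounds trigger detection; a long duration requires sustained speech, which suppresses transient noise.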
The silence threshold sets the minimum audio level required for the system to detect voice input.
Threshold: A value between 0.2 and 1.0.
How It Works:
Lower thresholds (e.g., 0.2) make the system sensitive to softer sounds.
Higher thresholds (e.g., 1.0) ensure only loud or prominent sounds are captured, filtering background noise.
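A minimal sketch of this gating behavior, assuming normalized audio levels in the 0-1 range (the function name `passes_threshold` is an assumption, not part of the editor):

```python
# Hypothetical sketch: gate audio by the silence threshold. Levels
# below the threshold are treated as silence; only louder samples
# count as voice input.

def passes_threshold(level, threshold):
    """Return True when `level` (0-1) clears the 0.2-1.0 threshold."""
    if not 0.2 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.2 and 1.0")
    return abs(level) >= threshold
```

At a low threshold of 0.2, a soft level of 0.3 still passes; at a threshold of 1.0, only a full-scale signal registers.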
Creativity (Temperature): Adjusts the randomness of generated outputs (range: 0 to 2). Higher values (e.g., 1.7) produce more creative responses, while lower values yield more deterministic outputs.
Word Diversity (Top P): Controls how diverse or focused the generated responses are (range: 0 to 1). Lower values keep outputs more focused and predictable.
Voice Model: Allows selection from available AI voice models (e.g., "coral") for tailoring audio output styles.
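These three settings could be bundled into a single configuration payload. This is a hedged sketch: the key names (`temperature`, `top_p`, `voice`) and the helper `build_generation_config` are assumptions, not the editor's documented schema; only the value ranges come from the settings described above.

```python
# Hypothetical configuration payload combining the generation
# settings. Key names are assumed; ranges match the editor's
# documented limits (temperature 0-2, top_p 0-1).

def build_generation_config(temperature=1.0, top_p=1.0, voice="coral"):
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    return {"temperature": temperature, "top_p": top_p, "voice": voice}
```

Validating at build time surfaces out-of-range values immediately instead of at request time.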
Optimizing VAD Settings: Experiment with the duration to find the ideal balance between responsiveness and accuracy for your use case.
Fine-Tuning Silence Threshold: Use a lower threshold in quiet environments to capture all audio and a higher threshold in noisy spaces to focus on clear speech.
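The tip above can be sketched as a small helper that maps an ambient-noise estimate into the threshold range; `suggest_threshold` and the linear mapping are purely illustrative assumptions.

```python
# Hypothetical helper following the tip above: a quiet room maps to
# the lowest silence threshold (0.2), a noisy space to the highest
# (1.0), with a linear ramp in between.

def suggest_threshold(noise_level):
    """Map an ambient noise estimate (0-1) into the 0.2-1.0 range."""
    clamped = min(max(noise_level, 0.0), 1.0)
    return round(0.2 + 0.8 * clamped, 2)
```

In practice, preview the result against real audio rather than trusting any formula.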
Preview and Test: Always test your configurations in the "Test" or "Preview" section to ensure your settings meet project requirements.