# Create Audio AI Companions

## Audio AI Companion - Configuration Guide

The Audio AI Editor lets creators build live audio AI Companions and customize their behavior through a set of configuration settings.

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fb2XHHmnGm9gfPrTW5NRc%2Fuploads%2FYvTdMajt2FvmnLJmsYCJ%2F2024-12-02-CreatorsAGI_Audio_AI_Companions.mp4?alt=media&token=ab718608-ce1c-4de5-bb0e-34e862682fc1>" %}

### Key Features and Adjustable Settings

#### 1. Voice Activity Detection (VAD)

Voice Activity Detection (VAD) decides when the incoming audio stream contains speech.

* Duration (Milliseconds): Sets the time window used to detect voice activity, from 200 ms to 2000 ms.
* How It Works:
  * A shorter duration makes VAD more sensitive to brief sounds.
  * Longer durations favor sustained speech and reduce interference from short bursts of noise.

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcuYQHFEoFEd__Y5QliNczxMzydw8L-HRDx5ulJ8xNq9lVDAfzC4U-F2zk83N-xtbdrFphg_ndNUZzgscnuuw_KzDkShvZuIY5pHspEoVMeQeAYEF5Jqx6yOEHQTN3PYIJPHdGQ?key=Rwl6h3XPCxb03Fd9vdjC2oKb" alt=""><figcaption></figcaption></figure>
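The editor does not document exactly how the duration window is applied. One plausible reading of the behavior described above is that voice activity must persist for at least the configured duration before it counts as speech. The Python sketch below models that reading only; the 20 ms frame size, the per-frame voiced flags, and the `detect_speech` function are hypothetical and not part of the Audio AI Editor.

```python
# Hypothetical model of the VAD "Duration" setting: speech is reported only
# once voice activity has lasted at least `duration_ms`.
# Illustrative sketch, not the platform's actual implementation.

FRAME_MS = 20  # assumed analysis frame length


def detect_speech(frame_is_voiced: list[bool], duration_ms: int) -> list[int]:
    """Return the frame indices at which speech is confirmed.

    frame_is_voiced: one flag per FRAME_MS frame (True = voice-like energy).
    duration_ms: VAD duration, clamped to the documented 200-2000 ms range.
    """
    duration_ms = max(200, min(2000, duration_ms))
    frames_needed = -(-duration_ms // FRAME_MS)  # ceiling division
    onsets = []
    run = 0
    for i, voiced in enumerate(frame_is_voiced):
        run = run + 1 if voiced else 0  # length of the current voiced run
        if run == frames_needed:  # fire once per sufficiently long run
            onsets.append(i)
    return onsets


if __name__ == "__main__":
    # 240 ms brief sound, 100 ms gap, then 600 ms of sustained speech.
    frames = [True] * 12 + [False] * 5 + [True] * 30
    print(detect_speech(frames, 200))  # → [9, 26]: brief sound and speech
    print(detect_speech(frames, 500))  # → [41]: only the sustained speech
```

Under this model, a short window reacts to the brief sound as well as the speech, while a longer window ignores the brief sound, which matches the trade-off described in the list above.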

#### 2. Audio Silence Threshold

The silence threshold sets the minimum audio level required for the system to detect voice input.

* Threshold: A value between 0.2 and 1.0.
* How It Works:
  * Lower thresholds (e.g., 0.2) make the system sensitive to softer sounds.
  * Higher thresholds (e.g., 1.0) capture only loud or prominent sounds, filtering out background noise.

<figure><img src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXeLFxRhBGabtPMjYJLyd1pVDcbkKsLXWYZ6BQ3rKRPqEaT_do7B0O101SkDE8WUfLiO1hatO6llDJ222QIVmSFWLI_HGGKA0-bQ_CUzJGTv9F0QCBZywgocTQdg03I1QCA9XmH0zw?key=Rwl6h3XPCxb03Fd9vdjC2oKb" alt=""><figcaption></figcaption></figure>
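The guide does not say how the audio level is measured. The Python sketch below assumes the level is the peak absolute amplitude of float samples normalized to the range -1.0 to 1.0, so a threshold of 1.0 admits only full-scale sounds. The `peak_level` and `passes_threshold` functions are hypothetical illustrations of the threshold comparison, not the editor's implementation.

```python
def peak_level(samples: list[float]) -> float:
    """Peak absolute amplitude; samples are assumed to be floats in [-1.0, 1.0]."""
    return max((abs(s) for s in samples), default=0.0)


def passes_threshold(samples: list[float], threshold: float) -> bool:
    """True if the chunk is loud enough to count as voice input (hypothetical model)."""
    if not 0.2 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.2 and 1.0")
    return peak_level(samples) >= threshold


if __name__ == "__main__":
    soft = [0.0, 0.25, -0.3, 0.1]  # quiet speech
    loud = [0.0, 0.9, -1.0, 0.5]   # prominent sound
    print(passes_threshold(soft, 0.2))  # → True
    print(passes_threshold(soft, 0.5))  # → False
    print(passes_threshold(loud, 1.0))  # → True
```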

### Other Configurable Options

* Creativity (Temperature): Adjusts the randomness of generated outputs (range 0 to 2). Higher values (e.g., 1.7) produce more creative responses, while lower values produce more deterministic ones.
* Word Diversity (Top P): Controls how diverse or focused the generated responses are (range 0 to 1). Lower values limit word choice to the most likely candidates, yielding more focused responses.
* Voice Model: Selects one of the available AI voice models (e.g., "coral") to shape the style of the audio output.
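Temperature and Top P are commonly implemented as temperature scaling followed by nucleus (top-p) sampling. The Python sketch below shows that standard technique on a toy set of candidate scores. It assumes, without confirmation from this guide, that the companion's model samples its output this way; the `sample_token` function and the scores are hypothetical.

```python
import math
import random


def sample_token(logits: dict[str, float], temperature: float, top_p: float,
                 rng: random.Random) -> str:
    """Pick one candidate using temperature scaling and nucleus (top-p) sampling."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0 and 2")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0 and 1")
    # Temperature 0 is treated here as greedy decoding: always the top candidate.
    if temperature == 0.0:
        return max(logits, key=logits.get)
    # Divide scores by temperature, then softmax (subtracting the max for stability).
    # Lower temperature sharpens the distribution; higher temperature flattens it.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((w / total, tok) for tok, w in weights.items()), reverse=True)
    # Top-p: keep the smallest set of most likely candidates whose cumulative
    # probability reaches top_p (at least one candidate is always kept).
    kept, cumulative = [], 0.0
    for prob, tok in ranked:
        kept.append((prob, tok))
        cumulative += prob
        if cumulative >= top_p:
            break
    # random.choices accepts weights that do not sum to 1.
    return rng.choices([tok for _, tok in kept], weights=[p for p, _ in kept])[0]


if __name__ == "__main__":
    toy_logits = {"hello": 2.0, "hi": 1.0, "hey": 0.0}  # made-up scores
    rng = random.Random(0)
    print(sample_token(toy_logits, 0.0, 1.0, rng))  # → hello (greedy)
    print([sample_token(toy_logits, 1.7, 1.0, rng) for _ in range(8)])  # may mix all three
    print([sample_token(toy_logits, 0.2, 0.5, rng) for _ in range(8)])  # only "hello" survives the cut
```

A low temperature concentrates probability on the top candidate, and a low Top P then discards the rest, which is why the two settings together push output toward deterministic, focused responses.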

***

### Usage Tips

* Optimizing VAD Settings:\
  Experiment with the duration to find the ideal balance between responsiveness and accuracy for your use case.
* Fine-Tuning Silence Threshold:\
  Use a lower threshold in quiet environments to pick up softer speech, and a higher threshold in noisy spaces to focus on clear speech.
* Preview and Test:\
  Always test your configurations in the "Test" or "Preview" section to ensure your settings meet project requirements.
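Before previewing, it can help to keep a candidate configuration in one place and check it against the ranges documented in this guide. The Python sketch below is a hypothetical local helper: the `CompanionSettings` class, its field names, and the 0.3 and 0.7 starting thresholds are assumptions for illustration, not part of the Audio AI Editor or its defaults.

```python
from dataclasses import dataclass


@dataclass
class CompanionSettings:
    """Hypothetical local record of an Audio AI Companion configuration."""
    vad_duration_ms: int
    silence_threshold: float
    temperature: float
    top_p: float
    voice: str

    def validate(self) -> list[str]:
        """Return one message per out-of-range value (empty list = all in range)."""
        # Ranges as documented in this guide.
        checks = [
            ("vad_duration_ms", self.vad_duration_ms, 200, 2000),
            ("silence_threshold", self.silence_threshold, 0.2, 1.0),
            ("temperature", self.temperature, 0.0, 2.0),
            ("top_p", self.top_p, 0.0, 1.0),
        ]
        return [f"{name}={value} is outside {low}-{high}"
                for name, value, low, high in checks
                if not low <= value <= high]


def threshold_for(environment: str) -> float:
    """Illustrative starting points only: lower in quiet rooms, higher in noisy ones."""
    return {"quiet": 0.3, "noisy": 0.7}[environment]


if __name__ == "__main__":
    settings = CompanionSettings(
        vad_duration_ms=800,
        silence_threshold=threshold_for("noisy"),
        temperature=0.7,
        top_p=0.9,
        voice="coral",
    )
    print(settings.validate())  # → []
```

A range check like this only catches typos; the "Test" or "Preview" section remains the place to judge how the companion actually sounds.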
