Voice & Conversational Configuration
Tailor your AI agent’s voice and interaction logic to deliver a seamless, professional user experience. You control the agent’s identity by selecting a voice profile, choosing a stack model, and calibrating speaking speed. Beyond vocal quality, you can tune dialogue dynamics, such as pacing and interruption handling, to keep the conversation flowing naturally. For global applications, configure language-specific settings to maintain regional accuracy and authenticity.
Voice & language
Begin the configuration process by defining the core language and persona for your agent.
Voice
Access the Voice menu to browse the integrated library and select a profile that best aligns with your company’s brand identity and specific application requirements.
Language
To provide a seamless global experience, our platform allows you to customize exactly how your AI agent interacts with callers. You can define which languages the agent supports and, more importantly, how the agent determines which language to use at the start of a call.
Selecting Your Language Mode
Depending on your target audience and the complexity of your workflow, you can choose from three distinct language detection modes:
- Single Language: The agent is hard-coded to one specific language. This is the fastest option for localized services where the language is guaranteed.
- Auto-Detect: The agent "listens" to the caller’s opening phrase and automatically switches its voice and processing to match the detected language.
- DTMF (Dual-Tone Multi-Frequency): This mode uses a traditional keypad menu (e.g., "Press 1 for English"). Because the caller selects the language with a manual button press, DTMF avoids speech-detection errors and is the most reliable method for routing callers in noisy environments or multilingual regions.
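The three modes above can be sketched as a small decision function. This is an illustrative model only, not the platform's API: `LanguageMode`, `LanguageConfig`, `resolve_language`, and the fallback to the first supported language are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class LanguageMode(Enum):
    """Hypothetical names for the three modes described above."""
    SINGLE = "single"
    AUTO_DETECT = "auto_detect"
    DTMF = "dtmf"


@dataclass
class LanguageConfig:
    mode: LanguageMode
    supported: List[str]  # language codes, e.g. ["en", "es"]
    # Keypad digit -> language code; only used in DTMF mode.
    dtmf_menu: Dict[str, str] = field(default_factory=dict)


def resolve_language(config: LanguageConfig,
                     detected: Optional[str] = None,
                     digit: Optional[str] = None) -> str:
    """Pick the language for a call.

    Assumption: when detection or the keypad choice does not match a
    configured language, fall back to the first supported language.
    """
    default = config.supported[0]
    if config.mode is LanguageMode.SINGLE:
        return default
    if config.mode is LanguageMode.AUTO_DETECT:
        return detected if detected in config.supported else default
    # DTMF: look up the digit the caller pressed.
    return config.dtmf_menu.get(digit, default)
```

For example, with a DTMF menu of `{"1": "en", "2": "es"}`, a caller pressing 2 is routed to Spanish, while an unmapped key falls back to the first supported language in this sketch.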
Conversational configuration
Refine your agent’s interaction logic by selecting the appropriate stack and configuring speech dynamics, interruption handling, and conversation initiation.
Stack Configuration
Select your Stack model from the drop-down menu to align with your specific use case. The optimal selection depends on your requirements for response speed, output accuracy, and the agent's primary operating language.
Speech Speed
Adjust the agent’s vocal cadence using the range slider, which supports speeds from 0.7x to 1.3x.
Recommendation
For the most natural user experience, maintaining the default 1.0x speed is recommended.
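If you set speech speed programmatically rather than with the slider, it can help to clamp values to the supported range first. The sketch below is a hypothetical helper (the name `clamp_speed` and the constants are assumptions, not part of the platform); only the 0.7x to 1.3x range and the 1.0x default come from this page.

```python
MIN_SPEED = 0.7      # lower bound of the slider
MAX_SPEED = 1.3      # upper bound of the slider
DEFAULT_SPEED = 1.0  # recommended default


def clamp_speed(requested: float = DEFAULT_SPEED) -> float:
    """Return the requested speed, limited to the supported range."""
    return max(MIN_SPEED, min(MAX_SPEED, requested))
```

Calling `clamp_speed(2.0)` in this sketch yields the maximum of 1.3, and calling it with no argument yields the recommended 1.0.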
Conversation initiation & Interruptions
Customize how your agent manages the flow of a conversation and determine who takes the lead during the initial connection.
Handling Interruptions
Allow Interruptions (Enabled)
The agent will stop speaking immediately upon detecting the caller's voice. This creates a natural, fluid "back-and-forth" dialogue.
Allow Interruptions (Disabled)
The agent will complete its current thought regardless of the caller’s input. This is best for delivering critical disclosures or instructions where the full message must be heard.
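The difference between the two settings can be illustrated with a toy simulation. Everything here is an assumption for illustration: the agent's speech is modeled as a list of chunks, and `spoken_chunks` is a hypothetical function, not how the platform implements playback.

```python
from typing import List, Optional


def spoken_chunks(chunks: List[str],
                  caller_speaks_at: Optional[int],
                  allow_interruptions: bool) -> List[str]:
    """Return the chunks the agent finishes playing.

    caller_speaks_at is the index of the chunk during which the caller's
    voice is detected, or None if the caller stays silent.
    """
    if allow_interruptions and caller_speaks_at is not None:
        # Enabled: playback stops immediately, so the chunk in progress
        # and everything after it are dropped.
        return chunks[:caller_speaks_at]
    # Disabled (or no caller speech): the full message is delivered.
    return list(chunks)
```

With interruptions disabled, a three-part disclosure is always played in full; with interruptions enabled, a caller who speaks during the second part hears only the first part completed.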
Conversation Initiation
You can control which party begins the interaction once the call is successfully connected.
- Start First (Enabled): The agent takes the lead by delivering an opening greeting immediately (e.g., "Hello, thank you for calling. How can I help you?").
- Start First (Disabled): The agent remains silent upon connection and waits for the caller to speak first.
Important Note
If Start First is disabled, the agent will not respond until it detects audio from the caller. Disabling Start First is common in "Warm Transfer" scenarios, where a human agent introduces the AI.
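The Start First behavior amounts to a simple rule at connection time. The following sketch uses a hypothetical function name and return values (`"greet"`, `"respond"`, `"wait"`) chosen only for illustration.

```python
def opening_action(start_first: bool, caller_audio_detected: bool) -> str:
    """Decide what the agent does right after the call connects."""
    if start_first:
        # Enabled: deliver the opening greeting immediately.
        return "greet"
    # Disabled: stay silent until audio from the caller is detected.
    return "respond" if caller_audio_detected else "wait"
```

In a warm transfer, for instance, `opening_action(False, False)` keeps the agent waiting until the human agent's introduction is detected as caller audio.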
Audio Realism
To create a more immersive and human-like experience, you can customize the auditory environment of your agent. These settings help mask processing time and make the AI feel like a natural part of a live setting.
Background Ambient Noise
You can enhance the agent’s realism by enabling a subtle background soundscape (e.g., the low hum of a busy office or distant chatter). This helps the agent blend into a professional environment rather than sounding like it is speaking from a void.
Configure
Use the Background Ambient toggle to enable or disable this feature.
Volume Control
A dedicated slider allows you to mix the ambient noise level, so it supports rather than overpowers the agent’s voice.

Thinking Sounds
To maintain a natural conversational flow and eliminate "dead air" while the agent performs tasks (e.g., booking an appointment or querying a database), you can enable Thinking Sounds.
Effect
The agent will produce subtle auditory cues, such as the sound of keyboard typing, during pauses in the conversation.
The Benefit
This provides the caller with immediate feedback that the agent is still active and working on their request, reducing the likelihood of the caller hanging up during brief processing moments.
Configure
Use the Thinking Sounds toggle to enable or disable this feature.
Volume Control
A dedicated slider allows you to mix the thinking sound level, so it supports rather than overpowers the agent’s voice.

Tip
Balancing Audio Levels
We recommend keeping Thinking Sounds slightly louder than Background Ambient noise. This ensures the user clearly perceives the agent's "activity" without the background noise becoming a distraction.
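A configuration check along these lines can catch unbalanced levels before deployment. This is a sketch under stated assumptions: the `AudioRealism` fields, the 0.0 to 1.0 volume scale, and the default values are hypothetical, since this page does not specify the sliders' units.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioRealism:
    """Hypothetical container for the two toggles and their sliders."""
    ambient_enabled: bool = False
    ambient_volume: float = 0.2   # assumed scale: 0.0 (silent) to 1.0 (full)
    thinking_enabled: bool = False
    thinking_volume: float = 0.3


def balance_warnings(audio: AudioRealism) -> List[str]:
    """Return human-readable warnings; an empty list means no issues found."""
    warnings = []
    levels = (("ambient_volume", audio.ambient_volume),
              ("thinking_volume", audio.thinking_volume))
    for name, value in levels:
        if not 0.0 <= value <= 1.0:
            warnings.append(f"{name} must be between 0.0 and 1.0")
    # The tip above only applies when both features are switched on.
    if (audio.ambient_enabled and audio.thinking_enabled
            and audio.thinking_volume <= audio.ambient_volume):
        warnings.append(
            "thinking sounds should be slightly louder than ambient noise")
    return warnings
```

With both features enabled at the assumed defaults (0.2 ambient, 0.3 thinking), the check returns no warnings; swapping the two values triggers the balance warning.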