Voice & Conversational Configuration

Tailor your AI agent’s auditory persona and interaction logic to deliver a seamless, professional user experience. You control the agent's identity by selecting a voice profile, choosing a stack model, and calibrating speaking speed. Beyond vocal quality, you can tune dialogue dynamics such as pacing and interruption handling to keep the conversation flowing naturally. For global applications, configure language-specific settings to maintain regional accuracy and authenticity.

Voice & language

Begin the configuration process by defining the core language and persona for your agent.

Voice

Access the Voice menu to browse the integrated library and select a profile that best aligns with your company’s brand identity and specific application requirements.

Language

To provide a seamless global experience, our platform allows you to customize exactly how your AI agent interacts with callers. You can define which languages the agent supports and, more importantly, how the agent determines which language to use at the start of a call.

Selecting Your Language Mode

Depending on your target audience and the complexity of your workflow, you can choose from three distinct language detection modes:

Single Language: The agent is hard-coded to one specific language. This is the fastest option for localized services where the language is guaranteed.
Auto-Detect: The agent "listens" to the caller’s opening phrase and automatically switches its voice and processing to match the detected language.
DTMF (Dual-Tone Multi-Frequency): This mode uses a traditional keypad menu (e.g., "Press 1 for English"). Because the caller chooses a language with an explicit key press, the selection does not depend on speech detection, which makes DTMF the most reliable method for routing callers in noisy environments or multilingual regions.
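The three modes above can be pictured as a small decision function. The sketch below is illustrative only: the names LanguageMode, LanguageConfig, and resolve_language are hypothetical and not part of the platform, and falling back to a default language when nothing is detected or an unknown key is pressed is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class LanguageMode(Enum):
    """Hypothetical names for the three modes described above."""
    SINGLE = "single"
    AUTO_DETECT = "auto_detect"
    DTMF = "dtmf"


@dataclass
class LanguageConfig:
    mode: LanguageMode
    default_language: str
    # Keypad digit -> language code; only consulted in DTMF mode.
    dtmf_menu: Dict[str, str] = field(default_factory=dict)


def resolve_language(
    config: LanguageConfig,
    detected_language: Optional[str] = None,
    pressed_digit: Optional[str] = None,
) -> str:
    """Pick the call language according to the configured mode."""
    if config.mode is LanguageMode.SINGLE:
        # Fixed language: caller input is ignored.
        return config.default_language
    if config.mode is LanguageMode.AUTO_DETECT:
        # Assumption: fall back to the default if detection yields nothing.
        return detected_language or config.default_language
    # DTMF: look up the pressed key; assumption: unknown keys use the default.
    return config.dtmf_menu.get(pressed_digit or "", config.default_language)


menu = LanguageConfig(LanguageMode.DTMF, "en", {"1": "en", "2": "es"})
print(resolve_language(menu, pressed_digit="2"))
```

In this sketch, Single Language never inspects caller input, which mirrors why it is described as the fastest option.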

Conversational configuration

Refine your agent’s interaction logic by selecting the appropriate stack and configuring speech dynamics, interruption handling, and conversation initiation.

Stack Configuration

Select your Stack model from the drop-down menu to align with your specific use case. The optimal selection depends on your requirements for response speed, output accuracy, and the agent's primary operating language.

Speech Speed

Adjust the agent’s vocal cadence using the range slider, which supports speeds from 0.7x to 1.3x.

Recommendation

For the most natural user experience, keep the default speed of 1.0x.
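If you store or generate speed values outside the dashboard, it can help to keep them inside the slider's range. The following is a minimal sketch under that assumption; clamp_speech_speed and the constant names are hypothetical helpers, and only the 0.7x to 1.3x range and the 1.0x default come from this page.

```python
MIN_SPEED = 0.7      # lower bound of the slider
MAX_SPEED = 1.3      # upper bound of the slider
DEFAULT_SPEED = 1.0  # recommended default


def clamp_speech_speed(requested: float = DEFAULT_SPEED) -> float:
    """Return the requested speed, limited to the supported slider range."""
    return max(MIN_SPEED, min(MAX_SPEED, requested))


for value in (0.5, 1.0, 2.0):
    print(value, "->", clamp_speech_speed(value))
```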

Conversation initiation & Interruptions

Customize how your agent manages the flow of a conversation and determine who takes the lead during the initial connection.

Handling Interruptions

Allow Interruptions (Enabled)

The agent will stop speaking immediately upon detecting the caller's voice. This creates a natural, fluid "back-and-forth" dialogue.

Allow Interruptions (Disabled)

The agent will complete its current thought regardless of the caller’s input. This is best for delivering critical disclosures or instructions where the full message must be heard.
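The difference between the two settings can be illustrated with a simplified model in which the agent's reply is split into chunks. This is a sketch only: spoken_chunks is a hypothetical function, and treating speech as discrete chunks is an assumption made for illustration, not a description of how the platform processes audio.

```python
from typing import List, Optional


def spoken_chunks(
    chunks: List[str],
    caller_speaks_at: Optional[int],
    allow_interruptions: bool,
) -> List[str]:
    """Return the chunks the agent finishes speaking.

    caller_speaks_at is the zero-based index of the chunk during which the
    caller's voice is detected, or None if the caller stays silent.
    """
    if allow_interruptions and caller_speaks_at is not None:
        # Enabled: stop at once, so the interrupted chunk is not completed.
        return chunks[:caller_speaks_at]
    # Disabled (or no caller speech): the full message is delivered.
    return list(chunks)


disclosure = ["This call is recorded.", "Rates may apply.", "How can I help?"]
print(spoken_chunks(disclosure, caller_speaks_at=1, allow_interruptions=True))
print(spoken_chunks(disclosure, caller_speaks_at=1, allow_interruptions=False))
```

With interruptions disabled, the second call returns every chunk, which is the behavior you want for disclosures that must be heard in full.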

Conversation Initiation

You can control which party begins the interaction once the call is successfully connected.

Start First (Enabled): The agent takes the lead by delivering an opening greeting immediately (e.g., "Hello, thank you for calling. How can I help you?").

Start First (Disabled): The agent remains silent upon connection and waits for the caller to speak first.

Important Note

If Start First is disabled, the agent will not respond until it detects audio from the caller. This is commonly used for "Warm Transfer" scenarios where a human agent introduces the AI.
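The Start First behavior reduces to a simple rule, sketched below. The function name first_agent_action and the returned labels are hypothetical and used only to make the rule concrete.

```python
def first_agent_action(start_first: bool, caller_audio_detected: bool) -> str:
    """Describe what the agent does right after the call connects."""
    if start_first:
        # Enabled: the agent opens with its greeting immediately.
        return "greet"
    # Disabled: the agent stays silent until caller audio is detected.
    return "respond" if caller_audio_detected else "wait"


# Warm transfer: the agent waits until the human agent's introduction is heard.
print(first_agent_action(start_first=False, caller_audio_detected=False))
print(first_agent_action(start_first=False, caller_audio_detected=True))
```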

Audio Realism

To create a more immersive and human-like experience, you can customize the auditory environment of your agent. These settings help mask processing time and make the AI feel like a natural part of a live setting.

Background Ambient Noise

You can enhance the agent’s realism by enabling a subtle background soundscape (e.g., the low hum of a busy office or distant chatter). This helps the agent blend into a professional environment rather than sounding like it is speaking from a void.

Configure

Use the Background Ambient toggle to enable or disable this feature.

Volume Control

A dedicated slider allows you to mix the ambient noise level, so it supports rather than overpowers the agent’s voice.

Thinking Sounds

To maintain a natural conversational flow and eliminate "dead air" while the agent performs tasks (e.g., booking an appointment or querying a database), you can enable Thinking Sounds.

Effect

The agent will produce subtle auditory cues, such as the sound of keyboard typing, during pauses in the conversation.

The Benefit

This provides the caller with immediate feedback that the agent is still active and working on their request, reducing the likelihood of the caller hanging up during brief processing moments.

Configure

Use the Thinking Sounds toggle to enable or disable this feature.

Volume Control

A dedicated slider allows you to mix the thinking sound, so it supports rather than overpowers the agent’s voice.

Tip

Balancing Audio Levels

We recommend keeping Thinking Sounds slightly louder than Background Ambient noise. This ensures the caller clearly perceives the agent's "activity" without the background noise becoming a distraction.
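If you track these settings in your own tooling, the recommendation above can be checked automatically. The sketch below is an assumption-heavy illustration: the AudioRealism class, its field names, and the 0.0 to 1.0 volume scale are hypothetical and do not reflect the platform's actual settings format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioRealism:
    background_enabled: bool
    background_volume: float  # assumed scale: 0.0 (silent) to 1.0 (full)
    thinking_enabled: bool
    thinking_volume: float    # same assumed scale


def mix_warnings(settings: AudioRealism) -> List[str]:
    """Flag a mix where Thinking Sounds are not louder than the ambience."""
    warnings: List[str] = []
    both_on = settings.background_enabled and settings.thinking_enabled
    if both_on and settings.thinking_volume <= settings.background_volume:
        warnings.append(
            "Thinking Sounds should be slightly louder than "
            "Background Ambient noise."
        )
    return warnings


print(mix_warnings(AudioRealism(True, 0.4, True, 0.3)))
```

The check only applies when both features are enabled, since the balance is irrelevant if either toggle is off.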
