Overview
AWSNovaSonicLLMService enables natural, real-time conversations with AWS Nova Sonic. It provides built-in audio transcription, voice activity detection, and context management for creating interactive AI experiences with bidirectional audio streaming, text generation, and function calling capabilities.
AWS Nova Sonic API Reference
Pipecat’s API methods for AWS Nova Sonic integration
Example Implementation
Complete AWS Nova Sonic conversation example
AWS Bedrock Documentation
Official AWS Bedrock and Nova Sonic documentation
AWS Console
Access AWS Bedrock and manage Nova Sonic models
Installation
To use AWS Nova Sonic services, install the required dependencies:
Prerequisites
AWS Account Setup
Before using AWS Nova Sonic services, you need:
- AWS Account: Set up at AWS Console
- Bedrock Access: Enable AWS Bedrock service in your region
- Model Access: Request access to Nova Sonic models in Bedrock
- IAM Credentials: Configure AWS access keys with Bedrock permissions
Required Environment Variables
- AWS_SECRET_ACCESS_KEY: Your AWS secret access key
- AWS_ACCESS_KEY_ID: Your AWS access key ID
- AWS_REGION: AWS region where Bedrock is available
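A startup check along these lines can report missing credentials before any connection is attempted. This is a minimal sketch using only the standard library; the helper name `missing_aws_env_vars` is hypothetical and not part of Pipecat.

```python
import os

# Variable names exactly as listed in this section.
REQUIRED_ENV_VARS = ("AWS_SECRET_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_REGION")


def missing_aws_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise in isolation.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

Calling `missing_aws_env_vars()` with no arguments inspects `os.environ`; an empty list means all three variables are present.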
Key Features
- Real-time Speech-to-Speech: Direct audio input to audio output processing
- Built-in Transcription: Automatic speech-to-text with real-time streaming
- Voice Activity Detection: Automatic detection of speech start/stop
- Function Calling: Support for external function and API integration
- Multiple Voices: Choose from matthew, tiffany, and amy voice options
Configuration
AWSNovaSonicLLMService
AWS secret access key for authentication.
AWS access key ID for authentication.
AWS session token for temporary credentials (e.g., when using AWS STS).
AWS region where the service is hosted. Supported regions for Nova 2 Sonic (default): "us-east-1", "us-west-2", "ap-northeast-1". Supported regions for Nova Sonic (older model): "us-east-1", "ap-northeast-1".
Model identifier. Use "amazon.nova-2-sonic-v1:0" for the latest model or "amazon.nova-sonic-v1:0" for the older model. Deprecated in v0.0.105. Use settings=AWSNovaSonicLLMService.Settings(model=...) instead.
Voice ID for speech synthesis. Some voices are designed for specific languages. See AWS Nova 2 Sonic voice support for available voices. Deprecated in v0.0.105. Use settings=AWSNovaSonicLLMService.Settings(voice=...) instead.
Model parameters for audio configuration and inference. See Params below. Deprecated in v0.0.105. Use settings=AWSNovaSonicLLMService.Settings(...) for inference settings and audio_config=AudioConfig(...) for audio configuration.
Audio configuration (sample rates, sample sizes, channel counts). If not provided, defaults are used (16kHz input, 24kHz output, 16-bit, mono). See AudioConfig below.
Runtime-configurable settings. See Settings below.
System-level instruction for the model. Deprecated in v0.0.105. Use settings=AWSNovaSonicLLMService.Settings(system_instruction=...) instead.
Available tools/functions for the model to use.
Configuration for automatic session continuation. When enabled (the default),
sessions are seamlessly rotated before the AWS time limit (~8 minutes) with no
user-perceptible interruption. See
SessionContinuationParams below.
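The region and model constraints listed above can be checked before the service is constructed. The sketch below encodes only the region lists stated in this section; `check_model_region` is a hypothetical helper, not a Pipecat function.

```python
# Region support per model identifier, copied from the parameter
# descriptions in this section.
SUPPORTED_REGIONS = {
    "amazon.nova-2-sonic-v1:0": ("us-east-1", "us-west-2", "ap-northeast-1"),
    "amazon.nova-sonic-v1:0": ("us-east-1", "ap-northeast-1"),
}


def check_model_region(model, region):
    """Raise ValueError if `region` is not listed for `model`."""
    if model not in SUPPORTED_REGIONS:
        raise ValueError(f"Unknown model identifier: {model!r}")
    regions = SUPPORTED_REGIONS[model]
    if region not in regions:
        raise ValueError(
            f"{model} is not listed for region {region!r}; "
            f"supported regions: {', '.join(regions)}"
        )
```

For example, `check_model_region("amazon.nova-sonic-v1:0", "us-west-2")` raises, because the older model is listed only for "us-east-1" and "ap-northeast-1".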
Settings
Runtime-configurable settings passed via the settings constructor argument using AWSNovaSonicLLMService.Settings(...). These can be updated mid-conversation with LLMUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | NOT_GIVEN | Model identifier. (Inherited from base settings.) |
| system_instruction | str | NOT_GIVEN | System instruction/prompt. (Inherited from base settings.) |
| temperature | float | NOT_GIVEN | Sampling temperature for text generation. (Inherited from base settings.) |
| max_tokens | int | NOT_GIVEN | Maximum number of tokens to generate. (Inherited from base settings.) |
| top_p | float | NOT_GIVEN | Nucleus sampling parameter. (Inherited from base settings.) |
| voice | str | NOT_GIVEN | Voice ID for speech synthesis. |
| endpointing_sensitivity | str \| None | NOT_GIVEN | Controls how quickly Nova Sonic decides the user has stopped speaking. Values: "LOW", "MEDIUM", or "HIGH". Only supported with Nova 2 Sonic (default model). |
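The NOT_GIVEN default in the table means a field is left out unless it is set explicitly. The following stand-alone sketch illustrates that behavior with a stand-in sentinel and a stand-in dataclass; `SketchSettings`, `SERVICE_DEFAULTS`, and `effective_settings` are hypothetical names, and the real Pipecat classes may be implemented differently. The fallback values are the ones this page cites as service defaults.

```python
from dataclasses import dataclass, fields


class _NotGiven:
    """Stand-in sentinel; not Pipecat's own NOT_GIVEN object."""

    def __repr__(self):
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()

# Service-side defaults as cited on this page.
SERVICE_DEFAULTS = {
    "model": "amazon.nova-2-sonic-v1:0",
    "voice": "matthew",
    "temperature": 0.7,
    "max_tokens": 1024,
}


@dataclass
class SketchSettings:
    """Hypothetical mirror of the settings fields in the table above."""

    model: object = NOT_GIVEN
    system_instruction: object = NOT_GIVEN
    temperature: object = NOT_GIVEN
    max_tokens: object = NOT_GIVEN
    top_p: object = NOT_GIVEN
    voice: object = NOT_GIVEN
    endpointing_sensitivity: object = NOT_GIVEN

    def given(self):
        """Return only the fields that were explicitly set."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not NOT_GIVEN
        }


def effective_settings(settings):
    """Overlay explicitly set fields on top of the service defaults."""
    return {**SERVICE_DEFAULTS, **settings.given()}
```

With `SketchSettings(voice="tiffany", temperature=0.5)`, only `voice` and `temperature` are reported by `given()`, while `model` and `max_tokens` keep the service defaults in `effective_settings`.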
NOT_GIVEN values are omitted, letting the service use its own defaults (e.g. "amazon.nova-2-sonic-v1:0" for model, "matthew" for voice, 0.7 for temperature, 1024 for max_tokens). Only parameters that are explicitly set are included.
AudioConfig
Audio configuration passed via the audio_config constructor argument.
| Parameter | Type | Default | Description |
|---|---|---|---|
| input_sample_rate | int | 16000 | Audio input sample rate in Hz. |
| input_sample_size | int | 16 | Audio input sample size in bits. |
| input_channel_count | int | 1 | Number of input audio channels. |
| output_sample_rate | int | 24000 | Audio output sample rate in Hz. |
| output_sample_size | int | 16 | Audio output sample size in bits. |
| output_channel_count | int | 1 | Number of output audio channels. |
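For sizing buffers, the defaults in the table translate directly into LPCM byte rates. A short worked sketch follows; the helper names are illustrative and not part of Pipecat.

```python
def lpcm_bytes_per_second(sample_rate, sample_size_bits, channel_count):
    """Bytes of uncompressed linear PCM audio produced per second."""
    return sample_rate * (sample_size_bits // 8) * channel_count


def lpcm_chunk_bytes(sample_rate, sample_size_bits, channel_count, duration_ms):
    """Bytes in a chunk of `duration_ms` milliseconds of LPCM audio."""
    per_second = lpcm_bytes_per_second(sample_rate, sample_size_bits, channel_count)
    return per_second * duration_ms // 1000


# Defaults from the table: 16 kHz / 16-bit / mono input, 24 kHz / 16-bit / mono output.
input_rate = lpcm_bytes_per_second(16000, 16, 1)    # 32000 bytes per second
output_rate = lpcm_bytes_per_second(24000, 16, 1)   # 48000 bytes per second
input_20ms = lpcm_chunk_bytes(16000, 16, 1, 20)     # 640 bytes
```

At the default settings, one second of input audio is 32,000 bytes and one second of output audio is 48,000 bytes.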
SessionContinuationParams
Configuration for automatic session continuation, passed via the session_continuation constructor argument. Nova Sonic sessions have an AWS-imposed time limit (~8 minutes). When enabled, session continuation proactively creates a new session in the background before the limit is reached, buffers user audio during the transition, and seamlessly hands off, preserving conversation context with no user-perceptible gap.
| Parameter | Type | Default | Description |
|---|---|---|---|
| enabled | bool | True | Whether automatic session continuation is enabled. |
| transition_threshold_seconds | float | 360.0 | How many seconds into a session to begin monitoring for a transition opportunity. The transition will occur when the assistant next starts speaking after this threshold. |
| audio_buffer_duration_seconds | float | 3.0 | Duration of the rolling audio buffer (in seconds) that captures user audio during the transition window. This audio is replayed into the new session so no user input is lost. |
| audio_start_timeout_seconds | float | 80.0 | Maximum time to wait for the assistant to start speaking after the threshold is reached. If no assistant audio arrives within this window, the transition is forced. Set to 0 to disable the timeout (wait indefinitely). |
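The interaction between the threshold and the timeout can be pictured as a small decision function. This is a sketch of the behavior described in the table, not Pipecat's implementation; `ContinuationSketch` and `should_transition` are hypothetical names. With the defaults, a forced transition happens at 360 + 80 = 440 seconds, which stays below the ~8-minute (480-second) limit.

```python
from dataclasses import dataclass


@dataclass
class ContinuationSketch:
    """Hypothetical mirror of the timing fields in the table above."""

    enabled: bool = True
    transition_threshold_seconds: float = 360.0
    audio_start_timeout_seconds: float = 80.0


def should_transition(params, session_age, assistant_started_speaking):
    """Decide whether to rotate the session at `session_age` seconds.

    `assistant_started_speaking` is True at the moment the assistant
    begins a new spoken response.
    """
    if not params.enabled:
        return False
    threshold = params.transition_threshold_seconds
    if session_age < threshold:
        return False  # Not yet monitoring for a transition opportunity.
    if assistant_started_speaking:
        return True  # Preferred hand-off point after the threshold.
    timeout = params.audio_start_timeout_seconds
    if timeout == 0:
        return False  # Timeout disabled: keep waiting for assistant audio.
    return session_age - threshold >= timeout  # Force once the window elapses.
```

Lowering `transition_threshold_seconds` moves the hand-off earlier; setting `audio_start_timeout_seconds` to 0 in this sketch means the transition only ever happens when the assistant starts speaking.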
Usage
Basic Setup
With Settings
With Function Calling
With Session Continuation
Notes
- Model versions: Nova 2 Sonic (amazon.nova-2-sonic-v1:0) is the default and recommended model. The older Nova Sonic (amazon.nova-sonic-v1:0) has fewer features and requires an assistant response trigger mechanism.
- Session continuation: Enabled by default to handle AWS’s ~8-minute session limit. The service automatically rotates sessions in the background with no user-perceptible interruption, preserving conversation context and buffering user audio during the transition. You can tune the threshold or disable it via the session_continuation parameter.
- Endpointing sensitivity: Only supported with Nova 2 Sonic. Controls how quickly the model decides the user has stopped speaking; "HIGH" causes the model to respond most quickly.
- Transcription frames: User speech transcription frames are always emitted upstream. Assistant text transcripts are delivered in real time using speculative text events, providing text synchronized with audio output for responsive client UIs.
- Connection resilience: If a connection error occurs while the service wants to stay connected, it automatically resets the conversation and reconnects.
- System instruction precedence: The system_instruction from service settings takes precedence over an initial system message in the LLM context. A warning is logged when both are set. Tools provided in the LLM context take precedence over those provided at initialization time.
- Audio format: Uses LPCM (Linear PCM) audio format for both input and output. Input defaults to 16kHz and output defaults to 24kHz.
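The precedence rules for system instructions and tools in the notes above can be sketched as two small resolver functions. This is an illustration only: the function names are hypothetical, and the message shape (a list of dicts with "role" and "content" keys) is an assumption rather than a statement about Pipecat's context format.

```python
import logging

logger = logging.getLogger("nova_sonic_sketch")


def resolve_system_instruction(settings_instruction, context_messages):
    """Prefer the settings instruction over an initial system message."""
    context_instruction = None
    # Assumed message shape: dicts with "role" and "content" keys.
    if context_messages and context_messages[0].get("role") == "system":
        context_instruction = context_messages[0].get("content")
    if settings_instruction is not None and context_instruction is not None:
        logger.warning(
            "Both a settings system_instruction and a context system "
            "message are set; using the settings value."
        )
    if settings_instruction is not None:
        return settings_instruction
    return context_instruction


def resolve_tools(init_tools, context_tools):
    """Prefer tools from the LLM context over those given at init time."""
    return context_tools if context_tools else init_tools
```

Note that the two rules point in opposite directions: for the system instruction the service settings win, while for tools the LLM context wins.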