Overview

Soniox provides real-time text-to-speech synthesis using a WebSocket-based streaming API. SonioxTTSService streams text incrementally to the Soniox TTS endpoint and receives audio back as base64-encoded chunks. Multiple concurrent streams (up to 5) are multiplexed over a single WebSocket connection, making it efficient for interactive voice applications.
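To make "base64-encoded chunks" concrete, here is a minimal sketch of decoding one chunk into raw PCM bytes and estimating its duration. SonioxTTSService does this internally, so the helper names and the example payload are hypothetical, and the sketch assumes mono 16-bit audio (pcm_s16le).

```python
import base64

# Assumption: pcm_s16le, mono, so each sample occupies 2 bytes.
BYTES_PER_SAMPLE = 2


def decode_audio_chunk(b64_chunk: str) -> bytes:
    """Decode one base64-encoded audio chunk into raw PCM bytes."""
    return base64.b64decode(b64_chunk)


def pcm_duration_seconds(pcm: bytes, sample_rate: int) -> float:
    """Duration of mono 16-bit PCM audio at the given sample rate."""
    return len(pcm) / (BYTES_PER_SAMPLE * sample_rate)


# Hypothetical payload: 480 samples of silence, base64-encoded like an incoming chunk.
example_chunk = base64.b64encode(b"\x00\x00" * 480).decode("ascii")
pcm = decode_audio_chunk(example_chunk)
print(len(pcm), pcm_duration_seconds(pcm, 24000))
```

At 24000 Hz, 480 samples correspond to 20 ms of audio, which is why small chunks can be played back as soon as they arrive.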

  • Soniox TTS API Reference: Pipecat’s API methods for Soniox TTS integration
  • Example Implementation: Complete example with Soniox STT and TTS
  • Soniox Documentation: Official Soniox TTS WebSocket API documentation
  • Supported Languages: Browse supported languages (60+)

Installation

To use Soniox TTS, install the required dependencies:
uv add "pipecat-ai[soniox]"

Prerequisites

Soniox Account Setup

Before using Soniox TTS, you need:
  1. Soniox Account: Sign up at Soniox Console
  2. API Key: Generate an API key from your console dashboard
  3. Voice Selection: Choose from available voices

Required Environment Variables

  • SONIOX_API_KEY: Your Soniox API key for authentication
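
One way to fail early when the variable is missing is a small loader like the sketch below. The function name load_soniox_api_key is hypothetical and not part of Pipecat; it only wraps a standard environment lookup.

```python
import os
from typing import Mapping, Optional


def load_soniox_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return SONIOX_API_KEY, raising a clear error if it is missing or blank."""
    # Accept an explicit mapping so the helper can be exercised without
    # touching the real process environment.
    source = os.environ if env is None else env
    key = source.get("SONIOX_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SONIOX_API_KEY is not set; export it before starting the bot."
        )
    return key
```

The result can then be passed as api_key=load_soniox_api_key() instead of os.getenv("SONIOX_API_KEY"), which would silently return None when the variable is unset.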

Configuration

  • api_key (str, required): Soniox API key for authentication. Create API keys at Soniox Console.
  • url (str, default: "wss://tts-rt.soniox.com/tts-websocket"): WebSocket endpoint URL for Soniox TTS.
  • sample_rate (int, default: None): Output sample rate in Hz. Must be one of {8000, 16000, 24000, 44100, 48000} when using a raw PCM audio format. When None, inherits from the pipeline’s configured sample rate.
  • audio_format (str, default: "pcm_s16le"): Output audio format. Defaults to "pcm_s16le", which matches Pipecat’s downstream audio pipeline.
  • text_aggregation_mode (TextAggregationMode, default: TextAggregationMode.SENTENCE): Controls how incoming text is aggregated before synthesis. SENTENCE (default) buffers text until sentence boundaries, producing more natural speech. TOKEN streams tokens directly for lower latency. Import from pipecat.services.tts_service.
  • settings (SonioxTTSService.Settings, default: None): Runtime-configurable settings. See Settings below.
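
The difference between the two text_aggregation_mode values can be illustrated with a simplified, standalone sketch. This is not Pipecat's aggregator: the function name and the boundary rule (., ! or ? followed by whitespace) are assumptions chosen for brevity.

```python
import re
from typing import Iterable, Iterator, List

# Assumed boundary rule: sentence-ending punctuation followed by whitespace.
# A real aggregator also has to handle abbreviations, numbers, and so on.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def aggregate_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed tokens and yield complete sentences (SENTENCE-like)."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last ends at a sentence boundary.
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]
    # Flush whatever remains when the token stream ends.
    if buffer.strip():
        yield buffer.strip()


tokens: List[str] = ["Hello", " there.", " How", " are", " you?", " Fine"]
print(list(aggregate_sentences(tokens)))  # ['Hello there.', 'How are you?', 'Fine']
print(tokens)  # TOKEN-like behavior: each token is forwarded as it arrives
```

In this sketch a sentence is released only once the following token arrives, which shows where the extra latency of sentence buffering comes from.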

Settings

Runtime-configurable settings are passed through the settings constructor argument as a SonioxTTSService.Settings(...) instance. They can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
Parameter   Type             Default       Description
model       str              tts-rt-v1     TTS model identifier. (Inherited from base settings.)
voice       str              Adrian        Voice identifier. (Inherited from base settings.)
language    Language | str   Language.EN   Language for synthesis. (Inherited from base settings.) See supported languages.

Usage

Basic Setup

import os
from pipecat.services.soniox.tts import SonioxTTSService

tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxTTSService.Settings(
        voice="Maya",
    ),
)

With Custom Voice and Model

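import os
from pipecat.services.soniox.tts import SonioxTTSService

# Set the model, voice, and language explicitly instead of relying on defaults.
tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxTTSService.Settings(
        model="tts-rt-v1",
        voice="Adrian",
        language="en",
    ),
)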

With Custom Sample Rate

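import os
from pipecat.services.soniox.tts import SonioxTTSService

# Request 16 kHz output instead of inheriting the pipeline's sample rate.
tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    sample_rate=16000,
    settings=SonioxTTSService.Settings(
        voice="Maya",
    ),
)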
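
Because raw PCM output accepts only a fixed set of sample rates, it can help to validate a configured value before constructing the service. The helper below is a hypothetical sketch, not a Pipecat function; the allowed set is taken from the sample_rate description in Configuration.

```python
from typing import Optional

# Rates listed for raw PCM formats in the Configuration section.
SUPPORTED_PCM_SAMPLE_RATES = frozenset({8000, 16000, 24000, 44100, 48000})


def check_pcm_sample_rate(sample_rate: Optional[int]) -> Optional[int]:
    """Return the rate unchanged if it is allowed; None defers to the pipeline."""
    if sample_rate is None:
        return None
    if sample_rate not in SUPPORTED_PCM_SAMPLE_RATES:
        allowed = ", ".join(str(r) for r in sorted(SUPPORTED_PCM_SAMPLE_RATES))
        raise ValueError(
            f"Unsupported sample rate {sample_rate}; expected one of: {allowed}"
        )
    return sample_rate
```

For example, sample_rate=check_pcm_sample_rate(16000) passes through unchanged, while a value such as 22050 raises ValueError before any connection is attempted.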

Notes

  • WebSocket streaming: Soniox uses a persistent WebSocket connection for streaming text-in and audio-out, enabling low-latency real-time synthesis.
  • Concurrent streams: The service supports up to 5 concurrent streams multiplexed over a single WebSocket connection via Pipecat’s audio-context mechanism.
  • Sample rates: When using raw PCM audio formats, the sample rate must be one of {8000, 16000, 24000, 44100, 48000}.
  • Keepalive: The service automatically sends keepalive messages every 20 seconds to prevent Soniox’s idle timeout (20-30s).
  • Text aggregation: Sentence aggregation is enabled by default (text_aggregation_mode=TextAggregationMode.SENTENCE). Buffering until sentence boundaries produces more natural speech. Set text_aggregation_mode=TextAggregationMode.TOKEN to stream tokens directly for lower latency.
  • Language support: Soniox supports 60+ languages. See the language documentation for the complete list.
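
The concurrency limit in the notes above is enforced by the service itself. As a generic illustration of the idea only, the sketch below caps in-flight work at five tasks with asyncio.Semaphore; run_streams and the synthesize callback are hypothetical names, and this is not how Pipecat's audio-context mechanism is implemented.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

MAX_CONCURRENT_STREAMS = 5  # limit stated in the notes above


async def run_streams(
    texts: Sequence[str],
    synthesize: Callable[[str], Awaitable[str]],
) -> List[str]:
    """Run synthesize() for every text with at most five calls in flight."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_STREAMS)

    async def bounded(text: str) -> str:
        # Wait for a free slot before starting another stream.
        async with semaphore:
            return await synthesize(text)

    # gather() preserves the order of the input texts in its results.
    return await asyncio.gather(*(bounded(t) for t in texts))


async def _fake_synthesize(text: str) -> str:
    """Stand-in for a real synthesis call; only simulates latency."""
    await asyncio.sleep(0.01)
    return text.upper()


if __name__ == "__main__":
    print(asyncio.run(run_streams(["one", "two", "three"], _fake_synthesize)))
```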

Event Handlers

Soniox TTS supports the standard service connection events:
Event                 Description
on_connected          Connected to Soniox WebSocket
on_disconnected       Disconnected from Soniox WebSocket
on_connection_error   WebSocket connection error occurred

@tts.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Soniox TTS")