Overview
XAISTTService provides real-time speech-to-text transcription using xAI’s WebSocket STT API with support for interim results, configurable endpointing, multichannel audio, and speaker diarization.
The service streams raw audio (PCM, μ-law, or A-law) to xAI’s endpoint and emits interim and final transcription frames based on the server’s is_final and speech_final flags. The connection is persistent: audio is streamed continuously and the server automatically detects utterance boundaries.
- xAI STT API Reference: Pipecat's API methods for xAI STT integration
- Example Implementation: Complete transcription example with xAI STT
- Voice Agent Example: Full voice agent with xAI STT, LLM, and TTS
- xAI Documentation: Official xAI voice API documentation
Installation
To use xAI STT services, install the required dependencies.

Prerequisites
xAI Account Setup
Before using xAI STT services, you need:

- xAI Account: Sign up at xAI
- API Key: Generate an API key from your account dashboard
- Language Selection: Choose from 16 supported languages
Required Environment Variables
`XAI_API_KEY`: Your xAI API key for authentication
Configuration
XAISTTService
Constructor parameters:

- xAI API key for authentication (used as Bearer token for the WebSocket handshake).
- WebSocket endpoint URL for xAI STT.
- Audio sample rate in Hz. Supported values: 8000, 16000, 22050, 24000, 44100, 48000.
- Audio encoding format. One of `"pcm"` (signed 16-bit LE), `"mulaw"`, or `"alaw"`.
- Runtime-configurable settings for the STT service. See Settings below.
- P99 latency from speech end to final transcript in seconds. Override for your deployment. See stt-benchmark.
Settings
Runtime-configurable settings passed via the `settings` constructor argument using `XAISTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | `None` | Not applicable for xAI STT. (Inherited from base STT settings.) |
| `language` | `Language \| str` | `Language.EN` | Recognition language. Supports: AR, BN, DE, EN, ES, FR, HI, ID, IT, JA, KO, PT, RU, TR, VI, ZH. (Inherited from base STT settings.) |
| `interim_results` | `bool` | `True` | When `True`, partial transcripts are emitted approximately every 500 ms. |
| `endpointing` | `int \| None` | `None` | Silence duration in milliseconds that triggers a speech-final event. Range 0-5000. Server default is 10 ms. |
| `multichannel` | `bool \| None` | `None` | When `True`, transcribes each interleaved channel independently. Requires `channels >= 2`. |
| `channels` | `int \| None` | `None` | Number of interleaved channels (2-8). Required when `multichannel` is `True`. |
| `diarize` | `bool \| None` | `None` | When `True`, the server attaches a `speaker` field to each word identifying the detected speaker. |
Usage
Basic Setup
With Custom Settings
With Multichannel Audio
Notes
- Connection management: The WebSocket connection is persistent and automatically reconnects if it drops mid-session. Audio is streamed continuously, and the server emits `transcript.partial` events with `is_final` and `speech_final` flags to mark utterance boundaries.
- Language support: xAI STT accepts two-letter language codes. When set, the server applies Inverse Text Normalization for improved accuracy.
- Audio encoding: Supports PCM (signed 16-bit LE), μ-law, and A-law encoding formats. PCM is recommended for best quality.
- Settings updates: Changing settings requires reconnecting to the WebSocket. The service automatically handles disconnect and reconnect when settings are updated via `STTUpdateSettingsFrame`.
Event Handlers
xAI STT supports the standard service connection events:

| Event | Description |
|---|---|
| `on_connected` | Connected to xAI WebSocket |
| `on_disconnected` | Disconnected from xAI WebSocket |