
Overview

XAISTTService provides real-time speech-to-text transcription using xAI’s WebSocket STT API with support for interim results, configurable endpointing, multichannel audio, and speaker diarization. The service streams raw audio (PCM, μ-law, or A-law) to xAI’s endpoint and emits interim and final transcription frames based on the server’s is_final and speech_final flags. The connection is persistent: audio is streamed continuously and the server automatically detects utterance boundaries.

xAI STT API Reference

Pipecat’s API methods for xAI STT integration

Example Implementation

Complete transcription example with xAI STT

Voice Agent Example

Full voice agent with xAI STT, LLM, and TTS

xAI Documentation

Official xAI voice API documentation

Installation

To use xAI STT services, install the required dependencies:
uv add "pipecat-ai[xai]"

Prerequisites

xAI Account Setup

Before using xAI STT services, you need:
  1. xAI Account: Sign up at xAI
  2. API Key: Generate an API key from your account dashboard
  3. Language Selection: Choose from 16 supported languages

Required Environment Variables

  • XAI_API_KEY: Your xAI API key for authentication
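For local development, the key can be exported in your shell before starting the Pipecat process (the value shown is a placeholder):

```shell
# Make the xAI API key available to the Pipecat process.
export XAI_API_KEY="your-xai-api-key"
```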

Configuration

XAISTTService

api_key (str, required)
  xAI API key for authentication (used as a Bearer token for the WebSocket handshake).
ws_url (str, default "wss://api.x.ai/v1/stt")
  WebSocket endpoint URL for xAI STT.
sample_rate (int, default 16000)
  Audio sample rate in Hz. Supported values: 8000, 16000, 22050, 24000, 44100, 48000.
encoding (str, default "pcm")
  Audio encoding format. One of "pcm" (signed 16-bit LE), "mulaw", or "alaw".
settings (XAISTTService.Settings, default None)
  Runtime-configurable settings for the STT service. See Settings below.
ttfs_p99_latency (float, default XAI_TTFS_P99)
  P99 latency from end of speech to final transcript, in seconds. Override for your deployment. See stt-benchmark.
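When streaming prerecorded audio, it can help to confirm that the file's sample rate is one of the supported values before constructing the service. A minimal standard-library check (the WAV file here is generated on the fly so the sketch is self-contained; the filename is illustrative):

```python
import wave

# Sample rates accepted by XAISTTService, per the table above.
SUPPORTED_RATES = {8000, 16000, 22050, 24000, 44100, 48000}

def wav_sample_rate(path: str) -> int:
    """Return the sample rate of a WAV file in Hz."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate()

# Write a short silent 16 kHz mono WAV so the example is runnable as-is.
with wave.open("example.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)       # 16-bit PCM
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 160)

rate = wav_sample_rate("example.wav")
assert rate in SUPPORTED_RATES
```

The detected rate can then be passed as sample_rate when constructing XAISTTService.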

Settings

Runtime-configurable settings passed via the settings constructor argument using XAISTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
model (str, default None)
  Not applicable for xAI STT. (Inherited from base STT settings.)
language (Language | str, default Language.EN)
  Recognition language. Supports: AR, BN, DE, EN, ES, FR, HI, ID, IT, JA, KO, PT, RU, TR, VI, ZH. (Inherited from base STT settings.)
interim_results (bool, default True)
  When True, partial transcripts are emitted approximately every 500 ms.
endpointing (int | None, default None)
  Silence duration in milliseconds that triggers a speech-final event. Range 0-5000. Server default is 10 ms.
multichannel (bool | None, default None)
  When True, each interleaved channel is transcribed independently. Requires channels >= 2.
channels (int | None, default None)
  Number of interleaved channels (2-8). Required when multichannel is True.
diarize (bool | None, default None)
  When True, the server attaches a speaker field to each word identifying the detected speaker.

Usage

Basic Setup

import os
from pipecat.services.xai.stt import XAISTTService

stt = XAISTTService(
    api_key=os.getenv("XAI_API_KEY"),
)

With Custom Settings

import os
from pipecat.services.xai.stt import XAISTTService
from pipecat.transcriptions.language import Language

stt = XAISTTService(
    api_key=os.getenv("XAI_API_KEY"),
    sample_rate=24000,
    settings=XAISTTService.Settings(
        language=Language.ES,
        interim_results=True,
        endpointing=1000,
        diarize=True,
    ),
)

With Multichannel Audio

import os
from pipecat.services.xai.stt import XAISTTService

stt = XAISTTService(
    api_key=os.getenv("XAI_API_KEY"),
    settings=XAISTTService.Settings(
        multichannel=True,
        channels=2,
    ),
)
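The multichannel option expects sample-interleaved audio. As an illustrative sketch (not part of the Pipecat API), two mono 16-bit PCM buffers can be interleaved into a 2-channel stream like this:

```python
import struct

def interleave_pcm16(left: bytes, right: bytes) -> bytes:
    """Interleave two mono 16-bit LE PCM buffers into L/R sample frames."""
    assert len(left) == len(right) and len(left) % 2 == 0
    n = len(left) // 2
    lsamples = struct.unpack(f"<{n}h", left)
    rsamples = struct.unpack(f"<{n}h", right)
    out = []
    for l, r in zip(lsamples, rsamples):
        out.append(l)   # channel 1 sample
        out.append(r)   # channel 2 sample
    return struct.pack(f"<{len(out)}h", *out)

left = struct.pack("<3h", 1, 2, 3)
right = struct.pack("<3h", -1, -2, -3)
stereo = interleave_pcm16(left, right)
assert struct.unpack("<6h", stereo) == (1, -1, 2, -2, 3, -3)
```

With channels=2, the server treats even-indexed samples as channel 1 and odd-indexed samples as channel 2, transcribing each independently.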

Notes

  • Connection management: The WebSocket connection is persistent and automatically reconnects if it drops mid-session. Audio is streamed continuously and the server emits transcript.partial events with is_final and speech_final flags to mark utterance boundaries.
  • Language support: xAI STT accepts two-letter language codes. When set, the server applies Inverse Text Normalization for improved accuracy.
  • Audio encoding: Supports PCM (signed 16-bit LE), μ-law, and A-law encoding formats. PCM is recommended for best quality.
  • Settings updates: Changing settings requires reconnecting to the WebSocket. The service automatically handles disconnect and reconnect when settings are updated via STTUpdateSettingsFrame.
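For reference, the "mulaw" option above refers to standard G.711 μ-law companding. A hand-rolled sketch of the per-sample encoder, shown only to illustrate the format (in practice a telephony transport usually supplies μ-law directly, so you rarely need to encode it yourself):

```python
def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as a G.711 mu-law byte."""
    BIAS = 0x84   # standard G.711 bias added before segment search
    CLIP = 32635  # clip magnitude so BIAS cannot overflow 15 bits
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): highest set bit above bit 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the code bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF

assert linear_to_ulaw(0) == 0xFF       # positive zero
assert linear_to_ulaw(32767) == 0x80   # maximum positive
assert linear_to_ulaw(-32768) == 0x00  # maximum negative
```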

Event Handlers

xAI STT supports the standard service connection events:
on_connected
  Connected to the xAI WebSocket
on_disconnected
  Disconnected from the xAI WebSocket
@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to xAI STT")

@stt.event_handler("on_disconnected")
async def on_disconnected(service):
    print("Disconnected from xAI STT")