Overview
XAISTTService provides real-time speech-to-text transcription using xAI’s WebSocket STT API with support for interim results, configurable endpointing, multichannel audio, and speaker diarization.
The service streams raw audio (PCM, μ-law, or A-law) to xAI’s endpoint and emits interim and final transcription frames based on the server’s is_final and speech_final flags. The connection is persistent: audio is streamed continuously and the server automatically detects utterance boundaries.
- xAI STT API Reference: Pipecat's API methods for xAI STT integration
- Example Implementation: Complete transcription example with xAI STT
- Voice Agent Example: Full voice agent with xAI STT, LLM, and TTS
- xAI Documentation: Official xAI voice API documentation
Installation
To use xAI STT services, install the required dependencies.

Prerequisites
xAI Account Setup
Before using xAI STT services, you need:

- xAI Account: Sign up at xAI
- API Key: Generate an API key from your account dashboard
- Language Selection: Choose from 16 supported languages
Required Environment Variables
`XAI_API_KEY`: Your xAI API key for authentication
Configuration
XAISTTService
Constructor parameters:

- xAI API key for authentication (used as Bearer token for the WebSocket handshake).
- WebSocket endpoint URL for xAI STT.
- Audio sample rate in Hz. Supported values: 8000, 16000, 22050, 24000, 44100, 48000.
- Audio encoding format. One of `"pcm"` (signed 16-bit LE), `"mulaw"`, or `"alaw"`.
- Runtime-configurable settings for the STT service. See Settings below.
- P99 latency from speech end to final transcript in seconds. Override for your deployment. See stt-benchmark.
Settings
Runtime-configurable settings passed via the `settings` constructor argument using `XAISTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | `None` | Not applicable for xAI STT. (Inherited from base STT settings.) |
| `language` | `Language \| str` | `Language.EN` | Recognition language. Supports: AR, BN, DE, EN, ES, FR, HI, ID, IT, JA, KO, PT, RU, TR, VI, ZH. (Inherited from base STT settings.) |
| `interim_results` | `bool` | `True` | When `True`, partial transcripts are emitted approximately every 500 ms. |
| `endpointing` | `int \| None` | `None` | Silence duration in milliseconds that triggers a speech-final event. Range 0-5000. Server default is 10 ms. |
| `multichannel` | `bool \| None` | `None` | When `True`, transcribes each interleaved channel independently. Requires `channels >= 2`. |
| `channels` | `int \| None` | `None` | Number of interleaved channels (2-8). Required when `multichannel` is `True`. |
| `diarize` | `bool \| None` | `None` | When `True`, the server attaches a `speaker` field to each word identifying the detected speaker. |
Usage
Basic Setup
With Custom Settings
With Multichannel Audio
Notes
- Connection management: The WebSocket connection is persistent and automatically reconnects if it drops mid-session. Audio is streamed continuously, and the server emits `transcript.partial` events with `is_final` and `speech_final` flags to mark utterance boundaries.
- Language support: xAI STT accepts two-letter language codes. When set, the server applies Inverse Text Normalization for improved accuracy.
- Audio encoding: Supports PCM (signed 16-bit LE), μ-law, and A-law encoding formats. PCM is recommended for best quality.
- Settings updates: Changing settings requires reconnecting to the WebSocket. The service automatically handles disconnect and reconnect when settings are updated via `STTUpdateSettingsFrame`.
Event Handlers
xAI STT supports the standard service connection events:

| Event | Description |
|---|---|
| `on_connected` | Connected to xAI WebSocket |
| `on_disconnected` | Disconnected from xAI WebSocket |