## Overview

SarvamSTTService provides real-time speech recognition using Sarvam AI's WebSocket API, supporting high-accuracy transcription of Indian languages with Voice Activity Detection (VAD) and multiple audio formats.
- **Sarvam STT API Reference**: Pipecat's API methods for Sarvam STT integration
- **Example Implementation**: Complete example with interruption handling
- **Sarvam Documentation**: Official Sarvam AI STT documentation and features
- **Sarvam AI Platform**: Access API keys and speech models
## Installation

To use Sarvam services, install the required dependency:
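The install command itself is not shown on this page; a plausible sketch, assuming the dependency is Pipecat's optional `sarvam` extra (the extra name is an assumption):

```shell
pip install "pipecat-ai[sarvam]"
```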
## Prerequisites

### Sarvam AI Account Setup

Before using Sarvam STT services, you need:

- **Sarvam AI Account**: Sign up at Sarvam AI
- **API Key**: Generate an API key from your account dashboard
- **Model Access**: Access to Saarika (STT) or Saaras (STT-Translate) models, including the `saaras:v3` model with support for multiple modes (transcribe, translate, verbatim, translit, codemix)
### Required Environment Variables

- `SARVAM_API_KEY`: Your Sarvam AI API key for authentication
## Configuration

### SarvamSTTService

Constructor parameters:

- Sarvam API key for authentication.
- Sarvam model to use. Allowed values: `"saarika:v2.5"` (standard STT), `"saaras:v2.5"` (STT-Translate, auto-detects language), and `"saaras:v3"` (advanced; supports mode and fine-grained VAD). Deprecated in v0.0.105. Use `settings=SarvamSTTService.Settings(...)` instead.
- Audio sample rate in Hz. Defaults to 16000 if not specified.
- Mode of operation. Only applicable to models that support it (e.g., `saaras:v3`). Defaults to the model's default mode.
- Audio codec/format of the input file.
- Configuration parameters for the Sarvam STT service. Deprecated in v0.0.105. Use `settings=SarvamSTTService.Settings(...)` instead.
- Runtime-configurable settings for the STT service. See Settings below.
- Seconds of no audio before sending silence to keep the connection alive. `None` disables keepalive.
- P99 latency from speech end to final transcript, in seconds. Override for your deployment. See stt-benchmark.
- Seconds between idle checks when keepalive is enabled.
## Settings

Runtime-configurable settings passed via the `settings` constructor argument using `SarvamSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | `None` | STT model identifier. (Inherited from base STT settings.) |
| `language` | `Language` \| `str` | `None` | Target language for transcription. (Inherited from base STT settings.) Behavior varies by model: `saarika:v2.5` defaults to `"unknown"` (auto-detect), `saaras:v2.5` ignores this (auto-detects), `saaras:v3` defaults to `"en-IN"`. |
| `prompt` | `str` | `None` | Optional prompt to guide transcription/translation style. Only applicable to `saaras:v2.5`. |
| `vad_signals` | `bool` | `None` | Enable VAD signals in responses. When enabled, the service broadcasts `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` from the server. |
| `high_vad_sensitivity` | `bool` | `None` | Enable high VAD sensitivity for more responsive speech detection. |
| `positive_speech_threshold` | `float` | `None` | VAD probability threshold (0.0-1.0) above which a frame is considered speech. Only for `saaras:v3`. |
| `negative_speech_threshold` | `float` | `None` | VAD probability threshold (0.0-1.0) below which a frame is considered silence. Only for `saaras:v3`. |
| `min_speech_frames` | `int` | `None` | Minimum consecutive speech frames to start a speech segment. Only for `saaras:v3`. |
| `first_turn_min_speech_frames` | `int` | `None` | Minimum speech frames for the first user turn. Only for `saaras:v3`. |
| `negative_frames_count` | `int` | `None` | Number of silence frames within the window to end a speech segment. Only for `saaras:v3`. |
| `negative_frames_window` | `int` | `None` | Sliding window size (in frames) for counting negative frames. Only for `saaras:v3`. |
| `start_speech_volume_threshold` | `float` | `None` | Volume level (dB) below which audio is too quiet to be speech. Only for `saaras:v3`. |
| `interrupt_min_speech_frames` | `int` | `None` | Minimum speech frames to register a barge-in/interruption. Only for `saaras:v3`. |
| `pre_speech_pad_frames` | `int` | `None` | Number of audio frames to prepend before detected speech onset. Only for `saaras:v3`. |
| `num_initial_ignored_frames` | `int` | `None` | Number of leading audio frames to skip at connection start. Only for `saaras:v3`. |
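Settings can be changed mid-conversation by queueing an `STTUpdateSettingsFrame`. A sketch, assuming the frame lives in `pipecat.frames.frames` and accepts a `settings` mapping (both unverified by this page):

```python
from pipecat.frames.frames import STTUpdateSettingsFrame  # import path assumed

# Switch transcription language at runtime, e.g. from a pipeline task:
await task.queue_frames([
    STTUpdateSettingsFrame(settings={"language": "ta-IN"}),
])
```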
## Usage
### Basic Setup
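A minimal sketch, assuming the service is importable from `pipecat.services.sarvam.stt` (the import path is not confirmed by this page):

```python
import os

from pipecat.services.sarvam.stt import SarvamSTTService  # import path assumed

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
)

# Place the service in a Pipecat pipeline between transport input and
# downstream processors, e.g.:
# pipeline = Pipeline([transport.input(), stt, ...])
```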
### With Language and Model Configuration
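A sketch using the `settings=SarvamSTTService.Settings(...)` pattern described above; the import path is an assumption. `saarika:v2.5` accepts an explicit target language (it defaults to `"unknown"`, i.e. auto-detect):

```python
import os

from pipecat.services.sarvam.stt import SarvamSTTService  # import path assumed

stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    settings=SarvamSTTService.Settings(
        model="saarika:v2.5",
        language="hi-IN",  # Hindi; a Language enum value also works per the table above
    ),
)
```

Note that `language` is not valid with `saaras:v2.5`, which always auto-detects the language.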
### With Server-Side VAD
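A sketch enabling Sarvam's server-side VAD with `saaras:v3`; the import path is assumed and the threshold values below are illustrative, not recommendations:

```python
import os

from pipecat.services.sarvam.stt import SarvamSTTService  # import path assumed

# With vad_signals=True the service broadcasts UserStartedSpeakingFrame /
# UserStoppedSpeakingFrame based on the server's VAD decisions.
stt = SarvamSTTService(
    api_key=os.getenv("SARVAM_API_KEY"),
    settings=SarvamSTTService.Settings(
        model="saaras:v3",
        vad_signals=True,
        high_vad_sensitivity=True,
        # Fine-grained VAD tuning (saaras:v3 only); example values:
        positive_speech_threshold=0.5,
        negative_speech_threshold=0.35,
        min_speech_frames=3,
    ),
)
```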
## Notes

- **Default model changed**: As of this update, the default model is `saaras:v3` (previously `saarika:v2.5`). Applications that relied on the previous default should set `settings=SarvamSTTService.Settings(model="saarika:v2.5")` explicitly.
- **Supported languages**: Bengali (`bn-IN`), Gujarati (`gu-IN`), Hindi (`hi-IN`), Kannada (`kn-IN`), Malayalam (`ml-IN`), Marathi (`mr-IN`), Tamil (`ta-IN`), Telugu (`te-IN`), Punjabi (`pa-IN`), Odia (`od-IN`), English (`en-IN`), and Assamese (`as-IN`).
- **Model-specific parameter validation**: The service validates that parameters are compatible with the selected model. For example, `prompt` is only supported with `saaras:v2.5`, `language` is not supported with `saaras:v2.5` (which auto-detects language), and the fine-grained VAD parameters are only supported with `saaras:v3`.
- **Fine-grained VAD tuning (`saaras:v3` only)**: The `saaras:v3` model supports server-side VAD with 10 tuning parameters for speech detection thresholds, frame-count controls, pre-speech padding, interruption sensitivity, and initial-frame skipping.
- **VAD modes**: When `vad_signals=False` (the default), the service relies on Pipecat's local VAD and flushes the server buffer on `VADUserStoppedSpeakingFrame`. When `vad_signals=True`, the service uses Sarvam's server-side VAD and broadcasts speaking frames from the server.
## Event Handlers

In addition to the standard service connection events (`on_connected`, `on_disconnected`, `on_connection_error`), Sarvam STT provides:

| Event | Description |
|---|---|
| `on_speech_started` | Speech detected in the audio stream |
| `on_speech_stopped` | Speech no longer detected in the audio stream |
| `on_utterance_end` | End of utterance detected |
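A registration sketch, assuming Pipecat's usual `event_handler` decorator pattern (handler signatures are an assumption; check the API reference for the exact arguments each event passes):

```python
@stt.event_handler("on_speech_started")
async def on_speech_started(service):
    # Fired when the server-side VAD detects speech in the stream.
    print("User started speaking")

@stt.event_handler("on_utterance_end")
async def on_utterance_end(service):
    # Fired when the server marks the end of an utterance.
    print("Utterance complete")
```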