

Overview

MistralSTTService provides real-time speech recognition using Mistral’s Voxtral Realtime API. It uses the Mistral SDK’s RealtimeConnection to stream audio and receive transcription events over WebSocket. Key features include:
  • Streaming transcription with interim results
  • Automatic language detection
  • VAD-driven utterance lifecycle management
  • Built-in metrics support

Related resources:
  • Mistral STT API Reference: Pipecat’s API methods for Mistral STT
  • Example Implementation: complete example with Mistral STT and TTS
  • Transcription Example: transcription-only example
  • Mistral Documentation: official Mistral API documentation

Installation

To use the Mistral STT service, install the required dependencies:
uv add "pipecat-ai[mistral]"

Prerequisites

Before using MistralSTTService, you need:
  1. Mistral Account: Sign up at Mistral AI
  2. API Key: Generate an API key from your account dashboard
  3. Model Access: Ensure you have access to the Voxtral Realtime API

Required Environment Variables

  • MISTRAL_API_KEY: Your Mistral API key for authentication
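
A quick way to fail fast if the variable is missing before constructing the service (a minimal stdlib-only sketch; the helper name and error message are illustrative, not part of Pipecat's API):

```python
import os


def require_mistral_api_key() -> str:
    """Fetch MISTRAL_API_KEY, raising a clear error if it is unset or empty."""
    api_key = os.getenv("MISTRAL_API_KEY")
    if not api_key:
        raise RuntimeError(
            "MISTRAL_API_KEY is not set; export it or add it to your .env file"
        )
    return api_key
```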

Configuration

api_key
str
Mistral API key for authentication.
base_url
str
default:"None"
Custom API endpoint URL. Leave empty for the default Mistral endpoint.
sample_rate
int
default:"None"
Audio sample rate in Hz. When None, uses the pipeline’s configured sample rate.
target_streaming_delay_ms
int
default:"None"
Streaming delay for accuracy/latency tradeoff. Higher values may improve accuracy at the cost of latency.
ttfs_p99_latency
float
default:"MISTRAL_TTFS_P99"
P99 latency from speech end to final transcript in seconds. Override for your deployment.
settings
MistralSTTService.Settings
default:"None"
Runtime-configurable settings for the STT service. See Settings below.

Settings

Runtime-configurable settings passed via the settings constructor argument using MistralSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | "voxtral-mini-transcribe-realtime-2602" | Mistral STT model to use. (Inherited from base STT settings.) |
| language | Language \| str | None | Language hint for transcription. (Inherited from base STT settings.) |
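
To update these settings mid-conversation, push an STTUpdateSettingsFrame through the running pipeline. A sketch (assuming `task` is a running Pipecat PipelineTask; check the Service Settings page for the exact supported keys):

```python
from pipecat.frames.frames import STTUpdateSettingsFrame


async def switch_language(task, language: str) -> None:
    """Push a settings update to the STT service mid-conversation.

    The frame travels through the pipeline to MistralSTTService, which
    applies the new language hint to subsequent utterances.
    """
    await task.queue_frame(STTUpdateSettingsFrame(settings={"language": language}))
```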

Usage

Basic Setup

import os
from pipecat.services.mistral import MistralSTTService

stt = MistralSTTService(
    api_key=os.getenv("MISTRAL_API_KEY"),
)

With Custom Settings

import os
from pipecat.services.mistral import MistralSTTService

stt = MistralSTTService(
    api_key=os.getenv("MISTRAL_API_KEY"),
    target_streaming_delay_ms=1000,
    settings=MistralSTTService.Settings(
        model="voxtral-mini-transcribe-realtime-2602",
        language="en",
    ),
)

Notes

  • SDK-managed WebSocket: The service extends STTService directly (rather than WebsocketSTTService) because the Mistral SDK manages the WebSocket connection internally.
  • Language detection: When language is not specified in settings, the service automatically detects the spoken language and includes it in the transcription frames.
  • VAD integration: The service works with VAD (Voice Activity Detection) to manage utterance lifecycle. When the user starts speaking, it begins accumulating interim transcripts. When the user stops, it flushes remaining audio for final transcription.
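
The VAD-driven lifecycle above requires a VAD analyzer on the input transport. A minimal sketch of how the pieces fit together (the Daily transport, its parameter names, and the SileroVADAnalyzer import path are assumptions based on common Pipecat setups; substitute your own transport):

```python
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.mistral import MistralSTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport

# VAD events on the input transport drive the utterance lifecycle:
# speech start begins interim transcript accumulation, and speech stop
# flushes remaining audio for the final transcription.
transport = DailyTransport(
    room_url=os.getenv("DAILY_ROOM_URL"),
    token=None,
    bot_name="transcriber",
    params=DailyParams(
        audio_in_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),
    ),
)

stt = MistralSTTService(api_key=os.getenv("MISTRAL_API_KEY"))

# Add downstream processors (context aggregation, LLM, TTS, ...) as needed.
pipeline = Pipeline([transport.input(), stt])
```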

Event Handlers

Supports the standard service connection events:
| Event | Description |
| --- | --- |
| on_connected | Transcription session created |
| on_disconnected | Connection closed |
| on_connection_error | Transcription error occurred |

@stt.event_handler("on_connected")
async def on_connected(stt):
    print("Mistral STT connected")

@stt.event_handler("on_connection_error")
async def on_connection_error(stt, error_msg):
    print(f"Connection error: {error_msg}")