Voxtral Realtime 4B Pure C Implementation

GitHub - antirez/voxtral.c: Pure C inference of Mistral Voxtral Realtime 4B speech to text model

Speech to text model inference in pure C.
This is a C implementation of the inference pipeline for Mistral AI's Voxtral Realtime 4B model. It has zero external dependencies beyond the C standard library. The MPS inference is decently fast, while the BLAS acceleration is usable but slow (it continuously converts the bf16 weights to fp32).
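To illustrate why the BLAS path pays a cost on every use of the weights, here is a minimal sketch of a bf16 to fp32 conversion. It relies only on the fact that a bf16 value is the upper 16 bits of an IEEE-754 single-precision float. The function names `bf16_to_f32` and `bf16_row_to_f32` are hypothetical and are not taken from voxtral.c; the real code may convert in a different way or in larger batches.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from voxtral.c): widen one bf16 value to fp32.
 * bf16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits
 * of a float, so widening is a 16-bit left shift of the raw bits. */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);   /* bit copy, avoids aliasing issues */
    return f;
}

/* Hypothetical helper: convert n bf16 weights into an fp32 scratch
 * buffer, as a BLAS call operating on fp32 would need. */
static void bf16_row_to_f32(const uint16_t *src, float *dst, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = bf16_to_f32(src[i]);
}
```

The conversion itself is cheap per element, but repeating it over billions of weights for each forward pass adds up, which is consistent with the slowdown described above.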
Audio processing uses a chunked encoder with overlapping windows, which bounds memory usage regardless of input length. Audio can also be piped from stdin (--stdin), which makes it easy to transcode and transcribe any format via ffmpeg, or captured live from the microphone (--from-mic, macOS). A streaming C API (vox_stream_t) lets you feed audio incrementally and receive token strings as they become available.
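The idea of overlapping windows can be sketched as follows. This is only an illustration of the general technique, not the encoder's actual code: the type `window_t`, the function `next_window`, and the chunk and overlap parameters are all hypothetical, and the real window sizes used by voxtral.c are not stated here. The point is that only one window of `chunk` samples has to be resident at a time, however long the input is.

```c
#include <stddef.h>

/* Hypothetical type: one window over the input, in samples. */
typedef struct {
    size_t start;  /* index of the first sample in the window */
    size_t len;    /* number of samples in the window (<= chunk) */
} window_t;

/* Hypothetical helper: compute the idx-th window over `total` samples.
 * Consecutive windows share `overlap` samples, so each one advances by
 * chunk - overlap. Returns 1 and fills *w, or 0 when no window is left
 * or the parameters are invalid. */
static int next_window(size_t total, size_t chunk, size_t overlap,
                       size_t idx, window_t *w) {
    if (chunk == 0 || overlap >= chunk) return 0;
    size_t hop = chunk - overlap;
    size_t start = idx * hop;
    if (start >= total) return 0;
    /* Stop once the previous window already covered the end of input. */
    if (idx > 0 && (idx - 1) * hop + chunk >= total) return 0;
    w->start = start;
    w->len = (total - start < chunk) ? total - start : chunk;
    return 1;
}
```

A caller would loop with `idx = 0, 1, 2, ...` until `next_window` returns 0, processing each window before moving to the next.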
Similar projects: Whisper.cpp