Paraformer-zh · GGUF (FunASR llama.cpp runtime)

GGUF build of FunASR's Paraformer-zh (SAN-M encoder + CIF predictor + SAN-M decoder, non-autoregressive) for the zero-Python, CPU/edge FunASR llama.cpp runtime — fast Mandarin ASR, ~21× real-time on CPU.
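The CIF predictor is what makes Paraformer non-autoregressive. It assigns a weight to each encoder frame, accumulates those weights, and "fires" a token-level embedding each time the running sum crosses a threshold. That means the number of output tokens is known before the decoder runs, so the decoder can process all positions in one pass. Below is a minimal, textbook-style sketch of continuous integrate-and-fire in plain Python. It assumes a firing threshold of 1.0 and per-frame weights already produced by the predictor. The function name `cif` is hypothetical, and this is not the runtime's code: for example, it drops any leftover weight at the end of an utterance instead of applying special tail handling.

```python
from typing import List, Sequence

Vector = List[float]


def cif(alphas: Sequence[float], frames: Sequence[Sequence[float]],
        threshold: float = 1.0) -> List[Vector]:
    """Continuous integrate-and-fire (illustrative sketch).

    alphas: one non-negative weight per encoder frame (assumed given).
    frames: encoder output vectors, same length as alphas.
    Returns one integrated embedding per fired token.
    """
    if len(alphas) != len(frames):
        raise ValueError("alphas and frames must have the same length")
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    dim = len(frames[0]) if frames else 0
    fired: List[Vector] = []
    acc_w = 0.0           # weight integrated so far for the current token
    acc_v = [0.0] * dim   # weighted sum of frames for the current token
    for a, h in zip(alphas, frames):
        remaining = a
        # The part of this frame's weight needed to reach the threshold
        # completes the current token; the rest carries over to the next one.
        # A large weight can complete more than one token.
        while acc_w + remaining >= threshold:
            need = threshold - acc_w
            fired.append([v + need * x for v, x in zip(acc_v, h)])
            acc_w, acc_v = 0.0, [0.0] * dim
            remaining -= need
        acc_w += remaining
        acc_v = [v + remaining * x for v, x in zip(acc_v, h)]
    # Leftover weight below the threshold is discarded in this sketch.
    return fired
```

For example, eight frames of weight 0.25 fire two tokens, and each token integrates four frames.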

Get it running (no Python, no build)

These are GGUF weights for the FunASR llama.cpp runtime — a whisper.cpp-style, single self-contained binary for CPU / edge. Grab a prebuilt binary, then fetch this model and run:

Prebuilt binaries (Linux / macOS / Windows) → GitHub Releases (tag runtime-llamacpp-v*)
One-page quickstart & benchmarks → funasr.com/llama-cpp

```bash
bash download-funasr-model.sh paraformer ./gguf
llama-funasr-paraformer -m ./gguf/paraformer-q8.gguf --vad ./gguf/fsmn-vad.gguf -a audio.wav
```

Files

| File | Size | Notes |
|---|---|---|
| `paraformer-f16.gguf` | 435 MB | recommended (f16 matmul weights) |
| `paraformer-q8.gguf` | ~217 MB | recommended; half the size of f16, same accuracy |
| `paraformer.gguf` | 863 MB | f32 reference |
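As a rough sanity check on these sizes, file size scales with bytes per stored weight. If nearly all of the f32 file is 4-byte weights, it holds roughly 216 million of them. That predicts about 432 MB at 2 bytes per weight (f16) and about 216 MB at 1 byte (8-bit). The sketch below does this arithmetic. It assumes "MB" means 10^6 bytes and ignores GGUF metadata, any tensors kept at higher precision, and quantization-scale overhead, which plausibly explains the small gaps to the listed 435 MB and ~217 MB. The constant and helper names are hypothetical.

```python
# Assumed bytes per stored weight for each variant; real quantized formats
# also store scale factors, which this estimate ignores.
BYTES_PER_WEIGHT = {"f32": 4.0, "f16": 2.0, "q8": 1.0}

F32_FILE_BYTES = 863e6  # paraformer.gguf, treating "MB" as 10**6 bytes


def approx_weight_count(file_bytes: float, fmt: str = "f32") -> float:
    """Estimate the number of weights, assuming the file is all weights."""
    return file_bytes / BYTES_PER_WEIGHT[fmt]


def approx_size_mb(n_weights: float, fmt: str) -> float:
    """Estimate the file size in MB (10**6 bytes) for a storage format."""
    return n_weights * BYTES_PER_WEIGHT[fmt] / 1e6


n = approx_weight_count(F32_FILE_BYTES)  # about 215.75 million weights
estimates = {fmt: approx_size_mb(n, fmt) for fmt in BYTES_PER_WEIGHT}
# estimates: f32 -> 863.0, f16 -> 431.5, q8 -> 215.75
```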

Usage

The binary prints the transcription text directly, so no Python detokenizer is needed. Pass `--ids` to output the raw token IDs.

```bash
llama-funasr-paraformer -m paraformer-f16.gguf -a audio.wav --vad fsmn-vad.gguf
```
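Because the binary prints the transcript directly, it is straightforward to drive from a script when you have many files. The sketch below batch-transcribes every `.wav` in a folder by calling `llama-funasr-paraformer` with only the `-m`, `-a`, and `--vad` flags shown above. It assumes the binary is on `PATH` and that the transcript is what lands on stdout, with any logs going to stderr. Python is used here purely for illustration; the runtime itself needs none. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

BINARY = "llama-funasr-paraformer"  # assumed to be on PATH


def build_cmd(model: str, vad: str, audio: str) -> List[str]:
    """Command line using only the flags shown in the usage example."""
    return [BINARY, "-m", str(model), "-a", str(audio), "--vad", str(vad)]


def transcribe_dir(model: str, vad: str, audio_dir: str) -> Dict[str, str]:
    """Run the binary once per .wav file and collect what it prints."""
    results: Dict[str, str] = {}
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        proc = subprocess.run(
            build_cmd(model, vad, str(wav)),
            capture_output=True,  # keep stdout (transcript) and stderr apart
            text=True,
            check=True,           # raise CalledProcessError on a non-zero exit
        )
        results[wav.name] = proc.stdout.strip()
    return results
```

For example, `transcribe_dir("paraformer-f16.gguf", "fsmn-vad.gguf", "clips/")` returns a dict that maps each file name in the (hypothetical) `clips/` folder to its transcript.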

Measured on CPU (8 threads): 9.85% CER on the 184-clip Mandarin benchmark, versus 22–31% for whisper.cpp.
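CER (character error rate) is the character-level edit distance between hypothesis and reference (substitutions plus deletions plus insertions), divided by the number of reference characters. For Mandarin, characters are the natural unit. The minimal sketch below assumes corpus-level aggregation (total errors over total reference characters) and whitespace-only normalization. The card does not say how the 184-clip benchmark aggregates scores or normalizes text (punctuation, for instance), so treat both choices as assumptions. The helper names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over characters (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            )
        prev = cur
    return prev[-1]


def normalize(text: str) -> str:
    """Assumed normalization: drop all whitespace."""
    return "".join(text.split())


def corpus_cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total character edits divided by total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total = sum(len(normalize(r)) for r in refs)
    if total == 0:
        raise ValueError("references are empty after normalization")
    errors = sum(edit_distance(normalize(r), normalize(h))
                 for r, h in zip(refs, hyps))
    return errors / total
```

For example, one substituted character in the six-character reference 今天天气很好 gives a CER of 1/6, or about 16.7%.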

