Paraformer-zh · GGUF (FunASR llama.cpp runtime)
GGUF build of FunASR's Paraformer-zh (SAN-M encoder + CIF predictor + SAN-M decoder, non-autoregressive) for the zero-Python, CPU/edge FunASR llama.cpp runtime — fast Mandarin ASR, ~21× real-time on CPU.
Get it running (no Python, no build)
These are GGUF weights for the FunASR llama.cpp runtime — a whisper.cpp-style, single self-contained binary for CPU / edge. Grab a prebuilt binary, then fetch this model and run:
- Prebuilt binaries (Linux / macOS / Windows) → GitHub Releases (tag runtime-llamacpp-v*)
- One-page quickstart & benchmarks → funasr.com/llama-cpp
bash download-funasr-model.sh paraformer ./gguf
llama-funasr-paraformer -m ./gguf/paraformer-q8.gguf --vad ./gguf/fsmn-vad.gguf -a audio.wav
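Before the first run it can help to confirm that the binary is on PATH and that the downloaded files are present. The following is a minimal sketch, not part of the runtime: the `preflight` function is a hypothetical helper, and the file names are simply the ones used in the quickstart command above.

```bash
#!/usr/bin/env bash
# Sketch: check prerequisites before the first run.
# File names follow the quickstart; preflight() itself is illustrative
# and is not shipped with the FunASR runtime.

# preflight BIN FILE...
# Prints each problem to stderr and returns non-zero if any check fails.
preflight() {
  local bin="$1" f status=0
  shift
  # The binary must be resolvable as a command.
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "missing binary on PATH: $bin" >&2
    status=1
  fi
  # Every listed file must exist and be non-empty.
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty file: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

A typical call would be `preflight llama-funasr-paraformer ./gguf/paraformer-q8.gguf ./gguf/fsmn-vad.gguf audio.wav`, followed by the transcription command only if it succeeds.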
Files
| file | size | notes |
|---|---|---|
| paraformer-f16.gguf | 435 MB | recommended (f16 matmul weights) |
| paraformer-q8.gguf | ~217 MB | recommended; half the size of f16, same accuracy |
| paraformer.gguf | 863 MB | f32 reference |
Usage
The binary prints the transcribed text directly, so no Python detokenization step is needed. Pass --ids to get raw token ids.
llama-funasr-paraformer -m paraformer-f16.gguf -a audio.wav --vad fsmn-vad.gguf
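To transcribe a whole folder, the single-file command can be wrapped in a loop. The sketch below uses only the flags shown above (-m, --vad, -a) and assumes the transcript goes to stdout and that the binary takes one audio file per invocation. The `transcribe_dir` function, the `FUNASR_BIN` variable, and the directory layout are hypothetical names chosen for illustration.

```bash
#!/usr/bin/env bash
# Sketch: transcribe every .wav file in a directory, one transcript per clip.
# Only -m, --vad and -a come from the model card; the function name,
# FUNASR_BIN, and the directory layout are illustrative assumptions.

# Binary to invoke; override FUNASR_BIN to point at a different build.
FUNASR_BIN="${FUNASR_BIN:-llama-funasr-paraformer}"

# transcribe_dir MODEL VAD IN_DIR OUT_DIR
transcribe_dir() {
  local model="$1" vad="$2" in_dir="$3" out_dir="$4"
  local wav base
  mkdir -p "$out_dir" || return 1
  for wav in "$in_dir"/*.wav; do
    [ -e "$wav" ] || continue   # the glob matched nothing
    base="$(basename "$wav" .wav)"
    # Assumes the transcript is written to stdout, as described above.
    if ! "$FUNASR_BIN" -m "$model" --vad "$vad" -a "$wav" > "$out_dir/$base.txt"; then
      echo "failed: $wav" >&2   # report and continue with the next clip
    fi
  done
}
```

For example, `transcribe_dir paraformer-f16.gguf fsmn-vad.gguf ./audio ./transcripts` would write ./transcripts/NAME.txt for each ./audio/NAME.wav. Running the clips one at a time keeps a single failure from stopping the batch.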
On CPU (8 threads): 9.85 % CER on the 184-clip Mandarin benchmark (vs whisper.cpp 22–31 %).
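For readers unfamiliar with the metric: character error rate is conventionally the character-level edit distance between the transcript and the reference, divided by the reference length. The benchmark's exact scoring rules (text normalization, punctuation handling) are not stated here, so the formula below is the standard definition rather than a description of that setup.

```latex
\mathrm{CER} = \frac{S + D + I}{N}
% S, D, I: substituted, deleted, and inserted characters in the
%          minimum-edit alignment of hypothesis against reference
% N:       number of characters in the reference transcript
```

Under this definition, 9.85 % CER corresponds to roughly 10 character errors per 100 reference characters.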
Links
- 🧩 Runtime & build: FunASR · runtime/llama.cpp — ⭐ Star FunASR!
- Source model: funasr/paraformer-zh