extract reasoning_content error

#24

by zet-yd - opened Dec 29, 2025

Discussion

zet-yd

Dec 29, 2025

Here is the command I use to start vLLM:
docker run -d \
  --name vllm-custom \
  --gpus device=0 \
  -p 8000:8000 \
  -v /data/lab/Models/zai/GLM-4.6V-Flash:/models \
  vllm-glm46v-t5 \
  /models \
  --host 0.0.0.0 \
  --max-model-len 3840 \
  --tool-call-parser glm45 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice \
  --served-model-name GLM-4.6V-Flash

When I call the server like this:
predict_ret = client.chat.completions.create(
    model='GLM-4.6V-Flash',
    messages=messages,
    extra_body={
        "chat_template_kwargs": {
            "enable_thinking": True
        }
    }
)

I can't get anything from predict_ret.choices[0].message.reasoning_content; the entire thinking output ends up in predict_ret.choices[0].message.content.

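Example (the model's output was in Chinese; translated here):
prompt: what is bug ?
print(f"cot_Content: {predict_ret.choices[0].message.reasoning_content}\n")
print(f"Content: {predict_ret.choices[0].message.content}\n")
cot_Content: None
Content: The user is asking "what is bug?", that is, "what is a bug?".

First, I need to understand what the user needs. The user may be asking for the definition of a bug in software or a program, or may be asking about "bug" in a broader sense, such as an insect in the biological sense or an error in the general sense. However, given the context, the user is probably working in programming or a technical field, so I should mainly explain bugs in software or programs.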
...
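
To make the symptom above easy to check from client code, here is a minimal sketch that reports which field of a response message actually carries text. The `describe_message` helper and the `SimpleNamespace` stand-in for the server reply are illustrative assumptions, not part of the openai client or vLLM; only the `reasoning_content` and `content` attribute names come from the report.

```python
from types import SimpleNamespace


def describe_message(message):
    """Return (reasoning, answer) from a chat completion message.

    Missing or None fields are normalized to empty strings so callers
    can test both fields the same way.
    """
    reasoning = getattr(message, "reasoning_content", None) or ""
    answer = getattr(message, "content", None) or ""
    return reasoning, answer


# Hypothetical stand-in for predict_ret.choices[0].message in the failing
# case; a real call returns the client's own message type instead.
failing = SimpleNamespace(reasoning_content=None, content="The user is asking ...")

reasoning, answer = describe_message(failing)
if not reasoning and answer:
    print("reasoning_content is empty; all text arrived in content")
```

Running the same check against the nightly build described later in the thread would be expected to show the opposite pattern, with text in the first element and an empty second element.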

zet-yd

Dec 29, 2025

vLLM version is 0.13.0, transformers is 5.0.0rc0.

meganoob1337

Jan 6

I have the same problem; something seems to be wrong with the reasoning parser.
When I switch to the vLLM nightly build, everything ends up in reasoning_content and content is empty.
It seems that either the end-of-thinking token is not generated properly or the chat template has a problem.

Funnily enough, when I run it through open-webui (which uses the streaming endpoint), thinking and content are split correctly in the UI.
I don't understand chat templates and reasoning parsers well enough to say more, though.
Any help would be appreciated.
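
One rough way to probe the end-of-thinking hypothesis from the client side is to look for the end marker in the returned text and split on it manually. The sketch below assumes the marker is the literal string `</think>` (and the opening tag `<think>`); neither is confirmed in this thread, and `split_on_end_marker` is a hypothetical helper. The server may also strip the marker from the decoded text before it reaches the client, in which case this check cannot distinguish a parser problem from a generation problem.

```python
def split_on_end_marker(text, end_marker="</think>"):
    """Split text into (reasoning, answer) at the first end marker.

    Returns ("", text) when the marker is absent, so callers can tell
    that no split was possible.
    """
    before, found, after = text.partition(end_marker)
    if not found:
        return "", text
    # Drop an optional opening tag (also an assumption) and whitespace.
    reasoning = before.replace("<think>", "", 1).strip()
    return reasoning, after.strip()


# Made-up sample strings, not real model output.
with_marker = "<think>The user asks about bugs.</think>A bug is a defect."
without_marker = "The user asks about bugs. A bug is a defect."

print(split_on_end_marker(with_marker))
print(split_on_end_marker(without_marker))
```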

meganoob1337

Jan 6

For now I'm using it with streaming, and that seems to work properly. If you want to test it, try it with streaming.
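
The streaming workaround can be sketched as follows: accumulate the reasoning and answer pieces from each streamed chunk separately. This assumes each chunk's delta carries reasoning text in a `reasoning_content` attribute, mirroring the non-streaming field name; `collect_stream` and `fake_chunk` are hypothetical helpers, and the demo chunks are made up so the block runs without a server.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate reasoning and answer text from streamed chat chunks."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta
        reasoning_piece = getattr(delta, "reasoning_content", None)
        answer_piece = getattr(delta, "content", None)
        if reasoning_piece:
            reasoning_parts.append(reasoning_piece)
        if answer_piece:
            answer_parts.append(answer_piece)
    return "".join(reasoning_parts), "".join(answer_parts)


def fake_chunk(**delta_fields):
    """Build a hypothetical object shaped like a streamed chat chunk."""
    delta = SimpleNamespace(**delta_fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [
    fake_chunk(reasoning_content="thinking "),
    fake_chunk(reasoning_content="more"),
    fake_chunk(content="final answer"),
]
print(collect_stream(demo))
```

Against a real server, the same function would take the iterator returned by `client.chat.completions.create(...)` called with the arguments from the first post plus `stream=True`.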
