Libraries

How to use GSAI-ML/iLLaDA-8B-Base with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="GSAI-ML/iLLaDA-8B-Base", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("GSAI-ML/iLLaDA-8B-Base", trust_remote_code=True, dtype="auto")
```

vLLM

How to use GSAI-ML/iLLaDA-8B-Base with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "GSAI-ML/iLLaDA-8B-Base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GSAI-ML/iLLaDA-8B-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```
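The curl request above can also be sent from Python using only the standard library. This is a minimal sketch that assumes an OpenAI-compatible server is already running at `base_url` and that it returns the usual `choices[0].message.content` response shape; `build_chat_request` and `chat` are hypothetical helper names, not part of vLLM or SGLang.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="GSAI-ML/iLLaDA-8B-Base"):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the content of the first choice."""
    req = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=300) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(chat("What is the capital of France?"))
# For the SGLang server on port 30000, pass base_url="http://localhost:30000".
```

Separating request construction from sending keeps the payload easy to inspect without a live server.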

SGLang

How to use GSAI-ML/iLLaDA-8B-Base with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "GSAI-ML/iLLaDA-8B-Base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GSAI-ML/iLLaDA-8B-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "GSAI-ML/iLLaDA-8B-Base" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "GSAI-ML/iLLaDA-8B-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Docker Model Runner
How to use GSAI-ML/iLLaDA-8B-Base with Docker Model Runner:
```
docker model run hf.co/GSAI-ML/iLLaDA-8B-Base
```

iLLaDA-8B-Base

iLLaDA is an 8B fully bidirectional masked diffusion language model trained from scratch on 12T pre-training tokens. It supports an 8192-token context length and variable-length generation, and uses confidence-based scoring for multiple-choice evaluation. It was introduced in the paper Improved Large Language Diffusion Models.

Inference and evaluation code: https://github.com/ML-GSAI/LLaDA

Architecture

| | iLLaDA 8B | LLaDA 8B |
|---|---|---|
| Layers | 32 | 32 |
| Model dimension | 4096 | 4096 |
| Attention heads | 32 | 32 |
| Key/Value heads | 8 | 32 |
| FFN dimension | 14,336 | 12,288 |
| Vocabulary size | 155,136 | 126,464 |
| Maximum sequence length | 8192 | 4096 |
| Embedding and LM-head | Tied | Untied |
| Total parameters | 7.62B | 8.02B |
| Non-embedding parameters | 6.98B | 6.98B |
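The parameter counts in the last two rows can be reproduced from the other rows. The sketch below assumes a Llama-style decoder block that the card does not spell out: bias-free Q/K/V/O projections with head dimension `d_model / n_heads`, a SwiGLU FFN with three weight matrices, two RMSNorm weight vectors per layer plus a final norm. `count_params` is a hypothetical helper written for this illustration.

```python
def count_params(layers, d_model, n_heads, n_kv_heads, d_ffn, vocab, tied):
    """Estimate (non-embedding, total) parameters under Llama-style assumptions."""
    head_dim = d_model // n_heads
    # Q and O are d_model x d_model; K and V shrink with fewer key/value heads.
    attn = 2 * d_model * d_model + 2 * d_model * (n_kv_heads * head_dim)
    ffn = 3 * d_model * d_ffn          # gate, up and down projections (assumed SwiGLU)
    norms = 2 * d_model                # pre-attention and pre-FFN RMSNorm weights
    non_embedding = layers * (attn + ffn + norms) + d_model  # + final norm
    # A tied LM head reuses the embedding matrix; an untied one adds a second copy.
    embedding = vocab * d_model * (1 if tied else 2)
    return non_embedding, non_embedding + embedding


configs = {
    "iLLaDA 8B": dict(layers=32, d_model=4096, n_heads=32, n_kv_heads=8,
                      d_ffn=14336, vocab=155136, tied=True),
    "LLaDA 8B": dict(layers=32, d_model=4096, n_heads=32, n_kv_heads=32,
                     d_ffn=12288, vocab=126464, tied=False),
}
for name, cfg in configs.items():
    non_emb, total = count_params(**cfg)
    print(f"{name}: non-embedding {non_emb / 1e9:.2f}B, total {total / 1e9:.2f}B")
```

Under these assumptions this prints 6.98B / 7.62B for iLLaDA and 6.98B / 8.02B for LLaDA, matching the table. It also shows why the non-embedding counts are identical: going from 32 to 8 key/value heads removes 2 × 4096 × 3072 ≈ 25.2M weights per layer, which is exactly what the wider FFN (3 × 4096 × 2048) adds back, so the gap in total size comes entirely from tying the embedding and LM-head.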

Benchmark Results of Base Models

| | iLLaDA 8B | LLaDA 8B | Dream 7B | Qwen2.5 7B |
|---|---|---|---|---|
| Model type | Diffusion | Diffusion | Diffusion | AR |
| Training tokens | 12T | 2.3T | 18T + 0.6T | 18T |
| MMLU | 74.8 | 65.9 | 69.5 | 71.9 |
| BBH | 71.3 | 49.7 | 57.9 | 63.9 |
| ARC-C | 60.8 | 45.9 | 59.8 | 51.5 |
| HellaSwag | 76.6 | 70.5 | 73.3 | 79.0 |
| GSM8K | 81.9 | 70.3 | 77.2 | 78.9 |
| MATH | 38.4 | 31.4 | 39.6 | 41.1 |
| HumanEval | 50.0 | 35.4 | 57.9 | 56.7 |
| MBPP | 57.8 | 40.0 | 56.2 | 63.6 |
| Average | 63.9 | 51.1 | 61.4 | 63.3 |
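The Average row appears to be the unweighted mean of the eight benchmark scores. A quick check under that assumption, using the rounded scores shown in the table (`unweighted_mean` is a hypothetical helper):

```python
# Scores copied from the table: MMLU, BBH, ARC-C, HellaSwag, GSM8K, MATH, HumanEval, MBPP.
scores = {
    "iLLaDA 8B": [74.8, 71.3, 60.8, 76.6, 81.9, 38.4, 50.0, 57.8],
    "LLaDA 8B": [65.9, 49.7, 45.9, 70.5, 70.3, 31.4, 35.4, 40.0],
    "Dream 7B": [69.5, 57.9, 59.8, 73.3, 77.2, 39.6, 57.9, 56.2],
    "Qwen2.5 7B": [71.9, 63.9, 51.5, 79.0, 78.9, 41.1, 56.7, 63.6],
}
reported = {"iLLaDA 8B": 63.9, "LLaDA 8B": 51.1, "Dream 7B": 61.4, "Qwen2.5 7B": 63.3}


def unweighted_mean(values):
    """Plain arithmetic mean: every benchmark counts equally."""
    return sum(values) / len(values)


for name, vals in scores.items():
    print(f"{name:<11} recomputed={unweighted_mean(vals):.3f} reported={reported[name]}")
```

Every recomputed mean lies within 0.05 of the reported value (for iLLaDA the rounded scores average 63.95 against a reported 63.9), which may simply reflect rounding of the underlying per-benchmark scores; the card does not state how the average was computed.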

How to use

You can load the tokenizer and model with Transformers as follows:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('GSAI-ML/iLLaDA-8B-Base', trust_remote_code=True)
model = AutoModel.from_pretrained('GSAI-ML/iLLaDA-8B-Base', trust_remote_code=True, torch_dtype=torch.bfloat16)
```

Refer to the GitHub repository for generation scripts such as generate.py.
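To illustrate how a masked diffusion model produces text, the following is a simplified sketch of a low-confidence remasking sampler of the kind described for the original LLaDA: the answer starts fully masked, and at each step the model predicts every masked position but only the most confident predictions are kept. It is not the repository's generate.py and may differ from it in important details; that iLLaDA is best sampled this way is an assumption. `masked_diffusion_generate` is a hypothetical name, the call `model(x).logits` is assumed to return logits of shape (batch, seq_len, vocab), and `mask_id` (the mask token id) is not given on this card and must be looked up in the model's tokenizer or configuration.

```python
import torch


@torch.no_grad()
def masked_diffusion_generate(model, prompt_ids, gen_length, steps, mask_id):
    """Simplified low-confidence remasking sampler (hypothetical sketch).

    prompt_ids: LongTensor of shape (1, prompt_len); batch size 1 is assumed.
    mask_id: id of the model's mask token (assumption: look it up yourself).
    """
    prompt_len = prompt_ids.shape[1]
    # Start from the prompt followed by `gen_length` masked positions.
    x = torch.full((1, prompt_len + gen_length), mask_id,
                   dtype=torch.long, device=prompt_ids.device)
    x[:, :prompt_len] = prompt_ids

    # Spread the gen_length unmasking decisions as evenly as possible over the steps.
    base, extra = divmod(gen_length, steps)
    schedule = [base + (1 if i < extra else 0) for i in range(steps)]

    for k in schedule:
        if k == 0:
            continue
        still_masked = x == mask_id
        logits = model(x).logits.float()
        logits[..., mask_id] = float("-inf")  # never "predict" the mask token itself
        # Greedy token per position and the probability the model assigns to it.
        conf, pred = logits.softmax(dim=-1).max(dim=-1)
        # Prompt and already-decoded positions must not be selected again.
        conf = conf.masked_fill(~still_masked, -1.0)
        # Commit the k most confident predictions; everything else stays masked.
        idx = conf[0].topk(k).indices
        x[0, idx] = pred[0, idx]
    return x[:, prompt_len:]


# Hypothetical usage with the tokenizer and model loaded above
# (MASK_ID is a placeholder for the real mask token id):
# ids = tokenizer("The capital of France is", return_tensors="pt").input_ids.to(model.device)
# out = masked_diffusion_generate(model, ids, gen_length=32, steps=32, mask_id=MASK_ID)
# print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Using fewer steps than generated tokens commits several tokens per forward pass, trading quality for speed; with `steps == gen_length` exactly one token is decoded per step.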

Citation

```bibtex
@article{nie2026illada,
  title={Improved Large Language Diffusion Models},
  author={Nie, Shen and Min, Qiyang and Xu, Shaoxuan and Huang, Zihao and Song, Yuxuan and Shan, Yong and Lin, Yankai and Zhao, Wayne Xin and Li, Chongxuan and Wen, Ji-Rong},
  journal={arXiv preprint arXiv:2606.25331},
  year={2026}
}
```

Safetensors
Model size: 8B params
Tensor type: BF16
