iLLaDA-8B-Base

iLLaDA is an 8B-parameter, fully bidirectional masked diffusion language model trained from scratch on 12T pre-training tokens. It supports an 8192-token context length, variable-length generation, and confidence-based scoring for multiple-choice evaluation. It was introduced in the paper Improved Large Language Diffusion Models.

Inference and evaluation code: https://github.com/ML-GSAI/LLaDA.

Architecture

| | iLLaDA 8B | LLaDA 8B |
|---|---|---|
| Layers | 32 | 32 |
| Model dimension | 4096 | 4096 |
| Attention heads | 32 | 32 |
| Key/Value heads | 8 | 32 |
| FFN dimension | 14,336 | 12,288 |
| Vocabulary size | 155,136 | 126,464 |
| Maximum sequence length | 8192 | 4096 |
| Embedding and LM-head | Tied | Untied |
| Total parameters | 7.62B | 8.02B |
| Non-embedding parameters | 6.98B | 6.98B |
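
The parameter counts in the table can be roughly cross-checked from the other rows. The sketch below is an estimate, not the authors' accounting: it assumes bias-free linear layers, a gated FFN with three weight matrices, a head dimension equal to the model dimension divided by the number of attention heads, and it ignores normalization weights. The function name `transformer_params` is hypothetical.

```python
def transformer_params(layers, d_model, n_heads, n_kv_heads, d_ffn, vocab, tied):
    """Rough parameter count; ignores norm weights and assumes no biases."""
    head_dim = d_model // n_heads
    kv_dim = n_kv_heads * head_dim
    # Q and output projections are d_model x d_model;
    # K and V projections shrink when there are fewer key/value heads.
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim
    # Assumed gated FFN (gate, up, down): three d_model x d_ffn matrices.
    ffn = 3 * d_model * d_ffn
    non_embedding = layers * (attn + ffn)
    # Tied weights share one vocab x d_model matrix; untied models store two.
    embedding = vocab * d_model * (1 if tied else 2)
    return non_embedding, non_embedding + embedding


illada = transformer_params(32, 4096, 32, 8, 14336, 155136, tied=True)
llada = transformer_params(32, 4096, 32, 32, 12288, 126464, tied=False)

for name, (non_emb, total) in [("iLLaDA 8B", illada), ("LLaDA 8B", llada)]:
    print(f"{name}: non-embedding {non_emb / 1e9:.2f}B, total {total / 1e9:.2f}B")
```

Under these assumptions both configurations have the same non-embedding count: the parameters saved by going from 32 to 8 key/value heads are offset by the wider FFN. The estimated totals land within about 0.01B of the table, with the difference between the two models coming from vocabulary size and weight tying.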

Benchmark Results of Base Models

| | iLLaDA 8B | LLaDA 8B | Dream 7B | Qwen2.5 7B |
|---|---|---|---|---|
| Model | Diffusion | Diffusion | Diffusion | AR |
| Training tokens | 12T | 2.3T | 18T + 0.6T | 18T |
| MMLU | 74.8 | 65.9 | 69.5 | 71.9 |
| BBH | 71.3 | 49.7 | 57.9 | 63.9 |
| ARC-C | 60.8 | 45.9 | 59.8 | 51.5 |
| HellaSwag | 76.6 | 70.5 | 73.3 | 79.0 |
| GSM8K | 81.9 | 70.3 | 77.2 | 78.9 |
| MATH | 38.4 | 31.4 | 39.6 | 41.1 |
| HumanEval | 50.0 | 35.4 | 57.9 | 56.7 |
| MBPP | 57.8 | 40.0 | 56.2 | 63.6 |
| Average | 63.9 | 51.1 | 61.4 | 63.3 |
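
The Average row appears to be the unweighted mean of the eight benchmark scores. A small sketch that recomputes it from the numbers above, and lists the benchmarks where iLLaDA 8B has the highest score in its row, might look like this (the variable names are illustrative only):

```python
from statistics import mean

models = ["iLLaDA 8B", "LLaDA 8B", "Dream 7B", "Qwen2.5 7B"]
scores = {
    "MMLU":      [74.8, 65.9, 69.5, 71.9],
    "BBH":       [71.3, 49.7, 57.9, 63.9],
    "ARC-C":     [60.8, 45.9, 59.8, 51.5],
    "HellaSwag": [76.6, 70.5, 73.3, 79.0],
    "GSM8K":     [81.9, 70.3, 77.2, 78.9],
    "MATH":      [38.4, 31.4, 39.6, 41.1],
    "HumanEval": [50.0, 35.4, 57.9, 56.7],
    "MBPP":      [57.8, 40.0, 56.2, 63.6],
}
reported_average = [63.9, 51.1, 61.4, 63.3]

# Transpose the rows so that each entry holds one model's eight scores.
computed = [mean(column) for column in zip(*scores.values())]
for name, got, want in zip(models, computed, reported_average):
    print(f"{name}: computed {got:.2f}, reported {want}")

# Benchmarks where the first column (iLLaDA 8B) is the row maximum.
illada_best = [task for task, row in scores.items() if row[0] == max(row)]
print("iLLaDA 8B leads on:", ", ".join(illada_best))
```

The recomputed means agree with the reported row up to rounding in the last digit.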

How to use

You can load and use the model with transformers as follows:

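import torch
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True lets transformers run the custom modeling code shipped with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained('GSAI-ML/iLLaDA-8B-Base', trust_remote_code=True)
# Load the weights in bfloat16.
model = AutoModel.from_pretrained('GSAI-ML/iLLaDA-8B-Base', trust_remote_code=True, torch_dtype=torch.bfloat16)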

Refer to the GitHub repository for generation scripts such as generate.py.
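
For intuition only, the toy sketch below shows the general shape of iterative unmasking in a masked diffusion sampler: start from a fully masked continuation, predict every masked position at once, commit the most confident predictions, and repeat. It is not the repository's generate.py and does not call the model. Everything in it is hypothetical: `toy_predict` is a random stand-in for the network, `MASK` is a placeholder sentinel rather than the real mask token, and the even unmasking schedule and confidence-based selection are assumptions that may differ from what iLLaDA actually uses.

```python
import random

MASK = None  # hypothetical sentinel for a masked position, not the real mask token


def toy_predict(seq, rng):
    """Random stand-in for a bidirectional model.

    Returns {position: (token, confidence)} for every masked position.
    """
    return {i: (rng.randrange(100), rng.random())
            for i, tok in enumerate(seq) if tok is MASK}


def unmask(prompt, gen_len, steps, seed=0):
    rng = random.Random(seed)
    seq = list(prompt) + [MASK] * gen_len
    for step in range(steps):
        preds = toy_predict(seq, rng)
        if not preds:
            break  # nothing left to fill in
        # Spread the remaining masked positions evenly over the remaining steps.
        remaining_steps = steps - step
        k = -(-len(preds) // remaining_steps)  # ceiling division
        # Commit the k most confident predictions; the rest stay masked.
        best = sorted(preds, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in best:
            seq[i] = preds[i][0]
    return seq


out = unmask(prompt=[1, 2, 3], gen_len=8, steps=4)
print(out)
```

Because every position is predicted in parallel and positions are committed in order of confidence rather than left to right, the number of sampling steps can be smaller than the number of generated tokens.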

Citation

@article{nie2026illada,
  title={Improved Large Language Diffusion Models},
  author={Nie, Shen and Min, Qiyang and Xu, Shaoxuan and Huang, Zihao and Song, Yuxuan and Shan, Yong and Lin, Yankai and Zhao, Wayne Xin and Li, Chongxuan and Wen, Ji-Rong},
  journal={arXiv preprint arXiv:2606.25331},
  year={2026}
}