Playbooks
The Smol Training Playbook 📚: The secrets to building world-class LLMs
The Ultra-Scale Playbook 🌌: The ultimate guide to training LLM on large GPU Clusters
Evaluation Guidebook 📝: Explore LLM benchmark scores over time
LLM Papers
Attention Is All You Need • Paper 1706.03762 • Published Jun 12, 2017
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding • Paper 1810.04805 • Published Oct 11, 2018
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter • Paper 1910.01108 • Published Oct 2, 2019
Language Models are Few-Shot Learners • Paper 2005.14165 • Published May 28, 2020