-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 125 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
Collections
Discover the best community collections!
Collections including paper arxiv:2503.09516
-
A Survey on Retrieval-Augmented Text Generation for Large Language Models
Paper • 2404.10981 • Published • 1 -
Finetune-RAG: Fine-Tuning Language Models to Resist Hallucination in Retrieval-Augmented Generation
Paper • 2505.10792 • Published -
Rank1: Test-Time Compute for Reranking in Information Retrieval
Paper • 2502.18418 • Published • 29 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 40
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 731 • 99 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 40 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 89
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 125 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 238 -
Tongyi DeepResearch Technical Report
Paper • 2510.24701 • Published • 104 -
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-em-ppo-v0.3
3B • Updated • 51.2k -
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-em-grpo-v0.3
3B • Updated • 145 • 1
-
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Paper • 2503.19470 • Published • 19 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 40 -
A Survey on Large Language Model Benchmarks
Paper • 2508.15361 • Published • 19 -
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper • 2501.05366 • Published • 104
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 125 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 31 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 125 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
A Survey on Retrieval-Augmented Text Generation for Large Language Models
Paper • 2404.10981 • Published • 1 -
Finetune-RAG: Fine-Tuning Language Models to Resist Hallucination in Retrieval-Augmented Generation
Paper • 2505.10792 • Published -
Rank1: Test-Time Compute for Reranking in Information Retrieval
Paper • 2502.18418 • Published • 29 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 40
-
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 238 -
Tongyi DeepResearch Technical Report
Paper • 2510.24701 • Published • 104 -
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-em-ppo-v0.3
3B • Updated • 51.2k -
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-em-grpo-v0.3
3B • Updated • 145 • 1
-
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Paper • 2503.19470 • Published • 19 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 40 -
A Survey on Large Language Model Benchmarks
Paper • 2508.15361 • Published • 19 -
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper • 2501.05366 • Published • 104
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 731 • 99 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 40 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 89