Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments Paper • 2606.14397 • Published 3 days ago • 15 • 2
How Post-Training Shapes Biological Reasoning Models Paper • 2606.16517 • Published 13 days ago • 3 • 2
CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies Paper • 2606.16613 • Published 13 days ago • 7 • 2
GridVQA-X: A Framework for Evaluating Multimodal Explainability Methods Paper • 2606.14740 • Published 26 days ago • 3 • 2
RL-Index: Reinforcement Learning for Retrieval Index Reasoning Paper • 2606.16316 • Published 13 days ago • 5 • 2
Beyond NL2Code: A Structured Survey of Multimodal Code Intelligence Paper • 2606.15932 • Published 12 days ago • 37 • 2
CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks Paper • 2208.07652 • Published Aug 16, 2022 • 1
Ctrl-World: A Controllable Generative World Model for Robot Manipulation Paper • 2510.10125 • Published Oct 11, 2025 • 1 • 1
LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis Paper • 2602.09379 • Published 17 days ago • 23 • 2
An Efficient Method for the Optimal Control of Microgrids Under Uncertainties using Local Reduction Paper • 2606.12345 • Published 18 days ago • 3
ForeAct: Steering Your VLA with Efficient Visual Foresight Planning Paper • 2602.12322 • Published Feb 12 • 2
FastMix: Fast Data Mixture Optimization via Gradient Descent Paper • 2606.14971 • Published 16 days ago • 3 • 2
DailyReport: An Open-ended Benchmark for Evaluating Search Agents on Daily Search Tasks Paper • 2606.12871 • Published 17 days ago • 14 • 2
Exploring the Design Space of Reward Backpropagation for Flow Matching Paper • 2606.11075 • Published 18 days ago • 10 • 2
Demystifying Training-Time Augmentation for Data-Constrained Language Model Pretraining Paper • 2606.16246 • Published 9 days ago • 4 • 2
AC-ODM: Actor-Critic Online Data Mixing for Sample-Efficient LLM Pretraining Paper • 2505.23878 • Published 14 days ago • 1 • 2
Notes2Skills: From Lab Notebooks to Certainty-Aware Scientific Agent Skills Paper • 2606.11897 • Published 18 days ago • 11 • 2