LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis Paper • 2605.30434 • Published 12 days ago • 21
Exploring Autonomous Agentic Data Engineering for Model Specialization Paper • 2605.30407 • Published 12 days ago • 23
How LoRA Remembers? A Parametric Memory Law for LLM Finetuning Paper • 2605.30260 • Published 12 days ago • 42
When Should Models Change Their Minds? Contextual Belief Management in Large Language Models Paper • 2605.30219 • Published 12 days ago • 24
Terminal-World: Scaling Terminal-Agent Environments via Agent Skills Paper • 2605.20876 • Published 20 days ago • 7
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL Paper • 2605.18703 • Published 22 days ago • 50
The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models Paper • 2605.06196 • Published May 7 • 9
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published May 3 • 166
SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution Paper • 2604.18982 • Published Apr 21 • 5
Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play Paper • 2604.17696 • Published Apr 20 • 6
ImplicitMemBench: Measuring Unconscious Behavioral Adaptation in Large Language Models Paper • 2604.08064 • Published Apr 9 • 8
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models Paper • 2602.17684 • Published Feb 4 • 22
Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model Paper • 2602.07422 • Published Feb 7 • 22
MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models Paper • 2602.10934 • Published Feb 11 • 50
Can Deep Research Agents Find and Organize? Evaluating the Synthesis Gap with Expert Taxonomies Paper • 2601.12369 • Published Jan 18 • 4
Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments Paper • 2602.01244 • Published Feb 1 • 16
MOVA: Towards Scalable and Synchronized Video-Audio Generation Paper • 2602.08794 • Published Feb 9 • 159
LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth Paper • 2602.07962 • Published Feb 8 • 24