Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings Paper • 2606.07502 • Published 7 days ago • 87
SWE-Explore: Benchmarking How Coding Agents Explore Repositories Paper • 2606.07297 • Published 7 days ago • 108
NITP: Next Implicit Token Prediction for LLM Pre-training Paper • 2605.24956 • Published 19 days ago • 35
Draft-OPD: On-Policy Distillation for Speculative Draft Models Paper • 2605.29343 • Published 15 days ago • 34
view article Article OpenReasoning-Nemotron: A Family of State-of-the-Art Distilled Reasoning Models nvidia • Jul 18, 2025 • 51
MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping Paper • 2604.08364 • Published Apr 9 • 102
Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing Paper • 2603.12254 • Published Mar 12 • 23
Qwen3.5 Collection Qwen3.5 is Qwen's new model family including Qwen3.5 Small: 0.8B, 2B, 4B, 9B and Qwen3.5 Medium: 35B-A3B, 27B, 122B-A10B and 397B-A17B. • 25 items • Updated 7 days ago • 158
Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device Paper • 2602.20161 • Published Feb 23 • 23
SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models Paper • 2602.18993 • Published Feb 22 • 4
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing Paper • 2602.12205 • Published Feb 12 • 83
timm DINOv3 Collection Meta AI's DINOv3 weights in timm. ViTs with `qkvb` have a zero QV bias present, otherwise bias is disabled. QKV bias are all 0 in original weights. • 18 items • Updated Sep 19, 2025 • 37
Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations Paper • 2602.05885 • Published Feb 5 • 28
Training Data Efficiency in Multimodal Process Reward Models Paper • 2602.04145 • Published Feb 4 • 80
FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space Paper • 2602.02092 • Published Feb 2 • 18