PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models Paper • 2606.19534 • Published 14 days ago • 64
Qwen-AgentWorld: Language World Models for General Agents Paper • 2606.24597 • Published 8 days ago • 140
DataClaw0: Agentic Tailoring Multimodal Data from Raw Streams Paper • 2606.21337 • Published 12 days ago • 73
Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs Paper • 2605.10094 • Published May 12
ReMoT: Reinforcement Learning with Motion Contrast Triplets Paper • 2603.00461 • Published Mar 20 • 1
Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models Paper • 2408.10571 • Published Oct 10, 2024
Trajectory-Diversity-Driven Robust Vision-and-Language Navigation Paper • 2603.15370 • Published Mar 16
CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation Paper • 2507.02691 • Published Jul 3, 2025
ProSR: Process-Shaped Spatial Reasoning for Reliable Chain-of-Thought in VLMs Paper • 2605.25524 • Published May 25
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published May 12 • 194
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation Paper • 2604.19636 • Published Apr 21 • 88
AttentiveEraser - Object Remover: Unleashing Diffusion Model's Object Removal Potential