Cosmos 3: Omnimodal World Models for Physical AI Paper • 2606.02800 • Published 17 days ago • 121
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens Paper • 2503.01710 • Published Mar 3, 2025 • 6
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4, 2025 • 66