arxiv:2603.20691

SWE-Next: Scalable Real-World Software Engineering Tasks for Agents

Published on Mar 21

Authors:

Jiarong Liang, Zijie Liu, Xiangchao Chen, Ping Nie

Abstract

SWE-Next is a framework that collects software engineering tasks at scale by executing base/merged commit pairs from real merged pull requests and reusing repository environments across nearby commits to reduce cost.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Executable software engineering data is valuable for training SWE agents, but scaling it remains difficult for two reasons: only a small fraction of real repository changes yield verifiable, high-signal task instances, and naively building repository-specific environments quickly becomes the dominant systems cost. We present SWE-Next, an execution-grounded framework for scalable SWE task and trajectory collection. On the data side, SWE-Next mines real merged pull requests, executes candidate base/merged commit pairs, and retains only those that produce strict test improvements without regressions, yielding self-verifying instances. It also applies strict submission gating so that collected trajectories remain evidence-driven rather than speculative. On the systems side, SWE-Next introduces reusable repo-quarter profiles, which reuse the same environment across nearby commits in time while keeping each task run separate and reproducible. Using only 30 hours and 639GB of environment storage, SWE-Next processes 3,971 seed repositories and 102,582 candidate commit pairs mined from real merged PRs to construct a dataset of 2,308 self-verifying instances. Experiments show that SWE-Next improves downstream pass@1 with fewer or comparable training trajectories, indicating that its gains come not from a stronger trajectory generator, but from higher-signal execution-grounded supervision and more efficient data collection.
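The retention rule described in the abstract (keep a commit pair only if it yields a strict test improvement with no regressions) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `is_strict_improvement`, the `"pass"` status label, and the dict-of-test-results format are all assumptions, as is the choice to treat a previously passing test that disappears at the merged commit as a regression.

```python
from typing import Dict

# Hypothetical status label; the real result schema used by SWE-Next may differ.
PASS = "pass"


def is_strict_improvement(base: Dict[str, str], merged: Dict[str, str]) -> bool:
    """Return True if the merged commit fixes at least one test and breaks none.

    `base` and `merged` map test identifiers to a status string, as observed
    by running the test suite at the base and merged commits respectively.
    """
    # Regression check: every test that passed at base must still pass.
    # Assumption: a test missing from `merged` counts as a regression.
    for test, base_status in base.items():
        if base_status == PASS and merged.get(test) != PASS:
            return False

    # Improvement check: at least one test passes at merged that did not
    # pass (or did not exist) at base.
    return any(
        merged_status == PASS and base.get(test) != PASS
        for test, merged_status in merged.items()
    )
```

For example, a pair where `test_b` goes from failing to passing while `test_a` keeps passing would be retained; a pair where `test_a` starts failing would be dropped even if `test_b` is fixed.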

Community

lllqaq

Paper author Apr 7

edited Apr 7

🚀 Introducing SWE-Next: a scalable, execution-grounded framework for building SWE training data from real merged PRs. SWE-Next processes 3,971 seed repositories and 102K candidate commit pairs to construct 2,308 self-verifying instances; collecting the full dataset takes just 30 hours and 639 GB of environment storage.

🧩 Key idea: repo-quarter profiles — reuse a single environment across temporally nearby commits, cutting storage from over 30 TB to just 639 GB.
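As a quick back-of-the-envelope check on the reported figures, the snippet below computes the fraction of candidate commit pairs that end up as retained instances and a lower bound on the storage reduction. The numbers come from the paper page; treating "over 30 TB" as exactly 30,000 GB (decimal units) is an assumption, so the reduction factor is only a floor.

```python
# Figures reported on the paper page.
candidate_pairs = 102_582   # candidate commit pairs mined from merged PRs
retained_instances = 2_308  # self-verifying instances kept
profile_storage_gb = 639    # environment storage with repo-quarter profiles

# Assumption: "over 30 TB" taken as a 30,000 GB lower bound (decimal units).
per_commit_storage_gb = 30_000

yield_rate = retained_instances / candidate_pairs
storage_reduction = per_commit_storage_gb / profile_storage_gb

print(f"yield: {yield_rate:.2%}")                           # yield: 2.25%
print(f"storage reduction: >= {storage_reduction:.1f}x")    # storage reduction: >= 46.9x
```

So roughly one in 44 candidate pairs survives filtering, and shared profiles cut environment storage by a factor of at least about 47 under this assumption.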

📈 SFT results: with only 3K+ high-quality trajectories, our models reach 17.4% on SWE-Bench Verified at 7B and 30.0% at 14B.

Building executable SWE environments is expensive:
❌ One Docker image per commit = storage explodes at scale
❌ Most real PRs don't yield verifiable training signal (~74.5% don't improve tests)
❌ Leaky prompts + weak submission gating → low-quality trajectories

🧩 Repo-quarter profiles: Instead of building a new environment per commit, we map each commit to a (repo, quarter) profile — a shared, reusable Docker image for that repo's dependency regime in that time window.
The image caches system packages + a venv but never bakes in source code.
At runtime: mount the commit snapshot → copy-on-start → run tests in isolation.
One image. Many commits. No rebuilding.
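The mapping and runtime steps above can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the SWE-Next code: `profile_key`, `image_tag`, `run_argv`, the calendar-quarter definition, the tag naming scheme, and the `/snapshot` and `/work` paths are all hypothetical. Only standard `docker run` options (`--rm`, `-v host:container:ro`) are used.

```python
from datetime import datetime, timezone
from typing import List, Tuple


def profile_key(repo: str, commit_time: datetime) -> Tuple[str, str]:
    """Map a commit to its (repo, quarter) profile.

    Assumption: "quarter" means the calendar quarter of the commit timestamp.
    """
    quarter = (commit_time.month - 1) // 3 + 1
    return repo, f"{commit_time.year}Q{quarter}"


def image_tag(repo: str, commit_time: datetime) -> str:
    """Derive a hypothetical Docker image tag shared by all commits in a profile."""
    name, quarter = profile_key(repo, commit_time)
    return f"{name.replace('/', '-').lower()}:{quarter.lower()}"


def run_argv(image: str, snapshot_dir: str, test_cmd: str) -> List[str]:
    """Build a `docker run` command for one isolated task run.

    The commit snapshot is mounted read-only, copied into a fresh working
    directory when the container starts, and the tests run against the copy,
    so the shared image never contains source code.
    """
    script = f"cp -r /snapshot /work && cd /work && {test_cmd}"
    return [
        "docker", "run", "--rm",
        "-v", f"{snapshot_dir}:/snapshot:ro",
        image,
        "sh", "-c", script,
    ]


if __name__ == "__main__":
    t = datetime(2023, 5, 17, tzinfo=timezone.utc)
    tag = image_tag("example-org/example-repo", t)
    print(tag)
    print(run_argv(tag, "/tmp/snapshots/abc123", "python -m pytest -q"))
```

Two commits from the same repository in April and June 2023 would map to the same key and therefore the same image, while each run still gets its own container and its own copy of the source tree.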

Everything is open
🗂️ Paper: arxiv.org/abs/2603.20691
🤗 Dataset: huggingface.co/datasets/TIGER-Lab/SWE-Next
🚀 SFT Trajectories: huggingface.co/datasets/TIGER-Lab/SWE-Next-SFT-Trajectories
🤖 SWE-Next-7B: huggingface.co/TIGER-Lab/SWE-Next-7B
🤖 SWE-Next-14B: huggingface.co/TIGER-Lab/SWE-Next-14B
💻 Code: github.com/TIGER-AI-Lab/SWE-Next