arxiv:2603.20691

SWE-Next: Scalable Real-World Software Engineering Tasks for Agents

Published on Mar 21

Abstract

SWE-Next is a framework that collects software engineering tasks at scale by executing base/merged commit pairs from real pull requests and by reusing repository environments across nearby commits to reduce cost.

Executable software engineering data is valuable for training SWE agents, but scaling it remains difficult for two reasons: only a small fraction of real repository changes yield verifiable, high-signal task instances, and naively building repository-specific environments quickly becomes the dominant systems cost. We present SWE-Next, an execution-grounded framework for scalable SWE task and trajectory collection. On the data side, SWE-Next mines real merged pull requests, executes candidate base/merged commit pairs, and retains only those that produce strict test improvements without regressions, yielding self-verifying instances. It also applies strict submission gating so that collected trajectories remain evidence-driven rather than speculative. On the systems side, SWE-Next introduces reusable repo-quarter profiles, which reuse the same environment across nearby commits in time while keeping each task run separate and reproducible. Using only 30 hours and 639GB of environment storage, SWE-Next processes 3,971 seed repositories and 102,582 candidate commit pairs mined from real merged PRs to construct a dataset of 2,308 self-verifying instances. Experiments show that SWE-Next improves downstream pass@1 with fewer or comparable training trajectories, indicating that its gains come not from a stronger trajectory generator, but from higher-signal execution-grounded supervision and more efficient data collection.
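The retention rule described in the abstract (strict test improvement, no regressions) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `is_self_verifying`, the `"pass"`/`"fail"` status strings, and the choice to count a newly added passing test as an improvement are all assumptions made for the example.

```python
from typing import Dict

PASS = "pass"  # assumed status label; the paper's actual result format may differ


def is_self_verifying(base: Dict[str, str], merged: Dict[str, str]) -> bool:
    """Return True if the merged commit strictly improves on the base commit.

    `base` and `merged` map test IDs to a status string (hypothetical format).
    """
    # A regression is any test that passed on base but does not pass on merged
    # (this includes tests that disappear from the merged run).
    regressions = [t for t, s in base.items() if s == PASS and merged.get(t) != PASS]
    # An improvement is any test that passes on merged but did not pass on base
    # (this includes tests that are new in the merged commit).
    improvements = [t for t, s in merged.items() if s == PASS and base.get(t) != PASS]
    return not regressions and bool(improvements)


if __name__ == "__main__":
    base_run = {"test_parse": "pass", "test_edge_case": "fail"}
    merged_run = {"test_parse": "pass", "test_edge_case": "pass"}
    print(is_self_verifying(base_run, merged_run))
```

Under these assumptions, a pair whose test results are unchanged is rejected, as is a pair that fixes one test while breaking another.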

Community

Paper author (edited Apr 7)

🚀 Introducing SWE-Next: a scalable, execution-grounded framework for building SWE training data from real merged PRs. SWE-Next processes 3,971 repositories and 102K commit pairs to construct 2,308 verified instances, and collecting the full dataset takes just 30 hours and 639 GB.

🧩 Key idea: repo-quarter profiles, which reuse a single environment across temporally nearby commits, cutting storage from over 30 TB to just 639 GB.
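As a quick sanity check on the figures quoted in this post and in the abstract, the arithmetic below derives the implied storage reduction and the overall yield. It assumes decimal units (1 TB = 1,000 GB) and treats "over 30 TB" as exactly 30 TB, so the reduction factor is a lower bound; the variable names are illustrative only.

```python
# Figures quoted in the abstract and the announcement.
per_commit_storage_gb = 30 * 1000   # "over 30 TB", assuming 1 TB = 1,000 GB
profile_storage_gb = 639            # storage with repo-quarter profiles
candidate_pairs = 102_582           # candidate commit pairs
verified_instances = 2_308          # self-verifying instances retained

# Lower bound on the storage reduction factor.
reduction_factor = per_commit_storage_gb / profile_storage_gb

# Share of candidate commit pairs that survive filtering.
yield_pct = 100 * verified_instances / candidate_pairs

print(f"storage reduction: at least {reduction_factor:.1f}x")
print(f"yield: {yield_pct:.2f}% of candidate pairs")
```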

📈 SFT results: with only 3K+ high-quality trajectories, our models reach 17.4% on SWE-Bench Verified at 7B and 30.0% at 14B.

Building executable SWE environments is expensive:
❌ One Docker image per commit = storage explodes at scale
❌ Most real PRs don't yield verifiable training signal (~74.5% don't improve tests)
❌ Leaky prompts + weak submission gating → low-quality trajectories

🧩 Repo-quarter profiles: Instead of building a new environment per commit, we map each commit to a (repo, quarter) profile: a shared, reusable Docker image for that repo's dependency regime in that time window.
The image caches system packages + a venv but never bakes in source code.
At runtime: mount the commit snapshot → copy-on-start → run tests in isolation.
One image. Many commits. No rebuilding.
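
The commit-to-profile mapping described above can be illustrated with a short sketch. It assumes a calendar-quarter bucketing of the commit timestamp and a `"<year>Q<n>"` label; `profile_key`, `group_by_profile`, and the example repository name and SHAs are hypothetical, and the actual SWE-Next code may define profiles differently. Building the image and mounting the commit snapshot are not shown.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

ProfileKey = Tuple[str, str]  # (repo, quarter label), e.g. ("example/repo", "2023Q2")


def profile_key(repo: str, commit_time: datetime) -> ProfileKey:
    """Map a commit to its (repo, quarter) profile (assumed calendar quarters)."""
    quarter = (commit_time.month - 1) // 3 + 1
    return repo, f"{commit_time.year}Q{quarter}"


def group_by_profile(
    commits: Iterable[Tuple[str, str, datetime]]
) -> Dict[ProfileKey, List[str]]:
    """Group (repo, sha, commit_time) records so each group shares one image."""
    groups: Dict[ProfileKey, List[str]] = defaultdict(list)
    for repo, sha, when in commits:
        groups[profile_key(repo, when)].append(sha)
    return dict(groups)


# Hypothetical commits: two fall in 2023 Q2, one in 2023 Q3.
commits = [
    ("example/repo", "a1", datetime(2023, 4, 2)),
    ("example/repo", "b2", datetime(2023, 6, 30)),
    ("example/repo", "c3", datetime(2023, 7, 1)),
]
profiles = group_by_profile(commits)
print(len(profiles), "images for", len(commits), "commits")
```

The number of images then scales with the number of (repo, quarter) groups rather than with the number of commits, which is where the storage saving comes from.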

Everything is open
🗂️ Paper: arxiv.org/abs/2603.20691
🤗 Dataset: huggingface.co/datasets/TIGER-Lab/SWE-Next
🚀 SFT Trajectories: huggingface.co/datasets/TIGER-Lab/SWE-Next-SFT-Trajectories
🤖 SWE-Next-7B: huggingface.co/TIGER-Lab/SWE-Next-7B
🤖 SWE-Next-14B: huggingface.co/TIGER-Lab/SWE-Next-14B
💻 Code: github.com/TIGER-AI-Lab/SWE-Next
