Papers
arxiv:2504.09184

Parameterized Synthetic Text Generation with SimpleStories

Published on Apr 12, 2025
Authors:
,
,
,
,

Abstract

We present SimpleStories, a large synthetic story dataset in simple language, consisting of 2 million stories each in English and Japanese. Our method employs parametrization of prompts with features at multiple levels of abstraction, allowing for systematic control over story characteristics to ensure broad syntactic and semantic diversity. Building on and addressing limitations in the TinyStories dataset, our approach demonstrates that simplicity and variety can be achieved simultaneously in synthetic text generation at scale.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2504.09184
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 11

Browse 11 models citing this paper

Datasets citing this paper 3

Spaces citing this paper 1

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.