We introduce the Flow-based Language Model (FLM) and the Flow-map Language Model (FMLM), which enable one-step parallel text generation through continuous denoising and achieve an 8.3× speedup over existing approaches. FLM outperforms discrete diffusion in both quality and speed.
Language models based on discrete diffusion have attracted interest because they promise faster generation than autoregressive models. In practice, however, they suffer a sharp degradation in sample quality in the few-step regime.
We show that language models leveraging flow-based continuous denoising can outperform discrete diffusion in both quality and speed. By revisiting the fundamentals of flows over discrete modalities, we build the Flow-based Language Model (FLM), which performs Euclidean denoising on one-hot token representations. The model is trained to predict clean data via multi-token classification, using a simple time reparameterization that greatly improves training and generation.
By distilling FLM into its associated flow map, we obtain the Flow-map Language Model (FMLM), which is capable of few-step generation. On the LM1B and OWT language datasets, FLM attains generation quality that outperforms state-of-the-art discrete diffusion models in the many-step (512, 1024) regime. With FMLM, we outperform recent few-step language models across the board, achieving state-of-the-art performance in the few-step (1, 2, 4, 8) regime and matching the 8-step quality of distilled discrete diffusion baselines in a single step. Our work calls into question the widely held hypothesis that discrete diffusion processes are necessary for generative modeling over discrete modalities, and paves the way toward accelerated flow-based language modeling at scale.

FLM brings the benefits of continuous-space image generation to discrete state spaces by encoding text as one-hot vectors and using flow matching to map noise directly to one-hot data. FLM uses the same multi-token classification objective as discrete diffusion models, but instead of updating the discrete state from one token to another, it gradually denoises all tokens in parallel. This enables FLM to represent a superposition of sequences while also capturing correlations between tokens, a fundamental bottleneck for discrete diffusion models in the few-step regime (see the figure below).
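For concreteness, here is a minimal PyTorch-style sketch of one FLM training step. It assumes a linear interpolation path between Gaussian noise and one-hot data, uniform time sampling, and a generic `model(x_t, t)` that returns token logits; the actual architecture and the time reparameterization mentioned above are not reproduced here.

```python
import torch
import torch.nn.functional as F

def flm_training_step(model, tokens, vocab_size):
    """One FLM training step as a sketch: Euclidean denoising over one-hot
    token representations, supervised by multi-token cross-entropy. The
    linear path, uniform time sampling, and model signature are assumptions;
    the paper's time reparameterization is omitted."""
    B, L = tokens.shape
    x1 = F.one_hot(tokens, vocab_size).float()                # clean data as one-hot vectors
    x0 = torch.randn_like(x1)                                 # Gaussian noise endpoint
    t = torch.rand(B, device=tokens.device)                   # one time per sequence
    xt = (1.0 - t.view(B, 1, 1)) * x0 + t.view(B, 1, 1) * x1  # noisy point on the path
    logits = model(xt, t)                                     # predict the clean token at every position
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), tokens.reshape(-1))
    return loss
```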
We revisit the fundamentals of flows over discrete modalities and leverage a novel time reparameterization to enable efficient training of FLM. We also introduce methods for training a flow map over language data: we use semigroup flow-map distillation to build the associated flow map (FMLM), enabling one-step language sequence generation in a regime where discrete baselines fail catastrophically.
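The sketch below illustrates what semigroup flow-map distillation enforces: a student `flow_map(x, t, s)` whose single long jump from time t to s agrees with two chained shorter jumps through an intermediate time u. The time sampling, stop-gradient target, and MSE objective here are illustrative assumptions rather than the paper's exact procedure, which distills from a trained FLM.

```python
import torch
import torch.nn.functional as F

def semigroup_distillation_loss(flow_map, x1):
    """Minimal sketch of a semigroup (self-consistency) objective for a flow
    map over one-hot sequences. The flow map F(x, t, s) should jump from
    time t directly to time s; the semigroup property requires the long
    jump t -> s to match two chained jumps t -> u -> s. Details are
    assumptions, not the paper's exact recipe."""
    B = x1.shape[0]
    x0 = torch.randn_like(x1)                                 # Gaussian noise endpoint
    t = torch.rand(B, device=x1.device)                       # start time
    s = t + torch.rand(B, device=x1.device) * (1.0 - t)       # end time in (t, 1)
    u = t + torch.rand(B, device=x1.device) * (s - t)         # intermediate time in (t, s)
    xt = (1.0 - t.view(B, 1, 1)) * x0 + t.view(B, 1, 1) * x1  # point on the interpolation path

    with torch.no_grad():                                     # two chained short jumps as the target
        target = flow_map(flow_map(xt, t, u), u, s)

    pred = flow_map(xt, t, s)                                 # one long jump
    return F.mse_loss(pred, target)
```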
At 1024 sampling steps, FLM achieves the best generative perplexity among all compared diffusion-based models on both LM1B and OpenWebText.
| Model | LM1B Gen. PPL ↓ | LM1B Entropy | OpenWebText Gen. PPL ↓ | OpenWebText Entropy |
|---|---|---|---|---|
| Dataset | — | 4.31 | — | 5.44 |
| RDLM | 268.21 | 4.33 | — | — |
| CANDI | 120.99 | 4.35 | 143.13 | 5.71 |
| MDLM | 109.21 | 4.32 | 121.09 | 5.65 |
| Duo | 98.14 | 4.31 | 77.69 | 5.55 |
| FLM (Ours) | 96.91 | 4.29 | 62.23 | 5.33 |
Even after distillation, discrete diffusion baselines often show either perplexity blow-ups or entropy collapse (highly repetitive text) in the few-step regime. FMLM remains stable throughout, with one-step generation matching the 8-step quality of distilled discrete diffusion baselines on LM1B and competitive at 4 steps on OpenWebText.
LM1B:

| Steps | Duo + DCD Gen. PPL↓ | Duo + DCD Ent. | Duo + Di4C Gen. PPL↓ | Duo + Di4C Ent. | MDLM + SDTT Gen. PPL↓ | MDLM + SDTT Ent. | MDLM + Di4C Gen. PPL↓ | MDLM + Di4C Ent. | FMLM (Ours) Gen. PPL↓ | FMLM (Ours) Ent. |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 180.02 | 3.14 | 292.94 | 3.79 | 1429.48 | 4.31 | 1217.10 | 4.38 | 104.37 | 4.12 |
| 2 | 146.67 | 3.65 | 247.69 | 3.87 | 602.14 | 4.28 | 621.59 | 4.37 | 95.42 | 4.15 |
| 4 | 118.40 | 3.94 | 150.67 | 4.00 | 241.01 | 4.28 | 247.32 | 4.00 | 90.90 | 4.16 |

OpenWebText:

| Steps | Duo + DCD Gen. PPL↓ | Duo + DCD Ent. | Duo + Di4C Gen. PPL↓ | Duo + Di4C Ent. | MDLM + SDTT Gen. PPL↓ | MDLM + SDTT Ent. | MDLM + Di4C Gen. PPL↓ | MDLM + Di4C Ent. | FMLM (Ours) Gen. PPL↓ | FMLM (Ours) Ent. |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 47.13 | 2.80 | 97.77 | 3.36 | 1260.86 | 5.26 | 1298.80 | 5.29 | 129.32 | 4.53 |
| 2 | 96.59 | 3.77 | 165.81 | 4.65 | 877.22 | 5.34 | 758.23 | 5.35 | 134.26 | 5.07 |
| 4 | 108.21 | 4.82 | 150.67 | 4.81 | 339.73 | 5.38 | 239.27 | 5.40 | 76.37 | 5.05 |
Entries with degenerate entropy (< 4.0) or generative perplexity (> 500) signal collapsed or incoherent generation.
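For context, few-step sampling with a flow map amounts to a handful of direct jumps from Gaussian noise toward one-hot data, followed by an argmax decode. The sketch below is illustrative only; the `generate` helper and the `flow_map(x, t, s)` interface are assumptions, not the released API.

```python
import torch

@torch.no_grad()
def generate(flow_map, batch_size, seq_len, vocab_size, num_steps=1, device="cpu"):
    """Few-step generation with a flow map over one-hot space (a sketch
    under assumed names and signatures). num_steps=1 corresponds to a
    single forward pass from Gaussian noise to text."""
    x = torch.randn(batch_size, seq_len, vocab_size, device=device)  # start from pure noise
    times = torch.linspace(0.0, 1.0, num_steps + 1, device=device)
    for i in range(num_steps):
        t = times[i].expand(batch_size)                              # current time
        s = times[i + 1].expand(batch_size)                          # target time
        x = flow_map(x, t, s)                                        # jump directly from t to s
    return x.argmax(dim=-1)                                          # decode: most likely token per position
```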
Below are samples generated in a single forward pass on LM1B. FMLM produces far more fluent text, while at one step the discrete diffusion baselines either generate random text or repeat frequent tokens.
Coming Soon