Skip to content

Sample Page

Unleashing LLM Training Efficiency: Multi-Token Prediction’s Near-Zero Overhead

July 22, 2025 by kamal

Table of Links

Abstract and 1. Introduction

3. Experiments on real data

4. Ablations on synthetic data

5. Why does it work? Some speculation

6. Related work

7. Conclusion, Impact statement, Environmental impact, Acknowledgements and References

A. Additional results on self-speculative decoding

B. Alternative architectures

C. Training speeds

E. Additional results on model scaling behavior

F. Details on CodeContests finetuning

G. Additional results on natural language benchmarks

H. Additional results on abstractive text summarization

I. Additional results on mathematical reasoning in natural language

J. Additional results on induction learning

K. Additional results on algorithmic reasoning

L. Additional intuitions on multi-token prediction

M. Training hyperparameters

C. Training speeds

:::info
Authors:

(1) Fabian Gloeckle, FAIR at Meta, CERMICS Ecole des Ponts ParisTech and Equal contribution;

(2) Badr Youbi Idrissi, FAIR at Meta, LISN Université Paris-Saclayand and Equal contribution;

(3) Baptiste Rozière, FAIR at Meta;

(4) David Lopez-Paz, FAIR at Meta and a last author;

(5) Gabriel Synnaeve, FAIR at Meta and a last author.

:::

:::info
This paper is available on arxiv under CC BY 4.0 DEED license.

:::

Categories Hackernoon, Tech

AI comes up with bizarre physics experiments, but they work

Unveiling Nuances: Multi-Token Prediction’s Impact on Llama 2 Finetuning

Leave a Comment Cancel reply

Comment

Name Email Website

Save my name, email, and website in this browser for the next time I comment.

Δ

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Search

Categories

AWSML
CoinDesk
CoinTelegraph
Crypto
Decrypt
Hackernoon
HN
Machine Learning
QuantaMagazine
Tech
TechCrunch
TheVerge
Uncategorized

Archives

© 2026 Kamal Reader • Built with GeneratePress