Multi-Token Prediction: Mastering Algorithmic Reasoning with Enhanced Resource Use
Table of Links

Abstract and 1. Introduction
2. Method
3. Experiments on real data
4. Ablations on synthetic data
5. Why does it work? Some speculation
6. Related work
7. Conclusion, Impact statement, Environmental impact, Acknowledgements and References
A. Additional results on self-speculative decoding
B. Alternative architectures
C. Training speeds
D. Finetuning
E. Additional results …