Exploring Alternative Architectures for Multi-Token LLM Prediction

Table of Links
Abstract and 1. Introduction
2. Method
3. Experiments on real data
4. Ablations on synthetic data
5. Why does it work? Some speculation
6. Related work
7. Conclusion, Impact statement, Environmental impact, Acknowledgements and References
A. Additional results on self-speculative decoding
B. Alternative architectures
C. Training speeds
D. Finetuning
E. Additional results …

Unleashing LLM Speed: Multi-Token Self-Speculative Decoding Redefines Inference
