Large Language Models on Memory-Constrained Devices Using Flash Memory: Results for OPT 6.7B Model
:::info Authors: (1) Keivan Alizadeh; (2) Iman Mirzadeh, Major Contribution; (3) Dmitry Belenko, Major Contribution; (4) S. Karen Khatamifard; (5) Minsik Cho; (6) Carlo C Del Mundo; (7) Mohammad Rastegari; (8) Mehrdad Farajtabar. ::: Table of Links Abstract and 1. Introduction 2. Flash Memory & LLM Inference and 2.1 Bandwidth and Energy Constraints 2.2 Read … Read more