Apparate: Early-Exit Models for ML Latency and Throughput Optimization – Early-Exit Models
:::info
Authors:

(1) Yinwei Dai, Princeton University (Equal contributions);

(2) Rui Pan, Princeton University (Equal contributions);

(3) Anand Iyer, Georgia Institute of Technology;

(4) Ravi Netravali, Georgia Institute of Technology.

:::

Table of Links

Abstract and 1 Introduction

2 Background and Motivation and 2.1 Model Serving Platforms

2.2 Early-Exit Models

2.3 Challenges

3 Design

3.1 …