PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices: Predictor Analysis

:::info
This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.

Authors:

(1) Minghao Yan, University of Wisconsin-Madison;

(2) Hongyi Wang, Carnegie Mellon University;

(3) Shivaram Venkataraman, University of Wisconsin-Madison.

:::

Table of Links

Abstract & Introduction
Motivation
Opportunities
Architecture Overview
Problem Formulation: Two-Phase Tuning
Modeling Workload Interference
Experiments
Conclusion & References
A. Hardware Details
B. Experimental Results
C. Arithmetic Intensity
D. Predictor Analysis

D PREDICTOR ANALYSIS

We vary the latency SLO to assess how the predictor schedules fine-tuning requests. We replay a 60-second stream in which the latency SLO is set to 250 ms for the first half (30 seconds) and then relaxed to 700 ms for the remainder. As shown in Figure 14, under the stringent SLO the predictor deduces that it cannot schedule fine-tuning requests without violating the latency SLO, so none are scheduled. Once the SLO is relaxed, the predictor determines that scheduling fine-tuning is feasible and schedules the requests sequentially, launching each one only after the preceding request has completed and issued its completion signal.
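To make the scheduling behavior concrete, here is a minimal sketch of the replay described above. The latency and duration constants, the `PredictorConfig` fields, and the `can_schedule` check are hypothetical placeholders introduced for illustration; the actual predictor models workload interference from measurements rather than fixed constants.

```python
import dataclasses

@dataclasses.dataclass
class PredictorConfig:
    # Hypothetical constants for illustration only; the paper's predictor
    # estimates these quantities from measured interference.
    inference_latency_ms: float = 200.0  # standalone inference latency
    interference_ms: float = 400.0       # added latency while fine-tuning runs
    finetune_duration_s: float = 10.0    # duration of one fine-tuning request

def slo_at(t: float) -> float:
    """Latency SLO of the replayed 60-second stream: 250 ms for the
    first 30 seconds, then relaxed to 700 ms."""
    return 250.0 if t < 30.0 else 700.0

def can_schedule(cfg: PredictorConfig, slo_ms: float) -> bool:
    """Schedule fine-tuning only if the predicted inference latency
    under interference still meets the current SLO."""
    return cfg.inference_latency_ms + cfg.interference_ms <= slo_ms

def replay(cfg: PredictorConfig, horizon_s: float = 60.0) -> list[float]:
    """Sequentially schedule fine-tuning requests; each new request is
    issued only after the previous one signals completion."""
    t, start_times = 0.0, []
    while t < horizon_s:
        if can_schedule(cfg, slo_at(t)):
            start_times.append(t)
            t += cfg.finetune_duration_s  # wait for the completion signal
        else:
            t += 1.0                      # SLO too tight; re-check later
    return start_times

if __name__ == "__main__":
    starts = replay(PredictorConfig())
    # With these placeholder numbers, requests begin only after t = 30 s:
    print([f"{s:.0f}s" for s in starts])  # ['30s', '40s', '50s']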
