Transformer Performance: Hopfield Theory & Cross-Entropy Loss Data

Table of Links

Abstract and 1 Introduction

2 Related Work

3 Model and 3.1 Associative memories

3.2 Transformer blocks

4 A New Energy Function

4.1 The layered structure

5 Cross-Entropy Loss

6 Empirical Results and 6.1 Empirical evaluation of the radius

6.2 Training GPT-2

6.3 Training Vanilla Transformers

7 Conclusion and Acknowledgments


Appendix A. Deferred Tables

Appendix B. Some Properties of the Energy Functions

Appendix C. Deferred Proofs from Section 5

Appendix D. Transformer Details: Using GPT-2 as an Example


References

Appendix A. Deferred Tables

Table 1: Table of selected related works on Hopfield networks, enumerating their domain, energy function, and memory capacity. For all the works above, n represents the dimension of the input vector, W is the outer product of the patterns, M is the matrix of patterns, r is the order of the polynomial F(·), d is the number of patterns, and c is a positive constant.


Table 2: Large transformer-based language models and their reported cross-entropy loss.

:::info
Authors:

(1) Xueyan Niu, Theory Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd.;

(2) Bo Bai (baibo8@huawei.com);

(3) Lei Deng (deng.lei2@huawei.com);

(4) Wei Han (harvey.hanwei@huawei.com).

:::


:::info
This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

:::
