LogSumExp Function Properties: Lemmas for Energy Functions

Table of Links

Abstract and 1 Introduction

2 Related Work

3 Model and 3.1 Associative memories

3.2 Transformer blocks

4 A New Energy Function

4.1 The layered structure

5 Cross-Entropy Loss

6 Empirical Results and 6.1 Empirical evaluation of the radius

6.2 Training GPT-2

6.3 Training Vanilla Transformers

7 Conclusion and Acknowledgments


Appendix A. Deferred Tables

Appendix B. Some Properties of the Energy Functions

Appendix C. Deferred Proofs from Section 5

Appendix D. Transformer Details: Using GPT-2 as an Example


References

Appendix B. Some Properties of the Energy Functions

We introduce some useful properties of the LogSumExp function, defined for $x = (x_1, \dots, x_d) \in \mathbb{R}^d$ as

$$\mathrm{LogSumExp}(x) = \log \sum_{i=1}^{d} e^{x_i}.$$

This is particularly useful because the softmax function, widely utilized in Transformer models, is the gradient of the LogSumExp function: $\nabla \mathrm{LogSumExp}(x) = \mathrm{softmax}(x)$. As shown in (Grathwohl et al., 2019), the LogSumExp corresponds to the energy function of a classifier.

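As a quick sanity check (our addition, not from the paper), the snippet below compares a finite-difference gradient of LogSumExp against the softmax, using `scipy.special`, which provides both functions:

```python
import numpy as np
from scipy.special import logsumexp, softmax

# Numerically verify that grad LogSumExp(x) = softmax(x).
rng = np.random.default_rng(0)
x = rng.normal(size=5)

eps = 1e-6
# Central finite differences along each coordinate direction.
grad_fd = np.array([
    (logsumexp(x + eps * e) - logsumexp(x - eps * e)) / (2 * eps)
    for e in np.eye(len(x))
])

print(np.allclose(grad_fd, softmax(x), atol=1e-6))  # True
```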
Lemma 1 LogSumExp(x) is convex.


Proof  Let $p = \mathrm{softmax}(x)$, so that $\nabla \mathrm{LogSumExp}(x) = p$ and the Hessian is

$$\nabla^2 \mathrm{LogSumExp}(x) = \mathrm{diag}(p) - p p^\top.$$

For any $v \in \mathbb{R}^d$, we have $v^\top \big( \mathrm{diag}(p) - p p^\top \big) v = \sum_i p_i v_i^2 - \big( \sum_i p_i v_i \big)^2 \ge 0$ by Jensen's inequality applied to the probability vector $p$. The Hessian is therefore positive semidefinite, and LogSumExp is convex. ∎

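The inequality asserted by the lemma can also be spot-checked numerically. A minimal sketch (our addition), testing the defining convexity inequality at random points:

```python
import numpy as np
from scipy.special import logsumexp

# Check LogSumExp(lam*x + (1-lam)*y) <= lam*LSE(x) + (1-lam)*LSE(y)
# at random points, as Lemma 1 asserts.
rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.normal(size=(2, 8))
    lam = rng.uniform()
    lhs = logsumexp(lam * x + (1 - lam) * y)
    rhs = lam * logsumexp(x) + (1 - lam) * logsumexp(y)
    assert lhs <= rhs + 1e-12
print("convexity inequality held on all samples")
```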
Consequently, we have the following smooth approximation for the min function. Since $\max_i x_i \le \mathrm{LogSumExp}(x) \le \max_i x_i + \log d$, applying these bounds to $-x$ yields

$$\min_i x_i - \log d \;\le\; -\mathrm{LogSumExp}(-x) \;\le\; \min_i x_i.$$

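To illustrate (our addition; the inverse-temperature parameter `beta` is our notation, not the paper's), scaling the argument sharpens the approximation, shrinking the $\log d$ gap to $\log(d)/\beta$:

```python
import numpy as np
from scipy.special import logsumexp

# -(1/beta) * LogSumExp(-beta * x) approaches min(x) as beta grows;
# the error is at most log(d) / beta for x in R^d.
x = np.array([0.3, 1.7, -0.5, 2.1])
for beta in (1.0, 10.0, 100.0):
    smooth_min = -logsumexp(-beta * x) / beta
    print(f"beta={beta:>5}: {smooth_min:.4f}")  # -> min(x) = -0.5
```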

B.1 Proof of Proposition 2

:::info
Authors:

(1) Xueyan Niu, Theory Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd.;

(2) Bo Bai (baibo.8@huawei.com);

(3) Lei Deng (deng.lei2@huawei.com);

(4) Wei Han (harvey.hanwei@huawei.com).

:::


:::info
This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.

:::
