:::info
This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Andrey Zhmoginov, Google Research (azhmogin@google.com);
(2) Mark Sandler, Google Research (sandler@google.com);
(3) Max Vladymyrov, Google Research (mxv@google.com).
:::
Table of Links
Abstract and Introduction
Problem Setup and Related Work
HyperTransformer
Experiments
Conclusion and References
A Example of a Self-Attention Mechanism For Supervised Learning
B Model Parameters
C Additional Supervised Experiments
D Dependence On Parameters and Ablation Studies
E Attention Maps of Learned Transformer Models
F Visualization of The Generated CNN Weights
G Additional Tables and Figures