How Toto Reimagines Multi-Head Attention for Multivariate Forecasting
Table of Links
Background
Problem statement
Model architecture
Training data
Results
Conclusions
Impact statement
Future directions
Contributions
Acknowledgements and References
Appendix

3 Model architecture

Toto is a decoder-only forecasting model. It employs many of the latest techniques from the literature and introduces a novel method for adapting multi-head attention to multivariate time series data …
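This excerpt does not describe Toto's attention variant in detail. One common way to adapt multi-head attention to multivariate series is to factorize it into two steps. The first is attention along the time axis within each variate, kept causal as a decoder-only model requires. The second is attention across variates at the same time step. The NumPy sketch below illustrates that general idea only and should not be read as the paper's method. The function names, the (variates, time, dim) tensor layout, and the omission of normalization, feed-forward layers, and positional encodings are all hypothetical simplifications.

```python
import numpy as np

# Hypothetical illustration of factorized time/variate attention.
# Not Toto's implementation; layout is (variates V, time steps T, model dim D).

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads, causal):
    """Multi-head self-attention over the second-to-last axis of x.

    x has shape (..., L, D); leading axes are independent batch axes.
    """
    *batch, L, D = x.shape
    d_head = D // num_heads

    def split(t):
        # (..., L, D) -> (..., H, L, d_head)
        return t.reshape(*batch, L, num_heads, d_head).swapaxes(-2, -3)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_head)  # (..., H, L, L)
    if causal:
        # Block attention to later positions (upper triangle above diagonal).
        mask = np.triu(np.ones((L, L), dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    out = softmax(scores) @ v  # (..., H, L, d_head)
    out = out.swapaxes(-2, -3).reshape(*batch, L, D)  # merge heads
    return out @ w_o

def factorized_block(x, params, num_heads):
    """One hypothetical block: causal time-wise, then variate-wise attention."""
    # Time-wise: each variate attends causally over its own history.
    x = x + multi_head_attention(x, *params["time"], num_heads, causal=True)
    # Variate-wise: at each time step, variates attend to one another.
    # This mixes only same-time-step values, so no future information leaks.
    xt = x.swapaxes(0, 1)  # (T, V, D)
    xt = xt + multi_head_attention(xt, *params["space"], num_heads, causal=False)
    return xt.swapaxes(0, 1)  # back to (V, T, D)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, T, D, H = 3, 5, 8, 2

    def init():
        # Random query, key, value, and output projections.
        return [rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(4)]

    params = {"time": init(), "space": init()}
    x = rng.normal(size=(V, T, D))
    y = factorized_block(x, params, H)
    print(y.shape)  # (3, 5, 8)
```

A design note on this sketch: full attention over all V·T tokens costs on the order of (V·T)² score computations. The factorized form costs roughly V·T² for the time-wise step plus T·V² for the variate-wise step, which is why splitting the axes is a common choice when there are many variates.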