Every model learned by gradient descent is approximately a kernel machine (2020) February 25, 2024