Train ImageNet without Hyperparameters with Automatic Gradient Descent

by Chris Mingard, April 2023
Towards architecture-aware optimisation

TL;DR: We've derived an optimiser called automatic gradient descent (AGD) that can train ImageNet without hyperparameters. This removes the need for expensive and time-consuming learning rate tuning,…
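To make the "no hyperparameters" claim concrete, here is a minimal sketch of what a training loop looks like when the optimiser requires no learning rate. The `from agd import AGD` import, the constructor signature, and `train_loader` are assumptions for illustration; see the paper's companion code for the actual interface.

```python
import torch
import torch.nn.functional as F
import torchvision

# Hypothetical import: assumes the companion code exposes an AGD optimiser
# following PyTorch's standard optimiser interface.
from agd import AGD

model = torchvision.models.resnet18(num_classes=1000)

# Note what is missing: no learning rate, no momentum, no weight decay.
# AGD derives its step size from the network architecture and the loss.
optimizer = AGD(model.parameters())

for inputs, targets in train_loader:  # train_loader: your ImageNet DataLoader
    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()
```

Compared with tuned SGD or Adam, the only thing removed is the hyperparameter search: the loop itself is a standard PyTorch training loop.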