Paul Khuong on Nostr:
"a conceptually simple and computationally cheap extrapolation scheme strictly improves the worst-case convergence rate. […] improves the best possible worst-case performance by the same amount as conducting O(sqrt(N/log(N)) more gradient steps."
https://optimization-online.org/2024/02/on-averaging-and-extrapolation-for-gradient-descent/