Does Gradient Flow Over Neural Networks Really Represent Gradient Descent?
Off the Convex Path
TL;DR A lot was said in this blog (cf. post by Sanjeev) about the importance of studying trajectories of gradient descent (GD…
This post brings together several themes I’ve been writing about lately: caching function evaluations, error estimation, and…