1,021 followers
@roydanroy In here, bottom of p. 422: "stochastic gradient descent directly optimizes the expected risk [..]" https://t.co/XwYwzTp8VB
@TaliaRinger I really wish some folks would come together to write a newer edition of this book: https://t.co/523NUSvARo
@_joaogui1 @_clashluke @_arohan_ f(x) = 1.7159*tanh(2/3*x). It satisfies the properties: f(1) = 1, and f''(x) has a maximum at 1. https://t.co/EuPKszk7Bw
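A quick numerical check of the two properties claimed above, as a rough sketch rather than anything from the linked paper. The names `A`, `B`, `f`, `f2`, `xs` and `x_peak` are made up for illustration, and the grid over [0, 3] is an arbitrary choice. One assumption worth flagging: "maximum at 1" is read here as the magnitude of f'' peaking near x = 1, since f'' itself is negative for positive x.

```python
import math

# Constants taken from the tweet: f(x) = 1.7159 * tanh(2/3 * x)
A, B = 1.7159, 2.0 / 3.0

def f(x):
    """Scaled tanh activation from the tweet."""
    return A * math.tanh(B * x)

def f2(x):
    """Analytic second derivative: -2*A*B^2 * tanh(Bx) * sech^2(Bx)."""
    t = math.tanh(B * x)
    return -2.0 * A * B * B * t * (1.0 - t * t)

# Grid search for the point where |f''| is largest on [0, 3]
# (grid range and step are assumptions made for this sketch).
xs = [i / 1000.0 for i in range(3001)]
x_peak = max(xs, key=lambda x: abs(f2(x)))

print(f"f(1)    = {f(1.0):.5f}")
print(f"x_peak  = {x_peak:.3f}")
print(f"f''(1)  = {f2(1.0):.5f}")
```

On this reading, f(1) comes out very close to 1 and the peak of |f''| sits just below x = 1, so the second property holds approximately rather than exactly.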
RT @MLDawn2018: @compthink @emna_amor3 See how years ago, @ylecun came up with a very specific type of Sigmoid to avoid vanishing gradient.…