At work I'm building anomaly detection using ML at the edge, and I want to move beyond bog-standard stochastic gradient descent for fitting the model(s) in favor of methods that exploit analytical Jacobians / Hessians. So I'm comparing and contrasting the various nonlinear (gradient-based) optimization methods for my use cases and trying to see how fast I can make them run.
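As a concrete example of the kind of method I mean, here's a minimal Gauss-Newton sketch in NumPy for a nonlinear least-squares fit with a hand-derived analytic Jacobian. The exponential-decay model, parameter names, and iteration count are all my own illustrative choices, not anything specific to my actual edge models:

```python
import numpy as np

def residuals(p, t, y):
    # Model: y_hat = a * exp(-b * t); residual r = y_hat - y
    a, b = p
    return a * np.exp(-b * t) - y

def jacobian(p, t, y):
    # Analytic Jacobian of the residuals w.r.t. (a, b):
    # dr/da = exp(-b t), dr/db = -a t exp(-b t)
    a, b = p
    e = np.exp(-b * t)
    return np.column_stack((e, -a * t * e))

def gauss_newton(p0, t, y, iters=50):
    # Each step solves the normal equations J^T J dp = -J^T r,
    # i.e. Newton's method with the Hessian approximated by J^T J.
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        r = residuals(p, t, y)
        J = jacobian(p, t, y)
        dp = np.linalg.solve(J.T @ J, -J.T @ r)
        p = p + dp
    return p

# Noiseless synthetic data generated from known parameters (a=2.0, b=0.5)
t = np.linspace(0.0, 5.0, 50)
y = 2.0 * np.exp(-0.5 * t)
p_hat = gauss_newton([1.0, 1.0], t, y)
```

Because the Jacobian is exact rather than approximated by finite differences or stochastic minibatch gradients, convergence near the solution is fast, which is the whole appeal on constrained edge hardware; a production version would add damping (Levenberg-Marquardt) for robustness far from the optimum.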