Hacker News
Ghosts of Softmax: Zeros of the partition function explain training instability (github.com/piyush314)
4 points by g4omingron 19 days ago | hide | past | favorite | 3 comments


Author here. The short version: softmax's partition function has complex zeros — from e^{iπ}+1=0 — that are invisible on the real line but cap safe step sizes at ρₐ = π/Δₐ. One JVP to compute. The repo has Colab notebooks if you want to poke at it. Happy to answer questions.

Full paper: https://arxiv.org/html/2603.13552v1
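The complex-zero claim is easy to check numerically in a two-logit toy. A minimal sketch (my own setup, not from the repo): for logits z1 > z2, the partition function Z(β) = e^{βz1} + e^{βz2} vanishes when e^{β(z1−z2)} = −1, i.e. at β = iπ/Δ with Δ = z1 − z2 the logit gap — the e^{iπ}+1=0 identity from the comment above, and the same π/Δ scale as the stated ρₐ = π/Δₐ bound.

```python
import cmath

# Two-logit toy partition function Z(beta) = e^{beta*z1} + e^{beta*z2}.
# Hypothetical values; delta is the logit gap z1 - z2.
z1, z2 = 2.0, 0.5
delta = z1 - z2

def Z(beta):
    return cmath.exp(beta * z1) + cmath.exp(beta * z2)

# Z is strictly positive for real beta, but vanishes at the complex
# point beta = i*pi/delta: factoring out e^{beta*z2} leaves
# e^{i*pi} + 1 = 0. This zero is "invisible" on the real line yet
# sits at distance pi/delta from the origin.
beta_zero = 1j * cmath.pi / delta
print(abs(Z(beta_zero)))        # ~0 (float round-off)
print(Z(1.0).real > 0)          # real-axis values stay positive
```

Larger gaps push the zero further from the real axis (bigger π/Δ), smaller gaps pull it in, which is the sense in which the zeros cap safe step sizes.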


Nice work! The paper feels verbose at times and could use some editing to slim it down (also, equation 6 is just equation 5 in a box), but I enjoyed it a lot nonetheless.


How does this differ from the traditional "small step good, big step bad" literature on learning rates?



