* Yes, I am aware that I am not running R1 itself; I am running a distilled version of it.

If you have experience with tiny ~1B param models, it's still head and shoulders above anything that has come before. IMO there have not been any other quantized/distilled/etc. models as good at this size. It would not exist without the original R1 model work.


