* Yes I am aware I am not running R1, and I am running a distilled version of it.
If you have experience with tiny ~1B param models, its still head and shoulders above anything that has come before. IMO there have not been any other quantized/distilled/etc models as good at this size. It would not exist without the original R1 model work.
If you have experience with tiny ~1B param models, its still head and shoulders above anything that has come before. IMO there have not been any other quantized/distilled/etc models as good at this size. It would not exist without the original R1 model work.