
With a 128GB Mac, you can even run 405b at 1-bit quantization. The model is large enough that even with the considerable quality drop that entails, it still appears to be smarter than 70b.


Just to clarify: you are saying the 1-bit-quantized 405b is smarter than an unquantized 70b?


You need to quantize 70b to run it on that kind of hardware as well, since even float16 wouldn't fit. But 405b:IQ1_M seems to be smarter than 70b:Q4_K_M in my experiments (admittedly very limited because it's so slow).

Note that IQ1_M quants are not really "1-bit" despite the name. They're somewhere around 1.8 bits per weight (bpw), which just happens to be enough to fit the model into 128GB with some room left over for inference.
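The fit is easy to sanity-check with back-of-the-envelope arithmetic: weight storage is roughly parameter count times bits per weight. A quick sketch (the ~4.85 bpw figure for Q4_K_M is an approximation of the llama.cpp quant, not an exact spec):

```python
def model_size_gb(params_billion, bits_per_weight):
    """Approximate weight-storage size in GB (decimal) for a quantized model."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 70B at float16: 140 GB -- doesn't fit in 128 GB, so quantization is mandatory
print(round(model_size_gb(70, 16.0), 1))

# 70B at Q4_K_M (roughly 4.85 bpw in llama.cpp): ~42 GB
print(round(model_size_gb(70, 4.85), 1))

# 405B at IQ1_M (~1.8 bpw): ~91 GB, leaving headroom for the KV cache
print(round(model_size_gb(405, 1.8), 1))
```

That ~91 GB figure is why ~1.8 bpw is about the practical floor for 405b on a 128GB machine.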



