
With a 128GB Mac, you can even run 405b at 1-bit quantization. The model is large enough that even with the considerable quality drop that entails, it still appears to be smarter than 70b.


Just to clarify: you are saying the 1-bit-quantized 405b is smarter than an unquantized 70b?


You need to quantize 70b to run it on that kind of hardware as well, since even float16 wouldn't fit. But 405b:IQ1_M seems to be smarter than 70b:Q4_K_M in my experiments (admittedly very limited because it's so slow).

Note that IQ1_M quants are not really "1-bit" despite the name. They're somewhere around 1.8 bits per weight (bpw), which just happens to be enough to fit the model into 128GB with some room left over for inference.
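The fit is easy to sanity-check with back-of-the-envelope arithmetic: weight storage is roughly parameter count times bits per weight. A quick sketch (the ~4.85 bpw figure for Q4_K_M is an approximation of the llama.cpp quant, not an exact spec):

```python
def model_size_gb(params_billion, bits_per_weight):
    """Approximate weight-storage size in GB (decimal) for a quantized model."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 70B at float16: 140 GB -- doesn't fit in 128 GB, so quantization is mandatory
print(round(model_size_gb(70, 16.0), 1))

# 70B at Q4_K_M (roughly 4.85 bpw in llama.cpp): ~42 GB
print(round(model_size_gb(70, 4.85), 1))

# 405B at IQ1_M (~1.8 bpw): ~91 GB, leaving headroom for the KV cache
print(round(model_size_gb(405, 1.8), 1))
```

That ~91 GB figure is why ~1.8 bpw is about the practical floor for 405b on a 128GB machine.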



