Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Qwen 3.5 27B is dense, so (I think) should be compared to Gemma 4 31B.

Or Gemma-4 26B(-A4B) should be compared to Qwen 3.5 35B(-A3B)



Exactly, compare MoE with MoE and dense with dense otherwise it's apples and oranges.


Its coding to coding. I could care less how the model is architected, i only care how it performs in a real world scenario.


If you don't care about how it's architectured, why you care about size? Compare it to Q3.5 397B-A17B.

Just like smaller size models are speed / cost optimization, so is MoE.

G4 26B-A4B goes 150 t/s on 4090/5090, 80 t/s on M5 Max. Q3.5 35B-A3B is comparably fast. They are flash-lite/nano class models.

G4 31B despite small increase in total parameter count is over 5 times slower. Q3.5 27B is comparably slow. They are approximating flash/mini class models (I believe sizes of proprietary models in this class are closer to Q3.5 122B-A10B or Llama 4 Scout 109B-A17B).


The implication is that there is (should be) a major speed difference - naively you'd expect the MoE to be 10x faster and cheaper, which can be pretty relevant on real world tasks.




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: