Qwen 3.5 27B is dense, so (I think) it should be compared to Gemma 4 31B. Or Gemma-...

redman25 · 2026-04-02T21:38:53 1775165933

Exactly. Compare MoE with MoE and dense with dense; otherwise it's apples and oranges.

swalsh · 2026-04-03T00:00:41 1775174441

It's coding to coding. I couldn't care less how the model is architected; I only care how it performs in a real-world scenario.

petu · 2026-04-03T07:05:58 1775199958

If you don't care how it's architected, why do you care about size? Compare it to Q3.5 397B-A17B.

Just as smaller models are a speed/cost optimization, so is MoE.

G4 26B-A4B runs at 150 t/s on a 4090/5090 and 80 t/s on an M5 Max. Q3.5 35B-A3B is comparably fast. These are flash-lite/nano class models.

G4 31B, despite a small increase in total parameter count, is over 5 times slower. Q3.5 27B is comparably slow. These approximate flash/mini class models (I believe the sizes of proprietary models in this class are closer to Q3.5 122B-A10B or Llama 4 Scout 109B-A17B).

daemonologist · 2026-04-03T01:43:37 1775180617

The implication is that there is (or should be) a major speed difference: naively you'd expect the MoE to be 10x faster and cheaper, which can be pretty relevant on real-world tasks.
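
To put rough numbers on that naive expectation, here is a quick sketch. It assumes decode throughput scales inversely with active parameters per token, which is a simplification: it ignores attention cost, KV cache traffic, routing overhead, and quantization. The parameter counts are read off the model names used in this thread, and `naive_speedup` is just a helper name made up for the example.

```python
# Naive estimate only: assumes tokens/s is proportional to 1 / active parameters.
# Parameter counts (in billions) are taken from the model names in this thread.

def naive_speedup(dense_active_b: float, moe_active_b: float) -> float:
    """Expected MoE speedup over a dense model under the 1/active-params assumption."""
    return dense_active_b / moe_active_b

# (dense active params, MoE active params), both in billions
pairs = {
    "G4 31B vs G4 26B-A4B": (31.0, 4.0),
    "Q3.5 27B vs Q3.5 35B-A3B": (27.0, 3.0),
}

estimates = {name: naive_speedup(d, m) for name, (d, m) in pairs.items()}

for name, ratio in estimates.items():
    print(f"{name}: roughly {ratio:.2f}x")
```

Under that assumption the ratios come out to 7.75x and 9x, so "10x" is in the right ballpark as a naive ceiling, while the "over 5 times slower" figure reported above for G4 31B suggests the real-world gap is somewhat smaller.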