
Try it here; I hate LLMs, but this is crazy fast: https://chatjimmy.ai/



  "447 / 6144 tokens"
  "Generated in 0.026s • 15,718 tok/s"
This is crazy fast. I always predicted this speed was ~2 years in the future, but it's here now.

The full answer pops up in milliseconds; it's impressive and feels like a completely different technology, just by forgoing the need to stream the output.

Because most models today generate slowly, they give the impression of someone typing on the other end. This is just <enter> -> wall of text. Wild.

We need that for that Chinese 3B model that thinks for 45s on "hello world" but also solves math.

Nanbeige. Yeah, this seems ideal for models that scale test-time compute.

Do we know anything about the method?


