
Try it here; I hate LLMs, but this is crazy fast: https://chatjimmy.ai/



  "447 / 6144 tokens"
  "Generated in 0.026s • 15,718 tok/s"
This is crazy fast. I always predicted this speed was ~2 years in the future, but it's here now.

The full answer pops up in milliseconds; it's impressive and feels like a completely different technology, just by forgoing the need to stream the output.

Because most models today generate slowly, they give the impression of someone typing on the other end. This is just <enter> -> wall of text. Wild.

We need that for that Chinese 3B model that thinks for 45s on "hello world" but also solves math.

Nanbeige. Yeah, this seems ideal for models that scale test-time compute.

Do we know anything about the method?


