
With an Nvidia Spark or a 128 GB+ memory machine, you can get a good speedup on the 31B model if you use the 26B MoE as a draft model. It uses more memory, but I've seen acceptance rates around 70%+ using Q8 on both models.
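For anyone who wants to try this, here's a minimal sketch using Hugging Face transformers' assisted generation. The model ids are placeholders (substitute whatever 31B target / 26B MoE draft pair you actually have), and note that assisted generation expects compatible tokenizers between the two models:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder model ids -- swap in the actual 31B target and
    # 26B MoE draft you have locally.
    TARGET_ID = "your-org/target-31b"
    DRAFT_ID = "your-org/draft-26b-moe"

    tok = AutoTokenizer.from_pretrained(TARGET_ID)
    target = AutoModelForCausalLM.from_pretrained(
        TARGET_ID, torch_dtype=torch.bfloat16, device_map="auto")
    draft = AutoModelForCausalLM.from_pretrained(
        DRAFT_ID, torch_dtype=torch.bfloat16, device_map="auto")

    inputs = tok("Explain speculative decoding briefly.",
                 return_tensors="pt").to(target.device)

    # assistant_model turns on assisted (speculative) generation:
    # the draft proposes tokens, the target verifies them in one
    # forward pass per round.
    out = target.generate(**inputs, assistant_model=draft,
                          max_new_tokens=256)
    print(tok.decode(out[0], skip_special_tokens=True))

If you're running Q8 GGUFs as the comment suggests, llama.cpp exposes the same idea through its draft-model options rather than this Python API.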



1 token ahead or 2?

It's interesting: imo we'll soon have draft models specifically post-trained to match denser, more complicated models. I wouldn't be surprised if diffusion models made a comeback for this; they can draft many tokens at once, and their learning curves seem to top out at 90%+ match with auto-regressive models, so it's quite interesting.
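To make the "how many tokens ahead" question concrete, here's a minimal greedy-verification sketch of one speculative decoding round. Real implementations sample and use a probabilistic acceptance rule, and keep KV caches instead of recomputing the prefix; the function and parameter names here are illustrative, not from any particular library:

    import torch

    def speculative_step(target, draft, ids, k=4):
        # Draft proposes k tokens autoregressively (no KV cache here,
        # so the prefix is recomputed each step -- fine for a sketch).
        proposal = ids
        for _ in range(k):
            logits = draft(proposal).logits[:, -1, :]
            proposal = torch.cat(
                [proposal, logits.argmax(-1, keepdim=True)], dim=-1)
        drafted = proposal[:, ids.shape[1]:]

        # Target scores the whole proposed span in a single forward
        # pass; logits at position i predict the token at i+1.
        tlogits = target(proposal).logits
        preds = tlogits[:, ids.shape[1] - 1:-1, :].argmax(-1)

        # Keep the longest prefix where draft and target agree, then
        # append the target's own token at the first mismatch (or
        # after all k tokens if everything matched).
        matches = (preds == drafted)[0].int()
        n = int(matches.cumprod(0).sum().item())
        if n < k:
            corrected = preds[:, n:n + 1]
        else:
            corrected = tlogits[:, -1:, :].argmax(-1)
        return torch.cat([ids, drafted[:, :n], corrected], dim=-1), n

Averaging n/k over many rounds gives the acceptance rate the top comment quotes; at ~70% you keep most drafted tokens, so the target's expensive forward passes are amortized across several output tokens each.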


Flow matching is making some strides right now, too.


