Hacker News

qeternity · 2026-03-01T12:44:55 1772369095

> With prompt caching, verbose context that gets reused is basically free.

But it's not. It might be discounted cost-wise, however it will still degrade attention and make generation slower/more computationally expensive even if you have a long prefix you can reuse during prefill.