


> With prompt caching, verbose context that gets reused is basically free.

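But it's not. It may be discounted cost-wise, but a long context still degrades attention quality and makes generation slower and more computationally expensive, even when the long prefix is reused from cache during prefill. Caching only skips recomputing the prefix; every newly generated token still has to attend over all of it.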
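To put a rough number on it, here's a back-of-the-envelope sketch. It assumes a plain dense-attention transformer and made-up model dimensions (32 layers, d_model of 4096), and it only counts attention FLOPs during decode; it ignores the MLP, KV-cache memory bandwidth, and any attention optimizations a real serving stack may use. The function names are hypothetical, just for illustration.

```python
def attention_flops_per_decoded_token(context_len, n_layers=32, d_model=4096):
    """Approximate attention FLOPs to generate ONE token over `context_len` cached tokens.

    Per layer: ~2 * d_model * context_len for the QK^T scores,
    plus ~2 * d_model * context_len for the weighted sum over V.
    The model dimensions are illustrative assumptions, not a real model's.
    """
    return 4 * n_layers * d_model * context_len


def decode_attention_flops(prefix_len, new_tokens, **model_dims):
    """Total attention FLOPs to generate `new_tokens` after a (cached) prefix.

    The prefix is assumed fully cached, so its prefill cost is treated as zero;
    only the per-token decode cost is counted. The context grows by one each step.
    """
    return sum(
        attention_flops_per_decoded_token(prefix_len + i, **model_dims)
        for i in range(new_tokens)
    )


short = decode_attention_flops(prefix_len=1_000, new_tokens=500)
long = decode_attention_flops(prefix_len=100_000, new_tokens=500)
print(f"decode attention cost ratio (100k vs 1k prefix): {long / short:.1f}x")
```

Under these assumptions, generating the same 500 tokens costs about 80x more attention compute with the 100k-token prefix than with the 1k-token one, even though the prefill was "free" in both cases. The exact ratio depends on the model and serving setup, but the linear growth in per-token cost is the point.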



