> I would also expect to see it taking exponentially longer to process a prompt. I don't believe LLMs work like that.
Try this out using a local LLM. You'll see that as the conversation grows, your prompts take longer to execute. It's not exponential but it's significant. This is in fact how all autoregressive LLMs work.
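A rough way to see why: during prefill, every token attends over all the tokens before it, so the attention work grows roughly quadratically with conversation length. A minimal sketch of that scaling (the layer count and hidden size are illustrative assumptions, not any particular model's config):

```python
# Rough prefill-cost estimate for an autoregressive transformer.
# n_layers and d_model are illustrative, not a real model's numbers.

def attention_flops(context_len: int, n_layers: int = 32, d_model: int = 4096) -> int:
    """Approximate attention FLOPs to prefill a prompt of context_len tokens.

    Each of the context_len query tokens attends over up to context_len keys,
    so the attention term scales with context_len ** 2.
    """
    # ~2*n*n*d for the QK^T scores plus ~2*n*n*d for scores @ V, per layer
    # (linear projections omitted for clarity -- they only scale linearly).
    return n_layers * 4 * context_len * context_len * d_model

short = attention_flops(1_000)
long = attention_flops(10_000)
print(long / short)  # 100.0 -> 10x the context, ~100x the attention work
```

That quadratic term is why a long-running local conversation visibly slows down even when generation speed per token stays similar.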
Yesterday I was playing around with Gemma4 26B A4B with a 3 bit quant and sizing it for my 16GB 9070XT:
Total VRAM: 16GB
Model: ~12GB
128k context size: ~3.9GB
At least I'm pretty sure I landed on 128k... might have been 64k. Regardless, you can see the massive weight (ha) of the meager context size (at least compared to frontier models).
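That context-size line item is essentially the KV cache, which grows linearly with context length. A back-of-the-envelope sketch, using a hypothetical architecture (the layer/head counts and the 8-bit cache are assumptions for illustration, not Gemma's actual config):

```python
def kv_cache_bytes(context_len: int,
                   n_layers: int = 32,
                   n_kv_heads: int = 4,   # assumes grouped-query attention
                   head_dim: int = 128,
                   bytes_per_elem: int = 1) -> int:  # 8-bit quantized cache
    """Approximate KV-cache size: keys + values for every layer and token."""
    # Factor of 2 covers both the key and the value tensors.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len

gib = kv_cache_bytes(131_072) / 2**30  # 128k tokens
print(f"{gib:.1f} GiB")  # 4.0 GiB -- same ballpark as the ~3.9GB above
```

Doubling any of context length, layer count, or KV heads doubles the cache, which is why frontier models with huge contexts need so much memory just for the cache.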
> As a user, I _expect_ the cost of resuming X hours/days later to be no different to resuming seconds or minutes later.
As an informed user who understands his tools, I of course expect large uncached conversations to massively eat into my token budget, since that's how all of the big LLM providers work. I also understand these providers are businesses trying to make money and they aren't going to hold every conversation in their caches indefinitely.
I'd hazard a guess that there's a large gulf between the proportion of users who know as much as you and the total number using these tools. The fact that a message can perform wildly differently (in either cost, or behaviour if using one of the mitigations) based on whether I send it at t vs t+1 seems like a major UX issue, especially given t is very likely not exposed in the UI.
Haven't had a chance to test 4.7 much, but one of my pet peeves with 4.6 is how eager it is to jump into implementation. Though maybe 4.7 is smarter about this now.
The system prompt is always loaded in its entirety IIUC. It's technically possible to modify it during a conversation but that would invalidate the prefill cache for the big model providers.
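One common way to implement prefix caching is to key cache entries on a running hash of the token stream from the very start. Since the system prompt is the first segment, editing it changes every downstream key and forces a full re-prefill. A minimal sketch of that keying idea (an illustration of the scheme, not any provider's actual implementation):

```python
import hashlib

def prefix_keys(messages):
    """Yield one cache key per message, each covering the conversation so far."""
    h = hashlib.sha256()
    for msg in messages:
        h.update(msg.encode())
        yield h.hexdigest()  # key for the prefix ending at this message

convo_a = ["SYSTEM: be helpful", "USER: hi", "ASSISTANT: hello"]
convo_b = ["SYSTEM: be terse",   "USER: hi", "ASSISTANT: hello"]

keys_a = list(prefix_keys(convo_a))
keys_b = list(prefix_keys(convo_b))

# Changing only the system prompt changes every prefix key,
# so no cached KV state can be reused:
print(all(a != b for a, b in zip(keys_a, keys_b)))  # True
```

The same property explains why appending to a conversation is cheap (all earlier prefixes still hit) while editing anything near the top is expensive.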
Nope, the original tariffs were under IEEPA, then the Supreme Court ruled they didn't have authority to use IEEPA, so they had to drop those tariffs and start working on refunds. It'd only have been illegal if they kept the tariffs after the ruling.
A lot of propaganda and emotion around this straightforward chain of events.
Under this reasoning, it's not illegal to just take things from stores (stores hate this one simple trick). If you're caught and your specific actions are then adjudicated to be illegal, at that point you can just start making a plan to bring the items back (even if some are used/damaged/etc) and everything is fine.
In reality of course, the actions were illegal the whole time. The big festering problem is that there is no actual punishment for government agents who break the law.
The existence of case law / precedent does not affect whether something is "already illegal", but rather only how strongly one can predict if something is illegal. The original tariffs were illegal from day 1.
The point of the analogy was exactly to point at something with a lot of case law where this dynamic is crystal clear (although if Trump starts petty shoplifting after he's done looting our government, it's even odds whether this corrupt "court" will find some way to excuse it. Anything for the cause, of course)
Wikipedia on IEEPA: "An Act with respect to the powers of the President in time of war or national emergency"?
I mean, that's very wishy-washy. So are we both aligned that it looks like misuse? Because if it's only about a word definition (no, what he did isn't illegal, but it is clear misconduct), then it feels like word play.
Eh, YMMV. I was using ROCm for minor AI things as far back as 2023 on an "unsupported" 6750 XT [0]. Even trained some LoRAs. Mostly the issues were that so many libs were CUDA-only.