Using larger contexts often costs more in the APIs or consumes more of your quota, but this is becoming less of a problem as models adopt more clever attention mechanisms rather than full attention on every layer.
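To make the cost difference concrete, here's a minimal numpy sketch comparing how many token pairs a full causal attention layer touches versus a sliding-window layer. The window size and sequence length are arbitrary stand-ins, and real models interleave local and full-attention layers in patterns this doesn't capture:

    import numpy as np

    def full_mask(n):
        # Every token attends to itself and all earlier tokens: ~n^2/2 pairs.
        return np.tril(np.ones((n, n), dtype=bool))

    def window_mask(n, w):
        # Each token only attends to the last w tokens: ~n*w pairs.
        m = full_mask(n)
        for i in range(n):
            m[i, : max(0, i - w + 1)] = False
        return m

    n, w = 1024, 128
    print("full attention pairs:    ", full_mask(n).sum())
    print("windowed attention pairs:", window_mask(n, w).sum())

Windowed layers scale linearly with context length instead of quadratically, which is why replacing even some of the full-attention layers cuts the cost of long sessions.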
This is also something of a non-issue, because as context grows and attention gets diluted, the models perform worse. It'll cost Anthropic more to run your 900k-token session, yes, but it's in your interest not to have a 900k session in the first place.
Fil-C is much slower; there's no free lunch. If you want a language to be both fast and memory safe, you need to add restrictions that allow proper static analysis of the code.
MiniMax M2.7, MiMo-V2-Pro, GLM-5, GLM5-turbo, Kimi K2.5, DeepSeek V3.2, Step 3.5 Flash (this last one is particularly cheap while still being powerful).
It doesn't use an extra model (so it supports every language that works with Whisper out of the box and uses less memory); it works by applying Dynamic Time Warping to the cross-attention weights.
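For the curious, a minimal sketch of the idea (not the actual implementation): treat the cross-attention matrix as a token-vs-audio-frame similarity, negate it into a cost matrix, and run monotonic DTW to find which frames each text token aligns to. The shapes, the 20 ms frame duration, and the random "attention" matrix below are all illustrative stand-ins:

    import numpy as np

    def dtw_path(cost):
        """Monotonic DTW: cheapest path from (0, 0) to (n-1, m-1)."""
        n, m = cost.shape
        acc = np.full((n + 1, m + 1), np.inf)
        acc[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                acc[i, j] = cost[i - 1, j - 1] + min(
                    acc[i - 1, j],      # advance token
                    acc[i, j - 1],      # advance frame
                    acc[i - 1, j - 1],  # advance both
                )
        # Backtrack from the end to recover the alignment path.
        path, i, j = [], n, m
        while i > 0 and j > 0:
            path.append((i - 1, j - 1))
            step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
            if step == 0:
                i, j = i - 1, j - 1
            elif step == 1:
                i -= 1
            else:
                j -= 1
        return path[::-1]

    # Pretend cross-attention: one row per text token, one column per frame.
    rng = np.random.default_rng(0)
    attn = rng.random((5, 40))    # hypothetical attention weights
    cost = -attn                  # high attention = low alignment cost
    path = dtw_path(cost)

    # A token's time span = first/last frame it aligns to (20 ms assumed).
    frame_sec = 0.02
    for tok in range(attn.shape[0]):
        frames = [f for t, f in path if t == tok]
        print(f"token {tok}: {frames[0] * frame_sec:.2f}s - {frames[-1] * frame_sec:.2f}s")

Because the path is forced to be monotonic, tokens can't align to frames out of order, which is what makes the resulting word timestamps coherent even when individual attention heads are noisy.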
The main problem, I think, was that it was extremely slow.