The point is that open weights puts inference on the open market, so if your model is actually good and providers want to serve it, that will drive costs down and inference speeds up. Like Cerebras running Qwen 3 235B Instruct at 1.4k tps for cheaper than Claude Haiku (let that tps number sink in for a second. For reference, Claude Opus runs ~30-40 tps, Claude Haiku at ~60. Over an order of magnitude difference). As a company developing models, it means you can't easily capture the inference margins, even though I believe you get a small kickback from the providers.
So I understand why they wouldn't want to go open weight, but on the other hand, open weight wins you popularity/sentiment if the model is any good, researchers (both academic and other labs) working on your stuff, etc etc. Local-first usage is only part of the story here. My guess is Qwen 3.5 was successful enough that now they want to start reaping the profits. Unfortunately most of Qwen 3.5's success is because it's heavily (and successfully!) optimized for extremely long-context usage on heavily constrained VRAM (i.e. local) systems, as a result of its DeltaNet attention layers.
It is partly the medium's fault. A lot of the sins of CD/digital mastering won't fly on vinyl because there are physical constraints on what you can literally press into the record groove.
I've had a Framework 13 for several years now, so I'm excited to see this kind of thing start to happen. Praying the next one out is a GPU/tensor workload unit so I'm not stuck at home on my desktop when I want to mess around with local AI models...
wasn't this the attention sink concept to some degree? I mean it doesn't seem out of the realm of possibility that, if the latency overhead isn't significant, frontier models start adopting something similar to DeepSeek's OCR tech
I see a lot of debate about whether AI is "conscious" or has "consciousness", with people talking past each other without a firm grasp on their own stances, so I made a quick quiz to help you locate where you stand.
I prefer to write blog posts in markdown, especially if I'm on the go, and historically I've manually reformatted it in Substack's editor. Figured, why not automate this? So I built a simple utility. Everything happens in your browser with the `marked` library.
Most important feature for me is converting from ASCII to "fancy" quotes and apostrophes, which Substack inserts automatically in its editor. Some more advanced features are probably broken, because Substack's editor is a bit wonky, but the basic stuff all works. Give it a try!
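For the curious, that quote conversion can be done with a couple of regex passes. A minimal sketch of just that one feature (the function name and regexes are mine, not the utility's actual code; the real thing also runs the markdown through `marked` first):

```javascript
// Convert straight ASCII quotes/apostrophes to typographic ("fancy") ones,
// the way Substack's editor inserts them automatically.
function smartQuotes(text) {
  return text
    // apostrophes / closing single quotes directly after a word character
    .replace(/(\w)'/g, "$1\u2019")
    // any remaining single quotes are openers
    .replace(/'/g, "\u2018")
    // closing double quotes after a word character or punctuation
    .replace(/([\w.,!?])"/g, "$1\u201D")
    // any remaining double quotes are openers
    .replace(/"/g, "\u201C");
}
```

So `smartQuotes("\"It's done,\" she said.")` gives you the curly-quoted version. Real typographic engines handle more edge cases (nested quotes, leading apostrophes like 'tis), but this covers the common ones.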
I think that is extremely common in adverts (most famously here, the M&S "It's not just bread. It's our stone-baked, hand-shaped artisanally-molested bread"), and in narrative media, like that distinctive kind of journalism written like a storybook, which (I think) then bled into popular media like true-crime podcasts: "It wasn't just Tuesday. It was the last Tuesday he would ever see. <intro music>".
Also this kind of short sentence construction is used in the incredibly annoying and pervasive style of headlines for opinion pieces: X is Y. And it's Z. (where Z is often "not OK" or "OK").
I assume all this overuse is where LLMs picked it up and weighted it highly.
Copy paste? Use something like desktop commander and just let it edit the files for you directly. It'll even run commands to test it out. Or go further and use Cline/RooCode and if you're building a webapp it'll load your page in a small browser, screenshot the contents, and send that to the model. The copy-paste stuff is beginner mode.
A2A isn't a layer of abstraction over MCP; it functions in parallel, and they complement each other. MCP addresses the Agent-to-Environment question: how can Agents "do things" on computers. A2A addresses the Agent-to-Agent question: how can Agents learn about other Agents and communicate with them. You need both.
You CAN try to build "the one agent that does everything", but in scenarios where there are many simultaneous data streams, a better approach is to have many stateful agents handling each stream via MCP, coupled with a single "executive" agent that calls on each of the stateful agents via A2A to get the high-level info it needs to make decisions on behalf of its user.
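As a toy illustration of that split (all names are hypothetical, and these are plain objects, not real A2A or MCP SDK calls): each stream agent keeps its own state and would touch its environment through MCP tools, while the executive only ever talks to the agents themselves:

```javascript
// One stateful agent per data stream. In a real system, ingest() would be
// backed by MCP tool calls against that stream's environment.
class StreamAgent {
  constructor(name) { this.name = name; this.state = []; }
  ingest(event) { this.state.push(event); }
  // Stands in for answering an A2A request with high-level info only.
  summarize() { return `${this.name}: ${this.state.length} events`; }
}

// The "executive" never touches the streams directly; it queries each
// stateful agent (over A2A, in the real architecture) and decides from that.
class ExecutiveAgent {
  constructor(agents) { this.agents = agents; }
  decide() { return this.agents.map(a => a.summarize()); }
}
```

The point of the shape: per-stream state stays inside each agent, and the executive works off summaries instead of raw events.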
To my understanding of this protocol it looks like it's an entity exposing a set of capabilities. Why is that different and complementary to an MCP server exposing tools? Why would you be limited to an "everything agent" in MCP?
I am struggling to see the core problem that this protocol addresses.
Much-debated question, but if we run with your definition, then A2A adds communication capabilities alongside tool-calling, which is ultimately a set of programmatic hooks. Like "phone a friend" if you don't know the answer given what you have available directly (via MCP, training data, or context).
My assumption is that the initial A2A implementation will be done with MCP, so the LLM can ask your AI directory or marketplace for help with a task via some kind of "phone a friend" tool call, and it'll be able to immediately interop and get the info it needs to complete the task.
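Sketching the shape such a tool might take (field names and the matching logic are purely illustrative, not any real MCP schema or A2A directory API):

```javascript
// Hypothetical "phone a friend" MCP tool that fronts an A2A directory lookup:
// the LLM calls it with a task description, and it finds a registered agent
// whose advertised skills match.
const phoneAFriend = {
  name: "phone_a_friend",
  description: "Find an agent in the directory that can handle this task",
  handler: (task, directory) => {
    const match = directory.find(a => a.skills.some(s => task.includes(s)));
    return match ? `delegate to ${match.name}` : "no agent found";
  },
};
```

From the model's point of view it's just another tool call; the interop with the other agent happens behind it.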