Am I reading the article wrong? It appears that the author did not test the claims of the proof. Wouldn't a "bug" in this case mean she found an input that did not survive a round trip through the compression algorithm?
Update: Actually, I guess this may have been her point: "The two bugs that were found both sat outside the boundary of what the proofs cover." So then I guess the title might be a bit clickbaity.
Hi! Author here. When we speak of bugs in a verified software system, I think it's fair to consider the entire binary a fair target.
If a buffer overflow causes the system to be exploited and all your bitcoins to be stolen, I don't think the fact that the bug is in the language runtime is going to be much consolation. Especially if the software you were running was advertised as formally verified to be free of bugs.
Secondly, I did find a bug in the algorithm: in Archive.lean, in the parsing of the compressed archive headers. That was the crashing input.
OK, so now I see the shadow edit you did for the code source, thanks. Unfortunately, it shows that you are incorrect. For one, the function is a private function and can only be called by local code. Everywhere that the function is called, the size given to it is verified by the program; there is even a note that says it limits the maximum zip file size to avoid a zip bomb. In addition, the code you are quoting isn't even the final code; it is an interim step from what Claude was iterating on. Sucks that this got so much traction, as you are purposely being deceptive in trying to say that this is a bug. You intentionally removed the 'private' keyword in the function signature, as you knew that it would tip off most people to then check when it is actually used.
> I think it's fair to consider the entire binary a fair target.
Yes, it's still very much a bug. But it has nothing to do with your program being formally verified or not. Formal verification can do nothing about any unverified code you rely on. You would really need a formal verification of every piece of hardware, the operating system, the runtime, and your application code. Short of that, nobody should expect formal verification to ensure there are no bugs.
I read it as that’s also the point. Adding formal verification is not a strict defense against bugs. It is in a way similar to having 100% test coverage and finding bugs in your untested edge cases.
I don’t think the author is attempting to decry formal verification, but I think the article carries a good message everyone should keep in mind: safety is a larger, whole-system process, and bugs live in the cracks and interfaces.
You're right. It just seems as though it should be self-evident. Especially to those sophisticated enough to understand and employ formal verification.
It does seem that way, doesn't it? But as software bugs become easier to find and exploit, I'm expecting more and more people, including those not "sophisticated enough" to understand and employ formal verification, to start using it.
Then it would help to not introduce any confusion into the ecosystem by using a click-baity title that implies you found a bug which violated the formal specification.
> When we speak of bugs in a verified software system, I think it's fair to consider the entire binary a fair target.
Yeah, I would actually agree. We wouldn't want to advertise that a system is formally verified in some way if that creates a false sense of security. I was just pointing out that, by my reading, the title appears to suggest that the core mechanism of the Lean proof is somehow flawed. When I read the title, I immediately thought, "Oooh. Looks like someone demonstrated a flaw in the proof. Neat." But that's not what is shown in the article. Just feels a bit misleading is all.
Sorry to hijack the thread. Really cool post. How long did the whole exercise, including porting zlib to Lean, take?
I have a hard real-time system that I would love to try this on, but that's a lot of tools to learn, and it's unclear how to model distributed systems in Lean.
Also, please add RSS so I can subscribe to your blog.
Lean-zip was not my project but one by others in the Lean community. I'm not sure about the methodological details of their process; you might want to check with the original lean-zip authors (https://github.com/kim-em/lean-zip).
Wouldn't it be more correct to call the article "critical" and not "incendiary"? I looked it over and I don't remember seeing any calls to violence. Altman needs to remember that he holds an incredible amount of power in this moment. He and other current AI tech leaders are effectively sitting on the equivalent of a technological nuclear bomb. Anyone in their right mind would find that threatening.
Right, but the picture those statements painted collectively was not flattering. And that was certainly intended by the authors. Thus, critical, but not at all "incendiary."
Update: To clarify, my personal stance is that the critical tone was both intended by the authors and, in my opinion, appropriate given how much power Mr. Altman holds. If he has a history of behaving inconsistently, that deserves daylight.
Are you arguing that because the authors knew the pattern they were documenting was unflattering, the piece is somehow compromised? That they clearly had an agenda? That's called reporting. They called a hundred-plus named sources and the picture those sources independently painted was damning. Altman has a history of telling repeated, easily-checked lies, followed by fresh lies when caught in the first ones.
Are you suggesting that they should have "both sides"-ed by reporting company PR and Sam-friendly sources and giving them equal weight? Sometimes the facts point in one direction.
> Are you arguing that because the authors knew the pattern they were documenting was unflattering, the piece is somehow compromised?
Uh, no? Lol, I'm on your side, bud. Put away the pitchfork. I thought it was a really good and fair article. I am not the adversary you're looking for.
> my personal stance is that the critical tone was both intended by the authors
You may think we are on the same side. You don't understand what side I'm on. "Lol".
Your "personal stance" is that you can get inside the heads of the reporters? Obviously not. So you're going by the idea that an article that leads to critical conclusions is inherently slanted. This is an insidious and damaging idea. It has led to the belief by journalists and editors that they need to twist themselves into pretzels to present "both sides", which is easily exploited by people of bad faith to launder outright lies. There's a direct line between this and authoritarianism. I'm quite serious about this. The fact that you agree with the authors in this case is completely orthogonal.
Every article is inherently biased due to the fact that there are inclusions and omissions. This is just a fact.
You're injecting your own personal view into GP's statement by putting a lot of weight on the distinction between the words "critical," "incendiary," and "neutral," when GP made a fairly neutral and not especially charged statement.
I love reading stuff like “Critical, slanted, and compromised mean the same thing. They are interchangeable words.”
Given that, it looks like your position on davesque’s posts is slanted. Your take is critical of those posts, which means your assessment is compromised, and as such should not be taken as valid.
And I love seeing sentiments attributed to me, in quotes even, that I didn't state or imply, and certainly don't believe. "Critical" by itself is not a synonym for "slanted". However the post I was commenting on was:
> Right, but the picture those statements painted collectively was not flattering. And that was certainly intended by the authors. Thus, critical, but not at all "incendiary."
The key there is "certainly intended by the authors". The full sentiment here IS equivalent to "slanted".
I'm honestly completely in favor of this. Anthropic obviously budgets their capacity based on projected human usage patterns coming through their native app suite (Claude Cowork, Claude Code, etc.). They should not be expected to shoulder the burden of community tools like OpenClaw that are effectively designed to strategically max out usage windows on the plan. That has clearly caused issues with uptime in the past couple of months and I've gotten pretty fed up with the degraded service quality while I'm just trying to use Claude Code as it's intended to be used. I'm happy to see they're doing something about this. Seems like a totally fair move to me. I'd rather that Claude Code functions well when I'm using it according to its design.
I wish these AI vendors would quit publishing comparisons with the previous generation of their competitors' models. It's just such a glaringly bad look, and no one is fooled by it, even if their achievements deserve praise in their own right. The Qwen models are great and don't deserve the reputational hit that comes from dodgy marketing tactics.
I'm not sure this is the issue. I asked Claude Code a simple question yesterday. No sub agents. No web fetches. Relatively small context. Outside of peak hours. Burned 8% of my Max 5x 5hr usage limit. I've never seen anything like this before, even when the cache is cold.
You don't. Most of the time (after the first prompt following a compaction or context clear) the context prefix is cached, and you pay something like 10% of the cost for cached tokens. But your total cost is still roughly the area under a line with positive slope, so it increases quadratically with context length.
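To make the "area under a line" point concrete, here's a toy model with made-up numbers (1,000 new tokens per turn, a 10x discount on cached prefix tokens; the real pricing and token counts will differ):

```python
# Toy model: each turn bills the cached prefix at a discount plus the fresh tokens.
def total_tokens_billed(turns, new_per_turn=1000, cache_discount=0.1):
    total = 0.0
    prefix = 0  # cached context length so far
    for _ in range(turns):
        total += prefix * cache_discount + new_per_turn  # discounted prefix + new tokens
        prefix += new_per_turn  # the new tokens join the cached prefix
    return total

print(total_tokens_billed(10))   # 14500.0
print(total_tokens_billed(100))  # 595000.0
```

Per-turn cost grows linearly with the prefix, so the running total is dominated by a quadratic term once conversations get long, even with caching.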
Yeah, the viz for polar quantization is straight up nonsensical. Okay, so some colors are converted into clocks and then into a bigger box with a pink box inside of it. Got it. Even understanding what polar coordinates are doesn't help you make sense out of it.
Yeah, and some parts of the article are just bizarre:
> Instead of looking at a memory vector using standard coordinates (i.e., X, Y, Z) that indicate the distance along each axis, PolarQuant converts the vector into polar coordinates using a Cartesian coordinate system. This is comparable to replacing "Go 3 blocks East, 4 blocks North" with "Go 5 blocks total at a 37-degree angle”
Why bother explaining this? Were they targeting the high school and middle school student reader base??
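For what it's worth, the conversion behind the quoted analogy is just the standard Cartesian-to-polar one, with the angle measured compass-style from North (which is how "3 East, 4 North" comes out to roughly 37 degrees):

```python
import math

def to_polar(east, north):
    """Convert Cartesian offsets to (distance, compass bearing in degrees)."""
    r = math.hypot(east, north)  # total straight-line distance
    bearing = math.degrees(math.atan2(east, north))  # angle from North toward East
    return r, bearing

r, bearing = to_polar(3, 4)
print(round(r, 1), round(bearing))  # 5.0 37
```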