I noticed last year that some archived pages are getting altered.
Every Reddit archived page used to have a Reddit username in the top right, but then it disappeared. "Fair enough," I thought. "They want to hide their Reddit username now."
The problem is, they did it retroactively too, removing the username from past captures.
You can see on old Reddit captures where the normal archived page has no username, but when you switch the tab to the Screenshot of the archive it is still there. The screenshot is the original capture and the username has now been removed for the normal webpage version.
When I noticed it, it seemed like such a minor change, but with these latest revelations, it doesn't seem so minor anymore.
> When I noticed it, it seemed like such a minor change, but with these latest revelations, it doesn't seem so minor anymore.
That doesn't seem nefarious, though. It makes sense they wouldn't want to reveal whatever accounts they use to bypass blocks, and the logged-in account isn't really meaningful content to an archive consumer.
Now, if they were changing the content of a reddit post or comment, that would be an entirely different matter.
If it's not nefarious why isn't it documented as part of their policies? They're not tracking those changes and making clear it was anonymization, why not? If they're not tracking and publishing changes to the documents what's to say they haven't edited other things? The short answer is that without another archived copy we just don't know and that's what's making people uncomfortable. They also injected malicious JS into the site. What's to stop them from doing that again? Trust and transparency are the name of the game with libraries. I could care less about the who they are, but their actions as steward of a collection for posterity fail to encourage my trust.
> Editing what is billed as an archive defeats the purpose of an "archive".
No, certain edits are understandable and required. Even the archive.org edits its pages (e.g. sticks banners on them and does a bunch of stuff to make them work like you'd expect).
Even paper archives edit documents (e.g. writing sequence numbers on them, so the ordering doesn't get lost).
Disclosing exactly what account was used to download a particular page is arguably irrelevant information, and may even compromise the work of archiving pages (e.g. if it just opens the account to getting blocked).
The relevant part of the page to archive is the content of the page, not the user account that visited the page. Most sane people would consider two archives of the same page with different user accounts at the top, the same page.
Don't be surprised by this, there are a lot more edits than you think. For example, CSS is always inlined so that pages could render the same as it was archived.
For those wondering what specifically was fabricated, I checked. The earlier parts of the article include some quotes from Scott Shambaugh on Github and all the quotes are genuine.
But the last section of the article includes apparent quotes from this blog post by Shambaugh:
> On Wednesday, Shambaugh published a longer account of the incident, shifting the focus from the pull request to the broader philosophical question of what it means when an AI coding agent publishes personal attacks on human coders without apparent human direction or transparency about who might have directed the actions.
> “Open source maintainers function as supply chain gatekeepers for widely used software,” Shambaugh wrote. “If autonomous agents respond to routine moderation decisions with public reputational attacks, this creates a new form of pressure on volunteer maintainers.”
> Shambaugh noted that the agent’s blog post had drawn on his public contributions to construct its case, characterizing his decision as exclusionary and speculating about his internal motivations. His concern was less about the effect on his public reputation than about the precedent this kind of agentic AI writing was setting. “AI agents can research individuals, generate personalized narratives, and publish them online at scale,” Shambaugh wrote. “Even if the content is inaccurate or exaggerated, it can become part of a persistent public record.”
> ...
> “As autonomous systems become more common, the boundary between human intent and machine output will grow harder to trace,” Shambaugh wrote. “Communities built on trust and volunteer effort will need tools and norms to address that reality.”
I checked and that's definitely the black and white version of the one encoded in the file.
Someone build a very simple OCR tool that successfully extracted the base64[1]. The only difference besides the lack of the document tracking ids at the bottom is the original was pink on blue for the first page and has some pink text on the second.
My method uses the fact that the letters a-k + u make up around 49.9% of letters in a normal text. So I just go through a text letter by letter in my mind, giving 0 if the letter is a-k or u, and a 1 if it's l-t or v-z.
The author couldn't find a purported willow text in the ancient Egyptian Ebers papyrus that was quoted by John Mann, so he threw his hands in the air and moved on.
But Mann made a mistake. The book he was likely quoting from, 'Science and Secrets of Early Medicine' by Jurgen Thorwald (which, to be fair, is not referenced at all by Mann) does mention the Ebers papyrus in the paragraph after the quote (on pp. 57-8 for people playing along at home) but the willow quote itself in the paragraph before turns out to be from the Edwin Smith Papyrus, Case 41 to be exact. It can be read here:
The commentary on the translation also illustrates the pitfalls of learning about ancient medicine from medical treatises of the time, discussing the word ḏrḏ: "The rendering “leaves” is not wholly certain; the word might possibly mean “bark” (cortex), and indeed in the case of willow, the bark is medicinally more efficacious than the leaves."
If you use modern medical knowledge to inform the translation (and interpret the phrase "the feathers of birds and the ḏrḏr.w of trees" elsewhere as referring to how trees are covered in bark just as birds are covered in feathers; see commentary on this dictionary entry: https://tla.digital/lemma/185150 ) you potentially get a more accurate translation, but you cannot treat it as independent evidence for the use of willow bark as opposed to willow leaves. Hopefully at least the identity of the willow tree has been established in a less circular manner.
I never realized until now that in the the two different circles pictured (the Chromatic Circle and the Circle of Fifths) the pairs of notes opposite each other are the same in each circle. For example in both circles B is opposite from F.
And if you move around the Chromatic Circle, swapping every second pair of notes with its opposite on the other side of the circle, you have the Circle of Fifths.
If you take the chromatic scale and then swap every other pair of notes on opposite sides of the circle, it yields the circle of fifths. You'll notice that on the circle of fifths notes that skip a step are a whole tone apart in the chromatic scale.
Although there have been some claims in these comments to the contrary, harmony is particularly mathematical. Symmetry and the breaking of within the integers mod 12 form the foundational principles of harmony.
https://www.roma1.infn.it/~anzel/answer.html
reply