
Hi, I run the datacenter/infrastructure team at the Internet Archive! We would love to see you at our various events this fall but if paying for the ticket is difficult for you, please email me (in bio) and we'll get you in (if possible).


Are they distributed events all around the world, or just wherever the team is gathered (San Francisco, I guess)?

By the way, thank you to all the teams at IA; what you provide is such an important thing for humanity.


Thanks for helping to run my favorite library on earth.


Hey, Q., so what's the size of the Internet Archive?


For ballpark purposes, between 150 and 200 petabytes of unique data, probably on the lowish end of that range last I checked.


It is large enough that I wonder whether the data captured by the actual physical magnetic charges has a heft that a person could feel. Obviously the hardware would fill a house or something, but at what point does the world's data become a discernible physical reality, at least in theory?


I'm betting an exabyte, or maybe close to it.


Most of all, I'm curious about how you reliably and securely store or host so many archived pages. Would you mind briefly explaining such a huge undertaking? Also, total congratulations on this fantastic achievement. You guys are my go-to for so much information.

Edit: And how many terabytes it all amounts to.


We all know the NSA has access to servers hosted in the U.S. How are you protecting the archive from malicious tampering? Are you using any form of immutable storage? Is it post-quantum secure?


Why would they do that? Have you previously seen a case where they "maliciously tampered" with anyone's website?


I just question the integrity and immutability of the data IA is archiving, that's all

You want to know why they'd tamper data?

https://seclab.cs.washington.edu/2017/10/30/rewriting-histor...

https://blog.archive.org/2018/04/24/addressing-recent-claims...

The NSA already paid to backdoor RSA, got caught shipping pre-hacked routers, can rewrite pages mid-flight with QUANTUM, and can penetrate and siphon data from remote infected machines... what else could they do?

https://www.amnesty.org/en/latest/news/2022/09/myanmar-faceb...


IA themselves could tamper with the data, no? It was never meant to be an official historical snapshot to be pulled up for any serious or official purposes, although it has been used that way in high-profile internet drama. It's just a matter of time (maybe during an election) before it's surreptitiously altered and referenced for nefarious purposes.


I would love to work for IA but openings are rare


If you are in Europe, consider Software Heritage (similar to IA but for source code) too:

https://www.softwareheritage.org/jobs/


The Internet Archive now has a presence in Amsterdam.


What events are we talking about here?



Would love technical details about this feat, e.g. how you even crawl to begin with, storage, etc.



