Show HN: DiskerNet – Browse the Internet from Your Disk, Now Open Source (github.com/dosyago)
61 points by keepamovin on July 17, 2023 | hide | past | favorite | 36 comments


Looks interesting. It would be great if the README explained how it works. It sounds like it is some sort of proxy-like web service that does aggressive local caching?


Sorry, I should update that. Basically yes, but not a proxy.


Imagine this + some sort of p2p network where all our local archive becomes available to each other.


Yeah I love that idea. Plus search. Imagine building your own Google, based on a specialist expert archive of the stuff you're really good at and sharing that out to whoever wants it...and federating it into a giant search engine.

Like Yahoo


You should check out https://yacy.net: a global, P2P web search engine, where each peer can build and share its own index, etc.


YaCy discussed almost a year ago: https://news.ycombinator.com/item?id=32597309


Cool find, this seems to be a good option.

SearX also came to my mind but isn't exactly p2p.


Good addition, in this way, we all would be a gigantic crawler.

I think the trade-offs here are latency and data loss? Though data loss could be resolved by introducing some sort of redundancy, like how seeding works in torrents.


Congratulations, you have re-discovered DC++

Get a client: https://sourceforge.net/projects/eiskaltdcpp/

And get started with a public hub: https://vacuum.name/en/dc/hubs


I wanted to mention Beaker Browser, but sadly, it's been archived: https://github.com/beakerbrowser/beaker/blob/master/archive-...


Sounds like a perfect use case for NNCP. It works over any transport, from normal networks to radio links to sneakernet.


What's NNCP?


NNCP


You can't mention that on HN! You'll get aggro from all the 'anti-crypto' people!

Clearly, this is an awful idea. Because something something web3 might be related, and something something 'crypto'! No clear analysis is needed, because I used the bad word. Everyone must now stop looking into this immediately.

/s

On a more positive note, I totally agree and there are many options to get it to work. One possible (and probably awful in many ways) start on the idea would be an automatic IPFS-backed directory from the single-file extension. Would be nice just for a friends-and-family type of project anyway.


This makes me miss when browsers just had “offline mode” built right in. It certainly wasn’t foolproof, but it was often enough.


These days just trying to save an image re-downloads it most of the time. Browsers' cache use has really regressed.


Much of it, perhaps all of it, is a result of trackers/advertisers abusing caching for tracking.

We can’t have nice things.


There is the non-default paranoid option of disabling caches entirely. But in general the mitigation only requires isolation between sites, which doesn't have too much of an impact and shouldn't affect saving at all.


Isolation between sites is not sufficient - collaborating sites (and everyone collaborates with e.g. Google Analytics or the hundred other tracking sites) could have an iframe that would abuse the cache. You need isolation by primary site, and I don't remember the exact issues, but there are some obstacles there that need addressing (and that browsers do address, but which result in the cache-hardly-ever-used situation we are in today).
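The tracking vector being described can be sketched in a few lines: with a cache keyed only by resource URL, a tracker embedded on two sites can tell from a cache hit that you visited the other site; keying by top-frame site as well (as modern browsers do) removes the signal. This is an illustrative model, not any browser's actual cache code:

```javascript
// Toy cache illustrating why browsers partition the HTTP cache by
// the top-level site, not just the resource URL.
const cache = new Map();

// Legacy scheme: one entry per resource, shared across all sites.
const legacyKey = (topFrameSite, resourceUrl) => resourceUrl;

// Partitioned scheme: the embedding site is part of the key.
const partitionedKey = (topFrameSite, resourceUrl) =>
  `${topFrameSite}|${resourceUrl}`;

function fetchWithCache(keyFn, topFrameSite, resourceUrl) {
  const key = keyFn(topFrameSite, resourceUrl);
  if (cache.has(key)) return { hit: true };
  cache.set(key, 'response-bytes');
  return { hit: false };
}

// The same tracker pixel embedded on two different sites:
const tracker = 'https://tracker.example/pixel.gif';

cache.clear();
fetchWithCache(legacyKey, 'https://site-a.example', tracker); // miss, fills cache
const legacyProbe = fetchWithCache(legacyKey, 'https://site-b.example', tracker);
// legacyProbe.hit is true: site-b learns you were on site-a.

cache.clear();
fetchWithCache(partitionedKey, 'https://site-a.example', tracker);
const partitionedProbe = fetchWithCache(partitionedKey, 'https://site-b.example', tracker);
// partitionedProbe.hit is false: no cross-site signal, at the cost of
// re-downloading shared resources once per site.
```

The cost of the partitioned key is exactly the regression lamented upthread: shared assets get fetched once per embedding site instead of once overall.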


Right, that's the isolation I was thinking of.

Other than jquery or google font, what does that stop from caching? Most assets only get loaded from a single site.


Chrome on Android is pretty good at this


This sounds cool but there are no instructions beyond how to download/install. After installing with npm, I tried to invoke it with `diskernet` and got some inscrutable error. https://pastebin.com/7kmcG9Fp


Sounds like an awesome idea. To make it even more useful it should be an HTTP proxy that I can set up easily on all my devices (smartphone, tablet, PCs, ...). All my efforts in this direction have suffered from the problem that I'm using a lot of different devices and platforms for browsing.


It's working as a proxy, I suppose? Do you have some set-up information beyond building it?


It hooks browser requests/responses and saves responses as-is under a key based on the request. It can then replay the original response.

Has been done multiple times before, the model is cumbersome / hard to manage, has plenty of annoying edge cases where it completely fails and doesn't work well with streaming audio/video.


True that it completely fails with audio and video streaming (tho really that's probably achievable), but if you want that content there's many good tools. The advantage is that it just saves content to disk and revives it, so the browser seems as if it's online, when it may not be.

It doesn't alter the content in any way, and doesn't need to compress or rewrite anything in order to fit into a single file or strange archive format, or whatever. I'm sure there are uses for those, but this is not that.

It just saves each resource to disk as it receives it.

Actually, I think the edge cases of this "high fidelity" approach (but not necessarily "broadband", as it--for now--excludes audio/video streaming) are fewer than with archive formats where you need to rewrite or otherwise alter the content. But the web is vast, you probably have a different experience! :)

As far as the browser is concerned, I think all the CORS, HTTPS-mandatory stuff just works (at least it used to!).

BTW - I was not aware of this having been done before? Do you have any links? Very interested!


It sounds similar to ArchiveBox (https://archivebox.io/) - which is not a bad thing; if it is, then the more, the merrier!


Similar in some ways, quite different in others. They can both be used to archive/snapshot web content, but they take very different approaches.


Good point re instructions. I've added an issue for that. IIRC you can basically just clone it and run it with node. Shouldn't be too hard. :)

Not a proxy tho. More better! :)


I still don't understand how it tracks what I'm surfing. Is it watching cache folders of Edge and Firefox?


There is some project whose name begins with the number 2 and is five digits (it refers to a port number) - it is a Python app that uses Chrome debug features to save everything you are doing. I don't know how it would work with Firefox unless the debug functionality there is the same. Memory is hazy on this so I might have some details wrong.


I think this task is better accomplished by Zotero (at least the offline bookmarking part), but it's always nice to have alternatives.


Found this which seems a bit similar, archivebox.io


Was this not on GitHub for a long time now?


Does it provide full text search?

How does it handle content behind paywalls or content requiring auth?


Well, it piggybacks on your browsing (obviously you are "opted in" because you are using it knowingly to archive what you are browsing!), so everything you can see, it can see. If you don't have access past the great paywall, neither does it. But a cool thing is, it can index the content you can auth to.

The full text search is all local. This is not the "local client" of some service. It's just a local app; there is no server that anything is sent to (maybe one day there could be like a federated search, or a way to publish and share your archives, but there is nothing like that now).

Full text search is provided by a couple of different libraries. Fuzzy search works.
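A local full-text search like the one described can be sketched with a toy inverted index: map each term to the set of archived URLs containing it, and intersect the posting lists at query time. This is an illustrative model, not the actual libraries the project uses:

```javascript
// Toy local full-text index: term -> set of archived page URLs.
const index = new Map();

const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) || [];

function addDocument(url, text) {
  for (const term of new Set(tokenize(text))) {
    if (!index.has(term)) index.set(term, new Set());
    index.get(term).add(url);
  }
}

function search(query) {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  // Intersect posting lists: keep pages containing every query term.
  let results = index.get(terms[0]) || new Set();
  for (const term of terms.slice(1)) {
    const postings = index.get(term) || new Set();
    results = new Set([...results].filter((url) => postings.has(url)));
  }
  return [...results];
}

// Hypothetical archived pages:
addDocument('https://a.example', 'web archiving with local search');
addDocument('https://b.example', 'local cooking recipes');
const hitsLocal = search('local');
const hitsArchive = search('local search');
```

Everything stays on disk with the archive itself, which is what makes indexing content behind auth safe: the index never leaves the machine.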



