Show HN: DiskerNet – Browse the Internet from Your Disk, Now Open Source (github.com/dosyago)
61 points by keepamovin on July 17, 2023 | hide | past | favorite | 36 comments


Looks interesting. It would be great if the README explained how it works. It sounds like it is some sort of proxy-like web service that does aggressive local caching?


Sorry, I should update that. Basically yes, but not a proxy.


Imagine this + some sort of p2p network where all our local archive becomes available to each other.


Yeah I love that idea. Plus search. Imagine building your own Google, based on a specialist expert archive of the stuff you're really good at and sharing that out to whoever wants it...and federating it into a giant search engine.

Like Yahoo


You should check out https://yacy.net: a global, P2P web search engine, where each peer can build and share its own index, etc.


YaCy discussed almost a year ago: https://news.ycombinator.com/item?id=32597309


Cool find, this seems to be a good option.

SearX also came to my mind but isn't exactly p2p.


Good addition, in this way, we all would be a gigantic crawler.

I think the trade-offs here are latency and data loss? Though data loss could be resolved by introducing some sort of redundancy, like how seeding works in torrents.


Congratulations, you have re-discovered DC++

Get a client: https://sourceforge.net/projects/eiskaltdcpp/

And get started with a public hub: https://vacuum.name/en/dc/hubs


I wanted to mention Beaker Browser, but sadly, it's been archived: https://github.com/beakerbrowser/beaker/blob/master/archive-...


Sounds like a perfect use case for NNCP. It works over any transport, from normal networks to radio links to sneakernet.


What's NNCP?


NNCP


You can't mention that on HN! You'll get aggro from all the 'anti-crypto' people!

Clearly, this is an awful idea. Because something something web3 might be related, and something something 'crypto'! No clear analysis is needed, because I used the bad word. Everyone must now stop looking into this immediately.

/s

On a more positive note, I totally agree and there are many options to get it to work. One possible (and probably awful in many ways) start on the idea would be an automatic IPFS-backed directory from the single-file extension. Would be nice just for a friends-and-family type of project anyway.


This makes me miss when browsers just had “offline mode” built right in. It certainly wasn’t foolproof, but it was often enough.


These days just trying to save an image re-downloads it most of the time. Browsers' cache use has really regressed.


Much of it, perhaps all of it, is a result of trackers/advertisers abusing caching for tracking.

We can’t have nice things.


There is the non-default paranoid option of disabling caches entirely. But in general the mitigation only requires isolation between sites, which doesn't have too much of an impact and shouldn't affect saving at all.


Isolation between sites is not sufficient - collaborating sites (and everyone collaborates with e.g. Google Analytics or the hundred other tracking sites) could have an iframe that would abuse the cache. You need isolation by primary site, and I don't remember the exact issues, but there are some obstacles there that need addressing (and that browsers do address, but which result in the cache-hardly-ever-used situation we are in today).
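The tracking vector being described can be sketched in a few lines: with a cache keyed only by resource URL, a tracker embedded on two sites can tell from a cache hit that you visited the other site; keying by top-frame site as well (as modern browsers do) removes the signal. This is an illustrative model, not any browser's actual cache code:

```javascript
// Toy cache illustrating why browsers partition the HTTP cache by
// the top-level site, not just the resource URL.
const cache = new Map();

// Legacy scheme: one entry per resource, shared across all sites.
const legacyKey = (topFrameSite, resourceUrl) => resourceUrl;

// Partitioned scheme: the embedding site is part of the key.
const partitionedKey = (topFrameSite, resourceUrl) =>
  `${topFrameSite}|${resourceUrl}`;

function fetchWithCache(keyFn, topFrameSite, resourceUrl) {
  const key = keyFn(topFrameSite, resourceUrl);
  if (cache.has(key)) return { hit: true };
  cache.set(key, 'response-bytes');
  return { hit: false };
}

// The same tracker pixel embedded on two different sites:
const tracker = 'https://tracker.example/pixel.gif';

cache.clear();
fetchWithCache(legacyKey, 'https://site-a.example', tracker); // miss, fills cache
const legacyProbe = fetchWithCache(legacyKey, 'https://site-b.example', tracker);
// legacyProbe.hit is true: site-b learns you were on site-a.

cache.clear();
fetchWithCache(partitionedKey, 'https://site-a.example', tracker);
const partitionedProbe = fetchWithCache(partitionedKey, 'https://site-b.example', tracker);
// partitionedProbe.hit is false: no cross-site signal, at the cost of
// re-downloading shared resources once per site.
```

The cost of the partitioned key is exactly the regression lamented upthread: shared assets get fetched once per embedding site instead of once overall.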


Right, that's the isolation I was thinking of.

Other than jquery or google font, what does that stop from caching? Most assets only get loaded from a single site.


Chrome on Android is pretty good at this


This sounds cool but there are no instructions beyond how to download/install. After installing with npm, I tried to invoke it with `diskernet` and got some inscrutable error. https://pastebin.com/7kmcG9Fp


Sounds like an awesome idea. To make it even more useful it should be an HTTP proxy that I can set up easily on all my devices (smartphone, tablet, PCs, ...). All my efforts in this direction have suffered from the problem that I'm using a lot of different devices and platforms for browsing.


It's working as a proxy, I suppose? Do you have some set-up information beyond building it?


It hooks browser requests/responses and saves responses as-is under a key based on the request. It can then replay the original response.

Has been done multiple times before, the model is cumbersome / hard to manage, has plenty of annoying edge cases where it completely fails and doesn't work well with streaming audio/video.


True that it completely fails with audio and video streaming (tho really that's probably achievable), but if you want that content there's many good tools. The advantage is that it just saves content to disk and revives it, so the browser seems as if it's online, when it may not be.

It doesn't alter the content in any way, and doesn't need to compress or rewrite anything in order to fit into a single file or strange archive format, or whatever. I'm sure there are uses for those, but this is not that.

It just saves each resource to disk as it receives it.

Actually, I think the edge cases of this "high fidelity" approach (but not necessarily "broadband", as it--for now--excludes audio/video streaming) are fewer than with archive formats where you need to rewrite or otherwise alter the content. But the web is vast, you probably have a different experience! :)

As far as the browser is concerned, I think all the CORS, HTTPS-mandatory stuff just works (at least it used to!).

BTW - I was not aware of this having been done before? Do you have any links? Very interested!


It sounds similar to ArchiveBox (https://archivebox.io/) - which is not a bad thing; if it is, then the more, the merrier!


Similar in some ways, quite different in others. They can both be used to archive/snapshot web content, but they take very different approaches.


Good point re instructions. I've added an issue for that. IIRC you can basically just clone it and run it with node. Shouldn't be too hard. :)

Not a proxy tho. More better! :)


I still don't understand how it tracks what I'm surfing. Is it watching cache folders of Edge and Firefox?


There is some project whose name begins with the number 2 and is five digits (it refers to a port number) - it is a Python app that uses Chrome debug features to save everything you are doing. I don't know how it would work with Firefox unless the debug functionality there is the same. Memory is hazy on this so I might have some details wrong.


I think this task is better accomplished by Zotero (at least the offline bookmarking part), but it's always nice to have alternatives.


Found this which seems a bit similar, archivebox.io


Was this not on GitHub for a long time now?


Does it provide full text search?

How does it handle content behind paywalls or content requiring auth?


Well, it piggybacks on your browsing (obviously you are "opted in" because you are using it knowingly to archive what you are browsing!), so everything you can see, it can see. If you don't have access past the great paywall, neither does it. But a cool thing is, it can index the content you can auth to.

The full text search is all local. This is not the "local client" of some service. It's just a local app; there is no server that anything is sent to (maybe one day there could be like a federated search, or a way to publish and share your archives, but there is nothing like that now).

Full text search is provided by a couple of different libraries. Fuzzy search works.
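A local full-text search like the one described can be sketched with a toy inverted index: map each term to the set of archived URLs containing it, and intersect the posting lists at query time. This is an illustrative model, not the actual libraries the project uses:

```javascript
// Toy local full-text index: term -> set of archived page URLs.
const index = new Map();

const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) || [];

function addDocument(url, text) {
  for (const term of new Set(tokenize(text))) {
    if (!index.has(term)) index.set(term, new Set());
    index.get(term).add(url);
  }
}

function search(query) {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  // Intersect posting lists: keep pages containing every query term.
  let results = index.get(terms[0]) || new Set();
  for (const term of terms.slice(1)) {
    const postings = index.get(term) || new Set();
    results = new Set([...results].filter((url) => postings.has(url)));
  }
  return [...results];
}

// Hypothetical archived pages:
addDocument('https://a.example', 'web archiving with local search');
addDocument('https://b.example', 'local cooking recipes');
const hitsLocal = search('local');
const hitsArchive = search('local search');
```

Everything stays on disk with the archive itself, which is what makes indexing content behind auth safe: the index never leaves the machine.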



