Hacker Newsnew | past | comments | ask | show | jobs | submit | logicallee's commentslogin

Thanks so much for sharing this. It looks fantastic. A couple of questions, if you don't mind: what license are you releasing this under, if any? Is there any way to download it? The reason someone might want to download it is for use as training data.

Wikisource has the original scans available in the public domain, and their enriched text under CC-BY-SA: https://en.wikisource.org/wiki/EB1911

Thanks!

The underlying text (1911 edition) is public domain, but the structured version here — the parsing, reconstruction, and linking — is something I put together for this site. Right now there isn’t a bulk download available. I’m considering exposing structured access (API or dataset) in some form, but haven’t decided exactly how that will work yet.

If you have a specific use case in mind (especially for training), I’d be interested to hear more.


I've wanted to do something like this for The Encyclopédie, a hugely relevant text to the Enlightenment. If you ever get around to adding a rough "How I (generally) Made This" section, that'd be appreciated! Site looks great :)

Regarding the specific use case, I was thinking this: I had Gemma 4 (a small but highly capable offline model released by Google) make a public domain cc0 encyclopedia of some core science and technology concepts[1]. I thought it was pretty good.

Separately, I've fine-tuned the Gemma 4 model[2], it was very quick (just 90 seconds), so I think it could be interesting to train it to talk like 1911 Encyclopedia Britannica.

I would use the entries as training data and train it to talk in the same style. There isn't a specific use case for why, I just think it would be interesting. For example, I could see how it writes about modern concepts in the style of 1911 Britannica.

[1] https://stateofutopia.com/encyclopedia/

[2] To talk like a pirate! https://www.youtube.com/live/WuCxWJhrkIM


That’s a fun idea — I can see the appeal of that style.

The underlying text is public domain, but the structured version here is something I put together for the site. I haven’t released a bulk dataset yet.

If you end up experimenting with it, I’d love to hear how it turns out — and I’m still figuring out what structured access might look like.


> Is there any way to download it? The reason someone might want to download it is for use as training data.

Another reason would be to able to keep running/using it even if the main site were to go down for whatever reason eventually; or, to operate a mirror of it, for redundancy (linking back to the original, of course).


Those who like playing with this sort of thing might like to play with this superconductor-coil-as-a-battery exploration where electricity just goes round as storage![1]

[1] https://stateofutopia.com/experiments/wheeeeeloop/wheeeeeloo...


there are a lot of payment providers. what features do you like about Stripe that keeps you with Stripe?

If you use them for subscriptions you are effectively locked with them. The cost of migrating to a different provider (including existing subscriptions AND payment methods) plus the risk of something breaking the renewals is too high for most companies.

Maintain two billing providers and migrate new subscriptions as they onboard. Then slowly migrate existing subscriptions.

It's also a good bargaining chip.


Are there any good Stripe Connect competitors? I haven't found any.

Zoneless is an open-source Stripe Connect alternative

Only Stripe offers a service doing international VAT for you.

Isn't that largely the selling point of every MoR? Paddle, Polar.sh and so on.

>Honest question: how can VCs consider the 'star' system reliable?

Founders need the ability to get traction, so if a VC gets a pitch and the project's repo has 0 stars, that's a strong signal that this specific team is just not able to put themselves out there, or that what they're making doesn't resonate with anyone.

When I mentioned that a small feature I shared got 3k views when I just mentioned it on Reddit, then investors' ears perked right up and I bet you're thinking "I wonder what that is, I'd like to see that!" People like to see things that are popular.

By the way, congrats on 200 stars on your project, I think that is definitely a solid indicator of interest and quality, and I doubt investors would ignore it.


I made this in response to a comment I saw here asking for it, since people are sharing a lot of experiments like 3 GB browser-side Gemma demos. The idea is that you can serve something temporarily just as long as there is interest in it, and once everyone leaves and closes that tab it disappears. This is good for one-off experiments, videos or files, browser-side Linux VM's in wasm, large model demonstrations, anything cool that you made and want to share temporarily. View source to see how easy it is to use.

>can someone make a cdn for it or some sort u uberfast downloader? just throw some claude credits against it ty!

Okay, I did so. I realize that in your later followup comment you might want something different (like for Chrome itself to cache these downloads or something) but for now I made what you asked for, here you go:

https://stateofutopia.com/experiments/ephemeralcdn/

It's an ultrafast temporary CDN for one-off experiments like this. Should be lightning fast. By including the script, you can include any file this CDN serves.


haha this is awesome! this is fantastic.

I love this idea. Unfortunately, it says "Unsupported browser/GPU" for me. This is Desktop Chrome version 147 (page says it requires 134+) and I have a 1060 card with 6 GB of RAM on this specific device, so it should fit. I have more than 4 GB of free RAM as well.

sorry it’s not working for you. I built this as a personal project for self-learning, but I plan to take a look at this issue next weekend. you can check out a video demo of it here: https://github.com/user-attachments/assets/71ae6e5c-a5ec-4d0...

That's amazing. Very good result. Thanks for sharing.

Would you be okay with it using your upload at the same time, then a p2p model would work. (This is potentially a good match for p2p because edge connections are very fast, they don't have to go across the whole Internet). You could be downloading from uploaders in your region. Let me know if you would be okay with uploading at the same time, then this model works and I can build it for you for people to use this way.

what kind of specs does your laptop have? do you know how many tokens/second you get on it?

What kind of hardware does this require to run locally, and how many tokens/seconds does it produce?

Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: