I intentionally kept it lightweight. Just Parquet files + simple partitioning + commits on Hugging Face. That already covers most of what I need, without introducing a heavier stack or extra dependencies.
Also, I wanted something that is easy to consume anywhere. With this setup, you can point DuckDB or Polars directly at the data and start querying, no catalog or special tooling required.
[Author here] The whole pipeline runs on a single ~$10/month VPS, but it can process hundreds of TB even with just 12GB RAM and a 200GB SSD.
The main reason I built this was to have HN data that is easy to query and always up to date, without needing to run your own pipeline first. There are also some interesting ideas in the pipeline, like what I call "auto-heal". Happy to share more if anyone is interested :)
A lot of the choices are trade-offs, as usual with data pipelines. I chose Parquet because it is columnar and compressed, so tools like DuckDB or Polars can read only the columns they need. This matters a lot as the dataset grows.
I went with Hugging Face mainly because it is simple and already handles distribution and versioning. I can just push data as commits and get a built-in history without managing extra infrastructure (and, more conveniently, if you read the README, you can query it directly using Python or DuckDB).
The pipeline is incremental. Instead of rebuilding everything, it appends small batches every few minutes using the API. That keeps it fresh while staying cheap to run. The data is also partitioned by time, so queries do not need to scan the entire dataset (and I use very simple tech, just a Go binary running in a "screen" session, using only a few MB of RAM for the whole pipeline).
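As a sketch of what time partitioning can look like (the real layout is the author's; this year/month scheme is just an assumed example), each item's timestamp maps deterministically to one partition file, so a query over a date range only touches a few files:

```python
from datetime import datetime, timezone

def partition_path(unix_time: int) -> str:
    """Map an item's Unix timestamp to a year/month Parquet
    partition (hypothetical layout, not the project's actual one)."""
    dt = datetime.fromtimestamp(unix_time, tz=timezone.utc)
    return f"items/{dt.year:04d}/{dt.month:02d}.parquet"

print(partition_path(1700000000))  # items/2023/11.parquet
```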
@keepamovin thanks, your project was a big inspiration for this.
I built my own pipeline with a slightly different setup. I use Go to download and process the data, and update it every 5 minutes using the HN API, trying to stay within fair use. It is also easy to tweak if someone wants faster or slower updates.
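The incremental-batch idea can be sketched like this (the real pipeline is a Go binary and its batch size is unknown to me; the 500-item cap here is an invented placeholder): each run remembers the last item id it saw and fetches only the ids between that and the HN API's current max item, capped to keep each 5-minute cycle small:

```python
def next_batch(last_seen: int, max_item: int, batch_limit: int = 500) -> list[int]:
    """Ids to fetch this run; the cap keeps each cycle small
    and within fair use (batch_limit is an assumed value)."""
    end = min(max_item, last_seen + batch_limit)
    return list(range(last_seen + 1, end + 1))

print(next_batch(100, 103))        # [101, 102, 103]
print(len(next_batch(0, 10_000)))  # 500
```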
One part I really like is the "dynamic" README on Hugging Face. It is generated automatically by the code and keeps updating as new commits come in, so you can just open it and quickly see the current state.
The code is still a bit messy right now (I open sourced it together with around 3.6M lines across 100+ other tools, hidden in a corner of GitHub; anyone interested can play Sherlock Holmes and find it :) ), but I will clean it up, release it as a clearer standalone repository, and write a proper blog post explaining how it works.
Connecting directly with the author of the project that inspired me is awesome.
Let's collaborate and see how we can make our two projects work together.
DuckDB has an extension that can write to SQLite: https://duckdb.org/docs/stable/core_extensions/sqlite. Starting from the Parquet files, we could use DuckDB to write into SQLite databases. That could cut ingest time from about a week to around five minutes.
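A rough sketch of what that could look like in DuckDB SQL, using the sqlite extension from the linked docs (the file names and glob are hypothetical):

```sql
-- One-time setup: fetch and load the sqlite extension.
INSTALL sqlite;
LOAD sqlite;

-- Attach a SQLite database file (created if it does not exist).
ATTACH 'hn.db' AS hn (TYPE sqlite);

-- Bulk-copy the Parquet partitions into a SQLite table.
CREATE TABLE hn.items AS
    SELECT * FROM read_parquet('items/**/*.parquet');
```

Since DuckDB reads the Parquet files in parallel and writes SQLite in bulk, this skips the per-item API ingest entirely.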
If I have some free time this weekend, I would definitely like to contribute to your project. Would you be interested?
As for my background, I focus on data engineering and data architecture. I help clients build very large-scale data pipelines, ranging from near real-time systems (under 10 ms) to large batch processing systems (handling up to 1 billion business transactions per day across thousands of partners). Some of these systems use mathematical models I developed, particularly in graph theory.
One of the things I got interested in from the comments on my Show HN was Parquet; everyone was raving about it. Happy to see a project using it today.
---
You should drive the car to the car wash, but you should walk yourself.
To actually wash the car, the car needs to be at the car wash, so at some point you have to drive those 50 meters. A sensible approach is:
- Drive the car the 50 meters to the wash bay.
- Park or queue as required, then get out and do the wash.
- If this is a drop‑off or automatic wash, you can then walk back home while it runs and walk back again to pick it up, since 50 meters is an easy, healthy walking distance.
DevSecOps Engineer
United States Army Special Operations Command · Full-time
Jun 2022 - Jul 2025 · 3 yrs 2 mos
Honestly, it is a little scary to see someone with a serious DevSecOps background ship an AI project that looks this sloppy and unreviewed. It makes you question how much rigor and code quality made it into their earlier "mission critical" engineering work.
Maybe, but the group of people they are/were working with are Extremely Serious, and Not Goofs.
This person was in communications of the 160th Special Operations Aviation Regiment, the group that just flew helicopters into Venezuela. ... And it looks like a very unusual connection to Delta Force.
Considering how many times I've heard "don't let perfection be the enemy of good enough" when my code was not only incomplete but didn't even do most of what was asked (yet), I'd wager quite a lot.