Hacker News | hinkley's comments

Wait, hold on.

Are you trying to tell me that a Large LANGUAGE Model is better at text than at pictures? What are you going to tell me next? That the sidewalk is hot on a sunny day?


Accidentally mutating an input is always a 'fun' way to trigger spooky action at a distance.
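For instance (a hypothetical sketch, not from the thread — the function names and data are invented):

```javascript
// Array.prototype.sort mutates its receiver, so this helper silently
// reorders the caller's array as a side effect.
function topThree(scores) {
  return scores.sort((a, b) => b - a).slice(0, 3);
}

const scores = [10, 50, 30, 20];
topThree(scores);
console.log(scores); // [50, 30, 20, 10] -- caller's input reordered

// A defensive copy keeps the input intact:
function topThreeSafe(scores) {
  return [...scores].sort((a, b) => b - a).slice(0, 3);
}
```

The spooky part is that the breakage shows up far from the call site, in whatever code reads `scores` next.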

suggestion: TeDD talk.

BDD helps with this, as it can move the setup out of the tests, making it even cheaper for someone to yeet a defunct test.
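Something like this sketch, in Mocha/Jest-style syntax. The four lines of shim stand in for a real test runner so the snippet is self-contained, and the customer fixture and all names are invented for illustration:

```javascript
// Minimal shims so this runs with plain node; a real runner (Mocha,
// Jest) provides describe/beforeEach/it for you.
const suite = [];
let setup = () => {};
const describe = (_name, body) => body();
const beforeEach = (fn) => { setup = fn; };
const it = (name, fn) => suite.push({ name, fn });

describe("shipping addresses", () => {
  let customer;

  // Shared setup lives here rather than inside each test, so deleting
  // a defunct test is a three-line delete with no fixture to untangle.
  beforeEach(() => {
    customer = { name: "Ada", addresses: [{ street: "1 Main St" }] };
  });

  it("starts with one shipping address", () => {
    if (customer.addresses.length !== 1) throw new Error("expected 1 address");
  });

  it("can add a second address", () => {
    customer.addresses.push({ street: "2 Side St" });
    if (customer.addresses.length !== 2) throw new Error("expected 2 addresses");
  });
});

// Run: fresh fixture before every test, as a real runner would do.
for (const t of suite) { setup(); t.fn(); }
console.log(`${suite.length} tests passed`);
```

When the "one shipping address" requirement dies, the dead test is a self-evident three-line delete.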

My best unit tests are 3 lines, one of them whitespace, and they assert one single thing that's in the requirements.

These are the only tests I've witnessed people delete outright when the requirements change. Anything more complex than this, they'll worry that there's some secondary assertion being implied by a test so they can't just delete it.

Which, really, is just experience telling them that the code smells they see in the tests are actually part of the test.

meanwhile:

    it("only has one shipping address", ...
is demonstrably a dead test when the story is "allow users to have multiple shipping addresses," as is a test that makes sure balances can't go negative when we decide to allow a 5-day grace period on account balances. But if it's just one of six asserts in the same massive test, then people get nervous and start losing time.

It's also delusional.

Hurry up and wait.

The better to make paperclips, my dear!

> After we started hiring, it became a disaster.

When it stopped being two people he still forbade tests. In this decade. That is fucking nuts.

Fun fact: the guy I worked a 2-man project with and I had a rock solid build cycle, and when we got cancelled to put more wood behind fewer arrows, he and I built the entire CI pipeline. On CruiseControl. And if you don’t know what that is: that is Stone Age CI. Literal sticks and rocks. Was I ahead of a very big curve? You bet your sweet bippy. But that was more than twenty years ago.


Did anyone here actually look at the product they were building? It's an AI agent bug discovery product. Their whole culture is probably driven, at a fundamental philosophical level, by the problems of bug discovery. As he says: he wanted to rely on dogfooding - using their product as the way of spotting bugs.

That may have been spectacular naivete but it's not insanity.

The point I keep coming back to here, that everyone is fighting me so hard on, is that blanket statements like NO TESTS IS NUTS, absent an understanding of the business context, are harmful.


What ends up happening is that your most fundamental features end up rotting because manual testing has biases. Chief among them is probably Recency Bias. It is in fact super easy to break a launch feature if it’s not gating any of the features you’re working on now. If you don’t automate those, yes, you’re nuts.

One of the worst ones I ever encountered was learning that someone had broken the entire help system three months prior, and nobody noticed. Because developers don’t use the help system. I convinced a team of very skeptical people that E2E testing the help docs was a higher priority than automating authentication tests, because every developer already exercised auth eight times a day or more. In fact, on a previous project with trunk-based builds, both times I broke login someone came to tell me so before the build even finished.

Debugging is about running cheap tests first to prune the problem space, then progressively slower tests until you find the culprit. Testing often forgets that and runs expensive tests before fast ones. Particularly in the ice cream cone.
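As a sketch of that cheapest-first ordering (every name, cost, and result below is invented for illustration):

```javascript
// Hypothetical diagnostic checks with rough wall-clock costs.
const checks = [
  { name: "e2e suite",    costMs: 1_800_000, run: () => false },
  { name: "lint + types", costMs: 5_000,     run: () => true  },
  { name: "unit tests",   costMs: 30_000,    run: () => false },
  { name: "integration",  costMs: 240_000,   run: () => true  },
];

// Cheapest first: the failing unit test surfaces after ~35s of work
// instead of after a half-hour e2e run.
checks.sort((a, b) => a.costMs - b.costMs);

let firstFailure = null;
for (const check of checks) {
  if (!check.run()) { firstFailure = check.name; break; }
}
console.log(`first failure: ${firstFailure}`);
```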

In short, if you declare an epic done with zero automation, you’re a fucking idiot.


I think maybe - this conversation is more about giving some more acknowledgement to the other side of this issue.

It's not that I disagree with you essentially - or particularly with respect to your analysis of your specific examples. 100% in the cases you describe. Those sound like beneficial tests. Particularly because your example SPEAKS to the business case - users were using the help docs (I think you mean users anyway). So yeah - that's important.

But I don't know why it's so hard extracting a simple acknowledgement of what I'm pointing out - specifically that the decisions like implementing tests IS a cost-benefit decision dependent on business context.

Funny you mention auth testing though. One time both me and the tech lead broke one of the auth flows in production within the space of a week of one another. Yep - no tests. Feel free to judge us insane. But here's how we thought about it - and when I say "we" that includes the business. First of all the auth flow was not actually used by any active users, so damage was low. Two man dev team. Complexity up until that point had been low, pre-product market fit, sales were dogshit, and cash had been low for some time. Feature shipping was the 110% priority. Ok - but these bugs were a sign complexity had increased beyond what we could manage without some tests. And given the importance of auth, it was now easy to make the case to leadership that implementing an e2e test suite was worth it. So we did.

If you still think a decision-making process like that is insane because we didn't immediately implement tests for every shipped feature - well, I just think you're wrong.


There is supposedly a famous video series of Uncle Bob trying and failing to solve sudoku with TDD. He did not read any guides on solving it and tried from first principles instead, and bounced off of it.

It’s clear to me that if you don’t know what you’re building, testing it first has rubber duck value that can easily be overshadowed by Sunk Cost. I always test my pillars - the bits of the problem that are definite and which I will build off of.

Yes, starting with tests without market fit can also be fatal. But calling anything done without tests is just a slower poison. Before you airlift your brain to another unrelated problem you need to codify some of your assumptions. If you’re good at testing you can write them in a manner that makes it easy to delete them when requirements change. But that takes practice a lot of people don’t have, because they avoid writing tests or they write the exact same kinds of tests for years at a time without ever stretching their skills.

If you’re not writing tests, you’re not writing good ones when you do. Testing is part of CI, and the whole philosophy of CI is to do the painful parts until you either grow calluses or get fed up and file off the scratchy bits. To avoid testing is to forget the face of your father.


> Yes, starting with tests without market fit can also be fatal. But calling anything done without tests is just a slower poison.

I think we are pretty close to agreement here. I'd be interested in what you have experienced in the realm of front-end testing though - whether you think things are just as cut and dried in that realm (that's another discussion though).

And I'll also accept the point about skill in test writing that improves the cost-benefit analysis. I'll also cop to not having that kind of practiced ability at testing to the level I would personally like. But it's chicken / egg. A lot of folks get their start at scrappy start ups that can't attract the best talent. And just can't afford to let their devs invest in their skills in this way. Hell - even established companies just grind their devs without letting them learn the shit they need to learn.

I feel a victim of this to some degree - and am combating it with time off work which I can afford at the moment. One of the things I'm working on is just understanding testing better - y'know, so I can in the future write a SKILL.md file that tells Claude what sort of tests it should write. lol...


Testing is hard. No, testing is fucking hard. I've had more mentors in testing than any other two disciplines combined. And I still look at my own tests and make faces. But to a man everyone who has claimed testing is not hard has written tests that made me want to push them into traffic.

Every problem is easy if you oversimplify it.

I send people who come to me struggling with their tests away with permission to fail at them but not permission to give up on getting better. You're gonna write broken tests. Just don't keep writing the same broken tests.

If anyone looking for a PhD or book idea is reading along with this, my suspicion is that it's so difficult because either 1) we are fundamentally doing it wrong (in which case there's room for at least two more revolutions in testing method), or 2) someone will prove mathematically that it's an intractable problem, Gödel-style, and then someone will apply SAT solvers or similar to it and call it a day. Property-based testing already pretty much does Monte Carlo simulation...
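The Monte Carlo flavor of property-based testing can be hand-rolled in a few lines. A sketch (libraries like fast-check add shrinking and smarter generators; the function under test and the properties here are invented for illustration):

```javascript
// Function under test: a copying numeric sort.
function sortNumbers(xs) {
  return [...xs].sort((a, b) => a - b);
}

// Properties: sorting preserves length, produces ordered output,
// and is idempotent -- checked against random inputs.
function checkSortProperties(trials = 1000) {
  for (let i = 0; i < trials; i++) {
    const xs = Array.from({ length: Math.floor(Math.random() * 20) },
                          () => Math.floor(Math.random() * 100) - 50);
    const once = sortNumbers(xs);
    const twice = sortNumbers(once);
    if (once.length !== xs.length) throw new Error(`length changed for [${xs}]`);
    if (JSON.stringify(once) !== JSON.stringify(twice)) throw new Error(`not idempotent for [${xs}]`);
    for (let j = 1; j < once.length; j++) {
      if (once[j - 1] > once[j]) throw new Error(`not ordered for [${xs}]`);
    }
  }
  return trials;
}

console.log(`${checkSortProperties()} random cases passed`);
```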

For backend tests, the penalty at each level of the testing pyramid is about 8x cost per test (and IME, moving a test down one level takes 5x as many tests for equivalent coverage, so moving a test down two layers cuts the CPU time roughly in half while also allowing parallel execution).

For frontend I think that cost is closer to 10x. So you want to push hard as you can to do component testing and extract a Functional Core for unit tests even harder than you do for backend code. Karma is not awesome but is loads better than Selenium, particularly once you have to debug a failing test. I've been on long projects for a minute so I can't really opine on Puppeteer, but Selenium is hot flaky garbage. I wouldn't tolerate more than 100 E2E tests in it, even on a half million line project. Basically a smoke test situation if you use it like that, but you won't have constant red builds on the same eight tests on shuffle.

I want to say we had 47 E2E tests on a project I thought was going swimmingly from a SDLC perspective. But it might have been 65.


Great comment... and I feel after reading it, you're probably a pretty great person to work with. Acknowledging the pain / difficulty of a task, but inspiring folks and giving them the opportunity to persevere, is rare to find these days.

Not Bob Martin - sudoku with TDD was Ron Jeffries.

https://news.ycombinator.com/item?id=3033446 - Linking to this old comment because it links to each of Ron's articles, a discussion about it, and Norvig's version.


Hah! That's probably why I can never find it.

Norvig's solution is a work of art. When people ask me for examples of intrinsic versus accidental complexity, his sudoku solver is the best one I have. My only note is that he gives up and goes brute force early. When I first encountered it I had a lot of fun layering other common solving strategies on top of his base without too much extra trouble.

What I did not have fun with is porting it to Elixir. That was a long journey to get to a solution I could keep adding stuff to. Immutable data is rough, particularly when you're maintaining 4 distinct views on the same data.


I ran into some serious struggles when we got far enough into accepting most of the tenets of XP as standard practice that most jobs didn’t even debate half of them, and then landed at places that still thought they were stupid. I’d taken for granted I wasn’t going to have to fight those fights and had forgotten how to debate them. Because I Said So is not a great look.
