As I was reading this article, a similar thought occurred to me: "I wonder if that's better or worse than a human?" Unfortunately, there was no human baseline in this study. That said, there are studies that compare LLM and human performance. Usually, humans perform much better (like 5-7x better) at long-running tasks.
In other words, a human would probably do better than an LLM on this task.
Humans lose to LLMs in narrow, well-specified text/symbolic reasoning tasks where the model can exploit breadth, speed, and search. Usually the LLM performed ~15% better than humans, though I've seen studies where the gap was as high as 80%. To my surprise, those studies were usually about "soft skills" like creativity and persuasion.
At some point I will (the story is pretty crazy) but for now I’d like to keep certain details of my life private, so I avoid blogging. I used to think I’d need a blog to attract consulting clients, but I’ve had no problem without one so far, and thus I haven’t gotten around to it. My personal website is basically Lorem Ipsum lol.
If you’re really interested, shoot me an email. I’m happy to talk one on one.
> Given that you've at minimum doubled the code (and doubled the bugs), it seems like a really bad long-term trade off.
I'd say you've at maximum doubled the code. The tests ensure you write only what you need to get the test to pass. Without them, devs get distracted and wander until the feature works, usually committing tons of YAGNI violations along the way. In my experience, untested code bases have a ton of unnecessary code.
I don't understand how it would double the bugs. The article has references saying it reduces them. But, even thinking about it, I don't see why you'd say that.
Maybe it's only difficult to test end-to-end? I would assume there's code in there that's algorithm-like. Give it these inputs, it should return these outputs. If so, it should be possible to isolate that code and test it.
But, I dunno, I haven't looked at the source code. It might be very difficult to maintain.
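To make that concrete: if there really is algorithm-like code in there, it can usually be lifted into a pure function and checked directly, with no end-to-end setup. A minimal sketch in Java (the function and its behavior are entirely made up for illustration):

```java
// Hypothetical example: "algorithm-like" logic isolated from the rest of an app.
public class PriceCalc {
    // Applies a percentage discount, truncating to whole cents.
    static long discountedCents(long cents, int percentOff) {
        return cents * (100 - percentOff) / 100;
    }

    public static void main(String[] args) {
        // Give it these inputs, it should return these outputs -- no I/O,
        // no framework, no setup required.
        if (discountedCents(1000, 10) != 900) throw new AssertionError();
        if (discountedCents(999, 50) != 499) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Once logic is shaped like this, whether the surrounding app is hard to test end-to-end stops mattering for this piece of it.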
> Why does this matter? It matters for the exact same reason why memory leaks are bad in general: They consume more memory than necessary.
Thing is, this doesn't usually matter. I have never gotten an out of memory error from a leak in Java. Now compare that to all the development time I've saved by not having to deal with pointer arithmetic. I consider it a huge win. It's all about the type of apps you're making.
When I play Minecraft on my notebook, I first shut down all nonessential system services (I have a handy line in my shell history for that). This gets me around 45 minutes of Minecraft (instead of 30 minutes) before it gets killed by the OOM killer.
Hmm, weird. I've never had Minecraft killed for OOM, and I've been playing since the alpha days. In fact, I can watch the memory pool slowly grow, then see a drop in framerate and an increase in free memory when the GC is triggered every 30 seconds or so.
I don't do anything special to play; my sessions used to run into multiple hours, during my peak years of play.
I suspect that it's not fair to blame Java for whatever problem you were having, though.
I've used tons of programs that have problems and crash for various reasons. Is this an argument against the language? I don't think there'd be any left to use with this line of thinking.
Besides, there are too many variables in your anecdote. Is it a laptop from 1995? Is the OOM from a bug that could/should be fixed?
The notebook is from 2012 and has 4GB RAM. Minecraft stands out because most other programs of similar complexity (e.g. Portal 2) work fine.
I've heard a story that Minecraft's RAM consumption got a lot worse after Notch stepped down. The new developers refactored the code for OOP best practices (such as passing a 3D coordinate as an object rather than "int x, int y, int z"), which tremendously increased the number of allocations and thus GC pressure and memory usage. So IMO it's fair to blame this on the language. When following good practices leads to consequences like this, that's terrible design.
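I can't verify the Minecraft story, but the mechanism is easy to sketch. The coordinate type below (BlockPos) is a made-up stand-in; the point is that the refactor described trades zero-allocation primitive parameters for one short-lived object per call:

```java
public class GcPressureSketch {
    // Primitive version: no heap allocation per call.
    static int manhattan(int x, int y, int z) {
        return Math.abs(x) + Math.abs(y) + Math.abs(z);
    }

    // Object version: every call site that builds a BlockPos allocates.
    record BlockPos(int x, int y, int z) {}
    static int manhattan(BlockPos p) {
        return Math.abs(p.x()) + Math.abs(p.y()) + Math.abs(p.z());
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            // One million short-lived BlockPos objects for the GC to sweep;
            // the primitive overload would allocate nothing at all.
            sum += manhattan(new BlockPos(i, -i, 1));
        }
        System.out.println(sum); // prints 1000000000000
    }
}
```

In a hot path that runs thousands of times per frame, those per-call allocations are exactly what drives GC pressure, even though each individual object is tiny (and even though escape analysis can sometimes, but not always, optimize them away).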
Just because IntelliJ is using 2GB of RAM doesn't mean they have a memory leak. As long as that 2GB profile is stable, it isn't going to bring down the system.
> Place your test files next to the implementation.
Java background is probably biasing me, but I don't like this. It interferes with my ability to find code because the file list in every directory is twice as large.
If you are creating a separate directory for each "module" then it would just be one more file in each directory.
Component
- index.js
- Component.js
- component.scss
- Component.spec.js
This structure has worked really well for me, you have everything you need in a single directory so you don't have to jump around to another directory to find the test or styling or whatever.
Putting them together is AMAZING. When I used to do Java development, I thought the organization was great, but every time I had to find a unit test I had to dig through a parallel folder structure.
Then I tried Go. Go puts them together. I was amazed at how such a simple convention could keep my code so much better organized. I could immediately open both the code and the unit tests for said code. Just don't put a ton of files in any one directory (if a directory you're actively coding in has a lot of files, I'd argue your file structure is not optimal).
Now when I do JavaScript development I have adopted 3 extensions:
- .aspec.js is for unit tests that should work in all environments
- .nspec.js is for unit tests that only work in node.js
- .cspec.js is for unit tests that only work on a client like a web browser
I think it probably depends on what your definition of "unit" is. As I stress functional units that encapsulate logic and data model but don't permute state, my units are probably significantly larger (and simpler) than those of people writing more imperative/traditional-OO units (where something under TDD may encapsulate many units due to mounting complexity/complication and so require further decomposition).
Right. In that case, you still write all of the tests before the code. If you're objecting only to the rhetoric of a "wall" of tests, that just depends on the size of your units. Think of it more as hurdles of tests. :)
Everyone is interpreting this as, "write 10 tests then try to get them all to pass at once". That is not how you TDD. You write one, then get it to pass, then write another.
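A sketch of that loop, using a made-up slugify example and no test framework: check 1 was written first (and failed, since slugify didn't exist), just enough code was written to pass it, and only then was check 2 added.

```java
public class TddLoopSketch {
    // The minimal code needed to satisfy the tests written so far --
    // nothing speculative, nothing the tests didn't demand.
    static String slugify(String s) {
        return s.trim().toLowerCase().replace(' ', '-');
    }

    public static void main(String[] args) {
        // Test 1 -- written first, made to pass first.
        check(slugify("Hello World").equals("hello-world"));
        // Test 2 -- added only after test 1 was green.
        check(slugify("  Trim Me  ").equals("trim-me"));
        System.out.println("all tests pass");
    }

    static void check(boolean ok) {
        if (!ok) throw new AssertionError("test failed");
    }
}
```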
Maybe you mean, "write the test before the code", but when you say "write all tests before the code", it's not interpreted the same way.