Every time I say I don't see the productivity boost from AI, people always say I'm using the wrong tool, or the wrong model. I use Claude with Sonnet, Zed with either Claude Sonnet 4 or Opus 4.6, Gemini, and ChatGPT 5.2. I use these tools daily and I just don't see it.
The vampire in the room, for me, seems to be feeling like I'm the only person in the room that doesn't believe the hype. Or should I say, being in rooms where nobody seems to care about quality over quantity anymore. Articles like this are part of the problem, not the solution.
Sure, they are great for generating some level of code, but the deeper it goes the more it hallucinates. My first or second git commit from these tools is usually closer to a working full solution than the fifth one. The time spent refactoring prompts, testing the code, repeating instructions, refactoring naive architectural decisions, and double-checking hallucinations when it comes to research takes more than the time AI saves me. This isn't free.
A CTO this week told me he can't code or brainstorm anymore without AI. We've had these tools for 4 years, and like this guy says, either you use AI or the competition eats you. So, where is the output? Aside from more AI tools, what has been released in the past 4 years that makes it obvious, looking back, that this is when AI became available?
Many engineers get paid a lot of money to write low-complexity code gluing things together and tweaking features according to customer requirements.
When the difficulty of a task is neatly encompassed in a 200-word ticket and the implementation lacks much engineering challenge, AI can pretty reliably write the code: mediocre code for mediocre challenges.
A huge fraction of the software economy runs on CRUD and some business logic. There just isn't much complexity inherent in any of the feature sets.
Complexity is not where the value to the business comes from. In fact, it's usually the opposite. Nobody wants to maintain slop, and whenever you dismiss simplicity you ignore all the heroic hard work done by those at the lower level of indirection. This is what politics looks like when it finally places its dirty hands on the tech industry, and it's probably been a long time coming.
As annoying as that is, we should celebrate a little that the people who understand all this most deeply are gaining real power now.
Yes, AI can write code (poorly), but the AI hype is now becoming pure hate against the people who sit in meetings quietly gathering their thoughts and distilling them down to the simple and almost poetic solutions that nobody but those who do the heads-down work actually cares about.
> A huge fraction of the software economy runs on CRUD and some business logic.
You vastly underestimate what CRUD means when applied in such a direct manner. You're right in some sense that "we have the technology", but we've had this technology for a very long time now. The business logic is pure gold. You dismiss this without realizing how many other thriving and well-established industries operate by doing simple things applied precisely.
Most businesses can and many businesses do run efficiently out of shared spreadsheets. Choosing the processes well is the hard part, but there's just not much computational complexity in the execution, nor more data than can be easily processed by a single machine.
That's a false dilemma. If that's what you want, you absolutely can use the AI levers to get more time and less context switching, so you can focus more on the "simple and poetic solutions".
I am with you on this, and you can't win, because as soon as you voice this opinion you get overwhelmed with "you don't have the sauce/prompt" replies, which rest on an inherent fallacy: they assume you are solving the same problems as them.
I work in GPU programming, so there is no way in hell that JavaScript tools and database wrapper tasks can be on equal terms with generating for example Blackwell tcgen05 warp-scheduled kernels.
There's going to be a long tail of domain-specific tasks that aren't well served by current models for the foreseeable future, but there's also no question the complexity horizon of the SotA models is increasing over time. I've had decent results recently with non-trivial CUDA/MPS code. Is it great code, finely tuned? Probably not, but it delivered on the spec and runs fast enough.
I have done it; it's not GPU code, you are optimizing a toy compiler for a fictional framework. There are some SIMD mechanics, but you can't call it GPU. There are a lot of such real challenges though: KernelBench, Project Popcorn, FlashInfer, Wafer, Standard Kernel.
Yeah, the argument here is that once you say this, people will say "you just don't know how to prompt, I pass the PTX docs together with NSight output and my kernel into my agent and run an evaluation harness and beat cuBLAS". And then it turns out that they are making a GEMM on Ampere/Hopper, which is an in-distribution problem for the LLMs.
It's the idea/mindset that since you are working on something where the tool has a good distribution, it's a skill issue or mindset problem for everyone else who is not getting value from the tool.
Another thing I've never got them to generate is any G-code. Maybe that'll come from the image/3D-generator side indirectly, but I was kind of hoping I could generate some motions, since hand-coding coordinates is very tedious. That would be a productivity boost for me. A very, very niche boost, since I rarely need bespoke G-code, but still.
Oh HELL no. :P Gcode is (at least if you’re talking about machining) the very definition of something you want to generate analytically using tried and tested algorithms with full consideration taken for the specifics of the machine and material involved.
I guess if you just want to use it to wiggle something around using a stepper motor and a spare 3D printer control board, it might be OK though. :)
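The "generate it analytically with tried and tested algorithms" approach is genuinely simple to do yourself. Here is a minimal, hypothetical sketch: it emits G1 linear moves approximating a circular arc. The function name, feed rate, and segment count are made-up illustrative values, not machine-safe parameters for any real setup.

```python
import math

def arc_gcode(cx, cy, radius, start_deg, end_deg, segments=16, feed=300):
    """Approximate a circular arc with short G1 linear moves.

    All parameters are illustrative; real toolpaths must account for the
    specific machine, tooling, and material.
    """
    lines = []
    for i in range(segments + 1):
        a = math.radians(start_deg + (end_deg - start_deg) * i / segments)
        x = cx + radius * math.cos(a)
        y = cy + radius * math.sin(a)
        lines.append(f"G1 X{x:.3f} Y{y:.3f} F{feed}")
    return lines

# G21 = millimeter units, G90 = absolute positioning, then a quarter circle.
program = ["G21", "G90"] + arc_gcode(0, 0, 10, 0, 90)
print("\n".join(program))
```

For anything beyond wiggling a spare stepper around, a CAM package that knows the machine's kinematics is the right tool; the point is only that this kind of coordinate tedium is deterministic and doesn't need an LLM.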
I also don’t believe the hype. The boosters always say I would believe if I were to just experience it. But that’s like saying all I have to do is eat a hamburger to experience how nutritious it is for me.
I love hamburgers, and nothing in my experience tells me I shouldn’t eat them every day. But people have studied them over time and I trust that mere personal satisfaction is insufficient basis for calling hamburgers healthy eating.
Applied to AI: how do you know you have "10x'd"? What is your test process? Rigorously reviewing the output would itself reverse the productivity gain! Therefore, to make this claim you are probably going on trust.
If you have 10x the trust, you will believe anything.
I don't understand what including the time of "4 years" does for your arguments here. I don't think anyone is arguing that the usefulness of these AIs for real projects started at GPT 3.5/4. Do you think the capabilities of current AIs are approximately the same as GPT 3.5/4 4 years ago (actually I think SOTA 4 years ago today might have been LaMDA... as GPT 3.5 wasn't out yet)?
> I don't think anyone is arguing that the usefulness of these AIs for real projects started at GPT 3.5/4
Not in retrospect, sure. But the arguments about "if you're not using AI you're being left behind" did not depend on how people in 2026 felt about those tools retrospectively. Cursor is 3 years old, and OK, 4 years might be an exaggeration, but I've definitely been seeing these arguments for 2-3 years.
Yeah. I started integrating AI into my daily workflows December 2024. I would say AI didn't become genuinely useful until around September 2025, when Sonnet 4.5 came out. The Opus 4.5 release in November was the real event horizon.
> I use these tools daily and I just don't see it.
So why use them if you see no benefit?
You can refuse to use it, it's fine. You can also write your code in notepad.exe, without a linter, and without an Internet connection if you want. Your rodeo.
I didn't say I see no benefit, I said I don't see the productivity boost people talk about. I conceded they are good for some things, and presumably that's what I use them for.
> You can refuse to use it, it's fine
Where do you work? Because increasingly, this isn't true. A lot of places are judging engineers by LoC output like it's 2000, except this time the LoC has to come from AI
I have the same experience and still use it. It's just that I learned to use it for simplistic work. I sometimes try to give it more complex tasks but it keeps failing. I don't think it's bad to keep trying, especially as people are reporting insane productivity gains.
After all, it's through failure that we learn the limitations of a technology. Apparently some people encounter that limit more often than others.
> I have the same experience and still use it. It's just that I learned to use it for simplistic work.
OP said "I don't see the productivity boost from AI" and that they don't "believe the hype" without any qualification, but then went on to say that they use it every day. This makes no sense to me.
Isn't this like saying "I don't get anything out of reading books" immediately followed by "I read books for 4 hours every night"?
to be fair, every other comment is usually screaming about how if you aren't able to utilize LLMs effectively, you will be without a job soon. most people want to keep their job, or be employable, so if LLMs are a required tool to know, they're trying to become fluent in it by using it.
> to be fair, every other comment is usually screaming about how if you aren't able to utilize LLMs effectively, you will be without a job soon
I think a lot of these "it's all overhyped crap" posts are hypocritical.
If someone wants to be consistent with their "it's crap" argument, they wouldn't be using it for anything. Period.
If someone says they need it for their job, then they are admitting that it's useful for their job. Because it would otherwise be irrational to use a tool that makes them worse at their job.
Perhaps the out-of-a-job prediction is actually reversed. True, LLMs will become an efficiency-increasing tool. But in terms of job security, doesn't that mean that if your whole job can be driven by an LLM, then demand for that job decreases?
In other words, people claiming these high productivity increases may be the ones at actual risk. Why employ 3 people when 1 can write the prompts?
For me, the productivity gains come down to three things:
1. Copy-pasting existing working code with small variations. If the intended variation is bigger then it fails to bring productivity gains, because it's almost universally wrong.
2. Exploring unknown code bases. Previously I had to curse my way through code reading sessions, now I can find information easily.
3. Google Search++, e.g. for deciding on tech choices. Needs a lot of hand holding though.
... that's it? Any time I tried doing anything more complex I ended up scrapping the "code" it wrote. It always looked nice though.
>> 1. Copy-pasting existing working code with small variations. If the intended variation is bigger then it fails to bring productivity gains, because it's almost universally wrong.
This does not match my experience. At all. I can throw extremely large and complex things at it and it nails them with very high accuracy and precision in most cases.
Here's an example: when Opus 4.5 came out I used it extensively to migrate our database and codebase from a one-Postgres-schema-per-tenant architecture to a single schema architecture. We are talking about eight years worth of database operations over about two dozen interconnected and complex domains. The task spanned migrating data out of 150 database tables for each tenant schema, then validating the integrity at the destination tables, plus refactoring the entire backend codebase (about 250k lines of code), plus all of the test suite. On top of that, there were also API changes that necessitated lots of tweaks to the frontend.
This is a project that would have taken me 4-6 months easily and the extreme tediousness of it would probably have burned me out. With Opus 4.5 I got it done in a couple of weeks, mostly nights and weekends. Over many phases and iterations, it caught, debugged and fixed its own bugs related to the migration and data validation logic that it wrote, all of which I reviewed carefully. We did extensive user testing afterwards and found only one issue, and that was actually a typo that I had made while tweaking something in the API client after Opus was done. No bugs after go-live.
So yeah, when I hear people say things like "it can only handle copy paste with small variations, otherwise it's universally wrong" I'm always flabbergasted.
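For readers unfamiliar with the pattern: the core of a schema-per-tenant to single-schema migration is copying each tenant's rows into shared tables tagged with a tenant id, then validating counts at the destination. A minimal sketch, assuming nothing about the commenter's actual setup (SQLite stands in for Postgres schemas via table prefixes, and all table/column names are made up):

```python
import sqlite3

def migrate_tenant(conn, tenant, table):
    """Copy one tenant's rows into the shared table and verify row counts.

    Illustrative only: a real migration would cover many tables, foreign
    keys, and richer integrity checks than a simple count comparison.
    """
    src = f"{tenant}_{table}"  # stands in for Postgres "tenant_schema.table"
    conn.execute(
        f"INSERT INTO {table} (tenant_id, id, amount) "
        f"SELECT ?, id, amount FROM {src}",
        (tenant,),
    )
    # Integrity check: source and destination row counts must match.
    (src_n,) = conn.execute(f"SELECT COUNT(*) FROM {src}").fetchone()
    (dst_n,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE tenant_id = ?", (tenant,)
    ).fetchone()
    assert src_n == dst_n, f"{tenant}.{table}: {src_n} rows vs {dst_n}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acme_orders (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE orders (tenant_id TEXT, id INTEGER, amount REAL)")
conn.executemany("INSERT INTO acme_orders VALUES (?, ?)", [(1, 9.5), (2, 3.0)])
migrate_tenant(conn, "acme", "orders")
```

The tedium the commenter describes is exactly this pattern repeated across 150 tables and dozens of tenants, which is why it's a plausible fit for an agent working under careful review.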
Interesting. I've had it fail on much simpler tasks.
Example: was writing a flatbuffers routine which translated a simple type schema to fbs reflection schema. I was thinking well this is quite simple, surely Opus would have no trouble with it.
Output looked reasonable, compiled... and was completely wrong. It seemed to just output random but reasonable-looking indices and offsets. It also inserted, in one part of the code, a literal TODO saying "someone who understands fbs reflection should write this". Had to write it from scratch.
Another example: was writing a fuzzer for testing a certain computation. In this case, there was existing code to look at (working fuzzers for slightly different use cases), but the main logic had to be somewhat different. Opus managed to do the copy-paste and then messed up the only part where it had to be a bit more creative. Again, showing the limit of where it starts breaking. Overall I actually considered this a success, because I didn't have to deal with the "boring" bit.
Another example: colleague was using Claude to write a feature that output some error information from an otherwise completely encrypted computation. Claude proceeded to insert a global backdoor into the encryption, only caught in review. The inserted comments even explained the backdoor.
I would describe a success story if there were one. But aside from throwing together simple React frontends and SQL queries (highly copy-pasteable recurring patterns in the training set), I have had literally zero success. There is an invisible ceiling.
I find LLMs to be absolutely worst at "take this content and put (a copy) there" tasks. They slightly subtly mutate the content while doing that! I keep having to e.g. restore some explanatory comments.
I'm an AI hipster, because I was confusing engagement with productivity before it was cool. :P
TFA mentions the slot machine aspect, but I think there are additional facets: The AI Junior Dev creates a kind of parasocial relationship and a sense of punctuated progress. I may still not have finished with X, but I can remember more "stuff" happening in the day, so it must've been more productive, right?
Contrast this to the archetypal "an idea for fixing the algorithm came to me in the shower."
What things (languages etc.) do you work with/on primarily?
I don't know what to say, except that I see a substantial boost. I generally code slowly, but since GPT-5.1 was released, what would've taken me months to do now takes me days.
Admittedly, I work in research, so I'm primarily building prototypes, not products.
> The vampire in the room, for me, seems to be feeling like I'm the only person in the room that doesn't believe the hype. Or should I say, being in rooms where nobody seems to care about quality over quantity anymore.
If in real life you are noticing the majority of peers that you have rapport with tending towards something that you don't understand, it usually isn't a "them" problem.
It's something for you to decide. Are you special? Or are you fundamentally missing something?
To say I'm the only one is an exaggeration, it's probably more around 50/50 with the 50% in the pro camp being very vocal to the point where it's almost insulting. Being given basic tasks (like, find the last modified file) with "use Claude to get the command" said straight after.
I perfectly accept that it might be a me problem, and this is why I keep exposing myself to these tools. I try to find how they can help me, and I do see it; I just feel like a lot of people ignore the ways these tools harm productivity (and here I mean directly, not some vague "you'll get worse at learning").
I accept your point, I do take it to heart, and I do keep wondering if I'm missing something.