In my experience, it’s even more effort to get good code with an agent: when writing by hand, I fully understand the rationale for each line I write. With AI, I have to assess every clause and think about why it’s there. Even when reviewing juniors’ code, there’s a level of trust that they had a reason for including each line (assuming, for the moment, that they’re not using AI too); that’s not at all my experience with Codex.
Last month I did the majority of my work through an agent, and while I did review its work, I’m now finding edge cases and bugs of a kind I’d never have expected a human to introduce. Obviously it’s on me to review its output better, but the perceived gains of just throwing a quick bug ticket at the AI quickly disappear when you want a project that scales.
There is demand for non-scalable code that no one is committed to maintaining, where smaller issues can be tolerated. This demand is currently underserved, as coding is somewhat expensive and focused on critical functions.
We have somewhat complicated OpenSearch reindexing logic, and we had an issue where reindexing happened more regularly than it should. I vibecoded a dashboard that visualizes in a graph exactly which index gets reindexed when, and into what. The code works, if a little rough around the edges, but it serves the purpose and saved me a ton of time.
Another example: in an internal project we recently made a change where we need to send specific headers depending on the environment. These are mostly GET endpoints, where my workflow is checking the API through the browser. The list of headers is long, but predetermined. I vibecoded an extension that lets you pick the header, so I can keep my regular workflow rather than switching to Postman or cURL or whatever. The UI is a little buggy, but good enough. The whole team uses it.
I'm not a frontend developer, and either of these would have taken me a lot of time to do by hand.
I've been in these situations before. If there's a known bug in an internal tool that would take the development team a day to investigate and fix - aka $10,000s - it's often smarter to send around an email saying "don't click the Froople button more than once, and if you do tell Benjamin and he'll fix it in the database for you".
Of course LLMs change that equation now because the fix might take a few minutes instead.
> If there's a known bug in an internal tool that would take the development team a day to investigate and fix - aka $10,000s - it's often smarter to send around an email saying "don't click the Froople button more than once, and if you do tell Benjamin and he'll fix it in the database for you".
How much will Benjamin's time responding to those calls cost in the long run?
Hopefully none, because your staff will read the email and not click the button more than once.
Or one of them will do it, Benjamin will glare at them and they'll learn not to do it again and warn their coworkers about it.
Or... Benjamin will spend a ton of time on this and use that to successfully argue for the bug to get fixed.
(Or your organization is dysfunctional and ends up wasting a ton of money on time that could have been saved if the development team had fixed the bug.)
Large companies are often very bad at organizing work, to the tune of increasing the cost of everything by a large multiple over what you'd think it should be. Most of that cost wouldn't be productive developer time.
You're setting up to say "I wouldn't tolerate that" to any example given, but if you look at the market and at what actually makes people leave, rather than what makes them complain, then basically anything that isn't life-and-death, safety-critical, big-money-losing, or data-corrupting is tolerable. There are plenty of complaints about Microsoft, Apple, Gmail, Android, and all kinds of third-party niche business systems.
All the decades people tolerated blue screens on Windows. All the software that regularly segfaulted years ago. The permeation of "have you tried turning it off and on again" into everyday life. The "ship sooner, patch later" culture. The refusal to move from C/C++/etc. to garbage-collected or memory-managed languages, or to formal verification, because some bugs are more tolerable than the cost, effort, and performance hit of changing. Display and formatting bugs, e.g. glitches in video games. Unhandled error conditions - code that crashes if you enter blank parameters. Bugs in utility code that doesn't run often, like the installer.
One piece of software I installed yesterday told me to disable some Windows services before the install; then the installer tried to start the services at the end and couldn't, so it failed and exited without installing everything. This reminded me that I already knew about it, because that buggy behaviour has been there for years, across at least two major versions, and I've tripped over it before.
Another one I regularly update tells me to close its running processes before proceeding with the install, but once it's in that state, it won't let me proceed, and it has no way to refresh or rescan to detect that the running process has finished. That's been there for years and several major versions as well.
One more famous example is """I'm not a real programmer. I throw together things until it works then I move on. The real programmers will say "Yeah it works but you’re leaking memory everywhere. Perhaps we should fix that." I’ll just restart Apache every 10 requests.""" - Rasmus Lerdorf, creator of PHP. I have a feeling something similar was admitted about 37signals and Basecamp (that it was common to restart their Ruby on Rails processes frequently), but I can't find a source to back that up.
You need to have the AI write an increasingly detailed design and plan about what to code, assess the plan and revise it incrementally, then have it write code as planned and assess the code. You're essentially guiding the "Thinking" the AI would have to perform anyway. Yes, it takes more time and effort (though you could stop at a high-level plan and still do better than not planning at all), but it's way better than one-shotted vibe code.
It shouldn't be any longer than the actual code: just have it write "easy pseudocode", which is still something you can audit, and then have it translate that into actual code.
TDD is a great way to start the plan, stubbing out the things it needs to achieve, with E2E tests being the most important. You still need to read through them so it won't cheat, but the codebase will be much better off with them than without.
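As a sketch of that TDD-first step (all names here are hypothetical, not from any real project): stub the end-to-end contract as a deliberately failing test before the agent writes any implementation, so the plan has a concrete target.

```swift
import XCTest

// Hypothetical tool under test; deliberately unimplemented so the E2E
// test fails until the agent's planned code makes it pass.
struct ReportTool {
    static func run(month: String) -> String {
        fatalError("not implemented yet")
    }
}

final class ReportE2ETests: XCTestCase {
    func testReportOutputContainsTotal() {
        let output = ReportTool.run(month: "2024-01")
        // The contract you audit: whatever the agent writes must satisfy this.
        XCTAssertTrue(output.contains("Total:"),
                      "E2E contract: the report must print a total line")
    }
}
```

Reading a stub like this takes seconds, and it is much harder for the agent to "cheat" against an assertion you wrote yourself.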
We'd just be overloading "lessons" as well, and even more so because it takes more work to ground the concept, given its larger semantic distance from what we're describing.
> Even when code reviewing juniors, there’s a level of trust that they had a reason for including each line (assuming they’re not using ai too for a moment)
Even my seniors are just copy-pasting whatever Claude says. People are naturally lazy; even if they know what they’re doing, they don’t want to expend the effort.
I hear you, but it seems quicker to assess whether the agent's solution is correct and sound before running it than to start composing code yourself. Understanding something that's already there seems like less effort. But I guess it highly depends on what you're doing, its level of complexity, and how much of your authority and judgment you're offloading.
macOS definitely has its issues, but this just makes it sound like you have different expectations of how an OS should work. Different isn’t always bad. Hiding applications is a pretty key concept in macOS, and the shortcuts are pretty straightforward: Cmd+H to hide, Cmd+Q to quit. Spaces aren’t hidden; there are lots of ways to access them, but it seems you haven’t bothered to learn them. In your example, pressing Ctrl+Right would have switched to the first full-screen space. You could also have right-clicked the Chrome icon in the dock for a list of windows.
BTW the dock doesn’t have to be hidden, and idk if it was a typo, but Alt+Tab isn’t a default shortcut. Command is the key used for system shortcuts, so maybe you should have tried that? Like, yeah, it’s different, but that doesn’t make it bad. If you’ve been using it for 10 years without figuring that out…
---
I’m with you on the 1st party apps though, and the stupid corners on Tahoe.
I call it "alt tab" because that's how my brain maps the keyboard. The reality is simple: I struggled going from Windows to Ubuntu about 20 years ago, but ultimately made it to the other side knowing how to use both well. With Macs, I didn't. Ten years later, all of my adaptations are ways to avoid the operating system; the main thing I've learned is how to get myself out of a jam and stick to the parts of the OS that don't feel like shit. It's not that I never learned these things: I know how to gesture, I know how to exit full screen, and so on. I'm saying the experience of learning them was dog shit.
Anyone is free to claim that I just didn't try, or didn't give it a fair shake, or perhaps I'm just some idiot who doesn't know computers or whatever.
Maybe I just think an OS should work differently, but okay? I've never said that I have some sort of access to a platonic ideal of objective operating systems and that macs don't meet it. I'm saying that I think it's bad and I gave examples of why. And I think I can easily appeal to my experiences seeing others use the OS - I don't think they find anything you're talking about appealing either.
> Hiding applications is a pretty key concept in MacOS. Shortcuts are pretty straightforward? Cmd+H to hide, Cmd+Q to quit. Spaces aren’t hidden- there’s lots of ways to access them, but it seems you haven’t bothered to learn them.
They're not talking about Cmd+H hiding or virtual desktops - those exist on Windows too. The issue is how macOS handles window placement with zero visual feedback.
For example, when you open a new window from a fullscreen app, it just silently appears on another space. No indicator, no notification. You're left guessing whether it even opened and where it went. The placement depends on arcane rules about space layout, fullscreen ordering, and external displays - and it's basically random half the time. You either memorize the exact behavior or manually search through all your spaces.
Gotta say, as a Swift dev I agree. I followed the link to the Translate docs and was pleasantly surprised to see a discussion section clearly explaining the usage, which is not always the case for Apple APIs! But this wasn’t really just an article about the API. It was about the complexity of trying to build on the stack of Swift/SPM/ParseableCommand/Foundation/Concurrency/Translation without having a good grasp of any of them. I was frustrated reading it, but I think it does point to the underlying knowledge that’s needed to be proficient at something like this. None of it is a particular indictment of Swift as an ecosystem (though there are lots of valid criticisms); it’s just the nature of development, and something that’s massively eroded by relying too much on these ghosts.
As a Swift dev, I have to say this was a frustrating read.
Apple’s documentation is often very poor, and I will note that Swift Packages (especially CLIs) don’t always feel great. As another commenter noted, anything other than Xcode feels like fighting an uphill battle.
But many of your frustrations could be solved by checking not the API docs, but just the Swift language guide. You seem perturbed, for example, that the Package initializer expects ordered arguments. It is a basic part of Swift’s design that arguments are always ordered, and each is exclusively either labeled or unlabeled (never optionally both).
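For instance (a made-up function standing in for an initializer like Package’s, not the real SPM API):

```swift
// Hypothetical initializer-style function: each parameter has a fixed
// label, and call-site arguments must appear in declaration order.
func makePackage(name: String, platforms: [String], targets: [String]) -> String {
    "\(name) (\(platforms.count) platforms, \(targets.count) targets)"
}

// Compiles: labels present, declaration order respected.
let ok = makePackage(name: "Tool", platforms: ["macOS"], targets: ["Tool"])

// Does not compile: unlike keyword arguments in some languages,
// labeled arguments cannot be reordered at the call site.
// let bad = makePackage(platforms: ["macOS"], name: "Tool", targets: ["Tool"])
// error: argument 'name' must precede argument 'platforms'
```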
The ghost’s use of semaphores with async/await is a massive red flag: it mixes two asynchronous frameworks (Concurrency and GCD). I wouldn’t be surprised if it worked, but it’s really against the grain of how either framework was designed. This is the shortfall of relying on bottled ghosts to learn new tools. I know from experience that the documentation on Concurrency (async/await) is pretty good and lays out a clear rationale for how it’s intended to be used, but it is a huge piece of documentation, and a big hill to climb when all you’re building is a small tool. This is the risk we run when asking AI for help: it is ignorant of the actual intended use of the APIs and is only trained on the output of developers. Here it’s easy to see that it was faced with a problem of synchronous access to an async function and reached for a common solution (a semaphore), despite the fact that semaphores are part of a 10-year-old framework, while the async/await keywords are only 2–3 years old!
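The smell described here can be sketched like this (`warmUpModel` is a hypothetical stand-in for some async API such as a Translation call; this is not the article’s actual code):

```swift
import Foundation

// Stand-in for some async API (e.g. an async Translation call).
func warmUpModel() async { }

// The anti-pattern: a synchronous function that parks a thread on a GCD
// semaphore while Swift Concurrency does the work. It may run, but it
// blocks a thread the Concurrency runtime may need, can deadlock in a
// constrained thread pool, and is flagged under strict concurrency checks.
func warmUpBlocking() {
    let semaphore = DispatchSemaphore(value: 0)
    Task {
        await warmUpModel()
        semaphore.signal()
    }
    semaphore.wait()
}

// The idiomatic shape: stay async and bridge between sync and async
// exactly once, at the top, with an async entry point.
@main
struct ToolEntry {
    static func main() async {
        await warmUpModel()
    }
}
```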
Anyway, the article reminded me of the challenges of learning a new (programming) language. There’s more to it than just following tutorials and blindly directing AI. I know the feeling, as I’m currently learning C#. I can write simple functions and follow the syntax, but I can’t intuitively understand what’s happening like I can with Swift. Is that because Swift is better than C#? Not really; it’s just that I’m fluent in one but not the other. Ironically, I guess you probably get this already from learning Mandarin, but you’ve not written an article about how frustrating it is that it inexplicably insists on using tones to express meaning, when English is fine without them(!).
I’m sorry you had a bad experience with Swift. I do genuinely think it’s a great language to write, and the open source Swift Evolution team are great. They are continually pushing for more openness and more cross-platform compatibility, and I like the way that the core of the language is strongly opinionated, in a way that makes it clear what’s happening if you understand the syntax. What’s hard is the application of Apple’s APIs, which are wildly inconsistent and often incomplete. Some are maintained, while others are still wrappers around 15-year-old Objective-C that has no concept of modern Swift paradigms. That said, I’d still encourage you to persevere with Swift. Once you get past those rough edges of stdio and UI and get into the heart of a Package, I would expect most of these complaints to disappear!