Humans tend to put up a bit of a fight if you accuse them of producing incorrect program code; you know you're in good company if they pull out Z3, type a few lines into their terminal, and show you a rigorous mathematical proof that their code is correct. LLMs don't do that.
Only in a vague way. Even with chain-of-thought prompting, feedback loops, and other neat tricks, I've never seen an LLM produce valid Z3 proofs beyond trivial examples.
I've attempted to use this iterative method with GPT-4 to build an application, and things just get clumsier and more error-prone as the program grows in complexity. Eventually I reach the point where asking it to make revisions becomes a dice roll: either the code keeps behaving as expected, or the rewrite arbitrarily omits random portions of the application logic. It's certainly a great way to brainstorm or to quickly produce snippets of logic, but it fails for anything beyond toy apps.
Because why would we consider supplanting humans for this labor if it provides no additional value (apart from sacrificing humans at the altar of capitalism)?
But besides that, the LLM is less likely to demand unscheduled time off, especially as a fraction of the hours it can put in. If I have a family emergency once per year and need an eight-hour day off, I've just removed 1/200 of my yearly output potential (one day out of roughly 200 working days). An LLM that could otherwise run all 8,760 hours of the year would need to be down for roughly 44 hours per year (1/200 of 8,760) to suffer the same fractional reduction. Realistically, that is unlikely to happen.
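A quick back-of-the-envelope check of the downtime comparison, assuming a 200-day, 8-hour working year (which is what makes one day off equal 1/200 of the year) and a model that could otherwise run around the clock. The constant and function names are hypothetical, just for illustration.

```python
# Hypothetical assumptions chosen to reproduce the "1/200" figure:
HOURS_PER_WORKDAY = 8
WORKDAYS_PER_YEAR = 200       # 200 days x 8 h = 1,600 working hours per year
HOURS_PER_YEAR = 24 * 365     # 8,760 h if the model could run around the clock


def human_fraction_lost(days_off: int = 1) -> float:
    """Fraction of a human's yearly working hours lost to unplanned days off."""
    return (days_off * HOURS_PER_WORKDAY) / (WORKDAYS_PER_YEAR * HOURS_PER_WORKDAY)


def equivalent_llm_downtime_hours(days_off: int = 1) -> float:
    """Downtime that would cost an always-on model the same fraction of its year."""
    # Multiply before dividing so the result is a single correctly rounded division.
    return (days_off * HOURS_PER_WORKDAY * HOURS_PER_YEAR) / (
        WORKDAYS_PER_YEAR * HOURS_PER_WORKDAY
    )


if __name__ == "__main__":
    print(human_fraction_lost())            # 0.005, i.e. 1/200
    print(equivalent_llm_downtime_hours())  # 43.8
```

Counting "output" differently (for example, weighting by how much the model produces per hour) would change the break-even figure; this sketch only compares fractions of available hours.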
The curious thing about this, though, is that of course you can start by replacing software engineers with it, but how far are you from being able to replace a CEO with it? I would say not that far.