My experience is that AIs amplify what you put in them.
If you put in lazy problem definitions, provide the bare minimum context, and review the code cursorily, then the output is equally lackluster.
However, if you spend a good amount of time describing the problem, carefully construct a context that includes examples, documentation, and relevant files, and then review the code with care, you can get some very good code out of them. As I've used them more and more, I've noticed that the LLM responds in a thankful way when I provide good context.
> Always ask for alternatives
> Trust but verify
I treat the AI as I would a promising junior engineer. And this article is right: you don't have to accept the first solution from either a junior engineer or the AI. I constantly question the AI's decisions, even when I think they are right. I just checked AI Studio, and the last message I sent to Gemini was "what is the reasoning behind using metadata in this case instead of a pricing column?" - the context being a db design discussion where the LLM suggested using an existing JSONB metadata column rather than adding a new column. Sometimes I already agree with the approach; I just want to see the AI give an explanation.
And on the trust front, I often let the AI coding agent write the code how it wants to write it rather than force it to write it exactly like I would, just like I would with a junior engineer. Sometimes it gets it right and I learn something. Sometimes it gets it wrong and I have to correct it. I would estimate about 1 in 10 changes has an obvious error or problem that requires me to intervene.
I think of it this way: I control the input and I verify the output.
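To make the metadata-vs-pricing-column question from that Gemini exchange concrete, here is a minimal sketch of the two designs. This uses SQLite's JSON functions as a stand-in for Postgres JSONB, and all table and field names are hypothetical, not from the actual discussion:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")

# Option A: stuff the price into an existing generic metadata blob
con.execute("CREATE TABLE products_a (id INTEGER PRIMARY KEY, metadata TEXT)")
con.execute(
    "INSERT INTO products_a (metadata) VALUES (?)",
    (json.dumps({"price_cents": 1999, "color": "red"}),),
)

# Option B: a dedicated, typed column
con.execute("CREATE TABLE products_b (id INTEGER PRIMARY KEY, price_cents INTEGER)")
con.execute("INSERT INTO products_b (price_cents) VALUES (1999)")

# A needs JSON extraction on every query; B gets type enforcement
# and straightforward indexing for free.
a = con.execute(
    "SELECT json_extract(metadata, '$.price_cents') FROM products_a"
).fetchone()[0]
b = con.execute("SELECT price_cents FROM products_b").fetchone()[0]
assert a == b == 1999
```

Both store the same value; the question worth pushing the AI on is whether price is a first-class attribute you will filter and index on (column) or an incidental one (metadata).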
The point of LLMs is that you shouldn't have to spend a lot of effort. Zero-shot prompts are the ideal we have to work toward. There comes a point where you have to do so much work just to get a good output that LLMs cease to be more productive than just writing something out yourself.
If it cannot give you a good output with very little prompting, it’s a sign your problem probably isn’t something well known and it probably needs a human touch.
> There comes a point where you have to do so much work just to get a good output that LLMs cease to be more productive than just writing something out yourself.
I think this gets to the core of the problem with LLM workflows and why there are so many disagreements about effectiveness
Maybe I overestimate my skills or underestimate how long things would take, but I am constantly feeling like when I try to use AI it takes more time, not less
My suspicion is that if you could create a second version of me, give one copy an LLM, and have the other solve the problem normally, this would be the case
But many people love these tools and feel more productive, so what gives? The problem is that it's impossible to really measure, because we don't have convenient parallel-universe clones to test against. It's all just vibes and made-up numbers
I could not disagree more. That's not to say you are wrong; I just choose a different approach.
There seem to be multiple approaches to working with LLMs. My own personal experience has been that carefully explaining my request and providing specific, highly relevant context (while avoiding irrelevant and distracting context) has led to significant productivity on my side. That is, it may take me 15 minutes to prepare a really good prompt, but the output can save me hours of work. Conversely, if I fire off a bunch of low-effort prompts, I get poor results and end up spending a lot of time in back-and-forth with the LLM and a lot of time fixing up its output.
> If you put in lazy problem definitions, provide the bare minimum context, and review the code cursorily, then the output is equally lackluster.
I thought so too, but sometimes I had better results with a one-sentence prompt (plus README.md) where it delivered the exact thing I wanted. I also had a very detailed prompt with multiple subtasks, all very detailed, plus README.md and AGENTS.md, and the results were very poor.
This is true in my experience, but it doesn't go against my larger point. Choosing the "goldilocks" context is a bit of an art: not too big, not too small. It reminds me of a famous witty quote [1]: "I apologize for such a long letter - I didn't have time to write a short one."
If you send too much info at once it does seem to confuse the agent, just like if you ask it to do too much all at once. That is yet another property it shares with a junior engineer. It is easy to overwhelm a new contributor to a project with too much information, especially if it isn't strictly relevant.
Also, regardless of the prompt they only get ~80% accuracy on coding benchmarks. So even with the absolute perfect prompt incantation, you can expect it to fail 1 out of 5 times.
That's why I make my initial context (e.g. AGENTS.md) about how to bootstrap context for the current task/project. Now my prompts only need to be good enough to hint at how to read the graph correctly.
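For illustration, a bootstrap-oriented AGENTS.md along these lines might look something like the sketch below. Every file name and instruction here is hypothetical, not taken from the commenter's actual setup:

```markdown
# AGENTS.md (hypothetical sketch)

## Bootstrap
Before starting any task:
1. Read README.md for the project overview.
2. Skim the module names under src/ before opening files.
3. Check docs/architecture.md for how the modules relate.

## Conventions
- Keep changes scoped to the files the task mentions.
- If context seems missing, ask for the relevant file rather than guessing.
```

The idea is that the static file tells the agent how to find context, so each individual prompt can stay short.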
Reasoning-model prompting is relevant here. OpenAI's docs state that giving detailed, step-by-step instructions often hinders reasoning models. It's better to clearly define the outcome along with what you're working with, then let the model interpolate.