I ran a few experiments adding 0, 1, or 2 "write better code" follow-up prompts to aider's benchmarking harness, using a modified version of aider's polyglot coding benchmark [0] with DeepSeek V3.
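The setup can be sketched roughly like this. The names here (`run_task`, `send`) are illustrative stand-ins, not aider's actual API; the real harness drives an editing loop against the repo, but the shape of the experiment is the same: take the model's first attempt, then append N generic "write better code" prompts and keep only the final answer.

```python
def run_task(send, task_prompt, n_followups):
    """Hypothetical driver: send(prompt) -> model reply.

    Sends the benchmark task once, then appends n_followups generic
    "write better code" prompts, returning the model's final answer.
    """
    reply = send(task_prompt)
    for _ in range(n_followups):
        # Each follow-up asks the model to revise its previous answer.
        reply = send("write better code")
    return reply
```

With `n_followups=0` this is the baseline benchmark run; 1 and 2 are the variants tested.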
It appears that blindly asking DeepSeek to "write better code" significantly harms its ability to solve the benchmark tasks. It turns working solutions into code that no longer passes the hidden test suite.
Here are the results:
[0] https://aider.chat/docs/leaderboards/