Hacker News
I ran Opus 4.7 vs. Old Opus 4.6 vs. New Opus 4.6 on 28 Zod tasks (stet.sh)
2 points by bisonbear 10 days ago
Coding evals are broken. CI is green while AI code quality goes unmeasured (stet.sh)
1 point by bisonbear 12 days ago
Agents.md is the highest-leverage code you're not testing (stet.sh)
1 point by bisonbear 17 days ago
Your AI coding benchmark is hiding a 2x quality gap (stet.sh)
3 points by bisonbear 45 days ago
