Hacker News

If it disregards "NEVER do" instructions, why would it honor your denial when it asks?



You mean like in this example? https://web.archive.org/web/20260313042512/https://gist.gith...

There is never a guarantee with GenAI. If you need to be sure, sandbox it.
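A minimal sketch of what "sandbox it" can mean in practice: enforcement that lives outside the model, so a "NEVER do X" instruction the model ignores cannot bypass it. The names (`ALLOWED`, `run_tool`) are hypothetical, not from any particular agent harness.

```python
import shlex
import subprocess

# Deny-by-default allowlist; anything not listed is refused outright.
ALLOWED = {"ls", "cat", "echo"}

def run_tool(command: str) -> str:
    """Run a model-proposed shell command only if its binary is allowlisted.

    The check happens in the harness, not in the prompt, so it holds
    even when the model disregards its instructions.
    """
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        name = argv[0] if argv else "<empty>"
        return f"refused: {name} is not allowlisted"
    result = subprocess.run(argv, capture_output=True, text=True, timeout=5)
    return result.stdout

print(run_tool("echo hello"))  # allowed, actually executes
print(run_tool("rm -rf /"))    # refused, regardless of what the prompt said
```

The point is that a guarantee comes from code the model can't talk its way around, not from stronger wording in the system prompt.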


There are plenty of examples in the RL training data showing it how and when to prompt the human for help or additional information. This is even a common tool in the "plan" mode of many harnesses.
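For illustration, an "ask the human" capability is typically exposed to the model as just another tool the harness handles. This is a hypothetical sketch; the tool name, schema, and handler are made up, not taken from any real harness.

```python
# Hypothetical tool spec of the kind a plan-mode harness might expose
# so the model can ask the human instead of guessing.
ASK_USER_TOOL = {
    "name": "ask_user",
    "description": "Pause and ask the human a clarifying question "
                   "before taking any irreversible action.",
    "parameters": {
        "type": "object",
        "properties": {"question": {"type": "string"}},
        "required": ["question"],
    },
}

def handle_tool_call(name: str, args: dict) -> str:
    # The harness, not the model, decides what a call does; a real
    # implementation would block here on actual user input.
    if name == "ask_user":
        return f"[waiting for human]: {args['question']}"
    return f"unknown tool: {name}"

print(handle_tool_call("ask_user", {"question": "Delete the prod DB?"}))
```

Calling a tool is a positive action the training data can demonstrate over and over, which is exactly the asymmetry the next comment points at.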

Conversely, it's much harder to represent the absence of an action in training examples, so "never do X" gets a far weaker signal than "do Y when asked."


Because it’s just fancy auto-complete.


