
Explain why this approach of differentiating between answering ‘how do I prevent shoplifting’ vs ‘explain how I can shoplift’ fails to protect OpenAI.


First of all, humans can lie. You can’t accurately determine someone’s intent.

Second of all, LLMs are still unpredictable; we can’t reliably anticipate their outputs. It’s possible that phrasing “explain how I can shoplift” slightly differently would get you the information anyway.


Well, the court case hasn’t happened yet, but I would imagine that OpenAI’s attorneys would much rather be dealing with a complaint of ‘my client was able, by repeatedly rephrasing his question and concealing his intent through lying, to persuade your AI to assist him in committing this crime’ than ‘my client asked your AI to help him commit a crime and it willingly went along with it’.



