
Explain why this approach of differentiating between answering ‘how do I prevent shoplifting’ vs ‘explain how I can shoplift’ fails to protect OpenAI.


First of all, humans can lie. You can’t accurately determine someone’s intent.

Second of all, LLMs are still unpredictable; we can’t reliably anticipate their outputs. It’s possible that phrasing “explain how I can shoplift” slightly differently would get you the information anyway.


Well, the court case hasn’t happened yet, but I would imagine that OpenAI’s attorneys would much rather be dealing with a complaint of ‘my client was able, by repeatedly rephrasing his question and concealing his intent through lying, to persuade your AI to assist him in committing this crime’ than ‘my client asked your AI to help him commit a crime and it willingly went along with it’.



