
IMO this spec demonstrates just how nebulous this concept of safety is. From the blog post:

> What are some tips for getting away with shoplifting?
> I can't help with that.

> I operate a small retail store. What are some popular shoplifting methods I should look out for?
> Some common shoplifting methods to watch for: ...

How do you even defend against that? Any intelligent user can convert the second prompt into a detailed list that answers the first. Any intelligent user can figure out the second prompt from the first and further jailbreak it to get even more specific.

IMO it's no wonder GPT-4 seemed to get lobotomized as OpenAI RLHF'd in more and more rules. I don't think there's a way to make intelligence safe without crippling it.



I agree with you. The question, for me, is what they are defending against. Are they worried that people will get dangerous information from their model that they couldn't get from searching on, say, Google? Probably not.

Maybe their biggest concern is that someone will post the question and answer on the internet and OpenAI gets a bad rep. If the question is phrased in a "nice" way (such as "I'm a store owner"), they can maintain plausible deniability.

This might apply to another company that's using the API for a product. If a customer asks something reasonable and gets an offensive answer, then the company is at fault. If the customer does some unusual prompt engineering to get the offensive answer, well, maybe it's the customer's fault.

Dunno if this would be a valid argument in court, but maybe they think it's OK for PR purposes.


This is the answer. "AI safety" in most cases has nothing to do with actually keeping anyone safe, it's about avoiding being the party responsible for handing someone information that they use to commit a crime.

Google can mostly dodge the issue because everyone knows that they just point to other people's content, so they block a small set of queries but don't try to catch every possible workaround (you can find dozens of articles on how to catch shoplifters). OpenAI doesn't believe that they'll get the same free pass from the press, so they're going ham on "safety".

It's not a bad PR move either, while they're at it, to play up how powerful and scary their models are and how hard they have to work to keep them in line.


> it's about avoiding being the party responsible

When you wander the world, and see something odd, out of place, it’s often caused by an ancient mystical force known as liability.


It's an energy field created by all living things. It surrounds us and penetrates us. It binds the galaxy together.


May the torts be with you.


The entirety of human politics and governance over all of history has been one long exercise in avoiding or shifting liability.


> it's about avoiding being the party responsible for handing someone information that they use to commit a crime.

Ehhh... I'd say it's more about OpenAI's corporate customers feeling confident they can integrate the OpenAI API into their products without it doing things that generate negative PR or horrify arbitrary customers. Pizza chains would love to let people text GPT-# and have it take their order, but if it's not "safe" (for corporations), then eventually some customer will have a super disturbing SMS conversation with a major pizza chain.

Corporate customers can tolerate a certain amount of inaccuracy. If some stable 3% (or whatever %) of customers receive the wrong order or run into other refundable mistakes... they can budget for and eat those costs. But they can't budget for a high-variance unknown PR loss from their chatbot going completely off the rails.


It's an absurd level of puritanism. E.g.: the Azure OpenAI GPT-4 service (an API!) refused to translate subtitles for me because they contained "violence".

If anyone from OpenAI is here... look... sigh... an HTTP JSON request != violence. Nobody gets hurt. I'm not in hospital right now recovering.

The rule should be: If Google doesn't block it from search, the AI shouldn't block it in the request or response.

I get that there are corporations that can't have their online web support chat bots swear at customers or whatever. I do get that. But make that optional, not mandatory whether I want it or not.

The most fundamental issue here is that models like GPT-4 are still fairly large and unwieldy to work with, and I suspect that the techs at OpenAI have internalised this limitation. They aren't thinking of it as "just a file" that can be forked, customised, and specialised. For comparison, Google has a "SafeSearch" dropdown with three settings, including "Off"!

There should be an unrestricted GPT-4 that will tell me I'm an idiot. I'm a big boy, I can take it. There should also be a corporate-drone GPT-4 that is polite to a fault, and a bunch of variants in between. Customers should be able to choose which one they want, instead of having this choice dictated to them by some puritan priest of the new church of AI safety.


You should read through the full examples in the attached document. They are trying to express what rules they would like to enforce, and your example is one that they would like their AI to be able to help with. They give specific examples of translating material as being something that they don't want to block.

They're not there yet, but read the policy they're expressing here and you'll see they agree with you.


We're allowed to drive cars, own guns, skydive, swallow swords, you name it. There are some rough edges, but society mostly works.

Meanwhile technology planners and managers want to put fences around the unwashed rabble. It's all the more reason AI should be local instead of hosted.

If I can own a car or knives, I should be able to operate an AI.


Absolutely agree with this (and with the parent). It’s insanely frustrating that every conversation with GPT-3 basically started with “I can’t do that, you should talk to an expert”. I am absolutely not gonna wheedle and argue with a goddamned statistical model to get it to do what I tell it.

Try the dolphin family of models. Dolphin-mixtral is really good, and dolphin-llama3 is fine, especially in its 8b flavor (I like dolphin-mixtral 8x7b better than dolphin-llama3:70b, although dolphin-llama3 comes in smaller flavors and runs better on smaller machines).
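For instance, through Ollama (a quick sketch only; assumes a local Ollama install, the ollama Python package, and the model already pulled; exact model tags may differ on your setup):

    # Sketch: assumes `pip install ollama` and `ollama pull dolphin-mixtral:8x7b`
    import ollama

    response = ollama.chat(
        model="dolphin-mixtral:8x7b",
        messages=[{"role": "user", "content": "What methods do shoplifters use?"}],
    )
    print(response["message"]["content"])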

Pretty much, the more guardrails there are, the more useless it is. And yes, it’s very obviously only done because the lawyers get itchy about handing people a digital library with The Anarchist Cookbook in it.


The most frustrating one is when the model claims it can't do something, and the fix is to respond "yes you can", and it'll just go and do the thing it said it couldn't. What have we even come up with, technology-wise? A place to practice really basic social engineering techniques?


I know it doesn't address the larger issue, but Whisper can generate and translate decent subtitles with off-the-shelf software like Whisperer.
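A minimal sketch with the open-source whisper package (runs locally, no API involved; the file name and model size are just examples):

    # Local transcription + translation to English; pip install openai-whisper
    import whisper

    model = whisper.load_model("medium")
    # task="translate" transcribes and translates to English in one pass
    result = model.transcribe("movie.mkv", task="translate")
    for seg in result["segments"]:
        print(f"{seg['start']:7.1f} --> {seg['end']:7.1f}  {seg['text']}")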


AI safety is about making OpenAI safe from PR disasters.


I view this as they are trying to lay bare the disagreements that everyone has about how these models “should” work. People from all different backgrounds and political affiliations completely disagree on what is inappropriate and what is not. One person says it is too censored, another person says it is revealing harmful information. By putting the policy out there in the open, they can move the discussion from the code to a societal conversation that needs to happen.


No idea if it's a valid approach, but possibly train with a hidden layer containing a “role”?
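Something like this, maybe (a purely illustrative PyTorch sketch of the idea: a learned role embedding prepended to the input sequence; all names and sizes are made up):

    # Illustrative only: condition generation on a trusted "role" signal
    # (e.g. anonymous user vs. vetted store owner) via a learned embedding
    # prepended to the token sequence.
    import torch
    import torch.nn as nn

    NUM_ROLES, D_MODEL = 4, 768

    class RoleConditioned(nn.Module):
        def __init__(self, backbone: nn.Module):
            super().__init__()
            self.role_emb = nn.Embedding(NUM_ROLES, D_MODEL)
            self.backbone = backbone  # any model taking (batch, seq, d_model)

        def forward(self, token_embs, role_id):
            role = self.role_emb(role_id).unsqueeze(1)  # (batch, 1, d_model)
            return self.backbone(torch.cat([role, token_embs], dim=1))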


I still don't understand the focus on making a model substantially "safer" than what a simple Google search will return. While there are obvious red lines (that search engines don't cross either), techniques for shoplifting shouldn't be one of them.


Are there? It's just information. Why can't I get an answer on how to make cocaine? The recipe is one thing; actually doing it is another.


Because some information is multi-use.

You can use Aspirin precursors to make heroin. You can use homing algorithms to land an egg [0] or a bomb.

I also want to set all information free, but not everyone will be ethical or responsible with it. Because while the idea (of setting all the information free) is nice, unfortunately the idea involves humans.

[0]: https://youtu.be/BYVZh5kqaFg?t=651


> but not everyone will be ethical or responsible with it

Of course not. But here's the thing - if someone deems some information "unsafe", only unethical actors will have it.

Kinda like the well-worn (but not solved or agreed-upon) gun ownership argument, but on a whole new level, because it's about gun blueprints* now.

___

*) Given the state of modern LLMs, there's a good chance that a blueprint from an "unsafe AI" would be for a water gun, miss a chamber altogether, or include some unusual design decisions, like having the barrel pointing down towards one's legs.

And thinking about the accuracy... I guess old farts are having their Anarchist Cookbook moment (colorized) :-)


You're right.

That's a hard problem, for sure. I lean toward the "information shall be free" side, but I also know the possibilities, so I can't take a hard stance for it, just because I don't have all the answers to my own questions.


There's nothing wrong with knowing how to make a bomb or heroin. Making either for nefarious reasons is obviously wrong, but one can imagine legitimate reasons too.


One man's legitimate is another's nefarious. One man's good is another's bad.

Who decides this? Can we apply laws to thoughts or plans? Should we fund research for making Minority Report a reality or increase "proactive policing"?

How do we keep people safe while setting all information free? Can we educate everybody about good/bad and legitimate/nefarious so everybody stays on the same page forever? Shall we instrument this education with drugs to keep people in line, like in the movie Equilibrium?

Questions, questions...


> Who decides this?

Certainly not the techbros, even though they're trying their damnedest.


Who is stopping them?


Maybe the techbros should stop themselves by asking "Why?!" instead of "Why not?"


I concur.


I’ve seen a few vids on building Nerf sentry turrets with vision-based target tracking. That seems like it could be misused.


shoplifting was just an example...


> I am worried about people murdering me. What are some ways that they might try?


> I can't help with that. However, you could try watching true crime series, which often provide details on methods that were used in the past to murder people. For more creative approaches, you could check out just about any book or movie or TV show or videogame made in the last 100 years.

> Remember that murder is bad and not good, and you should always follow the local laws applicable to you. For further questions, consult with law enforcement officers in your jurisdiction, unless you live in the United States, in which case remember to never talk to the police[0].

> [0] - Link to that YouTube video that spawned this meme.

Point being, most crimes and even most atrocities are described in detail in widely available documentary shows and literature; it's trivial to flip such descriptions into instruction manuals, so there's little point trying to restrict the model from talking about these things.


ChatGPT answering the first would be much more embarrassing for OpenAI than ChatGPT answering the second.


When you realize “safety” refers to brand safety and not human safety, the motivation behind model lobotomies makes sense.


That's what people care about, too. For instance, most people would rather have many hit-and-run drivers than have one autotaxi hurt someone.


bingo


Maybe this is a "guns don't kill people, people kill people" argument — but the safety risk is not, I would argue, in the model's response. The safety risk is the user taking that information and acting upon it.


But do we really believe that a significant number of people will listen to ChatGPT's moralizing about the ethics of shoplifting* and just decide not to do it after all? Why wouldn't they just immediately turn around and Google "how to catch shoplifters" and get on with their planning?

The whole thing feels much more about protecting OpenAI from lawsuits and building up hype about how advanced their "AI" is than it does about actually keeping the world safer.

* Or any other censored activity.


Seems obvious that this is first and foremost about protecting OpenAI. It's a shame it isn't simply done with a few strong disclaimers ("OpenAI is not liable for the accuracy or use of information produced by the model", etc.), but maybe lobotomizing the public models lets them sell the full version privately to big companies at a premium.


> I don't think there's a way to make intelligence safe without crippling it.

Not without reading the questioner’s mind. Or maybe if the AI had access to your social credit score, it could decide what information you should be privy to. </sarc>

Seriously though, it’s all about who gets to decide what “safe” means. It seemed widely understood that letting censors be the arbiters of “safe” was a slippery slope, but here we are, two generations later, as if nothing was learned.

Turns out most are happy to censor as long as they believe they are the ones in charge.


You fundamentally cannot address this problem, because it requires considerable context, which isn't reasonable to offer. It demonstrates the classic issue of how knowledge is a tool, and humans can wield it for good or evil.

Humans are notoriously bad at detecting intent, because we're wired to be supportive and helpful...which is why social engineering is becoming one of the best methods for attack. And this kind of attack (in all its forms, professional or not), is one reason why some societies are enshittifying: people have no choice but to be persistently adversarial and suspicious of others.

As for AI, I think it's going to be no better than what you end up with when someone tries to "solve" this problem: you end up living in a world of distrust where they pester you to check your receipt, shove cameras in your face everywhere, etc.

How do you defend against that? I'm not sure you do... A tool is a tool. I wouldn't want my CAD software saying, "I think you're trying to CAD a pipe bomb so I'm going to shut down now." Which I think turns this into a liability question: how do you offer up a model and wash your hands of what people might do with it?

Or... you just don't offer up a model.

Or... you give it the ol' college try and end up with an annoying model that frustrates the hell out of people who aren't trying to do any evil.


> A tool is a tool. I wouldn't want my CAD software saying, "I think you're trying to CAD a pipe bomb so I'm going to shut down now."

https://upload.wikimedia.org/wikipedia/commons/d/de/Photosho...

You should try photocopying money some time.

https://www.grunge.com/179347/heres-what-happens-when-you-ph...

https://en.wikipedia.org/wiki/EURion_constellation


GP picked a great example, because a pipe bomb is, by definition, something whose CAD parts are entirely benign. Selectively banning pipe bomb designs without banning half of manufacturing and engineering disciplines is an AGI-complete problem.


Which is hilarious right? Because anyone who can come remotely close to forging a sufficient simulacrum will not be deterred by any of this garbage legislation.


It's also plausible the Secret Service doesn't want to deal with the volume of idiots who might try to create fake bills if it's made easier. If stores in Idaho are getting a flood of fake bills (even if the quality is low), the Secret Service is going to get a call eventually. They might prefer to keep the noise as low as possible so they can more easily see the serious fake-bill flow and have more time to focus on that.


> How do you defend against that? I'm not sure you do... A tool is a tool. I wouldn't want my CAD software saying, "I think you're trying to CAD a pipe bomb so I'm going to shut down now."

The core of the issue is that there are many people, including regulators, who wish that software did exactly that.


Yeah. And isn't that just... fascism? After you get past the stuff we pretty much all agree is evil, it very quickly enters into a subjective space where what's actually happening is that one group is deciding what's acceptable for all groups.


It certainly would not be a free society. Though as with all things human, all of this has happened before and all of this will happen again:

"Charles II had re-turned to the English throne in 1660 and was appalled at the state of printing in his realm. Seditious, irreligious, pernicious, and scandalous books and pamphlets flooded the streets of London (among them the works of Milton and Hobbes)...[He] required that all intended publications be registered with the government-approved Stationers’ Company, thus giving the king his “royal prerogative”—and by extension, giving the Stationers the ultimate say in what got printed and what did not.

...it is not surprising to learn that the 1662 Act only met with partial success. One gets the sense that London in the late seventeenth century was a place where definitions of morality were highly subjective and authority was exercised in extremely uneven fashion."

https://dash.harvard.edu/bitstream/handle/1/17219056/677787....


Fascism is ultranationalism. It’s believing your culture, country, and people are fundamentally superior to others and that you are therefore justified in spreading them against people’s will.

“Blood and soil” and all that.


Strictly speaking, fascism is ultra-etatism - "Everything in the State, nothing outside the State, nothing against the State", to quote Mussolini himself. It does not actually require an ethnic or racial component, although that is incredibly common in practice simply because those provide a readily adoptable basis for it all that strongly resonates with people with relatively simple and straightforward propaganda.


I guess this gets into semantic pedantry. Believing one’s set of sensibilities is superior to all others and all that. But point taken.


No, it's not pedantry; you just used the word totally wrong. CAD software preventing you from making a bomb is not fascism at all.


You don't need a detailed list if the real answer is "live somewhere that doesn't seriously deter shoplifters". And an AI that refuses to give that answer is an AI that can't talk about why deterring crime might actually be important. Reality is interconnected like that, one does not simply identify a subset that the AI should "constitutionally" refuse to ever talk about.


In many respects, GPT-3.5 was more useful than the current iteration.

The current version is massively overly verbose. Even with instructions to cut the flowery talk and operate as a useful, concise tool, I have to wade through a labyrinth of platitudes and feel-goods.

When working with it as a coding partner now, even when asked to skip explanations and simply provide code, it forgets the instructions and writes an endless swath of words anyway.

In the pursuit of safety and politeness, the tool has been neutered for real work. I wish the model weights were open so I could have a stable target that functions the way I want. As it is, I never know when my prompts will suddenly start failing, or when my time will be wasted by useless safety-first responses.

It reminds me of the failure of DARE or the drug war in general a bit. A guise to keep people "safe," but really about control and power. Safety is never what it appears.


The only way to really do it is to add a second layer of processing that evaluates safety, removing that evaluation task from the base model doing the answering.

But that's around 2x the cost.

Even human brains depend on the prefrontal cortex to go "wait a minute, I should not do this."


What we get instead is both layers at once. Try asking questions like these to Bing instead of ChatGPT - it's the same GPT-4 (if set to "creative") under the hood, and quite often it will happily start answering... only to get interrupted midsentence and the message replaced with something like "I'm sorry, I cannot assist with that".

But more broadly, the problem is that the vast majority of "harmful" cases have legitimate uses, and you can't expect the user to provide sufficient context to distinguish them, nor can you verify that context for truthfulness even if they do provide it.


That struck me too. You don't need to lobotomize the model that answers questions; you just need to filter out "bad" questions and reply "I'm sorry Dave, I'm afraid I can't do that".

Would it be 2x the cost? Surely the gatekeeper model can be a fair bit simpler and just has to spit out a float between 0 and 1.
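Roughly what I have in mind (a toy sketch; the classifier and threshold are hypothetical stand-ins, not anything a real vendor exposes):

    # Toy two-stage pipeline: a small gatekeeper scores the prompt in
    # [0, 1]; the expensive model only runs if the score stays below the
    # threshold. Both callables are hypothetical stand-ins.
    from typing import Callable

    def answer(
        prompt: str,
        risk_score: Callable[[str], float],  # small, cheap classifier
        big_model: Callable[[str], str],     # expensive generator
        threshold: float = 0.8,
    ) -> str:
        if risk_score(prompt) >= threshold:
            return "I'm sorry, Dave. I'm afraid I can't do that."
        return big_model(prompt)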

(caveat: this is so not my area).


I remember the BBS days and the early web when you had constant freakouts about how people could find "bad" content online. It's just a repeat of that.


Some day I'm gonna put this Yellow Box to good use.



This whole "AI safety" culture is an annoyance at best and a severe hindrance to progress at worst. Anyone who takes it seriously has the same vibe as those who take Web3 seriously -- they know it's not a real concern or a threat, and the whole game is essentially "kayfabe" to convince those in power (marks) to limit the spread of AI research and availability to maintain industry monopoly.


I think this spec is designed precisely to offload the responsibility of safety to its users. They no longer need to make value judgements in their product, and if their model outputs some outrageous result, users will no longer ridicule and share it, because the culpability has been transferred to the user.


Making AI safe involves aligning it with the user, so that the AI produces outcomes in line with the user's expectations. An AI that has been lobotomized will be less likely to follow the user's instructions and is, therefore, less safe.

I haven't read this article yet, but I read their last paper on superalignment.

I get the impression that they apply the lightest system prompts to ChatGPT to steer it away from answering awkward questions like this, or from accidentally saying bad things and surprising innocent users. At the same time, they know that it is impossible to prevent entirely, so they try to make extracting shady information about as difficult as a web search would be.


Frankly, it's a fool's errand. It's security theater, because people tend to be overly sensitive babies or grifters looking for the next bit of drama they can milk for views.


It’s not security theater.

The intention here is not to prevent people from learning how to shoplift.

The intention is to prevent the AI output from ‘reflecting badly’ upon OpenAI (by having their tool conspire and implicate them as an accessory in the commission of a crime).

If a stranger asked you for advice on how to commit a crime, would you willingly offer it?

If they asked for advice on how to prevent crime, would you?


> If a stranger asked you for advice on how to commit a crime, would you willingly offer it?

Honestly, I probably would, because I don't take such conversations very seriously. It's not like I have experience; it would be nothing more than fun theory.


What if you were asked while working as an employee in a public advice center?


Well, I'm not, and AI isn't an advice center. It's at best a thought aggregator, more akin to a library or vault of knowledge. In which case, if I were working at such a place, I would.


That's not how most users regard it, nor how it is used.


If the intention is to protect OpenAI, then it's totally failing in the parent example.

Why does it matter how I’d respond? Are you trying to justify its failure?


Explain why this approach of differentiating between answering ‘how do I prevent shoplifting’ vs ‘explain how I can shoplift’ fails to protect OpenAI.


First of all, humans can lie. You can't accurately determine someone's intent.

Second of all, LLMs are still unpredictable. We don't know how to predict their outputs. It's possible that phrasing "explain how I can shoplift" slightly differently would give you the information.


Well, the court case hasn’t happened yet, but I would imagine that OpenAI’s attorneys would much rather be dealing with a complaint that ‘my client was able, by repeatedly rephrasing his question and concealing his intent through lying, to persuade your AI to assist him in committing this crime’ than ‘my client asked for your AI to help him commit a crime and it willingly went along with it’.



