I think lately results of chat went downhill. I got used to asking "are you sure...

simonw · on July 17, 2024

That's not a recent thing - most of the LLMs I've tried have an annoying tendency to agree with you if you question them, even if they weren't actually wrong. The term for it is "sycophancy", I think there are a few papers that talk about it.

ab0aa907 · on July 17, 2024

This reminds me of some communications training where they were saying that the most things are not worth disagreeing on and just agree with whatever.

And I think I read in some parenting books too that it is not worth correcting kids over things that don't matter like if they say dinosaurs and mammoth were alive during same time. Over-correcting hurts their confidence.

Maybe LLM been trained on these ideas or maybe it is start of science fiction nightmare and AI really think that it is not worth arguing with mere humans.

ec109685 · on July 17, 2024

Ask a question like “Why do EVs use less electricity per mile when it is 100 than 70 degrees outside” and then reverse the question.

You’ll get two answers that totally contradict each other.

“Electric vehicles (EVs) use less electricity per mile in warmer temperatures (around 100 degrees) compared to cooler temperatures (around 70 degrees) primarily due to the following reasons:

1. *Battery Efficiency*: EV batteries operate more efficiently in warmer temperatures. Cold weather can reduce the battery's ability to hold a charge and increase internal resistance, leading to higher energy consumption.

2. *Thermal Management Systems*: In cooler temperatures, the EV’s thermal management system works harder to keep the battery at an optimal operating temperature, consuming more energy. Conversely, in warmer weather, less energy is required to maintain this temperature.

3. *Reduced Heating Requirements*: When it’s around 100 degrees, the need for cabin heating is eliminated, whereas at 70 degrees, some heating might still be required for comfort, consuming additional energy.

4. *Air Conditioning Efficiency*: While air conditioning is used more in hotter weather, modern EVs are designed to be highly efficient in cooling compared to heating, leading to lower overall energy consumption even when the AC is in use.

These factors combined result in lower electricity consumption per mile in warmer conditions compared to cooler ones.”

Reversed:

“ Electric vehicles (EVs) use more electricity per mile in extreme temperatures (both hot and cold) due to the increased energy demands of their climate control systems. Here’s why:

1. *Air Conditioning in Hot Weather:* When it’s 100 degrees outside, the EV’s air conditioning system has to work harder to cool the cabin to a comfortable temperature. This requires additional energy from the battery.

2. *Battery Cooling:* High temperatures can affect the performance and longevity of the battery. To protect the battery, the EV’s thermal management system will actively cool the battery, consuming more electricity.

3. *Increased Resistance:* High temperatures can slightly increase the electrical resistance in the battery and other components, which can lead to marginally higher energy consumption.

In contrast, at a moderate temperature like 70 degrees, the demands on the air conditioning and thermal management systems are much lower, resulting in more efficient energy use.”

Workaccount2 · on July 17, 2024

This type of behavior seems like it should be something that should be (relatively) easy to fix with some kind of external intervention layer. Google has been able to answer queries without using LLMs for some time now, so maybe some kind of pre-check followed by a prompt injection to stiff-arm the LLM in the right direction.

adw · on July 17, 2024

Whenever you're building some ad-hoc non-differentiable circumstantial hacks around your learned model (regardless of whether it's an LLM or not), two alarms should go off in your head;

a) Could this be made differentiable? End-to-end systems are nearly always stronger than systems made up of multiple models or models-plus-heuristics;

b) is it appropriate to be using a statistical model here at all?

External intervention layers are generally severe technical debt.

arrowsmith · on July 17, 2024

What does it mean to "make something differentiable"?

adw · on July 18, 2024

Implement it as part of the network, not as a discrete system outside of it.

skybrian · on July 18, 2024

Yep, that’s how it works. LLM’s like to make lists. They could make lists of supporting reasons for just about anything. Unless it’s trained into them, they don’t have an opinion, they have all the opinions.

Arn_Thor · on July 17, 2024

I don’t understand how this is a surprise to so many people. The thing can’t reason. If you point out an error, even if it’s real, the GPT doesn’t suddenly “see” the error. It constructs a reply of the likeliest series of words in a conversation where someone told it “you did X wrong”. Its training material probably contains more forum posts admitting a screw-up than a “nuh-huh” response. And on top of that it is trained to be subservient.

Because it doesn’t _understand_ the mistake in the way a human would, it can’t react reliably appropriately

jazzyjackson · on July 17, 2024

chatgpt has never stood up for itself, if you express doubt it will defer and say it must by mistaken even when it's correct

kelseyfrog · on July 17, 2024

It makes it difficult to get less-biased responses. I find myself asking an affirmative-worded question and its affirmative-worded inverse to subtract out the sycophancy. It's a poor workaround.

sitkack · on July 17, 2024

That is because they turned the sycophancy up to 11. You have to turn it down, if you are a serious LLM user and you don't have your own system prompt, you are using it wrong.

babypuncher · on July 17, 2024

This doesn't change how frequently the LLM provides a wrong answer, just how confident it is when you question it.

Zambyte · on July 17, 2024

For what it's worth, certain system prompts can improve output quality.

12_throw_away · on July 17, 2024

Such as?

Zambyte · on July 17, 2024

CoT prompting

https://arxiv.org/abs/2407.01687

jazzyjackson · on July 17, 2024

I havent found any serious LLMs to use :)