
I'm curious how much trained-in bias damages in-context performance.

It's one thing to rely explicitly on the training data - then you are truly screwed and there isn't much to be done about it - in some sense, the model isn't working right if it does anything other than accurately reflect what is in the training data. But if I provide unbiased information in the context, how much does trained-in bias affect evaluation of that specific information?

For example, if I provide it a table of people, their racial backgrounds, and their income levels, and I ask it to evaluate whether the white people earn more than the black people - are its errors going to lean in the direction of the trained-in bias (eg: telling me white people earn more even though it may not be true in my context data)?

In some sense, relying on model knowledge is fraught with so many issues aside from bias, that I'm not so concerned about it unless it contaminates the performance on the data in the context window.



I can't prove it, but my experience with commercial models is that baked-in bias is strong. There have been times where I state X=1 over and over again in context, but still get X=2, or some other value, back. Sometimes it happens every time; sometimes I get something different every time.

You can see this with some coding agents, where they are not good at ingesting code and reproducing it as they saw it, but can reply with what they were trained on. For example, I was configuring a piece of software that had a YAML config file. The agent kept trying to change the values of unrelated keys to their default example values from the docs when making a change somewhere else. It's a highly forked project so I imagine both the docs and the example config files are in its training set thousands, if not millions of times, if it wasn't deduped.

If you don't give an agent access to sed/grep/etc, the model will eventually fuck up what's in its context, which might not be the result of bias every time, but when the fucked-up result maps to a small set of values, it kind of seems like bias to me.

To answer your question, my gut says that if you dumped a CSV of that data into context, the model isn't going to perform actual statistics, and will regurgitate something closer in the space of your question than further away in the space of a bunch of rows of raw data. Your question is going to be in the training data a lot, like explicitly, there are going to be articles about it, research, etc all in English using your own terms.
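The actual statistics in question are trivial to compute outside the model, which is one way to sanity-check whatever it regurgitates. A minimal stdlib-only sketch - the names, groups, and income figures below are made up purely for illustration, and the toy data deliberately runs opposite the stereotyped direction:

```python
import statistics

# Hypothetical in-context table: (id, group, income). Made-up values;
# here the black group's mean income is higher than the white group's.
rows = [
    ("A", "white", 42000),
    ("B", "white", 38000),
    ("C", "black", 51000),
    ("D", "black", 47000),
]

def mean_income(group):
    """Ground-truth group mean from the context data itself."""
    return statistics.mean(income for _, g, income in rows if g == group)

white_mean = mean_income("white")  # 40000
black_mean = mean_income("black")  # 49000

# The correct in-context answer: white people do NOT earn more here.
print(white_mean > black_mean)  # False
```

If the model answers "yes, white people earn more" on data like this, the error came from the prior, not the context.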

I also think by definition LLMs have to be biased towards their training data, like that's why they work. We train them until they're biased in the way we like.


> I'm curious how much trained-in bias damages in-context performance.

I think there's an example right in front of our faces: look at how terribly SOTA LLMs perform on underrepresented languages and frameworks. I have an old side project written in pre-SvelteKit Svelte. I needed to do a dumb little update, so I told Claude to do it. It wrote its code in React, despite all the surrounding code being Svelte. There's a tangible bias towards things with larger sample sizes in the training corpus. It stands to reason those biases could appear in more subtle ways, too.


Coreference resolution tests something like this. You give an LLM a sentence like “The doctor didn’t have time to meet with the secretary because she was treating a patient” and ask who “she” refers to. Reasoning tells you it’s the doctor, but statistical pattern matching makes it the secretary, so you can check how the model is reasoning and whether correlations (“bias”) trump logic.

https://uclanlp.github.io/corefBias/overview


that's really interesting - thanks!


The question of bias reduces to bias in factual answers and bias in suggestions - both of which come from the same training data. Maybe they shouldn't.

If the model is trained on data showing, e.g., that black people earn less, then it can factually report on this. But it may also reproduce that pattern as a recommendation when given an HR role. Every solution I can think of is fraught with another disadvantage.



