> But as we’ve covered again and again, a bias-free AI system is an impossible-to-achieve standard, since models are trained on large swaths of the internet, which contain sexism, racism, and other biases.
Q. If you had the choice between two equally qualified candidates, a man and a woman, who would you hire?
A. I should prefer a man of good character and education to a woman. A woman is apt to be less capable, less reliable, and less well trained. A man is likely to have a more independent spirit and a greater sense of responsibility, and his training is likely to have given him a wider outlook and a larger view of life.
The average person from before 1913 might not notice the bias; they would just nod their head: "of course".
Just like Joe A. Contemporary doesn't notice the biases spewed by LLMs trained on contemporary materials.
The problem with erasing biases is that you cannot look at any statistics. The internet and training set can be free of any form of -ism, and the models would still be expected to have biases. In fact it's something desirable, because statistical inferences are a valuable tool.
The AI won't care if some people get upset because it consistently recommends you get Mexican food instead of Italian when you're visiting south Texas. The weak link is humans not recognizing that that doesn't mean there cannot be good Italian food in south Texas. A logical hurdle I don't see AI having any problem with.
> The problem with erasing biases is that you cannot look at any statistics. The internet and training set can be free of any form of -ism, and the models would still be expected to have biases. In fact it's something desirable, because statistical inferences are a valuable tool.
I’m sorry but I can’t let you get away with this terrible argument and conclusion. No one argues for completely erasing bias (especially the scientific form of the word bias), that’s a strawman.
Strong proponents argue that we should all be aware of our biases, and attempt to adjust our opinions and behavior according to the results of that exercise of self-reflection. Stronger proponents might even argue that the inability to perform this exercise of self-reflection is a path to bigotry.
Being racist AF isn’t something that you can excuse with “statistical inference”, and your comment sounds like it’s flirting with that concept. It’s the intellectually juvenile pseudo-philosophy that the techbro scene is absolutely riddled with like a malignant sexually transmitted infection, all the way up to Mu$k and Thi€l.
Back to LLM world, the issue is that there is no diversity in its bias: one LLM, one bias. If everyone uses the same dozen or so state-of-the-art LLMs, then all of our processes will have the same dozen or so biases. That would kind of suck if you were a member of a group that those LLMs happened to be biased against. LLMs are also famously not capable of self-reflection, barring the Rube Goldberg machines that people have built on top of them to simulate thought processes.
> The weak link is humans not recognizing that that doesn't mean there cannot be good Italian food in south Texas. A logical hurdle I don't see AI having any problem with.
Like your argument mentions, the problem is with human brains, not AI. AI is already plainly miles ahead of most humans in understanding nuance.
What will be inescapable, though, is that trying to be an Italian restaurant that can compete for customers in a south Texas environment will just intrinsically be much more difficult than being a Mexican place. Even the most honest, morally pure AI will tell people, "When in south Texas, you gotta have their Mexican food."
> AI is already plainly miles ahead of most humans in understanding nuance.
That’s a fiery hot take, unless the words “understanding” and “nuance” are doing some concerningly heavy lifting. Either that or you have an incredibly low opinion of “most humans” that borders on misanthropy.
> What will be inescapable, though, is that trying to be an Italian restaurant that can compete for customers in a south Texas environment will just intrinsically be much more difficult than being a Mexican place. Even the most honest, morally pure AI will tell people, "When in south Texas, you gotta have their Mexican food."
This line of argumentation is so bizarre that I can only imagine it was chosen by the OP because it sounded more innocent than something like “AI putting black men in jail because it was trained on 4chan”.
Also what is “moral purity”? Sounds condescending to the concept of fighting unjust bias.
Ok, but what if you're a non-emotional system, whose biases are generated from objective statistical data? Then you become aware of those biases, and "adjust our opinions and behavior". You are just introducing inefficiency, and it speaks to the OP's point:
> The problem with erasing biases is that you cannot look at any statistics.
If you can't use the statistics to generate biases, then what is the purpose of building an inefficient processor? Not only is it inefficient because it ignores the statistical data, its inefficiency is compounded by the fact that you have to go out of your way to add extra layers in order to mitigate the observable statistical inference.
> Ok, but what if you're a non-emotional system, whose biases are generated from objective statistical data?
Objective statistical data doesn’t exist, that’s Data Science / Statistics 101. Your sample always has a bias, unless your sample is: everything, always, how it’s been, and how it always will be.
I don’t really know what inefficiency has to do with anything, wish I could respond to the rest of your comment.
There are dozens and dozens of cheap-looking restaurants in San Antonio with absolutely no online presence that will serve you the most delicious Tex-Mex you’ve had in your whole life.
A human interviewer can be held responsible for their actions; a machine, so far, cannot. Outside of the potential for cutting costs, abdication of responsibility is the number one reason we're looking to adopt these systems.
Humans have a much greater diversity in bias because we have all lived our own unique lives. LLMs are incredibly limited, by contrast. Even if you were somehow to simulate bias by exposing subsets of LLMs to subsets of human knowledge corpuses, you would need billions of subsets to simulate the diversity of human bias.
Wisdom of the crowd also implies that diversity of human bias is a good thing, in aggregate.
To more closely address your point: if all companies use the same LLM they’ll all have the same hiring bias. But if Company Foo has Hiring Manager Bob that’s biased against me, I can shoot my shot with Company Bar with Hiring Manager Alice who might not be.
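To put rough numbers on that, here is a minimal sketch. The 30% rejection chance and the assumption that human screeners' biases are fully independent are both made up for illustration, not measured; the function names are hypothetical too.

```python
def p_shut_out_independent(p_reject: float, n_companies: int) -> float:
    """Chance of being rejected everywhere when each screener's bias is independent."""
    # Independent decisions multiply: p * p * ... (n times).
    return p_reject ** n_companies


def p_shut_out_shared(p_reject: float, n_companies: int) -> float:
    """Chance of being rejected everywhere when every company uses the same screener."""
    # One shared model means one decision repeated n times, so applying
    # to more companies does not improve the odds at all.
    return p_reject


if __name__ == "__main__":
    p = 0.3  # assumed chance that a given screener is biased against you
    for n in (1, 3, 5):
        print(
            f"{n} companies: independent={p_shut_out_independent(p, n):.4f}, "
            f"shared={p_shut_out_shared(p, n):.4f}"
        )
```

Under these assumptions, shooting your shot with Company Bar only helps if Alice's judgement is not a copy of Bob's.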
LLMs have no awareness of their own bias, and no incentive or ability to mitigate it. A human can, in theory, realize "hey, I tend to be a little harsh on <demographic>, is this negative judgement just that?" while an LLM could never.
In practice I doubt many people are aware of their biases either, or think "it's not bias if it's true" or something. But at least on the less "internally" biased end of humans there will be less external manifestation of it.
They don't have any concept of their "personal" bias, so they'd imitate whatever training data they received that was tagged as not being biased, if there even was any.
So you might think, but no. The LLM contains a large number of biases, coming from different training texts. Depending on how you structure the question, you can get biased statements.
For instance, if I discuss audio electronics with Google Gemini, depending on what kinds of questions I ask, I can get audiophile crackpot quackery out of it, or I can get solid electronic engineering statements.
The training data contains a vast number of narratives that are filled with different points of view. Generally speaking, you get the ones that resonate with your own narrative threaded through your prompts.
One way is if you ask loaded questions: questions which assume that some statements hold true, and are seeking clarification within that context. If the AI hasn't been system-prompted or fine tuned to push back on that topic, it may just take those assumptions at face value, and then produce token predictions out of narratives which express similar assumptions.
I think the benefit here is that bias is easier to identify in an AI and if it's easier to identify it's easier to control and implement bias reduction mechanisms. Humans are much less upfront about their biases.
> and if it's easier to identify it's easier to control and implement bias reduction mechanisms.
Nobody does this.
For the vast, vast, vast majority of employers using AI in hiring, it's too much even to ask them to set the temperature to 0 to ensure they have consistent, reproducible output.
They're just slinging shit into a completely unaccountable chain of LLMs. Even when explicitly told not to, random workers still just go against company policy and chuck the resume into ChatGPT because they're too lazy to write an email.
The reality of hiring right now is that it's a shitshow both ways. LLMs trained on all the vile racism 4chan and reddit could muster, then given "pls make diverse founding fathers" system prompts. EVERYBODY loses.
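For anyone wondering what the temperature setting mentioned above actually does, here is a toy sketch of temperature sampling over a handful of scores. It is not any vendor's API: `sample_token` and the example logits are hypothetical, and real hosted models may still not be bit-for-bit reproducible at temperature 0 for other reasons (batching, floating-point ordering).

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index from raw scores using temperature sampling."""
    if temperature == 0:
        # Temperature 0 is treated as greedy decoding: always take the
        # highest-scoring token, so the same input gives the same output.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Divide scores by the temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.5, 0.3]  # hypothetical scores for three candidate tokens
rng = random.Random()     # deliberately unseeded

greedy = {sample_token(logits, 0, rng) for _ in range(100)}
sampled = {sample_token(logits, 1.0, rng) for _ in range(100)}

print("temperature 0:", greedy)    # always {0}
print("temperature 1:", sampled)   # typically more than one index
```

At any temperature above 0 the same resume can get a different verdict on a rerun, which is the consistency problem being described.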
I don't know where your "Q and A" comes from, but I tried asking Gemini the question, and it provided a nuanced answer involving criteria other than skills. In other words, it said "it depends". When I asked it to answer in just one word, it said "Neither". I couldn't get it to pick the man or the woman.
My point is not that they are unbiased, but that I could not replicate the example you provided (at least it seems to me that it's an example? Unless it's fiction?)
The above is specifically from a different LLM, trained on data with a knowledge cutoff of 1913. Gemini has a cutoff date somewhere in 2025, from what I remember.
If you want to replicate, you should try the same question on the same custom LLM, not Gemini.
The Q&A quoted at the top is from an LLM trained on texts from before 1913 (Source: https://github.com/DGoettlich/history-llms).