As if we didn't have enough in-person racism to deal with, it turns out the chatbots might be racist too. Troubling reporting from the Washington Post revealed that one of the most widely used data sets used to train AI chatbots contains a ton of right-wing content.
For folks unfamiliar with artificial intelligence, AI programs like ChatGPT can't literally think for themselves. Instead, companies feed AI programs a massive amount of data scraped from all over the internet. The AI uses this data set to mimic human thought. So if your robot friend starts trying to share 9/11 conspiracy theories, chances are the data set had a little too much Alex Jones.
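To make the scraping problem concrete, here is a minimal sketch of how a training pipeline might exclude documents from known problem domains. The corpus, the blocklist, and the `filter_corpus` helper are all hypothetical illustrations, not code from any real data set like C4; real pipelines would need far more sophisticated quality filtering than a simple domain list.

```python
from urllib.parse import urlparse

# Hypothetical toy corpus: (source URL, text) pairs standing in for
# scraped web documents of the kind collected in large training sets.
corpus = [
    ("https://example-news.com/story1", "Local election results announced."),
    ("https://breitbart.com/post", "Content from a flagged site."),
    ("https://vdare.com/article", "Content from a flagged site."),
    ("https://example-blog.org/recipe", "A recipe for sourdough bread."),
]

# Assumed blocklist of domains (illustrative only).
BLOCKED_DOMAINS = {"breitbart.com", "vdare.com"}

def filter_corpus(docs, blocked):
    """Keep only documents whose source domain is not on the blocklist."""
    kept = []
    for url, text in docs:
        domain = urlparse(url).netloc.removeprefix("www.")
        if domain not in blocked:
            kept.append((url, text))
    return kept

clean = filter_corpus(corpus, BLOCKED_DOMAINS)
print(len(clean))  # 2 documents survive the filter
```

The point of the sketch is that nothing like this happens automatically: if no one writes the filter, whatever was scraped goes straight into training.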
And that, my friends, is precisely where the problem begins. According to the Washington Post investigation, the news websites used in one of the most widely used AI data sets include a ton of far-right and non-reputable sources. The data set in question is Google's C4 data set, which powers some of the largest AI models in the world, including Facebook and Google's AI models.
So where exactly are they getting their news from? Well, Breitbart is definitely on the list. The Russian-state propaganda website RT.com is also on there, alongside the anti-immigration group Vdare.com.
You don't have to take our word for whether Breitbart is pushing racism. In 2016, right-wing commentator Ben Shapiro expressed disdain for the website, saying it pushed "white ethno-nationalism" content. And if you're too far right for Ben Shapiro... you might want to start asking some tough questions. A massive concern is that AI programs don't always cite their sources, which means you could ask an AI a question and not know that the answer is coming from a right-wing site spewing hate.
MSNBC's Sarah Posner, who covers the right, called attention to just how dangerous having these inputs in the algorithm can be:
Anyone who has searched the web for information on a topic knows that it can sometimes land them on a site spewing bigoted content or disinformation. The building blocks of chatbots have been scraped from the same internet. An offended user can navigate away from a toxic site in disgust. But because the data collection for LLMs is automated, such content gets included in the "instruction" for them. So if an LLM includes information from sites like Breitbart and VDare, which publish transphobic, anti-immigrant and racist content, that information, or disinformation, could be incorporated in a chatbot's responses to your questions or requests for help.
The problem with AI (other than the inevitable day it takes over the world) is that it's a product of our own biases and judgments. And until we get a much better handle on that or at least put up better guardrails, the racist chatbots might be here to stay.