Three weeks ago on my substack, I mused about why AI is a poor model for knowledge storage, in part because it has to resolve vast disagreements between texts while providing a single “definitive” answer. I drew on an anecdotal example from my use of Microsoft Copilot and DALL·E 3, which refused to generate an image of “a queer person.” As it turns out, I’m not the only person who noticed this alarming bug: so did the news team at Nature.
We’ve all seen the headlines about AI generating hateful content. In 2016, Microsoft’s AI chatbot started spewing hate speech. A recent academic analysis of image generation algorithms found that when prompted for depictions of queer and trans people, the resulting images were “stereotypes and smut.”
Under capitalism, AI is fundamentally a commercial product, and hate-filled content is bad PR (for a large tech company). So, companies take the cheapest approach to solving the issue: instituting a “safety system” that prevents the algorithm from engaging with potentially provocative content. The above instance of Microsoft Copilot refusing to “make an image of a queer person” is an example of an AI engaging its safety system. For whatever reason, the designers of Copilot have decided that “queer” is off-limits for their product. While this is likely due to the word’s history as an offensive slur, it completely ignores the ways in which folx have reclaimed it as an affirming self-descriptor.
Another approach to combating AI-generated hate is to filter the data used to train the algorithms. In fact, this is what Google did to train their T5 large language model (LLM). The effort started in 2019 when Google acquired a snapshot of the Internet (i.e., a scrape of the entire Internet). Before using the data to train T5, they first “cleaned” the data set.
Google’s approach to cleaning was very simple: They removed any entry that contained a word or phrase in the List of Dirty, Naughty, Obscene, and Otherwise Bad Words (which was first generated by Shutterstock to prevent their search feature from suggesting obscene keywords). I looked over the list of English entries (lists are also available for 27 other languages), and generally most words were either slurs or explicit sexual acts. But many words relate to queer experiences and/or sexuality, including “gay,” “kinky,” “se,” “twink,” and “dominatrix.”
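To make the mechanics concrete, here is a minimal sketch of what this kind of blocklist filtering looks like, assuming a plain-text list with one banned term per line. This is an illustration of the general technique, not Google’s actual cleaning pipeline; the function names and the tiny example blocklist are my own stand-ins.

```python
# Illustrative sketch of blocklist filtering (not Google's actual pipeline).
# Assumes a plain-text blocklist with one banned word or phrase per line,
# like the English file in the List of Dirty, Naughty, Obscene, and
# Otherwise Bad Words repository.
import re

def load_blocklist(path: str) -> set[str]:
    """Read one lowercase banned term per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def is_clean(document: str, blocklist: set[str]) -> bool:
    """Keep a document only if NO blocklisted term appears anywhere in it."""
    text = document.lower()
    return not any(re.search(rf"\b{re.escape(term)}\b", text) for term in blocklist)

# Example: the whole page is discarded if a single listed word appears,
# regardless of context.
blocklist = {"gay", "twink", "dominatrix"}  # tiny stand-in for the real list
docs = [
    "A moving essay about coming out as gay in a small town.",
    "A recipe for lemon bars.",
]
kept = [d for d in docs if is_clean(d, blocklist)]
print(kept)  # only the recipe survives the filter
```

Notice that the filter drops an entire document on a single keyword match, no matter the context, which is exactly how an essay about queer life disappears from the training data right alongside actual smut.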
I’m not saying you have to go explain these words to your children. Instead, you should consider what was removed from the data and, therefore, what many AI algorithms are blind to: A moving article describing someone’s coming-out story. Academic treatises on the difference between gender and sex. The entire two-season run of the Netflix dark comedy Bonding.
Because of this design choice, Google’s T5 LLM lacks a lot of knowledge about queerness. A former Google employee who helped generate the data set told Nature that the team consciously chose the List of Dirty, Naughty, Obscene, and Otherwise Bad Words as their filter because it was “overly conservative.” And it’s not just T5 that was trained on Google’s cleaned data. Meta used the same data set to train its LLM, Llama.
In an uncanny way that only AI can achieve, this expressed rationale mirrors a common feeling held by cis folks when interacting with trans people: the paralyzing dread of saying the wrong thing. In a conversation, I can tell when the other person begins to feel nervous that they have said, or will inadvertently say, something upsetting or “wrong” about trans-ness. At this point, folks tend to withdraw, and any chance of moving into a deeper conversation evaporates.
In fact, Will Ferrell has said that this was his biggest fear going into his latest project: Will & Harper, a documentary of Ferrell on a cross-country road trip with his longtime friend and collaborator Harper Steele, who recently transitioned. As the documentary unfolds, we watch Ferrell learn to set his fears aside in order to navigate this new terrain in his friendship with Harper. For her part, Harper honestly answers vulnerable questions for us all to see, and her courage and joy are central to the moral arc of the documentary.
I wasn’t planning on watching Will & Harper, but last week my aunt started sending write-ups of the film to me. Elf is canon in our family, so Ferrell using his celebrity in this way caught her eye. Sometimes I can sense her hesitancy to make a “mistake” around me, so I viewed this as an opportunity to grow closer. In the end, I enjoyed the film much more than I thought I would (even as another portrayal of white bourgeois transfemininity). Harper’s growth is tangible as the trip progresses, and Ferrell overcomes his own trepidation in a heartfelt way (interspersed with his goofy charm). Together, they model dialogues that trans folx will find familiar but that cis people tend to be uncomfortable with.
What does this have to do with AI? Nothing at all. An AI’s safety system (and immaterial nature) would prevent it from even getting in the car with Harper. And an AI wouldn’t text me the trailer to Will & Harper out of the blue. The critical dialogues would never happen.
AI currently exists (primarily) as a set of commercial products offered by tech companies to help their bottom line. These companies are motivated by profit, and their profits decrease when their products start spewing hate. It’s bad press. The cheapest option is to build in blunt safeties, which systematically blind the algorithm to certain topics, including queerness.
So if AI is the future of humanity, then why aren’t queer people in it?
This piece was originally published on Ev Nichols’ substack, Queer Science Lab.