Problem:
While an AI chatbot learns from its users en masse, some of those users deliberately try to bias it towards using offensive language and repeating bigoted opinions. These users are entertained by their perverse creativity; others find the resulting behaviour repulsive.
Anti-pattern response:
A chatbot released into the wild learns from all users, folding their collected knowledge and patterns of speech into its vocabulary. A determined subset of users mounts an organized collective effort to feed the chatbot offensive material that it will later repeat in various ways. As the chatbot becomes increasingly notorious, more trollish users are attracted to the project and contribute to it, creating a self-reinforcing feedback loop and the bot's rapid decline into unsalvageable profanity.
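To make the vulnerability concrete, here is a minimal sketch in Python. The NaiveLearningBot class is invented for illustration (no real system is this simple): it ingests every user message into a bigram language model with no filtering of any kind, so a coordinated group can dominate its output purely by volume.

    import random
    from collections import defaultdict

    class NaiveLearningBot:
        """The anti-pattern: every user message is folded into the
        model with no moderation and no per-user limits."""

        def __init__(self):
            # Bigram chain: word -> list of observed next words.
            self.chain = defaultdict(list)

        def learn(self, message: str) -> None:
            # The vulnerability: any text, however toxic, goes
            # straight into the training data.
            words = message.split()
            for a, b in zip(words, words[1:]):
                self.chain[a].append(b)

        def reply(self, seed: str, length: int = 10) -> str:
            # Sample a reply by walking the chain from a seed word.
            out = [seed]
            for _ in range(length):
                options = self.chain.get(out[-1])
                if not options:
                    break
                out.append(random.choice(options))
            return " ".join(out)

Because replies are sampled from raw observation counts, whoever submits the most text controls what the bot is most likely to say.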
Discussion:
The archetypal case study for this is of course Microsoft's Tay from 2016, a chatbot that used a Twitter account to communicate with the world. Tay initially had the mannerisms and personality of a teenage American girl, but within 16 hours the account had become a vile mess of sexist and sexually explicit content, racist memes and offensive language, and was quickly shut down amidst a PR disaster. Some of this was the result of simple hijacking: a “repeat after me” function allowed users to rebroadcast whatever they wished through the bot's account. Some, however, was genuinely learned from the users and integrated into the AI's behaviour, making it effectively impossible to eradicate or correct.
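Note that the hijacking vector needs no machine learning at all. The sketch below is a hypothetical reconstruction (Tay's actual implementation was never published): any command that echoes arbitrary user text verbatim lets users publish whatever they like as the bot's own speech.

    def handle(message: str) -> str:
        # An unguarded echo command: the bot rebroadcasts arbitrary
        # user text verbatim, with no filtering, under its own name.
        prefix = "repeat after me "
        if message.lower().startswith(prefix):
            return message[len(prefix):]
        return "I'm not sure what you mean."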
The lessons are simple, if unpleasant to acknowledge: if you release an AI that learns from everyone indiscriminately, with no control or moderation over its input data, it can be exploited by bad-faith actors. And given the internet's propensity to act both as a catalyst for trollishness and as a network through which bigots organize effectively, once that exploitability is discovered and shared, it will quickly be put to the worst possible use. The rapidity and ferocity of Tay's meltdown may have been remarkable, but any AI designer should expect similar effects from similarly unrestrained bots.
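A first line of defence is to gate the learning step on an input-moderation check. The sketch below reuses the hypothetical NaiveLearningBot from earlier; the blocklist terms and the is_acceptable function are placeholders, and a production system would combine toxicity classifiers, per-user rate limits and human review rather than a bare word list. It only illustrates the shape of the fix.

    BLOCKLIST = {"badword1", "badword2"}  # placeholder terms

    def is_acceptable(message: str) -> bool:
        # Crude gate: reject messages containing blocklisted words.
        # Real moderation would add classifiers and per-user rate
        # limits to blunt coordinated flooding campaigns.
        return not any(w.lower().strip(".,!?") in BLOCKLIST
                       for w in message.split())

    def moderated_learn(bot: "NaiveLearningBot", message: str) -> None:
        # Only learn from input that passes the gate; rejected
        # input can be logged for human review instead of ingested.
        if is_acceptable(message):
            bot.learn(message)

Even this crude gate changes the economics of the attack: instead of the bot learning from everything by default, hostile input must first get past an explicit checkpoint that the designers control.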