‘Ten students of diverse backgrounds’ helped develop algorithm
Scientists at the University of California, Berkeley, are developing a tool that uses artificial intelligence to identify “hate speech” on social media, a program that researchers hope will outperform human beings in identifying bigoted comments on Twitter, Reddit and other online platforms.
Scientists at Berkeley’s D-Lab “are working in cooperation with the [Anti-Defamation League] on a ‘scalable detection’ system—the Online Hate Index (OHI)—to identify hate speech,” the Cal Alumni Association reports.
In addition to artificial intelligence, the program will use several different techniques to detect offensive speech online, including “machine learning, natural language processing, and good old human brains.” Researchers hope “major social media platforms” will one day use the technology to detect “hate speech” and remove it, along with the users who spread it, from their networks.
Current technology relies mainly on “keyword searches,” which one researcher describes as “fairly imprecise and blunt.” Such algorithms can be fooled, for instance, by simply spelling words differently.
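A minimal sketch of that weakness, written in Python, is shown here. It is not the OHI's code or any platform's filter. The blocklist entry "badword", the substitution table and both function names are hypothetical, chosen only to show how an exact keyword match misses a respelled term.

```python
import re

# Hypothetical blocklist; "badword" stands in for any flagged term.
BLOCKLIST = {"badword"}

# Assumed character swaps a writer might use to dodge a filter.
SUBSTITUTIONS = str.maketrans(
    {"4": "a", "@": "a", "0": "o", "3": "e", "1": "i", "$": "s"}
)


def keyword_flag(text: str) -> bool:
    """Flag text only when a token exactly matches a blocklist entry."""
    tokens = re.findall(r"[a-z0-9@$]+", text.lower())
    return any(token in BLOCKLIST for token in tokens)


def normalized_flag(text: str) -> bool:
    """Same check, but undo the assumed character swaps first."""
    cleaned = text.lower().translate(SUBSTITUTIONS)
    tokens = re.findall(r"[a-z]+", cleaned)
    return any(token in BLOCKLIST for token in tokens)


print(keyword_flag("you are a badword"))        # exact spelling is caught
print(keyword_flag("you are a b4dw0rd"))        # respelled term slips past
print(normalized_flag("you are a b4dw0rd"))     # caught after normalization
print(normalized_flag("you are a b a d w o r d"))  # spacing still evades it
```

Even the normalized version is easy to evade, as the last line shows, which is one reason a keyword list alone is a blunt instrument.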
The OHI intends to address these deficiencies. Already, their work has attracted the attention and financial support of the platforms that are most bedeviled—and that draw the most criticism—for hate-laced content: Twitter, Google, Facebook, and Reddit…
D-Lab initially enlisted ten students of diverse backgrounds from around the country to “code” the posts, flagging those that overtly, or subtly, conveyed hate messages. Data obtained from the original group of students were fed into machine learning models, ultimately yielding algorithms that could identify text that met hate speech definitions with 85 percent accuracy, missing or mislabeling offensive words and phrases only 15 percent of the time.
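To make the 85 percent figure concrete, the sketch below shows one common way such an accuracy number is computed: comparing a model's labels with human coders' labels on the same comments. The twenty labels are invented for illustration and are not D-Lab data; the function names are hypothetical.

```python
def accuracy(predicted: list[int], human: list[int]) -> float:
    """Share of comments where the model's label matches the human label."""
    if len(predicted) != len(human) or not human:
        raise ValueError("label lists must be non-empty and the same length")
    matches = sum(p == h for p, h in zip(predicted, human))
    return matches / len(human)


def error_counts(predicted: list[int], human: list[int]) -> tuple[int, int]:
    """Return (misses, false alarms) relative to the human labels."""
    misses = sum(h == 1 and p == 0 for p, h in zip(predicted, human))
    false_alarms = sum(h == 0 and p == 1 for p, h in zip(predicted, human))
    return misses, false_alarms


# Invented labels for 20 comments: 1 = coded as hate speech, 0 = not.
human = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(accuracy(predicted, human))      # 17 of 20 labels agree
print(error_counts(predicted, human))  # hate comments missed, clean comments flagged
```

In this made-up sample the remaining 15 percent combines two different kinds of error: hate comments the model missed and harmless comments it flagged. A single accuracy figure does not say how the errors split between the two.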
Though the initial ten coders were left to make their own evaluations, they were given survey questions (e.g. “…Is the comment directed at or about any individual or groups based on race or ethnicity?”) to help them differentiate hate speech from merely offensive language. In general, “hate comments” were associated with specific groups while “non-hate” language was linked to specific individuals without reference to religion, race, gender, etc. Under these criteria, a screed against the Jewish community would be identified as hate speech while a rant, no matter how foul, against an African-American celebrity might get a pass, as long as his or her race wasn’t cited.
One researcher warned against the possibility of inadvertent censorship: “Unless real restraint is exercised, free speech could be compromised by overzealous and self-appointed censors.” The lab is thus “working to minimize bias with proper training and online protocols that prevent operators from discussing codes or comments with each other.”
IMAGE: Ashley Marinaccio / Flickr.com