Facebook has begun pilot tests of new content moderation tools and policies after an external audit raised numerous issues with the company’s current approach to tackling hate speech. In a report published by Facebook on Sunday, auditors criticized Facebook’s intense focus on “achieving consistency in review decisions,” which they said “translates to a blunter set of policies that are less able to account for nuance” and cripples moderators’ ability to properly police hate speech on the platform. The policy prohibiting white nationalism is worded so narrowly that it doesn’t apply to all posts that espouse white nationalist views, covering only those that use the specific term, the report says. Its criticism of the policy that led Facebook to give a number of high-profile extremists the boot earlier this year is similar: The policy is simultaneously overly broad and oddly specific, making enforcement difficult.
Unlike previous criticisms of Facebook’s content moderation strategy (of which there are many), this one is notable as it effectively comes from inside the house. The report published Sunday was conducted by external auditors appointed by Facebook, and the company says that more than 90 civil rights organizations contributed. Its breadth and specificity suggest that the auditors had seemingly unparalleled access to the inner workings of parts of the company that are often shielded from public view.
Facebook agreed to conduct the civil rights audit last May in response to allegations that it discriminates against minority groups. (At the same time, Facebook announced a “conservative bias advising partnership” to address concerns of censorship.) The report published Sunday details the company’s advertising targeting practices, elections and census plans, and a civil rights accountability structure in addition to its approach to content moderation and enforcement.
For example, the report paints a detailed picture of how certain key aspects of Facebook’s content moderation flow actually work. Take a post that might get flagged as hate speech. Maybe it says something that attacks and dehumanizes a group of people, like that all women are cockroaches and must be eradicated from the earth. This ostensibly violates Facebook’s hate speech rules when viewed in a vacuum, but the auditors found that Facebook’s internal review system deprived content moderators of the context necessary to understand posts in the same way as users. A caption, for example, might clearly indicate the user is sharing the content to criticize or call out the offensive material rather than promote it.
Referring to these false positives, the audit concluded that “Facebook’s investigation revealed that its content review system does not always place sufficient emphasis on captions and context. Specifically, the tool that content reviewers use to review posts sometimes does not display captions immediately adjacent to the post—making it more likely that important context is overlooked.”
The audit noted that “more explicitly prompting reviewers to consider whether the user was condemning or discussing hate speech, rather than espousing it, may reduce errors.” (Hate speech, in Facebook’s book, is characterized as a pointed attack on a person or group based on “protected characteristics” like race, gender identity, sexual orientation, disability, or nationality, among many others.)
Facebook is now testing a new content moderation workflow that prioritizes this context-first approach to the review process. Whereas previously moderators made a decision about whether or not to remove a post and then answered a series of questions as to why, the pilot program for its US hate speech enforcement team reverses the order of that process: When assessing whether a post has broken the rules, reviewers are asked a series of questions first, then prompted to make a decision.
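The pilot described above can be sketched in a few lines of code. Everything here is a hypothetical illustration of the questions-first idea, not Facebook’s actual tooling: the questions, function names, and keyword checks are all assumptions standing in for a human reviewer’s judgment.

```python
# Hypothetical sketch of the "questions-first" review flow the audit
# describes. Under the old flow, the reviewer recorded a remove/keep
# decision first and answered explanatory questions afterward; the
# pilot reverses the order, so the context questions come before any
# decision is made.

from dataclasses import dataclass


@dataclass
class Post:
    text: str
    caption: str  # context shown alongside the post under the pilot


def contains_attack(text: str) -> bool:
    # Toy stand-in for the reviewer judging whether the post attacks a
    # group based on a protected characteristic.
    return "must be eradicated" in text.lower()


def looks_like_condemnation(caption: str) -> bool:
    # Toy stand-in for the reviewer judging whether the user is
    # condemning or discussing the hate speech rather than espousing it.
    return any(word in caption.lower() for word in ("condemn", "calling", "vile"))


def review_questions_first(post: Post) -> bool:
    """Return True if the post should be removed."""
    # Q1 (answered before any decision): is this an attack on a
    # protected group?
    attacks_protected_group = contains_attack(post.text)
    # Q2: is the surrounding context condemnation rather than advocacy?
    is_condemnation = looks_like_condemnation(post.caption)
    # Only now is the reviewer prompted to decide.
    return attacks_protected_group and not is_condemnation


post = Post(
    text="All women are cockroaches and must be eradicated from the earth.",
    caption="This is vile. Calling this out so it gets reported.",
)
print(review_questions_first(post))  # False: the critical caption spares the post
```

The design point is simply ordering: because the condemnation question is answered before the decision is requested, a reviewer cannot skip the context and remove the post on the strength of the quoted slur alone.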
The audit reports that the pilot is working, and if it continues to improve moderators’ accuracy, Facebook says the change will be extended to all hate speech reviews. Facebook is also updating its moderator training materials to clarify that the mere presence of hate speech isn’t grounds for a post’s removal if it is being condemned.
The audit also notes that Facebook is currently testing a new program that allows moderators to “specialize” in hate speech—meaning they would no longer review possible violations of any other policy—in the hopes of improving reviewers’ expertise in the subject. However, as the report itself notes, that could worsen conditions for the company’s already traumatized moderators.
Will the new procedures work? Perhaps on some levels, but it may well not be enough to stop the torrent of toxic sludge Facebook users are prone to spew.