Intervention Category:

Content Analysis

Definition: Treating content differently based on its evaluation by automated classifiers.

Content Analysis is the pattern of applying policy-defined interventions (like takedowns, de-amplification, or shadow-bans) based on automated analysis of the content. This is what most people think of when they think about content moderation, and it tends to take up a significant amount of the time and energy of teams working in Trust and Safety.

Though the implementation of a content analysis intervention can vary widely, it typically has the same constituent elements:

  1. Automated Identification - using some automated process, find content or behavior that likely violates policy. Identification can be as complex as running a machine learning model, or as simple as looking for exact duplicates of past content (the approach used in the sketch after this list).
  2. Intervention - Use a content- or account-specific tool to minimize the impact of the likely violative content or behavior. This can be as draconian as banning accounts or taking down content, or as subtle as down-ranking the violative content.
  3. Appeals - Offer users some way to undo the mitigation through manual human review. These processes tend to be offered because no system of automated identification is right all of the time, and the cost of false positives can be high.
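
To make the three elements concrete, here is a minimal, hypothetical sketch in Python. It uses the simplest possible identification mechanism (exact-duplicate hashing against previously confirmed violative content); the function names, the takedown-only intervention, and the in-memory appeals queue are all illustrative assumptions rather than a prescribed design.

```python
import hashlib

# Hypothetical sketch of the three elements above. The identification step
# uses the simplest possible mechanism: exact-duplicate matching against
# hashes of previously confirmed violative content. Names such as
# KNOWN_VIOLATIVE_HASHES, intervene(), and the in-memory appeals queue are
# illustrative assumptions, not a prescribed design.

KNOWN_VIOLATIVE_HASHES: set[str] = set()  # SHA-256 digests of confirmed violations
appeals_queue: list[dict] = []


def identify(text: str) -> bool:
    """1. Automated Identification: exact-duplicate matching."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return digest in KNOWN_VIOLATIVE_HASHES


def intervene(post_id: str) -> None:
    """2. Intervention: here a takedown; could equally be a down-rank."""
    print(f"removing post {post_id}")


def file_appeal(post_id: str, user_id: str) -> None:
    """3. Appeals: route the automated decision to a human reviewer."""
    appeals_queue.append({"post_id": post_id, "user_id": user_id})


def moderate(post_id: str, user_id: str, text: str) -> None:
    if identify(text):
        intervene(post_id)
        file_appeal(post_id, user_id)  # the user can contest the decision
```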

Strengths

  1. Iterative - Because the detection mechanisms (and intervention approaches) of Content Analysis can be trivial or arbitrarily complex, it is easy for companies to stand up a content analysis process using trivial detections and interventions, and then refine and expand them over time.
  2. Flexible - The process of Content Analysis is easily aligned with the most pressing concerns of a company, allowing a single process and approach to be used across a wide range of harms and challenges.
  3. Necessary - Every platform ends up having to implement some kind of content analysis process for illegal material like Child Sexual Abuse Material, so applying that process to other forms of harm is a straightforward extension.

Weaknesses

  1. Subjectivity - Content Analysis explicitly discriminates between pieces of content, and is thus constantly buffeted by the winds of controversy and anger. Discrimination between pieces of content is easily perceived, and reframed, as discrimination between the beliefs, motivations, and identities of the people making the content.
  2. Errors - Automated analysis always has some false-negative and false-positive error rates: letting violative content through, and incorrectly punishing non-violative content. Building a system of automated analysis therefore necessitates downstream systems for mitigating the effects of its errors.
  3. Circumvention - Content Analysis suffers from offensive asymmetry: it is typically much easier for a malicious actor to create content that violates policy but fails to trigger detections than it is for a trust and safety team to develop an automated classifier that catches all such content.
  4. Binary - Most content analysis approaches generate binary outcomes, which leaves content that approaches the threshold ("borderline content") unmitigated.
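
A toy illustration of this weakness, with a made-up classifier score and threshold: content scoring just below the cutoff receives exactly the same treatment as content that is nowhere near it.

```python
# Toy illustration of the "binary" weakness. The classifier score and the
# 0.7 removal threshold are made up: a post scoring 0.69 is "borderline"
# but receives no mitigation at all.
REMOVAL_THRESHOLD = 0.7


def binary_decision(violation_score: float) -> str:
    return "remove" if violation_score >= REMOVAL_THRESHOLD else "no action"


print(binary_decision(0.71))  # remove
print(binary_decision(0.69))  # no action, despite being nearly identical
```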

Though content analysis is widespread, and likely to be required in some form, it is often tasked with cleaning up messes that it is underpowered to tackle. If a platform incentivizes and actively promotes vitriol, no amount of content analysis can undo the negative consequences of that design choice. This site is making an intentional and narrow argument: platforms should rely less on Content Analysis by rethinking and redesigning themselves to be less capable of, and less attuned to, the perpetuation of harm.

Interventions using this Approach

Perform basic link vetting
Run basic validation on the content that a link points to before showing the link to the user.
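
A minimal sketch of what such vetting might look like, assuming a hypothetical BLOCKED_DOMAINS set maintained by the platform; a real implementation would also consult URL-reputation feeds, resolve redirects, and inspect the fetched content.

```python
from urllib.parse import urlparse

BLOCKED_DOMAINS = {"malware.example", "phish.example"}  # hypothetical blocklist


def vet_link(url: str) -> bool:
    """Return True if the link is safe enough to show to the user."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # reject javascript:, data:, and other odd schemes
    host = (parsed.hostname or "").lower()
    return host not in BLOCKED_DOMAINS


print(vet_link("https://example.com/post"))  # True
print(vet_link("javascript:alert(1)"))       # False
```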
Warn Before Risky Action
Use signals about affinity and content to occasionally warn the user about what they're about to see/download/visit.
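
One possible shape for this, sketched with made-up signal names (content_risk, user_affinity) and arbitrary thresholds: warn when the content looks risky and the user has little history of engaging with that kind of material.

```python
def should_warn(content_risk: float, user_affinity: float) -> bool:
    """Warn when content is risky and the user rarely engages with it.

    Both signals and both thresholds are illustrative assumptions.
    """
    return content_risk > 0.6 and user_affinity < 0.2


if should_warn(content_risk=0.8, user_affinity=0.05):
    print("Heads up: this download has been flagged as risky. Continue?")
```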
Don't allow posting of Location
Use content filters to prevent users from posting addresses, latitude/longitude coordinates, or other location data.
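
A rough sketch of such a filter using regular expressions; the patterns below catch latitude/longitude pairs and simple street-address formats, and are illustrative only, since reliable address detection is considerably harder.

```python
import re

# Illustrative patterns: decimal lat/long pairs and simple street addresses.
LAT_LONG = re.compile(r"-?\d{1,3}\.\d{3,}\s*,\s*-?\d{1,3}\.\d{3,}")
STREET_ADDRESS = re.compile(
    r"\b\d{1,5}\s+\w+(\s\w+)*\s(Street|St|Avenue|Ave|Road|Rd|Blvd)\b",
    re.IGNORECASE,
)


def contains_location(text: str) -> bool:
    return bool(LAT_LONG.search(text) or STREET_ADDRESS.search(text))


print(contains_location("meet me at 40.7580, -73.9855"))  # True
print(contains_location("I live at 123 Main Street"))     # True
print(contains_location("no location here"))              # False
```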
Require unleaked passwords
During signup, don't allow users to use a password that has been included in a leaked dataset.
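
A minimal sketch of this check, assuming a hypothetical local file of SHA-1 hashes of leaked passwords; in practice the same check can be made against a breach-corpus service such as Have I Been Pwned.

```python
import hashlib


def load_leaked_hashes(path: str) -> set[str]:
    """Load SHA-1 hashes (one hex digest per line) of known-leaked passwords."""
    with open(path) as f:
        return {line.strip().upper() for line in f}


def is_leaked(password: str, leaked_hashes: set[str]) -> bool:
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest in leaked_hashes


# At signup (file path and error message are illustrative):
# leaked = load_leaked_hashes("leaked_password_sha1s.txt")
# if is_leaked(candidate_password, leaked):
#     raise ValueError("Choose a password that hasn't appeared in a breach.")
```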
Label/Detect Identical Content
For some features, duplicate data suggests misuse.
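
One simple way to detect exact duplicates is to hash normalized content and count repeats; the threshold below is an arbitrary illustration.

```python
import hashlib
from collections import Counter

seen_hashes: Counter[str] = Counter()
DUPLICATE_THRESHOLD = 3  # label once the same content appears this many times


def is_suspicious_duplicate(text: str) -> bool:
    digest = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
    seen_hashes[digest] += 1
    return seen_hashes[digest] >= DUPLICATE_THRESHOLD


print(is_suspicious_duplicate("Buy followers now!"))   # False (1st sighting)
print(is_suspicious_duplicate("Buy followers now!"))   # False (2nd sighting)
print(is_suspicious_duplicate("buy followers now! "))  # True (3rd, after normalization)
```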
Three Insult Rule
Rather than looking at whether individual pieces of content constitute harassment, consider patterns of behavior.
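
A hedged sketch of this pattern: instead of judging a single message, count the insults one user directs at another within a recent window and escalate once they cross three. The looks_like_insult placeholder stands in for whatever classifier or keyword list a platform actually uses, and the window length is an arbitrary assumption.

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 7 * 24 * 3600  # one week, an illustrative choice
insult_log: dict[tuple[str, str], list[float]] = defaultdict(list)


def looks_like_insult(text: str) -> bool:
    return "idiot" in text.lower()  # placeholder for a real classifier


def record_message(sender: str, target: str, text: str) -> bool:
    """Return True when the sender crosses the three-insult threshold."""
    if not looks_like_insult(text):
        return False
    now = time.time()
    recent = [t for t in insult_log[(sender, target)] if now - t < WINDOW_SECONDS]
    recent.append(now)
    insult_log[(sender, target)] = recent
    return len(recent) >= 3
```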