Technology Policy

AI Safety

Also known as: AI Alignment, Safe AI, Beneficial AI

Research and practices aimed at ensuring AI systems behave as intended and don't cause unintended harm, from near-term risks to existential concerns.

AI safety encompasses efforts to ensure AI systems are beneficial, controllable, and aligned with human values—spanning immediate and long-term risks.

Risk Categories

Near-term:

  • Bias and discrimination
  • Misinformation at scale
  • Job displacement
  • Privacy violations
  • Autonomous weapons

Long-term:

  • Misaligned superintelligence
  • Loss of human control
  • Concentration of power
  • Unintended optimization

Research Areas

  • Alignment: Making AI pursue intended goals
  • Interpretability: Understanding model decisions
  • Robustness: Handling edge cases safely
  • Governance: Institutional controls
  • Evaluation: Measuring safety properties

Key Organizations

  • Anthropic, OpenAI (safety teams)
  • Center for AI Safety
  • Machine Intelligence Research Institute (MIRI)
  • Center for Human-Compatible AI (CHAI)

Debate

Disagreement exists on which risks matter most, whether existential risk is imminent, and how to balance safety with capability development.

External Resources