Transformer
Also known as: Transformer Architecture, Transformer Model
A neural network architecture using self-attention mechanisms, forming the foundation of modern LLMs like GPT and Claude.
Transformers are a neural network architecture that revolutionized AI by using self-attention mechanisms to process sequential data in parallel.
Key Innovation
Unlike previous architectures (RNNs, LSTMs), transformers:
- Process entire sequences simultaneously
- Use attention to weigh relationships between all elements
- Scale efficiently with more compute and data
How Attention Works
The model learns to focus on relevant parts of the input when generating each output token—like how you might re-read earlier sentences to understand context.
Impact
Introduced in 2017, transformers now power:
- Language models: GPT, Claude, Gemini, Llama
- Image generation: Vision Transformers (ViT)
- Multimodal AI: Models handling text, images, audio
Why It Matters
Transformers enabled the scaling laws that make modern AI possible—more parameters and data yield predictably better performance.
External Resources
Related Terms
Related Writing
When Memory Became a Service
May 20, 2026
Mind the Gap: The Shrinking Lead of Closed-Source AI
March 21, 2026
Distributed Agency in Human-AI Systems: A Framework for Analyzing Authorship, Control, and Autonomy
March 18, 2026
The Loss of Terra Firma: Fragmented Consciousness
March 2, 2026
CLI FTW / Command Line Interface For The Win
February 28, 2026