Transformer
Also known as: Transformer Architecture, Transformer Model
A neural network architecture using self-attention mechanisms, forming the foundation of modern LLMs like GPT and Claude.
Transformers are a neural network architecture that revolutionized AI by using self-attention mechanisms to process sequential data in parallel.
Key Innovation
Unlike previous architectures (RNNs, LSTMs), transformers:
- Process entire sequences simultaneously
- Use attention to weigh relationships between all elements
- Scale efficiently with more compute and data
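To make the first two points concrete, here is a minimal NumPy sketch (toy shapes and random weights, purely illustrative): an RNN must step through the sequence because each hidden state depends on the previous one, while self-attention touches every position with a single set of matrix multiplications.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 6, 4
x = rng.normal(size=(seq_len, d))            # one toy input sequence
W = rng.normal(size=(d, d))
U = rng.normal(size=(d, d))

# RNN: inherently sequential; step t cannot run until step t-1 finishes.
h = np.zeros(d)
rnn_states = []
for t in range(seq_len):
    h = np.tanh(x[t] @ W + h @ U)
    rnn_states.append(h)

# Self-attention: every position attends to every other position at once,
# so the whole sequence is handled by parallel matrix products (no loop).
scores = x @ x.T / np.sqrt(d)                # all pairwise similarities
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
attn_out = weights @ x                       # (seq_len, d)
```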
How Attention Works
When generating each output token, the model learns to focus on the most relevant parts of the input, much as you might re-read earlier sentences to understand context.
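Concretely, the core operation is scaled dot-product attention, softmax(QKᵀ / √d_k) V. Here is a minimal single-head NumPy sketch; real transformers add learned projection matrices, multiple heads, and (for generation) causal masking:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) arrays of query, key, and value vectors.
    d_k = Q.shape[-1]
    # Score every query against every key; scale to keep the softmax stable.
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len)
    # Softmax over keys turns each row of scores into attention weights.
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = exp / exp.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of the value vectors.
    return weights @ V, weights

# Toy self-attention: 4 tokens with 8-dimensional embeddings, Q = K = V.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(x, x, x)
print(w.round(2))
```

Each row of `w` sums to 1 and shows how strongly that token attends to every other token, which is the "re-reading" intuition made literal.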
Impact
Introduced in the 2017 paper "Attention Is All You Need" (Vaswani et al.), transformers now power:
- Language models: GPT, Claude, Gemini, Llama
- Computer vision: Vision Transformers (ViT) for image classification and recognition
- Multimodal AI: Models handling text, images, audio
Why It Matters
Transformers enabled the scaling laws that make modern AI possible: more parameters, data, and compute yield predictably better performance.
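For a sense of what "predictably" means: in the form reported by Kaplan et al. (2020), test loss falls as a power law in model size, where the constants N_c and α_N are empirically fitted (analogous laws hold for dataset size and training compute):

```latex
% Power-law scaling of test loss L with parameter count N
% (Kaplan et al., 2020; N_c and \alpha_N are fitted constants)
L(N) \approx \left( \frac{N_c}{N} \right)^{\alpha_N}
```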