Inference

Also known as: Model Inference, AI Inference, Prediction

The process of running a trained AI model to generate predictions or outputs from new inputs—the 'using' phase as opposed to training.

Inference is the phase in which a trained AI model processes new inputs to generate outputs: predictions, text, images, or decisions.

Training vs. Inference

| Training | Inference |
| --- | --- |
| Learns patterns from data | Applies learned patterns |
| Computationally expensive | Relatively lightweight |
| Done once (or periodically) | Done continuously |
| Requires labeled data | Processes new inputs |
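
The split is easy to see in code. Below is a minimal sketch using scikit-learn as a stand-in for any framework; the dataset and model are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative labeled dataset standing in for real training data.
X, y = make_classification(n_samples=1000, random_state=0)

# Training: learns patterns from labeled data (expensive, done once).
model = LogisticRegression(max_iter=1000).fit(X, y)

# Inference: applies the learned patterns to a new input (cheap, done
# continuously -- in production this line runs on every request).
prediction = model.predict(X[:1])
```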

Infrastructure

Inference can run on:

  • Cloud APIs: OpenAI, Anthropic, Google (see the sketch after this list)
  • Edge devices: Phones, IoT, embedded systems
  • On-premise: Private servers for data security
  • Specialized hardware: GPUs, TPUs, inference chips
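
With a cloud API, inference is just a network call to a hosted model. A minimal sketch, assuming the official openai Python SDK (v1+) and an OPENAI_API_KEY in the environment; the model name is illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One inference request: the provider runs the trained model on our
# input and returns the generated output.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Define inference in one sentence."}],
)
print(response.choices[0].message.content)
```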

Costs

While a single inference call is far cheaper than a training run, inference costs add up at scale. Common optimization techniques include:

  • Model quantization (smaller numeric precision; see the sketch after this list)
  • Batching requests
  • Caching common responses
  • Smaller, distilled models
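
As one example of quantization, PyTorch's dynamic quantization stores weights at lower precision so each request does less work. A minimal sketch, assuming PyTorch; the toy model is a hypothetical stand-in for a real trained network:

```python
import torch
import torch.nn as nn

# Toy model as a hypothetical stand-in for a real trained network.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Dynamic quantization: store Linear-layer weights as int8 instead of
# float32, shrinking the model and speeding up CPU inference at a small
# cost in accuracy.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)    # one incoming request
print(quantized(x).shape)  # same interface, lighter-weight inference
```

The tradeoff is the same across these techniques: give up a small amount of accuracy or flexibility in exchange for lower cost per request.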