Inference
Also known as: Model Inference, AI Inference, Prediction
The process of running a trained AI model to generate predictions or outputs from new inputs—the 'using' phase as opposed to training.
During inference, a trained model takes new, unseen inputs and produces outputs: predictions, text, images, or decisions.
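Conceptually, a trained model is just a function whose parameters were fixed during training; inference applies that function to fresh data. A minimal sketch in NumPy (the weight values here are invented for illustration):

```python
import numpy as np

# A "trained" linear classifier is nothing more than fixed parameters
# (weights and bias) learned earlier. These values are made up.
weights = np.array([[0.8, -0.4], [-0.3, 0.9]])  # learned weights
bias = np.array([0.1, -0.1])                    # learned bias

def infer(x: np.ndarray) -> int:
    """Inference: apply the frozen, learned parameters to a new input."""
    logits = x @ weights + bias
    return int(np.argmax(logits))  # predicted class index

# A new, unseen input flows through the fixed model to produce an output.
print(infer(np.array([1.0, 2.0])))
```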
Training vs. Inference
| Training | Inference |
|---|---|
| Learns patterns from data | Applies learned patterns |
| Computationally expensive | Lightweight per request |
| Done once (or periodically) | Done continuously |
| Requires labeled data | Processes new inputs |
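This split shows up directly in most ML libraries: a `fit`-style call is the expensive, occasional training step, and `predict` is the cheap, repeatable inference step. A sketch using scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5_000, random_state=0)

# Training: done once, learns patterns from labeled data.
model = LogisticRegression(max_iter=1000).fit(X, y)

# Inference: done continuously, applies the learned patterns to
# new inputs -- no labels needed, and far cheaper per call.
for new_input in X[:3]:
    print(model.predict(new_input.reshape(1, -1))[0])
```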
Infrastructure
Inference can run on:
- Cloud APIs: OpenAI, Anthropic, Google
- Edge devices: Phones, IoT, embedded systems
- On-premise: Private servers for data security
- Specialized hardware: GPUs, TPUs, inference chips
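As a concrete example of the cloud-API route, here is a minimal sketch using the OpenAI Python SDK; the model name and prompt are placeholders, and an `OPENAI_API_KEY` environment variable is assumed:

```python
# Hosted inference: the provider runs the model on its own hardware,
# and you exchange inputs and outputs over HTTP.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize inference in one line."}],
)
print(response.choices[0].message.content)
```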
Costs
While a single inference call is far cheaper than training, inference costs add up at scale. Common optimization techniques include:
- Model quantization (lower numeric precision, e.g. int8 instead of float32)
- Batching requests
- Caching common responses
- Smaller, distilled models
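Two of these techniques, caching and batching, can be sketched with nothing beyond the standard library; `run_model` below is a hypothetical stand-in for a real model call:

```python
from functools import lru_cache

def run_model(batch: tuple) -> list:
    """Hypothetical model call; assume batching amortizes per-call overhead."""
    return [f"output for {x}" for x in batch]

# Caching: repeated identical inputs skip the model entirely.
@lru_cache(maxsize=10_000)
def cached_infer(prompt: str) -> str:
    return run_model((prompt,))[0]

# Batching: group pending requests into a single model call.
def batched_infer(prompts: list) -> list:
    return run_model(tuple(prompts))

print(cached_infer("hello"))      # computed by the model
print(cached_infer("hello"))      # served from the cache
print(batched_infer(["a", "b"]))  # one call, two outputs
```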