Deployment | Sutro Handbook

Deployment pages cover the runtime choices that shape AI system cost, latency, reliability, and operational control.

Pages in This Section

Batch vs. Real-Time Inference: when to run analytical AI workloads as batch jobs instead of real-time APIs.
Model Selection: how to choose a model based on task fit, cost, latency, control, and operational constraints.
