Pricing Tiers

Pay-as-you-go

For individual developers and hobbyists

No Platform Fee

$50 free credits at signup

Platform, Web UI, and SDK

All pre-trained models

Scale

For teams with large monthly data processing or generation needs

$500 / Month Platform Fee

$100 free credits per month

Monthly

Annual

Everything in Pay-as-you-go and…

24-hour email support SLA

Enterprise

For organizations with ongoing, large-scale needs

Custom Platform Fee

Pay for compute time directly

Everything in Scale and…

Custom support packages

Custom Integrations

Compare Features

Pay-as-you-go

Scale

Enterprise

FUNCTIONS

Stored Functions

Custom

Iterations Per Function

Unlimited

Maximum Labels Per Iteration

Unlimited

Cost Per Additional Function: $79 / Month (Pay-as-you-go), $49 / Month (Scale), Custom (Enterprise)

Avg. Cost per 1M tokens (Batch)

$0.13

Machine-time Pricing

Avg. Cost per 1M tokens (Standard)

$0.45

Machine-time Pricing

Bring your own cloud

Our managed cloud

BATCH Jobs

| Quota | Pay-as-you-go | Scale | Enterprise |
| --- | --- | --- | --- |
| Input Tokens / Job | 250 Million | 1 Billion | Custom Quotas |
| Tokens / Day | 2 Billion | 20 Billion | Custom Quotas |

P0 Jobs (Prototyping)

P1 Jobs (1 Hour)

Custom Acceleration
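
To make the batch quotas above concrete, here is a minimal sketch that estimates how many jobs and how many days a workload would need on each tier. The `TIER_QUOTAS` table and the `plan_jobs` helper are hypothetical names used for illustration and are not part of any SDK. The sketch also assumes that the daily quota is counted against input tokens only, which this page does not confirm.

```python
# Quotas copied from the comparison above. Enterprise quotas are custom,
# so that tier is left out of this illustration.
TIER_QUOTAS = {
    "pay-as-you-go": {"tokens_per_job": 250_000_000, "tokens_per_day": 2_000_000_000},
    "scale": {"tokens_per_job": 1_000_000_000, "tokens_per_day": 20_000_000_000},
}


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division that avoids floating-point rounding."""
    return -(-a // b)


def plan_jobs(tier: str, input_tokens: int) -> dict:
    """Estimate the minimum number of jobs and days for a workload.

    Assumption: the daily quota applies to input tokens only.
    """
    quota = TIER_QUOTAS[tier]
    return {
        "jobs": ceil_div(input_tokens, quota["tokens_per_job"]),
        "min_days": ceil_div(input_tokens, quota["tokens_per_day"]),
    }


if __name__ == "__main__":
    for tier in TIER_QUOTAS:
        print(tier, plan_jobs(tier, 3_000_000_000))
```

Under these assumptions, a workload of 3 billion input tokens would need at least 12 jobs over 2 days on Pay-as-you-go, or 3 jobs within a single day on Scale.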

Pay-as-you-go

Jobs

Models: All pre-trained models

Job Types (concurrency, speed): p0 (prototyping), p1 (1 hour)

Job Quotas (scale): Up to 250 million input tokens/job; 2 billion tokens/day

Data

Data Retention: Up to 90 days

Data Residency: Our managed storage

Compute Residency: Our managed cloud

Available Models

We select frontier open-source models that perform well on typical batch inference tasks. If you need help finding the right model for your task, please reach out to [email protected] and we would be happy to help.

Text and Vision Models

| Model ID | Avg. Cost / 1M tokens | Context Window |
| --- | --- | --- |
| gpt-oss-20b | $0.09 | 131,072 |
| gpt-oss-120b | $0.32 | 131,072 |
| qwen-3.5-27b | $0.48 | 262,144 |
| qwen-3.5-27b-thinking | $0.48 | 262,144 |
| qwen-3.5-35b-a3b | $0.14 | 262,144 |
| qwen-3.5-35b-a3b-thinking | $0.14 | 262,144 |
| qwen-3.5-122b-a10b | $0.57 | 262,144 |
| qwen-3.5-122b-a10b-thinking | $0.57 | 262,144 |
| nemotron-3-super-120b-a12b | $0.53 | 262,144 |
| nemotron-3-super-120b-a12b-thinking | $0.53 | 262,144 |
| nemotron-3-nano-30b-a3b | $0.10 | 262,144 |
| nemotron-3-nano-30b-a3b-thinking | $0.10 | 262,144 |
| gemma-4-31b | $0.95 | 131,072 |
| gemma-4-31b-thinking | $0.95 | 131,072 |
| gemma-4-26b-a4b | $0.19 | 131,072 |
| gemma-4-26b-a4b-thinking | $0.19 | 131,072 |
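
As a rough illustration of how the average rates above translate into a job budget, the sketch below multiplies a total token count by the listed per-1M-token price. The `AVG_COST_PER_1M` dictionary and the `estimate_cost_usd` function are hypothetical helpers, not part of the platform or its SDK. Because the listed prices are blended averages, the result is only a ballpark figure, and a dry run remains the better estimate.

```python
# Average blended prices in USD per 1M tokens, copied from a few rows of
# the table above. Extend the dictionary with other model IDs as needed.
AVG_COST_PER_1M = {
    "gpt-oss-20b": 0.09,
    "gpt-oss-120b": 0.32,
    "qwen-3.5-35b-a3b": 0.14,
    "gemma-4-31b": 0.95,
}


def estimate_cost_usd(model_id: str, total_tokens: int) -> float:
    """Ballpark cost: total tokens (input plus output) times the average rate."""
    return total_tokens / 1_000_000 * AVG_COST_PER_1M[model_id]


if __name__ == "__main__":
    # Compare the listed models on a 250-million-token workload.
    for model_id in AVG_COST_PER_1M:
        print(f"{model_id}: ${estimate_cost_usd(model_id, 250_000_000):,.2f}")
```

For example, 250 million tokens on gpt-oss-20b comes to roughly $22.50 at the listed average rate, before any platform fee or free credits.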

Embedding Models

| Model ID | Avg. Cost / 1M tokens | Context Window |
| --- | --- | --- |
| qwen-3-embedding-0.6b | $0.01 | 32,768 |
| qwen-3-embedding-8b | $0.05 | 32,768 |

Custom Models

We also offer support for custom and fine-tuned models on a per-request basis. To discuss such needs, please reach out at [email protected].

Notes

We serve quantized versions of some of the models we offer, which lets us pass further time and cost savings on to users. If you have a workload that could benefit from full-precision inference, we'd like to learn more; please reach out to [email protected].

Average token prices are blended input and output costs, weighted according to representative batch inference workload shapes. Actual pricing will depend on total usage. We encourage users to estimate costs before submitting a job by using the dry run functionality described in the documentation. For questions on pricing, please reach out to [email protected].
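
The blending described in the pricing note can be sketched as a token-weighted average of input and output prices. The per-direction prices and the 3:1 input-to-output ratio used below are hypothetical values chosen for illustration; this page publishes only the blended averages, and the actual weighting may differ.

```python
def blended_price_per_1m(
    input_price: float,
    output_price: float,
    input_tokens: int,
    output_tokens: int,
) -> float:
    """Token-weighted average of input and output prices (USD per 1M tokens)."""
    total_tokens = input_tokens + output_tokens
    if total_tokens <= 0:
        raise ValueError("workload must contain at least one token")
    weighted_sum = input_price * input_tokens + output_price * output_tokens
    return weighted_sum / total_tokens


if __name__ == "__main__":
    # Hypothetical prices and workload shape: $0.10 per 1M input tokens,
    # $0.40 per 1M output tokens, three input tokens per output token.
    price = blended_price_per_1m(0.10, 0.40, 3_000_000, 1_000_000)
    print(f"Blended price: ${price:.3f} per 1M tokens")
```

A workload that produces more output per input token than the representative shape would see an effective price above the listed average, which is one reason a dry run is worth doing before submission.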
