Pricing Tiers

| Tier | Monthly fee | Intended for |
| --- | --- | --- |
| Pay-as-you-go | $0 | Individual developers and hobbyists |
| Growth | $31 | Teams with large monthly data processing or generation needs |
| Enterprise | Custom (Let's talk!) | Organizations with highly custom needs |

Compare Features

| Feature | Pay-as-you-go | Growth | Enterprise |
| --- | --- | --- | --- |
| Platform, Web UI, and SDK | | | |
| Models | All pre-trained models | Up to 10 custom models | Unlimited custom models |
| Job Types (concurrency, speed) | p0 (prototyping), p1 (1 hour) | p0 (prototyping), p1 (1 hour), p2 (30 mins), p3 (20 mins) | Custom acceleration |
| Job Quotas (scale) | Up to 250M input tokens/job, 2 billion tokens/day | Up to 1B input tokens/job, 20 billion tokens/day | Custom quotas |
| Data Retention | Up to 90 days | Up to 180 days | Unlimited retention |
| Data Residency | Our managed storage | Bring your own S3-compatible bucket | - |
| Compute Residency | Our managed cloud | - | Bring your own cloud |
| External Integrations | - | HuggingFace | Custom integrations available |
| Credits | One-time 100 free credits | 250 free credits per month | Custom credit setup |
| Support | Slack Community | 24-hour email support SLA | Custom support packages |
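
As a rough planning aid, the job quotas above can be written down as data and used to work out how many jobs (or days) a workload would need on each tier. This is only a sketch: the names `TierQuota`, `QUOTAS`, `min_jobs`, and `min_days` are hypothetical and not part of the Sutro SDK, and the code assumes the daily figure counts all tokens processed in a day. Enterprise quotas are custom, so the helpers return `None` for that tier rather than guessing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TierQuota:
    """Quota figures for one tier; None means the quota is custom."""
    max_input_tokens_per_job: Optional[int]
    max_tokens_per_day: Optional[int]


# Figures copied from the Job Quotas row of the feature comparison.
# The dictionary keys are hypothetical labels chosen for this sketch.
QUOTAS = {
    "pay-as-you-go": TierQuota(250_000_000, 2_000_000_000),
    "growth": TierQuota(1_000_000_000, 20_000_000_000),
    "enterprise": TierQuota(None, None),
}


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating-point rounding."""
    return -(-a // b)


def min_jobs(tier: str, input_tokens: int) -> Optional[int]:
    """Smallest number of jobs needed to stay under the per-job input quota."""
    limit = QUOTAS[tier].max_input_tokens_per_job
    if limit is None:
        return None  # custom quota: cannot be computed from the table
    return max(1, _ceil_div(input_tokens, limit))


def min_days(tier: str, total_tokens: int) -> Optional[int]:
    """Smallest number of days needed to stay under the daily token quota."""
    limit = QUOTAS[tier].max_tokens_per_day
    if limit is None:
        return None  # custom quota: cannot be computed from the table
    return max(1, _ceil_div(total_tokens, limit))


if __name__ == "__main__":
    workload = 600_000_000  # example input size, chosen for illustration
    for tier in QUOTAS:
        print(tier, min_jobs(tier, workload), min_days(tier, workload))
```

With a 600M-token input, for example, the sketch splits the work into three jobs on Pay-as-you-go and a single job on Growth.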

Available Models

We select frontier open-source models that are well suited to typical batch inference tasks. If you need help finding the right model for your task, please reach out to team@sutro.sh and we would be happy to help.

Text and Vision Models

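| Model ID | Avg. Cost / 1M tokens | Context Window |
| --- | --- | --- |
| | $0.02 | 131,072 |
| | $0.03 | 131,072 |
| | $0.28 | 131,072 |
| | $0.03 | 32,768 |
| | $0.06 | 32,768 |
| | $0.15 | 32,768 |
| | $0.04 | 262,144 |
| | $0.35 | 131,072 |
| | $0.03 | 131,072 |
| | $0.07 | 131,072 |
| | $0.15 | 131,072 |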

Reasoning Models

| Model ID | Avg. Cost / 1M tokens | Context Window |
| --- | --- | --- |

Embedding Models

| Model ID | Avg. Cost / 1M tokens | Context Window |
| --- | --- | --- |

Custom Models

We also offer support for custom and fine-tuned models on a per-request basis. To discuss such needs, please reach out at team@sutro.sh.

Notes

  1. We serve quantized versions of some of the models we offer in order to pass on further time and cost savings to users. If you have a workload that could benefit from full-precision inference, we'd like to learn more; please reach out to team@sutro.sh.

  2. Average token prices are based on blended input and output costs, weighted according to representative batch inference workload shapes. Actual pricing will depend on total usage. We encourage users to estimate costs ahead of job submission using the dry run functionality described in the documentation. For questions on pricing, please reach out to team@sutro.sh.
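
To make note 2 concrete, the arithmetic behind a blended average price and a rough job estimate might look like the sketch below. It assumes a simple linear weighting by the share of input tokens, which may differ from how the listed averages are actually computed. The function names and the separate input and output prices are hypothetical, since this page lists only blended averages. The dry run functionality remains the recommended way to estimate a real job.

```python
def blended_price_per_million(input_price: float,
                              output_price: float,
                              input_share: float) -> float:
    """Weighted average of input and output prices per 1M tokens.

    input_share is the fraction of all tokens that are input tokens (0 to 1).
    Linear weighting is an assumption made for this sketch.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_price + (1.0 - input_share) * output_price


def estimate_cost(total_tokens: int, avg_price_per_million: float) -> float:
    """Rough cost in dollars for a job, given an average price per 1M tokens."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1_000_000 * avg_price_per_million


if __name__ == "__main__":
    # Hypothetical per-direction prices, used only to illustrate the weighting.
    blended = blended_price_per_million(input_price=0.02,
                                        output_price=0.06,
                                        input_share=0.75)
    print(f"blended price per 1M tokens: ${blended:.3f}")

    # 500M tokens at a listed average of $0.03 per 1M tokens.
    print(f"estimated job cost: ${estimate_cost(500_000_000, 0.03):.2f}")
```

Because actual pricing depends on total usage, a figure produced this way is best treated as an order-of-magnitude check rather than a quote.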
