Pricing Tiers
Pay-as-you-go: $0 monthly fee
For individual developers and hobbyists
Growth: $31 monthly fee
For teams with large monthly data processing or generation needs
Enterprise: Custom (let's talk!)
For organizations with highly custom needs
Compare Features
Feature | Pay-as-you-go | Growth | Enterprise
Platform, Web UI, and SDK | | |
Models | All pre-trained models | Up to 10 custom models | Unlimited custom models
Job Types (concurrency, speed) | p0 (prototyping), p1 (1 hour) | p0 (prototyping), p1 (1 hour), p2 (30 mins), p3 (20 mins) | Custom acceleration
Job Quotas (scale) | Up to 250M input tokens/job; 2 billion tokens/day | Up to 1B input tokens/job; 20 billion tokens/day | Custom quotas
Data Retention | Up to 90 days | Up to 180 days | Unlimited retention
Data Residency | Our managed storage | Bring your own S3-compatible bucket | -
Compute Residency | Our managed cloud | - | Bring your own cloud
External Integrations | - | HuggingFace | Custom integrations available
Credits | One-time 100 free credits | 250 free credits per month | Custom credit setup
Support | Slack Community | 24-hour email support SLA | Custom support packages
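As a rough illustration of how the job quotas above separate the tiers, the following sketch picks the smallest tier whose published limits cover a planned workload. The names `TierQuota` and `smallest_fitting_tier` are hypothetical and not part of any SDK, and the sketch assumes the daily limit applies to the total tokens you plan to process in a day; how the platform actually counts daily usage is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierQuota:
    """Published per-job and per-day token limits for one tier."""
    name: str
    max_input_tokens_per_job: int
    max_tokens_per_day: int


# Limits copied from the feature comparison. Enterprise quotas are custom,
# so Enterprise is treated as the fallback rather than listed here.
TIERS = [
    TierQuota("Pay-as-you-go", 250_000_000, 2_000_000_000),
    TierQuota("Growth", 1_000_000_000, 20_000_000_000),
]


def smallest_fitting_tier(job_input_tokens: int, daily_tokens: int) -> str:
    """Return the first tier whose quotas cover the workload (assumed semantics)."""
    for tier in TIERS:
        fits_job = job_input_tokens <= tier.max_input_tokens_per_job
        fits_day = daily_tokens <= tier.max_tokens_per_day
        if fits_job and fits_day:
            return tier.name
    return "Enterprise"


if __name__ == "__main__":
    # A 600M-token job exceeds the Pay-as-you-go per-job limit.
    print(smallest_fitting_tier(600_000_000, 3_000_000_000))
```

This check only covers token quotas. Job types, retention, and residency requirements may also affect which tier fits.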
Available Models
We select frontier open-source models that are well suited to typical batch inference tasks. If you need help finding the right model for your task, please reach out to team@sutro.sh and we would be happy to help.
Text and Vision Models
Model ID | Avg. Cost / 1M tokens | Context Window
Reasoning Models
Model ID | Avg. Cost / 1M tokens | Context Window
Embedding Models
Model ID | Avg. Cost / 1M tokens | Context Window
Custom Models
We also offer support for custom and fine-tuned models on a per-request basis. To discuss such needs, please reach out at team@sutro.sh.
Notes
We serve quantized versions of some of the models we offer, which lets us pass further time and cost savings on to users. If you have a workload that could benefit from full-precision inference, we'd like to learn more; please reach out to team@sutro.sh.
Average token prices are based on blended input and output costs, weighted according to representative batch inference workload shapes. Actual pricing will depend on total usage. We encourage users to estimate costs ahead of job submission using the dry run functionality described in the documentation. For questions on pricing, please reach out to team@sutro.sh.
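To make the idea of a blended price concrete, here is a minimal sketch of one way such an average could be computed and used for a rough estimate. The per-token prices and the output share below are hypothetical placeholders, not actual rates, and the simple linear weighting is an assumption about what "blended" means. It is not a substitute for the dry run functionality.

```python
def blended_cost_per_million(input_price: float, output_price: float,
                             output_share: float) -> float:
    """Average price per 1M tokens, weighted by the share of output tokens.

    Prices are in dollars per 1M tokens; output_share is the fraction of
    all tokens in the workload that are generated (assumed weighting).
    """
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    return (1.0 - output_share) * input_price + output_share * output_price


def estimate_job_cost(input_tokens: int, output_tokens: int,
                      input_price: float, output_price: float) -> float:
    """Rough job cost in dollars from token counts and per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical prices: $0.10 per 1M input tokens, $0.40 per 1M output tokens.
    blended = blended_cost_per_million(0.10, 0.40, output_share=0.25)
    cost = estimate_job_cost(3_000_000, 1_000_000, 0.10, 0.40)
    print(f"blended: ${blended:.3f} per 1M tokens, job estimate: ${cost:.2f}")
```

A workload with a higher share of generated tokens than the representative shape would land above the blended average, which is why a per-job estimate is more reliable than the table figure.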