Synthetic Data Generation

Classification

Structured Extraction

Embedding Generation

Summarization

Is Better With Sutro

Run LLM batch jobs in hours, not days, at a fraction of the cost.

Get Access

Generate a question/answer pair for the following chunk of vLLM documentation

Inputs

Outputs

Intro to vLLM

vLLM is a fast and easy-to-use library for LLM inference and serving. Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.

Loading Models

vLLM models can be loaded in two different ways. To pass a loaded model into the vLLM framework for further processing and inference without reloading it from disk or a model hub, first start by generating

Using the OpenAI Server

Run:ai Model Streamer is a library that reads tensors concurrently while streaming them to GPU memory. Further reading can be found in the Run:ai Model Streamer documentation.

vLLM supports loading weights in Safetensors format using the Run:ai Model Streamer. You first need to install the vLLM RunAI optional dependency:

Question: Is vLLM compatible with all open-source models? ...

Question: How do I load a custom model from HuggingFace? ...

Question: Can I use the OpenAI compatible server to replace calls...

+128 more…
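The demo above pairs one instruction with many documentation chunks. As a rough sketch of how such inputs might be prepared before submitting them as a batch, the snippet below groups paragraphs into chunks and prepends the instruction to each one. The helper names `chunk_paragraphs` and `build_prompts` and the 500-character limit are illustrative assumptions, not part of Sutro's SDK.

```python
INSTRUCTION = (
    "Generate a question/answer pair for the following chunk of vLLM documentation"
)


def chunk_paragraphs(text: str, max_chars: int = 500) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow the chunk, so start a new one.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_prompts(text: str, max_chars: int = 500) -> list[str]:
    """Prepend the shared instruction to every chunk (hypothetical helper)."""
    return [f"{INSTRUCTION}:\n\n{chunk}" for chunk in chunk_paragraphs(text, max_chars)]
```

Each resulting string is one row of batch input. Chunking by paragraph keeps related sentences together, which tends to give the model enough context to write a sensible question.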

From Idea to Millions of Requests, Simplified

Sutro takes the pain away from testing and scaling LLM batch jobs to unblock your most ambitious AI projects.

import sutro as so
from pydantic import BaseModel

# Schema each classified review should conform to.
class ReviewClassifier(BaseModel):
    sentiment: str

# Input file; the demo also lists User_reviews-1.csv, User_reviews-2.csv, and User_reviews-3.csv.
user_reviews = 'User_reviews.csv'
system_prompt = 'Classify the review as positive, neutral, or negative.'

results = so.infer(user_reviews, system_prompt, output_schema=ReviewClassifier)

Progress: 1% | 1/514,879 | Input tokens processed: 0.41m, Tokens generated: 591k

█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
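The schema above declares `sentiment` as a plain string, so a label outside the three requested values could still come back. One way to check a finished job is sketched below. It assumes the results have already been converted to a list of dictionaries with a `sentiment` key; that shape and the `summarize_sentiments` helper are assumptions for illustration, not something the Sutro SDK is known to return or provide.

```python
from collections import Counter

ALLOWED_LABELS = {"positive", "neutral", "negative"}


def summarize_sentiments(rows):
    """Count valid labels and collect rows whose label is missing or unexpected.

    `rows` is assumed to be an iterable of dicts with a "sentiment" key.
    Labels are normalized (stripped, lowercased) before comparison.
    """
    counts = Counter()
    unexpected = []
    for index, row in enumerate(rows):
        label = str(row.get("sentiment", "")).strip().lower()
        if label in ALLOWED_LABELS:
            counts[label] += 1
        else:
            # Keep the original value so it can be inspected or re-run later.
            unexpected.append((index, row.get("sentiment")))
    return counts, unexpected
```

Running a check like this on a small sample first makes it easier to spot a prompt that needs tightening before the full job is launched.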

Rapidly Prototype

Shorten development cycles by getting feedback from large batch jobs in as little as a few minutes before scaling up.

Reduce Costs

Get results faster and reduce costs by 10x or more by parallelizing your LLM calls through Sutro.

Scale Effortlessly

Confidently handle millions of requests and billions of tokens at a time, without the pain of managing infrastructure.
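To illustrate why parallelizing independent LLM calls shortens wall-clock time, here is a minimal sketch using Python's standard thread pool. It is not a description of how Sutro works internally: `fake_llm_call` is a hypothetical stand-in that sleeps briefly to mimic a network-bound request, and the worker count is arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_llm_call(review: str) -> str:
    """Hypothetical stand-in for a network-bound model call."""
    time.sleep(0.05)  # simulate request latency
    return "positive" if "great" in review.lower() else "neutral"


def classify_all(reviews, max_workers: int = 8):
    """Run the calls concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_llm_call, reviews))


if __name__ == "__main__":
    sample = ["Great product", "It arrived", "great value"]
    print(classify_all(sample))
```

Because each call spends most of its time waiting, eight workers finish a batch roughly eight times faster than a sequential loop in this toy setup. Real services add rate limits, retries, and batching on top of this basic idea.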

system_prompt = 'Classify the review as positive, neutral, or negative.'

results = so.infer(user_reviews, system_prompt, output_schema)

Improvement in Telegraph, #174,465, Alexander Graham Bell...

Electric Lamp, #223,898, Thomas Edison

Flying Machine, #821,393, Orville and Wilbur Wright

This patent describes the first creation of a working telephone system...

This patent pertains to the original invention of the lightbulb by Thomas Edison...

This patent was issued for the first heavier-than-air aircraft created by...

Iterate

Start small and iterate fast on your LLM batch workflows. Accelerate experiments by testing on Sutro before committing to large jobs.
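A simple way to start small is to draw a reproducible sample of rows and test the prompt on that before committing to the full dataset. The sketch below uses only the standard library; the `sample_rows` helper, the seed, and the `review` column name in the usage note are assumptions for illustration.

```python
import csv
import io
import random


def sample_rows(csv_text: str, n: int, seed: int = 0) -> list[dict]:
    """Return up to n rows from CSV text, chosen reproducibly for a given seed."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if n >= len(rows):
        return rows
    # A dedicated Random instance keeps the sample stable across runs.
    return random.Random(seed).sample(rows, n)
```

For example, `sample_rows(open("User_reviews.csv").read(), 100)` would give a 100-row trial set, assuming the file fits in memory. Fixing the seed means two prompt variants are compared on exactly the same rows.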

Scale

Scale your LLM workflows so your team can do more in less time. Process billions of tokens in hours, not days, with no infrastructure headaches or exploding costs.

Progress: 1% | 1/2.5M Rows

█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

Data Orchestrators

Object Storage and Open Data Formats

Notebooks and Pythonic Coding Tools

Integrate

Seamlessly connect Sutro to your existing LLM workflows. Sutro's Python SDK is compatible with popular data orchestration tools such as Airflow and Dagster.

Purpose-Built Tools for Scalable LLM Workflows

Get results faster and scale up any LLM workflow without complex infrastructure.

Synthesize

Generate high-quality, diverse, and representative synthetic data to improve model or RAG retrieval performance, without the complexity.

Classify

Automatically organize your data into meaningful categories without involving your ML engineer.

Evaluate

Benchmark your LLM outputs to continuously improve workflows, agents, and assistants, or easily evaluate custom models against a new use case.

Extract

Transform unstructured data into structured insights that drive business decisions.

Embed

Easily convert large corpora of free-form text into vector representations for semantic search and recommendations.

Label

Enrich your data with meaningful labels to improve model training and data preparation.
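For the Embed tool above, it may help to see how vector representations are typically used for semantic search once they exist. The sketch below ranks documents by cosine similarity to a query vector. The three-dimensional vectors are hand-written toy values, not real embeddings, and `top_match` is a hypothetical helper rather than a Sutro function.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_match(query, corpus: dict) -> str:
    """Return the key in `corpus` whose vector is most similar to `query`."""
    return max(corpus, key=lambda name: cosine_similarity(query, corpus[name]))


if __name__ == "__main__":
    # Toy vectors standing in for embeddings of two help-center articles.
    corpus = {
        "refund policy": [0.9, 0.1, 0.0],
        "shipping times": [0.1, 0.9, 0.2],
    }
    print(top_match([0.8, 0.2, 0.0], corpus))
```

At real scale the linear scan would usually be replaced by a vector index, but the similarity measure stays the same.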

Common Use Cases

Improve Model Performance

Improve your LLM or RAG retrieval performance with synthetic data. Generate diverse and representative responses to fill statistical gaps.

Structure Web Pages

Crawl millions of web pages and extract analytics-ready datasets for your company or your customers. Run standalone or successive batch jobs to explore complex link tree structures.

Enrich Data

Improve your messy product catalog data, enrich your CRM entries, or gather insights from your historical meeting notes without involving your machine learning engineer.

Personalize Content

Tailor your marketing and advertising efforts to thousands or millions of individuals, personas, and demographics to dramatically increase response rates and ad conversions.

Unstructured ETL

Convert your massive amounts of free-form text into analytics-ready datasets without the pain of managing your own infrastructure.

Unlock Product Insights

Easily sift through thousands of product reviews and unlock valuable product insights while brewing your morning coffee.

FAQ

What is Sutro?

Do I need to code to use Sutro?

How much can I save using Sutro?

How do I handle rate limits in Sutro?

Can I deploy Sutro within my VPC?

Are open-source LLMs good?

Is my data secure in Sutro?

Can I use custom models in Sutro?

How can I load data into Sutro?

How do I sign up for Sutro?

What Will You Scale with Sutro?

Get Access

Blog

Documentation

team@sutro.sh