Docs

Pricing

Resources

Get Access

Book Demo

Batch LLM Inference is better with Sutro

Run LLM Batch Jobs in Hours, Not Days, at a Fraction of the Cost.

Get Started

Generate a question/answer pair for the following chunk of vLLM documentation

Inputs

Outputs

Intro to vLLM

vLLM is a fast and easy-to-use library for LLM inference and serving. Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.

Loading Models

vLLM models can be loaded in two different ways. To pass a loaded model into the vLLM framework for further processing and inference without reloading it from disk or a model hub, first start by generating

Using the OpenAI Server

Run:ai Model Streamer is a library that reads tensors concurrently while streaming them to GPU memory. Further reading can be found in the Run:ai Model Streamer documentation.

vLLM supports loading weights in Safetensors format using the Run:ai Model Streamer. You first need to install the vLLM RunAI optional dependency:

Question: Is vLLM compatible with all open-source models? ...

Question: How do I load a custom model from HuggingFace? ...

Question: Can I use the OpenAI compatible server to replace calls...

+128 more…

Customer review analysis

Unlock product insights from thousands of reviews in minutes

Easily sift through thousands of product reviews and unlock valuable insights. Run LLM batch jobs in hours, not days, at a fraction of the cost.

Get Access

From Idea to Millions of Requests, Simplified

Sutro takes the pain away from testing and scaling LLM batch jobs to unblock your most ambitious AI projects.

import sutro as so
from pydantic import BaseModel

class ReviewClassifier(BaseModel):
    sentiment: str

user_reviews = '.

User_reviews.csv

User_reviews-1.csv

User_reviews-2.csv

User_reviews-3.csv

system_prompt = 'Classify the review as positive, neutral, or negative.'

results = so.infer(user_reviews, system_prompt, output_schema=ReviewClassifier)

Progress: 1% | 1/514,879 | Input tokens processed: 0.41m, Tokens generated: 591k

█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
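The schema in the snippet above types `sentiment` as a plain `str`, so any label would pass validation. The following is a minimal sketch, using only the standard library rather than pydantic or the Sutro SDK, of how the three labels named in the system prompt could be enforced on raw model outputs. The helper `parse_sentiment` and the sample JSON string are hypothetical and are not part of Sutro.

```python
import json

# Labels taken from the system prompt in the snippet above.
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def parse_sentiment(raw_output: str) -> str:
    """Parse one JSON model output and return a validated sentiment label.

    Hypothetical helper: raises ValueError when the output is not valid JSON,
    has no "sentiment" field, or carries a label outside the allowed set.
    """
    try:
        record = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {raw_output!r}") from exc
    if not isinstance(record, dict) or "sentiment" not in record:
        raise ValueError(f"missing 'sentiment' field: {raw_output!r}")
    # Normalize case and whitespace before checking the label.
    label = str(record["sentiment"]).strip().lower()
    if label not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected label: {label!r}")
    return label


if __name__ == "__main__":
    print(parse_sentiment('{"sentiment": "Positive"}'))
```

Rejecting out-of-set labels early keeps bad rows out of downstream counts; whether to drop, retry, or map such rows is a design choice this sketch leaves open.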

Prototype

Start small and iterate fast on your review analysis workflow. Accelerate experiments by testing on a small batch before committing to large jobs.
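One way to test on a small batch is to draw a reproducible random sample from the reviews file before running the full job. The sketch below uses only the Python standard library; `sample_rows`, the `review` column name, and the in-memory CSV are assumptions made for illustration, not part of the Sutro SDK.

```python
import csv
import io
import random


def sample_rows(csv_file, sample_size, seed=0):
    """Return up to `sample_size` randomly chosen rows from a CSV file object.

    Hypothetical helper; a fixed seed keeps the sample reproducible.
    """
    rows = list(csv.DictReader(csv_file))
    if sample_size >= len(rows):
        return rows
    return random.Random(seed).sample(rows, sample_size)


if __name__ == "__main__":
    # Tiny in-memory CSV standing in for a real reviews file.
    data = io.StringIO(
        "review\nGreat product\nBroke in a week\nIt is fine\nWould buy again\n"
    )
    for row in sample_rows(data, sample_size=2):
        print(row["review"])
```

The sampled rows can then be passed to the same prompt and schema intended for the full run, so prompt problems surface before the large job is submitted.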

Scale

Scale your analysis to process millions of reviews and billions of tokens in hours, not days, with no infrastructure headaches or exploding costs.

Integrate

Seamlessly connect Sutro to your existing workflows. Sutro's Python SDK is compatible with popular data orchestration tools like Airflow and Dagster.

Go from reviews to insights in minutes

Shorten development cycles by getting feedback from large batch jobs in minutes, not days. Parallelize your LLM calls to get results faster.
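As a generic illustration of parallelizing calls, and not a description of how Sutro works internally, the sketch below fans a classification function out over a thread pool. The `classify` function is a hypothetical keyword rule standing in for a network-bound LLM request, and `classify_all` is likewise an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor


def classify(review: str) -> str:
    """Stand-in for a network-bound LLM call (hypothetical keyword rule)."""
    text = review.lower()
    if "great" in text or "love" in text:
        return "positive"
    if "broke" in text or "bad" in text:
        return "negative"
    return "neutral"


def classify_all(reviews, max_workers=8):
    """Classify reviews concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(classify, reviews))


if __name__ == "__main__":
    print(classify_all(["Great value", "Broke in a week", "It arrived on time"]))
```

Threads suit this pattern because each real request would spend most of its time waiting on the network, and `pool.map` returns results in input order, so they line up with the original reviews.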

Analyze reviews at a fraction of the cost

Reduce costs by 10x or more compared to traditional methods. Sutro's batch processing makes large-scale review analysis affordable.

Scale from hundreds to millions of reviews

Confidently handle millions of reviews and billions of tokens at a time without the pain of managing infrastructure.

Related Use Cases

Sentiment analysis

Structured Extraction

Transform unstructured review text into structured insights that drive business decisions.

Embedding Generation

Easily convert large corpora of free-form review text into vector representations for semantic search and recommendations.
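To make the semantic search step concrete, here is a minimal sketch of ranking reviews by cosine similarity once vectors exist. The three-dimensional vectors, review ids, and function names are made up for illustration; real embeddings come from a model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_vec, review_vecs, k=2):
    """Return the k review ids most similar to the query, best first."""
    ranked = sorted(
        review_vecs,
        key=lambda rid: cosine_similarity(query_vec, review_vecs[rid]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Made-up vectors keyed by hypothetical review ids.
    review_vecs = {
        "battery": [0.9, 0.1, 0.0],
        "shipping": [0.0, 0.2, 0.9],
        "charging": [0.8, 0.3, 0.1],
    }
    print(top_matches([1.0, 0.0, 0.0], review_vecs))
```

A linear scan like this is fine for small collections; at the scale of millions of reviews an approximate nearest-neighbor index would usually replace it.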

Product insight mining

Easily sift through thousands of product reviews and unlock valuable product insights while brewing your morning coffee.

Content personalization

Tailor your marketing and advertising efforts to thousands of individuals based on insights from their feedback to dramatically increase response rates.

Document summarization

Condense long-form customer feedback into concise summaries to quickly identify key themes and issues.

FAQ

What is Sutro?

What can I use Sutro for?

How does Sutro reduce costs?

Can I integrate Sutro with my existing tools?

How does Sutro handle large-scale jobs?

What Will You Scale with Sutro?

Get Access