Batch LLM Inference is better with Sutro

Run LLM Batch Jobs in Hours, Not Days, at a Fraction of the Cost.

Generate a question/answer pair for the following chunk of vLLM documentation

Inputs

Outputs

Intro to vLLM

vLLM is a fast and easy-to-use library for LLM inference and serving. Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.

Loading Models

vLLM models can be loaded in two different ways. To pass a loaded model into the vLLM framework for further processing and inference without reloading it from disk or a model hub, first start by generating...


Using the OpenAI Server

Run:ai Model Streamer is a library for reading tensors concurrently while streaming them to GPU memory. Further reading can be found in the Run:ai Model Streamer documentation.

vLLM supports loading weights in Safetensors format using the Run:ai Model Streamer. You first need to install the vLLM Run:ai optional dependency...

Question: Is vLLM compatible with all open-source models? ...

Question: How do I load a custom model from HuggingFace? ...

Question: Can I use the OpenAI compatible server to replace calls...

+128 more…
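As a sketch of how the demo above maps onto Sutro's Python SDK (the so.infer call and pydantic output schema follow the example later on this page; the QAPair schema, the chunk list, and passing a list of inputs are illustrative assumptions):

import sutro as so
from pydantic import BaseModel

# Illustrative output schema: one question/answer pair per documentation chunk
class QAPair(BaseModel):
    question: str
    answer: str

# Hypothetical list of pre-chunked vLLM documentation passages
doc_chunks = [
    'vLLM is a fast and easy-to-use library for LLM inference and serving...',
    'Run:ai Model Streamer is a library for reading tensors concurrently...',
]

system_prompt = 'Generate a question/answer pair for the following chunk of vLLM documentation.'

results = so.infer(doc_chunks, system_prompt, output_schema=QAPair)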


Data normalization

Normalize your data in hours, not days

Transform messy, inconsistent data into clean, analytics-ready datasets. Sutro takes the pain out of normalizing millions of records, helping you unlock insights without infrastructure headaches or exploding costs.


From messy data to standardized insights

Sutro simplifies the entire data normalization workflow, from initial testing to full-scale production, all within your existing data ecosystem.

import sutro as so
from pydantic import BaseModel

# Output schema: one sentiment label per review
class ReviewClassifier(BaseModel):
    sentiment: str

# Input file shown in the demo's file picker
user_reviews = 'user_reviews.csv'

system_prompt = 'Classify the review as positive, neutral, or negative.'

results = so.infer(user_reviews, system_prompt, output_schema=ReviewClassifier)

Progress: 1% | 1/514,879 | Input tokens processed: 0.41m, Tokens generated: 591k

█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

Prototype

Start small and iterate fast on your normalization workflows. Accelerate experiments by testing your prompts on Sutro before committing to large jobs.

Scale

Scale your normalization workflows so your team can do more in less time. Process billions of tokens in hours, not days, with no infrastructure headaches.

Integrate

Seamlessly connect Sutro to your existing LLM workflows. Sutro's Python SDK is compatible with popular data orchestration tools, like Airflow and Dagster.
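As a hedged sketch of what that integration could look like, here is the so.infer call from the example above wrapped in an Airflow TaskFlow DAG (the DAG and task decorators are standard Airflow; the task body is illustrative, not Sutro's documented integration):

from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def normalize_reviews():
    @task
    def classify_reviews():
        import sutro as so
        from pydantic import BaseModel

        class ReviewClassifier(BaseModel):
            sentiment: str

        # Submit the batch job from inside an orchestrated task;
        # how results are persisted downstream depends on the Sutro SDK
        so.infer(
            'user_reviews.csv',
            'Classify the review as positive, neutral, or negative.',
            output_schema=ReviewClassifier,
        )

    classify_reviews()

normalize_reviews()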

Reduce normalization costs by 10x or more

Get consistent, clean data faster and reduce costs significantly by parallelizing LLM calls through Sutro's batch processing.

Normalize millions of records effortlessly


Confidently process millions of messy CRM entries or product catalogs to create standardized data without the pain of managing complex infrastructure.
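For instance, normalizing messy CRM entries could reuse the same so.infer pattern with a richer output schema (the schema fields and prompt below are illustrative assumptions, not a documented recipe):

import sutro as so
from pydantic import BaseModel

# Illustrative target shape for a normalized CRM record
class NormalizedContact(BaseModel):
    company_name: str
    country_code: str
    industry: str

system_prompt = 'Normalize this CRM entry: return the canonical company name, ISO 3166 country code, and industry.'

results = so.infer('crm_entries.csv', system_prompt, output_schema=NormalizedContact)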

Shorten development cycles

Test your data normalization logic on a sample batch job and get feedback in minutes before committing to processing your entire dataset.
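One way that loop might look in practice, reusing the review-classification example and assuming so.infer also accepts an in-memory DataFrame (an assumption; only the file-path form appears on this page):

import pandas as pd
import sutro as so
from pydantic import BaseModel

class ReviewClassifier(BaseModel):
    sentiment: str

system_prompt = 'Classify the review as positive, neutral, or negative.'

# Prototype: run the prompt over a small sample first
sample = pd.read_csv('user_reviews.csv').head(1000)
preview = so.infer(sample, system_prompt, output_schema=ReviewClassifier)

# Inspect the preview, then commit to the full dataset
results = so.infer('user_reviews.csv', system_prompt, output_schema=ReviewClassifier)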

Structured Extraction

Pull structured fields out of large volumes of free-form text, using output schemas you define.

Unstructured ETL

Convert your massive amounts of free-form text into analytics-ready datasets without the pains of managing your own infrastructure.

Enrich Data

Improve your messy product catalog data, enrich your CRM entries, or gather insights from your historical meeting notes.

Record Deduplication

Identify and merge duplicate records across massive datasets to create a single source of truth for your business.

Data mastering

Create a definitive, master record for key data entities by consolidating information from various disparate sources.

Metadata generation

Automatically generate descriptive tags, categories, or other metadata for large volumes of unstructured content.


FAQ

What is Sutro?

How does Sutro help with costs?

What kind of workflows can I run on Sutro?

Does Sutro integrate with my existing tools?

How do I get started with Sutro?

What Will You Scale with Sutro?