Synthetic Data Generation in minutes, not days.

The fastest path to synthetic data generation. Augment existing datasets or generate entirely new ones from scratch. Create offline generations, RL rollouts, and distillation pairs in a cinch.

import sutro as so
import polars as pl
from pydantic import BaseModel

df = pl.read_csv("customer-support-dialogues-20k.csv")

system_prompt = "Generate a novel product review. Include a title, text, author, product name, product description, product category, and rating out of 5."


# Structured output schema: every generated review must match these fields
class ProductReview(BaseModel):
    review_title: str
    review_text: str
    product_name: str
    product_category: str
    rating_out_of_5: int


# Submit a batch job: 100 empty inputs, each producing one review
results = so.infer(
    [""] * 100,  # <-- generate 100 random reviews
    model="qwen-3-14b",
    system_prompt=system_prompt,
    output_schema=ProductReview,
    random_seed_per_input=True,  # <-- uses a random seed for each input
)

# Block until the job finishes, then collect the results
results = so.await_job_completion(results, with_original_df=df)

print(results.head())

┌────────────────────────────┬────────────────────────────┬────────────────────────────┬────────────────────────────┬─────────────────┐
│ review_title               ┆ review_text                ┆ product_name               ┆ product_category           ┆ rating_out_of_5 │
│ ---                        ┆ ---                        ┆ ---                        ┆ ---                        ┆ ---             │
│ str                        ┆ str                        ┆ str                        ┆ str                        ┆ i64             │
╞════════════════════════════╪════════════════════════════╪════════════════════════════╪════════════════════════════╪═════════════════╡
│ A Simple Win for Mental C… ┆ As someone who spends mos… ┆ Craft-tastic – Empower Po… ┆ Home & Office Decor / Cre… ┆ 5               │
│ A Simple Joy for My Kids—… ┆ As a sales rep who spends… ┆ Melissa & Doug Dot-to-Dot… ┆ Educational Children's To… ┆ 5               │
│ A Smart Upgrade for My Ur… ┆ As someone who lives in o… ┆ RPM Rear Shock Tower for … ┆ Bicycle Accessories        ┆ 5               │
│ Crank & Crash Derby – A N… ┆ As someone who’s spent th… ┆ Disney Pixar Cars Mini Ra… ┆ Toys & Games - Action Fig… ┆ 4               │
│ Small but Mighty — My Fir… ┆ As someone who’s spent ye… ┆ Areaware Cubebot Small     ┆ Smart Home Devices         ┆ 5               │
└────────────────────────────┴────────────────────────────┴────────────────────────────┴────────────────────────────┴─────────────────┘

High-Quality Datasets On Demand


New Datasets in Minutes, Not Days

Simulate tens of thousands of realistic user interactions in minutes. Augment your existing data to remove PII and class biases in a few dozen lines of code. The dataset of your dreams is just a few keystrokes away.

Datasets that Don’t Break the Bank

Augment existing datasets or create entirely new ones for a fraction of the cost of human labeling and competing inference services. Sutro is up to 10x cheaper, with additional savings for pre-committed usage.

Great Data is a Team Sport - Treat it Like Winners Do

Seamlessly collaborate and share results with teammates, track experiments, and view live results as they’re being produced. Use Sutro’s LLM-as-a-judge capabilities to automatically refine results and build the highest-quality dataset for your needs.

Synthetic Data with LLMs - Zero to Hero

Get up and running with synthetic data generation using the Sutro Python SDK.


Synthetic Data For Privacy Preservation

Learn how to create useful synthetic data from the relevant characteristics of another dataset while reducing privacy concerns.


Sutro lets our researchers fire off batch inference—whether it’s a thousand samples or a few billion—through one API call. They don’t have to check cluster queues or negotiate priorities; the job runs immediately with a predictable, fast return-time.


Nathan Lile


CEO, SynthLabs


Learn More In Our Case Study

SynthLabs (a pioneering AI research lab cited by DeepMind, Meta, and NVIDIA) needed to shorten its large-scale synthetic data generation efforts from months to days. By partnering with Sutro, SynthLabs generated a 351-billion-token dataset with 10x greater speed and 80% lower costs, turning complex research ideas into production-grade results without infrastructure bottlenecks.

Data Curation Superpowers For Ordinary People


Simulate User Data

Test your models on simulated user data before hitting production.
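One way to make simulated users more varied than a list of empty strings is to build one input per combination of persona and intent. The sketch below does this with `itertools.product`. The `build_user_inputs` helper, the prompt wording, and the persona and intent values are all hypothetical, and it assumes the resulting list could be passed as the inputs of a batch job in the same position as `[""] * 100` in the example at the top of this page.

```python
import itertools


def build_user_inputs(personas: list[str], intents: list[str]) -> list[str]:
    """Build one simulation prompt per (persona, intent) pair.

    Hypothetical helper: each returned string is meant to be one row of input
    for a batch generation job.
    """
    return [
        f"You are {persona}. Write the first message you would send "
        f"to customer support to {intent}."
        for persona, intent in itertools.product(personas, intents)
    ]


if __name__ == "__main__":
    # Hypothetical personas and intents for a support-bot test set.
    personas = ["a first-time buyer", "a frustrated repeat customer"]
    intents = ["ask about a late delivery", "request a refund", "change a shipping address"]
    inputs = build_user_inputs(personas, intents)
    print(len(inputs))  # 6
    print(inputs[0])
```

Crossing two short lists gives every persona every intent, so coverage grows multiplicatively as either list is extended.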

Augment Existing Datasets

Boost class representation, smooth statistical long-tails, or generate entirely new examples.
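As a rough illustration of boosting class representation, the sketch below counts labels in an existing dataset and computes how many synthetic examples each under-represented class would need to match the largest class. It uses only the standard library; the `synthetic_quota` helper and the label values are hypothetical, and the quotas would still need to be turned into generation requests.

```python
from collections import Counter


def synthetic_quota(labels: list[str]) -> dict[str, int]:
    """Return how many synthetic examples each class needs to match the majority class.

    Hypothetical helper: tops every class up to the size of the largest one.
    Classes already at the maximum get a quota of 0.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical labels from an imbalanced support-ticket dataset.
    labels = ["billing"] * 50 + ["shipping"] * 20 + ["refund"] * 5
    print(synthetic_quota(labels))  # {'billing': 0, 'shipping': 30, 'refund': 45}
```

Topping up to the majority class is only one policy; a smaller target (for example, the median class size) keeps the synthetic share of the dataset lower.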

Improve Retrieval Performance

Generate Q/A pairs to enrich embeddings and improve RAG and search quality.
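A minimal sketch of preparing Q/A generation for retrieval: split a document into overlapping word windows and build one prompt per chunk asking for a question that the chunk answers. The chunk size, overlap, prompt wording, and the `chunk_words` and `qa_prompts` helpers are hypothetical choices, not Sutro defaults. The generated questions would then be embedded or indexed alongside their source chunks.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def qa_prompts(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Build one question-generation prompt per chunk (hypothetical prompt wording)."""
    return [
        "Write one question that the following passage answers, then answer it.\n\n" + chunk
        for chunk in chunk_words(text, size, overlap)
    ]


if __name__ == "__main__":
    doc = " ".join(f"word{i}" for i in range(500))
    print(len(qa_prompts(doc)))  # 3
```

The overlap keeps facts that straddle a chunk boundary available to at least one prompt.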

Remove Identifiers

Increase data portability and usability by reducing PII occurrences in unstructured data.
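The page does not show how Sutro removes identifiers. As a much simpler stand-in, the sketch below masks two easy-to-pattern identifiers (email addresses and North American-style phone numbers) with regular expressions, which could serve as a cheap pre-filter or a spot check on rewritten text. The patterns and the `mask_pii` helper are hypothetical and deliberately narrow; they will miss names, street addresses, and many other kinds of PII.

```python
import re

# Deliberately simple, hypothetical patterns: common email and North American
# phone formats only. Names, addresses, and account IDs are NOT covered.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")


def mask_pii(text: str) -> str:
    """Replace matched email addresses and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


if __name__ == "__main__":
    print(mask_pii("Email jane.doe@example.com or call (555) 123-4567."))
    # Email [EMAIL] or call [PHONE].
```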

Agent Simulations

Create thousands of agent trajectories to catch and remove unexpected behavior.

Scalable Offline Generations

Observe model behavior and outputs at near-limitless scale and detect anomalies using Sutro’s LLM-as-a-judge capabilities.
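The page does not show the shape of Sutro's judge output, so the sketch below assumes each generated record has already received a numeric judge score (hypothetically on a 1 to 5 scale) under a field named `judge_score`. It keeps records at or above a threshold and reports the pass rate; a sudden drop in that rate is one simple signal of anomalous outputs. The field name, scale, threshold, and `filter_by_judge` helper are all assumptions.

```python
def filter_by_judge(
    records: list[dict], threshold: float = 4.0, key: str = "judge_score"
) -> tuple[list[dict], float]:
    """Keep records whose judge score meets the threshold.

    Returns the kept records and the fraction of records that passed.
    Hypothetical: assumes every record carries a numeric score under `key`.
    """
    if not records:
        return [], 0.0
    kept = [r for r in records if r[key] >= threshold]
    return kept, len(kept) / len(records)


if __name__ == "__main__":
    records = [
        {"id": 1, "judge_score": 5},
        {"id": 2, "judge_score": 2},
        {"id": 3, "judge_score": 4},
    ]
    kept, rate = filter_by_judge(records)
    print([r["id"] for r in kept])  # [1, 3]
    print(round(rate, 2))  # 0.67
```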

FAQ

What is Sutro?

Do I need to code to use Sutro?

How much can I save using Sutro?

How do I handle rate limits in Sutro?

Can I deploy Sutro within my VPC?

Are open-source LLMs good?

Is my data secure in Sutro?

Can I use custom models in Sutro?

How can I load data into Sutro?

How do I sign up for Sutro?


70% Lower Costs

1B+ Tokens Per Job

10X Faster Job Processing


Generate High-Quality Data On Demand

Stop wrestling with infra and get the data you need, today.