Synthetic Data Generation in minutes, not days.
The fastest path to synthetic data generation. Augment existing datasets or generate entirely new ones from scratch. Offline generations, RL rollouts, and distillation pairs become a cinch.
import sutro as so
import polars as pl
from pydantic import BaseModel
df = pl.read_csv('customer-support-dialogues-20k.csv')
system_prompt = "Generate a novel product review. Include a title, text, author, product name, product description, product category, and rating out of 5."
class ProductReview(BaseModel):
    review_title: str
    review_text: str
    product_name: str
    product_category: str
    rating_out_of_5: int
results = so.infer(
    [""] * 100,  # <-- generate 100 random reviews
    model="qwen-3-14b",
    system_prompt=system_prompt,
    output_schema=ProductReview,
    random_seed_per_input=True,  # <-- uses a random seed for each input
)
results = so.await_job_completion(results, with_original_df=df)
print(results.head())
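The example above sends 100 empty inputs and relies on per-input random seeds for variety. Another option, sketched below, is to put the variety in the inputs themselves by giving each one a short attribute "seed". This is an illustrative sketch and not a Sutro feature: the attribute pools and the `build_seed_inputs` helper are hypothetical and use only the Python standard library.

```python
import itertools
import random

# Hypothetical attribute pools; swap in values that fit your domain.
CATEGORIES = ["kitchen", "electronics", "outdoor", "toys"]
TONES = ["enthusiastic", "disappointed", "neutral"]
RATINGS = [1, 2, 3, 4, 5]


def build_seed_inputs(n, seed=0):
    """Return n seed strings, reusing attribute combinations only when n exceeds the pool."""
    rng = random.Random(seed)  # fixed seed keeps the list reproducible
    combos = list(itertools.product(CATEGORIES, TONES, RATINGS))
    rng.shuffle(combos)
    # Cycle through the shuffled combinations if n is larger than the pool.
    picked = [combos[i % len(combos)] for i in range(n)]
    return [
        f"Product category: {category}. Reviewer tone: {tone}. Rating: {rating}/5."
        for category, tone, rating in picked
    ]


seed_inputs = build_seed_inputs(100)
print(seed_inputs[0])
```

A list like `seed_inputs` could then stand in for `[""] * 100` in the call above, assuming the system prompt is adjusted to tell the model to honor the seeded attributes.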
New Datasets in Minutes, Not Days
Simulate tens of thousands of realistic user interactions in minutes. Augment your existing data to remove PII and class biases in a few dozen lines of code. The dataset of your dreams is just a few keystrokes away.
Datasets that Don’t Break the Bank
Augment existing datasets or create entirely new ones for a fraction of the cost of human labeling and competing inference services. Sutro is up to 10x cheaper, with additional savings for pre-committed usage.
Great Data is a Team Sport - Treat it Like Winners Do
Seamlessly collaborate and share results with teammates, track experiments, and view live results as they’re being produced. Use Sutro’s LLM-as-a-judge capabilities to automatically refine results and build the highest-quality dataset for your needs.
Synthetic Data with LLMs - Zero to Hero
Get up and running with synthetic data generation using the Sutro Python SDK.
Synthetic Data For Privacy Preservation
Learn how to create useful synthetic data from the relevant characteristics of another dataset while reducing privacy concerns.
Go Deeper On Synthetic Data
Generating 1 Million Synthetic Humans - a New Method for Seeding Diverse LLM Outputs
We demonstrate a new method for seeding diverse LLM responses, and release an accompanying open-source dataset of 1 million synthetic humans.
The Future (and Present) of AI is Synthetic Data
Until recently, computers could only follow rigid statistical rules. LLMs are changing that and unleashing a future powered by synthetic data.
Simulate User Data
Test your models on simulated user data before hitting production.
Augment Existing Datasets
Boost class representation, smooth statistical long-tails, or generate entirely new examples.
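As a rough illustration of boosting class representation, the sketch below works out how many synthetic examples each class would need to match the largest class (or a chosen target). The `synthetic_quota` helper and the sample labels are hypothetical; generating the examples themselves is left to whichever generation call you use.

```python
from collections import Counter


def synthetic_quota(labels, target=None):
    """Map each label to the number of synthetic examples needed to reach `target`.

    If `target` is None, the size of the largest class is used.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    if target is None:
        target = max(counts.values())
    # Classes already at or above the target need no extra examples.
    return {label: max(0, target - n) for label, n in counts.items()}


# Hypothetical label column from a support-ticket dataset.
labels = ["billing"] * 50 + ["shipping"] * 30 + ["returns"] * 5
print(synthetic_quota(labels))
# → {'billing': 0, 'shipping': 20, 'returns': 45}
```

Each quota can then drive how many inputs are submitted per class, so under-represented classes receive proportionally more generated examples.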
Improve Retrieval Performance
Generate Q/A pairs to enrich embeddings for RAG and search quality enhancement.
Remove Identifiers
Increase data portability and usability by reducing PII occurrences in unstructured data.
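One way to check whether a rewrite actually reduced PII is to count obvious identifiers before and after. The sketch below uses plain regular expressions as a crude proxy. It is not how Sutro detects or removes PII, the patterns are deliberately simple assumptions, and the sample strings are made up.

```python
import re

# Deliberately simple, illustrative patterns; they will miss many real-world formats.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def count_pii(texts):
    """Count regex matches per PII type across a list of strings."""
    return {
        name: sum(len(pattern.findall(text)) for text in texts)
        for name, pattern in PII_PATTERNS.items()
    }


original = ["Contact jane.doe@example.com or call 555-123-4567."]
rewritten = ["Contact the customer by email or phone."]

print(count_pii(original))   # → {'email': 1, 'phone': 1}
print(count_pii(rewritten))  # → {'email': 0, 'phone': 0}
```

A drop in these counts suggests the rewrite helped, but a zero count from regexes alone does not prove the data is free of PII.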
Agent Simulations
Create thousands of agent trajectories to catch and remove unexpected behavior.
Scalable Offline Generations
Observe model behavior and outputs at near-limitless scale and detect anomalies using Sutro’s LLM-as-a-judge capabilities.
70% Lower Costs
1B+ Tokens Per Job
10X
Generate High-Quality Data On Demand
Stop wrestling with infra and get the data you need, today.





