A simple workflow for data purity
Sutro simplifies the entire process of record deduplication at scale. Connect to your data sources, define your logic in Python, and let us handle the rest.
import sutro as so
from pydantic import BaseModel

class ReviewClassifier(BaseModel):
    sentiment: str

user_reviews = [
    'User_reviews.csv',
    'User_reviews-1.csv',
    'User_reviews-2.csv',
    'User_reviews-3.csv',
]

system_prompt = 'Classify the review as positive, neutral, or negative.'

results = so.infer(user_reviews, system_prompt, output_schema=ReviewClassifier)

Progress: 1% | 1/514,879 | Input tokens processed: 0.41m, Tokens generated: 591k
Prototype
Start small and iterate fast on your deduplication workflows. Accelerate experiments by testing on Sutro before committing to large jobs.
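One way to prototype is to iterate on a small, reproducible sample before submitting the full dataset. The sketch below is illustrative, not part of the Sutro SDK; the `sample_reviews` helper is hypothetical, and the commented-out `so.infer` call shows where the real job submission would go.

```python
import random

def sample_reviews(reviews, n=100, seed=42):
    """Draw a small, reproducible sample for a quick prototype run (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(reviews, min(n, len(reviews)))

# Illustrative data; in practice these rows would come from your CSV files.
all_reviews = [f"review {i}" for i in range(10_000)]
subset = sample_reviews(all_reviews, n=100)

# Iterate on your prompt against the subset first, then submit the full
# dataset once the outputs look right, e.g.:
# results = so.infer(subset, system_prompt, output_schema=ReviewClassifier)
print(len(subset))
```

Fixing the seed makes each prototype run comparable, so prompt changes, not sampling noise, explain differences between runs.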
Scale
Scale your LLM workflows so your team can do more in less time. Process billions of tokens in hours, not days, with no infrastructure headaches or exploding costs.
Integrate
Seamlessly connect Sutro to your existing LLM workflows. Sutro's Python SDK is compatible with popular data orchestration tools, like Airflow and Dagster.
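As a rough sketch of that integration, the function below is plain Python, so it could be wrapped unchanged in an Airflow `@task` or a Dagster `@asset`. The Sutro call is commented out and the return value stubbed, since this is an illustration of the wiring, not a real SDK invocation.

```python
def classify_reviews_task(review_files: list[str]) -> list[dict]:
    """Orchestrator-friendly task body (illustrative stub)."""
    # In a real pipeline, this is where the batch job would be submitted:
    # import sutro as so
    # return so.infer(review_files, system_prompt, output_schema=ReviewClassifier)
    return [{"file": f, "status": "submitted"} for f in review_files]  # stub result

# An orchestrator would call this on its schedule; here we call it directly.
jobs = classify_reviews_task(["User_reviews.csv", "User_reviews-1.csv"])
print(jobs[0]["status"])
```

Keeping the task body free of orchestrator-specific imports is what makes the same function portable between Airflow and Dagster.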

Cleanse datasets of any size
Confidently handle millions of requests and billions of tokens at a time without the pain of managing infrastructure. Scale your data-cleaning workflows effortlessly.
Get results faster and reduce costs by parallelizing your LLM calls through Sutro. Process your entire dataset in a single batch job for maximum efficiency.
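To see why one batch job beats many single calls, here is a toy cost model (the overhead and per-record numbers are assumptions for illustration, not measured Sutro figures): fixed per-request overhead is amortized once over the whole dataset instead of being paid on every row.

```python
# Assumed toy parameters -- not real measurements.
OVERHEAD_S = 0.2       # fixed per-request cost (connection, queueing)
PER_RECORD_S = 0.001   # marginal cost per record processed

def wall_time(n_records, n_requests):
    """Total time under the toy model: overhead per request plus per-record work."""
    return n_requests * OVERHEAD_S + n_records * PER_RECORD_S

n = 1_000_000
per_row = wall_time(n, n_requests=n)   # one request per record
batched = wall_time(n, n_requests=1)   # one batch job for the whole dataset
print(per_row > batched)
```

Under this model the per-row approach pays the fixed overhead a million times, while the single batch pays it once; the same intuition applies whatever the real constants are.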

Go from messy to clean in hours, not days
Shorten development cycles with feedback from large batch jobs in as little as a few minutes. Run LLM batch jobs in hours, not days, to accelerate your data preparation pipelines.