Build the most accurate, trustworthy judges, classifiers, and extractors in hours, not weeks.
A new way to build AI that works with you, not against you.
Sutro Functions
A new way to quickly build expert-aligned judges, classifiers, and extractors.

Support Agent Judge v1.3
Pass/fail judge for our new customer support agent.
Say goodbye to slow, brittle prompt engineering and massive, costly labeling queues
Stop wasting time crafting unstable prompts and manually creating golden sets, only to stay stuck in eval hell.
And hello to accurate, consistent, and trustworthy decision-making
Sutro auto-labels your data, surfacing only ambiguous cases for last-mile preference learning. Labeling is a breeze, as easy as a left or right swipe.
Functions are lifelong learners
Once a function is deployed to production, learning doesn’t end. Use confidence scores to surface new edge cases, data drift, or regressions and send them to a queue for continual learning.
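As a rough illustration of the confidence-based queueing described above, here is a minimal sketch in plain Python. It does not use the Sutro SDK; the `Prediction` type, the `split_by_confidence` helper, and the 0.8 threshold are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    # Hypothetical record shape; not a Sutro type.
    input_id: str
    label: str
    confidence: float  # assumed to lie in [0, 1]


def split_by_confidence(predictions, threshold=0.8):
    """Split predictions into (accepted, review_queue) by confidence."""
    accepted, review_queue = [], []
    for p in predictions:
        if p.confidence >= threshold:
            accepted.append(p)
        else:
            # Low-confidence cases are candidates for human review.
            review_queue.append(p)
    return accepted, review_queue


preds = [
    Prediction("t1", "pass", 0.97),
    Prediction("t2", "fail", 0.55),
    Prediction("t3", "pass", 0.80),
]
accepted, review_queue = split_by_confidence(preds)
```

In practice the threshold would be tuned against how much review capacity is available.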
How It Works
Bring unlabeled data and a simple task definition.
No ground truth or golden set is needed.
Choose the best decision and rationale or add your own.
We compile your decision preferences and learn your generalizable rules.
Automatic prompt optimization, oh my.
Loop in your experts
Easily exchange labeling requests with internal or external teams, empowering everyone in your org to scale their decision making.
Once your task is learned, we produce an expert model ready for usage at scale.
Our functions return calibrated, numerical confidence scores so you can fill in any remaining gaps discovered in production.
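A confidence score is usually called calibrated when, among predictions scored around 0.9, roughly 90% turn out to be correct. One common way to check this on a small labeled sample is expected calibration error (ECE). The sketch below is a generic illustration, not a description of how Sutro computes or validates its scores; the function name and the choice of 10 bins are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: floats in [0, 1]; correct: booleans of the same length.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Group predictions into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, ok))

    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece
```

A value near 0 suggests the scores can be read as probabilities; a large value suggests thresholds built on them should be treated with caution.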
The building blocks for confident, high-volume AI
Sutro lets you confidently scale decisions you know you can trust.
LLM-as-a-judge
Build and run high-quality automated evals for AI products or agents. When your judges work, your product works.
Great for:
LLM output evaluation
Pass/fail agent traces
QA gates
Classify
Organize unstructured data into one or several pre-defined categories, with confidence scores you can actually trust.
Great for:
Routers
Triaging systems
Semantic filters
Extract
Pull structured spans, keywords, and relevant passages into normalized schemas.
Great for:
Structuring large datasets for analytics
Document retrieval systems
Normalization scripts
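To make "normalized schemas" concrete, here is a small sketch of turning loosely formatted extracted fields into a fixed record. The invoice example, the `InvoiceRecord` schema, and the helper names are hypothetical; this is ordinary Python and not the Sutro API.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal, InvalidOperation
from typing import Optional


@dataclass(frozen=True)
class InvoiceRecord:
    # Hypothetical target schema for the example.
    vendor: str
    invoice_date: Optional[date]
    total_cents: Optional[int]


_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text: str) -> Optional[date]:
    """Try each known format; return None if none match."""
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_cents(text: str) -> Optional[int]:
    """Convert a string like '$1,234.50' to integer cents."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return int((Decimal(cleaned) * 100).to_integral_value())
    except InvalidOperation:
        return None


def normalize(raw: dict) -> InvoiceRecord:
    """Map raw extracted strings onto the fixed schema."""
    return InvoiceRecord(
        vendor=" ".join(raw.get("vendor", "").split()),  # collapse whitespace
        invoice_date=parse_date(raw.get("invoice_date", "")),
        total_cents=parse_cents(raw.get("total", "")),
    )
```

Fields that cannot be parsed come back as `None` rather than a guess, which keeps downstream analytics honest about missing values.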
Sutro Batch
Run Sutro Functions, custom models, and pre-trained LLMs over large datasets with thousands or millions of inputs.
10x faster
5x less expensive
Simple Python SDK compatible with most data tools and dataframe libraries.
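The general pattern behind running a function over a large dataset is to split the inputs into batches and collect the results in order. The sketch below shows that pattern in plain Python. It does not call the Sutro SDK; `batched`, `run_in_batches`, and the stand-in `fn` are hypothetical names used only for illustration.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of up to batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_in_batches(inputs, fn, batch_size=1000):
    """Apply fn to each batch and concatenate results in input order.

    fn is a placeholder for whatever processes one batch and returns
    one result per input.
    """
    results = []
    for batch in batched(inputs, batch_size):
        results.extend(fn(batch))
    return results
```

Because `batched` accepts any iterable, the same helper works over a list, a generator, or the rows of a dataframe column.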




