SynthLabs x Sutro: Scaling and Accelerating Synthetic Data Generation for RL

Sutro Team

Jul 15, 2025


SynthLabs—a pioneering AI research lab cited by DeepMind, Meta, and NVIDIA—needed to shorten its large-scale synthetic data generation efforts from months to days. By partnering with Sutro, SynthLabs generated a 351 billion-token dataset with 10x greater speed and 80% lower costs, turning complex research ideas into production-grade results without infrastructure bottlenecks.


About SynthLabs

SynthLabs is a post-training research lab focused on scaling AI capabilities through a combination of synthetic data, reinforcement learning (RL), and reasoning. Its research, cited by DeepSeek, DeepMind, Meta, Microsoft, NVIDIA, ByteDance, and others, is shaping how the industry uses these approaches to build next-generation AI models. The lab's work centers on closing the "last-mile gap" for AI: specializing models to master complex reasoning, adhere to specific safety protocols, and understand nuanced domain knowledge that public models lack.

SynthLabs research invents and applies data generation strategies within a broader reinforcement fine-tuning framework that has delivered transformative results, such as:

  • Matched GPT‑4 performance with a small Llama model trained entirely on synthetic data and RL, extending Anthropic's Constitutional AI (arXiv 2402.07896).

  • Demonstrated that synthetic labels can rival human judgments: trained GenRM models outperformed LLMs as judges and were more robust, garnering attention from frontier labs such as DeepSeek and Meta (arXiv 2410.12832).

SynthLabs’ Synthetic Data Generation Process

Generating high-quality synthetic datasets is at the core of SynthLabs' research. The process often starts with a small, human-labeled dataset, which is used to fine-tune a strong open-source LLM that then bootstraps large-scale generation. SynthLabs' research therefore falls into two phases:

  1. Prototyping and Refinement: Done at small scale (hundreds or thousands of examples and millions of tokens), this phase involves iterating on the fine-tuned model: tweaking data, adjusting hyperparameters, and alternating between quick training and inference jobs to ensure the model is ready for large-scale generation.

  2. Production: This step involves large-scale synthetic data generation (typically millions of samples and many billions of tokens) using a finalized model configuration created during the prototyping phase. These millions of samples are then used to train specialized reward models, domain-specific judges, and reasoning systems that master tasks no public model could ever learn.
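At its core, the production phase amounts to fanning millions of prompts out across many independent batch jobs. The sketch below is a minimal, generic illustration of that sharding step; the fixed shard size and the rough ~4-characters-per-token heuristic are assumptions for illustration, not SynthLabs' or Sutro's actual pipeline.

```python
def shard_prompts(prompts: list[str], shard_size: int):
    """Split a large prompt list into fixed-size shards, each a separate batch job."""
    for start in range(0, len(prompts), shard_size):
        yield prompts[start:start + shard_size]

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the common ~4-characters-per-token rule of thumb."""
    return max(1, round(len(text) / chars_per_token))

# Example: 10 prompts sharded into independent jobs of at most 4 prompts each.
prompts = [f"Solve problem #{i}" for i in range(10)]
shards = list(shard_prompts(prompts, shard_size=4))
budget = sum(estimate_tokens(p) for p in prompts)
```

Because each shard is independent, jobs can run on whatever capacity is free, with no need for the fast interconnect that training workloads require.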

The Challenge: Demanding, Highly-Variable Research Inference Workloads

In the first phase, research velocity is key: because training runs are quick, the speed at which SynthLabs can run small-scale, offline inference workloads on custom models and validate their efficacy largely determines how fast this phase moves.

But it was in the second phase that SynthLabs hit severe bottlenecks. Generating datasets on the order of hundreds of billions of tokens could take weeks to months, badly hindering research velocity. Using a large interconnected cluster for such unpredictable, variable workloads is suboptimal: inference doesn't utilize the expensive interconnect, yet it consumes enough cluster resources that training jobs begin to queue. And for a team focused on research rather than infrastructure, building the required tooling in-house wasn't worth the engineering investment.

"When we're ready to scale up, we have a custom model that needs to generate billions of tokens. This presents a real challenge: there's no easy way to plug a custom model into existing inference providers for massive generation without significant engineering setup work or really slow completion timelines. The challenge is particularly acute for one-off research pipelines like ours, where we don't plan on (or are unsure about) running the exact workflow repeatedly. In these cases, having a way to quickly set up large-scale generation jobs for custom models becomes especially valuable... you need the flexibility to blast inference when the research demands it, without the overhead of building permanent infrastructure."

Nathan Lile, CEO of SynthLabs.ai

Practical RL and Synthetic Data: A Primer

LLM agents trained with RL promise to unlock extraordinary business value, but only through deep alignment with the organization's distinct priorities and constraints.

Today's most capable closed models are trained on public data and can get organizations most of the way there. But "most" doesn't meet the precision requirements of high-stakes deployment. What might seem like a simple classification problem in an enterprise setting, for example, is often riddled with domain- and company-specific edge cases, tacit knowledge, and know-how. An initial deployment of a closed-source foundation model may realistically yield only 80-90% accuracy on such a task without further post-training refinement.

Open-source models can be tailored to enterprise quality, regulatory, and workflow standards, but closing this “last-mile gap” demands both training infrastructure and high-quality, domain-specific data — often scarce or prohibitively expensive to label. Traditional human annotation pipelines are expensive, slow, and challenging to scale.

This is precisely where synthetic data + Reinforcement Learning (RL) become transformative—providing a scalable, cost-effective way to iterate quickly and boost model performance. Creating quality synthetic data remains an active research space, and the ability to capture reasoning, context, and workflows missing from public corpora can make RL dramatically more effective.

The most common pain points that synthetic data + RL can immediately address include:

  1. Factuality: Eliminate hallucinations in a domain by generating synthetic validation chains

  2. Instruction Following: Ensure precise, nuanced adherence to an organization’s policies and protocols

  3. Technical Reasoning: Expand limited examples into comprehensive problem-solving capabilities

  4. Safety & Compliance: Create industry-specific safety scenarios that don't exist in public datasets

  5. Domain Focus: Keep AI on-topic and optimized for a company’s specific workflows

  6. Complex Decisions: Handle real-world ambiguity where multiple valid approaches exist

Each goal above benefits from improved reasoning capabilities—but models struggle to learn behaviors they've never seen. Augmenting training datasets with domain-specific synthetic examples gives RL the foundation it needs to 'latch' onto these abilities.
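As a deliberately simple illustration of the "expand limited examples" idea above, a handful of seed templates can be swept over parameter values to multiply a dataset many-fold; real pipelines would typically use an LLM to perturb seeds and fill in reasoning chains rather than a template sweep. All names and templates here are hypothetical:

```python
import itertools

def expand_seed(seed: list[dict], operand_range: range) -> list[dict]:
    """Multiply a seed of templated math Q&A pairs by sweeping operand values.

    Each seed item is a question template with {a} and {b} placeholders plus an
    answer function; sweeping operand pairs turns a few seeds into many examples.
    """
    expanded = []
    for item, (a, b) in itertools.product(seed, itertools.combinations(operand_range, 2)):
        expanded.append({
            "question": item["question"].format(a=a, b=b),
            "answer": item["answer_fn"](a, b),
        })
    return expanded

seed = [
    {"question": "What is {a} + {b}?", "answer_fn": lambda a, b: a + b},
    {"question": "What is {a} * {b}?", "answer_fn": lambda a, b: a * b},
]
# 2 templates x C(4, 2) = 6 operand pairs -> 12 examples from 2 seeds.
dataset = expand_seed(seed, range(1, 5))
```

The same pattern scales: the 250 million math Q&A pairs mentioned later in this post came from a fine-tuned model rather than templates, but the seed-then-expand shape of the pipeline is the same.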


Partnering with Sutro for On-Demand, Accelerated Batch Inference

SynthLabs' partnership with Sutro was a natural fit. Sutro's platform is purpose-built for exactly the kind of large-scale, accelerated offline batch-inference workloads that were bottlenecking SynthLabs' research, making it ideal for generating the massive, specialized datasets needed for model initialization, verification, and iterative retraining.

Sutro eliminates the need to wrangle GPUs, distributed compute libraries, and storage integrations, offering a simple "model in, data out" solution. For SynthLabs, this meant researchers could bypass infrastructure management entirely and execute workloads immediately and predictably through a single API.
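To make "model in, data out" concrete: many batch-inference platforms accept OpenAI-style JSONL request files, one self-contained request per line. The sketch below prepares such a file; the model name is a placeholder for a custom fine-tuned checkpoint, and Sutro's actual API may differ from this generic format.

```python
import json
import os
import tempfile

def build_batch_file(prompts, model, path):
    """Write an OpenAI-style batch JSONL file: one chat-completion request per line."""
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            request = {
                "custom_id": f"req-{i}",  # lets results be joined back to inputs
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,  # placeholder for a custom fine-tuned model
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(request) + "\n")

path = os.path.join(tempfile.mkdtemp(), "batch.jsonl")
build_batch_file(
    ["What is 2 + 2?", "Factor x^2 - 1."],
    model="my-finetuned-model",
    path=path,
)
```

The `custom_id` field is what makes fire-and-forget batch jobs practical: results can arrive in any order and still be matched back to the originating prompt.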

Accelerating Research Velocity and Dropping Costs

Sutro is designed for both of SynthLabs' modes of research. With Sutro's unified tooling layer, smaller-scale inference jobs (on the order of thousands of rows and millions of tokens) are now faster, cheaper, and easier to run than ever, greatly increasing SynthLabs' velocity in the earlier phases of research. With the same API, researchers can prototype at million-token scale during experimentation, then immediately productionize to billion-token scale with zero friction.

"Sutro lets our researchers fire off batch inference—whether it's a thousand samples or a few billion—through one API call. They don't have to check cluster queues or negotiate priorities; the job runs immediately with a predictable, fast return time."

Going Large-Scale

The partnership was put to the test on a landmark project: generating a synthetic dataset with a custom, fine-tuned LLM that would previously have been too large and expensive to create. The outcome was transformative:

  • Massive Scale Achieved: Sutro generated a 351-billion-token dataset (250 million math Q&A pairs), a scale that was previously out of reach.

  • 10x Faster Turnaround: The entire job completed in days, not months, dramatically accelerating the research lifecycle.

  • 80% Lower Cost: Generation came in at a fraction of the cost of alternative solutions or of building the infrastructure in-house.

  • Zero Engineering Overhead: SynthLabs achieved this with no internal infrastructure investment, allowing the team to focus exclusively on research.

SynthLabs, Sutro and You?

SynthLabs is excited to partner with more companies looking to expand their AI capabilities via RL and synthetic data, powered by Sutro's accelerated batch-inference infrastructure. Below is a sample of real-world applications that can benefit companies today:

Synthetic Data Generation

  • Design pipelines for generating, filtering, and curating data that captures your exact policies, edge cases, and domain quirks that no public model could ever learn—enabling faster model training and reducing reliance on expensive human labeling.

  • Cut human annotation and labeling costs by multiplying existing corpora using test-time computation and reasoning strategies—allowing teams to scale datasets 10x or more without proportional increases in time or expense.

  • Simulate diverse user personas for annotations, rankings, testing copy, and personalization—revolutionizing how businesses gather feedback by replacing costly real-human panels with scalable AI-driven insights.

Financial Services

  • Generate compliant synthetic trading scenarios for risk models, automate comprehensive test suites for regulatory reporting systems, and train specialized judge models for transaction monitoring.

Healthcare & Life Sciences

  • Synthesize patient interaction scenarios while maintaining HIPAA compliance, generate edge cases for diagnostic AI systems, and create training data for clinical trial matching algorithms.

Enterprise Operations

  • Transform SOPs into comprehensive training datasets, generate user interaction scenarios for testing automation, build custom verifier models for quality assurance, model diverse user touchpoints with open models, and enable comprehensive coverage of edge cases and regulatory scenarios absent from base-model training.

Deliver competitive advantages in your company's AI initiatives today.

If you're looking to generate sophisticated synthetic datasets, specialize LLM agents with RL, or collaborate on post-training research, reach out to SynthLabs and Sutro to see how our collaboration can work for your business by emailing team@sutro.sh or team@synthlabs.ai.

AI that learns your business, not generic rules

Not generic responses. Not 80% accuracy. But AI that knows your compliance rules, speaks your industry language, and improves every day from your real workflows.


SynthLabs pioneered using RL to train models that reason through complex judgments—evaluating "quality," ranking "correctness," providing nuanced feedback, and grading solutions with no simple right answer. We build these reasoning systems specifically for your requirements, from policy nuances to brand voice, then let them continuously refine themselves using your operational data.


No massive annotation teams. No waiting for model updates. Just AI that tackles the hard problems generic models can't handle.


The result: Production-ready AI that catches what others miss. All running securely within your firewall.

What Will You Scale with Sutro?