A platform for generating embeddings from large datasets. Process billions of documents up to 20x faster and 90% cheaper.
import sutro as so
import polars as pl

# Load 4M pre-chunked patent documents
apple_patent_chunks = pl.read_parquet("apple-patent-chunks-4m.parquet")

# Embed every row of the "text" column in one batch job
results = so.infer(
    apple_patent_chunks,
    column="text",
    model="qwen-3-embedding-8b",
    job_priority=1,
)

print(results.head())
Generate Embeddings in Minutes, Not Days
Process billions of tokens at a time. Our purpose-built batch engine is optimized for embedding workloads, delivering results up to 20x faster.
Slash Your Embedding Costs
Up to 90% cost reduction. Efficient job packing and resource allocation make large-scale vectorization financially feasible, even for massive backfills.
Simple SDK, No Infrastructure Hell
Abstract away rate limits, backoffs, and parallelization. Replace brittle processing scripts with a few lines of code. We handle the errors, you get the vectors.
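To make "brittle processing scripts" concrete, here is a minimal sketch in plain Python of the retry-with-backoff boilerplate such scripts typically hand-roll and the SDK absorbs. All names here (`call_with_backoff`, `flaky_embed`) are hypothetical illustrations, not part of Sutro's API:

```python
import random
import time

def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry fn with exponential backoff plus jitter -- the kind of
    error handling you'd otherwise write around every embedding call."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            # Double the delay each attempt; jitter avoids thundering herds
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            sleep(delay)

# Hypothetical flaky endpoint: fails twice with a rate limit, then succeeds
calls = {"n": 0}
def flaky_embed():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return [0.1, 0.2, 0.3]

vector = call_with_backoff(flaky_embed, sleep=lambda d: None)
```

Multiply this by parallel workers, checkpointing, and partial-failure recovery, and the appeal of a single `so.infer(...)` call becomes clear.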
Any Model, Any Scale
Use powerful open-source models or your own private models. Scale from a small sample to your entire dataset with the same simple code.
70% Lower Costs · 1B+ Tokens Per Job
Make Anything Searchable
Stop worrying about infrastructure. Start building. Get access to Sutro and scale your embedding pipelines today.
