Blog
Workhorse LLMs: Why Open Source Models Dominate Closed Source for Batch Tasks
Jun 6, 2025
Open-source LLMs now outperform closed alternatives at 90% lower cost for workhorse tasks. We analyzed benchmarks, pricing, and real-world performance to show exactly which models beat GPT and Claude, and by how much.
Generating 1 Million Synthetic Humans - a New Method for Seeding Diverse LLM Outputs
Apr 15, 2025
We demonstrate a new method for seeding diverse LLM responses, and release an accompanying open-source dataset of 1 million synthetic humans.
Hacker News is Obsessed with Aviation: Classifying 42 Million Posts with SLMs
Mar 31, 2025
An analysis of 40+ million Hacker News posts reveals that 0.62% are aviation-related, with the percentage steadily increasing over time. Using small language models, we classified 10.7B tokens of content to discover aviation's surprising popularity among technologists.
Model Security with Large-Scale Inference
Mar 12, 2025
How do you verify open-source AI models aren't Trojan horses? We bombarded Qwen 2.5 Coder with 50,000 diverse programming tasks and used Mistral Codestral as a verifier to check for malicious outputs. Our large-scale inference approach found minimal security concerns, demonstrating a practical framework for evaluating model safety before deployment.