Blog
The End of Moore's Law for AI? Gemini Flash Offers a Warning
Jul 3, 2025
Google hiked the price of Gemini Flash when it released Flash Lite. This is the first time a model provider has increased the price of inference on an existing model, and it is a signal of things to come.
(No) Need for Speed: Why Batch LLM Inference Is Often the Smarter Choice
Jun 15, 2025
Running tasks in bulk with LLMs can quickly become expensive and brittle. Learn why batch LLM APIs offer a cleaner, less expensive, and often faster alternative to synchronous APIs.
Workhorse LLMs: Why Open Source Models Dominate Closed Source for Batch Tasks
Jun 6, 2025
Open-source LLMs now outperform closed alternatives at 90% lower cost on workhorse tasks. We analyzed benchmarks, pricing, and real-world performance to show exactly which models beat GPT and Claude, and by how much.
Generating 1 Million Synthetic Humans: A New Method for Seeding Diverse LLM Outputs
Apr 15, 2025
We demonstrate a new method for seeding diverse LLM responses, and release an accompanying open-source dataset of 1 million synthetic humans.
Hacker News Is Obsessed with Aviation: Classifying 42 Million Posts with SLMs
Mar 31, 2025
An analysis of 42 million Hacker News posts reveals that 0.62% are aviation-related, with the share steadily increasing over time. Using small language models (SLMs), we classified 10.7B tokens of content to discover aviation's surprising popularity among technologists.
Model Security with Large-Scale Inference
Mar 12, 2025
How do you verify open-source AI models aren't Trojan horses? We bombarded Qwen 2.5 Coder with 50,000 diverse programming tasks and used Mistral Codestral as a verifier to check for malicious outputs. Our large-scale inference approach found minimal security concerns, demonstrating a practical framework for evaluating model safety before deployment.