Blog

The End of Moore's Law for AI? Gemini Flash Offers a Warning

Jul 3, 2025

Google hiked the price of Gemini Flash when it released Flash Lite. This is the first time a model provider has raised the cost of inference on an existing model, and it is a signal of things to come.

(No) Need For Speed: Why Batch LLM Inference is Often the Smarter Choice

Jun 15, 2025

Doing any task in bulk with LLMs can quickly become expensive and brittle. Learn why batch LLM APIs offer a cleaner, less expensive, and often faster alternative to synchronous APIs.

Workhorse LLMs: Why Open Source Models Dominate Closed Source for Batch Tasks

Jun 6, 2025

Open-source LLMs now outperform closed alternatives at 90% lower cost for workhorse tasks. We analyzed benchmarks, pricing, and real-world performance to show exactly which models beat GPT and Claude, and by how much.

Generating 1 Million Synthetic Humans - a New Method for Seeding Diverse LLM Outputs

Apr 15, 2025

We demonstrate a new method for seeding diverse LLM responses and release an accompanying open-source dataset of 1 million synthetic humans.

Hacker News is Obsessed with Aviation: Classifying 42 Million Posts with SLMs

Mar 31, 2025

An analysis of 42 million Hacker News posts reveals that 0.62% are aviation-related, with the percentage steadily increasing over time. Using small language models, we classified 10.7B tokens of content to discover aviation's surprising popularity among technologists.

Model Security with Large-Scale Inference

Mar 12, 2025

How do you verify open-source AI models aren't Trojan horses? We bombarded Qwen 2.5 Coder with 50,000 diverse programming tasks and used Mistral Codestral as a verifier to check for malicious outputs. Our large-scale inference approach found minimal security concerns, demonstrating a practical framework for evaluating model safety before deployment.