fastpriors
resources

Benchmarks. Playbooks. Receipts.

Everything we've learned, written down. Including the things that didn't work, with enough detail that you can avoid making the same mistakes.

in preparation

Field notes.

Benchmarks and playbooks we're packaging up for public release. Need one before it ships? Tell us which.

benchmark · Q3 2025 · drafting

Llama 3.1 70B inference: hosted vs self-hosted on 8× H100
benchmark · Q3 2025 · drafting

FP8 vs INT8 quantization across 14 production workloads
writeup · Q2 2025 · drafting

Speculative decoding in practice: when 4× speedups vanish
playbook · Q2 2025 · drafting

The migration runbook: hosted API → owned VPC, step by step
tool · github · MIT · drafting

fp-eval: an open eval-parity harness for migrations
calculator · interactive · drafting

Hosted-vs-owned spend calculator (with assumptions)
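The arithmetic behind a calculator like this is a simple break-even: owned hardware is a fixed monthly cost, hosted APIs scale with token volume. A minimal sketch, where every price and GPU count below is a made-up placeholder, not one of our published numbers:

```python
# Back-of-envelope hosted-vs-owned break-even. All figures here are
# placeholder assumptions for illustration, not benchmark results.

def breakeven_tokens_per_month(hosted_usd_per_m_tokens: float,
                               owned_fixed_usd_per_month: float) -> float:
    """Token volume above which owned hardware beats a hosted API."""
    return owned_fixed_usd_per_month / hosted_usd_per_m_tokens * 1e6

# Placeholder assumptions: 8 GPUs at $2/hr, running 720 hours a month,
# against a hosted API priced at $3 per million tokens.
owned_fixed = 8 * 2.0 * 720          # $11,520/month
breakeven = breakeven_tokens_per_month(3.0, owned_fixed)
# ≈ 3.84 billion tokens/month before owned hardware pays for itself
```

Real versions of this calculation also need utilization, engineering time, and egress, which is why the calculator ships with its assumptions spelled out.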
tool · github · MIT · drafting

kv-layout: KV cache layout explorer for custom kernels
writeup · Q4 2025 · drafting

What we learned from 14 production migrations in 2025
playbook · Q3 2025 · drafting

Sub-50ms RAG: a latency budget worked example
methodology

How we measure.

Every benchmark on this site ships with its assumptions, the exact hardware, and a reproducible script.

Real workloads, real traffic

Numbers from actual production rollouts, sampled across the day. Not synthetic stress tests.

Eval parity before perf

We don't quote a speedup unless the model passes the same eval suite as before, within the agreed tolerance.
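The parity gate amounts to one rule: no eval task may regress beyond the agreed tolerance. A minimal sketch of that check (the function name, task names, scores, and the 0.5-point tolerance are all illustrative assumptions, not fp-eval's actual API):

```python
# Sketch of an eval-parity gate: a speedup is only reportable if the
# optimized model stays within tolerance of baseline on every task.
# All names, scores, and the tolerance are illustrative placeholders.

def passes_parity(baseline_scores: dict, candidate_scores: dict,
                  tolerance: float = 0.5) -> bool:
    """True iff no task regresses more than `tolerance` points vs baseline."""
    for task, base in baseline_scores.items():
        cand = candidate_scores.get(task)
        if cand is None or base - cand > tolerance:
            return False  # missing task or regression beyond tolerance
    return True

baseline  = {"mmlu": 79.2, "gsm8k": 88.1, "internal_qa": 91.4}
candidate = {"mmlu": 79.0, "gsm8k": 87.9, "internal_qa": 91.6}
passes_parity(baseline, candidate)  # → True: worst regression is 0.2 points
```

Only after this gate passes do throughput and latency numbers get measured, so a quoted speedup is never paid for with silent quality loss.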

Hardware disclosed

Every benchmark states exactly which SKU, how many, what interconnect, and which host OS.

Failures included

We publish the cases where it didn't work, too. The interesting margin is in what doesn't optimize.

Want a benchmark for your workload?

The cost audit ends with one. Refundable.

Talk to an engineer →