A small firm.
An opinionated one.
We're a four-person consulting practice for AI-native companies that have outgrown hosted inference. We do migrations, optimization, custom kernels, and the unglamorous SRE work that keeps it all up at 3am.
What we believe.
Stated up front so you can decide whether we'll get along before the kickoff call.
Sovereignty by default
Your weights, your data, your VPC. We never touch production credentials we don't need, and we hand back the ones we do when we leave.
Predictable bills
No revenue share, no token margin, no platform that magically gets more expensive at scale.
Six engagements a year
We stay small on purpose. You get senior engineers, not an account manager and a PowerPoint.
Code that survives our exit
Documented, testable, runbook'd. We measure success by what works after we leave, not what breaks if you fire us.
Four people.
Forty years of GPUs.
Every engagement is run by a senior engineer from start to finish. No hand-offs, no junior-on-junior staffing, no offshore "delivery centers."
Aarav Shenoy
Ex-Together AI. Built distributed serving for a 1.4M-RPS RAG product.
Mira Halvorsen
Ex-NVIDIA TensorRT. CUDA kernels, FP8 quantization, MoE routing.
Joon-ho Park
Ex-Stripe SRE. Builds the eval harnesses that catch regressions in production.
Dr. Selma Okafor
Compiler optimization, sparse attention, posters at NeurIPS we mostly understand.
Distributed by design.
We work where our clients run their hardware. Office hours overlap on Wednesdays.
Want to work together?
Book a 30-minute call. We'll tell you on the call whether we can help.
Talk to an engineer →