FastPriors
frequently asked

Questions before the call.

Cost, timeline, eval parity, security, exit. The objections we have heard most often, answered honestly. Did we miss yours? Tell us.

01 /What does a typical engagement look like?
A written cost audit (~1 week, fixed-fee, refundable against the engagement that follows), then a 6–14-week migration. The full sequence is Audit → Plan → Port → Tune → Ship. Shadow traffic runs from week 3, eval parity is proven before any cutover, and the hosted baseline is kept warm for 30 days after cutover.
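Shadow traffic means production requests are also sent to the new stack while users keep receiving the hosted answer. A minimal asyncio sketch of that pattern, assuming two hypothetical async client functions (placeholders, not a FastPriors or vendor API). The recorded triples are what an eval harness would later score.

```python
import asyncio
from typing import Awaitable, Callable, List, Set, Tuple

# Hypothetical client signature: prompt in, completion out.
Model = Callable[[str], Awaitable[str]]


class ShadowRouter:
    """Serve every request from the hosted (primary) model and mirror it to the
    self-hosted (shadow) model for offline comparison."""

    def __init__(self, primary: Model, shadow: Model) -> None:
        self.primary = primary
        self.shadow = shadow
        # (prompt, primary_output, shadow_output) triples for later scoring.
        self.pairs: List[Tuple[str, str, str]] = []
        self._pending: Set["asyncio.Task[None]"] = set()

    async def _mirror(self, prompt: str, primary_out: str) -> None:
        try:
            shadow_out = await self.shadow(prompt)
        except Exception:
            return  # a shadow failure must never affect users; a real system would log it
        self.pairs.append((prompt, primary_out, shadow_out))

    async def handle(self, prompt: str) -> str:
        primary_out = await self.primary(prompt)
        # Fire-and-forget: the caller gets the hosted answer without waiting for
        # the shadow call. Keep a reference so the task is not garbage-collected.
        task = asyncio.create_task(self._mirror(prompt, primary_out))
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)
        return primary_out

    async def drain(self) -> None:
        """Wait for in-flight shadow calls, e.g. before shutdown or in tests."""
        if self._pending:
            await asyncio.gather(*list(self._pending))
```

In this sketch the shadow call is dispatched after the hosted call returns, so it adds no user-facing latency; in a test you would `await router.drain()` before reading `router.pairs`.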
02 /How much will I save by self-hosting?
For most teams spending ≥$10K/month on a hosted API with consistent traffic, self-hosting comes out 40–70% cheaper per token. Our public ROI calculator at /calculator runs your numbers against current hosted pricing (OpenAI, Anthropic, Together, Bedrock, Replicate) and current GPU rental rates (RunPod, Lambda, CoreWeave). The break-even point depends on your GPU utilization, your model, and whether your workload has cacheable prefixes (for example, a long system prompt shared across requests).
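Why utilization matters can be sketched as simple arithmetic: a rented fleet costs the same whether it is busy or idle, so its effective per-token price falls as utilization rises. A minimal sketch, assuming GPUs rented around the clock and a single blended hosted price per million tokens. Every number below is made up, not a quote from any provider or from /calculator, and the sketch ignores cacheable prefixes, engineering time, and traffic peaks.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


@dataclass
class Fleet:
    gpus: int
    usd_per_gpu_hour: float        # rental rate (hypothetical)
    tokens_per_sec_per_gpu: float  # sustained throughput at full load, measured on your traffic

    def monthly_cost(self) -> float:
        # Rented around the clock, so the bill does not depend on how busy the GPUs are.
        return self.gpus * self.usd_per_gpu_hour * HOURS_PER_MONTH

    def monthly_capacity_tokens(self) -> float:
        return self.gpus * self.tokens_per_sec_per_gpu * HOURS_PER_MONTH * 3600


def self_hosted_usd_per_mtok(fleet: Fleet, utilization: float) -> float:
    """Effective cost per million tokens when only `utilization` of capacity is used."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_served = fleet.monthly_capacity_tokens() * utilization
    return fleet.monthly_cost() / tokens_served * 1e6


def break_even_utilization(fleet: Fleet, hosted_usd_per_mtok: float) -> float:
    """Utilization above which self-hosting is cheaper per token than the hosted API.
    A result above 1.0 means this fleet cannot beat the hosted price at any load."""
    return self_hosted_usd_per_mtok(fleet, 1.0) / hosted_usd_per_mtok


def savings_fraction(fleet: Fleet, utilization: float, hosted_usd_per_mtok: float) -> float:
    """Fraction of the hosted per-token price saved at the given utilization."""
    return 1 - self_hosted_usd_per_mtok(fleet, utilization) / hosted_usd_per_mtok


if __name__ == "__main__":
    fleet = Fleet(gpus=2, usd_per_gpu_hour=2.50, tokens_per_sec_per_gpu=1500)  # made-up inputs
    hosted = 2.00  # made-up blended hosted price, USD per 1M tokens
    print(f"break-even utilization: {break_even_utilization(fleet, hosted):.0%}")
    print(f"savings at 50% utilization: {savings_fraction(fleet, 0.5, hosted):.0%}")
```

With these invented inputs the script prints a break-even of 23% utilization and 54% savings at 50% utilization. The same fleet at 15% utilization would cost more per token than the hosted API, which is why the audit measures your real traffic pattern first.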
03 /How do I know my self-hosted model is as good as my hosted one?
Eval parity. Before cutover, we build (or extend) an eval harness that scores both stacks on the same prompts, and we share a parity report showing the per-task delta with confidence intervals. We will not move traffic until the new stack is within the tolerance you signed off on. The fp-eval harness we use is open-source.
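A parity gate of this kind can be sketched in a few lines. This is not fp-eval's code or API. It assumes each prompt yields one numeric score per stack (for example 1 for pass, 0 for fail), aligned by prompt, and it uses a paired percentile bootstrap, one common way to put a confidence interval on the delta. `tolerance` stands for the signed-off maximum regression.

```python
import random
from typing import Dict, List, Sequence, Tuple


def paired_bootstrap_ci(
    hosted: Sequence[float],
    candidate: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Mean of (candidate - hosted) over paired prompts, with a percentile bootstrap CI."""
    if len(hosted) != len(candidate) or not hosted:
        raise ValueError("need equal-length, non-empty paired score lists")
    deltas = [c - h for h, c in zip(hosted, candidate)]
    n = len(deltas)
    rng = random.Random(seed)  # fixed seed so the report is reproducible
    # Resample prompts with replacement, keeping each prompt's pair together.
    boot = sorted(sum(rng.choices(deltas, k=n)) / n for _ in range(n_resamples))
    lo = boot[int(alpha / 2 * n_resamples)]
    hi = boot[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(deltas) / n, lo, hi


def parity_report(
    scores: Dict[str, Tuple[List[float], List[float]]], tolerance: float
) -> Dict[str, dict]:
    """scores maps task -> (hosted_scores, self_hosted_scores), aligned by prompt.
    A task passes when even the pessimistic end of the CI regresses by no more
    than `tolerance`."""
    report = {}
    for task, (hosted, candidate) in scores.items():
        delta, lo, hi = paired_bootstrap_ci(hosted, candidate)
        report[task] = {"delta": delta, "ci": (lo, hi), "pass": lo >= -tolerance}
    return report
```

Gating on the lower bound rather than the point estimate is the conservative choice: a task with a small average delta but a wide interval does not pass until there are enough prompts to rule out a larger regression.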
04 /What if the migration doesn't work?
You stay on your hosted API. We charge for the audit and the work performed; you keep the runbooks and the cost model. We have walked away from engagements where the math did not justify the migration; that is what the audit is for.
05 /Do you only work with NVIDIA hardware?
Mostly H100, H200, and B200, because that is what most of our clients run, but we have shipped work on AMD MI300X and advised on hybrid fleets. Hardware choice is an output of the engagement, not an input: we pick what fits the workload, not what we have a relationship with. We have no resale agreements with any vendor.
06 /Which inference engines do you use?
vLLM most often, TensorRT-LLM where the gain justifies the build complexity, SGLang for structured-output workloads, TGI for Hugging Face shops, and custom Triton/CUDA kernels when off-the-shelf options do not fit. We are not loyal to any one runtime: we benchmark on your traffic and pick what wins.
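"Pick what wins" can be made concrete as a selection rule. A minimal sketch, assuming the same traffic has already been replayed through each candidate engine, with per-request latency and sustained throughput recorded. The rule shown (meet a p95 latency SLO, then maximise throughput) is one reasonable choice, not necessarily the rule on every engagement, and the engine labels are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BenchResult:
    engine: str                # label only, e.g. "vllm" or "sglang"
    latencies_ms: List[float]  # per-request latency on the replayed traffic
    tokens_per_sec: float      # sustained output throughput on the same replay


def p95(samples: List[float]) -> float:
    """Nearest-rank 95th percentile."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in exact integer arithmetic
    return ordered[rank - 1]


def pick_engine(results: List[BenchResult], p95_slo_ms: float) -> Optional[str]:
    """Among engines that meet the p95 latency SLO, return the highest-throughput one."""
    eligible = [r for r in results if p95(r.latencies_ms) <= p95_slo_ms]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r.tokens_per_sec).engine
```

Filtering on tail latency before ranking by throughput keeps a fast-on-average engine with bad p95 behaviour from winning; returning `None` makes "nothing meets the SLO" an explicit outcome rather than a silent default.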
07 /Where will my model run?
Your VPC, your Kubernetes cluster, your bare-metal box, your colo, your on-prem rack. We do not run a hosted platform, so by design your traffic never touches FastPriors-controlled infrastructure.
08 /Can you sign a BAA / DPA / NDA?
Yes to all three. NDA before any discovery call where confidential material is shared. DPA on engagements that touch EU/UK personal data. BAA on engagements that touch PHI. See /security for the full posture.
09 /Are you SOC 2 certified?
Not yet; a SOC 2 Type 1 report is on the 2026 roadmap. We can already complete most SOC 2-style security questionnaires; ask and we will send our latest answers. We will not claim a certification we do not hold.
10 /How big is the team?
Two senior engineers, both founders. Every engagement is run end-to-end by one of us. No project managers, no junior staffing, no offshore delivery centers. If we sign with you, you get us.
11 /How many engagements do you take a year?
Few, on purpose. We turn down work that is not a fit so that the work we take on gets the depth it needs. The one-week cost audit can usually start within a week of the kickoff call.
12 /What does the cost audit cost?
Fixed fee, scoped to a one-week engagement, refundable against any larger engagement that follows. The exact figure is in the proposal we send after the discovery call. We do not publish it on the marketing site because it depends on the access pattern your environment requires.
13 /Can you start without an audit?
In rare cases, yes: usually when the migration follows a prior engagement where we already have the cost model. For new clients, the audit is not optional; it is the artefact that lets us both decide whether the math works.
14 /Do you take revenue share?
No. No revenue share, no token margin, no per-call fee, no platform that gets more expensive at scale. Fixed-scope engagements only.
15 /What happens after you leave?
You own the runbooks, the eval harness, the dashboards, and the code. We deliver a written handover document and stay reachable for clarification questions for 90 days at no charge. Your team can revoke our remaining access on day one of the handover.
16 /Will you train on my data?
No. Not for FastPriors models, not for any model, and not on any of your data: production prompts, completions, weights, datasets, or traces. We never commercialise client data. This is in writing in the engagement contract and the DPA.
17 /Can I just hire you to optimize without migrating?
Yes; it is one of our most common engagements. If you already self-host and your p95 latency or cost per token is the problem, the optimization engagement (kernels, quantization, batching, scaling) is fixed-scope and runs 4–8 weeks.
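To make one of those levers concrete: weight quantization stores weights in fewer bits, so int8 uses one byte per weight instead of two for FP16. A minimal pure-Python sketch of symmetric per-tensor int8 quantization, one common scheme among several. Production engines typically use per-channel or per-group scales, FP8, and fused kernels; this only shows the arithmetic and its error bound, not how any particular engine implements it.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w ≈ q * scale, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero for all-zero tensors
    # Round to the nearest integer level; the clamp guards against float round-off at the edges.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.003, 1.0, -0.25]
    q, scale = quantize_int8(w)
    err = max(abs(a - b) for a, b in zip(w, dequantize_int8(q, scale)))
    print(q, scale, err)  # rounding error is at most about scale / 2
```

Because one scale covers the whole tensor, a single large outlier weight stretches the scale and costs precision for every other weight, which is one reason finer-grained scales are common in practice.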
18 /Why is the website animated like that?
Because the live system diagram is a faster way to convey what we do than three paragraphs of copy. Animations honour prefers-reduced-motion; if your OS is set to reduce motion, the page is static. They also pause when off-screen.
19 /How do I start?
Open /contact and fill out the form. We respond within one working day with either (a) a 30-minute discovery call, (b) a written "this is not a fit because…" note with a referral if appropriate, or (c) a request for the specific data we need to scope the audit.

Still have a question?

The contact form takes 60 seconds. We reply within one working day.

Talk to an engineer →