Question 1

What does a typical engagement look like?

Accepted Answer

A written cost audit (~1 week, fixed-fee, refundable against the engagement that follows), then a 6-14 week migration: Audit → Plan → Port → Tune → Ship. Shadow traffic from week 3, eval-parity proof before any cutover, hosted baseline kept warm for 30 days post-cutover.

Question 2

What if the migration doesn't work?

Accepted Answer

You stay on your hosted API. We charge for the audit and the work performed; you keep the runbooks and the cost model. We have walked away from engagements where the math did not justify the migration. That is what the audit is for.

Question 3

How big is the team?

Accepted Answer

Two founders. Abhimanyu Singh leads inference-and-infra delivery end-to-end. Kaushlendra Kumar Giri (Edinburgh M.Sc. AI) leads optimization, quantization, and evaluation work. No project managers, no junior staffing, no offshore delivery centers. If we sign with you, you get us.

Question 4

How many engagements do you take a year?

Accepted Answer

A small number, on purpose. We turn down work that is not a fit so the work we take gets the depth it needs. The cost audit (~1 week) is usually available within a week of the kickoff call.

Question 5

Can you start without an audit?

Accepted Answer

In rare cases, yes. This is usually when the migration is a follow-up to a prior engagement and we already have the cost model. For new clients, the audit is non-optional. It is the artefact that lets us both decide whether the math works.

Question 6

Can I just hire you to optimize without migrating?

Accepted Answer

Yes. That is one of our most common engagements. If you are already self-hosted and your p95 latency or cost-per-token is the problem, the optimization engagement (kernels, quantization, batching, scaling) is fixed-scope and runs 4-8 weeks.

Question 7

How much will I save by self-hosting?

Accepted Answer

For most teams running ≥$10K/month on a hosted API with consistent traffic, the math comes out to 40-70% lower per-token cost. Our public ROI calculator at /calculator runs your numbers against current hosted pricing (OpenAI, Anthropic, Together, Bedrock, Replicate) and current GPU rental rates (RunPod, Lambda, CoreWeave). The break-even point depends on your utilization, your model, and whether your workload has cacheable prefixes.
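To make the break-even arithmetic concrete, here is a minimal sketch. It is not the /calculator implementation; the function names and every number in the example are hypothetical, and it deliberately ignores engineering time, storage, networking, and idle headroom. It only contrasts a per-token hosted bill with a per-hour GPU bill.

```python
def hosted_monthly_cost(tokens_per_month: float, price_per_million: float) -> float:
    """Hosted API spend: billed per token, so it scales linearly with volume."""
    return tokens_per_month / 1_000_000 * price_per_million


def self_hosted_monthly_cost(gpu_count: int, gpu_hourly_rate: float,
                             hours_per_month: float = 730.0) -> float:
    """Self-hosted spend: rented GPUs are billed by the hour, busy or idle."""
    return gpu_count * gpu_hourly_rate * hours_per_month


def break_even_tokens(gpu_count: int, gpu_hourly_rate: float,
                      price_per_million: float,
                      hours_per_month: float = 730.0) -> float:
    """Monthly token volume at which hosted and self-hosted spend are equal."""
    fixed = self_hosted_monthly_cost(gpu_count, gpu_hourly_rate, hours_per_month)
    return fixed / price_per_million * 1_000_000


def savings_fraction(tokens_per_month: float, price_per_million: float,
                     gpu_count: int, gpu_hourly_rate: float) -> float:
    """Fraction of the hosted bill saved by self-hosting (negative means a loss)."""
    hosted = hosted_monthly_cost(tokens_per_month, price_per_million)
    if hosted <= 0:
        raise ValueError("hosted cost must be positive")
    self_hosted = self_hosted_monthly_cost(gpu_count, gpu_hourly_rate)
    return (hosted - self_hosted) / hosted


# Illustrative inputs only (assumptions, not quoted prices):
# 5B tokens/month at $4 per million tokens vs. 4 GPUs at $2.50/hour.
print(hosted_monthly_cost(5_000_000_000, 4.0))
print(self_hosted_monthly_cost(4, 2.5))
print(break_even_tokens(4, 2.5, 4.0))
print(savings_fraction(5_000_000_000, 4.0, 4, 2.5))
```

Below the break-even volume the fixed GPU bill is larger than the hosted bill, which is why utilization matters so much: the same GPUs serving twice the tokens halve the per-token cost.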

Question 8

What does the cost audit cost?

Accepted Answer

Fixed fee, scoped to a one-week engagement, refundable against any larger engagement that follows. The exact number is in the proposal we send after the discovery call. We do not publish it on the marketing site because it depends on the access pattern your environment requires.

Question 9

Do you take revenue share?

Accepted Answer

No. No revenue share, no token margin, no per-call fee, no platform that gets more expensive at scale. Fixed-scope engagements only.

Question 10

How do I know my self-hosted model is as good as my hosted one?

Accepted Answer

Eval parity. Before cutover we build (or extend) an eval harness that scores both stacks against the same prompts. We publish a parity report showing per-task delta with confidence intervals. We will not cut traffic until the new stack passes the tolerance you signed off on. The fp-eval harness we use is open-source.
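As an illustration of what "per-task delta with confidence intervals" can mean, here is a minimal sketch using a paired bootstrap. It is not the fp-eval API; the function names, the 95% interval, and the pass rule (the lower bound of the interval must stay within the tolerance) are assumptions made for this example.

```python
import random
from statistics import mean


def paired_delta_ci(baseline_scores, candidate_scores,
                    n_resamples=2000, alpha=0.05, seed=0):
    """Mean score delta (candidate - baseline) with a bootstrap percentile CI.

    Scores are paired: index i in both lists is the same prompt.
    """
    if len(baseline_scores) != len(candidate_scores) or not baseline_scores:
        raise ValueError("need two non-empty score lists of equal length")
    deltas = [c - b for b, c in zip(baseline_scores, candidate_scores)]
    rng = random.Random(seed)  # fixed seed so the report is reproducible
    n = len(deltas)
    # Resample prompts with replacement and recompute the mean delta each time.
    resampled = sorted(mean(rng.choices(deltas, k=n)) for _ in range(n_resamples))
    lo = resampled[int(alpha / 2 * n_resamples)]
    hi = resampled[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return mean(deltas), lo, hi


def passes_parity(ci_lower, tolerance):
    """Pass only if even the pessimistic end of the interval is within tolerance."""
    return ci_lower >= -tolerance


# Hypothetical per-prompt scores for one task on both stacks.
hosted = [0.91, 0.88, 0.95, 0.79, 0.90, 0.85, 0.93, 0.87]
self_hosted = [0.90, 0.89, 0.94, 0.80, 0.88, 0.85, 0.93, 0.86]
delta, lo, hi = paired_delta_ci(hosted, self_hosted)
print(delta, lo, hi, passes_parity(lo, tolerance=0.02))
```

Pairing by prompt matters: comparing the two stacks on the same inputs removes prompt-difficulty noise, so a small regression is easier to detect than it would be with two independent samples.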

Question 11

Do you only work with NVIDIA hardware?

Accepted Answer

Mostly H100, H200, and B200, because that is what most of our clients run, but we have shipped MI300X work on AMD and advised on hybrid fleets. Hardware choice is an output of the engagement, not an input: we pick what fits the workload, not what we have a relationship with. We have no resale agreements with any vendor.

Question 12

Which inference engines do you use?

Accepted Answer

vLLM most often, TensorRT-LLM where the gain justifies the build complexity, SGLang for structured-output workloads, TGI for Hugging Face shops, and custom Triton/CUDA kernels when off-the-shelf options do not fit. We are not loyal to any one runtime. We benchmark on your traffic and pick what wins.
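A minimal sketch of the "benchmark and pick" step, assuming the deciding metric is p95 latency over replayed requests. The real criteria may also include throughput and cost. The runtime labels and latency samples below are placeholders, and no inference engine is actually called.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def pick_runtime(latencies_by_runtime, pct=95):
    """Return the runtime with the lowest tail latency, plus every runtime's figure."""
    tails = {name: percentile(samples, pct)
             for name, samples in latencies_by_runtime.items()}
    winner = min(tails, key=tails.get)
    return winner, tails


# Hypothetical latencies in milliseconds from replaying the same requests.
measurements = {
    "runtime_a": [120, 125, 130, 128, 122, 400, 126, 124, 127, 123],
    "runtime_b": [140, 142, 139, 141, 143, 150, 138, 144, 140, 141],
}
winner, tails = pick_runtime(measurements)
print(winner, tails)
```

Comparing tail latency instead of the mean is a deliberate choice in this sketch: a runtime that is faster on average can still lose if occasional slow requests dominate the user experience.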

Question 13

Where will my model run?

Accepted Answer

Your VPC, your Kubernetes cluster, your bare-metal box, your colo, your on-prem rack. We do not run a hosted platform. By design, your traffic never touches fastpriors-controlled infrastructure.

Question 14

Can you sign a BAA / DPA / NDA?

Accepted Answer

Yes to all three. NDA before any discovery call where confidential material is shared. DPA on engagements that touch EU/UK personal data. BAA on engagements that touch PHI. See /security for the full posture.

Question 15

Are you SOC 2 certified?

Accepted Answer

Not yet. Type 1 is on the 2026 roadmap. We can already complete most SOC-2-style security questionnaires; ask and we will send the latest answers. We will not claim a certification we do not hold.

Question 16

Will you train on my data?

Accepted Answer

No. Not for fastpriors models, not for any model, and not on any data: production prompts, completions, weights, datasets, or traces. We do not commercialise client data, ever. This is in writing in the engagement contract and the DPA.

Question 17

What happens after you leave?

Accepted Answer

You own the runbooks, the eval harness, the dashboards, and the code. We hand over a written handover document and stay reachable for clarification questions for 90 days at no charge. Your team can disable our last access on day one of the handover.

Question 18

How do I start?

Accepted Answer

Open /contact and fill out the form. We respond within one working day with either (a) a 30-minute discovery call, (b) a written "this is not a fit because…" note with a referral if appropriate, or (c) a request for the specific data we need to scope the audit.

Question 19

Why is the website animated like that?

Accepted Answer

Because the live system diagram is a faster way to convey what we do than three paragraphs of copy. Animations honour prefers-reduced-motion; if your OS is set to reduce motion, the page is static. They also pause when off-screen.

Questions before the call.
