Buyer Guide · 11 min read · March 2026
Based on 50+ buyer interviews

AI Agent Agency Due Diligence: A 5-Step Verification Process

AI agent agencies are harder to verify than traditional software shops — the field is newer, track records are shorter, and demos are easy to fake. Here's a 5-step process to verify what you're actually buying.

Why AI Agency Due Diligence Is Harder Than Traditional Software

Evaluating a traditional software development agency is hard enough. Evaluating an AI agent agency is harder for three specific reasons. First, the field is young enough that even experienced practitioners have short AI-specific track records. An agency that has been building web applications for 15 years and pivoted to AI agents in 2024 might be outstanding or might be learning on your budget — their general software track record doesn't tell you which. Second, the output is harder to demo meaningfully. An agency can build an impressive demo in 48 hours with a curated dataset and controlled prompts that would fall apart in 48 minutes of production use. Third, the skills required for production AI agents — prompt engineering, evaluation framework design, observability, cost management, model reliability — are genuinely different from general software engineering, and it's hard for non-technical buyers to distinguish real expertise from confident-sounding marketing language. The 5-step process below is designed to surface real expertise and real production track records, not pitch quality. Use the Vendor Scorecard to document your findings across vendors so you have a structured comparison before making a decision.

Step 1: Portfolio Verification — Production Deploys, Not Demos

The first question to ask every agency is: 'Can you show us a production AI agent system that is currently running for a paying client, and can we speak with a user of that system?' This single question immediately separates agencies with genuine production experience from those with impressive demos and case study slides. What to look for in the response: a direct yes with specific details (system name, client type, go-live date, current usage volume), not a hedged 'most of our work is under NDA.' Some legitimate work is under NDA — but an agency that cannot produce any verifiable production reference has a problem. For the production systems they do share, ask: the go-live date (anything launched in the last 3 months is not seasoned production experience), the current usage volume (low usage may indicate low confidence in the system), the failure modes they encountered post-launch and how they handled them (experienced teams have war stories; teams with no war stories haven't been in production long enough), and whether the client is still using the system or if it was quietly decommissioned. Agencies with genuine production experience will be specific and forthcoming about both successes and failures. Agencies selling on potential rather than track record will be vague, generic, and over-reliant on AI industry hype.

Step 2: Reference Calls — What to Actually Ask Past Clients

Reference calls are only valuable if you ask the right questions. Most buyers ask 'would you recommend them?' — which produces a useless positive answer from a reference the agency selected. Ask these questions instead: (1) What did the agent actually do wrong after launch, and how did the agency respond? A confident 'nothing went wrong' is a red flag — every production AI agent has issues. (2) Was the delivery timeline accurate, and if not, what changed? (3) How did the agency handle scope disagreements — when they said something was out of scope, was that call reasonable? (4) What would you specifically change about how the engagement was run? (5) Who on the agency team was most valuable, and are they still at the agency? (6) If you were hiring again today for a similar project, would you go back to them, and why or why not? Also ask what the agency was like under pressure: when production was down, when the demo didn't work in front of stakeholders, when the model started behaving unexpectedly. That's where you find out who the agency really is. Ask for at least two reference contacts and verify independently that they work at the client company before scheduling the call.

Step 3: Technical Deep-Dive Interview

The technical deep-dive is your primary mechanism for separating real AI engineering expertise from polished sales capability. The goal is not to stump the team — it's to ask questions that require specific, grounded answers and observe how they respond to genuine uncertainty. Include your most technically capable person in this conversation; if you don't have one, consider hiring a fractional technical advisor for this step alone. Questions that reveal real expertise: (1) Walk us through your evaluation framework — how do you know an agent is ready for production? Ask for a specific example with metrics, not a general description. (2) How do you handle prompt injection attacks in production? If they haven't thought about adversarial inputs, that's a gap. (3) What's your approach to observability — how do you know the system is degrading after go-live? Experienced teams have specific tooling: LangSmith, Arize, Helicone, or similar. (4) Describe a time a production agent started producing wrong outputs — what caused it, how did you detect it, and how did you fix it? (5) How do you manage LLM API costs at scale, and what's your failover strategy when the primary model provider has an outage? Use the Interview Questions tool to build a complete agenda covering these technical areas before the meeting.
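To make question (5) concrete, here is a minimal sketch of the kind of provider-failover logic a strong answer should resemble: retry transient errors with backoff, then fall over to a backup provider. All names here (`call_with_failover`, the provider callables) are hypothetical illustrations, not any agency's or vendor's actual API:

```python
import time

def call_with_failover(prompt, providers, retries=2, backoff=0.5):
    """Try each (name, call) provider in order, retrying transient failures
    with exponential backoff before failing over to the next provider.

    Each `call` takes a prompt string and returns a completion string,
    raising an exception on failure.
    """
    errors = []
    for name, call in providers:
        for attempt in range(retries):
            try:
                return name, call(prompt)
            except Exception as exc:  # real code would catch provider-specific errors
                errors.append((name, attempt, repr(exc)))
                time.sleep(backoff * (2 ** attempt))  # exponential backoff between retries
    raise RuntimeError(f"all providers failed: {errors}")
```

An agency with real production experience will describe more than this sketch: request timeouts, a circuit breaker so a downed provider isn't retried on every call, and logging of each failover event into their observability tooling. If their answer is vaguer than the sketch above, that's a signal.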

Step 4: Team Stability and Key Person Risk

The AI talent market is exceptionally competitive. Senior AI engineers with production deployment experience command $200k–$350k+ in compensation, and turnover at AI agencies is meaningfully higher than in traditional software firms. Buying an AI agent engagement is implicitly buying access to a specific team — and if that team changes mid-project, your project changes materially. Ask directly: (1) Who specifically will be working on our project? Ask them to name individuals, not roles. (2) What is their current capacity and how many other projects are they running simultaneously? (3) What is your policy when a key team member leaves during an engagement? (4) What was your engineering team turnover rate in the last 12 months? Be skeptical of evasive answers to the first question. Agencies that can't commit to specific personnel at proposal time often staff projects with whoever is available — which means your project gets a different team than the one you evaluated. Ask for the contract to include a key person clause: if named personnel are removed from the project without your consent, you have the right to pause the engagement or negotiate a credit. Smaller agencies (5–20 people) often have lower turnover and more consistent staffing; larger agencies have more redundancy but a higher likelihood of bait-and-switch staffing practices.

Step 5: Financial and Operational Health

You need this agency to support you 18 months from now. AI agent systems require ongoing maintenance, prompt updates, model version management, and incident response. An agency that gets acquired, runs out of runway, or significantly downsizes in the next 18 months puts your production system at risk. Ask directly: (1) Are you profitable, or are you VC-funded and burning runway? If VC-funded, what is your current runway? (2) How many clients do you have, and what is your largest single client as a percentage of revenue? Client concentration over 40% is a business risk — if that client churns, the agency is in trouble. (3) Do you have post-launch support contracts in place for the systems you've built? This tells you whether clients are paying for ongoing support, which indicates both that clients stick around after launch and that the agency builds systems worth maintaining — a good sign. (4) What is your standard post-launch support offering, and can you show an example contract? Independently, search for the agency's funding history, any news coverage, and the LinkedIn tenure of key team members. You're not doing a full audit — you're checking whether anything suggests the agency won't be there when you need them. Use the Budget Transparency Index data to benchmark what a financially healthy engagement and support structure looks like for agencies at this project scale.
