
The AI integration playbook: from brief to production in 90 days

I've run enough of these to know the shape. The brief that kills the project before a line of code is written. The ontology gap that shows up in week 2. The UAT session where a real user finds the thing IT testing completely missed. This is the full AI integration playbook — week by week, deliverable by deliverable — of how we take an enterprise AI project from a conversation to a live production system in ninety days.

By Landon Little, Founder of Nova Solutions · 29 May 2026 · 14 min read

Week 0 is where it dies. Not week 6. Not UAT. Week 0.

I've watched three AI pilots collapse in the first two weeks. Not because the model was wrong. Not because the integration was hard. Because the brief was garbage. The client didn't actually know what they wanted the AI to do — or they knew abstractly ("AI for customer service," "AI for collections") but hadn't gotten specific enough for anyone to build against. Or there was a data problem nobody mentioned because nobody on their side thought to check before the kickoff call.

Every one of those problems is fatal later. None of them are hard to catch early. The clients who push back on a proper discovery phase — "we already know what we want, just start building" — consistently end up with a build that takes twice as long and a UAT that surfaces problems that should have died in week 1. The discovery isn't delay. It's compression on the backend.

This AI integration playbook is what we actually run. Not a sales deck with vague phases labeled "strategy" and "execution." The real week-by-week structure with actual deliverables, the failure modes per phase, and the handoffs between them. If you're evaluating whether to engage Nova Solutions for an enterprise AI rollout, this is what you're buying. If you're doing your own 90-day AI project plan, take it and use it.

The phases of an AI project implementation guide

Week 0 — Days 1–5
Brief

The brief isn't a discovery worksheet. It's a decision document. By the end of five days, it should answer four specific questions with enough precision that a developer who has never spoken to the client could build the right thing.

Question one: What are you actually trying to automate? Not the category. The workflow. Not "AI for collections." Something like: "Reduce the time a collections agent spends on initial borrower contact from 4 minutes to under 90 seconds, using an outbound AI agent that identifies the borrower, reads the account summary aloud, and transfers to the human agent with a live transcript." That sentence contains a measurable baseline (4 minutes), a measurable target (90 seconds), a named workflow (initial borrower contact), and a defined handoff (transfer with live transcript). Every word of the brief should be at this level of specificity.

Question two: What data exists, and what form is it in? This is the conversation most clients avoid. We don't avoid it. We ask to see a sample export from every system the AI agent will touch — before signing a scope. Not a screenshot. An export or a schema, with row counts and date ranges. I've had three projects where the client was confident they had clean data and the actual state was: one system with 40 percent null values in a required field, one Excel file last updated eight months ago, and a legacy database that required a proprietary driver to access. Finding that in week 1 costs a conversation. Finding it in week 5 costs a renegotiation.
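
The first look at those exports can be scripted. The sketch below is a minimal illustration that assumes the client can hand over a CSV sample with ISO-8601 dates. The function name `profile_export`, the column names, and the 5 percent null-rate cutoff are all hypothetical; it is not a Nova Solutions tool.

```python
import csv
import io
from datetime import date


def profile_export(csv_text, date_field, required_fields, max_null_rate=0.05):
    """Summarize a sample export: row count, date range, null rate per required field.

    Assumes a CSV with a header row and ISO-8601 dates (YYYY-MM-DD).
    max_null_rate is a hypothetical cutoff for flagging a field.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    null_rates = {}
    for name in required_fields:
        # A value counts as missing if the column is absent, empty, or whitespace.
        missing = sum(1 for row in rows if not (row.get(name) or "").strip())
        null_rates[name] = missing / len(rows) if rows else 0.0
    dates = [
        date.fromisoformat(row[date_field].strip())
        for row in rows
        if (row.get(date_field) or "").strip()
    ]
    return {
        "row_count": len(rows),
        "date_range": (min(dates), max(dates)) if dates else None,
        "null_rates": null_rates,
        "flagged": [name for name, rate in null_rates.items() if rate > max_null_rate],
    }


sample = """account_id,phone,last_updated
A1,555-0101,2025-01-03
A2,,2025-03-15
A3,,2024-11-30
A4,555-0104,
"""
report = profile_export(sample, "last_updated", ["phone", "last_updated"])
print(report["null_rates"])  # {'phone': 0.5, 'last_updated': 0.25}
```

A required field sitting at 50 percent null in a four-row sample is exactly the kind of finding that is cheap to surface before the scope is signed.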

Question three: Who are the actual end users, and how technically capable are they? Not the IT team. The collections agent. The branch teller. The loan officer. What does their current workflow look like? What tools do they use? How much training budget is in the project? An AI agent that's brilliant but requires a 40-minute onboarding session will get used by 30 percent of the target users. We size the UX for the actual user, not the aspirational one.

Question four: What regulatory environment applies? For Philippine financial institutions this means BSP, NPC, and RA 10173 at minimum. For healthcare-adjacent data, PhilHealth and DOH guidelines. For LGU-adjacent work, DICT and relevant DILG circulars. The answer changes the audit trail architecture, which changes the agent data model — and that needs to be known before the AI ontology design starts in week 1.

Deliverable: a 2-page brief, signed by the client's authorized representative
Why we require a signature

The brief signature isn't a formality. It's the client's explicit commitment to a specific outcome. When the brief is unsigned, "what we want" drifts between conversations. When it's signed, there's a document to return to when scope creep starts. Every mid-project "actually we also want..." that isn't in the brief is a change order. This protects the client as much as it protects us — they get what they signed for, not a shifted version of it.

Weeks 1–2
Ontology design

The ontology is the formal model of every important business object the AI system will operate on. For a collections AI: customer, loan, payment history, collector, branch, contact attempt, escalation rule, regulatory constraint. For a document processing AI: document, template, extracted field, validation rule, downstream system, exception handler. Every object is defined — what it is, what fields it has, where those fields come from, what the data quality looks like, and what the fallback logic is when a field is missing or malformed.

We draw the ontology as a diagram first. Every object is a node. Every relationship is an edge with a direction and a cardinality. A customer has many loans. A loan has one primary collector and many contact attempts. A branch has many collectors. A contact attempt has one outcome and maps to one regulatory window — because calling a borrower twice in a day in certain contexts violates BSP guidance. The diagram has to be complete before any code is written. The agent is built against the ontology, not against the underlying database. If the AI ontology in week 2 is wrong, the agent is wrong.
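
One way to keep the agent built against the ontology rather than the database is to encode the diagram as typed objects that the agent code must go through. Below is a minimal Python sketch of the collections objects named above, with cardinality expressed in the field types. The field names and the one-contact-per-day limit are illustrative assumptions, not the actual BSP rule, which depends on context.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Branch:
    branch_id: str


@dataclass
class Collector:
    collector_id: str
    branch_id: str  # many collectors -> one branch


@dataclass
class ContactAttempt:
    attempt_id: str
    contacted_on: date
    outcome: str  # exactly one outcome per attempt


@dataclass
class Loan:
    loan_id: str
    primary_collector_id: str  # exactly one primary collector
    attempts: list[ContactAttempt] = field(default_factory=list)  # one loan -> many attempts


@dataclass
class Customer:
    customer_id: str
    name: str
    loans: list[Loan] = field(default_factory=list)  # one customer -> many loans


def days_over_contact_limit(customer: Customer, max_per_day: int = 1) -> list[date]:
    """Return the days on which a borrower was contacted more often than allowed.

    max_per_day is a hypothetical placeholder; the real limit comes from the
    regulatory constraints captured in the ontology.
    """
    per_day = Counter(
        attempt.contacted_on for loan in customer.loans for attempt in loan.attempts
    )
    return sorted(day for day, count in per_day.items() if count > max_per_day)
```

The check counts attempts across all of a customer's loans, on the assumption that the regulatory window applies per borrower rather than per loan. That is the kind of decision the diagram should settle before any code is written.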

The data gap analysis is the document that maps every field in every object to its actual data source. "Customer.name — source: core banking system, completeness: 99.8%, format: mixed case with occasional ALL CAPS, resolution logic: normalize to title case." "Loan.last_payment_date — source: collections system, completeness: 87%, missing value logic: pull from payment history table, fallback: mark as unknown and flag for manual review." Every gap has a resolution strategy. If the resolution strategy is "we don't have this data," that's recorded and the agent design is adjusted.
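
Each row of the gap analysis translates naturally into a small resolver with an explicit fallback. A sketch for the two example fields, assuming dict-shaped rows and ISO-8601 date strings; the key names (`last_payment_date`, `paid_on`, `loan_id`) are hypothetical stand-ins for whatever the client's schemas actually use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resolved:
    value: Optional[object]
    needs_review: bool  # True means: mark as unknown and flag for manual review


def resolve_customer_name(raw: Optional[str]) -> Resolved:
    """Customer.name: collapse whitespace, normalize mixed case and ALL CAPS to title case."""
    if raw is None or not raw.strip():
        return Resolved(None, True)
    # str.title() is a blunt tool ("McDonald" becomes "Mcdonald"); fine for a sketch.
    return Resolved(" ".join(raw.split()).title(), False)


def resolve_last_payment_date(loan_row: dict, payment_history: list[dict]) -> Resolved:
    """Loan.last_payment_date: primary field, then payment history, then manual review."""
    if loan_row.get("last_payment_date"):
        return Resolved(loan_row["last_payment_date"], False)
    # Missing-value logic: most recent payment for this loan in the history table.
    # ISO-8601 date strings sort chronologically, so max() picks the latest.
    paid = [
        p["paid_on"]
        for p in payment_history
        if p.get("loan_id") == loan_row.get("loan_id") and p.get("paid_on")
    ]
    if paid:
        return Resolved(max(paid), False)
    # Fallback: unknown, flagged for a human.
    return Resolved(None, True)
```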

The "we don't actually have that data" bomb always drops in week 2, never week 0 — because nobody on the client's side thought to verify before the kickoff. "Oh, we don't actually have the borrower phone number stored in the system — field exists but it was never populated." That's a recoverable problem at week 2 ontology review. It's a rebuild at week 4 agent dev.

Deliverable: ontology diagram + data gap analysis, reviewed and approved by client's data or IT lead
Weeks 3–6
Agent development — first agent only

One agent. Not three, not five. Pick the highest-value workflow with the clearest scope from the brief and build that one first.

I've watched clients push for parallel agent development in the first build phase. "We have six workflows to automate — shouldn't we build all of them simultaneously to save time?" No. The first agent is where you discover the ontology problems. The field that wasn't where the schema said it would be. The API that returns data in a format the ontology parser doesn't handle. The edge case in the business rule that the client's process doc didn't mention because everyone in the office already knows it and didn't think to write it down. You want to discover those problems once, against one agent, in a contained build. If you're building six agents simultaneously and hit an ontology problem, you're fixing it six times.

Weeks 3 and 4 are pure development against the ontology. No real data yet. Synthetic test data generated from the object definitions. The agent should run end-to-end on synthetic data by the end of week 4.
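
Generating synthetic records from the object definitions can be as simple as walking a dataclass's fields and producing a value per declared type. This is a minimal sketch under stated assumptions: the `Loan` definition, the `synthetic` helper, and the value ranges are hypothetical, and one generator per type is cruder than real per-field ranges would be.

```python
import random
from dataclasses import dataclass, fields
from datetime import date, timedelta


@dataclass
class Loan:
    """Hypothetical ontology object; real definitions come from the week 1-2 ontology."""
    loan_id: str
    principal: float
    days_past_due: int
    last_payment_date: date


def synthetic(cls, rng: random.Random):
    """Build one instance of a dataclass, generating each field from its declared type.

    Relies on field annotations being real types (no `from __future__ import annotations`).
    """
    generators = {
        str: lambda: f"SYN-{rng.randrange(1_000_000):06d}",
        int: lambda: rng.randint(0, 180),
        float: lambda: round(rng.uniform(1_000, 500_000), 2),
        date: lambda: date(2024, 1, 1) + timedelta(days=rng.randrange(730)),
    }
    return cls(**{f.name: generators[f.type]() for f in fields(cls)})


rng = random.Random(42)  # fixed seed so test runs are reproducible
loans = [synthetic(Loan, rng) for _ in range(100)]
```

Generating from the definitions rather than hand-writing fixtures means a change to the ontology changes the test data with it. Synthetic data will not contain the encoding issues and legacy-schema records that real data brings, which is why week 5 exists.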

Week 5 is when we bring in real data for the first time. This is usually the most interesting week of the entire AI deployment timeline. Real data surfaces things synthetic data doesn't — outliers, encoding issues, historical records that predate the current schema, migration artifacts from legacy systems. We run the agent against a slice of real data (typically 10 to 20 percent of production volume) and record every case where it fails, produces unexpected output, or hits a confidence threshold below the acceptable floor.
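
A harness for the week 5 run might look like the sketch below. It assumes, hypothetically, that each record is a dict with an `"id"` and an optional `"expected"` decision, and that the agent is a callable returning `{"decision": ..., "confidence": float}`; the 15 percent slice and 0.8 confidence floor are placeholders for the values agreed in the brief.

```python
import hashlib
from dataclasses import dataclass


def in_sample(record_id: str, fraction: float) -> bool:
    """Deterministically place about `fraction` of records in the sample.

    Hashing the ID (instead of random.random()) keeps the same slice across reruns,
    so a fix can be checked against exactly the records that failed. The top 53 bits
    are used so the bucket is exactly representable and always below 1.0.
    """
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53 < fraction


@dataclass
class Finding:
    record_id: str
    kind: str  # "error", "low_confidence", or "unexpected"
    detail: str


def run_slice(records, agent, fraction=0.15, confidence_floor=0.8):
    """Run `agent` on a sampled slice and record every failure-type case."""
    sampled, findings = 0, []
    for record in records:
        if not in_sample(record["id"], fraction):
            continue
        sampled += 1
        try:
            result = agent(record)
        except Exception as exc:  # record any crash rather than stopping the run
            findings.append(Finding(record["id"], "error", repr(exc)))
            continue
        confidence = result.get("confidence", 0.0)
        if confidence < confidence_floor:
            findings.append(Finding(record["id"], "low_confidence", f"{confidence:.2f}"))
        elif "expected" in record and result.get("decision") != record["expected"]:
            findings.append(Finding(record["id"], "unexpected", f"got {result.get('decision')!r}"))
    return sampled, findings
```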

Week 6 is fixing what week 5 surfaces. Goal by end of week 6: an agent that performs correctly on the real data sample with a failure rate below the threshold agreed in the brief. For a collections routing agent that's usually below 2 percent of cases requiring manual intervention. For a document extraction agent it's usually below 5 percent extraction errors on the defined field set.
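
The end-of-week-6 gate is a plain comparison against the brief. A minimal sketch with hypothetical names, using the typical ceilings from the paragraph above and reading "below" as a strict inequality:

```python
# Failure-rate ceilings as stated in the brief. These mirror the typical
# figures above; each project's brief sets its own.
BRIEF_CEILINGS = {
    "collections_routing": 0.02,  # share of cases needing manual intervention
    "document_extraction": 0.05,  # share of extraction errors on the defined field set
}


def check_against_brief(agent_type: str, failures: int, total: int) -> tuple[float, bool]:
    """Return (failure_rate, passes), where passing means strictly below the ceiling."""
    if total <= 0:
        raise ValueError("no cases were evaluated")
    rate = failures / total
    return rate, rate < BRIEF_CEILINGS[agent_type]
```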

Deliverable: working agent in staging, tested against real data sample, performance report against brief targets
Weeks 6–8
Integration and user acceptance testing

Wiring the agent into the client's existing tools is the most technically tedious phase. Not the hardest — the ontology design is harder — but the most fiddly. Most of our integrations go in via REST API. Some use webhooks where the client's system can push events to the agent in real time. Occasionally we use file-based integration for clients whose systems are old enough that API access requires a separate procurement process.

The edge cases are what surprise most clients here. The API that works perfectly in the sandbox and returns a 503 under production load. The webhook that fires duplicate events when a record is saved twice in quick succession, causing the agent to take the same action twice. The authentication token that expires after 24 hours with no refresh mechanism, which nobody noticed because in testing the token was always fresh. We document every integration edge case, implement handling for it, and verify it in the integration test suite before UAT begins.
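
The duplicate-webhook and expiring-token cases are common enough to sketch. The code below is illustrative only: it assumes a webhook payload carrying a `record_id` and a `data` object, and a token endpoint wrapped in a hypothetical `fetch_token` callable that returns the token and its lifetime in seconds.

```python
import hashlib
import json
import time


def event_key(payload: dict) -> str:
    """Key a webhook event by record ID plus content, so a record saved twice with
    identical data yields the same key. Assumes the payload has a "record_id"."""
    body = json.dumps(payload.get("data", {}), sort_keys=True)
    return payload["record_id"] + ":" + hashlib.sha256(body.encode("utf-8")).hexdigest()


class WebhookDeduper:
    """Drop repeat deliveries of the same event key seen within `ttl_seconds`."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.seen: dict[str, float] = {}  # event key -> time first seen

    def is_duplicate(self, key: str) -> bool:
        now = self.clock()
        # Forget keys older than the TTL so memory stays bounded.
        self.seen = {k: t for k, t in self.seen.items() if now - t < self.ttl}
        if key in self.seen:
            return True
        self.seen[key] = now
        return False


class TokenCache:
    """Refresh an auth token shortly before it expires instead of waiting for a failure."""

    def __init__(self, fetch_token, margin_seconds: float = 300.0, clock=time.monotonic):
        self.fetch_token = fetch_token  # hypothetical: returns (token, lifetime_seconds)
        self.margin = margin_seconds
        self.clock = clock
        self.token = None
        self.expires_at = 0.0

    def get(self) -> str:
        if self.token is None or self.clock() >= self.expires_at - self.margin:
            self.token, lifetime = self.fetch_token()
            self.expires_at = self.clock() + lifetime
        return self.token
```

If the client's system sends its own event or delivery ID, keying on that is safer: content hashing would also suppress a legitimate change from A to B and back to A inside the window. A production version would also persist seen keys so deduplication survives a restart.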

UAT is not IT testing. This matters enormously and most enterprise AI rollout plans get it wrong.

IT testing verifies that the system does what the spec says. UAT verifies that the actual end users (the collections agents, the branch tellers, the loan officers) find it usable, accurate, and integrated into their real workflow. These are different tests. An IT team can sign off on a system that real users will refuse to use because it doesn't match how they actually work.

On one recent collections agent deployment, I sat three actual collectors with the system for four hours each across two days. We watched them use it on real borrower cases. Two of the three found the same friction point: the agent's summary read the outstanding balance in a format that didn't match the format on the paper ledger the collector also consults, so they double-checked every time rather than trusting the AI output. That friction wasn't in any spec. It was invisible in IT testing. It would have silently killed adoption if we'd shipped without catching it. The fix took two hours. Finding it took having real users in the room.

The thing nobody budgets for: AI change management. A collections agent who's been doing the same job for three years and now has an AI system routing her calls is going to resist it. Not because she's difficult, but because every competent professional is anxious about a new tool that might make them look incompetent in front of their borrowers if it misbehaves. Budget time for this. We typically allocate two sessions with end users during UAT: one for orientation and one for feedback incorporation. The first session is where the complaints come out. The second session is where trust is built.

Deliverable: agent integrated with client systems, UAT passed by end-user group, 3+ documented edge cases handled and verified
Weeks 8–10
Hardening

Hardening is the phase most clients want to skip because it feels like polish on something that already works.

It's not polish. It's the difference between a system that works in UAT and a system that works at 2 AM when nobody is watching.

Error handling means every failure mode the agent can hit has a defined response. API timeout: retry three times with exponential backoff, log the failure, surface to the AI monitoring dashboard. Confidence below threshold: route to manual review queue rather than proceeding. Data field missing that's required for the agent's decision: halt, flag the record, notify the assigned handler. None of these should require a developer to interpret. The agent's failure responses should be self-describing to the operator managing the queue.
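
Two of those responses, retry with exponential backoff and routing on confidence or missing fields, can be sketched as follows. The exception class, the required-field list, and the 0.8 confidence floor are hypothetical. "Retry three times" is read here as three retries after the first call, and the small random jitter on each delay is a common addition that the text does not specify.

```python
import logging
import random
import time

log = logging.getLogger("agent")


class TransientError(Exception):
    """Hypothetical: raised by the integration client on timeouts and 5xx responses."""


def call_with_retry(fn, retries: int = 3, base_delay: float = 0.5, sleep=time.sleep):
    """Call fn(); on TransientError retry up to `retries` times with exponential backoff.

    Delays grow 0.5s, 1s, 2s (plus a little random jitter). After the last retry
    the failure is logged for the monitoring dashboard and re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == retries:
                log.error("call failed after %d retries", retries)
                raise
            sleep(base_delay * 2**attempt + random.uniform(0, base_delay / 2))


REQUIRED_FIELDS = ("account_id", "outstanding_balance")  # hypothetical


def route_decision(record: dict, decision: str, confidence: float, floor: float = 0.8):
    """Return (action, reason). Reasons are written for the queue operator, not a developer."""
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    if missing:
        return "halt", "required field(s) missing: " + ", ".join(missing)
    if confidence < floor:
        return "manual_review", f"confidence {confidence:.2f} is below the {floor:.2f} floor"
    return "proceed", decision
```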

Fallback behaviors mean that when the primary path fails, there's a secondary path that keeps the operation running at degraded but acceptable service. A collections routing agent that can't reach its primary LLM endpoint should fall back to a rule-based routing table rather than returning an error to the user. It runs slower and less accurately in fallback mode. It doesn't stop working.
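
A minimal sketch of that fallback, assuming a hypothetical `PrimaryUnavailable` exception from the LLM client and a made-up rule table keyed on days past due. The mode tag returned alongside the queue is there so the dashboard can count how often the agent is running degraded.

```python
class PrimaryUnavailable(Exception):
    """Hypothetical: raised when the primary LLM endpoint can't be reached."""


# Hypothetical rule table, checked top-down: (minimum days past due, queue).
ROUTING_RULES = [
    (90, "senior_collector"),
    (30, "standard_collector"),
    (0, "reminder_queue"),
]


def rule_based_route(case: dict) -> str:
    for min_days, queue in ROUTING_RULES:
        if case.get("days_past_due", -1) >= min_days:
            return queue
    return "manual_review"  # nothing matched (e.g. missing or negative days past due)


def route_case(case: dict, llm_route) -> tuple[str, str]:
    """Return (queue, mode). `mode` lets the dashboard count how often fallback runs."""
    try:
        return llm_route(case), "primary"
    except PrimaryUnavailable:
        return rule_based_route(case), "fallback"
```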

Alert thresholds are defined per agent and per metric: what throughput below X means the agent is underperforming, what error rate above Y triggers a page, what latency above Z indicates a dependency problem. "Healthy" for a collections routing agent in our deployments is typically: P95 decision latency under 2 seconds, error rate below 1.5 percent, manual review queue below 8 percent of total cases, and zero hard failures. Deviations from these thresholds should surface in the AI monitoring dashboard automatically, with severity levels that distinguish "investigate this week" from "call someone now."
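
A minimal sketch of those checks against the example "healthy" profile above (P95 under 2 seconds, errors below 1.5 percent, manual review below 8 percent, zero hard failures). The function names are hypothetical, and so is the mapping of each metric to a severity; the text only says that an error rate above the threshold triggers a page.

```python
def p95(values):
    """Nearest-rank 95th percentile: the smallest value with at least 95% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def evaluate_health(latencies_s, errors, manual_reviews, hard_failures, total):
    """Compare one window of metrics with the example healthy profile.

    "page" means call someone now; "investigate" means look at it this week.
    """
    alerts = []
    if hard_failures > 0:
        alerts.append(("page", f"{hard_failures} hard failure(s)"))
    if errors / total >= 0.015:
        alerts.append(("page", f"error rate {errors / total:.1%} (healthy: below 1.5%)"))
    latency = p95(latencies_s)
    if latency >= 2.0:
        alerts.append(("investigate", f"P95 latency {latency:.2f}s (healthy: under 2s)"))
    if manual_reviews / total >= 0.08:
        alerts.append(("investigate", f"manual review {manual_reviews / total:.1%} (healthy: below 8%)"))
    return alerts
```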

Documentation comes last in hardening, and it's the task developers least want to do. We produce two documents: a technical specification for the development team that will own the system going forward, and a user guide for the actual operators. The technical spec describes the agent's architecture, the ontology it operates on, the integration endpoints, the error handling logic, and the steps to update or modify it. The user guide describes what the agent does, what inputs it needs, what outputs it produces, what to do when it sends something to the review queue, and who to contact when something seems wrong. The user guide has screenshots. The technical spec has code samples.

Deliverable: production-ready system with AI monitoring dashboard, error handling verified, both documentation types complete
Weeks 10–12
Live and first learning loop

We don't go full-volume on day one. We go live with a defined subset — typically 20 to 30 percent of total production traffic — for the first two weeks. This isn't timidity. It's the only sane way to run a first production AI ramp.

The subset is selected to be representative of the full range of cases, not just the easy ones. For a collections routing agent, the first-wave subset includes clean cases, borderline cases, and a small sample of the historically difficult ones. We want to see the hard cases in production before we give the agent full volume. The AI monitoring dashboard runs continuously during the two-week partial deployment. We're watching the error rate, the manual review queue rate, the confidence distribution across decisions, and the latency profile. All of these should match or improve on the staging performance. If they don't, we want to know on day 3 with 20 percent traffic, not on day 1 with 100 percent.
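
One way to select that subset is to assign each live case by hashing its ID, with a separate share per difficulty stratum. This sketch assumes a hypothetical classifier has already labeled each case clean, borderline, or difficult; the share values are placeholders, not our deployment settings.

```python
import hashlib

# Hypothetical shares of live traffic per case stratum during the partial rollout.
# Difficult cases get a smaller but non-zero share so they are seen early.
ROLLOUT_SHARE = {"clean": 0.25, "borderline": 0.25, "difficult": 0.10}


def bucket(case_id: str) -> float:
    """Map a case ID to a stable number in [0, 1) using the top 53 bits of SHA-256."""
    digest = hashlib.sha256(case_id.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def handled_by_agent(case_id: str, stratum: str) -> bool:
    """True if this case goes to the agent; otherwise it stays on the existing process.

    Unknown strata default to the existing process.
    """
    return bucket(case_id) < ROLLOUT_SHARE.get(stratum, 0.0)
```

Hashing makes the assignment sticky, so a case that starts with the agent stays with it through follow-ups, and moving to full volume is a configuration change rather than a code change.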

Edge cases appear in production that didn't appear in UAT. They always do. UAT surfaces a good sample but never the full population. Over the two-week partial deployment, we collect every case where the agent's behavior was incorrect, unexpected, or lower-confidence than the threshold. These cases become the inputs to the tuning pass.

The tuning pass happens at week 11. We adjust the agent's prompts and logic based on what production edge cases showed us. For most agents, this improves performance by 8 to 15 percent on the metrics that were weakest at go-live. It's not a rebuild. It's calibration. The agent already works — this makes it work better on the specific patterns production data reveals.

At week 12, we move to full production volume. The first performance report is produced at the end of week 12: comparison of the brief's target metrics to actual performance at full volume. Every metric from the brief should appear in this report with a before-state, a target, and an achieved value. If a metric isn't at target, the report explains why and what the remediation path is. This report is the formal close of the 90-day AI project plan.
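
Generating the report from the brief's own metric list means every metric appears, including the ones that missed. A sketch with hypothetical names: the brief example earlier (initial borrower contact from 4 minutes to under 90 seconds) would become `BriefMetric("initial borrower contact (s)", baseline=240, target=90, achieved=...)`, with the achieved value measured in production.

```python
from dataclasses import dataclass


@dataclass
class BriefMetric:
    name: str
    baseline: float  # before-state measured during the brief
    target: float
    achieved: float  # measured at full production volume
    lower_is_better: bool = True

    def met(self) -> bool:
        # Reaching the target counts as meeting it; use a strict comparison
        # instead if the brief says "under" and means it literally.
        if self.lower_is_better:
            return self.achieved <= self.target
        return self.achieved >= self.target


def performance_report(metrics: list[BriefMetric]) -> str:
    lines = []
    for m in metrics:
        status = "met" if m.met() else "not met: remediation path required"
        lines.append(
            f"{m.name}: before {m.baseline:g}, target {m.target:g}, "
            f"achieved {m.achieved:g} ({status})"
        )
    return "\n".join(lines)
```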

Deliverable: full production at volume, first 30-day performance report against brief targets, tuning pass documented

After 90 days: AI retainer or handoff

Every client who thinks a live AI system doesn't need ongoing maintenance is wrong. I say this without exception.

AI systems drift. Business rules change. A new product launches that the AI ontology doesn't model yet. The regulatory environment updates: BSP issues a new circular, NPC releases new data handling guidance, the client's compliance team changes its audit requirements. The underlying LLMs get updated, which means the agent's behavior shifts slightly even when the prompts haven't changed. The volume profile changes, which stresses parts of the system that were fine at the original scale.

None of these are catastrophic events. All of them require someone with intimate knowledge of the agent system to notice them and respond. If nobody's watching, you discover silent degradation three months later when someone asks why collections performance has dropped from 94 percent accuracy to 81 percent and nobody knows when it changed.
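
Catching that kind of silent slide takes a running comparison against the go-live baseline. The sketch below assumes a stream of reviewed decisions (for example, from QA spot checks) labeled correct or incorrect; the class name, window size, and 5-point tolerance are hypothetical and would be set per agent.

```python
from collections import deque
from typing import Optional


class DriftMonitor:
    """Alert when rolling accuracy falls too far below the go-live baseline."""

    def __init__(self, baseline_accuracy: float, window: int = 500, max_drop: float = 0.05):
        self.baseline = baseline_accuracy
        self.max_drop = max_drop
        self.outcomes = deque(maxlen=window)  # most recent reviewed decisions only

    def record(self, correct: bool) -> None:
        self.outcomes.append(correct)

    def accuracy(self) -> Optional[float]:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def drifted(self) -> bool:
        # Wait for a full window so a few early misses don't trigger an alert.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.baseline - self.accuracy() > self.max_drop
```

With a baseline of 0.94, this monitor fires once rolling accuracy drops below roughly 0.89, long before it reaches 0.81 and someone has to ask when it changed.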

There are two legitimate paths. One: hand off to an internal technical team that has the capacity to own the agent system going forward. This requires that they were involved in the build: they sat in the AI ontology sessions, they attended the UAT review, they understand the AI monitoring dashboard and what the alerts mean. If they have this context, they can own it. The technical documentation we produce is designed to enable this. Two: keep us on an ongoing AI retainer. We monitor the performance metrics, respond to alerts, tune the agent when new patterns appear, and update the ontology when the business adds new products or changes its rules.

Most of our clients in the Philippines choose the retainer path. Their technical teams are already carrying full workloads and the agent system isn't their primary competency. The retainer isn't a lock-in mechanism. It's an acknowledgment that a live AI system is a living thing that requires maintenance. The right question isn't "do I need an AI retainer?" — the answer is always yes — but "who should own this maintenance: my team or yours?"

The 90-day AI integration steps, compressed

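Quick-reference version of the full enterprise AI rollout plan:

Week 0 (days 1–5), Brief: a 2-page brief signed by the client's authorized representative.
Weeks 1–2, Ontology design: ontology diagram and data gap analysis, approved by the client's data or IT lead.
Weeks 3–6, Agent development (first agent only): working agent in staging, tested against a real data sample, with a performance report against brief targets.
Weeks 6–8, Integration and UAT: agent integrated with client systems, UAT passed by the end-user group, 3+ edge cases handled and verified.
Weeks 8–10, Hardening: production-ready system with the AI monitoring dashboard, verified error handling, and both documentation types complete.
Weeks 10–12, Live and first learning loop: partial rollout, tuning pass, full production volume, and the first performance report against brief targets.
After 90 days: an AI retainer or a handoff to an internal team.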

The 90 days isn't the finish line. It's the start of the production life of a system that will evolve. Treat the first 90 days as the build phase of a long-running system, not as a project with a hard end date, and you'll make better decisions throughout.

If you want to run this AI integration playbook against a specific integration you're considering — bring me the workflow, the data environment, and the regulatory context — I can scope the 90-day AI deployment timeline in a 30-minute call. We'll walk through each phase against your actual situation and you'll leave with a concrete project plan, not a ballpark estimate.

Start with a 30-minute scope session

Bring one workflow. I'll walk through each phase of this AI integration playbook against your data, your users, and your regulatory environment. You leave with a real project plan — specific milestones, deliverables, and a peso cost estimate — not a proposal template.


Last updated: 29 May 2026. Timeline ranges reflect Nova Solutions deployments from 2024 to 2026. Specific durations vary by integration complexity, client data readiness, and regulatory requirements. The playbook structure is constant; the specific week counts within each phase flex based on scope.