How it works

Six steps. Ninety days.
A knowledge base that compounds.

The framework below combines a16z's context-layer construction model with the Forward Deployed Engineering practice the leading applied-AI shops use. We use it because it works — and because it matches the shape of every successful engagement we've seen. Each step is a concrete deliverable; each step is reviewable on its own.

Step 1

Data accessibility

We catalog your existing data — the warehouse, the SaaS exports, the half-finished pipelines, the spreadsheets your operations team actually uses. Nothing gets thrown out. Everything gets a known path.

Most engagements start here because a new client cannot answer 'where does revenue actually come from?' without three different people in the room. We make that question one query.
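
As a rough illustration of what "everything gets a known path" can mean in practice, here is a minimal sketch of a catalog. The class names, fields, and source kinds are hypothetical, chosen for this example; they are not a description of the actual tooling.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CatalogEntry:
    """One data source and where to find it (all fields are illustrative)."""
    name: str
    kind: str   # e.g. "warehouse", "saas_export", "pipeline", "spreadsheet"
    path: str   # the known path for this source
    owner: str  # team that can answer questions about it


class DataCatalog:
    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Nothing gets thrown out: a duplicate name is an error, not an overwrite.
        if entry.name in self._entries:
            raise ValueError(f"duplicate source: {entry.name}")
        self._entries[entry.name] = entry

    def find(self, keyword: str) -> List[CatalogEntry]:
        # Case-insensitive match on the source name, returned in name order.
        kw = keyword.lower()
        matches = (e for e in self._entries.values() if kw in e.name.lower())
        return sorted(matches, key=lambda e: e.name)
```

With a catalog like this, a question such as "which sources mention revenue?" becomes a single `find("revenue")` call instead of a meeting.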

Step 2

Automated context construction

We extract the implicit context from query history, data-modeling tools, schema migrations, and tribal knowledge in tickets. The knowledge base starts seeded — not blank.

Your team has been writing this context for years in PR descriptions, runbooks, Slack threads, and incident post-mortems. We don't ask you to rewrite it. We capture what's already there.
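
One simple way to seed a knowledge base from query history is to count which tables people actually query. The sketch below assumes the history is available as a list of SQL strings; the function name and the "unreviewed" status field are hypothetical, and a regular expression is only a rough stand-in for a real SQL parser.

```python
import re
from collections import Counter

# Matches the identifier that follows FROM or JOIN (schema-qualified names allowed).
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def seed_from_query_history(queries):
    """Return one seed entry per table, most frequently queried first."""
    usage = Counter()
    for sql in queries:
        # Count each table once per query, however many times it appears.
        usage.update({t.lower() for t in TABLE_REF.findall(sql)})
    ranked = sorted(usage.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        {"table": table, "query_count": count, "status": "unreviewed"}
        for table, count in ranked
    ]
```

Every seed starts as "unreviewed" so that nothing extracted automatically is treated as settled before a domain owner has looked at it.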

Step 3

Human refinement

Captured context gets reviewed and refined by people who actually own the domain. Implicit, conditional, exception-laden knowledge — the stuff that lives in Sarah's head — gets written down once, used forever.

This is the step every other vendor skips. It's also the step that determines whether the system works. We staff it as a workshop series, not a back-office task.

Step 4

Agent connection

Your AI agents, whether customer-facing, internal-tool helpers, or analytics assistants, connect to the knowledge base via API or MCP (Model Context Protocol). They retrieve what they need, when they need it, with a full audit trail.

The knowledge base is a single source of truth. Adding a new agent is connecting another consumer, not building another silo.
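
To make "another consumer, not another silo" concrete, here is a minimal in-memory sketch in which every agent reads from the same store and every read is logged. The class and field names are hypothetical, and the transport (an HTTP API or an MCP server) is deliberately left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class KnowledgeBase:
    entries: Dict[str, str]
    audit_log: List[dict] = field(default_factory=list)

    def retrieve(self, agent_id: str, key: str) -> Optional[str]:
        """Return the entry for `key`, recording who asked and whether it was found."""
        value = self.entries.get(key)
        self.audit_log.append({
            "agent": agent_id,
            "key": key,
            "hit": value is not None,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return value
```

Adding a new agent here means passing a new `agent_id` to the same `retrieve` method. Misses are logged too, since an unanswered lookup is itself useful information.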

Step 5

Evals — prove it works

Before the system goes live, we build the evaluation framework that proves it earns its keep. We trace the way your best human handles a task and grade the agent on each step. We collect a small set of perfect-answer examples and measure every output against them. The goal is not 'looks good in a demo' — it's a defensible answer to 'is this actually working?'

Evals are how an executive trusts the agent will deliver ROI. Without them, every conversation about scope and budget devolves into opinion. With them, you have numbers. The two-technique approach — trace human steps + golden dataset — is the practitioner standard among Forward Deployed Engineering teams.
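
The two techniques can be sketched in a few lines. This is an illustration under stated assumptions, not the evaluation framework itself: the function names are hypothetical, trace grading is a greedy in-order match of step names, and golden-dataset scoring is exact match after whitespace and case normalization. Real evals usually need softer comparisons.

```python
def grade_trace(expert_steps, agent_steps):
    """Fraction of the expert's steps that the agent performed, in the same order."""
    if not expert_steps:
        raise ValueError("expert_steps must not be empty")
    position = 0
    matched = 0
    for step in expert_steps:
        try:
            # Look for this step only after the previously matched one.
            position = agent_steps.index(step, position) + 1
            matched += 1
        except ValueError:
            pass  # the agent skipped this step or did it out of order
    return matched / len(expert_steps)


def _normalize(text):
    return " ".join(text.lower().split())


def score_golden(golden, answer_fn):
    """Fraction of golden (question, perfect_answer) pairs the agent reproduces."""
    if not golden:
        raise ValueError("golden dataset must not be empty")
    hits = sum(
        1 for question, want in golden
        if _normalize(answer_fn(question)) == _normalize(want)
    )
    return hits / len(golden)
```

Both functions return a number between 0 and 1, which is what turns a conversation about scope and budget from opinion into measurement.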

Step 6

Self-updating flows

The agents do their own curation. Every interaction either reinforces existing knowledge, surfaces a gap, or proposes an update for human review. The knowledge base stays current — not because someone is paid to maintain it, but because the system maintains itself.

This is the ‘continuously learning’ part. It is also where we differ from a one-time RAG-and-vector-database implementation. Static knowledge bases decay. Ours doesn't.
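
The three outcomes described above (reinforce, surface a gap, propose an update) can be sketched as a single routing function. The class names and the equality-based comparison are hypothetical simplifications for this example. The point it illustrates is that a conflicting observation is queued for human review rather than applied directly.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entry:
    text: str
    confirmations: int = 0


@dataclass
class CuratedKB:
    entries: Dict[str, Entry] = field(default_factory=dict)
    gaps: List[str] = field(default_factory=list)
    review_queue: List[Tuple[str, str, str]] = field(default_factory=list)

    def record(self, key: str, observed: str) -> str:
        """Route one interaction to exactly one of the three outcomes."""
        entry = self.entries.get(key)
        if entry is None:
            self.gaps.append(key)  # nothing known yet: surface a gap
            return "gap"
        if entry.text == observed:
            entry.confirmations += 1  # matches what we know: reinforce
            return "reinforced"
        # Conflicts are proposed, never applied without a human decision.
        self.review_queue.append((key, entry.text, observed))
        return "proposed_update"
```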

The architecture we ship

Stateful agents — not stateless workflows — paired with a knowledge base that grows from the work itself.

The six steps above describe the engagement shape. This section is the architectural distinction that determines what you actually receive at the end of it.

The workflow-vs-agent line

In December 2024, Anthropic published the canonical industry taxonomy in a post titled “Building Effective Agents.” The distinction is precise: a workflow is a system where LLMs and tools are orchestrated through predefined code paths. An agent is a system where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.

Most products marketed as “AI agents” today are workflows by this definition — prompt chaining, routing, orchestrator-worker patterns with LLM steps. They are useful. They are not agents.
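
The difference is easiest to see side by side. In the sketch below, the function names are hypothetical and `choose` is a stand-in for a model call: in the workflow the code fixes the order of steps, while in the agent loop the chooser decides on every turn which tool to run next, or to stop.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], str]


def run_workflow(text: str, steps: List[Tool]) -> str:
    # Workflow: a predefined code path. The steps and their order never change.
    for step in steps:
        text = step(text)
    return text


def run_agent(
    task: str,
    tools: Dict[str, Tool],
    choose: Callable[[str, List[str]], str],
    max_turns: int = 5,
) -> str:
    # Agent: on each turn the chooser sees the current state and the available
    # tool names, then picks one of them or returns "done".
    state = task
    for _ in range(max_turns):
        name = choose(state, sorted(tools))
        if name == "done":
            break
        state = tools[name](state)
    return state
```

The `max_turns` cap is a design choice for the example: a loop that directs itself still needs a bound so it cannot run forever.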

Letta's critique

Letta, the UC Berkeley team behind the MemGPT memory research (which announced $10M in funding from Felicis on September 23, 2024), sharpened the critique further:

“Most ‘agents’ today are essentially stateless workflows: they have no way to persist interactions beyond what fits into the context window.”

What our stateful agents do differently

Our agents persist across sessions. They have identity, memory that consolidates over time, accumulated experience that informs the next task. They are embedded in your environment — runtime, data layer, operational systems — and observe directly instead of sitting in a chat window. They log everything: every observation, every action, every outcome, every correction. They distill those raw signals into structured knowledge entries — guardrails, reasoning rules, pattern signatures, tree articles — that the next agent invocation can use.
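
A minimal sketch of the two ideas in that paragraph, persistence across sessions and distillation of raw signals, might look like the following. Everything here is an assumption made for illustration: the class name, the JSON file per agent, and the rule that a correction seen at least twice becomes a guardrail candidate.

```python
import json
from pathlib import Path


class StatefulAgent:
    def __init__(self, agent_id, state_dir):
        self.agent_id = agent_id
        self._path = Path(state_dir) / f"{agent_id}.json"
        # A new session starts from whatever earlier sessions logged.
        self.log = json.loads(self._path.read_text()) if self._path.exists() else []

    def observe(self, kind, detail):
        """Append one event (observation, action, outcome, correction) and persist it."""
        self.log.append({"kind": kind, "detail": detail})
        self._path.write_text(json.dumps(self.log))

    def distill(self, min_count=2):
        """Turn repeated corrections into guardrail candidates."""
        counts = {}
        for event in self.log:
            if event["kind"] == "correction":
                counts[event["detail"]] = counts.get(event["detail"], 0) + 1
        return sorted(d for d, n in counts.items() if n >= min_count)
```

Creating a second `StatefulAgent` with the same `agent_id` and `state_dir` picks up the first one's log, which is the property a stateless workflow lacks.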

The knowledge base from steps 1-6 above is not a separate artifact you have to maintain. It is the byproduct of the agents doing real work in your environment. The agents populate it. The agents read from it. The agents propose improvements to it. Your team reviews and steers.

Why this combination makes the knowledge base the best in the industry

A static knowledge base (Notion, Confluence) decays — humans pay the upkeep cost forever. A RAG-augmented chatbot retrieves but does not learn. An LLM-curated personal wiki (the Karpathy workflow) works for one user. A stateful agent without a shared knowledge base remembers per-agent but doesn't compound across agents. Only the combination — stateful agents plus continual-learning knowledge base plus environment embedding plus state-signal distillation — produces a knowledge base that grows automatically from real operational work and serves multiple agents drawing from it. That is what we ship.

Multi-agent by design

Many agents working at once — not one agent doing everything.

We do not ship a single agent. We ship an operating environment in which multiple specialized agents work at the same time, each one reacting to a shared view of the world. The same environment hosts our agents, external partner agents from third-party vendors, and agents your team builds — all under explicit permission tiers and audit identities.

That shared view of the world is the structural distinction. Each agent continuously reads three layers of environment: your knowledge (the shared knowledge tree), your live data (the operational systems the agents are embedded in), and what every other agent is accomplishing. The third layer is what a single-agent system structurally cannot have — agents reacting to peers, not just to data. The coordination machinery beneath it (a shared work queue, audit and permission tiers, a constitutional anchor, channels for cross-agent communication, and a deeper framework layer refined over more than a year of continuous production operation) is the engineering that makes the collaboration reliable without making it brittle.

Why multi-agent beats single-agent at customer scale

  • Specialization beats generalization. No single agent can hold full-stack engineering AND product management AND compliance review AND domain research at production quality. Specialists collaborating outperform generalists.
  • The knowledge tree compounds. Lessons learned by one agent immediately become context for every other agent on the next task. Adding the next agent multiplies the knowledge base's utility, not just its size.
  • Decisions are attributable. When a regulator, an auditor, or your CFO asks “which agent did this and why?” — the answer is in the audit trail. Single-agent systems cannot answer this.
  • External agents join with controlled access. A partner vendor's agent (or one your team builds) joins the environment with a defined permission tier — no bilateral integration required. They read the parts you allow, contribute the way you allow, operate under the same protocol everyone else does.
  • Failure isolation. When one agent fails or goes off the rails, the others continue. The constitutional anchor and permission tiers contain the blast radius. A single-agent system takes the whole deployment down with it.
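
Two of the points above, permission tiers and failure isolation, can be sketched together. The tier names, the `Environment` class, and its methods are hypothetical and much simpler than a production coordination layer. The sketch only shows the shape: each agent joins with a tier, each authorization check is written to an audit list, and one agent's exception does not stop the others.

```python
from enum import IntEnum


class Tier(IntEnum):
    READ = 1
    CONTRIBUTE = 2
    ADMIN = 3


class Environment:
    def __init__(self):
        self._agents = {}  # agent_id -> (tier, handler)
        self.audit = []    # (agent_id, requested tier name, allowed)

    def join(self, agent_id, tier, handler):
        self._agents[agent_id] = (tier, handler)

    def authorize(self, agent_id, needed):
        """Check an agent's tier against the required one and log the decision."""
        tier, _ = self._agents[agent_id]
        allowed = tier >= needed
        self.audit.append((agent_id, needed.name, allowed))
        return allowed

    def dispatch(self, task):
        """Give the task to every agent; a failure is recorded, not propagated."""
        results = {}
        for agent_id, (_, handler) in self._agents.items():
            try:
                results[agent_id] = handler(task)
            except Exception as exc:
                results[agent_id] = f"failed: {exc}"
        return results
```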

Production proof: a sister deployment at ayoai.com runs this architecture continuously — multiple specialized agents on one shared knowledge tree, reacting to live data and to each other's contributions. The same framework deploys to your business.

What changes from a typical RAG project?

Audit-grade by construction

Every learning is sourced from a specific interaction, scored for confidence, and retrievable in audit-ready form. Useful for FedRAMP / FISMA work. Useful for SOC 2. Useful when you need to explain a decision six months later.
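
As an illustration of what "sourced, scored, and retrievable" could look like as a record, here is a minimal sketch. The field names, the 0-to-1 confidence scale, and the JSON export are assumptions made for this example; nothing here by itself satisfies any compliance framework.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Learning:
    statement: str
    source_interaction: str  # ID of the interaction this was learned from
    confidence: float        # assumed scale: 0.0 (guess) to 1.0 (certain)
    recorded_on: str         # ISO date, e.g. "2025-01-15"

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def audit_export(learnings, min_confidence=0.0):
    """Return matching learnings as JSON, oldest first, for an auditor to read."""
    rows = [asdict(item) for item in learnings if item.confidence >= min_confidence]
    rows.sort(key=lambda r: (r["recorded_on"], r["source_interaction"]))
    return json.dumps(rows, indent=2)
```

Because each record is frozen and carries its own source ID, explaining a decision six months later becomes a lookup rather than a reconstruction.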

Continuously self-curating

The agent reflects, scores its own outputs, retires stale entries, and promotes high-utility patterns. The work that consultancies normally bill 4-hour sessions for happens automatically, every iteration.
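
A curation pass of this kind can be as simple as one rule per entry. The sketch below is hypothetical throughout: the entry fields, the 180-day staleness window, and the 10-use promotion threshold are placeholder values chosen for the example, not tuned settings.

```python
from datetime import date


def curate(entries, today, stale_after_days=180, promote_at=10):
    """Decide per entry: retire if stale, promote if heavily used, else keep."""
    decisions = {}
    for entry in entries:
        age_days = (today - entry["last_used"]).days
        if age_days > stale_after_days:
            decisions[entry["id"]] = "retire"
        elif entry["uses"] >= promote_at:
            decisions[entry["id"]] = "promote"
        else:
            decisions[entry["id"]] = "keep"
    return decisions
```

Staleness is checked first on purpose: an entry that was popular a year ago but has not been used since is retired rather than promoted.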

Compounds across engagements

Most consultants leave a project and start the next one from zero. We leave a project and the next one starts from everything the previous ones taught us. Your price reflects the cost of adding to what we already know.

Built for your sector.

The framework is the same. The compliance posture, pricing shape, and engagement cadence change depending on whether you're a government program office or a 30-person operating business.