Context Engines: Building Explainable AI Agents

RAG the Universe

It's popular to think that a good AI system looks like the following: lots of data + RAG + the latest and greatest LLM + as many agents as you can swallow = good outcomes.

Directionally this isn't a terrible perspective, but there is a massive hole in this thinking around explainability. When every piece of evidence an agent used to provide an answer came out of searching over a massive amount of available data, how do you know it found the right data? How do you know that the reasoning it used to break it down is sound or appropriate?

"Firehose + RAG" feels like a solution because it generalizes well, but it always comes with the same limitations: inconsistent outputs, unpredictable costs, no clear line from evidence to conclusion.

The problem is an absence of architectural opinion.

A Structured Approach: A Context Engine

The first step to bringing some structure to your AI agents is putting some thought into categories of data. Think of this like curating a data catalog for an agent, so that instead of searching over literally everything, it can pick data sources that are actually relevant to the task at hand. This involves thinking about the workflows you are trying to support and breaking down what sources of data a person in that role would actually consult to do their job.

A context engine is the layer between raw data and an agent: It dictates what sources of information are relevant when an agent is doing its work.

Data Perspectives

Technically, I advocate for organizing the data catalog into 'perspectives'. A 'perspective' is defined by a list of available data sources, and as an agent moves through a workflow, it is always operating from one or more perspectives.
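
To make the idea concrete, here is a minimal sketch of a perspective as a named list of data sources. The type names, the perspective names, and the source identifiers are all hypothetical, chosen only for illustration; the point is that the agent's visible sources are derived from the perspectives it is currently operating in, not from everything that exists.

```typescript
// Hypothetical types: nothing here is a prescribed schema.
type DataSourceId = string;

interface Perspective {
  name: string;
  sources: DataSourceId[]; // the only sources visible from this perspective
}

// Illustrative catalog; perspective and source names are made up.
const catalog: Perspective[] = [
  { name: "deal", sources: ["crm.opportunity", "calls.transcripts", "email.threads"] },
  { name: "account", sources: ["crm.account", "support.tickets"] },
];

// Sources visible from the perspectives the agent is currently operating in.
function visibleSources(active: string[], all: Perspective[]): DataSourceId[] {
  const visible: DataSourceId[] = [];
  for (const p of all) {
    if (!active.includes(p.name)) continue;
    for (const s of p.sources) {
      if (!visible.includes(s)) visible.push(s); // de-duplicate shared sources
    }
  }
  return visible;
}

// While working on a single deal, the agent never sees support tickets.
const allowed = visibleSources(["deal"], catalog);
```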

Concretely, let's think about a sales manager trying to estimate this quarter's revenue commitment. We could hand our whole CRM over to an agent and say "have at it", and maybe it will do a good job. Maybe it won't. We have almost no tools to audit how it made any decisions, the quality of those decisions, or the quality of the data it used.

Instead, consider breaking this workflow down in terms of how a person might actually do it:

1. Consider each deal on its own, since deals are generally independent of each other.
2. Estimate the likelihood that a given deal will close (maybe using MEDDIC).
3. Add up the predicted revenue from all deals that are likely to close.

The data perspective we want here is a specific deal - the agent should be able to 'step in' to the perspective of a single deal for a moment and say, this is the universe of data that matters right now. That means in our 'context engine', we need a way of telling the agent, "Hey, no matter how you look up data right now, it's going to be about this deal".

For example:

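// Shape shared by every MEDDIC scoring method.
interface MeddicScore { score: number; explanation: string; }

// Abstract, so the signatures compile without committing to a data backend.
abstract class DealPerspective {
  constructor(protected readonly dealId: string) {}

  abstract searchCalls(): string[];  // just calls that relate to this deal!
  abstract fetchCrmValue(fieldName: string): unknown;

  abstract scoreMeddicMetrics(): MeddicScore;
  abstract scoreMeddicEconomicBuyer(): MeddicScore;
  // and so on...
}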
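
As a rough illustration of what "it's going to be about this deal" could mean in practice, the sketch below backs a deal perspective with in-memory arrays. The record shapes, the class name InMemoryDealPerspective, and the query parameter on searchCalls are assumptions made for this example; a real system would sit on top of a CRM and a call-recording store.

```typescript
// Hypothetical record shapes, for illustration only.
interface CallRecord { dealId: string; transcript: string; }
interface CrmRecord { dealId: string; fields: Record<string, unknown>; }

class InMemoryDealPerspective {
  constructor(
    private readonly dealId: string,
    private readonly calls: CallRecord[],
    private readonly crm: CrmRecord[],
  ) {}

  // Filter by deal first, so no query can reach another deal's calls.
  searchCalls(query: string): string[] {
    const needle = query.toLowerCase();
    return this.calls
      .filter((c) => c.dealId === this.dealId)
      .map((c) => c.transcript)
      .filter((t) => t.toLowerCase().includes(needle));
  }

  // Returns undefined when the deal or the field is unknown.
  fetchCrmValue(fieldName: string): unknown {
    const record = this.crm.find((r) => r.dealId === this.dealId);
    return record?.fields[fieldName];
  }
}

const calls: CallRecord[] = [
  { dealId: "d1", transcript: "Budget approved by the CFO" },
  { dealId: "d2", transcript: "Budget is still under review" },
];
const crm: CrmRecord[] = [{ dealId: "d1", fields: { amount: 50000 } }];

const deal = new InMemoryDealPerspective("d1", calls, crm);
// Both transcripts mention "budget", but only the d1 call is returned.
const hits = deal.searchCalls("budget");
```

The agent never passes a deal ID to an individual lookup. The scope is fixed when it steps into the perspective, which is what makes the set of data it could have consulted easy to audit afterwards.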

It's a Spectrum of Structure

The pipeline forecast workflow also gives us explicit structure that we should bake into the agent's decision making. At each step in the workflow, we can establish a clear 'handover' from one job to the next. These handover points are exactly where we can run evals and start to build in auditability.

Concretely, think about the handover point between the MEDDIC analysis of a specific deal and the calculation of the whole pipeline's expected revenue. Almost everything after the MEDDIC analysis can be done with ordinary code - it's just basic math (if a deal's score is at least X, include it; otherwise don't). If we implement our product this way, we can show the user the exact logic we used to come up with the prediction, along with the numbers we came up with for each deal. Keep following this approach: break apart the MEDDIC analysis in the same way, and we can explain how we came up with those numbers too.
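
The "basic math" after the handover might look something like the sketch below. It assumes a higher MEDDIC score means a deal is more likely to close, so deals at or above the threshold are counted; the 0-to-1 score scale, the field names, and the sample numbers are illustrative assumptions.

```typescript
interface DealAssessment {
  dealId: string;
  amount: number;      // deal value from the CRM
  meddicScore: number; // assumed scale: 0 (weak) to 1 (strong)
}

interface ForecastLine extends DealAssessment {
  included: boolean; // whether this deal counted toward the total
}

interface Forecast {
  threshold: number;
  total: number;
  lines: ForecastLine[];
}

// Plain, deterministic code: no model call happens after the handover.
function forecastRevenue(deals: DealAssessment[], threshold: number): Forecast {
  const lines = deals.map((d) => ({ ...d, included: d.meddicScore >= threshold }));
  const total = lines
    .filter((l) => l.included)
    .reduce((sum, l) => sum + l.amount, 0);
  return { threshold, total, lines };
}

const forecast = forecastRevenue(
  [
    { dealId: "d1", amount: 40000, meddicScore: 0.8 },
    { dealId: "d2", amount: 25000, meddicScore: 0.4 },
    { dealId: "d3", amount: 10000, meddicScore: 0.7 },
  ],
  0.6,
);
// forecast.total is 50000: d1 and d3 clear the threshold, d2 does not.
```

Because the result keeps one line per deal, the user can see which deals were counted and why, not just the final number.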

Part of your job as the builder of an AI product is to understand which decisions really matter in a workflow, and to make these first-class citizens in how it's implemented. Not every decision point needs to be broken out: it is a lot more flexible to let the agent just 'figure it out' based on the prompt. But know that you are sacrificing your primary means of building trust and explainability when you do this.

Explainability Is Built In

The only way to establish trust in an AI agent's work is to mirror the steps a human would take to achieve the same task and make each step auditable. We need to be able to ask an agent to "show your work" and have the answer be concrete enough that we can set up Datadog alerts on it.
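
One possible shape for "show your work" is a record written at every step, capturing what was consulted and whether the output passed a check. This is only a sketch: StepRecord, runStep, the step name, and the source label are hypothetical, and the check here is a trivial range test standing in for a real eval.

```typescript
interface StepRecord<T> {
  step: string;
  sourcesConsulted: string[];
  output: T;
  passedCheck: boolean;
}

const auditLog: StepRecord<unknown>[] = [];

// Runs one workflow step and records its evidence and its check result.
function runStep<T>(
  step: string,
  work: () => { output: T; sourcesConsulted: string[] },
  check: (output: T) => boolean,
): T {
  const { output, sourcesConsulted } = work();
  auditLog.push({ step, sourcesConsulted, output, passedCheck: check(output) });
  return output;
}

const score = runStep(
  "meddic.economicBuyer",
  // Stand-in for the agent's work; a real step would call the model here.
  () => ({ output: 0.7, sourcesConsulted: ["call-transcript-1"] }),
  (s) => s >= 0 && s <= 1,
);

// The number of failed checks is the kind of value a monitoring alert could watch.
const failures = auditLog.filter((r) => !r.passedCheck);
```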

Every decision rests on the data behind it, so the key to explainability is auditable context. Next time you are tempted to slap a vector DB onto an OpenAI request and call it a day, ask yourself: "If I had to do this job right now, would I sit here firing off a gajillion search requests until I found the answer? And if someone asked me to prove my answer, would 'I searched until I got tired' really satisfy them?" Most use cases are better served by an intelligent catalog of available data that is built for the task at hand, which is what a context engine gives you.

Imagine if our pipeline forecast agent simply responded with a wall of text and some anecdotal quotes that it found as evidence. Now contrast this with a spreadsheet that gives a MEDDIC analysis of each deal, along with justification for each score and links back to the source material. Which one is a customer going to trust more?
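
The spreadsheet version could be as simple as one row per deal and MEDDIC criterion. The sketch below renders such rows as CSV; the column names, the ScoredCriterion shape, and the example link are assumptions for illustration, not a fixed export format.

```typescript
interface ScoredCriterion {
  dealId: string;
  criterion: string;     // e.g. "Economic Buyer"
  score: number;
  explanation: string;
  sourceLinks: string[]; // hypothetical links back to calls, emails, or CRM records
}

// Quote a CSV cell, doubling any embedded double quotes.
function csvCell(value: string | number): string {
  return `"${String(value).replace(/"/g, '""')}"`;
}

function toCsv(rows: ScoredCriterion[]): string {
  const header = ["deal", "criterion", "score", "explanation", "sources"];
  const body = rows.map((r) =>
    [r.dealId, r.criterion, r.score, r.explanation, r.sourceLinks.join(" ")]
      .map((cell) => csvCell(cell))
      .join(","),
  );
  return [header.join(","), ...body].join("\n");
}

const sheet = toCsv([
  {
    dealId: "d1",
    criterion: "Economic Buyer",
    score: 0.7,
    explanation: "CFO joined the last call and confirmed budget",
    sourceLinks: ["https://example.com/calls/1"],
  },
]);
```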

Stop Delegating Your Thinking to a Computer

Dive deep on your customers' workflows, understand how they make decisions, break them into steps, then shape your context engine around the data they need at each step. This gives you an auditable trail of decisions made and the data considered relevant at each step, building trust in your output and making the system more transparent.

The goal isn't to build an agent that hands you conclusions - it's to build a system that walks you through the same analysis you'd do on your own. AI scales your analysis; it doesn't replace it.