If you’ve built an MCP server and noticed Claude confidently making up details that aren’t in your data, the response shape is probably the cause.
Most MCP servers return raw JSON. The model gets a list of objects and treats them all the same way. There’s no signal for "this is a system of record" versus "this is a fuzzy similarity match." No signal for "the cache says this, but it’s three days old." No signal for "you should refuse this question because the data doesn’t support it." The model fills those gaps from its training distribution, which is to say, it makes things up.
The fix is a structured envelope. Every tool returns { data, _meta }. The data is the result. The meta is everything the model needs to know about how trustworthy the data is, where it came from, and what to do when the answer isn’t there.
This is the pattern I run in production. The specific shape isn’t sacred. The principle is: make trust legible to the model.
The envelope shape
export interface ToolEnvelope<T = unknown> {
  data: T;
  _meta: {
    tool: string;
    server_ts: string;
    source: 'netsuite' | 'google_drive' | 'rag' | 'cache' | 'derived';
    confidence: number; // 0..1
    freshness?: Freshness;
    citations?: Citation[];
    refusal?: Refusal;
    warnings?: string[];
  };
}
Three things matter most: source, citations, and the way refusals are structured. The rest is supporting cast.
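To make the shape concrete, here is a minimal sketch of what the success-path helpers might look like. The names ok() and freshnessFrom() come from later in this post, but their signatures here are assumptions, and the Freshness fields (fetched_at, age_seconds) are hypothetical since the post does not define them. The Citation fields follow the citation example further down.

```typescript
type Source = 'netsuite' | 'google_drive' | 'rag' | 'cache' | 'derived';

// Hypothetical fields: the article names the Freshness type but not its contents.
interface Freshness {
  fetched_at: string; // ISO timestamp of when the data was read
  age_seconds: number; // how stale the data is at response time
}

interface Citation {
  id: string;
  name: string;
  url: string;
  score?: number; // RAG only
  excerpt?: string;
}

interface Meta {
  tool: string;
  server_ts: string;
  source: Source;
  confidence: number; // 0..1
  freshness?: Freshness;
  citations?: Citation[];
  warnings?: string[];
}

interface ToolEnvelope<T> {
  data: T;
  _meta: Meta;
}

type Extras = Pick<Meta, 'freshness' | 'citations' | 'warnings'>;

// Success path. Confidence is a required argument so nobody gets 1.0 by default.
function ok<T>(tool: string, source: Source, confidence: number, data: T, extras: Extras = {}): ToolEnvelope<T> {
  return {
    data,
    _meta: { tool, server_ts: new Date().toISOString(), source, confidence, ...extras },
  };
}

// Describe how old a cached or fetched value is.
function freshnessFrom(fetchedAt: Date, now: Date = new Date()): Freshness {
  const ageMs = Math.max(0, now.getTime() - fetchedAt.getTime());
  return { fetched_at: fetchedAt.toISOString(), age_seconds: Math.round(ageMs / 1000) };
}
```

A cache-backed tool would then return something like ok('get_price', 'cache', 1, row, { freshness: freshnessFrom(row.cachedAt) }), where get_price and cachedAt are made-up names for illustration.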
Source: not all data is equal
The model has to know whether what you returned is authoritative. A NetSuite record fetched by internal ID right now is system-of-record truth. A RAG hit with a cosine similarity of 0.41 is "the closest thing I could find in the corpus." Those need to be treated differently when the model composes its reply.
The standing rule (encoded in the server instructions and the per-tool description footer): confidence applies meaningfully to RAG results only. NetSuite responses are authoritative regardless of the confidence score; the system-of-record bit comes from source: 'netsuite'. RAG results below confidence 0.5 should be hedged ("I’m not certain, but..."). RAG results above 0.7 can be stated plainly with a citation.
This single distinction makes a huge difference. Without it, the model treats a fuzzy doc match the same as a precise record lookup, and you get confidently-stated nonsense.
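The rule itself lives in prose that the model reads, but writing it out as a decision table is a useful way to check it for gaps, or to generate a server-side warning. The sketch below is one possible encoding, not the production code. The stance names are invented for illustration, and the post does not say what to do for RAG scores between 0.5 and 0.7 or for sources other than NetSuite and RAG, so those cases return 'unspecified' rather than guessing.

```typescript
type Source = 'netsuite' | 'google_drive' | 'rag' | 'cache' | 'derived';

// Hypothetical labels for how the model should phrase a fact.
type Stance = 'authoritative' | 'plain_with_citation' | 'hedge' | 'unspecified';

function stanceFor(meta: { source: Source; confidence: number }): Stance {
  // System of record: the confidence score is ignored entirely.
  if (meta.source === 'netsuite') return 'authoritative';

  // The article only gives confidence thresholds for RAG results.
  if (meta.source !== 'rag') return 'unspecified';

  if (meta.confidence < 0.5) return 'hedge'; // "I'm not certain, but..."
  if (meta.confidence > 0.7) return 'plain_with_citation';

  // 0.5 to 0.7 is not covered by the stated rule.
  return 'unspecified';
}
```

Writing it this way surfaces the middle band and the non-RAG sources as cases the standing rules still need to address.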
Citations: quote names verbatim
Every fact derived from a tool gets a citation. A citation in the envelope is structured:
{
  id: 'file_abc123', // or a record key such as 'salesorder_226188'
  name: 'Q4 2025 Pricing Strategy.gdoc', // or a record name such as 'SO-13174'
  // '-' is percent-encoded as %2D because text fragments use it as syntax
  url: '/drive/file/d/abc123/view#:~:text=same%2Dday%20delivery',
  score: 0.82, // optional, RAG only
  excerpt: 'Same-day delivery is Mon-Fri...' // optional
}
The model is told (in the standing rules baked into every tool description): quote citation names character-for-character when stating facts. Render the URL as a Markdown link in the reply so the user clicks straight through to the source.
One URL trick worth knowing: the text-fragment syntax (#:~:text=...) lets you deep-link to the exact sentence in the source document. Browsers that support it (Chrome, Edge, Safari, recent Firefox) will scroll to and highlight the matched phrase on arrival. If you’re doing RAG, this is a free precision boost on citations. It’s the difference between "the AI vaguely told me about delivery policy" and "the AI said X, quoted Y, and I clicked the link and saw exactly that sentence highlighted in the source doc." That’s the trust moment. Engineer for it.
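Building the fragment is a few lines. This is a sketch with hypothetical helper names (encodeDirectiveText, withTextFragment). The one subtlety is that encodeURIComponent leaves '-' untouched, while the text-fragment syntax uses '-' to mark prefix and suffix context, so the sketch encodes it explicitly. Commas and ampersands, the other two reserved characters, are already handled by encodeURIComponent.

```typescript
// Percent-encode a phrase for use inside a #:~:text= directive.
function encodeDirectiveText(text: string): string {
  return encodeURIComponent(text).replace(/-/g, '%2D');
}

// Replace any existing fragment on the URL with a text fragment for the phrase.
function withTextFragment(baseUrl: string, phrase: string): string {
  const hashAt = baseUrl.indexOf('#');
  const base = hashAt === -1 ? baseUrl : baseUrl.slice(0, hashAt);
  return `${base}#:~:text=${encodeDirectiveText(phrase)}`;
}

const citationUrl = withTextFragment('/drive/file/d/abc123/view', 'same-day delivery');
// citationUrl === '/drive/file/d/abc123/view#:~:text=same%2Dday%20delivery'
```

Short, distinctive phrases tend to work best: the browser needs an exact match in the rendered page, so a long excerpt with any whitespace or punctuation drift will silently fail to highlight.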
Refusals: structured, not exceptions
The thing I see go wrong most often: when a tool can’t answer, it throws or returns an error string. The model sees the error, freelances, and tells the user something plausible-sounding. You don’t find out until a customer flags it.
Don’t throw. Return a structured refusal as a normal envelope:
interface Refusal {
  // USER-facing. Plain English. No internal vocabulary.
  // The model is told to relay this VERBATIM.
  reason: string;
  // USER-facing. A concrete next step they can take.
  remedy: string;
  // MODEL-only. May contain internal vocabulary the user shouldn't see.
  // Used by the model to self-correct, not by the reply.
  developer_reason?: string;
  // MODEL-only. Structured per-input corrections so the model can retry.
  corrections?: Correction[];
}
The trick is the two-audience design. reason and remedy are written for the end user; the model relays them as-is. developer_reason and corrections[] are written for the model itself, with internal terminology the user wouldn’t understand. The model uses them to retry intelligently without exposing the technical fields to the user.
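One way to enforce "don’t throw" is a wrapper around every tool handler that converts anything thrown into a refusal envelope. The sketch below assumes a refuse() helper (the name appears later in this post; the signature is a guess) plus two hypothetical helpers, refusalFromError() and guarded(). The generic reason and remedy strings are placeholders, and the Correction fields are inferred from the example that follows in the post.

```typescript
type Source = 'netsuite' | 'google_drive' | 'rag' | 'cache' | 'derived';

interface Correction {
  field: string;
  original: string;
  suggested: string | null;
  reason: string;
}

interface Refusal {
  reason: string; // user-facing
  remedy: string; // user-facing
  developer_reason?: string; // model-only
  corrections?: Correction[]; // model-only
}

interface RefusalEnvelope {
  data: null;
  _meta: { tool: string; server_ts: string; source: Source; confidence: number; refusal: Refusal };
}

// Build a refusal as a normal envelope: data is null, confidence is 0.
function refuse(tool: string, source: Source, refusal: Refusal): RefusalEnvelope {
  return {
    data: null,
    _meta: { tool, server_ts: new Date().toISOString(), source, confidence: 0, refusal },
  };
}

// Unexpected errors: keep the raw message in developer_reason only,
// and give the user a generic, jargon-free explanation.
function refusalFromError(tool: string, source: Source, err: unknown): RefusalEnvelope {
  return refuse(tool, source, {
    reason: "I couldn't complete that lookup.",
    remedy: 'Try again in a moment, or rephrase the request.',
    developer_reason: err instanceof Error ? err.message : String(err),
  });
}

// Wrap a handler so a thrown error never reaches the model as a bare error string.
async function guarded<T>(tool: string, source: Source, run: () => Promise<T>): Promise<T | RefusalEnvelope> {
  try {
    return await run();
  } catch (err) {
    return refusalFromError(tool, source, err);
  }
}
```

Handlers that can anticipate a failure should still call refuse() directly with a specific reason and corrections; the wrapper is only the safety net for the cases nobody planned for.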
A concrete example. The user asks "show me sales orders with status equal to ‘Open’." Internally, the status field is an enum with values like SalesOrd:A, not the display label "Open". The tool refuses:
{
  data: null,
  _meta: {
    tool: 'search_records',
    source: 'netsuite',
    confidence: 0,
    refusal: {
      reason: "I couldn't filter by status='Open' on sales orders. The status field uses a code list with a specific set of values.",
      remedy: "Ask me to list the available statuses for sales orders first, then filter by the one you want.",
      developer_reason: "Column statusref is an enum; user supplied display label 'Open' instead of the value_id. Use list_list_values('SalesOrd_status') to retrieve the value_id map.",
      corrections: [{
        field: 'filter_value',
        original: 'Open',
        suggested: null,
        reason: 'Status is a managed list. Resolve the value_id first.',
      }]
    }
  }
}
The model reads developer_reason and corrections, figures out it needs to call list_list_values first, retries, and gets the answer. The user never sees the word "statusref" or "value_id." They see "I couldn’t filter by status; let me check the available statuses first." Two audiences served from the same response object.
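For completeness, this is roughly how an envelope travels back over MCP. A tool result carries a content array, and the simplest option is to serialize the whole envelope into a single text item. The toToolResult name is hypothetical, and leaving isError unset for refusals is my reading of "return refusals as data", not something the protocol requires.

```typescript
// Minimal local stand-ins for the MCP tool-result shape (text content only).
interface TextContent {
  type: 'text';
  text: string;
}

interface ToolResultLike {
  content: TextContent[];
  isError?: boolean;
}

// Serialize the envelope, refusal or not, as ordinary tool output.
function toToolResult(envelope: { data: unknown; _meta: object }): ToolResultLike {
  return { content: [{ type: 'text', text: JSON.stringify(envelope) }] };
}

const result = toToolResult({
  data: null,
  _meta: {
    tool: 'search_records',
    source: 'netsuite',
    confidence: 0,
    refusal: { reason: "I couldn't filter by status.", remedy: 'List the available statuses first.' },
  },
});
```

Because the refusal arrives as a normal result, the model reads _meta.refusal the same way it reads any other field, instead of improvising around an opaque error.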
The standing-rules footer
The envelope is only as good as the model’s awareness of how to handle it. An MCP server can send a one-time instructions string in the initialize handshake (the SERVER_INSTRUCTIONS from here on), which is where the full rules live. But long-running sessions and large tool catalogs make it risky to rely on the model remembering rules from 50 turns ago.
The backstop is a short footer appended to every tool’s description. ~150 tokens, repeated on every tools/list. It restates the core envelope-handling rules: read _meta first, relay refusal.reason verbatim, never quote developer_reason to the user, hedge below confidence 0.5 for RAG, treat NetSuite as authoritative.
You pay for the duplication (~150 tokens × tool count, once per session) and get behavioral consistency in return. The model is told the same thing multiple ways. It works better than the alternatives I’ve tested: the model holds the envelope discipline through long conversations where it would otherwise drift.
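Appending the footer is mechanical, which is the point: no tool author has to remember it. A sketch, assuming tool definitions are plain objects with a name and description. The footer wording below is illustrative, restating the rules listed above, and withFooter is a hypothetical helper name.

```typescript
interface ToolDef {
  name: string;
  description: string;
}

// Illustrative wording only; keep the real footer in sync with the server instructions.
const ENVELOPE_FOOTER = [
  'Envelope rules: read _meta before data.',
  'Relay refusal.reason verbatim.',
  'Never quote developer_reason to the user.',
  'Hedge RAG results below confidence 0.5.',
  "Treat source 'netsuite' as authoritative.",
].join(' ');

// Return copies of the tool definitions with the footer appended exactly once.
function withFooter(tools: ToolDef[]): ToolDef[] {
  return tools.map((tool) =>
    tool.description.endsWith(ENVELOPE_FOOTER)
      ? tool
      : { ...tool, description: `${tool.description.trimEnd()}\n\n${ENVELOPE_FOOTER}` },
  );
}
```

Running every tool through this at registration time means the tools/list response carries the rules no matter who wrote the tool.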
What this earns you
The tools become harder to misuse. The model knows when it’s looking at authoritative data versus a fuzzy match. When it has to refuse, it does so politely and offers a real next step. When it cites, it cites verbatim with a clickable link to the exact passage. When it’s iterating on a failed call, it can correct itself without exposing developer-only error text to the user.
The cost is a few hundred extra lines of TypeScript for the envelope helpers (ok(), refuse(), freshnessFrom()), a system prompt, and the discipline to actually return refusals as data instead of throwing. That cost amortizes fast across every tool you add. Every new tool just calls the same helpers and inherits the same behavior.
Where this fails
Two things to watch. First: confidence scoring has to be honest. If you slap confidence: 1.0 on every RAG hit, the model trusts it and the whole pattern degrades. Calibrate. For OpenAI’s text-embedding-3-small, raw cosine scores cluster around 0.3 to 0.7 for relevant matches; map those onto the 0-1 confidence range with some thought.
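As a starting point, a clamped linear rescale of the raw cosine score is the simplest mapping that stops everything from bunching in the middle. This is a stand-in technique, not a calibrated one: the 0.3 and 0.7 bounds are taken from the range quoted above and should be tuned against queries you have labeled by hand, and the calibrate name is hypothetical.

```typescript
// Map a raw cosine similarity onto 0..1, clamping outside [lo, hi].
// lo and hi are assumptions to be tuned per embedding model and corpus.
function calibrate(cosine: number, lo = 0.3, hi = 0.7): number {
  if (!Number.isFinite(cosine) || hi <= lo) return 0;
  const scaled = (cosine - lo) / (hi - lo);
  return Math.min(1, Math.max(0, scaled));
}
```

With these bounds a raw 0.41 maps to roughly 0.28, which lands below the 0.5 hedge threshold, and anything at or above 0.7 maps to 1. If relevant matches in your corpus routinely score near 0.3, a linear map will hedge them all, which is a sign the lower bound needs to move.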
Second: the model will only obey the standing rules to the extent the system prompt and footer say so. If the rules drift away from the actual implementation (you added a new source value but forgot to update the rules), the model handles the new case unpredictably. Treat the SERVER_INSTRUCTIONS as a contract: when the envelope changes, the prompt updates with it.
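A cheap way to hold that contract is a test that fails when a source value exists in code but is never mentioned in the instructions. The sketch below assumes the source values are kept in a single array and that the instructions mention each one by its literal name; missingFromInstructions is a hypothetical helper.

```typescript
// Single list of source values; the union type is derived from it.
const SOURCES = ['netsuite', 'google_drive', 'rag', 'cache', 'derived'] as const;
type Source = (typeof SOURCES)[number];

// Return every source value that the instructions text never mentions as a whole word.
function missingFromInstructions(instructions: string, sources: readonly Source[] = SOURCES): Source[] {
  return sources.filter((source) => !new RegExp(`\\b${source}\\b`).test(instructions));
}

// Example instructions text, invented for illustration.
const exampleInstructions =
  'Treat netsuite as authoritative. Hedge rag results below 0.5. ' +
  'Check freshness on cache and google_drive results.';

const missing = missingFromInstructions(exampleInstructions);
// missing is ['derived']: the rules never say how to treat derived data.
```

Run in CI against the real SERVER_INSTRUCTIONS and footer, an empty result is the pass condition. It only checks that each source is mentioned, not that the rule for it is sensible, so it catches drift but not bad rules.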
The honest thing about this pattern is that it took me three failed MCP servers to land on it. The first two threw exceptions, returned raw JSON, and hoped the model would figure it out. The model figured it out wrong. Structured trust beats hoping every time.