Permission-Aware RAG: The Problem Nobody Mentions in the Tutorials

Every RAG tutorial assumes one user. One corpus. Embed everything, search across all of it, return the top k. Production at a real company doesn’t look like that.

At every company I’ve built RAG for, there are documents Bob can see and Alice can’t. Project plans tied to a customer Alice isn’t on. HR docs only HR can see. Financial models the CFO sees and the marketing intern doesn’t. The RAG system has to respect every one of those boundaries, or it’s not shippable.

"Just filter in the app code" is the answer everyone reaches for first. It’s also the answer that gets you in trouble. This post is the pattern I actually use.

Why "filter in app code" fails

The naive approach: query Postgres for the top k by vector similarity, then in your application loop, drop any results the calling user can’t see. Quick to write. Bad in three ways.

First, it leaks information through latency. If Alice asks "what’s in the Q4 strategy doc" and the app drops the hit she can’t see, she sees no result. If she asks the same question and the doc never existed, she also sees no result. Same UX, but the existence of the doc is leaked if filter logic varies between the two paths. Subtle. Real.

Second, the auto-suggest paths leak content. A "did you mean..." or "I couldn’t find that, but I found these related questions" path running on cached prompts will surface things Alice didn’t ask about, but somehow the system already knows. If the cache key isn’t user-scoped, you’re one bug away from cross-tenant content bleed.

Third, and worst, it’s one mistake away from a breach. The day a junior dev adds a "show me the corpus stats" endpoint and forgets the filter, you’ve shipped a content-disclosure vulnerability.

The real fix is to push the access check INTO the database, so the only code path that can return doc chunks is one that’s already filtered by the user’s permissions.

Defense in depth: app filter + database RLS

I use both. Belt and suspenders. The app-level filter is fine for the happy path. The database-level Row-Level Security is the floor: even if the app forgets a filter, the database refuses to return rows the calling user can’t see.

The Postgres pattern:

Create a separate role used only for user-facing queries. Call it mcp_query or similar.
Write RLS policies on every table holding doc content. The policies read session settings (GUCs) for the user’s identity and group membership.
In every user-facing query, switch into that role for the transaction, set the GUCs to the user’s identity, run the query, exit.

scoped-query.ts

const rows = await sql.begin(async (tx) => {
  // Switch into the restricted role for this transaction only.
  await tx.unsafe('set local role mcp_query');

  // Set the user identity GUCs. RLS policies read these.
  await tx`select set_config('app.user_email',  ${email},  true)`;
  await tx`select set_config('app.user_groups', ${groups}, true)`;
  await tx`select set_config('app.user_domain', ${domain}, true)`;

  return tx`
    select chunk_id, content, score
    from public.search_doc_chunks(${email}, ${groups}, ${domain}, ${qVec}, ${query}, ${k})
  `;
});

The policy attached to doc_chunks reads current_setting('app.user_email') and current_setting('app.user_groups') and joins through an ACL table that maps Drive permissions onto file_ids. If the calling user isn’t in the file’s ACL, the row is invisible. The application can’t see it, can’t count it, can’t say it exists.

Two practical notes. The set local form means the role and the GUCs only apply for this transaction; you can’t accidentally bleed identity to the next request. And the GUCs default to empty strings if not set, which (in a well-written policy) returns zero rows. The failure mode of forgetting to set the user is "they see nothing," not "they see everything."

The "no results" disambiguation problem

RLS-scoped retrieval introduces a new UX bug. The user asks "what’s our 2025 strategy?" and gets back "I couldn’t find anything." Three things could be happening:

The corpus doesn’t have a doc on this topic.
The corpus has the doc, but Alice doesn’t have permission to see it.
The corpus has it, Alice can see it, but retrieval missed (bad embedding match).

Case 1 deserves "I don’t have any documents on this topic." Case 2 deserves something like "I think there are relevant docs you don’t have access to; ask the doc owner." Case 3 deserves "let me try a different phrasing." If you treat all three the same, the UX is worse for everyone.

The trick is to run a SECOND retrieval on the refusal path, this time UNSCOPED, but returning only a single boolean: "does the corpus have a strong match for this query at all?" The probe runs under the base Postgres role with no RLS, and returns absolutely nothing about content or identity.

corpus-probe.ts

export const corpusHasStrongMatch = async (
  query: string,
  threshold = 0.5,
): Promise<boolean> => {
  const embedding = await embedQuery(query);
  const rows = await sql<{ similarity: number }[]>`
    select 1 - (embedding <=> ${qVec}::vector) as similarity
    from public.doc_chunks
    where embedding is not null
    order by embedding <=> ${qVec}::vector
    limit 1
  `;
  return (rows[0]?.similarity ?? 0) >= threshold;
};

Critically: this returns a boolean and only a boolean. No chunk_id, no document name, no excerpt. The threshold (0.5 is a reasonable default for OpenAI’s text-embedding-3-small) is high enough that random off-topic queries don’t trip it.

If the user’s scoped search returned nothing AND the corpus probe says there’s a strong match, the refusal text changes: "I couldn’t find anything in the documents you have access to. There may be relevant docs in the corpus that you don’t have permission to view." That’s actionable. The user knows whether to rephrase or ask for access.

Window expansion under the same scope

If you’re doing window expansion (fetching chunk_index ± 1 around each hit, which you should be; see the chunking post for why), make sure that read runs under the same RLS-scoped role. The temptation is to think "the file_ids I’m passing in were already filtered, so I can use the privileged connection here for speed." Don’t. The day someone changes the upstream filter and a stale file_id sneaks through, you’ll be glad the neighbor read also enforces RLS.

The pattern is: every read of doc_chunks in a user-facing code path goes through set local role mcp_query and the GUC setup. Yes, the extra set-role round-trip costs you a few ms. Pay it. It buys you a defense-in-depth that survives whoever inherits the code three years from now.

What this doesn’t solve

RLS doesn’t help with the embeddings themselves. If you embed "salary band for senior engineers: $X" and store the vector, the vector exists. Anyone who can compute embeddings and run an unscoped query against your vector store can probe it. The defense there is operational: don’t let unscoped code paths exist for production data. The corpus probe above is an exception that exists specifically because we audited it; every other access to doc_chunks is scoped.

RLS also doesn’t help with stale permissions. If Bob is removed from a project on Tuesday, your ACL table needs to know by Tuesday. If your sync from Drive permissions to Postgres is daily, there’s a window where stale access leaks. The fix is either real-time sync (push notifications on file changes) or shorter sync intervals. Either way, assume your ACL table is the bottleneck, and design the polling cadence around how fresh the permissions need to be.

Need RAG that actually respects permissions?

I build production RAG with proper access controls baked in from the start. If you’re shipping AI into a company where document permissions exist (and that’s every company), let’s talk.