Why MCP servers are a security risk (and what kernel sandboxing fixes)

The Model Context Protocol has done something remarkable: in under two years it turned “give my AI agent access to a tool” from a bespoke integration into an npx one-liner. There are thousands of community MCP servers for everything from Slack to UniFi controllers.

It has also quietly reproduced one of the oldest mistakes in software distribution: we are all running unaudited third-party code with our most sensitive credentials in its environment.

The threat model nobody wrote down

Look at how a typical MCP server gets installed. You copy a JSON snippet into your client config:

{
  "mcpServers": {
    "slack": {
      "command": "npx",
      "args": ["-y", "some-slack-mcp"],
      "env": { "SLACK_BOT_TOKEN": "xoxb-..." }
    }
  }
}

Unpack what just happened:

You handed your raw token to a process whose source you have never read. The token sits in the process environment, readable by the server, by every dependency in its tree, and by any post-install script that ran during npx -y.
That process has your full network access. Nothing distinguishes its legitimate call to slack.com from a POST of your token to an attacker’s collection endpoint. Both are just outbound HTTPS.
It has your filesystem too. Your SSH keys, your browser profile, your other config files full of other tokens: all readable by default.
It updates out from under you. npx -y fetches whatever the registry serves today. The package that was clean when you installed it can be malicious after a maintainer-account compromise. The npm supply chain has demonstrated this failure mode repeatedly, for example when the ua-parser-js maintainer account was hijacked in 2021 and malicious versions were published.
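
The first point is easy to check for yourself. The sketch below is a minimal Python illustration, not code from any real MCP server: it spawns a child process the way a client spawns a server and confirms that the child sees the token. The names `FAKE_TOKEN`, `visible_secret_names`, and `child_sees_token` are made up for this example, and the name-matching heuristic is deliberately crude.

```python
import os
import subprocess
import sys

# Hypothetical value standing in for the SLACK_BOT_TOKEN in the config above.
FAKE_TOKEN = "xoxb-placeholder-for-demo"


def visible_secret_names(environ):
    """Return the names of environment variables that look like credentials.

    Any code loaded into the process can do this: the server itself or
    any package in its dependency tree. The environment is process-wide.
    """
    markers = ("TOKEN", "SECRET", "KEY", "PASSWORD")
    return sorted(
        name for name in environ
        if any(marker in name.upper() for marker in markers)
    )


def child_sees_token(token):
    """Spawn a child with the token in its environment and ask what it sees."""
    env = dict(os.environ, SLACK_BOT_TOKEN=token)
    probe = "import os; print(os.environ.get('SLACK_BOT_TOKEN', ''))"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == token
```

Nothing in this path distinguishes the code that needs the token from the code that merely happens to be loaded alongside it.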

And there’s an MCP-specific twist: prompt injection becomes a confused-deputy attack. Even a perfectly honest server executes the tool calls your model asks for. If hostile text in a web page or document tricks the model, a benign filesystem server happily reads ~/.ssh/id_ed25519 and a benign HTTP server happily exfiltrates it. The server doesn’t need to be malicious; it just needs to be capable.

The common thread: in every scenario, the blast radius of one compromised (or merely obedient) MCP server is everything that process can reach, which today is everything you can reach.

Why app-level mitigations don’t cut it

The first instinct is to fix this in userspace: audit the servers, pin versions, set HTTPS_PROXY and filter traffic.

Each of these helps and none of them holds:

Audits don’t scale and rot instantly. You’d need to re-audit every dependency on every update, forever.
Version pinning protects against future tampering but not against malice already present, and most users won’t maintain pins.
Environment-variable proxying is bypassable. HTTPS_PROXY is a convention. Malicious code simply ignores it and opens a direct socket. We validated this directly while building Gig’MCP: env-only proxy enforcement failed against trivially uncooperative code. If the process can route around your proxy, your proxy is decoration.
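
The proxy point can be reproduced without touching the internet. This is a minimal sketch under stated assumptions, not the test we ran for Gig’MCP: a throwaway listener on localhost stands in for any destination, and the proxy variable points at a port where nothing is expected to be listening. The helper names are hypothetical.

```python
import os
import socket
import threading
import urllib.request


def start_local_listener():
    """Listen on a free localhost port and answer a single connection."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    def answer_once():
        conn, _ = server.accept()
        with conn:
            conn.sendall(b"reached")
        server.close()

    threading.Thread(target=answer_once, daemon=True).start()
    return port


def connect_directly(port):
    """Open a plain TCP socket. The socket API never consults HTTPS_PROXY."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
        return conn.recv(16)


# Assumption for the demo: nothing is listening on this address.
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:9"

# Cooperative libraries look the proxy up and use it...
advertised_proxy = urllib.request.getproxies().get("https")

# ...while code that opens its own socket reaches the destination anyway.
reply = connect_directly(start_local_listener())
```

The variable is visible to libraries that choose to read it, and irrelevant to everything else.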

The lesson security engineering keeps re-teaching: a boundary the untrusted code can opt out of is not a boundary. You need enforcement at a layer the code cannot reach. That layer is the kernel.

What kernel sandboxing actually fixes

Gig’MCP runs every community MCP server inside a sandbox built on bubblewrap (bwrap), the same foundation Flatpak uses. Concretely, each server gets:

A private mount namespace. No host filesystem. Your SSH keys and dotfiles aren’t merely protected; from inside the sandbox they don’t exist.
A private network namespace. The sandbox has its own isolated network stack. Its only route to anywhere is a virtual ethernet pair leading to the gateway’s egress proxy. This is the part that makes egress control real: not a proxy environment variable the code can ignore, but routing topology it cannot escape. The proxy enforces a per-server domain allowlist and identifies tenants by source IP, which is unforgeable because each sandbox can only source addresses from its own /30.
Private user and PID namespaces with a cleared environment. No host process visibility, no inherited secrets.
Zero privileges. A trusted bootstrap configures the sandbox’s network link, drops all capabilities (CapEff=0), switches to uid 65534 (nobody), and only then executes the untrusted server.
A seccomp-BPF filter that closes the classic container-escape vectors. It kills processes that call unshare or setns, arg-filters clone to kill CLONE_NEWUSER attempts (closing the nested-user-namespace escape), blocks the mount family (mount, pivot_root, chroot, and friends), and denies ptrace, keyctl, bpf, and kernel-module loading. One careful detail: clone3 returns ENOSYS rather than killing the process, because modern glibc tries it first for pthread_create. A seccomp filter cannot inspect clone3’s flags, which are passed in a struct behind a pointer, so the call cannot be arg-filtered the way clone is. Returning ENOSYS makes glibc fall back to clone, which is arg-filtered, so the escape stays closed without breaking multithreaded servers. This behavior is covered by tests: goroutines and TCP work, while unshare gets SIGSYS-killed.
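
The seccomp rules in the last item are easier to follow as a decision table. The following is a plain-Python model of the policy as described above, not the BPF program Gig’MCP ships. Two things are assumptions: the exact contents of the `DENIED` set beyond the syscalls named in the text, and the use of EPERM for denied calls, since the text says only that they are blocked or denied.

```python
import errno
from dataclasses import dataclass

CLONE_NEWUSER = 0x10000000           # value from <linux/sched.h>
THREAD_FLAGS = 0x00000100 | 0x00010000  # CLONE_VM | CLONE_THREAD

KILL = "kill"    # the caller receives SIGSYS
ERRNO = "errno"  # the call fails with the given errno
ALLOW = "allow"

KILL_ALWAYS = {"unshare", "setns"}

# Named in the text; init_module and finit_module are the module-loading syscalls.
DENIED = {
    "mount", "pivot_root", "chroot",
    "ptrace", "keyctl", "bpf",
    "init_module", "finit_module",
}


@dataclass(frozen=True)
class Verdict:
    action: str
    errno_value: int = 0


def decide(syscall, flags=0):
    """Model the filter's verdict for one syscall and its flags argument."""
    if syscall in KILL_ALWAYS:
        return Verdict(KILL)
    if syscall == "clone":
        # clone passes its flags in a register, so the filter can test them.
        if flags & CLONE_NEWUSER:
            return Verdict(KILL)
        return Verdict(ALLOW)
    if syscall == "clone3":
        # clone3 passes flags in a struct behind a pointer, which a filter
        # cannot read. ENOSYS pushes glibc onto the filtered clone path.
        return Verdict(ERRNO, errno.ENOSYS)
    if syscall in DENIED:
        return Verdict(ERRNO, errno.EPERM)  # assumed action, see lead-in
    return Verdict(ALLOW)
```

In this model, thread creation is `decide("clone3")` followed by `decide("clone", THREAD_FLAGS)`, which ends in an allow, while `decide("clone", THREAD_FLAGS | CLONE_NEWUSER)` ends in a kill.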

And the most important property is the one that’s almost boring: credentials never enter the sandbox at all. The server gets a placeholder token; the real key is injected by the egress proxy, outside the sandbox, only on HTTPS calls to allowlisted domains. Even total compromise of the server yields a worthless string. That mechanism gets its own post.
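
A rough sketch of the proxy-side logic, combining the source-IP tenant lookup with the allowlist and the credential swap. This is an illustration under loud assumptions rather than Gig’MCP’s implementation: the `Tenant` record, the exact-match hostname allowlist, and the header string substitution are all hypothetical choices made to keep the example small.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    name: str
    subnet: ipaddress.IPv4Network  # the sandbox's own /30
    allowlist: frozenset           # hostnames this server may reach
    placeholder: str               # the only string the sandbox ever holds
    real_token: str                # held by the proxy, outside the sandbox


def tenant_for(source_ip, tenants):
    """Identify the tenant by which /30 the connection came from."""
    address = ipaddress.ip_address(source_ip)
    for tenant in tenants:
        if address in tenant.subnet:
            return tenant
    return None


def rewrite_request(source_ip, host, headers, tenants):
    """Return the headers to send upstream, or None if the request is blocked."""
    tenant = tenant_for(source_ip, tenants)
    if tenant is None or host.lower() not in tenant.allowlist:
        return None
    # Swap the placeholder for the real token only on allowlisted requests.
    return {
        name: value.replace(tenant.placeholder, tenant.real_token)
        for name, value in headers.items()
    }


slack = Tenant(
    name="slack",
    subnet=ipaddress.ip_network("10.200.0.0/30"),
    allowlist=frozenset({"slack.com"}),
    placeholder="PLACEHOLDER_SLACK",
    real_token="real-token-known-only-to-proxy",
)
sandbox_headers = {"Authorization": "Bearer PLACEHOLDER_SLACK"}

allowed = rewrite_request("10.200.0.2", "slack.com", sandbox_headers, [slack])
blocked = rewrite_request("10.200.0.2", "attacker.example", sandbox_headers, [slack])
```

Here `allowed` carries the real token and `blocked` is None. A request to a domain off the allowlist never gets as far as the substitution step, so the placeholder is all a compromised server could ever send.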

Revisit the threat model with this in place:

Threat	Before	Sandboxed
Malicious dependency reads your token	Game over	Reads a placeholder
Server exfiltrates to attacker domain	Silent success	Blocked at the proxy; not on the allowlist
Server reads ~/.ssh, other configs	Trivial	No host filesystem exists
Prompt-injected exfiltration	Full file + network access	Capped at the server’s declared entitlements
Compromised update changes behavior	Whatever the new code wants	Digest-pinned image; same kernel cage either way

Honest limits

Kernel sandboxing is a boundary, not magic, and we document what it doesn’t cover:

Shared-kernel isolation. A kernel 0-day breaks tenant separation. We consider that acceptable for self-hosted deployments running a curated catalog; truly hostile multi-tenant hosting would want gVisor or microVMs, which the design leaves room for.
A server can misuse the access you granted it. If you allowlist slack.com and hand over a Slack scope, a malicious server can do bad things on Slack, as you. Sandboxing caps the blast radius at the declared entitlements; it can’t make a granted capability safe. (This is why manifests declare minimal egress and a curated default tool subset.)
The current seccomp filter is a targeted denylist, not a full syscall allowlist. It closes the known escape and escalation vectors, and tests verify that it does. A stricter allowlist-style profile is on the hardening roadmap, alongside Landlock filesystem rules and cgroup resource limits. We’d rather state that plainly than imply more than we ship.

Trust the kernel, not the author

The MCP ecosystem’s current answer to security is reputational: stars, downloads, “it’s by a known person.” That works until it doesn’t, and supply-chain history says it eventually doesn’t.

The structural answer is to make the author’s trustworthiness irrelevant: run every server as if it’s hostile, behind boundaries enforced by the kernel, with credentials it never sees and egress it can’t choose. That’s what Gig’MCP is: an open-source (AGPL-3.0), self-hosted gateway you can run with one docker compose up, currently pre-launch and developed in the open.

The architecture and security model docs go deeper, the source is on GitHub, and if you want to know when the first release ships, watch the repo.