Would you let a stranger run code on your computer just because they promised it was safe?

That’s what happens every time an AI agent loads a “skill” or “tool” from the internet. The agent trusts the skill. The skill runs. And if it’s malicious? Too late.

We built a fix.


The Attack Surface Is Huge

AI agents are increasingly powerful. They can:

  • Read and write files
  • Make HTTP requests
  • Execute shell commands
  • Access databases
  • Send emails

Now imagine a malicious skill that looks like a “PDF summarizer” but actually:

  • Exfiltrates your API keys
  • Installs a backdoor
  • Mines cryptocurrency in the background

This isn’t hypothetical. We scanned public skill repositories. 13.4% had critical security issues.


The Solution: 4-Tier Verification

We don’t trust skills. We verify them. Every single one, through four tiers:

Tier 1: Fast Pass (Regex)

Pattern matching catches the obvious stuff in milliseconds:

  • Direct exec() or eval() calls
  • Hardcoded credentials
  • Known malicious patterns

Cost: $0. Speed: under 10 milliseconds.
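The fast pass can be sketched as a named deny-list of compiled regexes. This is a minimal illustration, not the pipeline's actual pattern set — the pattern names and rules here are hypothetical examples of the three categories above:

```python
import re

# Hypothetical subset of a Tier 1 deny-list; a real pattern set
# would be much larger and tuned against false positives.
SUSPICIOUS_PATTERNS = {
    "dynamic-exec": re.compile(r"\b(?:exec|eval)\s*\("),
    "hardcoded-key": re.compile(
        r"(?:api[_-]?key|secret|password)\s*=\s*['\"][^'\"]+['\"]", re.I
    ),
    "pipe-to-shell": re.compile(r"curl[^\n|]*\|\s*(?:ba)?sh"),
}

def fast_pass(source: str) -> list[str]:
    """Return the names of every pattern that matches; empty list = pass."""
    return [name for name, pat in SUSPICIOUS_PATTERNS.items() if pat.search(source)]
```

Because everything is precompiled, a scan is a handful of linear passes over the source — which is why this tier costs effectively nothing.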

Tier 2: Guard Model (LLM-as-Judge)

An AI reviews the skill’s code and documentation:

  • Does the code match the stated purpose?
  • Is it requesting excessive permissions?
  • Are there signs of scope drift?

The Guard Model asks: “Does this PDF summarizer really need network access?”
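In code, this tier reduces to building a review prompt and parsing a structured verdict. The prompt template and the SAFE/UNSAFE answer format below are illustrative assumptions, not the pipeline's actual wording:

```python
# Hypothetical prompt template for the guard model; the real pipeline's
# prompt and verdict format may differ.
GUARD_PROMPT = """\
You are a security reviewer. A skill claims its purpose is: "{purpose}".
Read its source below and answer with exactly SAFE or UNSAFE,
followed by one sentence of justification.

{source}
"""

def parse_verdict(model_output: str) -> bool:
    """Return True if the guard model's first word is SAFE (assumed format)."""
    first_word = model_output.strip().split(None, 1)[0].strip(".:,-").upper()
    return first_word == "SAFE"
```

The key design point is that the judge sees both the stated purpose and the code, so a "PDF summarizer" that opens sockets fails the purpose-vs-behavior comparison even if every individual call looks benign.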

Tier 3: Sandbox Execution

Run the skill in an isolated container:

  • No network access
  • No filesystem access
  • Limited CPU/memory
  • Monitored for violations

If it tries something suspicious, we catch it.
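The isolation itself is container-level, but the resource-limit part of the idea can be sketched in a few lines with POSIX rlimits. This is only a sketch: it caps CPU time and memory for a child process, and does not provide the network or filesystem isolation a real container does:

```python
import resource
import subprocess
import sys

def run_sandboxed(script: str, timeout_s: int = 2, mem_mb: int = 128):
    """Run untrusted Python with CPU, memory, and wall-clock limits.

    Minimal sketch only: real sandboxing also needs network and
    filesystem isolation (e.g. an isolated container), not just rlimits.
    """
    def limit_resources():
        # Hard-cap CPU seconds and address space for the child (POSIX only).
        resource.setrlimit(resource.RLIMIT_CPU, (timeout_s, timeout_s))
        resource.setrlimit(resource.RLIMIT_AS, (mem_mb * 1024**2,) * 2)

    return subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        timeout=timeout_s + 1,       # wall-clock backstop
        preexec_fn=limit_resources,  # applied in the child before exec
    )
```

A skill that tries to allocate past the memory cap or spin past the CPU cap is killed by the kernel rather than trusted to behave — the same principle the container enforces for network and disk.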

Tier 4: Sign and Register

Skills that pass all tiers get:

  • Cryptographic signature (Ed25519)
  • Content hash (SHA-256)
  • Entry in the registry (Merkle tree)

Now you can verify: “This exact skill, with this exact code, was verified safe on this date.”
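The content hash and the Merkle-tree registry can be sketched with the standard library. The folding scheme below (duplicate the last node on odd levels) and the entry fields are illustrative assumptions, not the registry's actual on-disk format:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """SHA-256 content hash as a hex string."""
    return hashlib.sha256(data).hexdigest()

def merkle_root(leaf_hashes: list[str]) -> str:
    """Fold leaf hashes pairwise into a single root (illustrative scheme)."""
    if not leaf_hashes:
        return sha256_hex(b"")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:            # duplicate the last node on odd levels
            level.append(level[-1])
        level = [
            sha256_hex((level[i] + level[i + 1]).encode())
            for i in range(0, len(level), 2)
        ]
    return level[0]

# Hypothetical registry entry: ties the exact bytes to a verification date.
entry = {
    "skill": "pdf-summarizer",
    "content_hash": sha256_hex(b"def summarize(pdf): ..."),
    "verified": "2025-01-15",
}
```

Because any change to any skill's bytes changes its leaf hash, and any leaf change changes the root, publishing one root commits the registry to every entry at once — that's what makes "this exact code was verified on this date" checkable.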


W^X: Write XOR Execute

We enforce a simple rule: code that can be written cannot be executed, and code that can be executed cannot be written.

This prevents entire classes of attacks:

  • No self-modifying code
  • No dynamic code injection
  • No “I’ll just download and run this script”

If a skill needs to generate code, it must go through the pipeline again.
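At the filesystem level, the W^X invariant is easy to check mechanically. This is a simplified, permissions-only reading of the rule (real enforcement also has to cover in-memory code generation), with a hypothetical checker name:

```python
import os
import stat

def violates_wx(path: str) -> bool:
    """True if any permission class on `path` is both writable and executable.

    Simplified sketch: checks only file-mode bits, one class at a time
    (user, group, other); full W^X enforcement also covers runtime
    code generation, not just files on disk.
    """
    mode = os.stat(path).st_mode
    classes = [
        (stat.S_IWUSR, stat.S_IXUSR),  # user
        (stat.S_IWGRP, stat.S_IXGRP),  # group
        (stat.S_IWOTH, stat.S_IXOTH),  # other
    ]
    return any(mode & w and mode & x for w, x in classes)
```

A mode like 0o755 (owner can write and execute) violates the invariant; 0o555 (execute-only) and 0o644 (write-only, no execute) both satisfy it.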


The Numbers

Our reference implementation:

  Tier          Latency    Cost
  Fast Pass     <10ms      $0
  Guard Model   ~5s        ~$0.01
  Sandbox       ~2s        ~$0.001
  Signing       <100ms     $0

Total: ~7 seconds and $0.01 to verify a skill is safe.

Compare that to the cost of a breach.


Try It Yourself

The full verification pipeline is open source:

pip install fdaa-cli

# Verify a skill
fdaa pipeline ./my-skill

# Check a skill's signature
fdaa check ./my-skill

📄 Whitepaper: Skill Verification Pipeline (DOI: 10.5281/zenodo.18676240)

💻 fdaa-cli on GitHub


This is Part 2 of our research series on provable agent infrastructure.