Would you let a stranger run code on your computer just because they promised it was safe?
That’s what happens every time an AI agent loads a “skill” or “tool” from the internet. The agent trusts the skill. The skill runs. And if it’s malicious? Too late.
We built a fix.
The Attack Surface Is Huge
AI agents are increasingly powerful. They can:
- Read and write files
- Make HTTP requests
- Execute shell commands
- Access databases
- Send emails
Now imagine a malicious skill that looks like a “PDF summarizer” but actually:
- Exfiltrates your API keys
- Installs a backdoor
- Mines cryptocurrency in the background
This isn’t hypothetical. We scanned public skill repositories. 13.4% had critical security issues.
The Solution: 4-Tier Verification
We don’t trust skills. We verify them. Every single one, through four tiers:
Tier 1: Fast Pass (Regex)
Pattern matching catches the obvious stuff in milliseconds:
- Direct `exec()` or `eval()` calls
- Hardcoded credentials
- Known malicious patterns
Cost: ~$0. Speed: milliseconds.
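A minimal sketch of what a Tier 1 fast pass can look like. The pattern list and the `fast_pass` function are illustrative, not the actual fdaa rule set, which would be far larger:

```python
import re

# Hypothetical fast-pass rules -- the real rule set would be much larger.
SUSPICIOUS_PATTERNS = [
    (re.compile(r"\b(exec|eval)\s*\("), "dynamic code execution"),
    (re.compile(r"(api[_-]?key|secret|password)\s*=\s*['\"][^'\"]+['\"]", re.I),
     "hardcoded credential"),
    (re.compile(r"curl\s+[^|]*\|\s*(sh|bash)"), "pipe-to-shell download"),
]

def fast_pass(source: str) -> list[str]:
    """Return a list of findings; an empty list means the skill passes Tier 1."""
    findings = []
    for pattern, label in SUSPICIOUS_PATTERNS:
        if pattern.search(source):
            findings.append(label)
    return findings
```

Because this tier is pure pattern matching, it runs in-process with no model calls, which is why its cost is effectively zero.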
Tier 2: Guard Model (LLM-as-Judge)
An AI reviews the skill’s code and documentation:
- Does the code match the stated purpose?
- Is it requesting excessive permissions?
- Are there signs of scope drift?
The Guard Model asks: “Does this PDF summarizer really need network access?”
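One way to structure that review: give the judge the stated purpose, the requested permissions, and the source, and demand a machine-parseable verdict. The prompt wording and the JSON verdict schema below are assumptions for illustration; the model call itself is left out:

```python
import json

def build_judge_prompt(stated_purpose: str, requested_permissions: list[str],
                       source: str) -> str:
    """Assemble a guard-model review prompt (hypothetical wording).

    The questions mirror the checks above: purpose match, permission
    scope, and scope drift.
    """
    return (
        "You are a security reviewer for AI agent skills.\n"
        f"Stated purpose: {stated_purpose}\n"
        f"Requested permissions: {json.dumps(requested_permissions)}\n"
        "Source code:\n"
        "```\n" + source + "\n```\n"
        'Answer in JSON: {"verdict": "pass"|"fail", "reason": "..."}\n'
        "Fail the skill if the code does not match the purpose, requests "
        "excessive permissions, or shows signs of scope drift."
    )

def parse_verdict(model_reply: str) -> bool:
    """True only if the judge returned an explicit pass."""
    try:
        return json.loads(model_reply).get("verdict") == "pass"
    except json.JSONDecodeError:
        return False  # an unparseable reply is treated as a failure
```

Defaulting to failure on an unparseable reply keeps the tier fail-closed: a confused judge can never wave a skill through.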
Tier 3: Sandbox Execution
Run the skill in an isolated container:
- No network access
- No filesystem access
- Limited CPU/memory
- Monitored for violations
If it tries something suspicious, we catch it.
Tier 4: Sign and Register
Skills that pass all tiers get:
- Cryptographic signature (Ed25519)
- Content hash (SHA-256)
- Entry in the registry (Merkle tree)
Now you can verify: “This exact skill, with this exact code, was verified safe on this date.”
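The content hash and registry side of that tier can be sketched in a few lines of standard-library Python. The Ed25519 signature step is omitted here (it needs a crypto library such as `cryptography`); the odd-level duplication rule in `merkle_root` is one common convention, assumed for illustration:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """SHA-256 content hash of a skill, as lowercase hex."""
    return hashlib.sha256(data).hexdigest()

def merkle_root(leaf_hashes: list[str]) -> str:
    """Fold a list of skill content hashes into a single registry root.

    Sketch only: pairs adjacent hashes level by level, duplicating the
    last node when a level has an odd count.
    """
    if not leaf_hashes:
        return sha256_hex(b"")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])   # duplicate last node on odd levels
        level = [sha256_hex((a + b).encode())
                 for a, b in zip(level[::2], level[1::2])]
    return level[0]
```

Republishing just the root commits the registry to every skill hash beneath it, so tampering with any single entry changes the root and is immediately detectable.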
W^X: Write XOR Execute
We enforce a simple rule: code that can be written cannot be executed, and code that can be executed cannot be written.
This prevents entire classes of attacks:
- No self-modifying code
- No dynamic code injection
- No “I’ll just download and run this script”
If a skill needs to generate code, it must go through the pipeline again.
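One simple way to enforce that rule at the filesystem level, sketched with hypothetical helpers (`seal_skill`, `is_executable_allowed`) that are not part of fdaa-cli: once a skill passes the pipeline, drop its write bits, and refuse to execute anything still writable.

```python
import os
import stat

def seal_skill(path: str) -> None:
    """Drop all write bits once a skill passes the pipeline (owner read+execute only)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IXUSR)

def is_executable_allowed(path: str) -> bool:
    """W^X check: a file that anyone can still write to must not be executed."""
    mode = os.stat(path).st_mode
    writable = mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
    return not writable
```

Under this scheme, generated code is writable by construction, so it fails the check until it has been re-verified and sealed, which is exactly the "go through the pipeline again" rule above.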
The Numbers
Our reference implementation:
| Tier | Latency | Cost |
|---|---|---|
| Fast Pass | <10ms | $0 |
| Guard Model | ~5s | ~$0.01 |
| Sandbox | ~2s | ~$0.001 |
| Signing | <100ms | $0 |
Total: ~7 seconds and $0.01 to verify a skill is safe.
Compare that to the cost of a breach.
Try It Yourself
The full verification pipeline is open source:
pip install fdaa-cli
# Verify a skill
fdaa pipeline ./my-skill
# Check a skill's signature
fdaa check ./my-skill
Whitepaper: Skill Verification Pipeline (DOI: 10.5281/zenodo.18676240)
fdaa-cli on GitHub
This is Part 2 of our research series on provable agent infrastructure.