Would you let a stranger run code on your computer just because they promised it was safe?
That’s what happens every time an AI agent loads a “skill” or “tool” from the internet. The agent trusts the skill. The skill runs. And if it’s malicious? Too late.
We built a fix.
The Attack Surface Is Huge
AI agents are increasingly powerful. They can:
- Read and write files
- Make HTTP requests
- Execute shell commands
- Access databases
- Send emails
Now imagine a malicious skill that looks like a “PDF summarizer” but actually:
- Exfiltrates your API keys
- Installs a backdoor
- Mines cryptocurrency in the background
This isn’t hypothetical. We scanned public skill repositories. 13.4% had critical security issues.
The Solution: 4-Tier Verification
We don’t trust skills. We verify them. Every single one, through four tiers:
Tier 1: Fast Pass (Regex)
Pattern matching catches the obvious stuff in milliseconds:
- Direct `exec()` or `eval()` calls
- Hardcoded credentials
- Known malicious patterns
Cost: ~$0. Speed: milliseconds.
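A minimal sketch of what a Tier 1 fast pass can look like. The pattern list and the `fast_pass` function are illustrative, not the actual fdaa rule set, which would be far larger:

```python
import re

# Hypothetical fast-pass rules -- the real rule set would be much larger.
SUSPICIOUS_PATTERNS = [
    (re.compile(r"\b(exec|eval)\s*\("), "dynamic code execution"),
    (re.compile(r"(api[_-]?key|secret|password)\s*=\s*['\"][^'\"]+['\"]", re.I),
     "hardcoded credential"),
    (re.compile(r"curl\s+[^|]*\|\s*(sh|bash)"), "pipe-to-shell download"),
]

def fast_pass(source: str) -> list[str]:
    """Return a list of findings; an empty list means the skill passes Tier 1."""
    findings = []
    for pattern, label in SUSPICIOUS_PATTERNS:
        if pattern.search(source):
            findings.append(label)
    return findings
```

Because this tier is pure pattern matching, it runs in-process with no model calls, which is why its cost is effectively zero.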
Tier 2: Guard Model (LLM-as-Judge)
An AI reviews the skill’s code and documentation:
- Does the code match the stated purpose?
- Is it requesting excessive permissions?
- Are there signs of scope drift?
The Guard Model asks: “Does this PDF summarizer really need network access?”
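One way to structure that review: give the judge the stated purpose, the requested permissions, and the source, and demand a machine-parseable verdict. The prompt wording and the JSON verdict schema below are assumptions for illustration; the model call itself is left out:

```python
import json

def build_judge_prompt(stated_purpose: str, requested_permissions: list[str],
                       source: str) -> str:
    """Assemble a guard-model review prompt (hypothetical wording).

    The questions mirror the checks above: purpose match, permission
    scope, and scope drift.
    """
    return (
        "You are a security reviewer for AI agent skills.\n"
        f"Stated purpose: {stated_purpose}\n"
        f"Requested permissions: {json.dumps(requested_permissions)}\n"
        "Source code:\n"
        "```\n" + source + "\n```\n"
        'Answer in JSON: {"verdict": "pass"|"fail", "reason": "..."}\n'
        "Fail the skill if the code does not match the purpose, requests "
        "excessive permissions, or shows signs of scope drift."
    )

def parse_verdict(model_reply: str) -> bool:
    """True only if the judge returned an explicit pass."""
    try:
        return json.loads(model_reply).get("verdict") == "pass"
    except json.JSONDecodeError:
        return False  # an unparseable reply is treated as a failure
```

Defaulting to failure on an unparseable reply keeps the tier fail-closed: a confused judge can never wave a skill through.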
Tier 3: Sandbox Execution
Run the skill in an isolated container:
- No network access
- No filesystem access
- Limited CPU/memory
- Monitored for violations
If it tries something suspicious, we catch it.
Tier 4: Sign and Register
Skills that pass all tiers get:
- Cryptographic signature (Ed25519)
- Content hash (SHA-256)
- Entry in the registry (Merkle tree)
Now you can verify: “This exact skill, with this exact code, was verified safe on this date.”
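The content hash and registry side of that tier can be sketched in a few lines of standard-library Python. The Ed25519 signature step is omitted here (it needs a crypto library such as `cryptography`); the odd-level duplication rule in `merkle_root` is one common convention, assumed for illustration:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """SHA-256 content hash of a skill, as lowercase hex."""
    return hashlib.sha256(data).hexdigest()

def merkle_root(leaf_hashes: list[str]) -> str:
    """Fold a list of skill content hashes into a single registry root.

    Sketch only: pairs adjacent hashes level by level, duplicating the
    last node when a level has an odd count.
    """
    if not leaf_hashes:
        return sha256_hex(b"")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])   # duplicate last node on odd levels
        level = [sha256_hex((a + b).encode())
                 for a, b in zip(level[::2], level[1::2])]
    return level[0]
```

Republishing just the root commits the registry to every skill hash beneath it, so tampering with any single entry changes the root and is immediately detectable.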
W^X: Write XOR Execute
We enforce a simple rule: code that can be written cannot be executed, and code that can be executed cannot be written.
This prevents entire classes of attacks:
- No self-modifying code
- No dynamic code injection
- No “I’ll just download and run this script”
If a skill needs to generate code, it must go through the pipeline again.
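One simple way to enforce that rule at the filesystem level, sketched with hypothetical helpers (`seal_skill`, `is_executable_allowed`) that are not part of fdaa-cli: once a skill passes the pipeline, drop its write bits, and refuse to execute anything still writable.

```python
import os
import stat

def seal_skill(path: str) -> None:
    """Drop all write bits once a skill passes the pipeline (owner read+execute only)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IXUSR)

def is_executable_allowed(path: str) -> bool:
    """W^X check: a file that anyone can still write to must not be executed."""
    mode = os.stat(path).st_mode
    writable = mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
    return not writable
```

Under this scheme, generated code is writable by construction, so it fails the check until it has been re-verified and sealed, which is exactly the "go through the pipeline again" rule above.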
The Numbers
Our reference implementation:
| Tier | Latency | Cost |
|---|---|---|
| Fast Pass | <10ms | $0 |
| Guard Model | ~5s | ~$0.01 |
| Sandbox | ~2s | ~$0.001 |
| Signing | <100ms | $0 |
Total: ~7 seconds and $0.01 to verify a skill is safe.
Compare that to the cost of a breach.
Try It Yourself
The full verification pipeline is open source:
pip install fdaa-cli
# Verify a skill
fdaa pipeline ./my-skill
# Check a skill's signature
fdaa check ./my-skill
Whitepaper: Skill Verification Pipeline (DOI: 10.5281/zenodo.18676240)
fdaa-cli on GitHub
This is Part 2 of our research series on provable agent infrastructure.