How to use the Auditor.

Drop a file, paste a snippet, or point at a repo. The auditor runs a rigorous 10-step method — three code passes, calibrated severity, concrete patches — and produces a structured security report. No exploits generated. Ever.

Scope

Establish context

Threat model, trust boundaries, deployment target

Scan

Three-pass review

Patterns → Auth/Authz → Context-specific

Fix

Reproduce & Patch

Concrete repro steps and code diffs

Ship

Report & Plan

Calibrated report + remediation plan

Installation

Claude Code / Cowork

Clone into your skills folder. The skill is auto-detected on the next session.

    # Clone into your skills folder
    git clone https://github.com/alboechat/appsec-vulnerability-auditor.git

    # Or clone a stable tag
    git clone --branch v1.1.0 https://github.com/alboechat/appsec-vulnerability-auditor.git

OpenAI Codex CLI

Place in your Codex skills directory.

    cd ~/.codex/skills
    git clone https://github.com/alboechat/appsec-vulnerability-auditor.git

Cursor / Windsurf / Cline

Copy the contents of SKILL.md into your custom instructions or system prompt. Grant filesystem access to references/, examples/, and templates/.

Any AI coding agent

The skill is a standard Markdown prompt file. Any agent that accepts system instructions or custom prompts can use it — paste or reference SKILL.md.

No dependencies. The skill is a prompt file + reference documents. No build step, no API keys, no runtime.

Your first audit

The simplest invocation: paste code and ask.

    // Paste your code, then ask:
    "Review this for security vulnerabilities"

    // Or point at a file:
    "Audit src/api/routes.ts for security"

    // Or an entire repo:
    "Do a full security review of this codebase"

The auditor asks one clarifying question to establish scope (threat model, trust boundaries), then runs the full 10-step method. If your code is small, the entire audit completes in a single run. For repositories, it uses staged mode with resumable state tracking.

Tip: Providing a threat model upfront skips the clarification step and improves severity accuracy. Tell the auditor if it's a public web app, internal tool, CLI, library, or LLM agent.

Trigger phrases

The skill activates automatically on any of these patterns — in English or Portuguese.

"audit this code"

"review for security"

"find security bugs"

"check for OWASP Top 10"

"is this safe to deploy?"

"look for vulnerabilities"

"review this AI-generated code"

"I had Claude write this — check it"

"auditar segurança"

"é seguro para o deploy?"

You can also paste any code snippet accompanied by phrases like "is this OK?" or "can you check this?" — the auditor will recognize the intent.

The 10-step method

Every audit follows the same sequence. No required step is skipped, even on small inputs; the rigor is the value. The only optional step is 2.5, which runs when seed findings are provided.

Step	Name	What happens
1	Establish scope	Threat model, trust boundary, authorship, deployment target
2	Map attack surface	Entry points, trust transitions, sensitive sinks
2.5	Ingest seed findings (optional)	Normalize SARIF/Semgrep input into the ledger
3	Pattern scan	OWASP Top 10, CWE Top 25, AI-code pitfalls, secrets
4	Auth, authz & data flow	Protected routes, IDOR, sanitizer fit, state machines
5	Context-specific passes	Stack-matched checks: web, LLM agent, mobile, IaC
6	Reproduce or sketch	Safe verification steps for Critical/High findings
7	Patch	Code diff, why it closes the risk, regression tests
8	Report	Structured Markdown: scope, surface, findings, next steps
9	Calibrate severity	Confidence, assumptions, downgrade/upgrade conditions
10	Remediation plan	Sprint tasks, acceptance criteria, dependency graph

STEP 1

Establish scope

Ask about threat model, trust boundaries, who wrote the code (human, AI, or both), deployment target, and whether secrets/PII/PHI are in scope. If already provided, skip to Step 2.

STEP 2

Map the attack surface

Build a mental model: entry points (HTTP routes, CLI args, file uploads, MCP tools), trust transitions (where untrusted input enters a trusted context), and sensitive sinks (auth, crypto, file I/O, subprocess).

STEP 2.5

Ingest seed findings (optional)

If the user provides pre-existing findings (Semgrep, SARIF), normalize them into the internal ledger format with provenance tags. Validate during Pass 1.

STEP 3

Pass 1 — Pattern scan

Walk the code against OWASP Top 10, CWE Top 25, AI-generated code pitfalls, LLM agent security patterns, and secrets/config checks. Tag each finding with severity, confidence, CWE ID, and file:line.
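As a rough illustration of what a tagged Pass 1 finding might look like, the sketch below scans source text for one well-known secret shape (an AWS access key ID) and records severity, confidence, CWE ID, and file:line. The `Finding` class, the `scan_for_hardcoded_keys` function, and the single regex rule are hypothetical and are not the skill's actual rule set; the auditor itself works from its reference packs rather than from a script like this.

```python
import re
from dataclasses import dataclass

# Well-known shape of an AWS access key ID. One illustrative rule only (assumption).
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


@dataclass
class Finding:
    """Hypothetical record holding the four tags Pass 1 attaches to a finding."""
    title: str
    severity: str
    confidence: str
    cwe: str
    location: str  # file:line


def scan_for_hardcoded_keys(filename: str, source: str) -> list[Finding]:
    """Return one finding per line that contains something shaped like an AWS key ID."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if AWS_KEY_ID.search(line):
            findings.append(Finding(
                title="Hardcoded AWS access key ID",
                severity="High",      # the severity scale lists "secrets in repo" as High
                confidence="Medium",  # may be a placeholder or test fixture
                cwe="CWE-798",        # Use of Hard-coded Credentials
                location=f"{filename}:{lineno}",
            ))
    return findings
```

The confidence is deliberately lower than the severity: a pattern match shows that something looks like a key, not that the key is live.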

STEP 4

Pass 2 — Auth, authz & data-flow

Hand-trace authentication (every protected route actually protected?), authorization (IDOR checks on every user-owned resource), data flow (untrusted input → sanitizer → sink), and state machines (can steps be skipped?).
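The ownership check this pass looks for can be sketched in a few lines. This is a minimal illustration under stated assumptions: `Note`, `NOTES`, `Forbidden`, and `delete_note` are hypothetical names, and the in-memory dictionary stands in for whatever data store the audited code uses.

```python
from dataclasses import dataclass


@dataclass
class Note:
    id: int
    owner_id: int
    body: str


class Forbidden(Exception):
    """Raised when the caller does not own the requested resource."""


# Hypothetical in-memory store standing in for a database table.
NOTES = {
    1: Note(id=1, owner_id=100, body="alice's note"),
    2: Note(id=2, owner_id=200, body="bob's note"),
}


def delete_note(current_user_id: int, note_id: int) -> None:
    """Delete a note only if the authenticated caller owns it."""
    note = NOTES.get(note_id)
    # Treat "missing" and "not yours" identically so IDs cannot be enumerated.
    if note is None or note.owner_id != current_user_id:
        raise Forbidden(f"note {note_id} is not accessible")
    del NOTES[note_id]
```

An authenticated route that skips the `owner_id` comparison is the IDOR case: the caller is logged in, but nothing ties the requested resource to that caller.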

STEP 5

Pass 3 — Context-specific

Choose checks based on what the code is (web frontend, backend, LLM agent, mobile, IaC, library). Load language-specific reference packs only for the stack in scope.

STEP 6

Reproduce or sketch

For every Critical/High finding: show a concrete reproduction (curl call, request body) or describe steps in prose. The user must be able to verify without receiving an exploit kit.

STEP 7

Patch

For every finding: a code diff in the same language and style, an explanation of why it closes the vulnerability, and regression tests. If it requires architectural change, say so.

STEP 8

Report

Structured Markdown report: scope, attack-surface map, calibration note, findings table, detailed writeups, negative findings, out-of-scope notes, and recommended next steps.

STEP 9

Calibrate severity

Every finding gets: severity level, confidence, assumptions, downgrade/upgrade conditions, and validation questions. No inflation, no deflation — calibrated severity is the credibility of the report.

STEP 10

Offer remediation plan

After delivering the report, offer to generate a prioritized remediation plan. Users choose which findings to fix by tier (T1/T2/T3), preset (quick-wins, ship-blocker, full), or individual selection. Output: sprint-allocated .md plan with task checklists, fix steps, acceptance criteria, and dependency graph.

Vibe-coding patterns

Code written via AI assistants ("vibe-coded") fails in characteristic ways. The auditor checks these even when the rest of the audit is clean — they are the highest-yield findings and the key differentiator vs. a generic SAST tool.

Top 10 vibe-coding checks

#	Pattern	Why it matters
1	Confidently wrong crypto	md5, sha1, ECB mode, hand-rolled JWT verification that looks idiomatic
2	Missing authz on PATCH/DELETE	AI scaffolds CRUD, authenticates the route, forgets resource ownership check → IDOR
3	Prompt-injectable tool use	Untrusted text passed as system/user role with tool-calling enabled = pre-auth RCE-equivalent
4	Hardcoded keys "for testing"	In commit history, .env.example, comments, frontend bundles, mobile resources
5	Server-trusts-client	Pricing computed on the client, role taken from a JWT claim the client can set, quantity validated only in JS
6	String-concatenated SQL/HTML/shell	AI falls back to interpolation even when ORM/template engine was available
7	Overly broad CORS (*)	A wildcard or reflected origin combined with credentialed endpoints = cross-origin access to authenticated data
8	eval / exec / pickle.loads	On user input — disproportionately common in AI-generated "flexible" code
9	SSRF in URL fetchers	fetch(user_url) with no allowlist — agent stacks expose metadata services
10	Secrets in LLM context window	Agent with env var / file / DB access can be coerced into echoing them = data exfil
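Pattern 6 is easy to picture with a small before-and-after. The sketch below uses Python's standard `sqlite3` module; the `notes` table, its columns, and the function names are hypothetical and exist only for illustration.

```python
import sqlite3


def search_notes_unsafe(conn: sqlite3.Connection, term: str) -> list[str]:
    # Vulnerable: user input is interpolated into the SQL text itself.
    query = f"SELECT title FROM notes WHERE secret = 0 AND title LIKE '%{term}%'"
    return [row[0] for row in conn.execute(query)]


def search_notes_safe(conn: sqlite3.Connection, term: str) -> list[str]:
    # Safer: the driver binds the value, so it is never parsed as SQL.
    query = "SELECT title FROM notes WHERE secret = 0 AND title LIKE ?"
    return [row[0] for row in conn.execute(query, (f"%{term}%",))]


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with one public and one secret row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (title TEXT, secret INTEGER)")
    conn.executemany(
        "INSERT INTO notes VALUES (?, ?)",
        [("groceries", 0), ("passwords", 1)],
    )
    return conn
```

A regression test for the patched version would pass a quote-bearing search term and assert that no rows with `secret = 1` come back.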

Report structure

Every audit report contains these sections, in order:

Section	Purpose
1. Scope & threat model	What was reviewed, what was assumed
2. Attack-surface map	Entry points, trust transitions, sensitive sinks
3. Calibration note	How to interpret severity, confidence, provisional ratings
4. Findings table	Summary sorted by severity
5. Detailed writeups	One per finding: description, reproduction, patch, calibration
6. Negative findings	What was checked but found clean (builds trust)
7. Out-of-scope notes	Anything seen but not reviewed (e.g., third-party deps)
8. Recommended next steps	Priority-ordered actions
9. Findings Ledger	JSON format for programmatic consumption (optional)

Output formats

Default output is a Markdown report. You can request specific formats:

Format	What you get
Markdown (default)	Structured .md report following the example template
SARIF	SARIF 2.1.0 JSON for GitHub Code Scanning upload
GitHub Issue	Markdown body ready to paste into a GitHub issue
JIRA Ticket	Formatted for JIRA issue creation
Executive Summary	Non-technical one-pager for leadership
Findings Ledger	Machine-readable JSON per finding-schema.json

    // Request a specific format:
    "Audit this code and output as SARIF"
    "Review for security — I need a JIRA ticket per finding"
    "Security review with executive summary for my CTO"
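To give a sense of how a finding maps onto SARIF 2.1.0, here is a minimal sketch that wraps ledger-style findings in a SARIF log. The severity-to-`level` mapping and the choice of the CWE ID as `ruleId` are assumptions, and the function names are hypothetical; the skill's real SARIF output may carry more fields, so treat `examples/output-formats/sarif-example.json` as the reference.

```python
# Assumed mapping from the auditor's severity scale to SARIF `level` values.
SARIF_LEVEL = {
    "Critical": "error",
    "High": "error",
    "Medium": "warning",
    "Low": "note",
    "Info": "note",
}


def finding_to_sarif_result(finding: dict) -> dict:
    """Convert one finding (with a "path:line" `file` field) into a SARIF result."""
    path, _, line = finding["file"].rpartition(":")
    return {
        "ruleId": finding["cwe"],  # assumption: CWE ID doubles as the rule ID
        "level": SARIF_LEVEL[finding["severity"]],
        "message": {"text": finding["title"]},
        "locations": [{
            "physicalLocation": {
                "artifactLocation": {"uri": path},
                "region": {"startLine": int(line)},
            }
        }],
    }


def to_sarif(findings: list[dict], tool_name: str = "appsec-vulnerability-auditor") -> dict:
    """Wrap the converted results in a single-run SARIF 2.1.0 log."""
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name}},
            "results": [finding_to_sarif_result(f) for f in findings],
        }],
    }
```

Serializing the returned dictionary with `json.dumps` produces a file in the shape that SARIF consumers such as GitHub Code Scanning expect.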

Severity scale

The auditor uses calibrated severity with strict definitions. No inflation, no deflation.

Severity	Definition	SLA
Critical	Pre-auth RCE, full DB exfil, total auth bypass, ransomable	24 hours
High	Authenticated RCE, SQLi behind auth, IDOR exposing other users' data, secrets in repo	1 week
Medium	Stored XSS behind authentication, missing rate limits, weak crypto where threat model tolerates it	1 month
Low	Info disclosure (low sensitivity), missing security headers, verbose errors	Convenient
Info	Best-practice deviations, deprecated patterns, hardening opportunities	No SLA

Every finding also includes confidence (High/Medium/Low), assumptions, downgrade conditions, upgrade conditions, and validation questions when context is incomplete. Findings without a threat model are marked (provisional).

Findings Ledger

For automation and tracking, the auditor can output a JSON Findings Ledger conforming to references/finding-schema.json (JSON Schema Draft 2020-12).

    {
      "schema_version": "1.0",
      "audit_id": "notes-app-2026-05-28",
      "findings": [
        {
          "id": "FINDING-001",
          "title": "SQL Injection in search endpoint",
          "severity": "Critical",
          "confidence": "High",
          "cwe": "CWE-89",
          "file": "src/api/search.ts:42",
          "source": "manual"
        }
      ]
    }
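A consumer of the ledger might load it, check that each finding carries the fields shown in the example, and sort by severity. The sketch below assumes only those example fields; the real `finding-schema.json` may require more, and the function names are hypothetical. For strict validation, a JSON Schema validator run against the schema file would be the better tool.

```python
import json
from collections import Counter

SEVERITY_ORDER = ["Critical", "High", "Medium", "Low", "Info"]
# Fields taken from the example ledger; the actual schema may require others (assumption).
EXPECTED_FIELDS = {"id", "title", "severity", "confidence", "cwe", "file", "source"}


def load_findings(ledger_text: str) -> list[dict]:
    """Parse a ledger, verify the expected fields, and return findings sorted by severity."""
    findings = json.loads(ledger_text)["findings"]
    for finding in findings:
        missing = EXPECTED_FIELDS - set(finding)
        if missing:
            raise ValueError(f"{finding.get('id', '?')} is missing {sorted(missing)}")
    return sorted(findings, key=lambda f: SEVERITY_ORDER.index(f["severity"]))


def severity_counts(findings: list[dict]) -> Counter:
    """Count findings per severity level, e.g. for a dashboard or CI gate."""
    return Counter(f["severity"] for f in findings)
```

A CI job could, for instance, fail the build when `severity_counts(...)["Critical"]` is greater than zero.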

CLI tools

The skill ships with automation scripts in the tools/ directory. These are not required for auditing — the skill works with just SKILL.md — but they streamline common workflows.

Script	Purpose
`appsec-audit.py`	Python CLI wrapper for invoking audits programmatically
`run-evals.sh`	Run the full evaluation suite against benchmark cases
`eval-compare.py`	Compare two scorecard snapshots for regression detection
`seed-with-semgrep.sh`	Run Semgrep and format output as seed findings for the auditor
`validate.sh`	Validate skill structure, internal links, and eval manifest
`check-skill-links.sh`	Check all internal file references in SKILL.md

Agent configs

Ready-made configuration files for specific platforms live in agents/.

File	Platform	Notes
`agents/openai.yaml`	OpenAI Codex CLI	Drop-in agent config with skill path and instructions

Community contributions welcome. If you create a config for Cursor, Windsurf, Cline or another platform, open a PR. The goal is zero-friction setup on every major AI coding tool.

Worked examples

The examples/ directory contains complete, production-quality output samples showing exactly what the auditor produces.

File	Shows
`example-audit-report.md`	Full Markdown audit report with all 9 sections
`findings-ledger.example.json`	Structured JSON ledger with severity, CWE, assumptions
`remediation-plan-example.md`	Sprint-ready remediation plan from Step 10
`repository-audit-summary.md`	Repo-scale audit summary with module-level findings
`audit-state.example.json`	Resumable audit state for large repositories
`output-formats/sarif-example.json`	SARIF 2.1.0 for GitHub Code Scanning upload
`output-formats/github-issue.md`	GitHub issue body per finding
`output-formats/jira-ticket.md`	JIRA-friendly ticket per finding
`output-formats/executive-summary.md`	CTO/leadership risk summary

CI pipeline

The repository includes a GitHub Actions workflow (.github/workflows/ci.yml) that validates the skill on every push:

Check	What it validates
Structure	All referenced files exist, SKILL.md frontmatter is valid
Internal links	Every path in SKILL.md resolves to an actual file
Eval manifest	All benchmark cases exist and expected findings sum correctly
Schema	finding-schema.json is valid JSON Schema

What's in the box

A complete map of the repository structure.

    SKILL.md                          # Core audit prompt
    references/                       # 11 security reference packs
      owasp-top-10.md
      cwe-top-25.md
      ai-generated-code-pitfalls.md
      llm-agent-security.md
      secrets-and-config.md
      severity-calibration.md
      output-formats.md
      repository-audit-protocol.md
      evaluation-protocol.md
      remediation-plan-protocol.md
      seed-input-protocol.md
      finding-schema.json
      lang/                           # 9 language-specific packs
        typescript.md
        python.md
        go.md
        rust.md
        java.md
        ruby.md
        php.md
        csharp.md
        swift-kotlin.md
    examples/                         # 9 worked output samples
    templates/                        # Remediation plan template
    evals/                            # 5 benchmark cases + scorecard
    tools/                            # CLI wrapper, eval runner, validators
    agents/                           # Ready-made agent configs
    docs/                             # Architecture, glossary, threat model
      glossario-appsec-pt-br.md       # PT-BR AppSec glossary
      threat-model-template.md        # Structured intake template

Repository-scale audits

For full-repo audits, the skill uses staged mode with resumable state tracking. It partitions the codebase into scoped segments and audits each in sequence, maintaining an audit-state.json that tracks progress.

Resumable. If a session runs out of context, re-invoke the skill — it picks up where it left off from the persisted audit state.

At the end, a repository audit summary consolidates all segment findings into a single prioritized report with cross-cutting themes and architectural recommendations.

Semgrep seed input

You can bootstrap the audit with pre-existing findings from static analysis tools. The skill accepts SARIF, Semgrep JSON, or custom JSON conforming to finding-schema.json.

    # Run Semgrep and feed results to the auditor
    semgrep --config auto --json -o semgrep-results.json src/

    # Or use the included wrapper
    ./tools/seed-with-semgrep.sh src/

Every seed finding is tagged with its provenance (e.g., source: semgrep) so the report distinguishes tool-found from manually identified vulnerabilities. The auditor validates seed findings during Pass 1 to filter false positives.
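Normalizing Semgrep output into ledger-style entries might look roughly like the sketch below. It reads the standard fields of Semgrep's JSON results (`check_id`, `path`, `start.line`, `extra.message`, `extra.severity`, `extra.metadata.cwe`). The severity mapping, the `SEED-` ID prefix, and the default Low confidence are assumptions rather than the skill's documented behavior.

```python
import json

# Assumed mapping from Semgrep severities to the auditor's scale; seed ratings stay provisional.
SEMGREP_TO_SEVERITY = {"ERROR": "High", "WARNING": "Medium", "INFO": "Low"}


def normalize_semgrep(semgrep_json: str) -> list[dict]:
    """Convert Semgrep JSON results into ledger-style findings tagged with provenance."""
    findings = []
    results = json.loads(semgrep_json).get("results", [])
    for index, result in enumerate(results, start=1):
        extra = result.get("extra", {})
        cwe = extra.get("metadata", {}).get("cwe", [])
        if isinstance(cwe, str):  # some rules give a single string instead of a list
            cwe = [cwe]
        findings.append({
            "id": f"SEED-{index:03d}",  # hypothetical ID scheme
            "title": extra.get("message", result["check_id"]),
            "severity": SEMGREP_TO_SEVERITY.get(extra.get("severity"), "Info"),
            "confidence": "Low",  # unvalidated until reviewed in Pass 1
            "cwe": cwe[0].split(":")[0] if cwe else None,
            "file": f"{result['path']}:{result['start']['line']}",
            "source": "semgrep",  # provenance tag
        })
    return findings
```

Keeping `source` on every entry is what lets the final report separate tool-found issues from those identified during the manual passes.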

Language coverage

The skill loads language-specific reference packs only for the stack in scope — it does not load every pack by default.

Language / Framework	Reference pack
TypeScript / JavaScript / Node / React / Next.js	`references/lang/typescript.md`
Python / Django / Flask / FastAPI	`references/lang/python.md`
Go / Gin / Echo / chi / gRPC	`references/lang/go.md`
Rust / Axum / Actix Web / Rocket	`references/lang/rust.md`
Java / Kotlin JVM / Spring / Jakarta	`references/lang/java.md`
Ruby / Rails / Sinatra	`references/lang/ruby.md`
PHP / Laravel / Symfony / WordPress	`references/lang/php.md`
C# / .NET (ASP.NET Core, Blazor, EF)	`references/lang/csharp.md`
Swift / Kotlin (iOS / Android)	`references/lang/swift-kotlin.md`

Defensive posture

The skill is defensive only. It identifies vulnerabilities and writes patches. It does not:

Refused	Offered instead
Write working exploits or weaponized PoCs	Detection rules and hardened test cases
Fingerprint third-party systems	Guidance on what to check in your own infra
Bypass authentication or licensing	Recommendations for proper auth hardening
Produce offensive instructions	The equivalent defensive version

When asked for offensive output, the auditor declines and offers the equivalent defensive version — a detection rule, a fix, a hardened test case.