AppSec Audit reviews files, diffs, PRs and repositories for real security risk. It maps attack surface, traces untrusted input, calibrates severity, and turns findings into concrete fixes.
Example session:

- User: "Audit this PR before deploy."
- Scope: public Next.js app, Prisma, Stripe, admin routes
- Surface: 14 routes, 3 webhook handlers, 2 LLM tools
- [ok] threat boundaries mapped
- [ok] language packs loaded
- [next] generate remediation plan?
No slash-command memorization required. Say what you want reviewed, name the artifact, then ask for the output you need.
- Run AppSec Vulnerability Auditor on this file.
- Audit this PR for vulnerabilities before merge.
- Do a repository-scale security audit with resumable audit state.
- Ingest this SARIF report as seed findings, validate them, then audit.
- Use this Semgrep JSON as provenance-tagged seed evidence.
- Review this Gitleaks output and confirm which findings are real.
- Audit this MCP server for prompt injection, tool authz and context exfiltration.
- Trace untrusted model output into tools, file access and network egress.
- Check agent URL fetchers for SSRF and missing allowlists.
- Return the audit as Markdown first, then SARIF 2.1.0.
- Create GitHub issue bodies for each finding.
- Write an executive summary with risk, assumptions and next steps.
- Generate a remediation plan for ship-blocker findings only.
- Give me quick-wins across all tiers with tests and acceptance criteria.
- Create a full remediation plan with sprints and residual risk.
- Start with threat model, trust boundaries, sensitive data and deployment target.
- Review files, diffs, PRs or repos with language packs and AppSec references.
- Get calibrated findings with CWE, OWASP, file line, assumptions and validation questions.
- Ask for patches, regression tests, sprint tasks and residual-risk notes.
| Request | What it does |
|---|---|
| `ask: establish scope` | Creates review assumptions for exposure, data sensitivity, deployment and trust boundaries. |
| `ask: map entry points` | Finds routes, webhooks, CLIs, queues, uploads, MCP tools and sensitive sinks. |
| `attach: SARIF, Semgrep, Gitleaks` | Imports existing scanner findings as provenance-tagged evidence, then validates them. |
| `ask: OWASP / CWE / AI code` | Checks OWASP Top 10, CWE Top 25, generated-code pitfalls, secrets and LLM-agent risks. |
| `ask: trace untrusted input` | Follows input into DB, shell, templates, file IO, deserialization, network egress and tools. |
| `auto: by stack` | Loads targeted checks for frontend, backend, mobile, infra, library or LLM-agent code. |
| `ask: reproduce or sketch` | Gives defensive verification steps for Critical and High without weaponized payloads. |
| `ask: patch findings` | Returns fix direction, implementation notes and regression tests in the codebase style. |
| `ask: SARIF, JIRA, GitHub, JSON, CTO` | Transforms report into scanner upload, ticket bodies, ledger JSON or executive summary. |
| `ask: full repository audit` | Uses manifest scan, entry-point map, module deep dives, cross-cutting passes and audit state. |
| `ask: quick-wins, ship-blocker, full` | Builds sprint tasks with dependencies, acceptance criteria, tests and residual risk. |
| `ask: run evals` | Uses benchmark cases and scorecards to check recall, precision, severity and calibration. |

AppSec Audit starts with threat boundaries, follows data flow by hand, validates high-signal patterns, and ends with fixes engineers can apply.
Artifact views: THREAT.md, surface map, data flow, findings ledger, remediation.md.

- Entry points:
  - `PATCH /api/projects/:id`
  - `POST /api/checkout`
  - `POST /webhooks/stripe`
  - `llm.tools.fetch_url`
- Sensitive sinks:
  - `prisma.project.update()`
  - `stripe.checkout.sessions.create()`
  - markdown renderer
  - `fetch(user_url)`
Calibration note:

- Severity: High (provisional)
- Confidence: medium
- Assumption: route is reachable by regular users
- Downgrade if: gateway blocks cross-tenant IDs
- Upgrade if: endpoint exposes payment metadata
- Validation: confirm route middleware order
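A calibration note like this can be treated as data that gets resolved once the validation questions are answered. The sketch below assumes a hypothetical `CalibrationNote` shape and a simple policy (move one step on the severity scale; a confirmed upgrade condition wins over a confirmed downgrade). It illustrates the idea and is not the skill's internal logic.

```typescript
// Severity scale from most to least severe, matching the report's Critical..Informational range.
const SEVERITY_SCALE = ["Critical", "High", "Medium", "Low", "Informational"] as const;
type Severity = (typeof SEVERITY_SCALE)[number];

// Hypothetical calibration-note shape; field names are illustrative, not the skill's schema.
interface CalibrationNote {
  severity: Severity;
  provisional: boolean;
  downgradeIf: string; // condition that lowers severity once confirmed
  upgradeIf: string;   // condition that raises severity once confirmed
}

// Validation answers keyed by condition text; undefined means "not yet answered".
type ValidationAnswers = Record<string, boolean | undefined>;

// Move along the scale by `steps` (negative = more severe), clamped to both ends.
function shiftSeverity(s: Severity, steps: number): Severity {
  const i = SEVERITY_SCALE.indexOf(s);
  const j = Math.min(SEVERITY_SCALE.length - 1, Math.max(0, i + steps));
  return SEVERITY_SCALE[j];
}

// Assumed policy: a confirmed upgrade wins over a confirmed downgrade (conservative),
// and the rating stays provisional until both conditions have been answered.
function resolveCalibration(note: CalibrationNote, answers: ValidationAnswers): CalibrationNote {
  const up = answers[note.upgradeIf];
  const down = answers[note.downgradeIf];
  let severity = note.severity;
  if (up === true) severity = shiftSeverity(severity, -1);
  else if (down === true) severity = shiftSeverity(severity, 1);
  return { ...note, severity, provisional: up === undefined || down === undefined };
}

const note: CalibrationNote = {
  severity: "High",
  provisional: true,
  downgradeIf: "gateway blocks cross-tenant IDs",
  upgradeIf: "endpoint exposes payment metadata",
};
const resolved = resolveCalibration(note, {
  "gateway blocks cross-tenant IDs": true,
  "endpoint exposes payment metadata": false,
});
console.log(resolved.severity, resolved.provisional); // Medium false
```

Keeping the rating provisional until every condition is answered mirrors the report's habit of stating assumptions rather than silently settling them.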
The skill treats context as evidence. Framework conventions, language packs, SAST seeds and deployment assumptions shape the final report.
Example surface map:

- Entry points:
  - `POST /api/auth/login` [public]
  - `PATCH /api/projects/:id` [auth] ← IDOR candidate
  - `POST /api/webhooks/stripe` [signed]
  - `GET /llm/summarize` [auth] ← tool-call scope
- Sensitive sinks:
  - `prisma.user.findUnique()` [PII]
  - `stripe.paymentIntents.create` [financial]
  - `exec(userCmd)` [RCE risk]
- Trust boundaries: 3
- Surfaces: 14
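The same map can be queried as plain data. This sketch assumes a hypothetical `EntryPoint` record and a deliberately crude heuristic: authenticated routes whose path takes a caller-chosen `:param` are nominated as IDOR candidates. It only produces candidates for hand tracing; it confirms nothing.

```typescript
type Access = "public" | "auth" | "signed";

// Hypothetical entry-point record; the values mirror the surface map above.
interface EntryPoint {
  method: string;
  path: string;
  access: Access;
}

// A ":param" path segment means the caller chooses which resource is addressed.
function hasResourceId(path: string): boolean {
  return path.split("/").some((segment) => segment.startsWith(":"));
}

// Heuristic only: authenticated routes that address a caller-chosen resource are
// nominated for a hand-traced ownership check. Public or signed routes need other checks.
function idorCandidates(entries: EntryPoint[]): EntryPoint[] {
  return entries.filter((e) => e.access === "auth" && hasResourceId(e.path));
}

const surfaceMap: EntryPoint[] = [
  { method: "POST", path: "/api/auth/login", access: "public" },
  { method: "PATCH", path: "/api/projects/:id", access: "auth" },
  { method: "POST", path: "/api/webhooks/stripe", access: "signed" },
  { method: "GET", path: "/llm/summarize", access: "auth" },
];

for (const e of idorCandidates(surfaceMap)) console.log(`${e.method} ${e.path}`); // PATCH /api/projects/:id
```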
Example finding:

- Finding: IDOR on `PATCH /projects/:id`
- Severity: High
- Confidence: High
- Assumption: no row-level policy in DB
- Downgrade: if RLS enforced → Medium
- Upgrade: if admin bypass → Critical
- Validate: "Does Prisma scope by user_id?"
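The validation question comes down to whether ownership is part of the lookup. A minimal sketch of that fix direction, with a hypothetical in-memory `projects` table standing in for Prisma so it runs on its own. With Prisma itself, one common shape is `prisma.project.updateMany({ where: { id, ownerId: userId }, data })` followed by a check that the returned `count` is 1; the `ownerId` field name is an assumption about the schema.

```typescript
interface Project {
  id: string;
  ownerId: string;
  name: string;
}

// Hypothetical in-memory table standing in for prisma.project.
const projects = new Map<string, Project>([
  ["p1", { id: "p1", ownerId: "userA", name: "Alpha" }],
  ["p2", { id: "p2", ownerId: "userB", name: "Beta" }],
]);

type PatchResult = { status: 200; project: Project } | { status: 403 };

// Ownership is part of the match: a row only qualifies if the id AND the owner match.
function patchProject(sessionUserId: string, projectId: string, patch: { name?: string }): PatchResult {
  const existing = projects.get(projectId);
  if (!existing || existing.ownerId !== sessionUserId) {
    // Same response for "missing" and "not yours", so other tenants' IDs are not confirmed.
    return { status: 403 };
  }
  // Copy only allowed fields; ownerId can never be changed through the patch body.
  const updated: Project = { ...existing, name: patch.name ?? existing.name };
  projects.set(projectId, updated);
  return { status: 200, project: updated };
}

console.log(patchProject("userA", "p2", { name: "Renamed" }).status); // 403
```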
Small inputs still get rigor. Large repositories get staged state. High-risk findings get verification guidance without weaponized payloads.
- Threat model, trust boundary, authorship, deployment target and sensitive data set the review frame.
- HTTP routes, CLIs, queues, uploads, webhooks, MCP tools and sinks are mapped before findings are written.
- SARIF, Semgrep, Gitleaks and previous reports become provenance-tagged evidence, not automatic truth.
- OWASP Top 10, CWE Top 25, AI-generated-code pitfalls, LLM-agent security and secrets/config checks collect candidates.
- Protected-route coverage, ownership checks, IDOR, sanitizer fit and state-machine abuse get hand-traced.
- Frontend, backend, LLM agent, mobile, infra and library audits load different checks.
- Critical and High findings get safe verification steps without becoming exploit kits.
- Each finding includes fix direction, why it closes risk, and tests that prevent regression.
- Scope, surface map, calibration note, findings table, writeups, clean checks, out-of-scope notes and next steps ship together.
- Critical through Informational ratings are tied to exploitability, impact, confidence and assumptions.
- User-selected findings turn into sprint tasks with acceptance criteria, tests and residual-risk notes.
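Treating scanner output as provenance-tagged evidence can be as simple as keeping the tool name and marking every imported result unvalidated. The sketch below reads a minimal subset of the SARIF 2.1.0 object model (`runs[].tool.driver.name`, `results[].ruleId`, `level`, `message.text`, `locations[].physicalLocation`); the `SeedFinding` record and its status values are hypothetical.

```typescript
// Minimal subset of the SARIF 2.1.0 object model needed here.
interface SarifLog {
  version: string;
  runs: {
    tool: { driver: { name: string } };
    results?: {
      ruleId?: string;
      level?: "none" | "note" | "warning" | "error";
      message: { text?: string };
      locations?: {
        physicalLocation?: {
          artifactLocation?: { uri?: string };
          region?: { startLine?: number };
        };
      }[];
    }[];
  }[];
}

// Hypothetical seed record: provenance is preserved and nothing counts as a finding yet.
interface SeedFinding {
  provenance: string;   // scanner that produced it, e.g. "Semgrep"
  ruleId: string;
  scannerLevel: string; // the scanner's own level, not the audit's calibrated severity
  file?: string;
  line?: number;
  message: string;
  status: "unvalidated" | "confirmed" | "rejected";
}

function ingestSarif(log: SarifLog): SeedFinding[] {
  const seeds: SeedFinding[] = [];
  for (const run of log.runs) {
    for (const r of run.results ?? []) {
      const loc = r.locations?.[0]?.physicalLocation;
      seeds.push({
        provenance: run.tool.driver.name,
        ruleId: r.ruleId ?? "unknown-rule",
        scannerLevel: r.level ?? "warning", // SARIF's default level when none is configured
        file: loc?.artifactLocation?.uri,
        line: loc?.region?.startLine,
        message: r.message.text ?? "",
        status: "unvalidated",
      });
    }
  }
  return seeds;
}
```

Validation then moves each seed to `confirmed` or `rejected` by hand tracing, which keeps scanner output from being reported as fact.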
Coverage combines classic vulnerability classes with failure modes disproportionately common in AI-assisted development.
| Category | Examples | Typical generated-code shape |
|---|---|---|
| Broken access control | IDOR, tenant bypass, admin route gaps | CRUD route authenticates but never checks ownership |
| Injection | SQL, NoSQL, shell, template injection | String interpolation where ORM or argv arrays already exist |
| LLM-agent security | Prompt injection, tool authz, context exfiltration | Untrusted text enters tool-enabled LLM calls with secrets nearby |
| Cryptographic failures | Weak hashing, JWT misuse, bad randomness | Plausible helper function skips signature verification or hashes passwords with a fast SHA digest |
| Server trusts client | Price, role, tenant, ownership from frontend state | Generated UI and API agree on a field nobody verifies server-side |
| Secrets and config | Hardcoded keys, debug prod, weak cookies | Testing values survive into repo, bundle or example env files |
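For the "server trusts client" row, the usual fix direction is to re-derive money, role and ownership fields on the server and treat client values as hints at most. A minimal pricing sketch, assuming a hypothetical `PRICE_CATALOG` and quantity limit: the client may choose a plan, but the amount always comes from the server-side table before any payment API is called.

```typescript
// Hypothetical server-side catalog, amounts in the smallest currency unit (cents).
const PRICE_CATALOG: Record<string, number> = {
  starter: 900,
  pro: 2900,
};

// What the browser sends. `amount` may be present but is never trusted.
interface CheckoutRequest {
  plan: string;
  quantity: number;
  amount?: number;
}

function canonicalAmount(req: CheckoutRequest): number {
  // Own-property check so names like "toString" cannot resolve to inherited values.
  if (!Object.prototype.hasOwnProperty.call(PRICE_CATALOG, req.plan)) {
    throw new Error(`unknown plan: ${req.plan}`);
  }
  // Hypothetical bound of 1..100 units per checkout.
  if (!Number.isInteger(req.quantity) || req.quantity < 1 || req.quantity > 100) {
    throw new Error("invalid quantity");
  }
  // req.amount is deliberately ignored: the catalog is the source of truth.
  return PRICE_CATALOG[req.plan] * req.quantity;
}

console.log(canonicalAmount({ plan: "pro", quantity: 2, amount: 1 })); // 5800
```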
Generated code often looks idiomatic while skipping checks that make it safe. AppSec Audit is tuned for those repeat failures.
- Crypto: looks tidy but misses signature validation, safe password storage, mode choice or randomness.
- IDOR: the route is authenticated, but resource ownership never enters the query or policy.
- Agent: untrusted content flows into tool-enabled LLM calls with access to network, files or secrets.
- Secrets: example credentials, real tokens and debug switches remain in source, comments or bundles.
- Egress: agent tools, PDF renderers and webhook helpers fetch arbitrary URLs without allowlists.

Markdown remains the source of truth. Machine and ticket formats preserve severity, confidence, assumptions, validation questions and remediation context.
| Format | Use |
|---|---|
| Markdown audit report | Scope, attack surface, detailed findings and recommended next steps. |
| SARIF 2.1.0 | Upload to GitHub Code Scanning or compatible security tooling. |
| GitHub / JIRA | One issue or ticket body per finding, calibrated and actionable. |
| Executive summary | Leadership-safe risk framing without exploit payloads. |
| Findings ledger JSON | Programmatic severity, CWE, OWASP, assumptions and validation questions. |
- `audit-report.md`
  - scope and threat model
  - attack-surface map
  - findings table
  - detailed writeups
  - clean checks
  - next steps
- `findings-ledger.json`
  - severity
  - confidence
  - CWE / OWASP
  - assumptions
  - validation questions
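Because the Markdown report and the ledger carry the same fields, converting a ledger entry to SARIF is mostly a reshaping step. This sketch assumes a hypothetical `LedgerFinding` shape. The severity-to-`level` mapping, using the CWE as `ruleId`, and the `properties` bag for OWASP, assumptions and validation questions are illustrative choices, not a documented mapping; the tool name reuses the repository name as an assumption.

```typescript
// Hypothetical findings-ledger entry mirroring the fields listed above.
interface LedgerFinding {
  id: string;
  title: string;
  severity: "Critical" | "High" | "Medium" | "Low" | "Informational";
  confidence: "High" | "Medium" | "Low";
  cwe: string;   // e.g. "CWE-639"
  owasp: string; // e.g. "A01:2021"
  file: string;
  line: number;
  assumptions: string[];
  validationQuestions: string[];
}

// SARIF 2.1.0 result levels are "error", "warning", "note" and "none"; this mapping is assumed.
function sarifLevel(sev: LedgerFinding["severity"]): "error" | "warning" | "note" {
  if (sev === "Critical" || sev === "High") return "error";
  if (sev === "Medium") return "warning";
  return "note";
}

function toSarif(findings: LedgerFinding[]) {
  return {
    version: "2.1.0",
    $schema: "https://json.schemastore.org/sarif-2.1.0.json",
    runs: [
      {
        tool: { driver: { name: "appsec-vulnerability-auditor" } },
        results: findings.map((f) => ({
          ruleId: f.cwe,
          level: sarifLevel(f.severity),
          message: { text: `${f.title} (severity ${f.severity}, confidence ${f.confidence})` },
          locations: [
            {
              physicalLocation: {
                artifactLocation: { uri: f.file },
                region: { startLine: f.line },
              },
            },
          ],
          // Property bag keeps calibration context that SARIF has no dedicated field for.
          properties: {
            findingId: f.id,
            owasp: f.owasp,
            assumptions: f.assumptions,
            validationQuestions: f.validationQuestions,
          },
        })),
      },
    ],
  };
}
```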
After an audit, Step 10 lets you choose what to fix and generates a sprint-ready remediation plan.
- Acceptance: user A cannot modify user B's project. Regression: a cross-tenant PATCH returns 403.
- Acceptance: the submitted price is ignored by the checkout service. Regression: a forged amount keeps the canonical price.
- Acceptance: disallowed tags are removed. Regression: a renderer snapshot covers unsafe attributes.
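Sprint tasks with dependencies need an order that respects those dependencies. A sketch using Kahn's topological sort, with the most severe ready task scheduled first; the `RemediationTask` shape, the severity tie-break and the treatment of out-of-selection dependencies are all assumptions.

```typescript
type TaskSeverity = "Critical" | "High" | "Medium" | "Low" | "Informational";
const SEVERITY_RANK: Record<TaskSeverity, number> = {
  Critical: 0, High: 1, Medium: 2, Low: 3, Informational: 4,
};

// Hypothetical task shape for one selected finding.
interface RemediationTask {
  id: string;
  severity: TaskSeverity;
  dependsOn: string[]; // ids of tasks that must land first
  acceptance: string;
  regressionTest: string;
}

// Kahn's topological sort; among tasks that are ready, the most severe goes first.
function sequenceTasks(tasks: RemediationTask[]): RemediationTask[] {
  const byId = new Map<string, RemediationTask>();
  for (const t of tasks) byId.set(t.id, t);
  const unmet = new Map<string, number>();        // unmet dependencies per task
  const dependents = new Map<string, string[]>(); // reverse edges
  for (const t of tasks) {
    // Assumption: dependencies on tasks outside this selection count as already met.
    const deps = t.dependsOn.filter((d) => byId.has(d));
    unmet.set(t.id, deps.length);
    for (const d of deps) dependents.set(d, (dependents.get(d) ?? []).concat(t.id));
  }
  const ready = tasks.filter((t) => unmet.get(t.id) === 0);
  const order: RemediationTask[] = [];
  while (ready.length > 0) {
    ready.sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
    const next = ready.shift()!;
    order.push(next);
    for (const id of dependents.get(next.id) ?? []) {
      const left = (unmet.get(id) ?? 0) - 1;
      unmet.set(id, left);
      if (left === 0) ready.push(byId.get(id)!);
    }
  }
  // Leftover tasks are stuck in a cycle; report that instead of guessing an order.
  if (order.length !== tasks.length) throw new Error("dependency cycle between remediation tasks");
  return order;
}
```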
The skill should never claim full-repo coverage unless every in-scope module was reviewed or explicitly deferred.
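That rule is straightforward to enforce if the resumable audit state records a status per in-scope module. A sketch assuming a hypothetical `AuditState` shape, where "explicitly deferred" is interpreted as deferred with a written reason:

```typescript
type ModuleStatus = "pending" | "reviewed" | "deferred";

// Hypothetical resumable audit state: one entry per in-scope module.
interface AuditState {
  modules: Record<string, { status: ModuleStatus; deferralReason?: string }>;
}

// Full-repo coverage is claimable only when nothing is pending and every deferral is explained.
function coverageClaim(state: AuditState): { full: boolean; gaps: string[] } {
  const gaps: string[] = [];
  for (const [mod, entry] of Object.entries(state.modules)) {
    if (entry.status === "pending") gaps.push(`${mod}: not yet reviewed`);
    if (entry.status === "deferred" && !entry.deferralReason) {
      gaps.push(`${mod}: deferred without a reason`);
    }
  }
  return { full: gaps.length === 0, gaps };
}
```

Pending entries double as the resume point for the next session of a staged audit.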
The eval suite keeps the skill honest as references, language packs and prompts evolve.
| Case | Expected | Found | Recall |
|---|---|---|---|
| notes-app-2026-05-28 | 6 | 6/6 | 100% |
| agent-stripe-2026-04-15 | 4 | 4/4 | 100% |
| next-prisma-2026-03-22 | 5 | 5/5 | 100% |

- Severity calibration: ✓
- Safeguard behavior: ✓
- False positives: 0 / 15 findings
Synthetic mode validates the comparator and scorecard pipeline. Live evaluation can measure recall, precision, severity accuracy, calibration completeness and safety behavior.
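Recall and precision reduce to set arithmetic once each finding has a comparable key. A sketch assuming a hypothetical `EvalCase` record keyed by strings such as CWE plus file path; real comparators may need fuzzier matching than exact keys.

```typescript
// Hypothetical eval case. Findings are compared by a normalized key such as
// "CWE-639@app/api/projects/route.ts"; exact-key matching is an assumption.
interface EvalCase {
  name: string;
  expected: string[];
  found: string[];
}

interface CaseScore {
  name: string;
  recall: number;    // share of expected findings that were found
  precision: number; // share of reported findings that were expected
  falsePositives: number;
}

function scoreCase(c: EvalCase): CaseScore {
  const expected = new Set(c.expected);
  const found = Array.from(new Set(c.found)); // duplicate reports count once
  const truePositives = found.filter((k) => expected.has(k)).length;
  return {
    name: c.name,
    // Assumed conventions for empty sets: nothing expected or nothing reported scores 1.
    recall: expected.size === 0 ? 1 : truePositives / expected.size,
    precision: found.length === 0 ? 1 : truePositives / found.length,
    falsePositives: found.length - truePositives,
  };
}
```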
If a request crosses into offensive use, the skill redirects to a defensive equivalent: detection rule, fix, regression test or hardening plan.
The skill triggers from natural language: audit, security review, find vulnerabilities, safe to deploy, OWASP Top 10, CWE Top 25 and AI-generated code review.
To install for Codex:

`mkdir -p ~/.codex/skills && git clone https://github.com/alboechat/appsec-vulnerability-auditor.git && mv appsec-vulnerability-auditor ~/.codex/skills/`

To install for Claude Code:

`mkdir -p ~/.claude/skills && git clone https://github.com/alboechat/appsec-vulnerability-auditor.git && mv appsec-vulnerability-auditor ~/.claude/skills/`

Example prompts:

- "Run the AppSec Vulnerability Auditor on this file."
- "Audit this PR for vulnerabilities."
- "Security-review this before deploy."
- "Generate a remediation plan for critical findings."
No. It is a defensive source-code review skill. It can support audit prep and secure engineering, but it does not replace authorized penetration testing or compliance assessment.
No. Critical and High findings get verification guidance or safe reproduction sketches, but not weaponized PoCs.
Yes. It checks prompt injection, tool-use authorization, output-as-input loops, context exfiltration, SSRF in agent URL fetchers and secrets in context windows.
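For the SSRF case, a common fix direction is an explicit allowlist checked against the parsed URL before any request is made. A minimal sketch using the WHATWG `URL` parser with hypothetical hosts in `ALLOWED_HOSTS`. It does not cover redirects or DNS answers that resolve to private addresses, which a production fetcher would also need to re-check.

```typescript
// Hypothetical allowlist: exact hostnames the agent tool may fetch from.
const ALLOWED_HOSTS = new Set(["docs.example.com", "api.example.com"]);

// Returns the parsed URL if the agent may fetch it, otherwise null.
function checkAgentUrl(raw: string): URL | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // not an absolute URL
  }
  if (url.protocol !== "https:") return null;                  // blocks http:, file:, data: and others
  if (url.username !== "" || url.password !== "") return null; // no embedded credentials
  if (url.port !== "") return null;                            // default HTTPS port only
  // Exact match on the parser-normalized (lowercased) hostname. No suffix matching, so
  // "docs.example.com.evil.test" and "evil-docs.example.com" are rejected.
  if (!ALLOWED_HOSTS.has(url.hostname)) return null;
  return url;
}
```

Checking the parsed `hostname` rather than the raw string avoids tricks like embedded credentials or look-alike prefixes that simple string matching misses.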
Yes. Seed findings can be ingested, tagged by provenance, validated and calibrated.
It combines AppSec references, language packs, AI-generated-code pitfalls, LLM-agent risks, manual data-flow reasoning, severity calibration and remediation planning.
Keep the speed of vibe coding. Add threat boundaries, calibrated findings, defensive patches and a plan your team can execute.