— release notes

Changelog.

Notable changes per release. Reverted versions are kept here for archaeology.

v0.12.0

2026-06-14 current

Hypothesis layer — the missing epistemology

Four new PLAN_TOOLS verbs: propose_hypothesis(target, vuln_class, rationale), update_hypothesis(id, status, evidence), mark_dead_end(id, reason), self_critique(reflection). New Hypothesis SQLModel + Alembic migration track theories through PROPOSED → TESTING → CONFIRMED | REFUTED | DEAD_END. The Open hypotheses block lands in every observation summary so the agent sees its own working theory across turns. CONFIRMED auto-writes a Finding so chain detection fires immediately.
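The lifecycle above can be read as a small state machine. The sketch below is illustrative only: the enum name, the string values, and the exact set of legal edges are assumptions, since the note gives only the overall flow PROPOSED → TESTING → CONFIRMED | REFUTED | DEAD_END.

```python
from enum import Enum


class HypothesisStatus(str, Enum):
    PROPOSED = "proposed"
    TESTING = "testing"
    CONFIRMED = "confirmed"
    REFUTED = "refuted"
    DEAD_END = "dead_end"


# Assumed transition table: which edges are legal is a guess for illustration.
ALLOWED = {
    HypothesisStatus.PROPOSED: {HypothesisStatus.TESTING, HypothesisStatus.DEAD_END},
    HypothesisStatus.TESTING: {
        HypothesisStatus.CONFIRMED,
        HypothesisStatus.REFUTED,
        HypothesisStatus.DEAD_END,
    },
    # Terminal states: nothing follows them in this sketch.
    HypothesisStatus.CONFIRMED: set(),
    HypothesisStatus.REFUTED: set(),
    HypothesisStatus.DEAD_END: set(),
}


def can_transition(current: HypothesisStatus, new: HypothesisStatus) -> bool:
    """Return True if moving from `current` to `new` is allowed by the table."""
    return new in ALLOWED[current]
```

A verb like update_hypothesis could consult such a table before writing, so that a CONFIRMED theory is never silently reopened.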
OOHA reasoning protocol

_BASE_INSTRUCTIONS rewritten from a checklist into an explicit five-step protocol (OBSERVE / ORIENT / HYPOTHESISE / ACT / CRITIQUE), followed by an adversarial-mindset block ("every primitive unlocks a chain", "WAFs lie — try double-encoding before concluding NOT_VULNERABLE", "negative results are evidence too") and a three-pivots-before-quit ladder. Every phase prompt opens with a one-line threat-model nudge. Reasoning prose written alongside tool calls now persists as a 'reasoning_trace' audit event.
Parallel tool dispatch

Non-control tool calls in a single turn now run concurrently via asyncio.gather, capped by a runner-level semaphore (defaults to executor.max_parallel_tools, falls back to 4). Control verbs (advance_phase, change_phase, abort) stay serial so the orchestrator can short-circuit. One crashing dispatch never kills the rest of the batch. When a turn yields more than one control outcome, ABORT wins over ADVANCE, which wins over JUMP, which wins over CONTINUE.
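A minimal sketch of that dispatch pattern, assuming each tool call is a zero-argument coroutine function. The names dispatch_batch and resolve_control are hypothetical and are not taken from the aegis source. Only asyncio.gather, the semaphore cap, and the precedence order come from the note.

```python
import asyncio

# Precedence from the release note, strongest first.
PRECEDENCE = ["ABORT", "ADVANCE", "JUMP", "CONTINUE"]


def resolve_control(signals):
    """Pick the strongest control signal; CONTINUE if none were raised."""
    for name in PRECEDENCE:
        if name in signals:
            return name
    return "CONTINUE"


async def dispatch_batch(calls, max_parallel=4):
    """Run coroutine functions concurrently, at most `max_parallel` at a time."""
    sem = asyncio.Semaphore(max_parallel)

    async def guarded(call):
        async with sem:
            return await call()

    # return_exceptions=True hands a crash back as a value instead of
    # propagating it, so the other calls in the batch still finish.
    return await asyncio.gather(*(guarded(c) for c in calls), return_exceptions=True)


async def _demo():
    async def ok():
        await asyncio.sleep(0)
        return "ok"

    async def boom():
        raise RuntimeError("tool crashed")

    return await dispatch_batch([ok, boom, ok], max_parallel=2)


results = asyncio.run(_demo())
```

Here the middle call fails, yet both neighbours still return their results, and the exception comes back in its original position so the caller can turn it into a failure signal.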
Structured failure signals

FailureSignal dataclass + 8-kind classifier (TIMEOUT, RATE_LIMITED, PERMISSION_DENIED, AUTH_REQUIRED, OUT_OF_SCOPE, BINARY_MISSING, ZERO_RESULTS, EXCEPTION) replaces the v0.11 "failed Nx" string. Each kind carries an attacker-flavoured pivot hint: rate_limited → use aegis_verify in-process to dodge the throttle; auth_required → ingest the captured Burp/ZAP HAR with cookies; binary_missing → aegis env install --missing. STUCK SIGNALS block renders kind + count + pivot, most-recent-first, capped at 5.
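To illustrate the shape of such a classifier, here is a hedged sketch. The eight kinds and the three pivot hints are taken from the note. The field names, the regex patterns, and the stuck_signals helper are assumptions made for the example, and patterns for OUT_OF_SCOPE and ZERO_RESULTS are left out.

```python
import re
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FailureKind(str, Enum):
    TIMEOUT = "timeout"
    RATE_LIMITED = "rate_limited"
    PERMISSION_DENIED = "permission_denied"
    AUTH_REQUIRED = "auth_required"
    OUT_OF_SCOPE = "out_of_scope"
    BINARY_MISSING = "binary_missing"
    ZERO_RESULTS = "zero_results"
    EXCEPTION = "exception"


@dataclass(frozen=True)
class FailureSignal:
    tool: str
    kind: FailureKind
    detail: str


# Hypothetical patterns; the real classifier's rules are not shown in the note.
_PATTERNS = [
    (FailureKind.TIMEOUT, re.compile(r"timed? ?out", re.I)),
    (FailureKind.RATE_LIMITED, re.compile(r"\b429\b|rate.?limit", re.I)),
    (FailureKind.AUTH_REQUIRED, re.compile(r"\b401\b|unauthorized", re.I)),
    (FailureKind.PERMISSION_DENIED, re.compile(r"\b403\b|permission denied", re.I)),
    (FailureKind.BINARY_MISSING, re.compile(r"command not found", re.I)),
]

# Pivot hints quoted from the release note.
PIVOT_HINTS = {
    FailureKind.RATE_LIMITED: "use aegis_verify in-process to dodge the throttle",
    FailureKind.AUTH_REQUIRED: "ingest the captured Burp/ZAP HAR with cookies",
    FailureKind.BINARY_MISSING: "aegis env install --missing",
}


def classify(tool, message):
    """Map a raw error message to a FailureSignal; unknown text is EXCEPTION."""
    for kind, pattern in _PATTERNS:
        if pattern.search(message):
            return FailureSignal(tool, kind, message)
    return FailureSignal(tool, FailureKind.EXCEPTION, message)


def stuck_signals(signals, cap=5):
    """Return (kind, count, pivot) rows, most recent kind first, capped at `cap`."""
    counts = Counter(s.kind for s in signals)
    order = []
    for s in reversed(signals):
        if s.kind not in order:
            order.append(s.kind)
    return [(k.value, counts[k], PIVOT_HINTS.get(k, "")) for k in order[:cap]]
```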
Proactive chain detection + partial-chain primers

FindingsDB.add_finding post-write hook fires detect_chains on every write; first fire emits a chain_fired audit event. New partial_matches() returns chains where one required_any group is satisfied but at least one is missing — the agent sees "PARTIAL CHAINS (one probe away)" in every observation summary with up to 3 near-misses. Also fixed a silent v0.11 regression where FindingsDB.to_dict_finding dropped the tags field, forcing chain detection onto the substring fallback.
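One way to picture the partial-match test, under the assumption that a chain rule is a list of required_any groups and that a group is satisfied when any one of its tags is present. ChainRule, fired, and partial are hypothetical names. The "ssrf" tag is invented for the example, while cloud_metadata_v1 appears in the v0.11.0 notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainRule:
    name: str
    # Each inner frozenset is one required_any group.
    required_any: tuple


def group_hits(rule, tags):
    """One boolean per group: does any tag in the group appear in `tags`?"""
    return [bool(group & tags) for group in rule.required_any]


def fired(rule, tags):
    """The chain fires only when every group is satisfied."""
    return all(group_hits(rule, tags))


def partial(rule, tags):
    """A near-miss: at least one group satisfied, at least one still missing."""
    hits = group_hits(rule, tags)
    return any(hits) and not all(hits)


rule = ChainRule(
    name="ssrf_to_imds",
    required_any=(frozenset({"ssrf"}), frozenset({"cloud_metadata_v1"})),
)
```

With only an SSRF finding on record, this rule would show up as a partial chain. Adding the metadata tag makes it fire.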
Lateral creativity table

src/aegis/reasoning/lateral.py adds the post-confirmation question every brilliant bug hunter asks: "given primitive X, what NEW classes does it unlock?" 10 hand-curated rows. SSRF → IMDS → IAM → S3 → credential pivot. SQLi → file read via INTO OUTFILE → log poisoning → RCE. .git exposure → source review → JWT secret extraction → forge any token. Every confirmed primitive auto-proposes one hypothesis per unlocked class (proposed_by lateral:<source_tag>).
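A rough sketch of how a lateral table could drive auto-proposal. The three rows mirror the examples in the note, but the tag spellings, the dict layout, and the propose_from helper are assumptions. Only the proposed_by format lateral:<source_tag> is taken from the text.

```python
# Hypothetical tag names; the note describes the rows only in prose.
LATERAL = {
    "ssrf": ["cloud_metadata", "iam_credentials", "s3_access"],
    "sqli": ["file_read", "log_poisoning", "rce"],
    "git_exposure": ["source_review", "jwt_secret_extraction", "token_forgery"],
}


def propose_from(source_tag, target):
    """Build one PROPOSED hypothesis per class unlocked by a confirmed primitive."""
    return [
        {
            "target": target,
            "vuln_class": vuln_class,
            "status": "PROPOSED",
            "proposed_by": f"lateral:{source_tag}",
        }
        for vuln_class in LATERAL.get(source_tag, [])
    ]


proposals = propose_from("ssrf", "https://app.example.test")
```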
Real probes — baseline diff, OOB polling, WAF mutations

verify/baseline.py kills the catch-all-200 false positive with differential request/response diffs (status, body hash, content-length above a 200-byte threshold, redirect Location). verify/oob_waiter.confirm_oob polls the OOB session until a callback matches the nonce or the deadline hits — v0.11 sent the payload and returned immediately. verify/mutations.py per-probe-type WAF-evasion table: sqli (double-URL-encode, backtick swap, /**/ comment inject), xss (fullwidth unicode), lfi (null-byte + .png/.jpg), plus ssrf / ssti / cmdi / header_injection.
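The baseline comparison can be sketched as follows. The four signals and the 200-byte threshold come from the note. The Snapshot type, the differs function, and the choice to return a list of changed signals are assumptions, and how the real module weighs those signals is not stated.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    status: int
    body: bytes
    location: Optional[str] = None  # redirect Location header, if any


def differs(baseline, probe, length_threshold=200):
    """Return the names of the signals that changed between two responses."""
    reasons = []
    if baseline.status != probe.status:
        reasons.append("status")
    if hashlib.sha256(baseline.body).digest() != hashlib.sha256(probe.body).digest():
        reasons.append("body_hash")
    if abs(len(baseline.body) - len(probe.body)) > length_threshold:
        reasons.append("content_length")
    if baseline.location != probe.location:
        reasons.append("location")
    return reasons


# A catch-all site answers 200 with the same page for every path, so a
# payload request that matches the baseline should not count as a hit.
home = Snapshot(200, b"<html>home</html>")
same = Snapshot(200, b"<html>home</html>")
error = Snapshot(500, b"x" * 1000)
```

An empty list means the probe response is indistinguishable from the baseline, which is the catch-all-200 case the note describes.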
Three new playbooks + chain rules

Closing the audit's highest-ROI gaps: host_header_injection (Host Header Injection → Cache Poisoning, critical), web_cache_deception (Web Cache Deception → Auth Token Leak, high), http_parameter_pollution (HTTP Parameter Pollution → Auth Bypass, high). Each ships with severity floor, preconditions, concrete probes, confirm_chain reference, and follow-up tags. Chain rule count 22 → 25.
Test suite 387 → 494

Coverage expanded across every workstream — hypothesis store + auto-bridge, reasoning verbs dispatcher, OOHA prompt invariants, threat-model coverage, reasoning_trace persistence, failure classifier patterns, pivot hints, parallel dispatch timing + semaphore cap + short-circuit + exception isolation, partial chains + newly_fired, lateral table coverage + every-pivot-references-a-known-tag invariant, post-write hooks, baseline diff (with the catch-all-200 regression test pinned), OOB waiter timeout/regex/exception-swallow, mutation table, three new playbooks. 494/494 green.

v0.11.0

2026-06-14

Per-tool MD playbooks

165 markdown playbooks ship inside the wheel under aegis/_docs/tools/. The agent fetches one on demand via aegis_tool_help(name) — every doc lists real flags, common pitfalls, and concrete invocations. aegis_tool_list_help(category) returns the one-line index. Average per-turn prompt stays small; the full playbook only loads when the agent actually needs it.
Guided hunting — playbooks + auto-escalation

New vuln_playbooks.py defines 24 entries keyed to chain TAG_* constants. Each playbook carries severity floor, preconditions, concrete probes, follow-up tags, and the chain rule it ultimately feeds. adaptive.escalate() now runs after every phase tick and injects the top three suggested probes into the next observation summary. Recommendations are audit-logged so post-engagement stats can measure how often the model takes them.
Auto-tagging at finding write

FindingsDB.add_finding calls a new tag_inference helper at the persistence boundary. Findings get structured tags from a tool-base map plus a 31-rule substring matcher over the title + evidence text — jwt_alg_confusion vs jwt_weak_secret, stored vs reflected XSS, IMDSv1 in the payload tags cloud_metadata_v1. Chain detection stops falling back to substring matching once enough findings carry inferred tags.
Token diet — ~30% per-turn reduction

_BASE_INSTRUCTIONS trimmed from 3690 → 1592 chars (57% smaller). EngagementState.to_context_dict compresses open_ports + tech_stack to {hosts, top:{host:{count, ports}}}. NANO summariser threshold lowered 25 → 15 and extended to nmap + httpx. gau_fetch caps URLs 2000 → 200. New build_plan_tools(phase) filters the run_tool enum from ~102 catalog entries to that phase's available_tools set, saving 25–28% off the tools schema each turn.
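As an illustration of the {hosts, top:{host:{count, ports}}} shape, here is a small sketch. It assumes that hosts is a host count and that only the busiest hosts are kept. The function name and both limits are hypothetical and are not values from the aegis source.

```python
def compress_ports(open_ports, top_hosts=3, top_ports=5):
    """Summarise a {host: [ports]} map into a compact context dict."""
    # Busiest hosts first; ties broken by host name for stable output.
    ranked = sorted(open_ports.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return {
        "hosts": len(open_ports),
        "top": {
            host: {"count": len(ports), "ports": sorted(ports)[:top_ports]}
            for host, ports in ranked[:top_hosts]
        },
    }


summary = compress_ports(
    {"a.example.test": [443, 80], "b.example.test": [22], "c.example.test": [3, 1, 2]},
    top_hosts=2,
)
```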
Modern JS attack surface

New tools_js_modern.py: nextjs_data_probe walks /_next/data/<buildId>/<page>.json and flags pages whose getServerSideProps response leaks admin/user fields without auth. astro_endpoint_probe extracts /api/* paths from rendered HTML chunks. source_map_extract greps .js.map sourcesContent for AWS keys, GitHub PATs, Slack tokens, private keys, Stripe keys. spa_route_discover parses webpack/vite/parcel chunks for React-router / Vue / Next.js route tables.
Synthesised nuclei templates

tools_nuclei_synth.nuclei_synth_from_probe turns a confirmed aegis_verify probe into a minimal nuclei v3 YAML in <engagement_dir>/templates/synth/, runs nuclei -t against the broader target set, and persists matches as finding candidates. The template is a reusable artefact: even after the engagement, the operator keeps a detector for that specific bug shape. nuclei_synth_list inventories them.
Burp/ZAP bridge — aegis ingest har --replay

The HAR ingest already parsed Burp/ZAP exports into endpoints; --replay re-issues every captured request fresh through httpx, drops Host + Content-Length so the layer below sets them, diffs status / body hash / content length against the recorded response. Endpoints whose status changes (401 → 200 without the Burp session cookie) are the high-signal ones for unauth probes. Replay deltas persist as ENDPOINT observations.
Mobile + API extras

New tools_mobile_api.py: mobsf_static_scan POSTs an APK/IPA to a local MobSF instance (MOBSF_URL + MOBSF_API_KEY env), returns severity counts and a report URL. dredd_run runs an OpenAPI / API Blueprint contract fuzz against a live API. postman_runner wraps newman for Postman collection replay. Catalog rows added for mobsf (pipx), dredd (npm), newman (npm).
162 → 173 registered MCP tools

11 new wrappers across the four buckets above, plus the two doc meta-tools. TOOL_CATALOG grew 102 → 105 (mobsf, dredd, newman). Existing v0.10.x engagements upgrade in place — auto-tagging starts on the next add_finding, chain detection still falls back to substring matching for findings written before the upgrade.
281 → 387 tests

Coverage expanded across every workstream — context compression, per-phase tool filter, tool-docs index + meta-tool, vuln playbooks, tag inference at the finding boundary, adaptive escalation, the four new scanner modules, HAR replay deltas. 387/387 green.

v0.10.0

2026-06-14

API ingest — OpenAPI, Postman, HAR

aegis ingest openapi|postman|har parses spec or trace files and writes Endpoint observations into the engagement DB. paths.txt and params.txt drop under artifacts/ingest/ for kiterunner / arjun / ffuf seeding. HAR import dedups by (method, normalised URL, sorted param names) so 50 hits on the same form don't pollute the DB. Out-of-scope hosts in the spec are flagged but not persisted.
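The dedup key could look like the sketch below. The tuple layout (method, normalised URL, sorted param names) follows the note. The normalisation rules here (lowercased scheme and host, query and fragment dropped, trailing slash stripped) are assumptions, as is the function name.

```python
from urllib.parse import parse_qsl, urlsplit


def dedup_key(method, url):
    """Build a key that ignores parameter values and parameter order."""
    parts = urlsplit(url)
    # Assumed normalisation; the real rules are not given in the note.
    path = parts.path.rstrip("/") or "/"
    normalised = f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"
    names = tuple(sorted({k for k, _ in parse_qsl(parts.query, keep_blank_values=True)}))
    return (method.upper(), normalised, names)


first = dedup_key("get", "https://Example.test/login?user=a&next=/x")
second = dedup_key("GET", "https://example.test/login/?next=/y&user=b")
```

Because values are discarded, fifty submissions of the same form collapse to one key, while a request carrying a new parameter name still counts as a distinct endpoint.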
GraphQL schema walk + rate-limit probe

graphql_audit now harvests Query type fields alongside mutations, probes each one unauthenticated, and flags sensitive names (user, admin, token, secret, role…) as high-severity findings. A 30-request burst against __typename detects missing rate limits at the GraphQL layer — the most common gap in modern stacks that enforce HTTP rate limits but never wire one at the operation layer.
Tag-based attack chain library, 22 rules

Detection moved off substring matching onto a tag predicate model. Findings carry a tags list; chain rules match on tag sets. Library expanded from 5 → 22: mass assignment + admin, SSRF + IMDSv1, open bucket + Lambda, request smuggling + auth header, cache poisoning + auth cookie, prototype pollution + RCE sink, JWT alg-confusion + privileged, exposed .git, open registration + IDOR, race condition + auth bypass, plus five AD chains. Substring fallback still fires for pre-0.9.5 DBs.
Active Directory + Cloud + k8s

19 catalog rows added (azurehound, roadtools, scoutsuite, gcp-iam-collector, gcp-scanner, pmapper, cloudsplaining, iamspy, impacket, kerbrute, bloodhound-python, ldapdomaindump, crackmapexec, certipy, kubectl-who-can, kube-bench, kube-hunter, kdigger). Nine new MCP tools in tools_cloud_ad.py: kerbrute_userenum, impacket_get_userspns, impacket_get_npusers, bloodhound_collect, certipy_find, scoutsuite_scan, cloudsplaining_scan, kube_bench_run, kube_hunter_scan. kerbrute + crackmapexec ship with destructive=True. 162 registered tools total.
AD attack chains

Kerberoasting → service account compromise (high). AS-REP roasting → pre-auth-disabled user (high). DCSync → full domain compromise (critical). AdminSDHolder ACL backdoor (critical). Unconstrained delegation → silver/golden ticket (critical). 11 new ad_* tag constants drive the predicate matching.
Resume-from-phase

EngagementState persists to <engagement_dir>/state.json after every phase advance. Re-running aegis run against the same directory resumes at the saved phase with the full prior state primed — tech stack, open ports, subdomains, endpoints, token spend, finalize_mode. Atomic write via tmp + os.replace: a crash mid-write leaves the previous snapshot intact.
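The tmp + os.replace pattern mentioned above is commonly written like this. It is a generic sketch, not the aegis implementation: save_state and the temp-file naming are hypothetical.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write `state` as JSON so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so os.replace stays on one
    # filesystem, which is what makes the final rename atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)
    except BaseException:
        # A failure before the rename leaves the previous snapshot untouched.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

If the process dies while writing, only the temporary file is affected, and the existing state.json is still the last complete snapshot.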
SARIF, HackerOne, Bugcrowd exports

--format sarif emits SARIF 2.1.0 for GitHub Code Scanning and GitLab Security. One rule per category, severity → result.level, partialFingerprints[evidence/v1] so re-runs dedupe across scans. --format h1 writes one HackerOne JSON per finding under reports/h1/ with markdown body sections (Summary / Affected Asset / Steps to Reproduce / Suggested Fix / References). --format bugcrowd produces a VRT-mapped JSON bundle with P1–P5 priority.
PII redaction

aegis.report.redact sweeps reports for emails, JWTs, AWS keys, GitHub PATs, Slack tokens, US SSNs, credit-card-shaped numbers, and phone numbers. IP redaction is opt-in. An optional NANO-tier LLM second pass catches contextual PII that the regexes miss. The audit log records counts only, never content. Scope metadata (engagement_id, client) is intentionally preserved.
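A reduced sketch of a regex sweep that returns counts without retaining the matched text. Only three of the listed categories are shown, and the patterns, the placeholder format, and the function signature are assumptions rather than the module's actual rules.

```python
import re
from collections import Counter

# Illustrative patterns only; real-world coverage needs far more care.
_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace matches with placeholders and return (clean_text, counts)."""
    counts = Counter()
    for kind, pattern in _PATTERNS.items():
        text, n = pattern.subn(f"[REDACTED:{kind}]", text)
        if n:
            counts[kind] += n
    # Only the per-kind counts leave this function, never the matched values.
    return text, dict(counts)


clean, counts = redact("contact bob@example.com key AKIAIOSFODNN7EXAMPLE")
```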
Multi-platform finding webhooks

aegis.report.webhook supports Slack (Block Kit), Discord (severity-coloured embeds), Linear (issue title/description/priority/labels), and a stable JSON schema for n8n / Zapier. Severity threshold gate skips low-priority noise. Network errors swallowed — a flaky channel never crashes aegis watch. New --webhook-format and --webhook-min-severity flags.
aegis stats

New subcommand walks every aegis.db under a root and reports findings, tokens, USD spent, tool failure rate, and per-phase cost. --json for machine output. Default layout discovers both flat <root>/aegis.db and nested <root>/<id>/aegis.db.
SQLite-backed shell history

aegis shells --all queries a cross-engagement SQLite history at ~/.config/aegis/shells.db with optional --tool and --engagement filters. JSON snapshot of the last scan still ships unchanged. SQLite write errors swallowed so a corrupted history can't crash a scan.
Ctrl-P command palette

Modal fuzzy launcher in the TUI over slash commands and the most-used MCP tool names. Up/Down/Enter/Esc. Slash commands needing an arg pre-fill the input bar; MCP tool names hand off as guidance for the agent.
Test suite 74 → 281

Resume serialisation round-trip + atomic-write guarantees. SARIF/H1/Bugcrowd shape contracts. Webhook platform formatters + threshold + network failure modes. SQLite shell history schema + filters + resilience. AD chain rules + Cloud/AD wrapper degradation + parser correctness. 281/281 green.

v0.9.5

2026-06-13

Typed orchestrator control flow

New PhaseControl enum replaces the magic strings ("ok", "done", "advance", "jump", "abort") that signalled phase transitions in EngagementRunner. Every callsite migrated; backend errors that were silently swallowed now propagate explicitly. crashed:True is stamped on the summary when an unhandled exception escapes the phase loop.
Destructive-tool gating

sqlmap, commix, wpscan, nosqlmap, sstimap, smuggler now require operator approval before they run unless scope.yaml sets destructive_tests:true. New CONFIRM_REQUIRED event flows to the TUI; operator types /allow <id> or /deny <id>. Default-deny on a 120s timeout. confirm_before list lets engagements gate extra tools per-client.
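Default-deny on timeout can be expressed with asyncio.wait_for. The sketch below assumes the operator's /allow or /deny arrives as a future resolving to True or False. The await_approval name and that representation are hypothetical, and the 120-second default mirrors the note.

```python
import asyncio


async def await_approval(decision, timeout=120.0):
    """Wait for an operator decision; treat silence as a denial."""
    try:
        return await asyncio.wait_for(decision, timeout=timeout)
    except asyncio.TimeoutError:
        return False


async def _demo():
    loop = asyncio.get_running_loop()
    silent = loop.create_future()    # operator never answers
    approved = loop.create_future()  # operator typed /allow
    approved.set_result(True)
    return (
        await await_approval(silent, timeout=0.01),
        await await_approval(approved, timeout=0.01),
    )


denied, allowed = asyncio.run(_demo())
```

The important property is that the only path to running a destructive tool is an explicit True. A timeout, like a /deny, resolves to False.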
Alembic migrations

Per-engagement SQLite DBs are now schema-managed. New baseline + add_finding_tags migrations. FindingsDB._ensure_schema handles three startup paths cleanly: brand-new, pre-alembic legacy, and already-managed. Adds Finding.tags column so the upcoming attack-chain rewrite can predicate on structured tags instead of substring matches.
CISA KEV + FIRST EPSS feeds

aegis kb sync --kev pulls the CISA Known Exploited Vulnerabilities catalogue and flags every match in the KB. aegis kb sync --epss attaches the daily Exploit Prediction Scoring System probability + percentile. Reports can now badge CVE findings with [KEV] and EPSS 0.97 so operators triage in seconds.
MCP server _ctx extraction

Shared lazy state accessors, hallucination guard, NANO summariser, rate limiter, and phase-instruction renderer moved out of the 8552-line server.py into _ctx.py (440 lines). Capability-module split (tools_recon, tools_vuln, etc.) prepared but deferred; snapshot regression test locks the 153-tool registration surface against future loss.
Hot-path test coverage

34 new tests across orchestrator/loop.py, llm/factory.py, mcp_server/server.py, and the new gating + KEV/EPSS modules. Total suite up from 74 to 123. Caught a NameError waiting to happen — Path was used but never imported in factory.py.

v0.9.4

2026-06-12

React/Ink TUI

Full rewrite of the scan UI in TypeScript + React + Ink. Tool cards with streaming output, agent reasoning pane, slash commands. Owns the terminal cleanly — no Rich.Live / Textual contention.
Ctrl+O outputs toggle

Tool cards default to a tight one-line summary. Press Ctrl+O (or Tab) to expand every card with framed output. Press again to collapse.
MCP-mode tool cards

Tools run by Claude Code through MCP now render as proper cards. Synthesised from LLM_DECISION + TOOL_DONE events, with tool result lines surfaced.
/stop, /quit, Ctrl+C actually work

/quit and Ctrl+C exit immediately. /stop injects a hard wrap-up note. /focus, /hunt, /pause, /resume all wire through.
Cleaner agent pane

Three distinct sections: shimmer status line, latest agent note in italic, history of prior notes. No more duplicated LLM_THINKING heartbeats.
Shimmer text effect

Status verbs (Thinking…, Running…, Hunting…) get a 3-character bright wave sliding across them. Color matches verb semantics.

v0.9.3

2026-06-11 reverted

Textual TUI redesign

Activity feed + thinking pane in Textual. Tokenized commands, distinct finding glyphs. Reverted in favor of React/Ink.

v0.9.2

2026-06-11 reverted

Textual TUI baseline

First Textual scan UI with shells panel and hunt mode. Hit unfixable terminal contention issues; replaced.

v0.9.1

2026-06-11

Docker relay mode

aegis env use docker relays all tool calls through the container so users without local tools can still run engagements.
Shells panel

Live view of every running tool execution with status, runtime, and output preview.
Claude Code in Docker

Built-in claude CLI in the Docker image so the React/Ink TUI inside the container can also use Claude Code as its backend.
Model updates

Haiku 4.5, Sonnet 4.6, Opus 4.7 — the freshest releases.

v0.9.0

2026-06-10

Docker image

Self-contained image with all 80+ tools preinstalled. aegis docker build / run / shell.
Universal install script

aegis env install --missing walks the catalog and installs everything via pacman, yay, go install, pipx, or npm as appropriate.
aegis docker CLI

Subcommands to build, pull, run, and inspect the Docker image.

v0.3.0

2026-05

MCP-native orchestrator

36 MCP tools across 7 PTES phases. Claude calls them via the real MCP protocol, with no JSON-in-prose parsing.
33 verification probes

Timing oracle, CSRF, SSTI, prototype pollution, race condition, header injection, OAuth flow, and more.
Adaptive escalation

Findings in one phase queue follow-ups in the next. CVE in nuclei → nuclei-fuzz. Leaked secret → trufflehog deeper.
Mission memory system

Cross-phase memory that survives session restarts. graphify_analyze, aegis_map.

Full commit history at github.com/glorybnat/aegis-pentest.