TreeTrace reads your AI coding sessions locally and turns the corrections into reusable regression tests, handoff memory, and an audit trail of every time an agent touched auth, secrets, or access control. Nothing leaves your machine.
AI coding agents misunderstand goals, make wrong assumptions, and repeat the same mistakes. The corrections a human makes to fix them are the highest-signal data you have, and today they vanish when the session closes.
AI agents edit auth, move secrets, and weaken access control without anyone reviewing the reasoning. TreeTrace flags those moments as they happen across a session, captures the correction, and turns it into a regression check the next agent has to pass.
A security_or_privacy_risk signal carries a confidence score, the evidence text, and the node where a human pushed back.
Run it in any repo after an AI coding session. It reads local transcripts, never the network.
Claude Code sessions are found automatically from your local project history. Plain transcripts and other tools import with a flag. Tool noise, retries, and "continue" nudges are filtered out.
A fork-aware tree is derived from prompt topology and your text: the root goal, direction changes, corrections, abandoned branches, checkpoints, and the accepted path, with failure signals and correction chains attached.
Structured artifacts are written locally for humans, agents, CI, and eval harnesses. Every export passes a redaction gate that fails closed if a secret is detected.
Human-readable reports plus an open machine schema. Below is genuine output from the bundled example.
Every repeated failure is paid for twice: once in engineering time, and again in the tokens and compute burned getting back to where you already were. Catching it once means fewer wasted runs and lower spend.
Reconstruct the session from local transcripts
Where it went wrong, and the fix that worked
The failure becomes a reusable regression check
The next agent run starts already knowing
Lineage is written as a documented, versioned JSON schema. Consumers ignore unknown fields, so adapters for promptfoo, OpenAI Evals-style harnesses, and dataset tools build on top without changing the local-first core.
No accounts, no uploads, no telemetry. A redaction gate scrubs secrets before anything is written and fails closed. Your transcripts never leave the machine.
Built for Claude Code today, with importers welcome for Codex CLI, Cursor, and chat exports. Eval cases are generic, so they run wherever your team already tests.
One command, in any repo. Nothing leaves your machine.