Anthropic releases open-source framework for autonomous AI vulnerability discovery
Anthropic has published a reference harness to help security teams build custom pipelines for finding and fixing code vulnerabilities, though the company notes that autonomous triage remains an open technical challenge.
Anthropic has published an open-source reference implementation on GitHub, titled 'defending-code-reference-harness', designed to help security teams build custom pipelines that autonomously find and fix code vulnerabilities. The framework uses the Claude AI model for threat modelling, scanning, triage, and patching, and draws on lessons from partnerships with security teams during the launch of the 'Claude Mythos Preview'.
The repository provides a structured approach for organisations to move from interactive skills to autonomous operations. The recommended implementation follows a phased timeline, beginning with interactive threat modelling and static scanning on the first day, followed by an autonomous run on a known-vulnerable open-source library on the second day. Subsequent days involve customising the harness for specific targets, with the second week introducing an outer loop for continuous scanning and triage.
While the reference pipeline is optimised for identifying memory vulnerabilities in C and C++ code, its architecture is generic enough to be ported to other languages or vulnerability classes. The system includes mechanisms to deduplicate findings across multiple runs and recalibrate severity ratings against a custom threat model. It supports integration with various Claude API access points, such as Bedrock, Vertex, and Azure.
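To make the deduplication and recalibration steps concrete, here is a minimal Python sketch. It is not taken from the repository: the `Finding` fields, the fingerprint scheme, and the idea of a threat model expressed as path prefixes mapped to severity shifts are all assumptions chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, replace

# Hypothetical severity scale, ordered from least to most severe.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Finding:
    """Illustrative finding record; the field names are assumptions."""
    run_id: str
    file: str
    function: str
    bug_class: str
    severity: str


def fingerprint(finding: Finding) -> str:
    """Hash the fields that identify a bug, ignoring which run found it."""
    key = "|".join((finding.file, finding.function, finding.bug_class))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def deduplicate(findings: list[Finding]) -> list[Finding]:
    """Keep the first finding seen for each fingerprint, preserving order."""
    unique: dict[str, Finding] = {}
    for finding in findings:
        unique.setdefault(fingerprint(finding), finding)
    return list(unique.values())


def recalibrate(finding: Finding, threat_model: dict[str, int]) -> Finding:
    """Shift severity by the first matching path prefix (insertion order)."""
    shift = 0
    for prefix, delta in threat_model.items():
        if finding.file.startswith(prefix):
            shift = delta
            break
    index = SEVERITY_ORDER.index(finding.severity) + shift
    index = max(0, min(index, len(SEVERITY_ORDER) - 1))  # clamp to the scale
    return replace(finding, severity=SEVERITY_ORDER[index])


# Example: two runs report the same parser bug; a fuzzing tool is low priority.
findings = [
    Finding("run-1", "src/parser.c", "parse_header", "heap-overflow", "high"),
    Finding("run-2", "src/parser.c", "parse_header", "heap-overflow", "high"),
    Finding("run-2", "tools/fuzz.c", "main", "use-after-free", "high"),
]
threat_model = {"src/": 1, "tools/": -2}  # hypothetical severity shifts

result = [recalibrate(f, threat_model) for f in deduplicate(findings)]
for f in result:
    print(f.file, f.severity)
```

In this sketch the duplicate report from the second run is dropped, and severity moves up or down depending on where the affected file sits in the assumed threat model. A real pipeline would likely need a more robust notion of equivalence than an exact match on file, function, and bug class.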
The framework's security requirements are strict: the autonomous reference pipeline must execute within a gVisor sandbox unless that requirement is explicitly overridden. The interactive skills, by contrast, are limited to read and write operations and can run unsandboxed if tool use is approved. The pipeline is designed to verify its own findings, with the triage component collapsing duplicates across runs and routing findings to component owners.
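The sandbox rule described above amounts to a small policy check, which can be sketched in Python as follows. This is an illustration only and not the harness's actual code: the `RunRequest` type, the mode labels, and the flag names are hypothetical, and whether a sandbox is really in place is taken as an input, not detected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRequest:
    """Hypothetical description of a requested run."""
    mode: str                        # "autonomous" or "interactive" (assumed labels)
    sandboxed: bool                  # caller states whether a sandbox is in place
    sandbox_override: bool = False   # explicit opt-out of the sandbox requirement
    tool_use_approved: bool = False  # operator has approved tool use


def may_run(request: RunRequest) -> bool:
    """Return True if the request satisfies the policy described in the text."""
    if request.mode == "autonomous":
        # Autonomous runs need a sandbox unless explicitly overridden.
        return request.sandboxed or request.sandbox_override
    if request.mode == "interactive":
        # Interactive skills may run unsandboxed once tool use is approved.
        return request.sandboxed or request.tool_use_approved
    raise ValueError(f"unknown mode: {request.mode!r}")


# An unsandboxed autonomous run is refused unless the override is set.
blocked = may_run(RunRequest(mode="autonomous", sandboxed=False))
overridden = may_run(
    RunRequest(mode="autonomous", sandboxed=False, sandbox_override=True)
)
interactive = may_run(
    RunRequest(mode="interactive", sandboxed=False, tool_use_approved=True)
)
```

Keeping the check as a pure function makes the default-deny behaviour for autonomous runs easy to test before any model-driven code is executed.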
Anthropic explicitly states that the repository is not actively maintained and that it does not accept external contributions. The company notes that autonomous triage and patching remain open technical challenges, and that verified patches cannot always be upstreamed. For organisations seeking a managed solution, Anthropic directs users to 'Claude Security', a hosted product that provides a multi-stage verification pipeline to reduce false positives and manage findings through their lifecycle.


