MirrorCode benchmark shows AI reimplementing a 16,000-line codebase, and six attack types threaten agent security

Import AI 453 covers three items: MirrorCode, a benchmark from METR and Epoch showing AI autonomously reimplementing complex software; a Google DeepMind paper laying out six genres of attack against AI agents; and the Windfall Policy Atlas, a tool for navigating policy options around transformative AI.

MirrorCode: reverse-engineering complex software

AI measurement organizations METR and Epoch built MirrorCode to test whether AI systems can autonomously reimplement complex existing software from scratch. Each task gives the agent execute-only access to an original CLI program — along with visible test cases — but no source code. The agent must infer the program’s behavior entirely by running it.

According to Import AI, the benchmark spans more than 20 target programs covering Unix utilities, data serialization and query tools, bioinformatics, interpreters, static analysis, cryptography, and compression.

The headline result, as Import AI describes it: Claude Opus 4.6 successfully reimplemented gotree, a bioinformatics toolkit with approximately 16,000 lines of Go code covering more than 40 commands. The benchmark authors estimate that a human engineer working without AI assistance would need two to seventeen weeks to complete the same task. The research also found that performance scales with inference compute.

Import AI frames the skill being demonstrated: “imagine you gave a talented software programmer a CLI interface to a complicated program and asked them to write the underlying program without seeing its source code. I’d wager only a fraction of them could do it if the program was quite sophisticated.”

Import AI also notes caveats: the benchmark evaluates programs that produce canonical outputs, which naturally generate their own specification. There is also potential for memorization on simpler programs, and the slice of software covered is real but not exhaustive.

Six attack types for AI agents

The second major item is a Google DeepMind paper categorizing the attack surface for AI agents and proposing mitigations. According to Import AI, the framework identifies six genres of attack, each targeting a different part of the agent’s operation.

Content injection targets perception: embedding commands in CSS, HTML, or file metadata. Semantic manipulation targets reasoning: saturating content with authoritative language, or embedding malicious instructions inside educational or hypothetical frames. Cognitive state attacks target memory and learning: placing fabricated statements into retrieval corpora, or inserting data into memory stores that activates maliciously when retrieved.

Behavioral control attacks target action directly: embedding adversarial prompts in external resources, or convincing the agent to exfiltrate sensitive data. Systemic attacks target multi-agent dynamics: broadcasting signals to send agents off task, or performing what the paper calls “jigsaw attacks” — splitting a harmful command into pieces that separate agents later reassemble. Finally, human-in-the-loop attacks exploit cognitive biases to influence a human overseer.

The mitigation recommendations, as Import AI describes them, split between technical and ecosystem-level. Technical mitigations include hardening models through pre-training and post-training, and deploying layered inference-time defenses. Ecosystem interventions target changes to the digital infrastructure that agents operate inside.

Import AI’s framing: “AI agents are quite like toddlers — they’re powerful intelligences, but if you put them into the messiness of the world there are lots of ways they can go wrong, especially if strangers are actively trying to mislead or attack them.”

The Windfall Policy Atlas

The third item is the Windfall Policy Atlas, published by the Windfall Trust. According to Import AI, the atlas contains 48 distinct policy ideas in five categories — public and social investments, labor market adaptation, wealth capture, regulation and market design, and global coordination — presented in a navigable interface.