METR and Epoch AI publish preliminary results on AI completing weeks-long coding tasks

METR and Epoch AI have jointly published preliminary results from MirrorCode, a project that the two organisations co-developed and METR funded. METR’s post is a linkpost that directs readers to Epoch AI’s blog for the full findings, which the source excerpt available for this article does not reproduce.

The METR linkpost states that MirrorCode examines whether AI systems can already complete some tasks measured in weeks of human effort. METR describes its broader research focus as measuring how well frontier AI systems perform complex tasks autonomously, including “broad autonomous capabilities and the ability of AI systems to conduct AI R&D.”

The organisation’s published work includes a metric that frames AI performance in terms of the length of tasks AI agents can complete, a length METR states has been “consistently exponentially increasing.” Separately, METR has reported that when developers use AI tools, “they take 19% longer than without,” a finding the organisation notes means AI makes developers slower in the conditions it observed.

Full detail on MirrorCode’s methodology and results is available on Epoch AI’s blog at epoch.ai/blog/mirrorcode-preliminary-results/.

Note: This article is based on a thin linkpost. The underlying Epoch AI report was not available in the source excerpt provided. Claims are limited to what appears in the METR linkpost.