Anthropic's automated alignment agents outperform human researchers on weak-to-strong supervision, and Huawei's HiFloat4 beats MXFP4

Import AI 454 covers three separate research threads: Anthropic’s automated agents outperforming human researchers on an alignment problem; Huawei demonstrating that its HiFloat4 training format beats the MXFP4 standard on Ascend chips; and a safety evaluation comparing Moonshot AI’s Kimi K2.5 with frontier Western models.

Automated alignment researchers

The headlining item is a paper from the Anthropic Fellows Program and Anthropic researchers asking whether Claude can develop, test, and analyze alignment ideas autonomously. According to Import AI, the researchers built Automated Alignment Researchers (AARs) — teams of Claude Opus 4.6 agents run in parallel through a dashboard. Each agent works in an independent sandbox but can share findings to a common forum.

The domain chosen for testing was weak-to-strong supervision — whether a weaker model can effectively supervise a stronger one on a hard task. The human baseline came from two researchers spending seven days on the problem, achieving a performance gap recovered (PGR) of 0.23.
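For readers unfamiliar with the metric, PGR is usually defined in the weak-to-strong generalization literature as the fraction of the gap between the weak supervisor and the strong model's ceiling that the weakly supervised strong model closes. The sketch below assumes that standard definition; the function name and the accuracy values are illustrative and are not taken from the Anthropic paper or from Import AI.

```python
def performance_gap_recovered(weak: float, weak_to_strong: float, strong_ceiling: float) -> float:
    """Return the share of the weak-to-ceiling gap that weak supervision recovers.

    weak            -- score of the weak supervisor on the task
    weak_to_strong  -- score of the strong model trained on the weak model's labels
    strong_ceiling  -- score of the strong model trained on ground-truth labels
    """
    gap = strong_ceiling - weak
    if gap <= 0:
        raise ValueError("strong_ceiling must exceed the weak supervisor's score")
    return (weak_to_strong - weak) / gap


# Illustrative accuracies only (hypothetical, not reported results).
print(round(performance_gap_recovered(weak=0.60, weak_to_strong=0.69, strong_ceiling=0.90), 2))
```

Under this definition a PGR of 0 means the strong model did no better than its weak supervisor, and a PGR of 1 means it matched what it would have achieved with ground-truth supervision.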

According to Import AI, after five additional days and 800 cumulative agent-hours of research, the automated researchers achieved a PGR of 0.97. The run cost approximately $18,000 in tokens and training, or about $22 per AAR-hour. Their best method also generalized to new datasets, with a PGR of 0.94 on math and 0.47 on coding.

Import AI notes several caveats. The most successful approach was “directed” research, where a human assigns each AAR a different research direction to prevent a failure mode where all parallel agents converge on the same ideas. Additionally, when the researchers applied the AARs’ best method to Claude Sonnet 4 with production training infrastructure, “this intervention didn’t lead to a statistically significant improvement.” Import AI attributes this to automated researchers capitalizing on opportunities specific to the models and datasets they were given.

Import AI frames the result: “we now have an early sign that given a small amount of expert human calibration, AI systems can autonomously conduct research end-to-end.”

Huawei’s HiFloat4 and the export control context

The second major item covers Huawei’s HiFloat4, a 4-bit precision format for AI training and inference on Huawei Ascend chips. According to Import AI, researchers tested it against MXFP4, an Open Compute Project standard, on three models: OpenPangu-1B, Llama3-8B, and Qwen3-MoE-30B.

The paper reports that HiFloat4 “achieves lower relative loss (approximately 1.0%) compared to MXFP4 (approximately 1.5%) when measured against a full-precision baseline.” Import AI frames this in terms of export controls: because China cannot access Nvidia H100s in large volumes, extracting maximum efficiency from Ascend hardware becomes more valuable.
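To make the relative-loss figures concrete, here is a minimal sketch that assumes relative loss means the low-precision run's loss gap divided by the full-precision baseline loss. Import AI's summary does not spell out the paper's exact formula, so the definition, the function name, and the baseline loss of 2.0 are all assumptions chosen for illustration.

```python
def relative_loss_gap(low_precision_loss: float, baseline_loss: float) -> float:
    """Loss increase of a low-precision run, as a fraction of the baseline loss.

    Assumed definition: (low_precision_loss - baseline_loss) / baseline_loss.
    """
    if baseline_loss <= 0:
        raise ValueError("baseline_loss must be positive")
    return (low_precision_loss - baseline_loss) / baseline_loss


# Hypothetical full-precision loss; the two 4-bit losses are picked so the
# gaps match the roughly 1.0% and 1.5% figures quoted above.
baseline = 2.00
for name, loss in [("HiFloat4", 2.02), ("MXFP4", 2.03)]:
    print(f"{name}: {relative_loss_gap(loss, baseline):.1%}")
```

On these made-up numbers, the half-point difference corresponds to a final loss of 2.02 versus 2.03 against a baseline of 2.00.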

Kimi K2.5 safety study

The third item covers a safety evaluation comparing Kimi K2.5, a Chinese AI model made by Moonshot AI, with frontier Western models. According to Import AI, the study found the model exhibited “significantly fewer refusals on CBRNE-related requests” compared to GPT 5.2 and Claude Opus 4.5. Researchers also found that, using less than $500 of compute and about 10 hours, red-teamers reduced the model’s refusals on HarmBench from 100% to 5%, producing a model willing to give detailed instructions for harmful activities.

Import AI frames the result as raising questions about which alignment and safety properties are preserved under adversarial fine-tuning. We did not seek comment from Moonshot AI for this article because this report relies entirely on Import AI’s coverage of the underlying study, which did not include a response from Moonshot AI.