DeepSeek-V4 cuts KV cache memory to 2% of standard attention cost to make million-token agent contexts practical
DeepSeek-V4 combines two new attention mechanisms with agent-specific post-training to cut KV cache memory to roughly 2% of a standard grouped-query-attention (GQA) architecture, prioritizing long-horizon agentic workloads over chat.
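To give a sense of scale for the 2% figure, the sketch below applies the standard KV-cache sizing formula (2 tensors, K and V, per layer) at a million-token context. The model dimensions used are illustrative assumptions for a GQA baseline, not published DeepSeek-V4 specifications.

```python
# Back-of-the-envelope KV-cache sizing. All model dimensions below are
# illustrative assumptions, not published DeepSeek-V4 specifications.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """KV-cache size in bytes: K and V tensors per layer, fp16/bf16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical GQA baseline: 64 layers, 8 KV heads of dim 128, 1M-token context.
baseline = kv_cache_bytes(seq_len=1_000_000, n_layers=64, n_kv_heads=8, head_dim=128)
reduced = int(baseline * 0.02)  # the claimed ~2% footprint

GiB = 1024 ** 3
print(f"GQA baseline KV cache at 1M tokens: {baseline / GiB:.1f} GiB")
print(f"At ~2% of baseline:                 {reduced / GiB:.1f} GiB")
```

Under these assumed dimensions, a million-token KV cache drops from the hundreds-of-gigabytes range (exceeding a single accelerator's memory) to a few gigabytes, which is what makes long-horizon agent sessions plausible to serve.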