Divide-and-conquer value learning reduces RL's Bellman recursion depth from linear to logarithmic in the horizon
A Berkeley researcher proposes a third RL paradigm, beyond TD and Monte Carlo, that exploits the triangle inequality in goal-conditioned settings, sidestepping the error accumulation that keeps Q-learning from scaling to long-horizon tasks.
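To make the logarithmic claim concrete, here is a minimal tabular sketch (an illustration under simplifying assumptions, not the paper's algorithm; the function names `bellman_sweeps` and `divide_and_conquer_sweeps` are hypothetical). In a deterministic shortest-path view of goal-conditioned RL, a one-step Bellman backup extends learned paths by a single transition per sweep, so propagating values over horizon H takes on the order of H sweeps. A divide-and-conquer backup instead stitches two already-learned subpaths through a waypoint, which is sound because shortest-path costs obey the triangle inequality; the reachable path length doubles per sweep, so on the order of log H sweeps suffice (this is min-plus matrix squaring).

```python
# A minimal sketch (not the paper's method): tabular goal-conditioned
# "distances" d[s, g] over a deterministic graph with edge costs `cost`.
import numpy as np

def bellman_sweeps(cost, n_sweeps):
    """One-step TD-style backups: extend each path by one transition per sweep."""
    d = cost.copy()
    for _ in range(n_sweeps):
        # d[s, g] <- min_w cost[s, w] + d[w, g]: one environment step, one bootstrap.
        d = np.minimum(d, (cost[:, :, None] + d[None, :, :]).min(axis=1))
    return d

def divide_and_conquer_sweeps(cost, n_sweeps):
    """Divide-and-conquer backups: compose two learned subpaths through a waypoint."""
    d = cost.copy()
    for _ in range(n_sweeps):
        # d[s, g] <- min_w d[s, w] + d[w, g]: covered path length doubles per sweep.
        d = np.minimum(d, (d[:, :, None] + d[None, :, :]).min(axis=1))
    return d

# Example: a 64-state chain s -> s+1, so the longest horizon is 63 steps.
n = 64
cost = np.full((n, n), np.inf)
np.fill_diagonal(cost, 0.0)
cost[np.arange(n - 1), np.arange(1, n)] = 1.0

exact = bellman_sweeps(cost, n)  # fully converged reference
print(np.allclose(bellman_sweeps(cost, 6), exact))             # False: 6 sweeps << 63
print(np.allclose(divide_and_conquer_sweeps(cost, 6), exact))  # True: 2**6 >= 63
```

In this toy setting the design point is visible directly: each divide-and-conquer backup bootstraps from two learned values rather than chaining one learned value per environment step, which is where the per-step error accumulation of long TD chains is avoided.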