The agent lab playbook: start with frontier models, specialize, then train your own

Latent Space and the Unsupervised Learning podcast recorded a crossover episode just after AIE Europe, and before the Cursor-xAI deal, covering what has changed in AI over the past year. The full episode and write-up cover the state of AI infrastructure, the coding market, startup strategy, and the views swyx (Shawn Wang, of Latent Space) has revised during that time. Jacob Effron, an investor at Redpoint Ventures, hosts Unsupervised Learning and co-hosts the episode.

The episode is framed as a year-later check-in following the first Unsupervised Learning x Latent Space crossover. It covers engineering practice, startup economics, and product dynamics, and leans toward strategic interpretation of what the AI engineering community is currently focused on more than benchmark analysis.

The agent lab playbook

One structural argument in the episode is what the hosts describe as the “agent lab” playbook: companies start by building on frontier models, specialize for a particular domain, and then train their own models once they have accumulated enough data, workload, and user behavior to justify the cost and latency tradeoffs. This is presented as a description of what companies like Cursor and Cognition have done.

Search, domain specialization, and distillation are described as increasingly important mechanisms for reaching the point where users will choose an in-house model over frontier alternatives. The episode draws a distinction between vertical and horizontal AI startups. Vertical application companies can act as the “outsourced AI team for enterprises,” the hosts argue, owning the workflow and the last mile in ways that frontier labs pushing into multiple verticals cannot easily replicate.

Infrastructure instability vs. application durability

A recurring theme in the episode is that AI infrastructure companies have had to reinvent themselves every year, while application companies have had an easier time surviving model volatility. The proposed reason: applications built on top of models can switch the underlying model when a better one arrives, while infrastructure companies have often built around the specific properties of a particular model generation and must rebuild when those properties change. This observation is attributed to Harrison Chase of LangChain in the source.

The concept of “skills” as a minimal viable packaging format for agents comes up as a potential stabilization point. Sandboxes are described as the clearest reinvention of classic cloud infrastructure for the AI era.

Swyx says he had been “kind of bearish on open models” but now thinks open model momentum is rising, one of the views he explicitly says he changed over the past year. The episode also notes that non-NVIDIA hardware is getting increased attention, and argues that every 10x speedup in inference can unlock new product experiences.

The AI coding wars

Coding receives substantial attention as a category the hosts describe as having “gone parabolic.” Anthropic, OpenAI, Cursor, and Cognition are identified as the main players that have ridden that wave. The Claude Code vs. Codex rivalry is discussed in terms of how sticky coding products are: a user's first magical experience with a product may matter more than expected.

The episode frames 2025 as the year of coding agents and 2026 as “coding agents breaking containment,” in swyx’s phrasing. The next frontier beyond zero-human-written code, the hosts argue, is zero-human-review code — where models not only write code but ship it without human review, which would force companies to rethink testing and verification from first principles.

Memory as a bottleneck

Swyx argues that despite context windows improving substantially, million-token context “has not changed most real workflows” in the way people expected. The episode argues that memory — how AI systems accumulate and personalize based on prior interactions — may be the key bottleneck for the next generation of systems.

This connects to a broader point in the episode about what it means to sell to agents instead of humans. The hosts argue this may mostly mean better developer experience by another name — APIs and documentation matter more than ever.

SaaS pressure and valuations

The episode describes traditional SaaS as being under genuine pressure from AI-native alternatives. AI valuations are described as having broken prior startup intuitions about scale and durability, with billion-dollar ARR products built in a year. The episode does not try to resolve whether those valuations are sustainable; it describes the environment they create for founders and researchers.

The episode closes by referencing Fei-Fei Li’s framing: “That’s exactly the difference between LLMs who know everything but haven’t experienced anything.” Whether world models are the path toward something closer to lived understanding is left open.