Most teams overuse subagents when skills are the better primitive. The architectural case for progressive context disclosure, automatic project scoping, and portable expertise across 30+ AI coding tools.
AI coding agents burn through hundreds of thousands of tokens grepping files and hallucinating APIs. A new class of context infrastructure tools is emerging to fix both problems — for your codebase and for external libraries.
A developer reimplemented SQLite in Rust with LLMs — 576,000 lines that compiled, passed tests, and ran 20,171x slower than the real thing. The bugs weren't syntactic. They were semantic. Here's why architecture, specs, test-driven contracts, and targeted review are the fix.
GPT-5.4's coding benchmarks barely moved. But computer use jumped from 47% to 75%, tool search cuts MCP token usage by 47%, and knowledge work hit 83% across 44 professions. Here's what actually matters for developers.
Four industry leaders independently converged on the same conclusion: engineering discipline is the competitive moat when building with AI agents. Here's the day-one infrastructure that makes agent-generated code reliable.
A technical deep dive into harness engineering — the converging discipline across OpenAI, Anthropic, and independent practitioners that makes coding agents reliable on complex work.
A technical deep dive into the isolated VM infrastructure that lets AI coding agents operate for hours without human intervention — from Cursor's cloud agents and Firecracker microVMs to snapshot bootstrapping, computer use, and secrets management.
The biggest constraint in multi-agent development isn't model capability. It's that nobody's built the orchestration, window management, and resource isolation layers end to end. A technical deep dive into what each tool does architecturally, where it breaks, and what the missing product looks like.
The discourse assumes juniors need protection from AI tools. They don't. They need trust, a disciplined workflow, and room to build capability on their own terms.
Karpathy just named the layer most engineers are missing: Claws. Here's the data behind it, and how to start building it today.
Google just reclaimed #1 on SWE-Bench Verified with Gemini 3.1 Pro. But Codex still leads terminal work, and Claude still leads real-world preference. Here's what's technically different about each model—and what engineers are actually experiencing.
Coding is practically solved. The engineer's job is shifting from writing code to designing systems, writing specs, and orchestrating agents. Here's what the new software development lifecycle looks like and how to adopt it today.
Sonnet 4.6 scores within 1.2 points of Opus 4.6 on SWE-bench at roughly 60% of the cost. We break down the benchmarks, architecture changes, pricing math, developer reactions, and what it means for your agentic workflows.
Google DeepMind's new paper formalizes delegation as more than task decomposition — it's a transfer of authority, accountability, and trust. Here's what that means for how we build coding agents, with concrete patterns you can apply today.
GLM-5 hit 77.8% on SWE-bench Verified under an MIT license. The benchmark gap between open and closed models is closing fast. Here's what that means for how you architect your coding agent infrastructure—and what to do about it.
OpenAI's Codex Spark trades intelligence for speed at 1,000+ tokens/sec on Cerebras hardware. The real story isn't the model—it's the infrastructure overhaul and the emerging split between speed mode and depth mode in coding agents.
OpenAI shipped a million lines of code with zero human-written code. The engineering patterns they discovered—progressive disclosure, layered architecture, feedback loops—are patterns you can adopt today. Here's a practical breakdown.
Cursor ran thousands of agents to build a browser. Anthropic ran 16 to build a C compiler. Both independently converged on the same five design patterns. Here's the technical breakdown of why, and how you can apply them.
Factory's Signals system auto-resolves 73% of agent issues in under 4 hours using LLM judges, friction telemetry, and a closed-loop pipeline. Here's how it works and how you can adopt similar patterns in your own agent infrastructure.
Four major AI releases dropped within 24 hours. Here's a technical deep dive into Opus 4.6, GPT-5.3 Codex, Claude Code's agent teams, and Copilot CLI's Fleet Mode—and how to start using them effectively.
I spent a week exploring OpenAI's new Codex macOS app. Here's what I learned about its orchestration-first approach, how it differs from the Claude workflow I've grown attached to, and whether it's worth adding to your toolkit.
A practical guide to wiring AI coding agents into your CI/CD pipeline with GitHub Actions. Includes working configurations for Copilot Autofix, OpenAI Codex, and Claude Code with proper guardrails.
How hooks, skills, and tool orchestration are transforming developer infrastructure. A deep dive into Claude Code's layered stack and why the most important code you write this year won't be features.
OpenAI built a data agent serving 3.5k users across 600 petabytes. The architectural patterns that made it work are the same ones that power a 3,000-line coding agent CLI.
A technical guide to implementing procedural memory, specialized sub-agents, and autonomous Ralph loops for AI coding assistants, across platforms.
Building on AI Coding Infrastructure, Atomic introduces a research-to-execution flywheel where specifications become lasting memory. Here's what we learned scaling multi-agent workflows.
Open sourcing my developer workflow with AI agents—skills, sub-agents, and autonomous execution. A 5-minute setup that provides the missing infrastructure layer for AI coding tools.
An overview of two frameworks for memory and context management that enable continuous self-learning systems.
An interactive cheat sheet covering context engineering techniques for LLMs including retrieval, processing, management, and dynamic assembly strategies.
How context engineering transforms AI-powered development tools from disappointing to transformative through smart prompting, MCP servers, and strategic tool integration.
A deep dive into the concepts of memorization, generalization, and reasoning in large language models.