Alex Lavaee's Blog

Posts about machine learning, AI, and all things tech.
https://alexlavaee.me/
en-us

Inside the Cloud VMs Powering Autonomous Coding Agents
https://alexlavaee.me/blog/cloud-vms-autonomous-agent-infrastructure/
Thu, 26 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.UELI87o6.png" alt="Inside the Cloud VMs Powering Autonomous Coding Agents" /><p>A technical deep dive into the isolated VM infrastructure that lets AI coding agents operate for hours without human intervention — from Cursor's cloud agents and Firecracker microVMs to snapshot bootstrapping, computer use, and secrets management.</p><p><a href="https://alexlavaee.me/blog/cloud-vms-autonomous-agent-infrastructure/">Read more on the blog →</a></p>

Designing the Multi-Agent Development Environment
https://alexlavaee.me/blog/parallel-agent-sessions-infrastructure-gap/
Wed, 25 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.BMVlpRLK.png" alt="Designing the Multi-Agent Development Environment" /><p>The biggest constraint in multi-agent development isn't model capability. It's that nobody's built the orchestration, window management, and resource isolation layers end to end.
A technical deep dive into what each tool does architecturally, where it breaks, and what the missing product looks like.</p><p><a href="https://alexlavaee.me/blog/parallel-agent-sessions-infrastructure-gap/">Read more on the blog →</a></p>

Junior Engineers Don't Need Protection from AI. They Need Agency.
https://alexlavaee.me/blog/junior-engineers-agency-not-protection/
Tue, 24 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.-8U1hdu1.png" alt="Junior Engineers Don't Need Protection from AI. They Need Agency." /><p>The discourse assumes juniors need protection from AI tools. They don't. They need trust, a disciplined workflow, and room to build capability on their own terms.</p><p><a href="https://alexlavaee.me/blog/junior-engineers-agency-not-protection/">Read more on the blog →</a></p>

If Your Claws Aren't Out, You're Already Falling Behind
https://alexlavaee.me/blog/claws-layer-autonomous-agents/
Mon, 23 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.s-eW-AS1.png" alt="If Your Claws Aren't Out, You're Already Falling Behind" /><p>Karpathy just named the layer most engineers are missing: Claws.
Here's the data behind it, and how to start building it today.</p><p><a href="https://alexlavaee.me/blog/claws-layer-autonomous-agents/">Read more on the blog →</a></p>

Gemini 3.1 Pro, Opus 4.6, and Codex 5.3: A Technical Breakdown of Three Models, Three #1 Positions
https://alexlavaee.me/blog/gemini-3-1-pro-opus-codex-technical-comparison/
Thu, 19 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.Dn_j63h5.png" alt="Gemini 3.1 Pro, Opus 4.6, and Codex 5.3: A Technical Breakdown of Three Models, Three #1 Positions" /><p>Google just reclaimed #1 on SWE-Bench Verified with Gemini 3.1 Pro. But Codex still leads terminal work, and Claude still leads real-world preference. Here's what's technically different about each model—and what engineers are actually experiencing.</p><p><a href="https://alexlavaee.me/blog/gemini-3-1-pro-opus-codex-technical-comparison/">Read more on the blog →</a></p>

The New SDLC: A Practical Guide to Agentic Engineering
https://alexlavaee.me/blog/new-sdlc-agentic-engineering/
Wed, 18 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.CZzdjfSW.png" alt="The New SDLC: A Practical Guide to Agentic Engineering" /><p>Coding is practically solved. The engineer's job is shifting from writing code to designing systems, writing specs, and orchestrating agents.
Here's what the new software development lifecycle looks like and how to adopt it today.</p><p><a href="https://alexlavaee.me/blog/new-sdlc-agentic-engineering/">Read more on the blog →</a></p>

Claude Sonnet 4.6: What Developers Actually Need to Know
https://alexlavaee.me/blog/sonnet-4-6-technical-breakdown/
Tue, 17 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.DhmtH1cM.png" alt="Claude Sonnet 4.6: What Developers Actually Need to Know" /><p>Sonnet 4.6 scores within 1.2 points of Opus 4.6 on SWE-bench at roughly 60% of the cost. We break down the benchmarks, architecture changes, pricing math, developer reactions, and what it means for your agentic workflows.</p><p><a href="https://alexlavaee.me/blog/sonnet-4-6-technical-breakdown/">Read more on the blog →</a></p>

Google DeepMind's Delegation Framework for Coding Agent Architecture
https://alexlavaee.me/blog/intelligent-agent-delegation/
Mon, 16 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.B-D3swmq.png" alt="Google DeepMind's Delegation Framework for Coding Agent Architecture" /><p>Google DeepMind's new paper formalizes delegation as more than task decomposition — it's a transfer of authority, accountability, and trust.
Here's what that means for how we build coding agents, with concrete patterns you can apply today.</p><p><a href="https://alexlavaee.me/blog/intelligent-agent-delegation/">Read more on the blog →</a></p>

GLM-5 and the Open Model Convergence
https://alexlavaee.me/blog/glm5-open-model-convergence/
Thu, 12 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.DAtELm9U.png" alt="GLM-5 and the Open Model Convergence" /><p>GLM-5 hit 77.8% on SWE-bench Verified under an MIT license. The benchmark gap between open and closed models is closing fast. Here's what that means for how you architect your coding agent infrastructure—and what to do about it.</p><p><a href="https://alexlavaee.me/blog/glm5-open-model-convergence/">Read more on the blog →</a></p>

Codex Spark and the Two-Mode Future of Coding Agents
https://alexlavaee.me/blog/codex-spark-speed-depth-modes/
Thu, 12 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.y5zn6w6M.png" alt="Codex Spark and the Two-Mode Future of Coding Agents" /><p>OpenAI's Codex Spark trades intelligence for speed at 1,000+ tokens/sec on Cerebras hardware.
The real story isn't the model—it's the infrastructure overhaul and the emerging split between speed mode and depth mode in coding agents.</p><p><a href="https://alexlavaee.me/blog/codex-spark-speed-depth-modes/">Read more on the blog →</a></p>

OpenAI's Agent-First Codebase Learnings
https://alexlavaee.me/blog/openai-agent-first-codebase-learnings/
Wed, 11 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.I9-Syhqq.png" alt="OpenAI's Agent-First Codebase Learnings" /><p>OpenAI shipped a million lines of code with zero human-written code. The engineering patterns they discovered—progressive disclosure, layered architecture, feedback loops—are patterns you can adopt today. Here's a practical breakdown.</p><p><a href="https://alexlavaee.me/blog/openai-agent-first-codebase-learnings/">Read more on the blog →</a></p>

Five Architectural Primitives Every Agent Swarm Rediscovers
https://alexlavaee.me/blog/five-primitives-agent-swarms/
Tue, 10 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.B0RtEU-w.png" alt="Five Architectural Primitives Every Agent Swarm Rediscovers" /><p>Cursor ran thousands of agents to build a browser. Anthropic ran 16 to build a C compiler. Both independently converged on the same five design patterns.
Here's the technical breakdown of why, and how you can apply them.</p><p><a href="https://alexlavaee.me/blog/five-primitives-agent-swarms/">Read more on the blog →</a></p>

Building Self-Improving Coding Agents: How Factory's Signals Pipeline Closes the Feedback Loop
https://alexlavaee.me/blog/building-self-improving-coding-agents/
Mon, 09 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.B8OVgyP7.png" alt="Building Self-Improving Coding Agents: How Factory's Signals Pipeline Closes the Feedback Loop" /><p>Factory's Signals system auto-resolves 73% of agent issues in under 4 hours using LLM judges, friction telemetry, and a closed-loop pipeline. Here's how it works and how you can adopt similar patterns in your own agent infrastructure.</p><p><a href="https://alexlavaee.me/blog/building-self-improving-coding-agents/">Read more on the blog →</a></p>

Opus 4.6, GPT-5.3 Codex, Agent Teams, and Fleet Mode: What Developers Actually Need to Know
https://alexlavaee.me/blog/opus-codex-agent-teams-deep-dive/
Thu, 05 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.qWk_MX4m.png" alt="Opus 4.6, GPT-5.3 Codex, Agent Teams, and Fleet Mode: What Developers Actually Need to Know" /><p>Four major AI releases dropped within 24 hours.
Here's a technical deep dive into Opus 4.6, GPT-5.3 Codex, Claude Code's agent teams, and Copilot CLI's Fleet Mode—and how to start using them effectively.</p><p><a href="https://alexlavaee.me/blog/opus-codex-agent-teams-deep-dive/">Read more on the blog →</a></p>

Codex macOS: Orchestration-First Agent Desktop
https://alexlavaee.me/blog/codex-macos-orchestration-desktop/
Wed, 04 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.CdMl0kQx.png" alt="Codex macOS: Orchestration-First Agent Desktop" /><p>I spent a week exploring OpenAI's new Codex macOS app. Here's what I learned about its orchestration-first approach, how it differs from the Claude workflow I've grown attached to, and whether it's worth adding to your toolkit.</p><p><a href="https://alexlavaee.me/blog/codex-macos-orchestration-desktop/">Read more on the blog →</a></p>

Agent-Operated CI/CD: The Architecture Making AI Coding Agents Actually Work
https://alexlavaee.me/blog/agent-operated-cicd-pipelines/
Tue, 03 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.qLv2GmTh.png" alt="Agent-Operated CI/CD: The Architecture Making AI Coding Agents Actually Work" /><p>A practical guide to wiring AI coding agents into your CI/CD pipeline with GitHub Actions.
Includes working configurations for Copilot Autofix, OpenAI Codex, and Claude Code with proper guardrails.</p><p><a href="https://alexlavaee.me/blog/agent-operated-cicd-pipelines/">Read more on the blog →</a></p>

Evolving Coding Agent Infrastructure: The Rise of the Meta-Framework Layer
https://alexlavaee.me/blog/evolving-coding-agent-infrastructure/
Mon, 02 Feb 2026 00:00:00 GMT
<img src="https://alexlavaee.me/_astro/cover.CEBBmPQ3.png" alt="Evolving Coding Agent Infrastructure: The Rise of the Meta-Framework Layer" /><p>How hooks, skills, and tool orchestration are transforming developer infrastructure. A deep dive into Claude Code's layered stack and why the most important code you write this year won't be features.</p><p><a href="https://alexlavaee.me/blog/evolving-coding-agent-infrastructure/">Read more on the blog →</a></p>

Building AI Agents That Work at Any Scale
https://alexlavaee.me/blog/openai-data-agent-patterns/
Thu, 29 Jan 2026 00:00:00 GMT
<p>OpenAI built a data agent serving 3.5k users across 600 petabytes.
The architectural patterns that made it work are the same ones that power a 3,000-line coding agent CLI.</p><p><a href="https://alexlavaee.me/blog/openai-data-agent-patterns/">Read more on the blog →</a></p>

Atomic: Building Reliable AI Coding Agent Infrastructure
https://alexlavaee.me/blog/building-reliable-ai-coding-agent-infrastructure/
Wed, 28 Jan 2026 00:00:00 GMT
<p>A technical guide to implementing procedural memory, specialized sub-agents, and autonomous ralph loops for AI coding assistants across platforms.</p><p><a href="https://alexlavaee.me/blog/building-reliable-ai-coding-agent-infrastructure/">Read more on the blog →</a></p>

Atomic: Automated Procedures and Memory for AI Coding Agents
https://alexlavaee.me/blog/atomic-workflow/
Mon, 08 Dec 2025 00:00:00 GMT
<p>Building on AI Coding Infrastructure, Atomic introduces a research-to-execution flywheel where specifications become lasting memory. Here's what we learned scaling multi-agent workflows.</p><p><a href="https://alexlavaee.me/blog/atomic-workflow/">Read more on the blog →</a></p>

How I Shipped 100k LOC in 2 Weeks with Coding Agents
https://alexlavaee.me/blog/ai-coding-infrastructure/
Wed, 12 Nov 2025 00:00:00 GMT
<p>Open-sourcing my developer workflow with AI agents—skills, sub-agents, and autonomous execution.
A 5-minute setup that provides the missing infrastructure layer for AI coding tools.</p><p><a href="https://alexlavaee.me/blog/ai-coding-infrastructure/">Read more on the blog →</a></p>

Continuous Self-Learning in AI Agents
https://alexlavaee.me/blog/self-evolving-llm-agents/
Mon, 10 Nov 2025 00:00:00 GMT
<p>An overview of two frameworks for memory and context management that enable continuous self-learning systems.</p><p><a href="https://alexlavaee.me/blog/self-evolving-llm-agents/">Read more on the blog →</a></p>

Context Engineering Navigator
https://alexlavaee.me/blog/context-engineering-cheat-sheet/
Fri, 19 Sep 2025 00:00:00 GMT
<p>An interactive cheat sheet covering context engineering techniques for LLMs, including retrieval, processing, management, and dynamic assembly strategies.</p><p><a href="https://alexlavaee.me/blog/context-engineering-cheat-sheet/">Read more on the blog →</a></p>

Building Products with Agentic-Powered IDEs
https://alexlavaee.me/blog/context-engineering-ai-ides/
Wed, 23 Jul 2025 00:00:00 GMT
<p>How context engineering transforms AI-powered development tools from disappointing to transformative through smart prompting, MCP servers, and strategic tool integration.</p><p><a href="https://alexlavaee.me/blog/context-engineering-ai-ides/">Read more on the blog →</a></p>

Memorization, Generalization, and Reasoning
https://alexlavaee.me/blog/memorization-generalization-and-reasoning/
Mon, 23 Jun 2025 00:00:00 GMT
<p>A deep dive into the concepts of memorization, generalization, and reasoning in large language models.</p><p><a href="https://alexlavaee.me/blog/memorization-generalization-and-reasoning/">Read more on the blog →</a></p>