Atomic: Automated Procedures and Memory for AI Coding Agents

Building on AI Coding Infrastructure


In our previous post on AI Coding Infrastructure, we introduced a foundational layer for AI coding tools: skills, specialized subagents, ExecPlans, and autonomous execution with Ralph.

But we learned two critical lessons as we continued to scale it:

Lesson 1: Less is more. Having 114+ subagents and dozens of skills sounds powerful, but in practice it created confusion. Agents didn’t know which specialist to call. Skills overlapped and contradicted each other. We learned to curate a focused set of agents and skills that work cohesively in a single workflow.

Lesson 2: Specs were the main memory layer. Each coding session produced valuable insights that evaporated when the context window reset. Research findings, architectural decisions, debugging discoveries—all lost. We needed a system where work compounds rather than restarts, and where active working memory keeps the agent on track.

Atomic is that system. It’s an automated procedure with focused agents, commands/prompts, and skills designed to work together. The procedure follows Software Development Lifecycle best practices and lets engineers effectively steer agents, keeping humans and agents aligned.

The Core Insight: Lasting Memory Through Specs & Active Memory Through Progress.txt

In Atomic, specifications live in specs/ and survive across sessions. When a new session starts, agents read the existing specs to understand context, decisions, and progress. Research happens within a session, but the spec is what persists.

The workflow follows a clear progression:

  1. Research → Multiple agents analyze the codebase in parallel
  2. Spec Creation → Synthesize research into a specification (this is what persists)
  3. Feature Decomposition → Break the spec into discrete, implementable features (gives the agent an actual status record so it doesn’t erroneously declare a feature complete)
  4. Implementation → Build each feature with tests and validation (this is where active memory is tracked through progress.txt)
  5. Pull Request → Package changes for review

The features and progress.txt are what maintain active memory during implementation. The spec is the source of truth that carries context forward. Future sessions read existing specs to understand decisions, progress, and architectural choices.
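
As a rough illustration, resuming a session amounts to reloading the lasting memory in specs/ and the active memory in progress.txt before new work starts. This is a hypothetical sketch that assumes Markdown specs, not the repository’s actual loader:

from pathlib import Path

def load_memory(repo_root: str = ".") -> str:
    """Rebuild agent context from lasting memory (specs/) and active memory (progress.txt)."""
    root = Path(repo_root)
    parts = []

    # Lasting memory: specs survive across sessions and carry decisions forward.
    for spec in sorted(root.glob("specs/*.md")):
        parts.append(f"## Spec: {spec.name}\n{spec.read_text()}")

    # Active memory: where the last implementation loop left off.
    progress = root / "progress.txt"
    if progress.exists():
        parts.append(f"## Progress\n{progress.read_text()}")

    return "\n\n".join(parts)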

Architecture: Three Primitives

Atomic operates through three interconnected primitives, plus MCP servers for online research and debugging:

| Primitive | Role | Examples |
|---|---|---|
| Commands | Orchestrate workflows | /research-codebase, /create-spec, /create-feature-list, /implement-feature, /compact, /commit, /create-debug-report, /create-pr, /explain-code |
| Subagents | Execute specialized tasks | codebase-analyzer, codebase-locator, codebase-online-researcher, codebase-pattern-finder, codebase-research-analyzer, codebase-research-locater |
| Skills | Inject domain knowledge | prompt-engineer, testing-anti-patterns |
| MCP Servers | Research online and debug | playwright, deepwiki |

Nine commands automate our procedure, leveraging six subagents. Two skills are available, one for prompt enhancement and one for testing best practices. Playwright is used for debugging and documentation lookup where we’ve seen search APIs and webFetch fall short, while deepwiki supplies critical information on open-source code relevant to the project being developed. As a core principle, the engineer stays in the loop at every step so they can effectively steer the agents.

The Research-to-Execution Flywheel

Here’s the complete workflow. Notice how research outputs feed into specs, specs decompose into features, and implementation failures loop back through debugging, all while accumulating documentation as memory.

Atomic workflow architecture diagram showing the research-to-execution flywheel

Phase 1: Parallel Research

Six agents work simultaneously:

| Agent | Output | Persists To |
|---|---|---|
| Pattern Finder | Structural patterns to follow | research/notes/ |
| Locator | Relevant code sections | research/tickets/ |
| Analyzer | Patterns & architecture | research/docs/ |
| Research Locator | Deeper code location analysis | research/tickets/ |
| Research Analyzer | Extended analysis findings | research/docs/ |
| Online Researcher | External docs & best practices | research/docs/ |

Why parallel? Research tasks are independent. One agent finding authentication patterns doesn’t block another analyzing database schemas. Review the research outputs and make sure they aren’t missing any key information.
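
As a rough sketch of the fan-out, assuming a hypothetical run_subagent(name, prompt) helper that wraps whatever agent runtime you use (the subagent names and output directories follow the tables above):

import concurrent.futures
from pathlib import Path

def run_subagent(name: str, prompt: str) -> str:
    """Hypothetical: dispatch to your agent runtime (CLI, SDK, etc.)."""
    raise NotImplementedError

# (subagent, output directory) pairs as listed in the tables above.
RESEARCH_AGENTS = [
    ("codebase-pattern-finder", "research/notes"),
    ("codebase-locator", "research/tickets"),
    ("codebase-analyzer", "research/docs"),
    ("codebase-research-locater", "research/tickets"),
    ("codebase-research-analyzer", "research/docs"),
    ("codebase-online-researcher", "research/docs"),
]

def research(task: str) -> None:
    # Fan out: research tasks are independent, so they can run in parallel.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {pool.submit(run_subagent, name, task): (name, out_dir)
                   for name, out_dir in RESEARCH_AGENTS}
        for future in concurrent.futures.as_completed(futures):
            name, out_dir = futures[future]
            Path(out_dir).mkdir(parents=True, exist_ok=True)
            (Path(out_dir) / f"{name}.md").write_text(future.result())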

Phase 2: Specification

The create-spec command synthesizes all research into a single specification. Critically, this spec references the research documents so anyone (engineer or agent) reading the spec can trace decisions back to their source.

Specs include: problem statement, proposed solution, architectural decisions, acceptance criteria, and links to relevant research files. Review the spec and make sure it is in line with what you’re looking to implement.
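
As an illustration of that structure, here is a hypothetical helper that writes a spec skeleton with the sections above and links back to every research file. The template is ours, not the exact output of the create-spec command:

from pathlib import Path

SPEC_TEMPLATE = """# Spec: {title}

## Problem Statement
{problem}

## Proposed Solution
{solution}

## Architectural Decisions
{decisions}

## Acceptance Criteria
{criteria}

## Research References
{research_links}
"""

def write_spec(title: str, problem: str, solution: str,
               decisions: str, criteria: str) -> Path:
    # Link every research artifact so decisions stay traceable to their source.
    links = "\n".join(f"- {p}" for p in sorted(Path("research").rglob("*.md")))
    spec_path = Path("specs") / f"{title.lower().replace(' ', '-')}.md"
    spec_path.parent.mkdir(exist_ok=True)
    spec_path.write_text(SPEC_TEMPLATE.format(
        title=title, problem=problem, solution=solution,
        decisions=decisions, criteria=criteria, research_links=links))
    return spec_path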

Phase 3: Feature Decomposition

The create-feature-list command breaks specs into discrete features, each with:

{
  "category": "functional",
  "description": "New chat button creates a fresh conversation",
  "steps": [
    "Navigate to main interface",
    "Click the 'New Chat' button",
    "Verify a new conversation is created",
    "Check that chat area shows welcome state",
    "Verify conversation appears in sidebar"
  ],
  "passes": false
}

Learning: Features must be small enough to implement in one session. If a feature requires multiple context switches, it’s too big: decompose further. Adjust sizing depending on the model you’re using and whether the task falls within the models’ common training distribution. Review all features and make any adjustments as necessary before proceeding.
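
To make the decomposition concrete, here is a minimal sketch of how a harness might consume such a feature list; the features.json filename and the step-count threshold are assumptions, not part of Atomic itself:

import json

MAX_STEPS = 7  # assumed heuristic for "fits in one session"; tune per model

def next_feature(path: str = "features.json") -> dict | None:
    # "features.json" is an assumed name for the list /create-feature-list produces.
    with open(path) as fh:
        features = json.load(fh)

    oversized = [f["description"] for f in features if len(f["steps"]) > MAX_STEPS]
    if oversized:
        # Too big for one session: decompose further before implementing.
        raise ValueError(f"Decompose these features further: {oversized}")

    # Pick the first feature that hasn't passed yet.
    return next((f for f in features if not f["passes"]), None)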

Phase 4: Atomic Implementation

Features are implemented one at a time, each following this loop:

  1. Implement the change
  2. Create tests
  3. Validate all tests pass (including previous features)
  4. Review all code and make any adjustments as necessary
  5. Mark complete & commit

Why atomic? Parallel implementation creates merge conflicts and tangled dependencies that, depending on the features, make debugging harder. Additionally, as your codebase grows, it becomes more difficult for coding agents to follow instructions and make correct changes. Atomic implementation with validation gates catches issues immediately and prevents cascading failures. We only recommend parallel implementation when you’re sure the features touch areas that don’t conflict with each other. In the future, as models and the scaffolding/harnesses around these agents improve, we anticipate being able to increase feature scope and parallelize more aggressively.

When a feature hits a bug, the create-debug-report command handles the failure, producing diagnostic reports that naturally feed back into the implementation loop.
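
Putting Phase 4 together, here is a minimal, hypothetical sketch of one pass through the loop; implement_feature stands in for the agent-driven work, and pytest and git are assumed tooling:

import json
import subprocess
from pathlib import Path

def implement_feature(feature: dict) -> None:
    """Hypothetical: drive the agent to implement the change and write its tests."""
    raise NotImplementedError

def run_one_loop(path: str = "features.json") -> None:
    with open(path) as fh:
        features = json.load(fh)

    feature = next((f for f in features if not f["passes"]), None)
    if feature is None:
        return  # nothing pending

    implement_feature(feature)                      # steps 1-2: implement + create tests
    if subprocess.run(["pytest"]).returncode != 0:  # step 3: all tests, including previous features
        # In Atomic, /create-debug-report produces a fuller diagnostic that feeds the next loop.
        Path("debug-report.md").write_text(f"Tests failed for: {feature['description']}\n")
        return

    # Step 4 (human review of the changes) happens here, before anything is committed.
    feature["passes"] = True                        # step 5: mark complete & commit
    with open(path, "w") as fh:
        json.dump(features, fh, indent=2)
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", f"feat: {feature['description']}"], check=True)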

Phase 5: Quality Gates

Deterministic hooks enforce standards on every commit:

  • Pre-commit validation
  • Linting and formatting
  • Full test suite

These aren’t optional. Broken code doesn’t progress.
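
As a sketch, a gate script like the following could be wired into a pre-commit hook; the specific linter and test runner (ruff, pytest) are placeholders for whatever your project uses:

import subprocess
import sys

# Each gate must pass before a commit is allowed; cheap checks run first.
GATES = [
    ["ruff", "check", "."],  # linting/formatting (placeholder tool)
    ["pytest", "-q"],        # full test suite, including previously completed features
]

def main() -> int:
    for cmd in GATES:
        if subprocess.run(cmd).returncode != 0:
            print(f"Quality gate failed: {' '.join(cmd)}", file=sys.stderr)
            return 1  # non-zero exit blocks the commit
    return 0

if __name__ == "__main__":
    sys.exit(main())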

What We Learned

1. Agents Deliver 40-60%, Not 100%

Set expectations correctly. Atomic gets you to 40-60% completion on complex features. The remaining work requires human judgment: edge cases, performance optimization, integration testing, architectural refinement.

This isn’t a failure; it’s the right division of labor, and it’s realistic for today’s systems. Agents handle the mechanical work while you handle the judgment calls and code review.

2. Research Quality Determines Everything

Garbage research produces garbage specs which produce garbage implementations. We spent significant time refining the Research Agents because their output quality cascades through the entire pipeline.

3. Feature Size Matters More Than You Think

Initially we let the Feature List Agent produce features of any size. Large features failed constantly: they exceeded context windows, created merge conflicts, were harder to review, and led to more cascading failures.

The fix: enforce small features. If implementing a feature requires more than one session, decompose it. This constraint improved completion rates dramatically. The right size is something you’ll need to get a feel for through trial and error, and it also depends on the model you’re using.

4. Built-In Debug Loops Within Active Memory

As engineers develop, they naturally run into bugs. Likewise, when the agent hits a bug during implementation, it is added back into the feature list to be addressed in the next implementation loop. This mirrors how an engineer works: resolving bugs as they go while following the implementation plan.
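
Concretely, feeding a bug back into active memory can be as small as appending another entry to the feature list. A sketch assuming the same features.json shape shown earlier (the "bugfix" category is our label; the post only shows "functional"):

import json

def add_bug_to_feature_list(description: str, repro_steps: list[str],
                            path: str = "features.json") -> None:
    with open(path) as fh:
        features = json.load(fh)

    # The bug becomes just another pending feature for the next implementation loop.
    features.append({
        "category": "bugfix",
        "description": description,
        "steps": repro_steps,
        "passes": False,
    })

    with open(path, "w") as fh:
        json.dump(features, fh, indent=2)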

5. Documentation Compounds

The specs/ and research/ directories become genuinely valuable. New features reference old research. Debugging sessions read past notes. The system gets smarter over time.

This is the real payoff: lasting memory that compounds.

You Own Decisions, Agents Own Execution

Atomic doesn’t replace engineering judgment. It amplifies it.

  • You approve specifications before implementation begins
  • You review the feature list decomposition
  • You decide when to accept agent output vs. iterate

Agents handle the mechanical work: researching codebases, writing boilerplate, running tests, creating PRs. They’re excellent at execution. They’re not (yet) excellent at decisions.

That division is the key to making this work.

Getting Started

Repository: github.com/flora131/atomic

Key Takeaways

  • Specs as lasting memory: Specifications aren’t throwaway prompts; they persist across sessions and inform all future work
  • Three primitives: Commands orchestrate, Agents execute specialized tasks, Skills improve how tasks are performed
  • Research-to-execution flywheel: A five-step procedure where research feeds specs, specs feed features, and documentation accumulates
  • Parallel research, atomic implementation: Multiple agents analyze simultaneously, but features implement one at a time with quality gates
  • You own decisions and steer, agents own execution: Expect 40-60% completion requiring engineer polish and refinement

What’s Next

Atomic is actively evolving.

If you’re building with AI coding agents and want workflows that compound, Atomic is the next step.

PRs welcome.


References

[1] AI Coding Infrastructure - Previous post on foundational agent setup - https://alexlavaee.me/blog/ai-coding-infrastructure/
[3] Superpowers Framework - https://github.com/obra/superpowers