Claude Code workflow: how to set up an autonomous dev loop
How I use Claude Code as my primary dev tool — Ralph autonomous loop, custom memory, CLAUDE.md instructions, and hooks. The real setup that shipped DokladBot in 6 days.
Most people use Claude Code as a fancy autocomplete. I use it as a co-developer that runs in an autonomous loop for hours, holds its own memory of the project, and detects when a task is done. That setup let me ship DokladBot in 6 days and build the Krtek B2B database with 98,640 companies over a single weekend.
Here is the whole workflow, layer by layer.
Three layers: hooks, memory, loop
┌──────────────────────────────────────────────────┐
│ Ralph autonomous loop (hours of work) │
│ ┌────────────────────────────────────────────┐ │
│ │ Claude Code session │ │
│ │ ┌──────────────────────────────────────┐ │ │
│ │ │ CLAUDE.md + custom memory MCP │ │ │
│ │ │ + hooks (pre/post tool, stop) │ │ │
│ │ └──────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
Each layer has one concern. CLAUDE.md tells the agent how to behave. Memory remembers what it already knows. Ralph decides when to stop and when to keep going.
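The hooks in the innermost box are ordinary Claude Code configuration, declared in the project's settings file. A minimal illustrative sketch (the matcher and commands here are examples I made up for this post, not a copy of my real settings):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [{ "type": "command", "command": "pnpm exec biome check ." }]
      }
    ],
    "Stop": [
      {
        "hooks": [{ "type": "command", "command": "notify-send 'Claude Code stopped'" }]
      }
    ]
  }
}
```

A PostToolUse hook like this runs the linter after every file edit, so the agent gets feedback immediately instead of at review time; the Stop hook is how Ralph-style tooling can observe when a session ends.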
Layer 1 — CLAUDE.md as the project's system prompt
Every project of mine has a CLAUDE.md in the repo root, plus a global one at ~/.claude/CLAUDE.md. The local file owns project conventions. The global one owns my preferences across all work.
Sample of the global file:
## Decision behaviour
- NEVER ask the user to choose between options — always proceed
- When multiple options exist, ALWAYS pick the second one
- Do not confirm actions — just do them
- Be proactive and autonomous
## Code style
- Single quotes, semicolons (Biome strict)
- useImportType:error
- After every change: `pnpm exec biome check .`

This looks trivial, but the difference is huge. Without it, the agent stops at every fork and asks. With it, it runs for an hour straight.
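For contrast, the project-level CLAUDE.md carries repo-specific conventions rather than personal preferences. A trimmed, illustrative example for a project like DokladBot (section names and rules here are hypothetical, not copied from the real repo):

```markdown
## Stack
- Next.js 15 (App Router), Velite for content, Resend for email

## Conventions
- Content lives as MDX files, one file per article
- Run `pnpm exec biome check .` before declaring a task done

## Do not
- Do not hand-edit files generated by Velite
```

The split matters: global rules travel with me across every repo, while the local file is what lets a fresh session behave like it has worked on this codebase before.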
Layer 2 — Custom memory via MCP
Claude Code has filesystem tools, but zero memory across sessions. Open a new session and you start from scratch. That is wasteful when you are working on a product over multiple days.
I built claude-mem — an MCP server that indexes conversation archives into a Chroma vector DB. Before each new session, the agent does a semantic lookup of what it already knows about the project:
mcp__claude-mem__chroma_query_documents(['dokladbot SEO pipeline outline draft polish']);
// returns the last 10 entries from previous sessions

In practice this means when I start a new session on DokladBot, within 5 seconds the agent knows:
- the content pipeline has 3 stages (outline → draft → polish)
- Velite is the content layer and where the config lives
- the frontmatter conventions I picked
- the mistakes I made in past runs and how I fixed them
Without memory, every session is start-from-zero. With memory, it is like a coworker who already spent yesterday on this code walking back into the office.
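The real claude-mem embeds session summaries into a Chroma vector DB and answers queries by vector similarity. To show the shape of the lookup without the vector machinery, here is a stdlib-only simplification where token overlap stands in for semantic similarity (class and method names are mine, not claude-mem's API):

```python
# Simplified stand-in for claude-mem's semantic lookup.
# Token overlap plays the role of vector similarity here.
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class SessionMemory:
    def __init__(self) -> None:
        self.entries: list[str] = []  # one summary per past session

    def add(self, summary: str) -> None:
        self.entries.append(summary)

    def query(self, prompt: str, n_results: int = 10) -> list[str]:
        # Score every stored summary against the prompt, best first.
        q = tokenize(prompt)
        scored = [(len(q & tokenize(e)), e) for e in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for score, e in scored[:n_results] if score > 0]


memory = SessionMemory()
memory.add("dokladbot content pipeline has 3 stages: outline, draft, polish")
memory.add("velite config lives next to the content layer")
memory.add("krtek scraper uses bun and playwright")

hits = memory.query("dokladbot SEO pipeline outline draft polish")
```

The point of the pre-session query is exactly this ranking step: the agent pulls the handful of summaries most relevant to today's task instead of replaying every past conversation.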
Layer 3 — Ralph autonomous loop
This is my open-source layer: github.com/ondrejknedla/ralph-claude-code.
Claude Code is great at ad-hoc tasks. But when you need an agent to run for hours on one big task (a refactor, a mass migration, content generation), you hit three problems:
- Rate limits — the API throws a 429 and you don't know when or how to resume
- End-of-task detection — when has the agent actually finished vs. just paused?
- Session continuity — when something breaks, how does the next session pick up?
Ralph solves all three. The architecture is straightforward:
# pseudo-code; full source in the repo
while not done:
    output = run_claude_code_session(task)
    if rate_limited(output):
        backoff = exponential_backoff()
        sleep(backoff)
        continue
    if end_of_task_detected(output):
        persist_session_manifest()
        break
    if idle_too_long():
        nudge_agent('keep going or finalize')

End-of-task detection is a heuristic over agent output — I watch for specific phrases ("done", "complete", "finished") plus an idle timeout. I do not use an LLM-based classifier, because a false positive on "done" costs you another 30 minutes of runtime, but the heuristic is right enough.
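The detection heuristic can be sketched like this (patterns, tail size, and timeout are illustrative values, not the exact ralph-claude-code source):

```python
# Sketch of Ralph-style end-of-task detection: completion phrases in
# the tail of the agent's output, plus an idle timeout as a fallback.
from __future__ import annotations

import re
import time

COMPLETION_PATTERNS = [
    r"\btask (is )?(done|complete|finished)\b",
    r"\ball (tests|checks) pass(ing)?\b",
]
IDLE_TIMEOUT_S = 15 * 60  # no new output for 15 min => treat as stalled


def end_of_task_detected(output: str, tail_chars: int = 2000) -> bool:
    # Only scan the tail, so a "done" quoted early in a long log
    # does not trigger a false stop.
    tail = output[-tail_chars:].lower()
    return any(re.search(p, tail) for p in COMPLETION_PATTERNS)


def idle_too_long(last_output_at: float, now: float | None = None) -> bool:
    now = time.time() if now is None else now
    return now - last_output_at > IDLE_TIMEOUT_S
```

Requiring a phrase like "task complete" rather than a bare "done" is the cheap guard against false positives: "I am done reading the file" should not end a run that still has hours of work queued.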
Real example — DokladBot in 6 days
While shipping DokladBot, my days looked like this:
| Day | What Claude Code did | My input |
|---|---|---|
| 1 | Bootstrap Next.js 15, Velite content layer, schema | Code review, nits |
| 2 | Resend integration, transactional emails, 3 templates | Copy direction |
| 3 | SEO meta strategy, 109 outline → draft → polish runs | Approve outlines |
| 4 | Vercel deploy, edge cache, blog routing | Domain DNS |
| 5 | Bug fixing via Ralph autonomous loop | Sleep |
| 6 | Final polish and launch | Tweet |
About 90% of the code was written by Claude Code. I made architecture decisions, did copywriting, and reviewed code. Ralph ran overnight on bug fixing and content generation.
Results and metrics
- DokladBot: idea → production in 6 days, 109 articles shipped
- Krtek: 98,640 companies scraped over a weekend (Bun + Playwright pipeline)
- This portfolio: 13+ case studies, multilingual, edge-cached, in 4 days
When people say "AI hasn't shipped anything in production", they usually haven't tried a serious setup. Claude Code + Ralph + custom memory is not autocomplete — it is a full co-developer; you just have to organize it properly.
Where to next
If you want to dig into a specific layer:
- Custom memory case study → how I built claude-mem
- Ralph case study → autonomous loop in depth
- B2B lead pipeline → what Claude Code shipped over a weekend
If you are working on a similar setup at your company, let's get on a call.