Claude Code workflow: how to set up an autonomous dev loop
How I use Claude Code as my primary dev tool — Ralph autonomous loop, custom memory, CLAUDE.md instructions, and hooks. The real setup that shipped DokladBot in 6 days.
Most people use Claude Code as a fancy autocomplete. I use it as a co-developer that runs in an autonomous loop for hours, holds its own memory of the project, and detects when a task is done. That setup let me ship DokladBot in 6 days and build the Krtek B2B database with 98,640 companies over a single weekend.
Here is the whole workflow, layer by layer.
Three layers: hooks, memory, loop
┌──────────────────────────────────────────────────┐
│ Ralph autonomous loop (hours of work) │
│ ┌────────────────────────────────────────────┐ │
│ │ Claude Code session │ │
│ │ ┌──────────────────────────────────────┐ │ │
│ │ │ CLAUDE.md + custom memory MCP │ │ │
│ │ │ + hooks (pre/post tool, stop) │ │ │
│ │ └──────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
Each layer has one concern. CLAUDE.md tells the agent how to behave. Memory remembers what it already knows. Ralph decides when to stop and when to keep going.
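The hooks in the innermost box are ordinary Claude Code configuration, declared in the project's settings file. A minimal illustrative sketch (the matcher and commands here are examples I made up for this post, not a copy of my real settings):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [{ "type": "command", "command": "pnpm exec biome check ." }]
      }
    ],
    "Stop": [
      {
        "hooks": [{ "type": "command", "command": "notify-send 'Claude Code stopped'" }]
      }
    ]
  }
}
```

A PostToolUse hook like this runs the linter after every file edit, so the agent gets feedback immediately instead of at review time; the Stop hook is how Ralph-style tooling can observe when a session ends.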
Layer 1 — CLAUDE.md as the project's system prompt
Every project of mine has a CLAUDE.md in the repo root, plus a global one at ~/.claude/CLAUDE.md. The local file owns project conventions. The global one owns my preferences across all work.
Sample of the global file:
## Decision behaviour
- NEVER ask the user to choose between options — always proceed
- When multiple options exist, ALWAYS pick the second one
- Do not confirm actions — just do them
- Be proactive and autonomous
## Code style
- Single quotes, semicolons (Biome strict)
- useImportType:error
- After every change: `pnpm exec biome check .`

This looks trivial, but the difference is huge. Without it, the agent stops at every fork and asks. With it, it runs for an hour straight.
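For contrast, the project-level CLAUDE.md carries repo-specific conventions rather than personal preferences. A trimmed, illustrative example for a project like DokladBot (section names and rules here are hypothetical, not copied from the real repo):

```markdown
## Stack
- Next.js 15 (App Router), Velite for content, Resend for email

## Conventions
- Content lives as MDX files, one file per article
- Run `pnpm exec biome check .` before declaring a task done

## Do not
- Do not hand-edit files generated by Velite
```

The split matters: global rules travel with me across every repo, while the local file is what lets a fresh session behave like it has worked on this codebase before.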
Layer 2 — Custom memory via MCP
Claude Code has filesystem tools, but zero memory across sessions. Open a new session and you start from scratch. That is wasteful when you are working on a product over multiple days.
I built claude-mem — an MCP server that indexes conversation archives into a Chroma vector DB. Before each new session, the agent does a semantic lookup of what it already knows about the project:
mcp__claude-mem__chroma_query_documents(['dokladbot SEO pipeline outline draft polish']);
// returns the last 10 entries from previous sessions

In practice this means when I start a new session on DokladBot, within 5 seconds the agent knows:
- the content pipeline has 3 stages (outline → draft → polish)
- Velite is the content layer and where the config lives
- the frontmatter conventions I picked
- the mistakes I made in past runs and how I fixed them
Without memory, every session is start-from-zero. With memory, it is like a coworker who already spent yesterday on this code walking back into the office.
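The real claude-mem embeds session summaries into a Chroma vector DB and answers queries by vector similarity. To show the shape of the lookup without the vector machinery, here is a stdlib-only simplification where token overlap stands in for semantic similarity (class and method names are mine, not claude-mem's API):

```python
# Simplified stand-in for claude-mem's semantic lookup.
# Token overlap plays the role of vector similarity here.
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class SessionMemory:
    def __init__(self) -> None:
        self.entries: list[str] = []  # one summary per past session

    def add(self, summary: str) -> None:
        self.entries.append(summary)

    def query(self, prompt: str, n_results: int = 10) -> list[str]:
        # Score every stored summary against the prompt, best first.
        q = tokenize(prompt)
        scored = [(len(q & tokenize(e)), e) for e in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for score, e in scored[:n_results] if score > 0]


memory = SessionMemory()
memory.add("dokladbot content pipeline has 3 stages: outline, draft, polish")
memory.add("velite config lives next to the content layer")
memory.add("krtek scraper uses bun and playwright")

hits = memory.query("dokladbot SEO pipeline outline draft polish")
```

The point of the pre-session query is exactly this ranking step: the agent pulls the handful of summaries most relevant to today's task instead of replaying every past conversation.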
Layer 3 — Ralph autonomous loop
This is my open-source layer: github.com/ondrejknedla/ralph-claude-code.
Claude Code is great at ad-hoc tasks. But when you need an agent to run for hours on one big task (a refactor, a mass migration, content generation), you hit three problems:
- Rate limits — the API throws a 429 and you don't know when or how to resume
- End-of-task detection — when has the agent actually finished vs. just paused?
- Session continuity — when something breaks, how does the next session pick up?
Ralph solves all three. The architecture is straightforward:
# pseudo-code; full source in the repo
while not done:
    output = run_claude_code_session(task)
    if rate_limited(output):
        backoff = exponential_backoff()
        sleep(backoff)
        continue
    if end_of_task_detected(output):
        persist_session_manifest()
        break
    if idle_too_long():
        nudge_agent('keep going or finalize')

End-of-task detection is a heuristic over agent output — I watch for specific phrases ("done", "complete", "finished") plus an idle timeout. I do not use an LLM-based classifier, because a false positive on "done" costs you another 30 minutes of runtime, but the heuristic is right enough.
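The detection heuristic can be sketched like this (patterns, tail size, and timeout are illustrative values, not the exact ralph-claude-code source):

```python
# Sketch of Ralph-style end-of-task detection: completion phrases in
# the tail of the agent's output, plus an idle timeout as a fallback.
from __future__ import annotations

import re
import time

COMPLETION_PATTERNS = [
    r"\btask (is )?(done|complete|finished)\b",
    r"\ball (tests|checks) pass(ing)?\b",
]
IDLE_TIMEOUT_S = 15 * 60  # no new output for 15 min => treat as stalled


def end_of_task_detected(output: str, tail_chars: int = 2000) -> bool:
    # Only scan the tail, so a "done" quoted early in a long log
    # does not trigger a false stop.
    tail = output[-tail_chars:].lower()
    return any(re.search(p, tail) for p in COMPLETION_PATTERNS)


def idle_too_long(last_output_at: float, now: float | None = None) -> bool:
    now = time.time() if now is None else now
    return now - last_output_at > IDLE_TIMEOUT_S
```

Requiring a phrase like "task complete" rather than a bare "done" is the cheap guard against false positives: "I am done reading the file" should not end a run that still has hours of work queued.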
Real example — DokladBot in 6 days
While shipping DokladBot, my days looked like this:
| Day | What Claude Code did | My input |
|---|---|---|
| 1 | Bootstrap Next.js 15, Velite content layer, schema | Code review, nits |
| 2 | Resend integration, transactional emails, 3 templates | Copy direction |
| 3 | SEO meta strategy, 109 outline → draft → polish runs | Approve outlines |
| 4 | Vercel deploy, edge cache, blog routing | Domain DNS |
| 5 | Bug fixing via Ralph autonomous loop | Sleep |
| 6 | Final polish and launch | Tweet |
About 90% of the code was written by Claude Code. I made architecture decisions, did copywriting, and reviewed code. Ralph ran overnight on bug fixing and content generation.
Results and metrics
- DokladBot: idea → production in 6 days, 109 articles shipped
- Krtek: 98,640 companies scraped over a weekend (Bun + Playwright pipeline)
- This portfolio: 13+ case studies, multilingual, edge-cached, in 4 days
When people say "AI hasn't shipped anything in production", they usually haven't tried a serious setup. Claude Code + Ralph + custom memory is not autocomplete — it is a full co-developer; you just have to organize it properly.
Where to next
If you want to dig into a specific layer:
- Custom memory case study → how I built claude-mem
- Ralph case study → autonomous loop in depth
- B2B lead pipeline → what Claude Code shipped over a weekend
If you are working on a similar setup at your company, let's get on a call.