Claude Code Works Better When You Let Sessions Die

I caught myself avoiding /clear for the third day in a row before I realized I was treating chat history like working memory. The context window was at something like 70%, the model had started repeating itself, and I was still adding to the same session because some half-load-bearing detail from Tuesday afternoon was buried in there and I did not want to lose it. That is the moment I stopped writing code for ten minutes and admitted I had a problem.

The problem is not new. It is the same psychological trap people had with browser tabs and Slack channels they could not bring themselves to leave. You build up state in one place, the state starts to feel valuable because of the work that went into producing it, and the cost of starting over feels higher than the cost of dragging the old state forward. With an LLM context window, the math is the other way. Dragging stale context forward is the expensive thing. The new chat is the cheap thing, as long as the memory layer has captured what matters.

Most write-ups about Claude Code memory skip past this part. There are plenty of guides that walk through CLAUDE.md syntax. Far fewer ask the question I had to ask myself: why was I, an engineer who allegedly understands caches and statelessness, behaving like a hoarder around a chat window? And once I worked that out, what did the daily workflow actually look like?

The hoarding behavior

Here is the shape of it, in case you have not noticed it in yourself. You start a chat on Monday morning. By Monday afternoon you have done good work in it. Tuesday morning you continue in the same chat because the model “knows what we are working on.” Wednesday you are still there. The compaction summary has fired at least once. The model has started forgetting things it remembered yesterday. You feel a low-grade anxiety any time you think about closing the session.

The irrational bit is that almost nothing in that chat is actually load-bearing. The code is in the repo. The decisions are (or should be) in commits or a design doc. The configuration is in CLAUDE.md. The interesting surprises and project-specific quirks that aren’t obvious from reading the code should be in auto-memory. If all four of those things are true, then the chat is just the route you took to get the work done, not the work itself. Routes are not worth preserving.

The trap is that it never feels that way in the moment. The model said something insightful four hours ago. You half-remember it. You assume you would lose it if you cleared. Most of the time, if you go look, the insight is either already encoded somewhere durable, or it was less load-bearing than memory made it feel. Memory inflates the value of conversational state in proportion to the effort that produced it. That is a behavioral bias, not a feature of the tool.

The layers, briefly

There are three places state actually lives in a Claude Code project. The chat is not one of them.

CLAUDE.md is the stable layer (see the official memory docs). It holds project conventions, the architecture overview, test commands, what the codebase is and is not: things that change on the timescale of weeks or months. Both the user-global ~/.claude/CLAUDE.md and the project-local file get loaded into every new chat in that working directory. This is where you put the things you want the model to know on session one, without having to re-explain them.

Auto-memory is the medium-term layer. The harness writes notes here as you work, capturing things it has learned that are not obvious from the code: which utility actually gets used vs. which one is dead code, which API endpoint is flaky, which test you keep meaning to delete. The point of auto-memory is to absorb the kind of context that would otherwise live in your head, or worse, in the chat. It is scoped to a working directory, so each project gets its own.

Slash commands and subagents are the orchestration layer. Custom commands are things you can invoke from any chat. Subagents run in their own context, do a specific job, and report back. They are how you do work that would otherwise pollute the main chat with research: search results, file listings, transient state you do not want to keep around.

Together the three layers are enough to keep almost everything that matters out of the main chat. The chat becomes a place to think out loud and execute, not somewhere you stash whatever the model just said.

Sessions per task, not per day

The single workflow change that did the most for me was starting a new chat when the task changed, not when the day ended. The chat scope should match the unit of work, not the calendar.

This sounds obvious. It is not how most people behave, and it was not how I behaved. I would start a chat for “the Tuesday morning thing,” do that work, then segue into a different feature, then a code review, then a debugging session, all in the same chat. By the third unrelated thing, the model was being asked to hold four mental models at once, the cache hit rate had collapsed, and I was paying full inference cost on every turn for the privilege of confusing the model.

A task ends when its acceptance criteria are met, or when you have to stop and design something different, or when the conversation pivots to a new file or subsystem. That is when you /clear. The next task gets a clean chat with a fresh prompt cache, fresh attention, and (if you have done the work) a memory layer that already knows the relevant context.

The piece nobody writes about: the cache. Anthropic’s prompt cache has a 5-minute TTL by default, refreshed each time the cached prefix is reused. Step away from a long conversation for longer than that and the next turn pays full token cost on the entire history. Even on a warm cache, every turn still pays to re-read a history that keeps growing. New chats with good memory hydration are sometimes literally cheaper to run than the marathon session you have been nursing for three days, and they are faster. The cost curve of one giant chat across several days vs. several focused chats across the same span is not close, in my experience.
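
A rough way to see the shape of that cost curve, as a back-of-the-envelope sketch and not a billing model. The two multipliers are assumptions for illustration (cached reads at about a tenth of the base input price, cache writes at about 1.25 times it), and the turn counts, token sizes, and idle gaps are invented. It also ignores compaction and output tokens, so treat the result as a direction, not a number.

```python
# Back-of-the-envelope input cost, in units of "base-price tokens".
# Every constant here is an illustrative assumption, not a measured value.
CACHE_READ = 0.10   # assumed multiplier for tokens served from the prompt cache
CACHE_WRITE = 1.25  # assumed multiplier for tokens written to the cache


def session_cost(turns, start_tokens, tokens_per_turn, cold_turns=frozenset()):
    """Estimate the input cost of one session.

    turns: number of turns in the session
    start_tokens: context present at turn 0 (CLAUDE.md, memory, task note)
    tokens_per_turn: new tokens appended to the history on each turn
    cold_turns: turn indexes that follow an idle gap longer than the cache TTL
    """
    cost = 0.0
    history = start_tokens
    for turn in range(turns):
        if turn == 0 or turn in cold_turns:
            # Cache miss: the whole history is processed and cached again.
            cost += history * CACHE_WRITE
        else:
            # Cache hit: the existing history is read at the cheap rate.
            cost += history * CACHE_READ
        # This turn's new tokens are written to the cache either way.
        cost += tokens_per_turn * CACHE_WRITE
        history += tokens_per_turn
    return cost


# One marathon session: 120 turns with six breaks long enough to expire the cache.
marathon = session_cost(120, 5_000, 2_000, cold_turns={20, 40, 60, 80, 100, 110})
# Four focused sessions of 30 turns each, every one starting cold.
focused = 4 * session_cost(30, 5_000, 2_000)
print(f"marathon: {marathon:,.0f}  focused: {focused:,.0f}")
```

The gap comes mostly from the history term: in the marathon, every turn re-reads everything that came before it, so cost grows roughly with the square of the turn count, and each cold turn re-pays for the full history on top of that.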

Write the memory before the work

The second pattern that took me a while to learn: when you start a non-trivial task, write the memory entry first.

Most people write project notes after the fact. They finish the work, then summarize what they decided. By that point you have forgotten which detail was load-bearing and which was incidental. The note ends up being a recap, not a useful primer for the next session.

If instead you start a task by writing a short note about what success looks like, what files are in scope, what the constraint is, what the model should not do, that note becomes the bootstrap for whatever chat picks up the work next. It does not have to be elaborate. Five lines. “Task: add the X migration. Touch only files in migrations/ and models/Y.py. The existing migration in 0042 is the template. Do not touch the runtime config.” If the chat gets cleared mid-task, the next chat starts hot. If it does not, you have a record of the original intent in case the work drifts.

I keep these in a gitignored scratch/ directory in the project, one file per task. It is not glamorous. It works.
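
If you would rather script the note than type it, something like the following is enough. It is a minimal sketch: the function name, the field labels, the .md extension, and the default scratch directory name are all my own hypothetical choices, not anything Claude Code requires or reads automatically.

```python
from datetime import date
from pathlib import Path


def write_task_note(slug, task, scope, template=None, avoid=None, root="scratch"):
    """Write a short task primer to <root>/<slug>.md and return its path.

    All names and labels are hypothetical; adjust them to your own habits.
    """
    lines = [
        f"Task: {task}",
        f"Started: {date.today().isoformat()}",
        "Touch only: " + ", ".join(scope),
    ]
    if template:
        lines.append(f"Template: {template}")
    if avoid:
        lines.append("Do not touch: " + ", ".join(avoid))

    note_dir = Path(root)
    note_dir.mkdir(parents=True, exist_ok=True)  # create the scratch dir on first use
    path = note_dir / f"{slug}.md"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# Example call, mirroring the five-line note above:
# write_task_note(
#     "x-migration",
#     task="add the X migration",
#     scope=["migrations/", "models/Y.py"],
#     template="the existing migration in 0042",
#     avoid=["the runtime config"],
# )
```

One file per task keeps the note easy to hand to a fresh chat: point the new session at the file and it has the intent, the scope, and the constraints without any of the old conversation.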

Auto-memory is for surprises, not facts

The temptation with auto-memory is to write everything down. Resist that. Auto-memory exists to capture the things that surprised you, the things you would not have predicted from reading the code. Not the things you would.

“The users table has an email column” is not a surprise. The code says that. Do not write it down.

“The users table has an email column that is silently lowercased on write but case-sensitively compared on read, so login fails for any user who signed up with a capital letter” is a surprise. That goes in memory. The next chat will not derive it from reading the code unless it reads three specific functions in two specific files. Writing it down once saves the next chat from rediscovering it the hard way.

The signal I use: would a competent engineer reading the repo for the first time guess this? If yes, it is in the code, do not duplicate. If no, it goes in memory. Bloated memory is worse than no memory, because the model has to read all of it on every turn and the load-bearing notes get buried under the trivial ones.

Subagents are clean-context machines

When I need fresh eyes on something, I delegate to a subagent and treat it like a research intern, not a coworker. The subagent gets the question, the relevant file paths, and nothing else. It does not inherit the assumptions I have built up in the main chat. It cannot defer to a half-remembered conversation from yesterday. It reads what is actually there. Anthropic’s writeup on effective harnesses for long-running agents covers the architectural reasoning behind this pattern.

This is most useful for two things. One, when I suspect the main chat has talked itself into a corner and is repeating a wrong assumption. Send a subagent to read the actual code and report back, no priming. Two, when I want to search the codebase for something specific without dragging a hundred grep results into the main context.

The mistake I made early on was treating subagents like extensions of the main chat. Passing them long preambles, expecting them to carry over reasoning, getting frustrated when they did not. They are not for that. They are for cases where context isolation is the feature.

The third-tangent rule

A heuristic I use that has saved me hours: when the chat has wandered into its third unrelated subtopic, /clear.

The first tangent is usually fine. You ask about a function, the model mentions a related issue, you decide whether to investigate. The second tangent is borderline. By the third, you are not in a working session anymore, you are in a conversation. Conversations are nice with people. With LLMs, they are how the model loses the thread and how your cache hit rate goes to zero.

The rule is not “stay focused or else.” It is “notice when you have drifted, and reset before drift becomes the dominant cost.” Most of the value of /clear is psychological, but the cache savings are real and the model focus is real.

What I stopped doing

I stopped using the context window as a scratchpad. Anything I wanted to remember went into a file. The chat is for execution, not storage.

I stopped writing memory entries after the fact. Memory before work, not after. By the end of the task I have forgotten which details mattered.

I stopped duplicating context that was already in CLAUDE.md into the chat preamble. If it is in CLAUDE.md, the model already has it. Re-pasting it just wastes tokens and signals that I do not trust the memory layer, which means I will not use it correctly.

I stopped trying to “rescue” degraded chats. Once a chat has gone through one compaction, two unrelated tangents, and started repeating itself, the right move is /clear. The instinct is to push through because of sunk cost. The sunk cost is already gone. Pushing through just costs more.

I stopped opening a new chat for every micro-question and then abandoning it. A new chat for every “what does this regex do” is wasteful in a different way, because none of those sessions ever justify writing a memory entry, and the cumulative cold-start cost adds up. New chats are for new tasks, not new sentences.

What still annoys me

Cross-project memory is the biggest one. Memory is scoped to a working directory, which is correct most of the time, and a real pain when you work across multiple repos that share design language or conventions. The user-global CLAUDE.md helps for stable preferences, but it is not the same as having an answer to “what did we decide about pagination in the other service” when you are now working in this service. I have ended up keeping a small notes repo that I read manually when context needs to cross a project boundary. It is not great.

Memory going stale without you noticing is the second one. An auto-memory note from three months ago can still be loaded into context today even if the codebase has moved past it. The model treats it as current. The mitigation is to occasionally read your own memory directory the way you would read a CHANGELOG, but I do not do this nearly often enough, and I have been bitten by it.
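
A small script can at least tell you which notes to reread first. This is a sketch under assumptions: memory_dir is a hypothetical path you supply (where the notes live depends on your setup), the notes are assumed to be markdown files, and file modification time is only a crude proxy for "last confirmed true."

```python
import time
from pathlib import Path


def stale_notes(memory_dir, max_age_days=90, now=None):
    """Return (path, age_in_days) for notes untouched for max_age_days, oldest first.

    memory_dir is a hypothetical location; pass whatever directory holds your notes.
    """
    now = time.time() if now is None else now
    results = []
    for path in Path(memory_dir).rglob("*.md"):
        # Age is measured from the last modification, not from creation.
        age_days = (now - path.stat().st_mtime) / 86_400
        if age_days >= max_age_days:
            results.append((path, int(age_days)))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Example use, with a placeholder path:
# for path, age in stale_notes("path/to/your/memory/dir"):
#     print(f"{age:>4} days  {path}")
```

It does not tell you whether a note is wrong, only that nobody has touched it in a while, which is usually where the wrong ones are hiding.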

Compaction is uneven. The summary the model produces when the context gets too long is sometimes great and sometimes drops the one detail that was load-bearing. You can pass focus instructions to a manual /compact, but automatic compaction fires on its own schedule, and I have not found a reliable way to nudge it toward keeping specific things. I have started writing those load-bearing details to a scratch file the moment they come up, on the assumption that compaction will eat them otherwise. It is a workaround, not a fix.

You also cannot reliably ask “what did we decide three sessions ago about X.” The model does not have access to old chats. If the answer is not in memory or a file, the answer is gone. This is the right design for privacy and statelessness reasons, and it is also occasionally inconvenient. I have learned to write decisions down when I make them, which is a habit I should have had anyway.

In closing

The thing that surprised me most about all of this is how much of the change was behavioral rather than technical. The memory features exist. They are not hard to use. What was hard was breaking the habit of treating the chat as the source of truth, the habit of dread around /clear, the habit of stuffing more into one session because more felt safer.

Once that broke, the rest of the patterns started feeling natural rather than like things I had to remind myself to do. Session-per-task. Memory before the work. Subagent when context isolation is the point. The features stopped feeling like things you opt into and started feeling like the obvious way to use the tool.

I still catch myself reaching for a chat that has been open too long. The instinct does not go away, it just gets easier to override. That is the most honest thing I can say about the whole shift. The habit took maybe a month to actually change, which is longer than learning the features themselves, and that is probably the only thing worth telling someone who is about to try it.
