Early on, I thought better memory meant storing everything. In practice, that made agents slower and less reliable. Useful memory is selective memory.
Three Buckets
- Session memory: what was just said in the active conversation window.
- Working memory: current objective, constraints, and partial reasoning state.
- Long-term memory: stable preferences and recurring responsibilities.
Compression Is Mandatory
Raw transcripts are expensive and noisy. Structured summaries are usually better because they preserve intent without flooding the model with irrelevant detail.
Retrieval Should Be Intent-Driven
Retrieval should start with one question: what decision is the agent making right now? Pull only memory that helps that decision. This keeps latency low and reduces fake confidence caused by irrelevant context.