MemGPT / Letta: OS-Style Memory Hierarchy
MemGPT (now Letta) introduced the idea of treating LLM memory like an operating system's memory hierarchy, with explicit paging between tiers.
Core Idea
Instead of stuffing everything into the context window, MemGPT manages three memory tiers:
- **Core Memory**: Always in context. Small, editable blocks (persona, human info, system instructions). Think of it as RAM.
- **Recall Memory**: Conversation history stored in a database. Searchable by recency or keyword. Think of it as a page file.
- **Archival Memory**: Long-term vector store for facts, documents, and knowledge. Think of it as disk storage.
The LLM itself decides when to read and write across tiers using function calls (`core_memory_append`, `archival_memory_search`, etc.).
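The tier structure above can be sketched in a few lines. This is an illustrative toy, not Letta's actual API: the class, method names, and keyword-based archival search (standing in for a real vector store) are all assumptions made for the example.

```python
# Toy sketch of MemGPT-style tiered memory (illustrative; not Letta's real API).
# Keyword matching stands in for vector search in the archival tier.

class TieredMemory:
    def __init__(self):
        self.core = {"persona": "", "human": ""}  # always in context (RAM)
        self.recall = []                          # conversation history (page file)
        self.archival = []                        # long-term facts (disk)

    # --- tools the LLM can call ---
    def core_memory_append(self, block, text):
        self.core[block] += ("\n" + text) if self.core[block] else text

    def recall_memory_search(self, keyword):
        return [m for m in self.recall if keyword.lower() in m.lower()]

    def archival_memory_insert(self, fact):
        self.archival.append(fact)

    def archival_memory_search(self, keyword):
        return [f for f in self.archival if keyword.lower() in f.lower()]

    def log_message(self, msg):
        self.recall.append(msg)


mem = TieredMemory()
mem.core_memory_append("human", "Name: Ada. Prefers concise answers.")
mem.log_message("user: my favorite language is OCaml")
mem.archival_memory_insert("Ada's favorite language is OCaml")

print(mem.recall_memory_search("ocaml"))    # ['user: my favorite language is OCaml']
print(mem.archival_memory_search("ocaml"))  # ["Ada's favorite language is OCaml"]
```

The point of the sketch is that every memory operation is an ordinary method the model invokes by name, which is what makes the scheme auditable.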
Key Design Decisions
- Self-directed memory management: The agent decides what to remember, not the application developer.
- Explicit function calls: Memory operations are tool calls, making them auditable and debuggable.
- Inner monologue: The agent has a "thinking" step before each response, used to reason about what information it needs.
- Pagination: When recall or archival search returns too many results, the agent can page through them.
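The pagination decision above can be made concrete with a small sketch. The helper name, result shape, and page size are hypothetical, chosen only to show how an agent would page through an oversized result set one tool call at a time:

```python
# Hypothetical paginated search tool: returns one page of hits plus a
# has_more flag the agent can use to decide whether to request the next page.

def paged_search(items, keyword, page=0, page_size=2):
    hits = [i for i in items if keyword in i]
    start = page * page_size
    return {
        "results": hits[start:start + page_size],
        "page": page,
        "has_more": start + page_size < len(hits),
    }


facts = ["cat fact 1", "cat fact 2", "cat fact 3", "dog fact 1"]
p0 = paged_search(facts, "cat", page=0)
print(p0)  # {'results': ['cat fact 1', 'cat fact 2'], 'page': 0, 'has_more': True}
p1 = paged_search(facts, "cat", page=1)
print(p1)  # {'results': ['cat fact 3'], 'page': 1, 'has_more': False}
```

Returning an explicit `has_more` flag keeps the paging decision with the agent, consistent with the self-directed design above.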
Relevance to Memory Platform
MemGPT's tier model maps well to our architecture:
| MemGPT Tier | Memory Platform Equivalent |
|---|---|
| Core Memory | Project metadata + pinned memories |
| Recall Memory | Session-scoped memories (episodic) |
| Archival Memory | All memories with vector/FTS retrieval |
The key insight is that the agent should control its own memory operations rather than having memory injected by the application. This aligns with our MCP tool-based approach where the agent calls `memories/search` and `memories/create` explicitly.
References
- Packer et al., "MemGPT: Towards LLMs as Operating Systems" (2023)
- Letta framework: open-source implementation with stateful agents
- The "virtual context management" approach has been adopted by multiple agent frameworks