MemGPT / Letta: OS-Style Memory Hierarchy
MemGPT (now Letta) introduced the idea of treating LLM memory like an operating system's memory hierarchy, with explicit paging between tiers.
Core Idea
Instead of stuffing everything into the context window, MemGPT manages three memory tiers:
- **Core Memory**: Always in context. Small, editable blocks (persona, human info, system instructions). Think of it as RAM.
- **Recall Memory**: Conversation history stored in a database. Searchable by recency or keyword. Think of it as a page file.
- **Archival Memory**: Long-term vector store for facts, documents, and knowledge. Think of it as disk storage.
The LLM itself decides when to read and write across tiers using function calls (`core_memory_append`, `archival_memory_search`, etc.).
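The tier structure above can be sketched in a few lines. This is an illustrative toy, not Letta's actual API: the class, method names, and keyword-based archival search (standing in for a real vector store) are all assumptions made for the example.

```python
# Toy sketch of MemGPT-style tiered memory (illustrative; not Letta's real API).
# Keyword matching stands in for vector search in the archival tier.

class TieredMemory:
    def __init__(self):
        self.core = {"persona": "", "human": ""}  # always in context (RAM)
        self.recall = []                          # conversation history (page file)
        self.archival = []                        # long-term facts (disk)

    # --- tools the LLM can call ---
    def core_memory_append(self, block, text):
        self.core[block] += ("\n" + text) if self.core[block] else text

    def recall_memory_search(self, keyword):
        return [m for m in self.recall if keyword.lower() in m.lower()]

    def archival_memory_insert(self, fact):
        self.archival.append(fact)

    def archival_memory_search(self, keyword):
        return [f for f in self.archival if keyword.lower() in f.lower()]

    def log_message(self, msg):
        self.recall.append(msg)


mem = TieredMemory()
mem.core_memory_append("human", "Name: Ada. Prefers concise answers.")
mem.log_message("user: my favorite language is OCaml")
mem.archival_memory_insert("Ada's favorite language is OCaml")

print(mem.recall_memory_search("ocaml"))    # ['user: my favorite language is OCaml']
print(mem.archival_memory_search("ocaml"))  # ["Ada's favorite language is OCaml"]
```

The point of the sketch is that every memory operation is an ordinary method the model invokes by name, which is what makes the scheme auditable.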
Key Design Decisions
- Self-directed memory management: The agent decides what to remember, not the application developer.
- Explicit function calls: Memory operations are tool calls, making them auditable and debuggable.
- Inner monologue: The agent has a "thinking" step before each response, used to reason about what information it needs.
- Pagination: When recall or archival search returns too many results, the agent can page through them.
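The pagination decision above can be made concrete with a small sketch. The helper name, result shape, and page size are hypothetical, chosen only to show how an agent would page through an oversized result set one tool call at a time:

```python
# Hypothetical paginated search tool: returns one page of hits plus a
# has_more flag the agent can use to decide whether to request the next page.

def paged_search(items, keyword, page=0, page_size=2):
    hits = [i for i in items if keyword in i]
    start = page * page_size
    return {
        "results": hits[start:start + page_size],
        "page": page,
        "has_more": start + page_size < len(hits),
    }


facts = ["cat fact 1", "cat fact 2", "cat fact 3", "dog fact 1"]
p0 = paged_search(facts, "cat", page=0)
print(p0)  # {'results': ['cat fact 1', 'cat fact 2'], 'page': 0, 'has_more': True}
p1 = paged_search(facts, "cat", page=1)
print(p1)  # {'results': ['cat fact 3'], 'page': 1, 'has_more': False}
```

Returning an explicit `has_more` flag keeps the paging decision with the agent, consistent with the self-directed design above.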
Relevance to Memory Platform
MemGPT's tier model maps well to our architecture:
| MemGPT Tier | Memory Platform Equivalent |
|---|---|
| Core Memory | Project metadata + pinned memories |
| Recall Memory | Session-scoped memories (episodic) |
| Archival Memory | All memories with vector/FTS retrieval |
The key insight is that the agent should control its own memory operations rather than having memory injected by the application. This aligns with our MCP tool-based approach where the agent calls `memories/search` and `memories/create` explicitly.
References
- Packer et al., "MemGPT: Towards LLMs as Operating Systems" (2023)
- Letta framework: open-source implementation with stateful agents
- The "virtual context management" approach has been adopted by multiple agent frameworks