Zep / Graphiti: Temporal Knowledge Graphs for Agent Memory
Zep's Graphiti is a temporal knowledge graph system designed for agent memory. Its standout feature is that retrieval makes no LLM calls, which keeps P95 latency for memory lookups at roughly 300ms.
Core Architecture
Graphiti builds a knowledge graph where:
- Nodes are entities (people, concepts, projects, tools).
- Edges are relationships with temporal metadata (when the relationship was established, modified, or invalidated).
- Episodes are conversation segments that sourced the knowledge.
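The three building blocks above can be sketched as plain data classes. This is a minimal illustration, not Graphiti's actual schema: the class names, field names other than valid_from, valid_to, and source_episode, and the sample values are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Episode:
    """A conversation segment that sourced some knowledge (hypothetical shape)."""
    episode_id: str
    text: str
    occurred_at: datetime


@dataclass(frozen=True)
class Entity:
    """A graph node: a person, concept, project, or tool."""
    entity_id: str
    name: str
    kind: str


@dataclass
class Edge:
    """A relationship between two entities, with temporal metadata."""
    source_id: str
    relation: str
    target_id: str
    valid_from: datetime
    valid_to: Optional[datetime]  # None means the fact has not been invalidated
    source_episode: str           # id of the Episode the fact came from


# Illustrative data only.
episode = Episode("ep1", "Alice said she joined Acme.", datetime(2024, 1, 1))
alice = Entity("e1", "Alice", "person")
acme = Entity("e2", "Acme", "organization")
edge = Edge(alice.entity_id, "works_at", acme.entity_id,
            valid_from=episode.occurred_at, valid_to=None,
            source_episode=episode.episode_id)
```

Keeping valid_to nullable lets an open-ended fact be closed later without deleting it, which is what makes historical queries possible.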
Ingestion (Write Path - uses LLM)
- Extract entities and relationships from conversation using an LLM.
- Resolve entities against existing graph (deduplication + merging).
- Add temporal metadata (valid_from, valid_to, source_episode).
- Update graph edges and invalidate contradicted facts.
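The write path above might look roughly like the following sketch. It is not Graphiti's API. The extract callable stands in for the LLM extraction step, entity resolution is reduced to name normalization, and the contradiction rule assumes every relation is single-valued (a new target for the same subject and relation closes the old edge). All names here are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


@dataclass
class Edge:
    source: str
    relation: str
    target: str
    valid_from: datetime
    valid_to: Optional[datetime]
    source_episode: str


class Graph:
    def __init__(self) -> None:
        self.entities: Dict[str, str] = {}  # normalized key -> canonical name
        self.edges: List[Edge] = []

    def resolve(self, name: str) -> str:
        """Deduplicate entities by case- and whitespace-insensitive name."""
        key = " ".join(name.lower().split())
        return self.entities.setdefault(key, name.strip())


def ingest(graph: Graph, episode_id: str, text: str, at: datetime,
           extract: Callable[[str], List[Triple]]) -> None:
    """Extract triples, resolve entities, stamp temporal metadata, invalidate conflicts."""
    for subj, rel, obj in extract(text):  # in a real system this is the LLM call
        s, o = graph.resolve(subj), graph.resolve(obj)
        duplicate = False
        for e in graph.edges:
            if e.source == s and e.relation == rel and e.valid_to is None:
                if e.target == o:
                    duplicate = True   # fact already known and still valid
                else:
                    e.valid_to = at    # contradicted: close the old fact
        if not duplicate:
            graph.edges.append(Edge(s, rel, o, at, None, episode_id))


# Usage with stubbed extractors in place of an LLM.
g = Graph()
ingest(g, "ep1", "Alice joined Acme.", datetime(2024, 1, 1),
       lambda _text: [("Alice", "works_at", "Acme")])
ingest(g, "ep2", "alice moved to Globex.", datetime(2024, 6, 1),
       lambda _text: [("alice ", "works_at", "Globex")])
```

After the second call the Acme edge is kept but its valid_to is set, so the graph records both the old and the current employer.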
Retrieval (Read Path - no LLM)
- Parse the query into entity mentions using lightweight NLP (no LLM).
- Traverse the graph from matched entities.
- Filter by temporal validity (return only currently valid facts).
- Rank by relevance and recency.
- Return structured results with provenance.
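A rough sketch of the read path above, with no model call anywhere. The details are assumptions made for illustration: entity matching is a naive case-insensitive substring test, traversal is a single hop, and relevance is approximated by how many endpoints of an edge were mentioned in the query. None of this is Graphiti's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Edge:
    source: str
    relation: str
    target: str
    valid_from: datetime
    valid_to: Optional[datetime]
    source_episode: str


def retrieve(query: str, entities: List[str], edges: List[Edge],
             now: datetime, limit: int = 5) -> List[Dict[str, str]]:
    # 1. Match entity mentions without an LLM (substring match is a
    #    simplification; short names can produce false positives).
    q = query.lower()
    matched = {name for name in entities if name.lower() in q}

    scored = []
    for e in edges:
        # 2. One-hop traversal: keep edges touching a matched entity.
        overlap = (e.source in matched) + (e.target in matched)
        # 3. Temporal filter: keep only facts valid at `now`.
        valid = e.valid_from <= now and (e.valid_to is None or now < e.valid_to)
        if overlap and valid:
            scored.append((overlap, e.valid_from, e))

    # 4. Rank by relevance proxy first, then by recency.
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)

    # 5. Return structured results with provenance.
    return [
        {
            "fact": f"{e.source} {e.relation} {e.target}",
            "valid_from": e.valid_from.isoformat(),
            "source_episode": e.source_episode,
        }
        for _, _, e in scored[:limit]
    ]


edges = [
    Edge("Alice", "works_at", "Acme", datetime(2024, 1, 1), datetime(2024, 6, 1), "ep1"),
    Edge("Alice", "works_at", "Globex", datetime(2024, 6, 1), None, "ep2"),
    Edge("Bob", "uses", "Graphiti", datetime(2024, 3, 1), None, "ep3"),
]
results = retrieve("Where does Alice work?",
                   ["Alice", "Bob", "Acme", "Globex", "Graphiti"],
                   edges, now=datetime(2024, 7, 1))
```

Every step is a set lookup, a comparison, or a sort, which is why a read path of this shape can stay fast and predictable.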
Key Design Decisions
- LLM at write, not read: Expensive extraction happens once during ingestion; retrieval is pure graph traversal.
- Temporal validity: Facts have time ranges, enabling "what was true at time T?" queries.
- Episode-level provenance: Every fact traces back to the conversation that created it.
- Incremental updates: New information updates the graph; it doesn't rebuild it.
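The "what was true at time T?" query follows directly from storing a validity range on each fact. A minimal sketch, assuming half-open intervals (valid_from inclusive, valid_to exclusive) and a hypothetical Fact record:

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class Fact(NamedTuple):
    statement: str
    valid_from: datetime
    valid_to: Optional[datetime]  # None means still valid
    source_episode: str


def facts_as_of(facts: List[Fact], t: datetime) -> List[Fact]:
    """Return the facts whose validity range contains time t."""
    return [
        f for f in facts
        if f.valid_from <= t and (f.valid_to is None or t < f.valid_to)
    ]


history = [
    Fact("Alice works_at Acme", datetime(2024, 1, 1), datetime(2024, 6, 1), "ep1"),
    Fact("Alice works_at Globex", datetime(2024, 6, 1), None, "ep2"),
]

in_march = facts_as_of(history, datetime(2024, 3, 1))
in_july = facts_as_of(history, datetime(2024, 7, 1))
```

With half-open ranges, the moment a fact is invalidated belongs to its successor, so a query at exactly that instant returns one answer rather than two.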
Performance
- P95 retrieval latency: ~300ms (no LLM in the loop)
- Graph queries: Sub-second even on large graphs
- Write latency: Higher (1-3s) due to LLM extraction, but acceptable for background processing
Relevance to Memory Platform
Graphiti's approach informs several platform design decisions:
- Separate write and read complexity: Our API can do expensive processing on create/update but keep retrieval fast.
- Temporal awareness: Our created_at/updated_at + confidence scores serve a similar purpose; a graph layer could enhance this.
- Provenance: Our session_id and source_type fields provide episode-level tracking.
- No-LLM retrieval: Our FTS/SQL retrieval path is already LLM-free; the agent decides what to search for.
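To make the mapping concrete, here is a minimal sketch of an LLM-free lookup over the fields named above, using Python's built-in sqlite3 module. The memories table, its content column, the search helper, and the sample rows are hypothetical, and a plain LIKE filter stands in for the real full-text search so the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE memories (
        id          INTEGER PRIMARY KEY,
        content     TEXT NOT NULL,
        session_id  TEXT NOT NULL,   -- provenance: which session produced it
        source_type TEXT NOT NULL,   -- provenance: how it was captured
        confidence  REAL NOT NULL,
        created_at  TEXT NOT NULL,   -- ISO 8601, so text order is time order
        updated_at  TEXT NOT NULL
    )
    """
)


def search(conn, term, min_confidence=0.5, limit=5):
    """Keyword lookup ranked by recency, then confidence. No model call involved."""
    return conn.execute(
        """
        SELECT content, session_id, source_type, confidence, updated_at
        FROM memories
        WHERE content LIKE ? AND confidence >= ?
        ORDER BY updated_at DESC, confidence DESC
        LIMIT ?
        """,
        (f"%{term}%", min_confidence, limit),
    ).fetchall()


# Illustrative rows only.
conn.executemany(
    "INSERT INTO memories (content, session_id, source_type, confidence, created_at, updated_at) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Team uses Postgres", "s0", "conversation", 0.8,
         "2024-01-01T00:00:00", "2024-01-01T00:00:00"),
        ("Team migrated to Postgres 16", "s1", "conversation", 0.9,
         "2024-05-01T00:00:00", "2024-05-01T00:00:00"),
        ("Postgres might be slow", "s2", "inference", 0.2,
         "2024-06-01T00:00:00", "2024-06-01T00:00:00"),
    ],
)
results = search(conn, "postgres")
```

Here the confidence threshold and the updated_at ordering play the role that validity ranges play in Graphiti; unlike a graph with valid_to, though, this schema cannot answer point-in-time questions about superseded facts.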
References
- Zep Graphiti documentation and architecture
- Temporal knowledge graphs for conversational AI
- The "LLM at write, graph at read" pattern is increasingly common in production systems