Zep / Graphiti: Temporal Knowledge Graphs for Agent Memory

Zep's Graphiti is a temporal knowledge graph system designed for agent memory. Its standout feature is that it makes no LLM calls at retrieval time, achieving a P95 latency of roughly 300ms for memory lookups.

Core Architecture

Graphiti builds a knowledge graph where:

  • Nodes are entities (people, concepts, projects, tools).
  • Edges are relationships with temporal metadata (when the relationship was established, modified, or invalidated).
  • Episodes are conversation segments that sourced the knowledge.

Ingestion (Write Path - uses LLM)

  1. Extract entities and relationships from conversation using an LLM.
  2. Resolve entities against existing graph (deduplication + merging).
  3. Add temporal metadata (valid_from, valid_to, source_episode).
  4. Update graph edges and invalidate contradicted facts.
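Steps 2-4 can be sketched as follows, assuming step 1's LLM extraction has already produced (subject, predicate, object) triples. The function and field names are hypothetical, and for simplicity each (subject, predicate) pair is treated as single-valued, so a new object invalidates the old fact:

```python
from datetime import datetime, timezone

def ingest(triples, graph, episode_id):
    """Merge extracted (subject, predicate, object) triples into the graph."""
    now = datetime.now(timezone.utc)
    for subj, pred, obj in triples:
        # Step 2: entity resolution, deduping by normalized name.
        for name in (subj, obj):
            graph["entities"].setdefault(name.lower(), {"name": name})
        # Step 4: invalidate contradicted facts. A new object for the same
        # (subject, predicate) closes the old edge's validity window.
        for edge in graph["edges"]:
            if (edge["subj"] == subj.lower() and edge["pred"] == pred
                    and edge["valid_to"] is None
                    and edge["obj"] != obj.lower()):
                edge["valid_to"] = now
        # Step 3: attach temporal metadata and episode provenance.
        graph["edges"].append({
            "subj": subj.lower(), "pred": pred, "obj": obj.lower(),
            "valid_from": now, "valid_to": None,
            "source_episode": episode_id,
        })
```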

Retrieval (Read Path - no LLM)

  1. Parse the query into entity mentions using lightweight NLP (no LLM).
  2. Traverse the graph from matched entities.
  3. Filter by temporal validity (only return currently-valid facts).
  4. Rank by relevance and recency.
  5. Return structured results with provenance.
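A sketch of this read path, with plain token matching standing in for the lightweight NLP step, and edges represented as dicts carrying the temporal fields from the write path (all names illustrative):

```python
from datetime import datetime, timezone

def retrieve(query, edges, at_time=None):
    """LLM-free lookup: match entities, traverse, filter by validity, rank."""
    at_time = at_time or datetime.now(timezone.utc)
    mentions = {w.lower() for w in query.split()}      # 1. entity mentions
    hits = []
    for e in edges:                                    # 2. graph traversal
        if e["subj"] in mentions or e["obj"] in mentions:
            # 3. temporal filter: keep only facts valid at at_time
            if e["valid_from"] <= at_time and (
                    e["valid_to"] is None or at_time < e["valid_to"]):
                hits.append(e)
    # 4. rank (recency only here; a real ranker would also score relevance)
    hits.sort(key=lambda e: e["valid_from"], reverse=True)
    # 5. structured results with provenance
    return [{"fact": (e["subj"], e["pred"], e["obj"]),
             "since": e["valid_from"],
             "episode": e["source_episode"]} for e in hits]
```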

Key Design Decisions

  • LLM at write, not read: Expensive extraction happens once during ingestion; retrieval is pure graph traversal.
  • Temporal validity: Facts have time ranges, enabling "what was true at time T?" queries.
  • Episode-level provenance: Every fact traces back to the conversation that created it.
  • Incremental updates: New information updates the graph; it doesn't rebuild it.
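The temporal-validity design makes "what was true at time T?" an interval-containment check rather than an LLM call. A minimal illustration with made-up facts:

```python
from datetime import datetime, timezone

def valid_at(fact, t):
    """A fact holds at time t if t falls inside its validity window."""
    return fact["valid_from"] <= t and (
        fact["valid_to"] is None or t < fact["valid_to"])

# Hypothetical facts: Alice moved from ProjectX to ProjectY on 2024-06-01.
facts = [
    {"fact": "alice works_on projectx",
     "valid_from": datetime(2024, 1, 1, tzinfo=timezone.utc),
     "valid_to": datetime(2024, 6, 1, tzinfo=timezone.utc)},
    {"fact": "alice works_on projecty",
     "valid_from": datetime(2024, 6, 1, tzinfo=timezone.utc),
     "valid_to": None},
]

march = datetime(2024, 3, 1, tzinfo=timezone.utc)
print([f["fact"] for f in facts if valid_at(f, march)])
# prints ['alice works_on projectx']
```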

Performance

  • P95 retrieval latency: ~300ms (no LLM in the loop)
  • Graph queries: Sub-second even on large graphs
  • Write latency: Higher (1-3s) due to LLM extraction, but acceptable for background processing

Relevance to Memory Platform

Graphiti's approach informs several platform design decisions:

  • Separate write and read complexity: Our API can do expensive processing on create/update but keep retrieval fast.
  • Temporal awareness: Our created_at/updated_at + confidence scores serve a similar purpose; a graph layer could enhance this.
  • Provenance: Our session_id and source_type fields provide episode-level tracking.
  • No-LLM retrieval: Our FTS/SQL retrieval path is already LLM-free; the agent decides what to search for.
