Memory Platform: Technical Architecture

A complete long-term memory system for AI agents. Write once, retrieve from anywhere — via REST, MCP, or CLI. The system autonomously evolves its own retrieval quality.


How It Works

Memory has three layers. You interact with the first. The other two run themselves.

Layer 1 is what you see. Create memories, search, browse. The API shapes never change.

Layer 2 decides what to return. Full-text search, vector similarity, and knowledge graph traversal are fused together to find the best results.

Layer 3 runs on a daily cron (9:00 UTC). It embeds memories, extracts entities, merges duplicates, decays stale knowledge, and optimizes retrieval, all without human intervention.


Research Foundations

Every major design decision maps to a published paper or system.

| Decision | Research basis | What we implemented |
|---|---|---|
| Memory taxonomy (episodic, semantic, procedural) | Memory Survey Dec 2025: unified forms/functions/dynamics framework | Categories: fact, decision, pattern, lesson, procedure, episode, note |
| Structured derivation (event logs + foresight) | EverMemOS: event atomicization + future-oriented lane | POST /memories/:id/derive extracts atomic facts + deadlines |
| Hybrid vector + FTS retrieval | Mem0: 26% over OpenAI baseline with hybrid three-store | Reciprocal Rank Fusion over FTS + pgvector (1536-dim) |
| Temporal knowledge graph | Zep/Graphiti: no LLM at retrieval, P95 ~300ms | Entity extraction + entity_links with temporal validity |
| Self-organizing links | A-MEM: Zettelkasten for agents (NeurIPS 2025) | entity_links table: related, supports, contradicts, supersedes |
| Retrieval bandit optimization | UCB1 multi-armed bandit literature | Memory Arena: 6 arms evaluated against real agent sessions |
| Privacy-aware forgetting | MaRS/FiFA: hybrid time-decay + importance | Consolidation: decay stale, merge duplicates, promote quality |
| Tenant memory isolation | MEXTRA: prompt injection extraction attacks | SQL-level (tenant_type, tenant_id) on every query |

Data Flow

Write path

The write path is fast — INSERT + audit event. The expensive work (embedding, derivation, graph extraction) happens in background evolution jobs.

Read path

FTS always runs. Vector search runs in parallel when OPENAI_API_KEY is configured. If no embeddings exist, the query falls back to pure FTS instead of failing.
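
The fusion step can be illustrated with a minimal Reciprocal Rank Fusion sketch. It is not the platform's implementation: the function name, the sample IDs, and the constant k = 60 (a commonly used default) are assumptions. Passing only the FTS list mirrors the fallback case where no vector results are available.

```typescript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) for every ID it contains.
// k = 60 is an assumed default; the platform's actual constant is not documented here.
function reciprocalRankFusion(lists: string[][], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map((entry) => entry[0]);
}

// Hypothetical memory IDs, best hit first in each list.
const ftsHits = ["m1", "m2", "m3"];
const vectorHits = ["m3", "m1", "m4"];

const fused = reciprocalRankFusion([ftsHits, vectorHits]);
const ftsOnly = reciprocalRankFusion([ftsHits]); // no embeddings: FTS order is preserved
```

Items that rank well in both lists (here m1 and m3) rise above items that appear in only one.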

Retrieval modes

| Mode | Strategy | Latency | Use case |
|---|---|---|---|
| fast | FTS only | ~10ms | High-frequency agent lookups |
| balanced | FTS + vector + ILIKE fallback | ~50ms | Default for most queries |
| deep | FTS + vector + graph expansion | ~100ms | Complex multi-hop questions |
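
One way to read the modes table is as a small planning function that decides which retrieval steps run. The sketch below is illustrative only: the type names, the `planRetrieval` function, and the `vectorAvailable` flag (standing in for "OPENAI_API_KEY is configured and embeddings exist") are assumptions.

```typescript
type RetrievalMode = "fast" | "balanced" | "deep";

interface RetrievalPlan {
  fts: boolean;
  vector: boolean;
  ilikeFallback: boolean;
  graphExpansion: boolean;
}

// Maps a mode to the steps listed in the table above.
function planRetrieval(mode: RetrievalMode, vectorAvailable: boolean): RetrievalPlan {
  return {
    fts: true, // FTS always runs
    vector: mode !== "fast" && vectorAvailable, // skipped when embeddings are unavailable
    ilikeFallback: mode === "balanced",
    graphExpansion: mode === "deep",
  };
}

const fastPlan = planRetrieval("fast", true);
const balancedNoVectors = planRetrieval("balanced", false);
const deepPlan = planRetrieval("deep", true);
```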

Evolution System

The evolution phase of the daily cron (9:00 UTC) does two things:

1. Schedule — inspect each tenant's state, enqueue work:

| Priority | Job | When it triggers | What it does |
|---|---|---|---|
| 10 | Arena | Project has sessions, no arena in 24h | Evaluate 6 retrieval strategies via UCB1 bandit |
| 8 | Embedding | Active memories without embeddings | Generate 1536-dim vectors via OpenAI |
| 6 | Derivation | Active memories without LLM derivation | Extract facts, foresight, entities via Claude |
| 4 | Consolidation | Near-duplicate embeddings (cosine > 0.92) | Merge duplicates, decay stale, promote quality |
| 2 | Graph | Active memories without entity extraction | Build knowledge graph (entities + relationships) |
| 1 | Learning | No propagation in 7 days | Spread arena winners: global > tenant > project |

2. Execute — claim up to 5 jobs, 25-second budget, highest priority first.

Jobs are claimed with SELECT FOR UPDATE SKIP LOCKED — safe for concurrent execution.
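
The execute step might look roughly like the sketch below. Only the SKIP LOCKED clause, the 5-job cap, the 25-second budget, and the priority ordering come from the text; the `id` column, the UPDATE ... RETURNING shape, and the names `ClaimedJob` and `runWithinBudget` are assumptions made for illustration.

```typescript
// Claim statement sketch (PostgreSQL). Rows locked by another worker are skipped, not waited on.
const CLAIM_JOBS_SQL = `
UPDATE evolution_jobs
SET status = 'running'
WHERE id IN (
  SELECT id FROM evolution_jobs
  WHERE status = 'pending'
  ORDER BY priority DESC
  LIMIT 5
  FOR UPDATE SKIP LOCKED
)
RETURNING id, job_type, priority`;

interface ClaimedJob {
  id: string;
  jobType: string;
  priority: number;
}

// Runs claimed jobs highest priority first and stops starting new ones once the budget is spent.
async function runWithinBudget(
  jobs: ClaimedJob[],
  handler: (job: ClaimedJob) => Promise<void>,
  budgetMs: number = 25000,
  now: () => number = Date.now, // injectable clock, handy for tests
): Promise<string[]> {
  const deadline = now() + budgetMs;
  const finished: string[] = [];
  const ordered = jobs.slice().sort((a, b) => b.priority - a.priority);
  for (const job of ordered) {
    if (now() >= deadline) break; // out of budget: do not start another job
    await handler(job);
    finished.push(job.id);
  }
  return finished;
}
```

The budget is checked before each job starts, so a long-running job can still overrun it; a stricter version would need per-job timeouts.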

Memory Arena

The arena evaluates how well each retrieval strategy performs on real agent conversations.

Arms (strategies being compared):

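| Arm | Mode | Documents | Vector |
|---|---|---|---|
| fast_memories | fast | no | no |
| balanced_memories | balanced | no | no |
| deep_memories | deep | no | no |
| balanced_hybrid | balanced | yes | no |
| deep_hybrid | deep | yes | no |
| balanced_vector | balanced | no | yes |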

Evaluation — each arm is tested against episodes extracted from real sessions:

episode = (user query, assistant response with evidence references)

score = recall * 0.55      (did we find what the agent actually used?)
      + precision * 0.15   (were results relevant?)
      + doc_recall * 0.20  (did we find the right documents?)
      + latency * 0.07     (was it fast?)
      + diversity * 0.03   (were results diverse?)
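
The weighted sum above translates directly into code. This sketch assumes every component is already normalized to the range 0 to 1, with a higher latency component meaning a faster response; the interface and function names are hypothetical.

```typescript
interface EpisodeMetrics {
  recall: number;
  precision: number;
  docRecall: number;
  latency: number; // assumed normalized so that 1 is fastest
  diversity: number;
}

// Weights copied from the formula above; they sum to 1.
const SCORE_WEIGHTS: EpisodeMetrics = {
  recall: 0.55,
  precision: 0.15,
  docRecall: 0.2,
  latency: 0.07,
  diversity: 0.03,
};

function arenaScore(m: EpisodeMetrics): number {
  return (
    m.recall * SCORE_WEIGHTS.recall +
    m.precision * SCORE_WEIGHTS.precision +
    m.docRecall * SCORE_WEIGHTS.docRecall +
    m.latency * SCORE_WEIGHTS.latency +
    m.diversity * SCORE_WEIGHTS.diversity
  );
}

// Hypothetical episode: 0.44 + 0.075 + 0.2 + 0.063 + 0.012 = 0.79
const exampleScore = arenaScore({
  recall: 0.8,
  precision: 0.5,
  docRecall: 1,
  latency: 0.9,
  diversity: 0.4,
});
```

Because recall carries more than half the weight, an arm that misses the evidence the agent actually used cannot recover through speed or diversity alone.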

Bandit — UCB1 balances exploitation vs exploration:

ucb_score = mean_reward + 0.35 * sqrt(ln(total_pulls + 1) / pulls)
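
A minimal sketch of arm selection with this formula follows. The treatment of arms with zero pulls (tried first, since the formula would otherwise divide by zero) is an assumption, as are the type and function names and the sample statistics.

```typescript
interface ArmStats {
  arm: string;
  pulls: number;
  meanReward: number;
}

// ucb_score = mean_reward + c * sqrt(ln(total_pulls + 1) / pulls), with c = 0.35 from the text.
function ucbScore(stats: ArmStats, totalPulls: number, c: number = 0.35): number {
  if (stats.pulls === 0) return Number.POSITIVE_INFINITY; // assumption: unexplored arms go first
  return stats.meanReward + c * Math.sqrt(Math.log(totalPulls + 1) / stats.pulls);
}

function selectArm(arms: ArmStats[]): ArmStats {
  const totalPulls = arms.reduce((sum, a) => sum + a.pulls, 0);
  return arms.reduce((best, a) =>
    ucbScore(a, totalPulls) > ucbScore(best, totalPulls) ? a : best,
  );
}

// Hypothetical statistics: balanced_hybrid has a lower mean but far fewer pulls.
const nextArm = selectArm([
  { arm: "fast_memories", pulls: 10, meanReward: 0.6 },
  { arm: "balanced_hybrid", pulls: 2, meanReward: 0.55 },
  { arm: "deep_hybrid", pulls: 10, meanReward: 0.5 },
]);
```

In this example the exploration bonus outweighs the small gap in mean reward, so the under-sampled balanced_hybrid arm is chosen next.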

Winners are stored per-project. New projects inherit: project > tenant > global default.
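
The inheritance order can be sketched as a simple fallback chain. The store shape, key formats, and the choice of "balanced_memories" as the global default are hypothetical and only illustrate the lookup order.

```typescript
interface WinnerStore {
  byProject: { [projectId: string]: string | undefined };
  byTenant: { [tenantKey: string]: string | undefined };
  globalDefault: string;
}

// Most specific winner first: project, then tenant, then the global default.
function resolveWinningArm(store: WinnerStore, tenantKey: string, projectId: string): string {
  return store.byProject[projectId] ?? store.byTenant[tenantKey] ?? store.globalDefault;
}

// Hypothetical data.
const winners: WinnerStore = {
  byProject: { "proj-a": "deep_hybrid" },
  byTenant: { "org:acme": "balanced_hybrid" },
  globalDefault: "balanced_memories",
};

const existingProject = resolveWinningArm(winners, "org:acme", "proj-a");
const newProject = resolveWinningArm(winners, "org:acme", "proj-new");
const newTenant = resolveWinningArm(winners, "user:someone", "proj-x");
```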

Consolidation

Three actions, all recorded in the audit trail:

| Action | Trigger | Effect |
|---|---|---|
| Merge | Cosine similarity > 0.92 | LLM-summarized merge, originals superseded |
| Decay | Zero access + stale > 60 days | confidence -= 0.15 (floor at 0.1) |
| Promote | access_count >= 5 + confidence >= 0.7 | quality = 'good' (boosts retrieval ranking) |
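
The three rules can be sketched as pure functions. Thresholds are taken from the table; measuring staleness as days since the last update is an assumption, the LLM-summarized merge itself is out of scope, and all names are hypothetical.

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Merge candidates: embeddings closer than the 0.92 threshold.
function shouldMerge(a: number[], b: number[], threshold: number = 0.92): boolean {
  return cosineSimilarity(a, b) > threshold;
}

interface MemoryStats {
  confidence: number;
  accessCount: number;
  daysSinceUpdate: number; // assumed definition of "stale"
  quality: "unknown" | "good" | "bad";
}

// Applies decay, then promotion, and returns a new object.
function applyDecayAndPromote(m: MemoryStats): MemoryStats {
  const next: MemoryStats = { ...m };
  if (m.accessCount === 0 && m.daysSinceUpdate > 60) {
    next.confidence = Math.max(0.1, m.confidence - 0.15); // floor at 0.1
  }
  if (next.accessCount >= 5 && next.confidence >= 0.7) {
    next.quality = "good";
  }
  return next;
}
```

Decay and promotion cannot both fire for the same memory in one pass, since decay requires zero accesses and promotion requires at least five.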

Data Model

Core tables

memories
  id             UUID PK
  tenant_type    TEXT          -- 'user' or 'org'
  tenant_id      TEXT          -- Clerk user/org ID
  project_id     UUID FK
  session_id     UUID FK (nullable)
  category       TEXT          -- fact, decision, pattern, lesson, ...
  title          TEXT
  content        TEXT (markdown)
  tags           JSONB []
  context        JSONB {}
  confidence     FLOAT 0-1
  access_count   INT
  state          TEXT          -- active | superseded | quarantined
  quality        TEXT          -- unknown | good | bad

memory_embeddings
  memory_id      UUID PK FK → memories
  embedding      vector(1536)  -- pgvector, HNSW index
  content_hash   TEXT          -- staleness detection

entities
  id             UUID PK
  entity_type    TEXT          -- person | system | concept | technology | api | file
  name           TEXT
  mention_count  INT
  UNIQUE(tenant, project, type, name_normalized)

entity_links
  from_type/id   TEXT/UUID     -- memory, entity, artifact
  to_type/id     TEXT/UUID
  relation       TEXT          -- related, supports, contradicts, supersedes, merged_into, mentions
  confidence     FLOAT
  valid_from/to  TIMESTAMPTZ   -- temporal validity

memory_events                  -- immutable audit trail
  memory_id      UUID (no FK — history outlives data)
  event_type     TEXT          -- create, update, delete, derive, llm_derive, graph_extract, consolidation_*
  event_data     JSONB
  created_by     TEXT

evolution_jobs                 -- background job queue
  job_type       TEXT          -- arena, embedding, derivation, consolidation, graph, learning_propagation
  status         TEXT          -- pending, running, completed, failed
  priority       INT
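
To show how the valid_from/to columns on entity_links could be used, here is a small point-in-time filter. It assumes a NULL bound means "open-ended" and that the interval is half-open (valid_from inclusive, valid_to exclusive); neither convention is stated in the schema, and the type and function names are hypothetical.

```typescript
interface EntityLink {
  relation: string;
  validFrom: Date | null; // null: valid since the beginning (assumption)
  validTo: Date | null; // null: still valid (assumption)
}

function isValidAt(link: EntityLink, at: Date): boolean {
  const started = link.validFrom === null || link.validFrom.getTime() <= at.getTime();
  const notEnded = link.validTo === null || at.getTime() < link.validTo.getTime();
  return started && notEnded;
}

// Hypothetical links: the first was superseded at the start of 2025.
const links: EntityLink[] = [
  { relation: "supports", validFrom: new Date("2024-01-01T00:00:00Z"), validTo: new Date("2025-01-01T00:00:00Z") },
  { relation: "supersedes", validFrom: new Date("2025-01-01T00:00:00Z"), validTo: null },
];

const asOfMid2024 = links.filter((l) => isValidAt(l, new Date("2024-06-01T00:00:00Z")));
const asOfMid2025 = links.filter((l) => isValidAt(l, new Date("2025-06-01T00:00:00Z")));
```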

Tenant isolation

Every query includes WHERE tenant_type = $1 AND tenant_id = $2. This is enforced at the data access layer — there is no code path that can bypass it.

  • Organization scope: Shared across org members. Projects, memories, and tokens created under an org are visible to all members.
  • Personal scope: Private to a single user. Completely isolated.
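
One way a data access layer can make the tenant predicate unavoidable is to have a single query builder that always emits it. The sketch below is an assumption about how that might look, not the platform's code: `scopedSelect`, the table whitelist, and the placeholder numbering convention are all hypothetical.

```typescript
interface Tenant {
  type: "user" | "org";
  id: string;
}

interface ScopedQuery {
  text: string;
  values: unknown[];
}

// Only known table names are accepted, so nothing user-supplied is interpolated into SQL.
type TenantTable = "memories" | "entities" | "entity_links";

// $1 and $2 are always the tenant predicates; extra conditions must start at $3.
function scopedSelect(
  table: TenantTable,
  tenant: Tenant,
  extraWhere?: string,
  extraValues: unknown[] = [],
): ScopedQuery {
  const conditions = ["tenant_type = $1", "tenant_id = $2"];
  if (extraWhere) conditions.push("(" + extraWhere + ")");
  return {
    text: "SELECT * FROM " + table + " WHERE " + conditions.join(" AND "),
    values: [tenant.type, tenant.id, ...extraValues],
  };
}

// Hypothetical org ID.
const activeMemories = scopedSelect("memories", { type: "org", id: "org_123" }, "state = $3", ["active"]);
```

Wrapping the extra condition in parentheses keeps an OR inside it from widening the tenant scope.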

API Surface

REST API

| Method | Path | Description |
|---|---|---|
| GET | /api/memories | List/search memories |
| POST | /api/memories | Create memory |
| GET | /api/memories/:id | Get memory (increments access_count) |
| PUT | /api/memories/:id | Update memory |
| DELETE | /api/memories/:id | Delete memory |
| POST | /api/memories/:id/derive | Derive facts + foresight |
| POST | /api/memories/:id/lifecycle | Set state/quality |
| GET | /api/memories/search-index | Hybrid search (FTS + vector + RRF) |
| GET | /api/memories/timeline | Time-ordered feed |
| POST | /api/memories/batch-get | Fetch multiple by ID |
| GET | /api/memories/foresight/active | Upcoming deadlines |
| GET/POST | /api/projects | List/create projects |
| GET/POST | /api/sessions | List/start sessions |
| GET/POST | /api/evolve/* | Arena, signals, jobs |
| POST | /api/agent/ask | Agent query (retrieval + reasoning) |

MCP JSON-RPC (/mcp)

11 tools for LLM agents:

| Tool | Description |
|---|---|
| projects.list / projects.create | Project management |
| memories.search_index | Hybrid search (compact ranked hits) |
| memories.get / memories.create / memories.list | Memory CRUD |
| memories.batch_get | Fetch multiple memories |
| memories.timeline | Time-ordered browsing |
| memories.derive | Extract facts + foresight |
| memories.foresight_active | Upcoming deadlines |
| memories.providers | Available search providers |

Protocol: JSON-RPC 2.0, MCP protocol versions 2025-03-26 and 2024-11-05. Auth: OAuth 2.1 PKCE with dynamic client registration.
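
A tool invocation over this endpoint is a JSON-RPC 2.0 request using MCP's `tools/call` method. The sketch below builds such an envelope for memories.search_index. The argument names `query` and `memory_mode` are assumptions that mirror the CLI flags; the tool's real input schema should be read from a `tools/list` response.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: { [key: string]: unknown };
}

// Wraps a tool name and its arguments in an MCP tools/call request.
function buildToolCall(id: number, name: string, args: { [key: string]: unknown }): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Argument names are hypothetical, modeled on the CLI's --query and --memory-mode flags.
const searchRequest = buildToolCall(1, "memories.search_index", {
  query: "auth bug",
  memory_mode: "balanced",
});

// Body to POST to /mcp with the OAuth access token in the Authorization header.
const searchRequestBody = JSON.stringify(searchRequest);
```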

CLI

npm i -g @pajamadot/pajama
pajama login
pajama memories search-index --query "auth bug" --memory-mode balanced
pajama memories create --project-id <id> --category fact --title "..." --content "..."
pajama evolve arena-campaign --max-projects 10 --time-budget-ms 600000

Infrastructure

| Component | Technology | Purpose |
|---|---|---|
| Web | Next.js 16 on Vercel | Dashboard, docs, research |
| API | Cloudflare Workers + Hono | REST, MCP, OAuth, cron |
| Database | Neon Postgres + pgvector + Hyperdrive | Storage, search, embeddings |
| Storage | Cloudflare R2 | Large files (logs, artifacts) |
| Agent | Cloudflare Sandbox (Durable Objects) | Streaming multi-turn sessions |
| Auth | Clerk | JWT, orgs, user management |
| LLM | Claude (Anthropic) | Agent reasoning, derivation, graph extraction |
| Embeddings | OpenAI text-embedding-3-small | 1536-dim vectors for semantic search |

Migrations

14 migrations from initial schema to knowledge graph + performance indexes. Applied in sequence:

0001 init > 0002 multi-tenant > 0003 auth > 0004 assets > 0005 FTS > 0006 audit > 0007-0008 indexes > 0009 arena policies > 0010 project types > 0011 evolution jobs > 0012 pgvector > 0013 knowledge graph > 0014 performance indexes

Cron (daily, 9:00 UTC)

Three phases, in order:

  1. Research digests — fetch arXiv + GitHub feeds, store as memories
  2. New projects radar — same pattern, discovery feeds
  3. Evolution — schedule + execute up to 5 background jobs

Security & Privacy

| Layer | Implementation |
|---|---|
| Auth (web) | Clerk JWT verified against JWKS |
| Auth (API) | Scoped Bearer tokens |
| Auth (MCP) | OAuth 2.1 PKCE with dynamic client registration |
| Tenant isolation | SQL-level WHERE tenant_type = $1 AND tenant_id = $2 |
| Encryption at rest | AES-256 (Neon + R2) |
| Encryption in transit | TLS 1.3 (all connections) |
| Memory lifecycle | quarantine (GDPR right to restriction), superseded (soft delete) |
| Audit trail | memory_events: immutable, no FK, survives deletes |
| Cross-tenant learning | Anonymized aggregates only (arm IDs, confidence scores) |

Full threat model and academic references: Memory Privacy & Data Protection.


E2E Test Coverage

All tests live in the e2e/ directory and run via Playwright:

| Suite | What it tests |
|---|---|
| smoke-public.spec.ts | Every public page renders (home, docs, research, evolve, agent, assets, OAuth, settings) |
| smoke-live-api-mcp.spec.ts | API health, MCP OAuth discovery, agent metadata, auth protection, provider list, agent diagnostics, MCP tools/list |
| mcp-integration.spec.ts | Full MCP CRUD lifecycle: project create > memory create > search > get > batch_get > timeline > derive > list > foresight. Batch requests. Error handling. |
| cli-integration.spec.ts | All CLI subcommands (help output), authenticated operations (search, timeline, providers, evolve policy, agent ask, foresight) |

Run with:

E2E_LIVE=true E2E_API_TOKEN=gdm_... npx playwright test