Memory Platform: Technical Architecture
A complete long-term memory system for AI agents. Write once, retrieve from anywhere — via REST, MCP, or CLI. The system autonomously evolves its own retrieval quality.
How It Works
Memory has three layers. You interact with the first. The other two run themselves.
Layer 1 is what you see. Create memories, search, browse. The API shapes never change.
Layer 2 decides what to return. Full-text search, vector similarity, and knowledge graph traversal are fused together to find the best results.
Layer 3 runs once a day (cron at 9:00 UTC). It embeds memories, extracts entities, merges duplicates, decays stale knowledge, and optimizes retrieval, all without human intervention.
Research Foundations
Every major design decision maps to a published paper or system.
| Decision | Research basis | What we implemented |
|---|---|---|
| Memory taxonomy (episodic, semantic, procedural) | Memory Survey Dec 2025 — unified forms/functions/dynamics framework | Categories: fact, decision, pattern, lesson, procedure, episode, note |
| Structured derivation (event logs + foresight) | EverMemOS — event atomicization + future-oriented lane | POST /memories/:id/derive extracts atomic facts + deadlines |
| Hybrid vector + FTS retrieval | Mem0 — 26% over OpenAI baseline with hybrid three-store | Reciprocal Rank Fusion over FTS + pgvector (1536-dim) |
| Temporal knowledge graph | Zep/Graphiti — no LLM at retrieval, P95 ~300ms | Entity extraction + entity_links with temporal validity |
| Self-organizing links | A-MEM — Zettelkasten for agents (NeurIPS 2025) | entity_links table: related, supports, contradicts, supersedes |
| Retrieval bandit optimization | UCB1 multi-armed bandit literature | Memory Arena: 6 arms evaluated against real agent sessions |
| Privacy-aware forgetting | MaRS/FiFA — hybrid time-decay + importance | Consolidation: decay stale, merge duplicates, promote quality |
| Tenant memory isolation | MEXTRA — prompt injection extraction attacks | SQL-level (tenant_type, tenant_id) on every query |
Data Flow
Write path
The write path is fast — INSERT + audit event. The expensive work (embedding, derivation, graph extraction) happens in background evolution jobs.
Read path
FTS always runs. Vector search runs in parallel when OPENAI_API_KEY is configured. If no embeddings exist, the query falls back to pure FTS and still completes without errors.
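The fusion step on the read path can be illustrated with a minimal Reciprocal Rank Fusion sketch. The function name, the memory IDs, and the constant k=60 (a value commonly used in RRF write-ups) are assumptions for illustration, not the platform's actual code.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of memory IDs into one list.

    Each list contributes 1 / (k + rank) per item; k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so output is deterministic.
    return sorted(scores, key=lambda m: (-scores[m], m))

fts_hits = ["m1", "m2", "m3"]      # hypothetical full-text results
vector_hits = ["m3", "m1", "m4"]   # hypothetical vector results
fused = reciprocal_rank_fusion([fts_hits, vector_hits])
```

When only the FTS list is passed, the fused order equals the FTS order, which mirrors the pure-FTS fallback described above.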
Retrieval modes
| Mode | Strategy | Latency | Use case |
|---|---|---|---|
| fast | FTS only | ~10ms | High-frequency agent lookups |
| balanced | FTS + vector + ILIKE fallback | ~50ms | Default for most queries |
| deep | FTS + vector + graph expansion | ~100ms | Complex multi-hop questions |
Evolution System
The evolution phase of the daily cron (9:00 UTC) does two things:
1. Schedule — inspect each tenant's state, enqueue work:
| Priority | Job | When it triggers | What it does |
|---|---|---|---|
| 10 | Arena | Project has sessions, no arena in 24h | Evaluate 6 retrieval strategies via UCB1 bandit |
| 8 | Embedding | Active memories without embeddings | Generate 1536-dim vectors via OpenAI |
| 6 | Derivation | Active memories without LLM derivation | Extract facts, foresight, entities via Claude |
| 4 | Consolidation | Near-duplicate embeddings (cosine > 0.92) | Merge duplicates, decay stale, promote quality |
| 2 | Graph | Active memories without entity extraction | Build knowledge graph (entities + relationships) |
| 1 | Learning | No propagation in 7 days | Spread arena winners: global > tenant > project |
2. Execute — claim up to 5 jobs, 25-second budget, highest priority first.
Jobs are claimed with SELECT FOR UPDATE SKIP LOCKED — safe for concurrent execution.
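A rough sketch of the execute step follows. The SQL shows one common way to claim rows with SKIP LOCKED in Postgres; `status`, `priority`, and `job_type` come from the evolution_jobs table in this document, while `id`, `created_at`, the function name, and the handler mapping are assumptions.

```python
import time

# Hypothetical claim query. Concurrent workers skip rows another worker has locked.
CLAIM_SQL = """
UPDATE evolution_jobs SET status = 'running'
WHERE id IN (
    SELECT id FROM evolution_jobs
    WHERE status = 'pending'
    ORDER BY priority DESC, created_at
    LIMIT 5
    FOR UPDATE SKIP LOCKED
)
RETURNING id, job_type, priority
"""

def run_batch(claimed_jobs, handlers, budget_s=25.0, clock=time.monotonic):
    """Run claimed jobs, highest priority first, until the time budget is spent."""
    started = clock()
    completed = []
    for job in sorted(claimed_jobs, key=lambda j: -j["priority"]):
        if clock() - started >= budget_s:
            break  # budget exhausted; how leftovers are requeued is not specified here
        handlers[job["job_type"]](job)
        completed.append(job["job_type"])
    return completed
```

The clock is injectable so the budget logic can be exercised without waiting 25 real seconds.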
Memory Arena
The arena evaluates how well each retrieval strategy performs on real agent conversations.
Arms (strategies being compared):
| Arm | Mode | Documents | Vector |
|---|---|---|---|
| fast_memories | fast | no | no |
| balanced_memories | balanced | no | no |
| deep_memories | deep | no | no |
| balanced_hybrid | balanced | yes | no |
| deep_hybrid | deep | yes | no |
| balanced_vector | balanced | no | yes |
Evaluation — each arm is tested against episodes extracted from real sessions:
episode = (user query, assistant response with evidence references)
score = recall * 0.55 (did we find what the agent actually used?)
+ precision * 0.15 (were results relevant?)
+ doc_recall * 0.20 (did we find the right documents?)
+ latency * 0.07 (was it fast?)
+ diversity * 0.03 (were results diverse?)
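The weighted score above translates directly into a few lines of code. This sketch assumes every metric, including latency and diversity, has already been normalized to the range 0 to 1 with higher meaning better; the document does not say how that normalization is done.

```python
# Weights taken from the scoring formula above; they sum to 1.0.
WEIGHTS = {
    "recall": 0.55,
    "precision": 0.15,
    "doc_recall": 0.20,
    "latency": 0.07,
    "diversity": 0.03,
}

def arena_score(metrics):
    """Weighted sum of per-episode metrics (each assumed to lie in [0, 1])."""
    return sum(weight * metrics[name] for name, weight in WEIGHTS.items())

# Hypothetical episode result for one arm.
example = {"recall": 0.8, "precision": 0.5, "doc_recall": 1.0, "latency": 0.9, "diversity": 0.4}
example_score = arena_score(example)  # 0.44 + 0.075 + 0.20 + 0.063 + 0.012
```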
Bandit — UCB1 balances exploitation vs exploration:
ucb_score = mean_reward + 0.35 * sqrt(ln(total_pulls + 1) / pulls)
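As a sketch, the UCB formula above can be applied to pick the next arm to evaluate. Treating an arm with zero pulls as infinitely attractive is a standard UCB1 convention but an assumption here, as are the function names and the sample statistics.

```python
import math

def ucb_score(mean_reward, pulls, total_pulls, c=0.35):
    """mean_reward + c * sqrt(ln(total_pulls + 1) / pulls), as in the formula above."""
    if pulls == 0:
        return math.inf  # assumption: untried arms are explored first
    return mean_reward + c * math.sqrt(math.log(total_pulls + 1) / pulls)

def pick_arm(stats):
    """stats maps arm name -> (mean_reward, pulls); returns the arm with the top UCB score."""
    total = sum(pulls for _, pulls in stats.values())
    return max(stats, key=lambda arm: ucb_score(stats[arm][0], stats[arm][1], total))

# Hypothetical statistics: a rarely tried arm wins on its exploration bonus.
stats = {
    "fast_memories": (0.60, 10),
    "balanced_memories": (0.62, 10),
    "deep_hybrid": (0.55, 1),
}
next_arm = pick_arm(stats)
```

With these numbers the lightly sampled deep_hybrid arm is chosen even though its mean reward is lowest, which is the exploration behavior the bonus term is meant to produce.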
Winners are stored per project. A project without its own winner falls back to its tenant's winner, then to the global default (precedence: project > tenant > global).
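Read as a lookup order, the inheritance rule can be sketched as a simple fallback chain. The function name and the use of None for "no winner recorded" are assumptions.

```python
def resolve_policy(project_winner, tenant_winner, global_default):
    """Return the most specific winning arm available: project, then tenant, then global."""
    for candidate in (project_winner, tenant_winner):
        if candidate is not None:
            return candidate
    return global_default

# A new project with no arena history inherits the tenant's winner.
inherited = resolve_policy(None, "deep_hybrid", "balanced_memories")
```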
Consolidation
Three actions, all recorded in the audit trail:
| Action | Trigger | Effect |
|---|---|---|
| Merge | Cosine similarity > 0.92 | LLM-summarized merge, originals superseded |
| Decay | Zero accesses and stale for more than 60 days | confidence -= 0.15 (floor at 0.1) |
| Promote | access_count >= 5 and confidence >= 0.7 | quality = 'good' (boosts retrieval ranking) |
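The three triggers in the table can be expressed as small predicates. This is a sketch under stated assumptions: the function names are hypothetical, `days_stale` stands in for however staleness is actually measured, and the LLM-summarized merge itself is not shown.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def should_merge(embedding_a, embedding_b, threshold=0.92):
    """Merge trigger: cosine similarity strictly above 0.92."""
    return cosine_similarity(embedding_a, embedding_b) > threshold

def decay(confidence, access_count, days_stale):
    """Decay trigger: never accessed and stale for more than 60 days."""
    if access_count == 0 and days_stale > 60:
        return max(0.1, confidence - 0.15)  # floor at 0.1
    return confidence

def should_promote(access_count, confidence):
    """Promote trigger: at least 5 accesses and confidence of at least 0.7."""
    return access_count >= 5 and confidence >= 0.7
```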
Data Model
Core tables
memories
id UUID PK
tenant_type TEXT -- 'user' or 'org'
tenant_id TEXT -- Clerk user/org ID
project_id UUID FK
session_id UUID FK (nullable)
category TEXT -- fact, decision, pattern, lesson, ...
title TEXT
content TEXT (markdown)
tags JSONB []
context JSONB {}
confidence FLOAT 0-1
access_count INT
state TEXT -- active | superseded | quarantined
quality TEXT -- unknown | good | bad
memory_embeddings
memory_id UUID PK FK → memories
embedding vector(1536) -- pgvector, HNSW index
content_hash TEXT -- staleness detection
entities
id UUID PK
entity_type TEXT -- person | system | concept | technology | api | file
name TEXT
mention_count INT
UNIQUE(tenant, project, type, name_normalized)
entity_links
from_type/id TEXT/UUID -- memory, entity, artifact
to_type/id TEXT/UUID
relation TEXT -- related, supports, contradicts, supersedes, merged_into, mentions
confidence FLOAT
valid_from/to TIMESTAMPTZ -- temporal validity
memory_events -- immutable audit trail
memory_id UUID (no FK — history outlives data)
event_type TEXT -- create, update, delete, derive, llm_derive, graph_extract, consolidation_*
event_data JSONB
created_by TEXT
evolution_jobs -- background job queue
job_type TEXT -- arena, embedding, derivation, consolidation, graph, learning_propagation
status TEXT -- pending, running, completed, failed
priority INT
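The content_hash column in memory_embeddings is described as staleness detection. One plausible way to use it, sketched below, is to re-embed a memory only when the hash of its current content differs from the stored one. SHA-256 and both function names are assumptions; the document does not name the hash function.

```python
import hashlib

def content_hash(content: str) -> str:
    """Hex digest of the memory content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def needs_reembedding(content: str, stored_hash) -> bool:
    """True when no embedding exists yet or the content changed since it was embedded."""
    return stored_hash is None or stored_hash != content_hash(content)

stored = content_hash("Auth tokens expire after 24h")
unchanged = needs_reembedding("Auth tokens expire after 24h", stored)
edited = needs_reembedding("Auth tokens expire after 12h", stored)
```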
Tenant isolation
Every query includes WHERE tenant_type = $1 AND tenant_id = $2. This is enforced at the data access layer — there is no code path that can bypass it.
- Organization scope: Shared across org members. Projects, memories, and tokens created under an org are visible to all members.
- Personal scope: Private to a single user. Completely isolated.
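One way a data access layer can guarantee the tenant predicate is to make it the only path for building queries. The helper below is a hypothetical sketch, not the platform's code: the function name, the table allow-list, and the convention that extra conditions start at $3 are all assumptions.

```python
ALLOWED_TABLES = {"memories", "entities"}  # assumed allow-list of tenant-scoped tables

def scoped_query(table, tenant_type, tenant_id, extra_where="", extra_params=()):
    """Build a SELECT whose first two predicates are always the tenant pair.

    Callers cannot omit the tenant filter; extra conditions use $3 onward.
    """
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table}")
    if tenant_type not in ("user", "org"):
        raise ValueError(f"unknown tenant_type: {tenant_type}")
    sql = f"SELECT * FROM {table} WHERE tenant_type = $1 AND tenant_id = $2"
    if extra_where:
        sql += f" AND ({extra_where})"
    return sql, (tenant_type, tenant_id, *extra_params)

sql, params = scoped_query("memories", "org", "org_123", "state = $3", ("active",))
```

Wrapping the extra condition in parentheses keeps an OR inside it from widening the tenant filter.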
API Surface
REST API
| Method | Path | Description |
|---|---|---|
| GET | /api/memories | List/search memories |
| POST | /api/memories | Create memory |
| GET | /api/memories/:id | Get memory (increments access_count) |
| PUT | /api/memories/:id | Update memory |
| DELETE | /api/memories/:id | Delete memory |
| POST | /api/memories/:id/derive | Derive facts + foresight |
| POST | /api/memories/:id/lifecycle | Set state/quality |
| GET | /api/memories/search-index | Hybrid search (FTS + vector + RRF) |
| GET | /api/memories/timeline | Time-ordered feed |
| POST | /api/memories/batch-get | Fetch multiple by ID |
| GET | /api/memories/foresight/active | Upcoming deadlines |
| GET/POST | /api/projects | List/create projects |
| GET/POST | /api/sessions | List/start sessions |
| GET/POST | /api/evolve/* | Arena, signals, jobs |
| POST | /api/agent/ask | Agent query (retrieval + reasoning) |
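As an illustration of calling the REST API, the sketch below builds a POST /api/memories request with a Bearer token. The host name is a placeholder, and the JSON field names are guesses modeled on the data model and CLI flags in this document; check the actual request schema before relying on them.

```python
import json
import urllib.request

BASE_URL = "https://memory.example.com"  # hypothetical host, replace with the real API origin

def build_create_memory_request(token, project_id, category, title, content):
    """Build (but do not send) a request for POST /api/memories."""
    body = json.dumps({
        "project_id": project_id,  # field names are assumptions
        "category": category,
        "title": title,
        "content": content,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/memories",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

request = build_create_memory_request(
    "example-token", "project-uuid", "fact", "Token expiry", "Auth tokens expire after 24h"
)
```

Passing the request to `urllib.request.urlopen` would send it; that step is omitted so the sketch runs without network access.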
MCP JSON-RPC (/mcp)
11 tools for LLM agents:
| Tool | Description |
|---|---|
| projects.list / projects.create | Project management |
| memories.search_index | Hybrid search (compact ranked hits) |
| memories.get / memories.create / memories.list | Memory CRUD |
| memories.batch_get | Fetch multiple memories |
| memories.timeline | Time-ordered browsing |
| memories.derive | Extract facts + foresight |
| memories.foresight_active | Upcoming deadlines |
| memories.providers | Available search providers |
Protocol: JSON-RPC 2.0; supported MCP protocol versions are 2025-03-26 and 2024-11-05. Auth: OAuth 2.1 PKCE with dynamic client registration.
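A tool invocation over the /mcp endpoint is a JSON-RPC 2.0 message using MCP's `tools/call` method. The sketch below only builds the payload; the helper name is hypothetical and the argument names (`query`) are assumptions about the tool's input schema, which an agent would normally discover via `tools/list`.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need an id, unique per connection

def mcp_tool_call(tool, arguments):
    """Build a JSON-RPC 2.0 request that invokes an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

# Argument names are illustrative, not confirmed by this document.
payload = json.dumps(mcp_tool_call("memories.search_index", {"query": "auth bug"}))
```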
CLI
npm i -g @pajamadot/pajama
pajama login
pajama memories search-index --query "auth bug" --memory-mode balanced
pajama memories create --project-id <id> --category fact --title "..." --content "..."
pajama evolve arena-campaign --max-projects 10 --time-budget-ms 600000
Infrastructure
| Component | Technology | Purpose |
|---|---|---|
| Web | Next.js 16 on Vercel | Dashboard, docs, research |
| API | Cloudflare Workers + Hono | REST, MCP, OAuth, cron |
| Database | Neon Postgres + pgvector + Hyperdrive | Storage, search, embeddings |
| Storage | Cloudflare R2 | Large files (logs, artifacts) |
| Agent | Cloudflare Sandbox (Durable Objects) | Streaming multi-turn sessions |
| Auth | Clerk | JWT, orgs, user management |
| LLM | Claude (Anthropic) | Agent reasoning, derivation, graph extraction |
| Embeddings | OpenAI text-embedding-3-small | 1536-dim vectors for semantic search |
Migrations
14 migrations from initial schema to knowledge graph + performance indexes. Applied in sequence:
0001 init > 0002 multi-tenant > 0003 auth > 0004 assets > 0005 FTS > 0006 audit > 0007-0008 indexes > 0009 arena policies > 0010 project types > 0011 evolution jobs > 0012 pgvector > 0013 knowledge graph > 0014 performance indexes
Cron (daily, 9:00 UTC)
Three phases, in order:
- Research digests — fetch arXiv + GitHub feeds, store as memories
- New projects radar — same pattern, discovery feeds
- Evolution — schedule + execute up to 5 background jobs
Security & Privacy
| Layer | Implementation |
|---|---|
| Auth (web) | Clerk JWT verified against JWKS |
| Auth (API) | Scoped Bearer tokens |
| Auth (MCP) | OAuth 2.1 PKCE with dynamic client registration |
| Tenant isolation | SQL-level WHERE tenant_type = $1 AND tenant_id = $2 |
| Encryption at rest | AES-256 (Neon + R2) |
| Encryption in transit | TLS 1.3 (all connections) |
| Memory lifecycle | quarantine (GDPR right to restriction), superseded (soft delete) |
| Audit trail | memory_events — immutable, no FK, survives deletes |
| Cross-tenant learning | Anonymized aggregates only (arm IDs, confidence scores) |
Full threat model and academic references: Memory Privacy & Data Protection.
E2E Test Coverage
All tests live in the e2e/ directory and run via Playwright:
| Suite | What it tests |
|---|---|
| smoke-public.spec.ts | Every public page renders (home, docs, research, evolve, agent, assets, OAuth, settings) |
| smoke-live-api-mcp.spec.ts | API health, MCP OAuth discovery, agent metadata, auth protection, provider list, agent diagnostics, MCP tools/list |
| mcp-integration.spec.ts | Full MCP CRUD lifecycle: project create > memory create > search > get > batch_get > timeline > derive > list > foresight. Batch requests. Error handling. |
| cli-integration.spec.ts | All CLI subcommands (help output), authenticated operations (search, timeline, providers, evolve policy, agent ask, foresight) |
Run with:
E2E_LIVE=true E2E_API_TOKEN=gdm_... npx playwright test