Memory Platform: Technical Architecture

A complete long-term memory system for AI agents. Write once, retrieve from anywhere — via REST, MCP, or CLI. The system autonomously evolves its own retrieval quality.

How It Works

Memory has three layers. You interact with the first. The other two run themselves.

Layer 1 is what you see. Create memories, search, browse. The API shapes stay stable while the layers beneath them evolve.

Layer 2 decides what to return. Full-text search, vector similarity, and knowledge graph traversal are fused together to find the best results.

Layer 3 runs on a daily cron at 9:00 UTC. It embeds memories, extracts entities, merges duplicates, decays stale knowledge, and optimizes retrieval, all without human intervention.

Research Foundations

Every major design decision maps to a published paper or system.

Decision	Research basis	What we implemented
Memory taxonomy (episodic, semantic, procedural)	Memory Survey Dec 2025 — unified forms/functions/dynamics framework	Categories: fact, decision, pattern, lesson, procedure, episode, note
Structured derivation (event logs + foresight)	EverMemOS — event atomicization + future-oriented lane	`POST /memories/:id/derive` extracts atomic facts + deadlines
Hybrid vector + FTS retrieval	Mem0 — 26% over OpenAI baseline with hybrid three-store	Reciprocal Rank Fusion over FTS + pgvector (1536-dim)
Temporal knowledge graph	Zep/Graphiti — no LLM at retrieval, P95 ~300ms	Entity extraction + `entity_links` with temporal validity
Self-organizing links	A-MEM — Zettelkasten for agents (NeurIPS 2025)	`entity_links` table: related, supports, contradicts, supersedes
Retrieval bandit optimization	UCB1 multi-armed bandit literature	Memory Arena: 6 arms evaluated against real agent sessions
Privacy-aware forgetting	MaRS/FiFA — hybrid time-decay + importance	Consolidation: decay stale, merge duplicates, promote quality
Tenant memory isolation	MEXTRA — prompt injection extraction attacks	SQL-level `(tenant_type, tenant_id)` on every query

Data Flow

Write path

The write path is fast: an INSERT plus an audit event. The expensive work (embedding, derivation, graph extraction) happens in background evolution jobs.

Read path

FTS always runs. Vector search runs in parallel when OPENAI_API_KEY is configured. If no embeddings exist, the query falls back to pure FTS and still returns results.
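
The fusion of the FTS and vector result lists can be illustrated with a minimal sketch of Reciprocal Rank Fusion. The function name `rrfFuse`, the constant `k = 60` (a commonly used default), and the tie-breaking rule are assumptions for illustration; the platform's actual constants are not stated here.

```typescript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per item,
// where rank starts at 1. k = 60 is an assumed, commonly used default.
type RankedList = string[]; // memory IDs, best match first

function rrfFuse(lists: RankedList[], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  // Highest fused score first; ties broken by ID so the order is deterministic.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

const ftsHits: RankedList = ["m1", "m2", "m3"];
const vectorHits: RankedList = ["m3", "m1", "m4"];
const fused = rrfFuse([ftsHits, vectorHits]);
console.log(fused.map((hit) => hit.id));
```

When only the FTS list is available, the fused order is simply the FTS order, which matches the fallback behaviour described above.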

Retrieval modes

Mode	Strategy	Latency	Use case
`fast`	FTS only	~10ms	High-frequency agent lookups
`balanced`	FTS + vector + ILIKE fallback	~50ms	Default for most queries
`deep`	FTS + vector + graph expansion	~100ms	Complex multi-hop questions

Evolution System

The evolution phase of the daily cron (9:00 UTC) does two things:

1. Schedule — inspect each tenant's state, enqueue work:

Priority	Job	When it triggers	What it does
10	Arena	Project has sessions, no arena in 24h	Evaluate 6 retrieval strategies via UCB1 bandit
8	Embedding	Active memories without embeddings	Generate 1536-dim vectors via OpenAI
6	Derivation	Active memories without LLM derivation	Extract facts, foresight, entities via Claude
4	Consolidation	Near-duplicate embeddings (cosine > 0.92)	Merge duplicates, decay stale, promote quality
2	Graph	Active memories without entity extraction	Build knowledge graph (entities + relationships)
1	Learning	No propagation in 7 days	Spread arena winners: global > tenant > project

2. Execute — claim up to 5 jobs, 25-second budget, highest priority first.

Jobs are claimed with SELECT FOR UPDATE SKIP LOCKED — safe for concurrent execution.
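
The execute step can be sketched as a loop bounded by both a job count and a time budget. This is an in-memory illustration only: in the real system the claim happens in Postgres via SELECT FOR UPDATE SKIP LOCKED, and the names below (`EvolutionJob`, `executeJobs`, the injectable `now` clock) are hypothetical. Handlers are synchronous here to keep the sketch short.

```typescript
interface EvolutionJob {
  id: string;
  jobType: string;
  priority: number;
}

interface ExecuteOptions {
  maxJobs?: number;    // upper bound on jobs per run (5 in the text above)
  budgetMs?: number;   // wall-clock budget per run (25 seconds in the text above)
  now?: () => number;  // injectable clock, an assumption made for testability
}

function executeJobs(
  pending: EvolutionJob[],
  run: (job: EvolutionJob) => void,
  options: ExecuteOptions = {},
): string[] {
  const { maxJobs = 5, budgetMs = 25_000, now = () => Date.now() } = options;
  const deadline = now() + budgetMs;
  // Highest priority first, mirroring the scheduling table.
  const queue = [...pending].sort((a, b) => b.priority - a.priority);
  const completed: string[] = [];
  for (const job of queue) {
    // Stop before starting a job once either limit is reached.
    if (completed.length >= maxJobs || now() >= deadline) break;
    run(job);
    completed.push(job.id);
  }
  return completed;
}
```

The budget is checked before each job starts, so a long-running job can still overrun the deadline; whether the real executor interrupts jobs mid-flight is not specified here.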

Memory Arena

The arena evaluates how well each retrieval strategy performs on real agent conversations.

Arms (strategies being compared):

Arm	Mode	Documents	Vector
`fast_memories`	fast	no	no
`balanced_memories`	balanced	no	no
`deep_memories`	deep	no	no
`balanced_hybrid`	balanced	yes	no
`deep_hybrid`	deep	yes	no
`balanced_vector`	balanced	no	yes

Evaluation — each arm is tested against episodes extracted from real sessions:

episode = (user query, assistant response with evidence references)

score = recall * 0.55      (did we find what the agent actually used?)
      + precision * 0.15   (were results relevant?)
      + doc_recall * 0.20  (did we find the right documents?)
      + latency * 0.07     (was it fast?)
      + diversity * 0.03   (were results diverse?)
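
As a minimal sketch, the weighted score above can be written as a small function. It assumes every component, including latency, has already been normalized to the range 0 to 1 with higher meaning better; the normalization itself and the names `EpisodeMetrics` and `arenaScore` are assumptions.

```typescript
// All metrics are assumed to be pre-normalized to [0, 1], higher is better.
interface EpisodeMetrics {
  recall: number;
  precision: number;
  docRecall: number;
  latency: number;   // assumed: 1 = fastest, 0 = slowest
  diversity: number;
}

// Weights taken from the formula above; they sum to 1.
const ARENA_WEIGHTS: EpisodeMetrics = {
  recall: 0.55,
  precision: 0.15,
  docRecall: 0.2,
  latency: 0.07,
  diversity: 0.03,
};

function arenaScore(m: EpisodeMetrics): number {
  return (
    m.recall * ARENA_WEIGHTS.recall +
    m.precision * ARENA_WEIGHTS.precision +
    m.docRecall * ARENA_WEIGHTS.docRecall +
    m.latency * ARENA_WEIGHTS.latency +
    m.diversity * ARENA_WEIGHTS.diversity
  );
}
```

Because recall carries more than half the weight, an arm that finds everything the agent used but is slow will still outrank a fast arm that misses evidence.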

Bandit — UCB1 balances exploitation vs exploration:

ucb_score = mean_reward + 0.35 * sqrt(ln(total_pulls + 1) / pulls)
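
The bandit formula above can be sketched as follows. The exploration constant 0.35 and the `ln(total_pulls + 1)` term come from the formula; treating an arm with zero pulls as having infinite score (so it is tried first) is an assumption, as are the names `ArmStats`, `ucbScore`, and `selectArm`.

```typescript
interface ArmStats {
  arm: string;
  pulls: number;       // how many times this arm has been evaluated
  meanReward: number;  // average arena score for this arm
}

function ucbScore(stats: ArmStats, totalPulls: number, c = 0.35): number {
  // Assumption: an untried arm is always explored first (avoids division by zero).
  if (stats.pulls === 0) return Infinity;
  return stats.meanReward + c * Math.sqrt(Math.log(totalPulls + 1) / stats.pulls);
}

function selectArm(arms: ArmStats[]): ArmStats {
  if (arms.length === 0) throw new Error("no arms to select from");
  const totalPulls = arms.reduce((sum, a) => sum + a.pulls, 0);
  return arms.reduce((best, a) =>
    ucbScore(a, totalPulls) > ucbScore(best, totalPulls) ? a : best,
  );
}
```

With few pulls the exploration bonus dominates, so a rarely tested arm can be chosen over one with a higher mean; as pulls accumulate the bonus shrinks and the mean reward decides.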

Winners are stored per project. A project without its own winner falls back in this order: project > tenant > global default.
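
The fallback order can be sketched in a few lines. The shape `PolicyScopes` and the function `resolveArm` are hypothetical; how winners are actually stored is not described here.

```typescript
interface PolicyScopes {
  project?: string; // winning arm recorded for this project, if any
  tenant?: string;  // winning arm propagated to the tenant, if any
  global: string;   // global default arm, always present
}

// The most specific scope that has a winner takes precedence.
function resolveArm(scopes: PolicyScopes): string {
  return scopes.project ?? scopes.tenant ?? scopes.global;
}

// A new project with no arena history inherits the tenant-level winner.
console.log(resolveArm({ tenant: "deep_hybrid", global: "balanced_memories" }));
```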

Consolidation

Three actions, all recorded in the audit trail:

Action	Trigger	Effect
Merge	Cosine similarity > 0.92	LLM-summarized merge, originals superseded
Decay	Zero access + stale > 60 days	`confidence -= 0.15` (floor at 0.1)
Promote	`access_count >= 5` + `confidence >= 0.7`	`quality = 'good'` (boosts retrieval ranking)
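
The three rules in the table can be sketched as pure functions. Thresholds come from the table; the field names (`staleDays`, `accessCount`), the reading of "stale" as days since the memory was last touched, and the function names are assumptions. The LLM-summarized merge itself is out of scope; only the similarity trigger is shown.

```typescript
type Quality = "unknown" | "good" | "bad";

interface MemoryStats {
  confidence: number;
  accessCount: number;
  staleDays: number; // assumed meaning: days since last update or access
  quality: Quality;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / Math.sqrt(normA * normB);
}

// Merge trigger: near-duplicate embeddings.
const shouldMerge = (a: number[], b: number[]): boolean => cosineSimilarity(a, b) > 0.92;

// Decay: never-accessed memories older than 60 days lose confidence, floored at 0.1.
function applyDecay(m: MemoryStats): MemoryStats {
  if (m.accessCount === 0 && m.staleDays > 60) {
    return { ...m, confidence: Math.max(0.1, m.confidence - 0.15) };
  }
  return m;
}

// Promote: frequently used, high-confidence memories are marked good.
function applyPromotion(m: MemoryStats): MemoryStats {
  if (m.accessCount >= 5 && m.confidence >= 0.7) {
    return { ...m, quality: "good" };
  }
  return m;
}
```

Both rule functions return a new object rather than mutating the input, which makes it straightforward to record before and after values in the audit trail.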

Data Model

Core tables

memories
  id             UUID PK
  tenant_type    TEXT          -- 'user' or 'org'
  tenant_id      TEXT          -- Clerk user/org ID
  project_id     UUID FK
  session_id     UUID FK (nullable)
  category       TEXT          -- fact, decision, pattern, lesson, ...
  title          TEXT
  content        TEXT (markdown)
  tags           JSONB []
  context        JSONB {}
  confidence     FLOAT 0-1
  access_count   INT
  state          TEXT          -- active | superseded | quarantined
  quality        TEXT          -- unknown | good | bad

memory_embeddings
  memory_id      UUID PK FK → memories
  embedding      vector(1536)  -- pgvector, HNSW index
  content_hash   TEXT          -- staleness detection

entities
  id             UUID PK
  entity_type    TEXT          -- person | system | concept | technology | api | file
  name           TEXT
  mention_count  INT
  UNIQUE(tenant, project, type, name_normalized)

entity_links
  from_type/id   TEXT/UUID     -- memory, entity, artifact
  to_type/id     TEXT/UUID
  relation       TEXT          -- related, supports, contradicts, supersedes, merged_into, mentions
  confidence     FLOAT
  valid_from/to  TIMESTAMPTZ   -- temporal validity

memory_events                  -- immutable audit trail
  memory_id      UUID (no FK — history outlives data)
  event_type     TEXT          -- create, update, delete, derive, llm_derive, graph_extract, consolidation_*
  event_data     JSONB
  created_by     TEXT

evolution_jobs                 -- background job queue
  job_type       TEXT          -- arena, embedding, derivation, consolidation, graph, learning_propagation
  status         TEXT          -- pending, running, completed, failed
  priority       INT
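
The `content_hash` column on `memory_embeddings` suggests a simple staleness check: re-embed when the stored hash no longer matches the current content. The sketch below assumes SHA-256 over the memory's content; the actual hash function and which fields are hashed are not stated in this document, and `contentHash` and `needsEmbedding` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Assumption: the hash covers the memory content only and uses SHA-256.
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// A memory needs (re-)embedding when it has no embedding row yet,
// or when its content changed since the embedding was generated.
function needsEmbedding(content: string, storedHash: string | undefined): boolean {
  return storedHash === undefined || storedHash !== contentHash(content);
}

const stored = contentHash("Auth tokens expire after 24h");
console.log(needsEmbedding("Auth tokens expire after 24h", stored)); // unchanged content
console.log(needsEmbedding("Auth tokens expire after 12h", stored)); // edited content
```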

Tenant isolation

Every query includes WHERE tenant_type = $1 AND tenant_id = $2. This is enforced at the data access layer — there is no code path that can bypass it.

Organization scope: Shared across org members. Projects, memories, and tokens created under an org are visible to all members.
Personal scope: Private to a single user. Completely isolated.
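
One way to enforce the tenant predicate at the data access layer is to make it impossible to build a query without a scope. The sketch below is an illustration under stated assumptions, not the platform's actual code: `TenantScope`, `tenantScopedSelect`, and the table allow-list are hypothetical, and callers are expected to number extra placeholders from `$3`.

```typescript
interface TenantScope {
  tenantType: "user" | "org";
  tenantId: string;
}

interface ScopedQuery {
  text: string;
  params: unknown[];
}

// Assumed allow-list so table names are never interpolated from user input.
const TENANT_TABLES = new Set(["memories", "entities"]);

function tenantScopedSelect(
  table: string,
  scope: TenantScope,
  extraWhere?: string,          // optional extra condition using $3, $4, ...
  extraParams: unknown[] = [],
): ScopedQuery {
  if (!TENANT_TABLES.has(table)) throw new Error(`unknown table: ${table}`);
  // The tenant predicate always comes first and always binds $1 and $2.
  const conditions = ["tenant_type = $1", "tenant_id = $2"];
  if (extraWhere) conditions.push(`(${extraWhere})`);
  return {
    text: `SELECT * FROM ${table} WHERE ${conditions.join(" AND ")}`,
    params: [scope.tenantType, scope.tenantId, ...extraParams],
  };
}

const query = tenantScopedSelect(
  "memories",
  { tenantType: "org", tenantId: "org_123" },
  "state = $3",
  ["active"],
);
console.log(query.text);
```

Wrapping the extra condition in parentheses keeps an `OR` inside it from widening the tenant filter.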

API Surface

REST API

Method	Path	Description
GET	`/api/memories`	List/search memories
POST	`/api/memories`	Create memory
GET	`/api/memories/:id`	Get memory (increments access_count)
PUT	`/api/memories/:id`	Update memory
DELETE	`/api/memories/:id`	Delete memory
POST	`/api/memories/:id/derive`	Derive facts + foresight
POST	`/api/memories/:id/lifecycle`	Set state/quality
GET	`/api/memories/search-index`	Hybrid search (FTS + vector + RRF)
GET	`/api/memories/timeline`	Time-ordered feed
POST	`/api/memories/batch-get`	Fetch multiple by ID
GET	`/api/memories/foresight/active`	Upcoming deadlines
GET/POST	`/api/projects`	List/create projects
GET/POST	`/api/sessions`	List/start sessions
GET/POST	`/api/evolve/*`	Arena, signals, jobs
POST	`/api/agent/ask`	Agent query (retrieval + reasoning)
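
As a rough illustration of calling the REST API, the sketch below builds a `POST /api/memories` request. The base URL, the snake_case JSON field names (modelled on the `memories` columns and the CLI flags), and the helper name `buildCreateMemoryRequest` are all assumptions; check the API reference for the real request schema.

```typescript
interface CreateMemoryInput {
  projectId: string;
  category: string; // e.g. fact, decision, pattern, lesson, procedure, episode, note
  title: string;
  content: string;  // markdown
}

function buildCreateMemoryRequest(baseUrl: string, token: string, input: CreateMemoryInput) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/memories`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`, // scoped Bearer token
        "Content-Type": "application/json",
      },
      // Field names are assumed, not confirmed by this document.
      body: JSON.stringify({
        project_id: input.projectId,
        category: input.category,
        title: input.title,
        content: input.content,
      }),
    },
  };
}

// Hypothetical host; send with: await fetch(request.url, request.init)
const request = buildCreateMemoryRequest("https://api.example.com", "YOUR_TOKEN", {
  projectId: "00000000-0000-0000-0000-000000000000",
  category: "fact",
  title: "Token lifetime",
  content: "Auth tokens expire after 24h.",
});
console.log(request.url);
```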

MCP JSON-RPC (`/mcp`)

11 tools for LLM agents:

Tool	Description
`projects.list` / `projects.create`	Project management
`memories.search_index`	Hybrid search (compact ranked hits)
`memories.get` / `memories.create` / `memories.list`	Memory CRUD
`memories.batch_get`	Fetch multiple memories
`memories.timeline`	Time-ordered browsing
`memories.derive`	Extract facts + foresight
`memories.foresight_active`	Upcoming deadlines
`memories.providers`	Available search providers

Protocol: JSON-RPC 2.0; supported MCP protocol versions are 2025-03-26 and 2024-11-05. Auth: OAuth 2.1 with PKCE and dynamic client registration.
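
A tool invocation over MCP is a JSON-RPC 2.0 request with method `tools/call`. The sketch below builds such a payload for `memories.search_index`; the argument names `query` and `memory_mode` are assumptions borrowed from the CLI flags, and `mcpToolCall` is a hypothetical helper.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

// Builds an MCP tools/call request; the tool's argument schema is not shown
// in this document, so the arguments passed below are illustrative only.
function mcpToolCall(id: number, tool: string, args: Record<string, unknown>): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
}

const searchCall = mcpToolCall(1, "memories.search_index", {
  query: "auth bug",
  memory_mode: "balanced",
});
console.log(JSON.stringify(searchCall, null, 2));
```

The payload would be sent as the body of a POST to `/mcp` with an OAuth access token; the response is a JSON-RPC result carrying the same `id`.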

CLI

npm i -g @pajamadot/pajama
pajama login
pajama memories search-index --query "auth bug" --memory-mode balanced
pajama memories create --project-id <id> --category fact --title "..." --content "..."
pajama evolve arena-campaign --max-projects 10 --time-budget-ms 600000

Infrastructure

Component	Technology	Purpose
Web	Next.js 16 on Vercel	Dashboard, docs, research
API	Cloudflare Workers + Hono	REST, MCP, OAuth, cron
Database	Neon Postgres + pgvector + Hyperdrive	Storage, search, embeddings
Storage	Cloudflare R2	Large files (logs, artifacts)
Agent	Cloudflare Sandbox (Durable Objects)	Streaming multi-turn sessions
Auth	Clerk	JWT, orgs, user management
LLM	Claude (Anthropic)	Agent reasoning, derivation, graph extraction
Embeddings	OpenAI text-embedding-3-small	1536-dim vectors for semantic search

Migrations

14 migrations from initial schema to knowledge graph + performance indexes. Applied in sequence:

0001 init > 0002 multi-tenant > 0003 auth > 0004 assets > 0005 FTS > 0006 audit > 0007-0008 indexes > 0009 arena policies > 0010 project types > 0011 evolution jobs > 0012 pgvector > 0013 knowledge graph > 0014 performance indexes

Cron (daily, 9:00 UTC)

Three phases, in order:

1. Research digests: fetch arXiv + GitHub feeds, store as memories
2. New projects radar: same pattern, discovery feeds
3. Evolution: schedule + execute up to 5 background jobs

Security & Privacy

Layer	Implementation
Auth (web)	Clerk JWT verified against JWKS
Auth (API)	Scoped Bearer tokens
Auth (MCP)	OAuth 2.1 PKCE with dynamic client registration
Tenant isolation	SQL-level `WHERE tenant_type = $1 AND tenant_id = $2`
Encryption at rest	AES-256 (Neon + R2)
Encryption in transit	TLS 1.3 (all connections)
Memory lifecycle	`quarantined` (GDPR right to restriction of processing), `superseded` (soft delete)
Audit trail	`memory_events` — immutable, no FK, survives deletes
Cross-tenant learning	Anonymized aggregates only (arm IDs, confidence scores)

Full threat model and academic references: Memory Privacy & Data Protection.

E2E Test Coverage

All tests live in the e2e/ directory and run via Playwright:

Suite	What it tests
`smoke-public.spec.ts`	Every public page renders (home, docs, research, evolve, agent, assets, OAuth, settings)
`smoke-live-api-mcp.spec.ts`	API health, MCP OAuth discovery, agent metadata, auth protection, provider list, agent diagnostics, MCP tools/list
`mcp-integration.spec.ts`	Full MCP CRUD lifecycle: project create > memory create > search > get > batch_get > timeline > derive > list > foresight. Batch requests. Error handling.
`cli-integration.spec.ts`	All CLI subcommands (help output), authenticated operations (search, timeline, providers, evolve policy, agent ask, foresight)

Run with:

E2E_LIVE=true E2E_API_TOKEN=gdm_... npx playwright test