Rami Maalouf BSc Computer Science - 30151862
1. Background & Motivation
Large-scale LLM applications face a fixed context window, making long-term recall of user data difficult. Platforms such as Zep have introduced temporally aware knowledge graphs to maintain evolving user context. Mem0 dynamically extracts and retrieves salient conversation facts, outperforming many RAG systems on latency and cost. MemoryOS proposes an OS-inspired, hierarchical memory (short-, mid-, and long-term) for AI agents, while HEMA combines compact summaries with vector episodic stores for ultra-long dialogues.
This research will guide the design and implementation of Ember, an AI superconnector for University of Calgary students. Through a natural onboarding phone call, Ember extracts atomic facts and preferences from each student (triples plus metadata), then builds and continually refines a structured profile (courses, interests, availability, etc.). For more information, visit https://heyember.vercel.app
Ember requires a memory store that supports:
- Peer matching (semantic similarity + precise filters)
- Efficient, scalable storage minimizing compute and storage costs while maximizing use of the LLM context window
- Dynamic updates as user profiles evolve over time
- Privacy compliance through encryption, minimal retention, and a clear separation of public vs. private data
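The first two requirements interact: a match query must combine semantic similarity with exact filters. A minimal in-memory sketch of that hybrid pattern, using hypothetical profile records and a brute-force cosine scan as a stand-in for a real vector index:

```python
import math

# Hypothetical in-memory stand-in for the hybrid store: each record pairs
# an embedding (the semantic half) with exact-match metadata (the
# relational half). Names and fields are illustrative only.
profiles = [
    {"user": "alice", "courses": {"CPSC 331"}, "embedding": [0.9, 0.1, 0.0]},
    {"user": "bob",   "courses": {"CPSC 331"}, "embedding": [0.8, 0.2, 0.1]},
    {"user": "carol", "courses": {"MATH 271"}, "embedding": [0.9, 0.1, 0.0]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match(query_embedding, required_course, k=2):
    # Exact filter first (the "relational" step), then rank survivors by
    # cosine similarity (the "vector" step). pgvector with a WHERE clause,
    # or Qdrant with payload filters, follows the same shape.
    candidates = [p for p in profiles if required_course in p["courses"]]
    candidates.sort(key=lambda p: cosine(query_embedding, p["embedding"]),
                    reverse=True)
    return [p["user"] for p in candidates[:k]]
```

In production the filter and the similarity ranking would execute inside one engine (e.g. a pgvector `ORDER BY embedding <=> query` with a SQL `WHERE`), avoiding the round trip this sketch implies.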
2. Objectives
- Architect a hybrid relational + vector-embedding store in Postgres (pgvector), a graph database such as Neo4j, or a managed service such as Zep, that
- Enables semantic matching of user facts + profiles
- Supports exact filtering (courses, age, preferences)
- Grows gracefully as users update data
- Benchmark against:
- Self-managed Postgres with pgvector & JSONB filters
- Vendor memory services (Zep Cloud, Qdrant + hybrid filters)
- Graph-based memory (Zep’s Graphiti RAG)
- Optimize for cost (compute, storage), scalability (30K→100K users), and LLM context utilization (minimize tokens per query).
- Design a privacy-first framework:
- Data encryption in-transit and at-rest
- User consent, GDPR/CCPA compliance
- Differential retention policies
- Design and evaluate mechanisms for private vs. public data storage, so that sensitive fields can power the AI internally without ever being exposed in peer-facing APIs
- Explore collaboration or differentiation with Zep’s open-source efforts:
- Could leverage Zep’s Entity Types for structured memory (blog.getzep.com)
- Or build a complementary in-DB trigger pipeline to sidestep vendor lock-in
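One of the cost levers above, context-window token usage, can be estimated without running a model. A minimal sketch, assuming the common rough heuristic of four characters per token for English text (a rule of thumb, not a tokenizer count):

```python
def estimate_context_tokens(facts, chars_per_token=4):
    """Rough token estimate for a list of retrieved fact strings.

    The 4-chars-per-token ratio is a widely used approximation for
    English text; a real benchmark would use the target model's tokenizer.
    """
    total_chars = sum(len(f) for f in facts)
    return total_chars // chars_per_token

# Illustrative retrieved facts for a single Ember query.
retrieved = [
    "alice is enrolled in CPSC 331",
    "alice prefers studying in the evening",
]
```

Comparing this per-query estimate across architectures (raw triples vs. graph summaries vs. chunked documents) gives a first-order read on context-window utilization before paying for real LLM calls.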
3. Research Questions
- Hybrid vs. Graph vs. Pure-Vector
- At what scale does hybrid relational+vector outperform graph-based (Zep) or pure-vector (Pinecone) approaches?
- How do dynamic updates affect the latency, factual accuracy, and efficiency of each storage approach?
- Cost & Performance Trade-offs
- What is the long-term total cost of ownership (TCO) of each approach?
- How does context-window token usage differ per architecture?
- Data Evolution & Consistency
- How should stale user facts be versioned or pruned over time?
- Privacy & Governance
- Which encryption and access-control patterns best protect user PII in vector indices?
- How can a “right to be forgotten” be honored across relational + vector stores?
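The last question above is concrete enough to sketch: a “forget” request must delete a user from every store that references them, or retrieval can resurrect deleted data. A minimal sketch over two hypothetical in-memory stores standing in for the relational tables and the vector index:

```python
# Hypothetical twin stores: relational rows keyed by user id, plus a
# vector index keyed by fact id carrying a user_id in its metadata.
# All names and ids here are illustrative.
relational_rows = {"u1": {"name": "alice"}, "u2": {"name": "bob"}}
vector_index = {
    "f1": {"user_id": "u1", "embedding": [0.1, 0.2]},
    "f2": {"user_id": "u2", "embedding": [0.3, 0.4]},
}

def forget_user(user_id):
    """Delete a user from both stores so neither can repopulate the other."""
    relational_rows.pop(user_id, None)
    # Vector stores rarely support relational cascades, so the metadata
    # must carry the user id to make this sweep possible.
    for fact_id in [k for k, v in vector_index.items()
                    if v["user_id"] == user_id]:
        del vector_index[fact_id]

forget_user("u1")
```

The sketch highlights the design constraint the benchmark should test: every vector entry needs a user identifier in its payload, or GDPR deletion requires a full index scan.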
4. Methodology
- Literature & Code Review
- Prototype Implementations
- Hybrid Postgres: relational tables + pgvector + in-DB triggers
- Zep Graph: deploy Zep OSS, model user facts as Entities & Episodes
- Vector-Only: PostgreSQL + Pinecone or Qdrant with metadata
- Benchmarking
- Performance: latency & throughput for similarity search + filters (30K→100K rows)
- Cost: cloud pricing (compute/storage) over 12 months
- Token Efficiency: average LLM context tokens per query
- Privacy Evaluation
- Implement encryption-at-rest for vector tables vs. key-management in Zep
- Test GDPR “forget” flows across systems
- Implement and compare approaches for separating private (AI-only) from public user attributes (e.g., column-level encryption plus row-level security for private fields, separate public views), ensuring public-facing features never leak private data
- Matching Accuracy Studies
- Use synthetic profiles and conversational logs to empirically verify that Ember’s matching engine actually connects students who “should” be matched
- Measure precision@K for friend/mentor matching scenarios
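Precision@K, the headline metric for the matching-accuracy studies above, has a standard definition worth pinning down. A minimal sketch, with an illustrative friend-matching example:

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are truly relevant.

    `recommended` is an ordered list of candidate ids; `relevant` is the
    ground-truth set of ids that "should" have been matched.
    """
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    return sum(1 for r in top_k if r in relevant) / len(top_k)

# Illustrative scenario: the engine ranked three candidates for a student
# whose ground-truth matches (from the synthetic dataset) are bob and dan.
ranked = ["bob", "carol", "dan"]
ground_truth = {"bob", "dan"}
```

Averaging this score over many synthetic query students, at the K each product surface actually shows (e.g. top 3 suggestions), gives the per-architecture comparison the study needs.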
5. Conclusion
This proposal outlines a comprehensive study to architect and evaluate a privacy-first hybrid memory store for Ember. By leveraging both relational precision and vector semantic power, and comparing against graph-based (Zep) and pure-vector solutions (Mem0-style), we’ll produce actionable guidance on delivering scalable, cost-effective, and evolving user memory for complex AI-powered products.
Research proposal
This project will evaluate the composite matching performance of leading AI memory architectures, including Zep’s temporal knowledge graph and Mem0’s dynamic vector store, alongside a custom hybrid architecture that combines a traditional relational database with vector search (PostgreSQL + pgvector). Using Ember as a prototype, we will integrate each memory backend and assess peer-matching accuracy, latency, and scalability on a synthetic student dataset. The core research question is: which architecture yields the highest end-to-end peer-matching accuracy? The findings will inform the design of privacy-aware, cost-effective memory systems for AI agents in social matchmaking contexts.
Additional questions this research can answer
- Retrieval Accuracy & Latency
- How do Zep, Mem0 and MemoryOS compare on precision@K and query latency when retrieving relevant user facts from a held-out “Ember” conversation dataset?
- Dynamic Update Freshness
- After a user edits or adds new information in Ember, how quickly and accurately does each system surface that change in subsequent retrievals?
- Cost & Resource Efficiency
- What are the end-to-end compute, storage, and token-usage costs for each approach at scales of 30K, 60K and 100K synthetic Ember users?
- Scalability & Throughput
- How do query throughput and indexing time degrade (or hold up) as Ember’s memory store grows from 10K to 100K facts in each system?
- Privacy Boundary Enforcement
- Can “private” vs. “public” memory slices be enforced in each system—ensuring Ember’s sensitive fields never leak—without significantly harming retrieval quality?
- Integration Complexity
- What is the developer effort and system complexity to wire Ember’s extraction pipeline into each memory backend (Zep’s graph, Mem0’s vector store, MemoryOS’s hierarchy)?
- Composite Matching Performance (the one above)
- When combining semantic fact retrieval with Ember’s structured filters (courses, interests), which architecture yields the highest end-to-end peer-matching accuracy?
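The privacy-boundary question above comes down to a projection rule: private fields remain available to the internal matching engine but are stripped from anything peer-facing. A minimal sketch, with hypothetical field names, of the two views each backend would need to support:

```python
# Hypothetical attribute-visibility split. Which fields count as private
# is a policy decision; these names are illustrative only.
PRIVATE_FIELDS = {"phone", "mental_health_notes"}

profile = {
    "name": "alice",
    "courses": ["CPSC 331"],
    "phone": "555-0100",
    "mental_health_notes": "prefers low-pressure settings",
}

def public_view(record):
    """Peer-facing projection: private fields must never appear here."""
    return {k: v for k, v in record.items() if k not in PRIVATE_FIELDS}

def internal_view(record):
    """Full record, available only to Ember's own matching pipeline."""
    return dict(record)
```

In Postgres this split maps naturally onto row-level security or a separate public view; the open question the benchmarks must answer is whether vector and graph backends can enforce an equivalent boundary without degrading retrieval quality.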