
memory-systems

This skill should be used for persistent semantic memory in agent systems: cross-session knowledge retention, entity tracking, temporal validity, graph or vector retrieval, memory consolidation, and memory benchmark selection. Route file-backed scratchpads to filesystem-context, handoff summaries to context-compression, and token-efficiency tactics to context-optimization.

Compatible with: Claude Code, Codex CLI, Cursor
npx add-skill https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering/tree/main/skills/memory-systems

Memory System Design

Memory provides the persistence layer that allows agents to maintain continuity across sessions and reason over accumulated knowledge. Simple agents rely entirely on context for memory, losing all state when sessions end. Sophisticated agents implement layered memory architectures that balance immediate context needs with long-term knowledge retention. The evolution from vector stores to knowledge graphs to temporal knowledge graphs represents increasing investment in structured memory for improved retrieval and reasoning.

When to Activate

Activate this skill when:

  • Building agents that must persist knowledge across sessions
  • Choosing between memory frameworks (Mem0, Zep/Graphiti, Letta, LangMem, Cognee)
  • Needing to maintain entity consistency across conversations
  • Implementing reasoning over accumulated knowledge
  • Designing memory architectures that scale in production
  • Evaluating memory systems against benchmarks (LoCoMo, LongMemEval, DMR)
  • Building dynamic memory with automatic entity/relationship extraction and self-improving memory (Cognee)

Do not activate this skill for adjacent work owned by other skills:

  • File-backed scratchpads, run logs, and tool-output offloading: filesystem-context.
  • Conversation compaction or human-readable handoff summaries: context-compression.
  • Masking, prefix caching, token budgets, or retrieval scoping inside one trajectory: context-optimization.
  • Formal belief/desire/intention models over RDF state: bdi-mental-states.

Core Concepts

Think of memory as a spectrum from volatile context window to persistent storage. Default to the simplest layer that meets retrieval needs, because benchmark evidence suggests tool complexity matters less than reliable retrieval for some memory workloads (claim-memory-locomo-filesystem-baseline). Add structure (graphs, temporal validity) only when retrieval quality degrades or the agent needs multi-hop reasoning, relationship traversal, or time-travel queries.

Detailed Topics

Production Framework Landscape

Select a framework based on the dominant retrieval pattern the agent requires. Use this table to narrow the shortlist, then validate with the benchmark data below.

| Framework | Architecture | Best For | Trade-off |
|---|---|---|---|
| Mem0 | Vector store + graph memory, pluggable backends | Multi-tenant systems, broad integrations | Less specialized for multi-agent |
| Zep/Graphiti | Temporal knowledge graph, bi-temporal model | Enterprise requiring relationship modeling + temporal reasoning | Advanced features cloud-locked |
| Letta | Self-editing memory with tiered storage (in-context/core/archival) | Full agent introspection, stateful services | Complexity for simple use cases |
| Cognee | Multi-layer semantic graph via ECL pipeline with customizable Tasks | Evolving agent memory that adapts and learns; multi-hop reasoning | Heavier ingest-time processing |
| LangMem | Memory tools for LangGraph workflows | Teams already on LangGraph | Tightly coupled to LangGraph |
| File-system | Plain files with naming conventions | Simple agents, prototyping | No semantic search, no relationships |

  • Choose Zep/Graphiti when the agent needs bi-temporal modeling (tracking both when events occurred and when they were ingested) because its three-tier knowledge graph (episode, semantic entity, community subgraphs) excels at temporal queries.
  • Choose Mem0 when the priority is fast time-to-production with managed infrastructure.
  • Choose Letta when the agent needs deep self-introspection through its Agent Development Environment.
  • Choose Cognee when the agent must build dense multi-layer semantic graphs: it layers text chunks and entity types as nodes with detailed relationship edges, and every core piece (ingestion, entity extraction, post-processing, retrieval) is customizable.

Benchmark Performance Comparison

Consult these benchmarks to set expectations, but treat them as source-specific signals for retrieval dimensions rather than absolute rankings. No single benchmark is definitive.

| System | DMR Accuracy | LoCoMo | HotPotQA (multi-hop) | Latency |
|---|---|---|---|---|
| Cognee | - | - | Published high score | Variable |
| Zep (Temporal KG) | Published high score | - | Mid-range across metrics | Low-latency reported |
| Letta (filesystem) | - | Published filesystem baseline | - | - |
| Mem0 | - | Published specialized-tool baseline | Lower in one comparison | - |
| MemGPT | Published high score | - | - | Variable |
| GraphRAG | Published mid/high range | - | - | Variable |
| Vector RAG baseline | Published lower range | - | - | Fast |

Key takeaway: compare memory systems by retrieval shape, not brand. Use benchmark numbers as dated evidence that must be rechecked before making product claims; the stable design rule is to start shallow, measure retrieval quality, then add semantic or graph structure only when a simpler layer fails.

Memory Layers (Decision Points)

Pick the shallowest memory layer that satisfies the persistence requirement. Each deeper layer adds infrastructure cost and operational complexity, so only escalate when the shallower layer cannot meet the retrieval or durability need.

| Layer | Persistence | Implementation | When to Use |
|---|---|---|---|
| Working | Context window only | Scratchpad in system prompt | Always; optimize with attention-favored positions |
| Short-term | Session-scoped | File-system, in-memory cache | Intermediate tool results, conversation state |
| Long-term | Cross-session | Key-value store → graph DB | User preferences, domain knowledge, entity registries |
| Entity | Cross-session | Entity registry + properties | Maintaining identity ("John Doe" = same person across conversations) |
| Temporal KG | Cross-session + history | Graph with validity intervals | Facts that change over time, time-travel queries, preventing context clash |

Retrieval Strategies

Match the retrieval strategy to the query shape. Semantic search handles direct factual lookups well but degrades on multi-hop reasoning; entity-based traversal handles "everything about X" queries but requires graph structure; temporal filtering handles changing facts but requires validity metadata. When accuracy is paramount and infrastructure budget allows, combine strategies into hybrid retrieval.

| Strategy | Use When | Limitation |
|---|---|---|
| Semantic (embedding similarity) | Direct factual queries | Degrades on multi-hop reasoning |
| Entity-based (graph traversal) | "Tell me everything about X" | Requires graph structure |
| Temporal (validity filter) | Facts change over time | Requires validity metadata |
| Hybrid (semantic + keyword + graph) | Best overall accuracy | Most infrastructure |

Hybrid approaches reduce active context by retrieving only relevant subgraphs or memories. Cognee implements hybrid retrieval through multiple search modes across graph, vector, and relational stores, letting agents select the retrieval strategy that fits the query type rather than using a one-size-fits-all approach.
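
One common way to combine several retrievers is reciprocal rank fusion (RRF). The sketch below is an illustration only, not how Cognee or any framework named here merges results; the function name, the memory IDs, and the three hit lists are hypothetical, and k = 60 is the conventional default for RRF rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of memory IDs into one ranking.

    Each ID earns 1 / (k + rank) for every list it appears in, so IDs that
    several retrievers rank well rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID to stay deterministic.
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical outputs of three independent retrievers for one query.
semantic_hits = ["m3", "m1", "m7"]
keyword_hits = ["m1", "m9"]
graph_hits = ["m1", "m3"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits, graph_hits])
# fused == ["m1", "m3", "m9", "m7"]
```

RRF needs only ranks, not comparable scores, which is why it suits mixing embedding similarity, keyword matching, and graph traversal without calibrating them against each other.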

Memory Consolidation

Run consolidation periodically to prevent unbounded growth, because unchecked memory accumulation degrades retrieval quality over time. Invalidate but do not discard — preserving history matters for temporal queries that need to reconstruct past states. Trigger consolidation on memory count thresholds, degraded retrieval quality, or scheduled intervals. See Implementation Reference for working consolidation code.
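
As a rough illustration of "invalidate but do not discard", the sketch below closes the validity interval of superseded facts instead of deleting them. It is not the Implementation Reference code; the `Fact` shape, both function names, and the 10,000-fact threshold are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None means "still current"


def consolidate(facts: list[Fact]) -> int:
    """Close the interval of superseded facts; return how many were closed.

    For each (subject, predicate) pair only the newest fact stays open.
    Older facts are kept with valid_until set, so past states stay queryable.
    """
    closed = 0
    newest: dict[tuple[str, str], Fact] = {}
    for fact in sorted(facts, key=lambda f: f.valid_from):
        key = (fact.subject, fact.predicate)
        previous = newest.get(key)
        if previous is not None and previous.valid_until is None:
            previous.valid_until = fact.valid_from
            closed += 1
        newest[key] = fact
    return closed


def should_consolidate(fact_count: int, max_facts: int = 10_000) -> bool:
    """Count-threshold trigger; the default limit is an arbitrary placeholder."""
    return fact_count > max_facts
```

Running `consolidate` twice on the same list is safe: facts that already have a `valid_until` are left untouched on the second pass.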

Practical Guidance

Choosing a Memory Architecture

Start with the simplest viable layer and add complexity only when retrieval quality degrades. Most agents do not need a temporal knowledge graph on day one. Follow this escalation path:

  1. Prototype: Use file-system memory. Store facts as structured JSON with timestamps. This validates agent behavior before committing to infrastructure.
  2. Scale: Move to Mem0 or a vector store with metadata when the agent needs semantic search and multi-tenant isolation, because file-based lookup cannot handle similarity queries.
  3. Complex reasoning: Add Zep/Graphiti when the agent needs relationship traversal, temporal validity, or cross-session synthesis. Graphiti uses structured ties with generic relations, keeping graphs simple and easy to reason about; Cognee builds denser multi-layer semantic graphs with detailed relationship edges. Choose based on whether the agent needs bi-temporal modeling (Graphiti) or richer interconnected knowledge structures (Cognee).
  4. Full control: Use Letta or Cognee when the agent must self-manage its own memory with deep introspection, because these frameworks expose memory operations as first-class agent actions.
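
The prototype step of the escalation path can be sketched with nothing but the standard library. This is a minimal example under assumptions: the JSON Lines layout, the field names, and the `remember`/`recall` helpers are all hypothetical choices, not a format prescribed by this skill.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def remember(store: Path, key: str, value: str) -> None:
    """Append a timestamped fact to a JSON Lines file (one object per line)."""
    record = {
        "key": key,
        "value": value,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with store.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def recall(store: Path, key: str) -> Optional[str]:
    """Return the most recently written value for key, or None if absent."""
    if not store.exists():
        return None
    latest = None
    with store.open(encoding="utf-8") as handle:
        for line in handle:
            record = json.loads(line)
            if record["key"] == key:
                latest = record["value"]  # later lines supersede earlier ones
    return latest
```

Appending rather than rewriting keeps the full history on disk, so the same file can later be replayed into a vector store or graph when the agent outgrows exact-key lookup.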

Integration with Context

Load memories just-in-time rather than preloading everything, because large context payloads are expensive and degrade attention quality. Place retrieved memories in attention-favored positions (beginning or end of context) to maximize their influence on generation.
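
A minimal sketch of that placement rule follows. It assumes a retriever has already returned (relevance score, text) pairs, and it uses a character budget as a crude stand-in for a token budget; `build_prompt` and all of its parameters are hypothetical names.

```python
def build_prompt(
    system: str,
    history: list[str],
    memories: list[tuple[float, str]],
    question: str,
    max_memory_chars: int = 1200,
) -> str:
    """Assemble a prompt with the best-scoring memories placed near the end.

    Only the highest-scoring memories that fit the character budget are
    included; everything else stays in the store.
    """
    selected: list[str] = []
    used = 0
    for _score, text in sorted(memories, key=lambda m: m[0], reverse=True):
        if used + len(text) > max_memory_chars:
            continue  # skip entries that would overflow the budget
        selected.append(text)
        used += len(text)

    # Order: system prompt, history, then memories and the question last.
    parts = [system, *history]
    if selected:
        parts.append("Relevant memories:\n" + "\n".join(f"- {t}" for t in selected))
    parts.append(question)
    return "\n\n".join(parts)
```

Putting the memory block directly before the question keeps it at the end of the context; moving it right after the system prompt would be the equivalent beginning-of-context variant.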

Error Recovery

Handle retrieval failures gracefully because memory systems are inherently noisy. Apply these recovery strategies in order:

  • Empty retrieval: Fall back to broader search (remove entity filter, widen time range). If still empty, prompt user for clarification.
  • Stale results: Check valid_until timestamps. If most results are expired, trigger consolidation before retrying.
  • Conflicting facts: Prefer the fact with the most recent valid_from. Surface the conflict to the user if confidence is low.
  • Storage failure: Queue writes for retry. Never block the agent's response on a memory write.
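
Two of the recovery strategies above, broadening an empty search and queueing writes for retry, can be sketched as follows. The `SearchFn` signature, `search_with_fallback`, and `WriteQueue` are hypothetical; a real backend would supply its own search call and its own failure exception in place of `OSError`.

```python
from collections import deque
from typing import Callable, Optional

# A search function takes (query, entity_filter) and returns matching memories.
SearchFn = Callable[[str, Optional[str]], list[str]]


def search_with_fallback(search: SearchFn, query: str, entity: Optional[str]) -> list[str]:
    """Try the entity-scoped search first, then retry without the filter."""
    results = search(query, entity)
    if not results and entity is not None:
        results = search(query, None)  # broaden: drop the entity filter
    return results  # an empty list signals the caller to ask the user


class WriteQueue:
    """Buffer memory writes so a storage failure never blocks the response."""

    def __init__(self, write: Callable[[str], None]) -> None:
        self._write = write
        self._pending: deque[str] = deque()

    def submit(self, memory: str) -> None:
        """Enqueue a write and return immediately."""
        self._pending.append(memory)

    def flush(self) -> int:
        """Attempt pending writes in order; stop at the first failure."""
        written = 0
        while self._pending:
            try:
                self._write(self._pending[0])
            except OSError:
                break  # leave the item queued for the next flush
            self._pending.popleft()
            written += 1
        return written
```

The agent calls `submit` on the response path and runs `flush` from a background task or between turns, so a storage outage delays persistence without delaying the reply.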

Examples

Example 1: Mem0 Integration

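from mem0 import Memory

# Memory() with no arguments uses Mem0's default configuration, which relies
# on OpenAI for extraction and embeddings, so OPENAI_API_KEY must be set.
m = Memory()
m.add("User prefers dark mode and Python 3.12", user_id="alice")
m.add("User switched to light mode", user_id="alice")

# Mem0 reconciles new facts with stored ones, so the search should surface
# the current preference (light mode) rather than the outdated one
results = m.search("What theme does the user prefer?", user_id="alice")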

Example 2: Temporal Query

from datetime import datetime

# `graph`, `user_node`, and `address_node` are assumed to exist already:
# a temporal graph store and two node IDs (see Implementation Reference).

# Track entity with validity periods
graph.create_temporal_relationship(
    source_id=user_node,
    rel_type="LIVES_AT",
    target_id=address_node,
    valid_from=datetime(2024, 1, 15),
    valid_until=datetime(2024, 9, 1),  # moved out
)

# Query: Where did user live on March 1, 2024?
results = graph.query_at_time(
    {"type": "LIVES_AT", "source_label": "User"},
    query_time=datetime(2024, 3, 1),
)
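
Example 2 calls methods on a `graph` object that is not defined in this document. To make the idea concrete, here is a tiny in-memory stand-in with the same two method names. It is a sketch under assumptions, not the Implementation Reference code or any framework's API: the `Edge` and `TemporalGraph` classes are hypothetical, intervals are treated as half-open, and only the "type" key of the query pattern is honoured.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Edge:
    source_id: str
    rel_type: str
    target_id: str
    valid_from: datetime
    valid_until: Optional[datetime]  # None means the fact is still valid


class TemporalGraph:
    """In-memory stand-in; a real store would index and persist edges."""

    def __init__(self) -> None:
        self.edges: list[Edge] = []

    def create_temporal_relationship(
        self, source_id, rel_type, target_id, valid_from, valid_until=None
    ) -> Edge:
        edge = Edge(source_id, rel_type, target_id, valid_from, valid_until)
        self.edges.append(edge)
        return edge

    def query_at_time(self, pattern: dict, query_time: datetime) -> list[Edge]:
        """Return edges of pattern["type"] that were valid at query_time.

        Intervals are half-open: valid_from <= query_time < valid_until.
        """
        return [
            e for e in self.edges
            if e.rel_type == pattern["type"]
            and e.valid_from <= query_time
            and (e.valid_until is None or query_time < e.valid_until)
        ]
```

With half-open intervals, a move on 2024-09-01 closes the old address and opens the new one on the same timestamp without the two ever being valid at once.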

Example 3: Cognee Memory Ingestion and Search

import asyncio

import cognee
from cognee.modules.search.types import SearchType


async def main():
    # Ingest and build knowledge graph
    await cognee.add("./docs/")
    await cognee.add("any data")
    await cognee.cognify()

    # Enrich memory
    await cognee.memify()

    # Agent retrieves relationship-aware context
    results = await cognee.search(
        query_text="Any query for your memory",
        query_type=SearchType.GRAPH_COMPLETION,
    )
    return results


# cognee's API is async, so the calls must run inside an event loop
asyncio.run(main())

Guidelines

  1. Start with file-system memory; add complexity only when retrieval quality demands it
  2. Track temporal validity for any fact that can change over time
  3. Use hybrid retrieval (semantic + keyword + graph) for best accuracy
  4. Consolidate memories periodically — invalidate but don't discard
  5. Design for retrieval failure: always have a fallback when memory lookup returns nothing
  6. Consider privacy implications of persistent memory (retention policies, deletion rights)
  7. Benchmark your memory system against LoCoMo or LongMemEval before and after changes
  8. Monitor memory growth and retrieval latency in production

Gotchas

  1. Stuffing everything into context: Loading all available memories into the prompt is expensive and degrades attention quality. Use just-in-time retrieval with relevance filtering instead.
  2. Ignoring temporal validity: Facts go stale. Without validity tracking, outdated information poisons the context and the agent acts on wrong assumptions.
  3. Over-engineering early: Simple filesystem-backed memory can outperform more specialized tooling on some benchmarks (claim-memory-locomo-filesystem-baseline). Add sophistication only when simple approaches demonstrably fail.
  4. No consolidation strategy: Unbounded memory growth degrades retrieval quality over time. Set memory count thresholds or scheduled intervals to trigger consolidation.
  5. Embedding model mismatch: Writing memories with one embedding model and reading with another produces poor retrieval because vector spaces are not interchangeable. Pin a single embedding model for each memory store and re-embed all entries if the model changes.
  6. Graph schema rigidity: Over-structured graph schemas (rigid node types, fixed relationship labels) break when the domain evolves. Prefer generic relation types and flexible property bags so new entity kinds do not require schema migrations.
  7. Stale memory poisoning: Old memories that contradict the current state corrupt agent behavior silently. Implement expiry policies or confidence decay so the agent deprioritizes aged facts, and surface contradictions explicitly when detected.
  8. Memory-context mismatch: Retrieval can return memories that are topically related but contextually wrong (e.g., a memory about "Python" the snake when the agent is discussing Python the language). Mitigate by including session or domain metadata in memory entries and filtering on it during retrieval.
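
Gotchas 5 and 7 lend themselves to small guards. The sketch below shows one possible shape for each: an exponential confidence decay applied to retrieval scores, and a check that refuses to query a store with a different embedding model. Both function names and the 30-day half-life are hypothetical choices for illustration.

```python
from datetime import datetime


def decayed_score(
    similarity: float,
    recorded_at: datetime,
    now: datetime,
    half_life_days: float = 30.0,
) -> float:
    """Halve a similarity score for every half_life_days of memory age."""
    age_days = max((now - recorded_at).total_seconds() / 86_400, 0.0)
    return similarity * 0.5 ** (age_days / half_life_days)


def check_embedding_model(store_model: str, query_model: str) -> None:
    """Raise if the query would use a different model than the store was built with."""
    if store_model != query_model:
        raise ValueError(
            f"store was embedded with {store_model!r}, query uses {query_model!r}; "
            "re-embed the store before searching"
        )
```

Decay only reorders results; it does not replace validity tracking, since a fact with an explicit `valid_until` in the past should be filtered out rather than merely down-weighted.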

Integration

This skill owns persistent semantic memory. Adjacent skills own scratch storage, compaction, and context tactics:

  • filesystem-context: file-backed scratchpads, logs, and simple run state before semantic retrieval is needed.
  • context-compression: summaries and handoffs that preserve session state in prose.
  • context-optimization: just-in-time memory loading and retrieval scoping inside active context budgets.
  • context-degradation: stale or conflicting memories as context poisoning or clash.
  • bdi-mental-states: formal mental-state modeling when beliefs, desires, intentions, and provenance chains matter.
  • multi-agent-patterns: shared memory across agents.
  • evaluation: memory quality, retrieval correctness, and benchmark selection.

References

Internal references:

  • Implementation Reference - Read when: implementing vector stores, property graphs, temporal queries, or memory consolidation logic from scratch

Related skills in this collection:

  • context-fundamentals - Read when: designing the context layer that memory feeds into
  • multi-agent-patterns - Read when: multiple agents need to share or coordinate memory state

External resources:

  • Zep temporal knowledge graph paper (arXiv:2501.13956) - Read when: evaluating bi-temporal modeling or Graphiti's architecture
  • Mem0 production architecture paper (arXiv:2504.19413) - Read when: assessing managed memory infrastructure trade-offs
  • Cognee optimized knowledge graph + LLM reasoning paper (arXiv:2505.24478) - Read when: comparing multi-layer semantic graph approaches
  • LoCoMo benchmark (Snap Research) - Read when: evaluating long-conversation memory retention
  • MemBench evaluation framework (ACL 2025) - Read when: designing memory evaluation suites
  • Graphiti open-source temporal KG engine (github.com/getzep/graphiti) - Read when: implementing temporal knowledge graphs
  • Cognee open-source knowledge graph memory (github.com/topoteretes/cognee) - Read when: building customizable ECL pipelines for memory
  • Cognee comparison: Form vs Function - Read when: comparing graph structures across Mem0, Graphiti, LightRAG, Cognee

Skill Metadata

Created: 2025-12-20
Last Updated: 2026-05-15
Author: Agent Skills for Context Engineering Contributors
Version: 4.1.0
