Letta (formerly MemGPT) is the open source framework for building AI agents with persistent long-term memory — agents that remember what happened in previous sessions, update their knowledge as they learn, and retrieve relevant context automatically rather than starting fresh every conversation.
The Problem
Standard LLM API calls are stateless. Your application manages conversation history, and that history is bounded by the model's context window. Once a conversation grows long enough, older context falls off. Agents built on raw API calls forget users between sessions, lose the thread of multi-week projects, and repeat questions already answered. Building memory on top of a raw API requires custom vector stores, retrieval logic, and memory update pipelines — weeks of infrastructure before the agent itself can be built.
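The truncation problem above can be sketched in a few lines. This is an illustrative toy, not any real API: the message budget, the `chat` helper, and the canned replies are all made up to show how application-managed history silently loses its oldest turns.

```python
# Sketch of the problem: the application owns conversation history, and a
# fixed context budget forces old turns out. The 8-message budget and all
# names here are illustrative, not from any real API.

CONTEXT_BUDGET = 8  # max messages that fit in the model's context window

history = []

def chat(user_msg: str) -> list:
    """Append a turn, then return only what still fits in the window."""
    history.append({"role": "user", "content": user_msg})
    history.append({"role": "assistant", "content": f"reply to: {user_msg}"})
    # Stateless API: every call must resend history, truncated to the budget.
    return history[-CONTEXT_BUDGET:]

for i in range(6):
    window = chat(f"fact #{i}")

# The earliest turns fell out of the window: the model no longer sees
# fact #0 or fact #1, even though the application still stores them.
print(window[0]["content"])
```

Everything before the truncation point is invisible to the model on the next call, which is exactly the forgetting the paragraph describes.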
How Letta Solves It
Letta handles memory as a first-class concept. Each agent has three memory layers: in-context working memory for the current conversation, persistent recall storage for past interactions, and an archival store for long-term knowledge. The agent itself can read and write its own memory — inserting new facts, searching past conversations, deciding what to remember — using built-in memory tools called as part of the normal generation process.
Key Features
- Three-tier memory: in-context working memory, recall storage (past conversations), archival (long-term facts)
- Agents read and write their own memory using built-in tool calls during generation
- REST API and Python SDK for building stateful agents without managing vector stores
- Persistent agents whose state survives server restarts and carries over across sessions
- Multi-user isolation: each user gets independent agent memory
- Works with any OpenAI-compatible model endpoint
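The multi-user isolation point above can be illustrated with a per-user agent registry. This is a toy sketch under the assumption that each user id maps to its own independent agent state; the `get_agent` helper and the record layout are invented for illustration, not Letta's API.

```python
# Sketch of multi-user isolation: each user id owns an independent agent
# record, so one user's memory never leaks into another's. All names are
# hypothetical, not the Letta SDK.

agents: dict[str, dict] = {}

def get_agent(user_id: str) -> dict:
    """Create-or-fetch the isolated agent record for this user."""
    return agents.setdefault(user_id, {"core": {}, "archival": []})

get_agent("alice")["archival"].append("Alice uses Python.")
get_agent("bob")["archival"].append("Bob uses Rust.")

# Each user's archival store holds only that user's facts.
print(get_agent("alice")["archival"])
```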
Who It's For
Letta is best for developers building personal AI assistants that must remember users across sessions, enterprise chatbots that accumulate domain knowledge over time, and research teams studying long-term memory architectures for AI agents.
Compared to LangChain
Unlike LangChain's memory primitives, Letta treats memory as a first-class agent capability — the agent itself reads and writes its memory stores during generation, rather than having memory managed externally by the application layer.
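The architectural difference can be made concrete with a toy generation loop: the model's output stream includes a memory tool call, and the framework executes it mid-generation. The fake model, the tuple format, and the `memory_insert` tool are all invented for this sketch; neither LangChain's nor Letta's actual interfaces look like this.

```python
# Sketch of agent-managed memory: the "model" emits a tool call as part of
# its output, and the runtime executes it during generation. All names and
# structures here are illustrative.

memory = {"facts": []}

TOOLS = {
    "memory_insert": lambda arg: memory["facts"].append(arg),
}

def fake_model(user_msg: str) -> list:
    # A real LLM decides when to call tools; this stub hard-codes one call.
    return [("tool", "memory_insert", f"user said: {user_msg}"),
            ("text", "Noted!", None)]

def generate(user_msg: str) -> str:
    reply = ""
    for kind, name_or_text, arg in fake_model(user_msg):
        if kind == "tool":
            TOOLS[name_or_text](arg)  # memory written mid-generation
        else:
            reply += name_or_text
    return reply

reply = generate("I moved to Berlin")
print(memory["facts"])
```

In the externally-managed style, the application would decide what to persist after the model returned; here the decision to remember is part of the model's own output.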

