Right, let’s talk about memory. Because without it, your AI agent is just a glorified, one-shot API call with amnesia. It’s the difference between a colleague who remembers the entire project history and a new intern you have to re-introduce yourself to every single morning.

The core problem is context windows. An LLM only sees what you put in the prompt for that call, and the prompt has a hard token limit. You’re basically trying to fit the entire plot of War and Peace into a tweet. We combat this with a strategy you’re already familiar with: not remembering everything, but remembering the right things. We break it down into three key types.

Short-Term Memory: The “What Just Happened”

This is the agent’s working memory, and it’s brutally simple: it’s the conversation history or the chain of thought you shove into the next API call. It’s the immediate context of the current interaction.

In code, this usually looks like a list of message dictionaries that you maintain and constantly append to.

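# call_llm(messages) is assumed to be your own wrapper around an LLM API:
# it takes a list of message dicts and returns the assistant's reply as a string.

# A simple list acting as our short-term memory buffer
short_term_memory = []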

def execute_agent_step(user_input, memory):
    # Build the conversation history for context
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
    ]
    # Add the entire remembered history
    messages.extend(memory)
    # Add the latest user input
    messages.append({"role": "user", "content": user_input})

    # Call the LLM with the full context
    response = call_llm(messages)
    
    # Append this entire exchange to memory for the next turn
    memory.append({"role": "user", "content": user_input})
    memory.append({"role": "assistant", "content": response})
    
    return response

# First turn
response_1 = execute_agent_step("What's the capital of France?", short_term_memory)
print(response_1)  # e.g. "The capital of France is Paris."

# Second turn - the agent *remembers* the previous exchange
response_2 = execute_agent_step("And what is its population?", short_term_memory)
# The prompt now includes the previous Q&A, so it knows "its" refers to Paris.

The pitfall here is obvious: this list grows until it explodes your context window. Which brings us to…
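
One stopgap worth knowing before we get there: cap the buffer by dropping the oldest turns. The sketch below is a minimal version of that idea, not a production tokenizer. The names trim_memory and estimate_tokens are made up for illustration, and the "about 4 characters per token" rule is a rough assumption; swap in your model’s real tokenizer if you need accurate counts.

```python
def estimate_tokens(message):
    """Rough size estimate: assumes about 4 characters per token."""
    return max(1, len(message["content"]) // 4)


def trim_memory(memory, max_tokens=2000):
    """Return the most recent messages whose estimated size fits the budget."""
    kept = []
    used = 0
    # Walk backwards so the newest messages are kept first
    for message in reversed(memory):
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "a" * 40},
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
]
recent = trim_memory(history, max_tokens=25)
print([m["content"][0] for m in recent])  # ['b', 'c']
```

You would call trim_memory(memory) right before extending messages with the history. Note that a hard cut like this can split a question from its answer and silently forgets everything older, which is exactly the gap the next memory type fills.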

Long-Term Memory: The “Important Stuff” Filing Cabinet

When your short-term memory gets too long, you need to offload the crucial bits. Long-term memory is a database—a vector store, a SQL table, a CSV file, whatever—where you stash summaries, key facts, or entire conversations for later recall.

The magic trick isn’t just storage; it’s retrieval. Dumping the whole archive back into the prompt would blow the very context budget you were trying to protect. Instead, you use the user’s current query to search your memory for the most relevant snippets.

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

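# Set up our vector database as a long-term memory store.
# Needs the langchain-community, langchain-openai, and chromadb packages,
# plus an OPENAI_API_KEY environment variable for the embeddings.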
vector_db = Chroma(embedding_function=OpenAIEmbeddings(), persist_directory="./memories")

def commit_to_long_term_memory(conversation_summary):
    """Takes a summary text and adds it to the vector database."""
    vector_db.add_texts([conversation_summary])

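def retrieve_memories(query, k=3):
    """Searches long-term memory and returns the text of the k most similar entries."""
    # similarity_search returns Document objects; we only need their text
    documents = vector_db.similarity_search(query, k=k)
    return [doc.page_content for doc in documents]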

# After a long conversation, we summarize it and commit it
summary = "The user asked about France. We discussed that Paris is the capital with a population of ~2.1 million."
commit_to_long_term_memory(summary)

# Weeks later, a new query comes in: "Tell me about French culture."
relevant_memories = retrieve_memories("French culture")
# This will likely retrieve the summary about France/Paris,
# providing the agent with crucial context from the past.

The big gotcha? Summarization is lossy. You will drop details. The art is in writing a summary prompt that captures the intent and crucial facts, not just the literal text. Picking the number of retrieved snippets (k) is a balancing act too: too few and you miss relevant context, too many and the useful bits get buried in clutter.
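
To make both points concrete, here is a rough sketch under some assumptions. build_summary_prompt and select_snippets are hypothetical helpers, the prompt wording is one possible phrasing rather than a tested recipe, and the 80-word and 0.5-distance defaults are arbitrary. The second function assumes you already have (text, distance) pairs where lower means closer; LangChain’s Chroma wrapper exposes similarity_search_with_score for this, but check whether your store reports a distance or a similarity before reusing the threshold logic. The sample snippets are invented.

```python
def build_summary_prompt(messages, max_words=80):
    """Build a summarization prompt that says what to keep and what to drop."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    return (
        f"Summarize the conversation below in at most {max_words} words.\n"
        "Keep: the user's goal, decisions made, and concrete facts "
        "(names, dates, numbers).\n"
        "Drop: greetings, filler, and repeated phrasing.\n\n"
        f"Conversation:\n{transcript}"
    )


def select_snippets(scored_snippets, k=3, max_distance=0.5):
    """Keep at most k snippets, closest first, ignoring anything too far away.

    scored_snippets is a list of (text, distance) pairs; lower distance
    is assumed to mean more similar.
    """
    close_enough = [(text, d) for text, d in scored_snippets if d <= max_distance]
    close_enough.sort(key=lambda pair: pair[1])
    return [text for text, _ in close_enough[:k]]


prompt = build_summary_prompt([
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
])
print(prompt)

# Invented candidates with made-up distances
candidates = [
    ("Paris is the capital of France.", 0.21),
    ("User prefers window seats.", 0.83),
    ("Paris has about 2.1 million residents.", 0.34),
]
print(select_snippets(candidates, k=2))
```

The distance cutoff means k acts as a ceiling rather than a quota: if only one memory is actually relevant, only one goes into the prompt.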

Episodic Memory: The “Story of Us”

This is where we get fancy. Episodic memory isn’t just facts; it’s the autobiographical record of the agent’s experiences—its interactions, its tool uses, its successes and glorious failures. It’s the agent’s personal story with the user.

Implementing this often means storing richly structured data. You’re not just saving a text summary; you’re saving the action trajectory.

# A more advanced memory object for an episodic store
episode = {
    "timestamp": "2024-05-20T14:32:00Z",
    "user_input": "Book me a flight to Paris and a hotel for next week.",
    "actions": [
        {
            "tool_used": "flight_booking_tool",
            "parameters": {"destination": "CDG", "dates": "2024-05-27 to 2024-05-30"},
            "result": "Success. Flight booked, confirmation code: AF1234"
        },
        {
            "tool_used": "hotel_booking_tool",
            "parameters": {"city": "Paris", "check_in": "2024-05-27", "check_out": "2024-05-30"},
            "result": "Error: No availability for those dates."
        }
    ],
    "outcome": "Flight was booked successfully but hotel booking failed. User needs to be notified and alternative dates suggested."
}

# We'd commit this whole structured episode to a database
# Later, if the user says "What's the status of my trip?",
# we can retrieve this episode, understand the past actions,
# and give a coherent update instead of just saying "I don't know."

The common mistake here, especially on a first attempt, is assuming that more structure is always better. It’s not. Every extra field is something you have to populate, query, and maintain. The best practice is to start simple (maybe just storing successful tool executions) and add structure only when you really need to answer complex questions about past events.
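
Starting simple could look something like the sketch below: one SQLite table, with the action list stored as a JSON string rather than normalized into its own tables. This is an illustration under assumptions, not a recommended schema. The table layout and the helper names (open_episode_store, save_episode, recent_episodes) are invented here, and it uses only Python’s standard sqlite3 and json modules.

```python
import json
import sqlite3


def open_episode_store(path=":memory:"):
    """Open (or create) a tiny episodic store. ':memory:' keeps it in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes ("
        "id INTEGER PRIMARY KEY, timestamp TEXT, user_input TEXT, "
        "actions TEXT, outcome TEXT)"
    )
    return conn


def save_episode(conn, episode):
    """Persist one episode; the actions list is serialized to JSON."""
    conn.execute(
        "INSERT INTO episodes (timestamp, user_input, actions, outcome) "
        "VALUES (?, ?, ?, ?)",
        (
            episode["timestamp"],
            episode["user_input"],
            json.dumps(episode["actions"]),
            episode["outcome"],
        ),
    )
    conn.commit()


def recent_episodes(conn, limit=5):
    """Return the newest episodes first.

    Sorting the timestamp column as text only works because the
    timestamps are ISO 8601 strings in the same time zone.
    """
    rows = conn.execute(
        "SELECT timestamp, user_input, actions, outcome FROM episodes "
        "ORDER BY timestamp DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [
        {"timestamp": ts, "user_input": text, "actions": json.loads(acts), "outcome": out}
        for ts, text, acts, out in rows
    ]


store = open_episode_store()
save_episode(store, {
    "timestamp": "2024-05-20T14:32:00Z",
    "user_input": "Book me a flight to Paris and a hotel for next week.",
    "actions": [{"tool_used": "flight_booking_tool", "result": "Success."}],
    "outcome": "Flight booked; hotel booking failed.",
})
print(recent_episodes(store)[0]["outcome"])  # Flight booked; hotel booking failed.
```

If you later need to ask questions like "which tool fails most often?", that is the point to split actions into their own table, and not before.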

The ultimate goal is to weave these memory types together. Short-term handles the immediate task, long-term provides crucial background, and episodic remembers how you and the agent actually handled things in the past. Get it right, and the agent feels shockingly competent. Get it wrong, and well, you get amnesiac interns. Your call.
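
For the curious, here is one way the weaving might look in practice. It is a sketch, not the canonical design: build_context is a hypothetical function, and it assumes you have already fetched the three ingredients (recent messages, retrieved text snippets, and episode dicts with "timestamp" and "outcome" keys) by whatever means you prefer. Putting recalled material into the system message is just one layout choice among several.

```python
def build_context(user_input, recent_messages, memory_snippets, episodes):
    """Assemble one prompt from short-term, long-term, and episodic memory."""
    background = []
    if memory_snippets:
        background.append("Relevant notes from earlier conversations:")
        background.extend(f"- {snippet}" for snippet in memory_snippets)
    if episodes:
        background.append("Past actions and their outcomes:")
        background.extend(f"- {e['timestamp']}: {e['outcome']}" for e in episodes)

    system = "You are a helpful assistant."
    if background:
        system += "\n\n" + "\n".join(background)

    # System message first, then the live conversation, then the new input
    return [
        {"role": "system", "content": system},
        *recent_messages,
        {"role": "user", "content": user_input},
    ]


messages = build_context(
    "What's the status of my trip?",
    recent_messages=[{"role": "user", "content": "Hi again."}],
    memory_snippets=["The user asked about France and Paris."],
    episodes=[{
        "timestamp": "2024-05-20T14:32:00Z",
        "outcome": "Flight was booked successfully but hotel booking failed.",
    }],
)
for message in messages:
    print(message["role"], "->", message["content"])
```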