Technical Tutorial · 12 min read · March 2026
AI Agent Framework Specialists

LangGraph Tutorial: Building Stateful AI Agents with Persistent Memory

A technical deep-dive into LangGraph — how it differs from LangChain, how to model agent state as a graph, implement checkpointers, and build a production-grade stateful customer service agent.

LangGraph vs LangChain: What's Actually Different

LangChain is a linear orchestration layer. Its chains and agents follow a request-response model: input goes in, output comes out, and the process terminates. That model works for a surprising range of tasks — summarization, single-step retrieval, simple tool calls — but it breaks down the moment you need loops, branching based on intermediate state, or the ability to pause mid-execution and wait for human input. LangGraph was designed specifically for these patterns. It models agent execution as a directed graph (one that, unlike the DAGs of simple pipelines, may contain cycles) where nodes are Python functions that read and write to a shared state object, and edges define the transitions between them. A conditional edge can route to different nodes based on what's currently in state — making the agent's next action a function of its accumulated context rather than a fixed script. The architectural implication is significant: LangGraph agents are first-class stateful processes, not request handlers. This means you can checkpoint them, resume them after interruption, and inspect their state at any point in execution. For anything that requires multi-turn reasoning, error recovery loops, or human approval steps, LangGraph is the right abstraction — not an extension of plain LangChain chains.

The State Graph Model: Nodes, Edges, and State Schema

Every LangGraph application is built around three concepts. The state schema is a TypedDict (or Pydantic model) that defines all the information the graph can hold at any point in time. Think of it as the agent's working memory — it persists across node executions and is the only way nodes communicate with each other. A typical customer service agent state might include: messages (the conversation history), current_intent (what the user is trying to accomplish), resolved_entities (extracted from the conversation), tool_results (outputs from the last tool call), and escalation_flag (whether a human should be looped in). Nodes are Python functions with the signature `(state: State) -> dict` — they receive the current state and return a partial update to it. LangGraph merges these updates using a reducer function you define per field. The default reducer is replacement, but for lists like messages, you typically use an append reducer so each node adds to the history rather than replacing it. Edges define when control moves from one node to another. Static edges always go to the same next node; conditional edges call a function on the current state and return the name of the next node to execute. This lets you implement retry loops (go back to the tool-calling node if the tool returned an error), early exits (go directly to the response node if intent is clear), and human-in-the-loop gates (pause execution if escalation_flag is True). Use the Framework Radar to see how LangGraph's graph model compares to AutoGen's conversation primitives and CrewAI's process model.
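The schema and merge behavior can be sketched in plain Python. This is a simplified stand-in for the merge LangGraph performs internally after each node — not its API — and the field names follow the customer service example above:

```python
from operator import add
from typing import Annotated, Optional, TypedDict

class AgentState(TypedDict):
    messages: Annotated[list, add]   # append reducer: node updates extend the history
    current_intent: Optional[str]    # default reducer: new value replaces the old one
    escalation_flag: bool

def apply_update(state: dict, update: dict) -> dict:
    # Simplified illustration of the per-field merge: fields annotated with a
    # reducer are combined with the old value; everything else is replaced.
    merged = dict(state)
    for key, value in update.items():
        if key == "messages":                 # reducer-annotated field
            merged[key] = add(merged[key], value)
        else:                                 # plain field: replacement
            merged[key] = value
    return merged
```

A node returning `{'messages': [new_message], 'current_intent': 'track_order'}` therefore appends to the history while overwriting the intent, leaving untouched fields as they were.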

Checkpointers: Persistent State Across Sessions

A checkpointer is what transforms LangGraph from an in-memory state machine into a persistent, resumable process. Every time a node runs and updates state, the checkpointer serializes the new state and writes it to a backing store keyed by a thread_id. On the next invocation with the same thread_id, LangGraph restores state from the checkpointer before executing the next node — giving you genuine conversational continuity across HTTP requests, process restarts, or even days between interactions. LangGraph ships with several checkpointer backends out of the box. MemorySaver is in-process, useful for development and testing but wiped on restart. SqliteSaver writes to a SQLite file — adequate for single-instance deployments. PostgresSaver is the production-grade option, using a PostgreSQL table to store serialized state with thread_id and checkpoint_id as the composite key. Wiring one in takes only a few lines: `from langgraph.checkpoint.postgres import PostgresSaver`, `checkpointer = PostgresSaver.from_conn_string(DATABASE_URL)` (note that in recent releases `from_conn_string` is a context manager, and the store needs a one-time `checkpointer.setup()` call to create its tables), then pass it as `checkpointer=checkpointer` to `graph.compile()`. From that point, every invocation with `config={'configurable': {'thread_id': user_id}}` automatically reads and writes to the database. The practical impact: a user who starts a support ticket on Monday and returns Thursday picks up exactly where they left off, with full context of previous tool calls and decisions preserved. This is categorically different from stuffing conversation history into a prompt — the agent's internal reasoning state, not just the message transcript, is what persists.
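To make the storage model concrete, here is a toy checkpointer in plain Python — deliberately not the real LangGraph interface, just an illustration of the contract: serialize state after every step, key it by thread_id, and restore the latest snapshot on the next invocation:

```python
import json
import sqlite3
import uuid
from typing import Optional

class ToyCheckpointer:
    """Toy illustration of checkpointer mechanics (not the LangGraph API)."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            " thread_id TEXT, checkpoint_id TEXT, state TEXT,"
            " PRIMARY KEY (thread_id, checkpoint_id))"
        )

    def put(self, thread_id: str, state: dict) -> str:
        # Serialize the full state and append it under this thread's history.
        checkpoint_id = uuid.uuid4().hex
        self.conn.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, checkpoint_id, json.dumps(state)),
        )
        return checkpoint_id

    def latest(self, thread_id: str) -> Optional[dict]:
        # Restore the most recent snapshot for this thread, if any exists.
        row = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ?"
            " ORDER BY rowid DESC LIMIT 1",
            (thread_id,),
        ).fetchone()
        return json.loads(row[0]) if row else None
```

The real PostgresSaver adds proper serialization of message objects, write batching, and checkpoint metadata, but the thread-keyed read-before-execute / write-after-execute cycle is the same idea.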

Human-in-the-Loop: Interrupt and Resume Patterns

Human-in-the-loop (HITL) is one of LangGraph's strongest differentiators. Most agent frameworks treat human input as just another turn in the conversation. LangGraph treats it as a first-class execution primitive: you can interrupt graph execution at a defined node, serialize the current state, return control to the calling application, wait for human input asynchronously, and then resume execution from the interruption point with the human's response merged into state. There are two mechanisms. Static interrupts are declared at compile time with `interrupt_before` or `interrupt_after`: `graph.compile(interrupt_before=['send_response'], checkpointer=checkpointer)`. When execution reaches the 'send_response' node, LangGraph writes a checkpoint and stops; your application layer surfaces the pending state to the human reviewer (e.g., in a web UI), optionally patches it with `graph.update_state(config, values)`, and resumes by calling `graph.invoke(None, config=config)`. Dynamic interrupts use the `interrupt()` function inside a node: it pauses mid-node and returns a payload to the caller, and when the reviewer approves or modifies the draft, your code calls `graph.invoke(Command(resume=human_input), config=config)` — LangGraph restores the checkpoint, feeds the human's response back as the return value of `interrupt()`, and continues from where it paused. This pattern is essential for high-stakes agent actions: before an agent sends a customer email, submits an order, or modifies a production database record, a human can review and approve. The agent's reasoning is preserved; the human just controls the gate. Compare this to other frameworks where 'human in the loop' typically means inserting a human-turn node into a fixed conversation flow — a much less flexible model.
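The interrupt/resume contract can be modeled with an ordinary Python generator — a conceptual sketch, not the LangGraph API: execution pauses at a gate, the pending state is handed to the caller, and the human's decision is injected on resume:

```python
def agent_execution(state: dict):
    """Conceptual pause/resume sketch: yield = interrupt, send() = resume."""
    draft = f"Hi {state['customer']}, your refund has been issued."
    # -- interrupt point: surface the draft and suspend until reviewed --
    decision = yield {"status": "interrupted", "draft": draft}
    if decision["approved"]:
        # Reviewer may have edited the draft before approving it.
        state["sent"] = decision.get("edited_draft", draft)
    else:
        state["escalation_flag"] = True
    yield {"status": "done", "state": state}

# Usage: the caller drives the pause and the resume explicitly.
run = agent_execution({"customer": "Ada"})
pending = next(run)                       # runs until the interrupt point
done = run.send({"approved": True})       # resumes with the human's decision
```

The real mechanism differs in one important way: because the paused state lives in the checkpointer rather than in a live generator, the process can restart (or the review can take days) between the interrupt and the resume.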

Building a Stateful Customer Service Agent: Architecture

A production customer service agent using LangGraph has a predictable structure. The state schema tracks: messages, intent (classified or None), entities (dict of extracted fields), tool_results, draft_response, and escalation_flag. The graph has seven nodes. classify_intent reads the latest user message and writes a classified intent string to state ('track_order', 'request_refund', 'technical_support', 'other'). A conditional edge routes to a specialized node based on the classified intent. The order_tools node, refund_tools node, and support_tools node each invoke the relevant API tools (order management system, refund API, knowledge base search) and write results to tool_results. The draft_response node takes the full state — intent, entities, tool results — and generates a draft customer response using an LLM call with a tightly scoped system prompt. The escalation_check conditional edge examines confidence scores and escalation_flag; if escalation is needed, it routes to a human_review interrupt node rather than directly sending. The send_response node finalizes and dispatches the message. The key insight is that each node is narrow in responsibility. The classify_intent node doesn't know about tools; the order_tools node doesn't know about response generation. State is the only shared medium. This makes individual nodes easy to test in isolation and easy to replace — swap out the order_tools node for a different API integration without touching any other logic.
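The two conditional edges in this architecture reduce to small pure functions over state. A sketch in plain Python (node names match the architecture above; the 0.7 confidence threshold is an illustrative assumption, not a LangGraph default):

```python
INTENT_TO_NODE = {
    "track_order": "order_tools",
    "request_refund": "refund_tools",
    "technical_support": "support_tools",
}

def route_by_intent(state: dict) -> str:
    # Conditional edge after classify_intent: pick the specialist node,
    # falling through to draft_response for unrecognized intents.
    return INTENT_TO_NODE.get(state.get("intent"), "draft_response")

def escalation_check(state: dict) -> str:
    # Conditional edge after draft_response: gate on the escalation flag
    # or low classifier confidence (threshold is an assumed example value).
    if state.get("escalation_flag") or state.get("confidence", 1.0) < 0.7:
        return "human_review"
    return "send_response"
```

Because both functions take only the state dict and return a node name, they can be unit-tested without constructing a graph at all — one of the practical payoffs of keeping nodes and edges narrow.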

Code Patterns: What LangGraph Actually Looks Like

The graph definition is declarative and readable once you understand the primitives. State schema: `class AgentState(TypedDict): messages: Annotated[list, add_messages]; intent: str | None; tool_results: dict; escalation_flag: bool`. Node definition: `def classify_intent(state: AgentState) -> dict: result = llm.invoke([system_prompt] + state['messages']); return {'intent': result.content}`. Conditional routing: `def route_by_intent(state: AgentState) -> str: mapping = {'track_order': 'order_tools', 'request_refund': 'refund_tools'}; return mapping.get(state['intent'], 'draft_response')`. Graph assembly: `graph = StateGraph(AgentState); graph.add_node('classify_intent', classify_intent); graph.add_conditional_edges('classify_intent', route_by_intent); graph.add_node('order_tools', order_tools_node); graph.add_node('draft_response', draft_response_node); graph.add_node('send_response', send_response_node); graph.add_edge('order_tools', 'draft_response'); graph.add_edge('draft_response', 'send_response'); graph.add_edge('send_response', END); graph.set_entry_point('classify_intent')`. Compiled with the checkpointer from the previous section: `app = graph.compile(checkpointer=checkpointer, interrupt_before=['send_response'])`. Invocation: `result = app.invoke({'messages': [HumanMessage(content=user_input)]}, config={'configurable': {'thread_id': session_id}})`. Each subsequent invocation with the same thread_id continues the existing conversation, with all prior state intact. The entire agent, including its decision history, is inspectable: `app.get_state(config)` returns the full current state snapshot, and `app.get_state_history(config)` returns every checkpoint ever written — a complete audit trail.

When to Choose LangGraph Over Simple ReAct Agents

ReAct (Reasoning + Acting) agents are simpler to implement and appropriate for a large class of tasks: single-session tool use, research synthesis, code generation with verification. If your agent takes an input, calls some tools, and returns a final answer in one shot, a standard LangGraph ReAct agent or even a plain tool-calling LLM call is the right choice — LangGraph's graph overhead adds complexity without value. Choose LangGraph's full graph model when your requirements include: multi-session continuity (the agent must remember what happened in previous conversations), long-running workflows (execution spans hours or days, not seconds), human approval gates (specific actions require human review before proceeding), error recovery loops (the agent should detect tool failures and retry with modified parameters, not just fail), parallel subgraph execution (independent sub-tasks that can run concurrently), or audit trail requirements (every decision and state transition must be logged and replayable). The AI Readiness Assessment on AgentList helps you identify which of these patterns apply to your use case before you commit to a framework. In general, the heuristic is: if you can draw your agent's behavior as a simple flowchart with no loops, use a simpler framework. If the flowchart has cycles, branches, or external pause points, LangGraph is the right choice.

Production Considerations and Failure Modes

LangGraph in production introduces a few failure modes that simpler agent patterns don't have. Checkpoint storage grows unboundedly: every graph execution writes state to your checkpointer database. Without a retention policy, a busy production system can generate gigabytes of checkpoint data per week. Implement a cleanup job that deletes checkpoints older than your business needs (30 days is typical). Infinite loop protection: cyclic graphs can loop forever if the exit condition is never met — an LLM that repeatedly decides to retry a failing tool call with no backoff or maximum attempt counter. Always include a loop counter in state and a conditional edge that breaks out after N iterations. State schema migrations: when you add or rename fields in your state schema, existing checkpoints have the old schema shape. LangGraph doesn't automatically migrate checkpoint data. Design your state schema conservatively and use Optional fields with defaults to avoid breaking existing in-flight conversations when deploying schema changes. Thread ID collision: if your thread ID generation is not sufficiently unique, different users can end up sharing state — a serious data isolation bug. Use UUIDs derived from authenticated user IDs plus session context, never user-controlled strings. For benchmarking LangGraph agent performance against comparable frameworks, the Benchmarks section on AgentList provides task-completion metrics across customer service, research, and data processing workloads.
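Two of these failure modes — unbounded retry loops and thread ID collisions — have compact mitigations. A sketch (plain Python; node names and the retry budget of 3 are illustrative assumptions):

```python
import uuid

MAX_TOOL_RETRIES = 3  # assumed budget; tune to your tool's failure profile

def route_after_tool(state: dict) -> str:
    # Bounded retry loop: a counter kept in state breaks the cycle after
    # MAX_TOOL_RETRIES attempts instead of letting the LLM retry forever.
    if state.get("tool_error"):
        if state.get("retries", 0) < MAX_TOOL_RETRIES:
            return "call_tool"   # loop back and retry
        return "escalate"        # budget exhausted: hand off to a human
    return "draft_response"

def make_thread_id(user_id: str, session: str) -> str:
    # Deterministic, collision-resistant thread_id derived from the
    # authenticated user id plus session context — never a raw
    # user-controlled string, so users cannot share or hijack threads.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{user_id}:{session}"))
```

The retry-counting node is responsible for incrementing `retries` in its state update each time it runs; the conditional edge only reads it. The `uuid5` derivation is deterministic, so the same user and session always map to the same thread — which is exactly what resumable conversations require.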

Related Resources

Find agencies that specialize in the frameworks and use cases covered in this article.
