# How to manage long context with summarization
In modern LLM applications, context size can grow quickly and hit provider limits, whether you're building a chatbot with multi-turn conversations or an agentic system with many tool calls. One effective strategy for handling this is to summarize earlier messages once they reach a certain threshold. This guide demonstrates how to implement this approach in your LangGraph application using LangMem's prebuilt `summarize_messages` and `SummarizationNode`.
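Conceptually, the approach looks like the following minimal sketch (hypothetical code for illustration only; the `summarize_messages` helper used below handles the real bookkeeping, such as tracking which messages have already been summarized and propagating a running summary across turns):

```python
from langchain_core.messages import SystemMessage

def naive_summarize(messages, model, token_counter, max_tokens=256):
    # If the history still fits the token budget, pass it through unchanged.
    if token_counter(messages) <= max_tokens:
        return messages
    # Otherwise, summarize the older half and keep the recent half verbatim.
    midpoint = len(messages) // 2
    head, tail = messages[:midpoint], messages[midpoint:]
    summary = model.invoke(head + [("user", "Summarize the conversation above.")])
    return [SystemMessage(f"Summary of earlier conversation: {summary.content}")] + tail
```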
## Using in a simple chatbot

Below is an example of a simple multi-turn chatbot with summarization:

API: ChatOpenAI | StateGraph | START | summarize_messages | RunningSummary
```python
from langgraph.graph import StateGraph, START, MessagesState
from langgraph.checkpoint.memory import InMemorySaver
from langmem.short_term import summarize_messages, RunningSummary
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o")
summarization_model = model.bind(max_tokens=128)  # (1)!


# We will keep track of our running summary in the graph state
class SummaryState(MessagesState):
    summary: RunningSummary | None


# Define the node that will be calling the LLM
def call_model(state: SummaryState) -> SummaryState:
    summarization_result = summarize_messages(  # (2)!
        state["messages"],
        # IMPORTANT: Pass running summary, if any
        running_summary=state.get("summary"),  # (3)!
        token_counter=model.get_num_tokens_from_messages,
        model=summarization_model,
        max_tokens=256,  # (4)!
        max_tokens_before_summary=256,  # (5)!
        max_summary_tokens=128,
    )
    response = model.invoke(summarization_result.messages)
    state_update = {"messages": [response]}
    if summarization_result.running_summary:  # (6)!
        state_update["summary"] = summarization_result.running_summary
    return state_update


checkpointer = InMemorySaver()
builder = StateGraph(SummaryState)
builder.add_node(call_model)
builder.add_edge(START, "call_model")
graph = builder.compile(checkpointer=checkpointer)  # (7)!

# Invoke the graph
config = {"configurable": {"thread_id": "1"}}
graph.invoke({"messages": "hi, my name is bob"}, config)
graph.invoke({"messages": "write a short poem about cats"}, config)
graph.invoke({"messages": "now do the same but for dogs"}, config)
graph.invoke({"messages": "what's my name?"}, config)
```
1. We also set max output tokens for the summarization model. This should match `max_summary_tokens` in `summarize_messages` for a more accurate token budget estimate.
2. We attempt to summarize the messages before the LLM is called. If the messages in `state["messages"]` fit within the `max_tokens_before_summary` budget, they are returned as-is. Otherwise, they are summarized and `[summary_message] + remaining_messages` is returned.
3. Pass the running summary, if any. This is how `summarize_messages` avoids re-summarizing the same messages on every conversation turn.
4. The maximum token budget for the final list of messages produced after summarization.
5. The token threshold at which summarization kicks in. Defaults to `max_tokens`.
6. If a summary was generated, add it as a state update, overwriting the previously generated summary, if any.
7. It's important to compile the graph with a checkpointer; otherwise, the graph won't remember previous conversation turns.
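After these turns, you can inspect the checkpointed state to confirm that a summary is being tracked (a quick sketch, assuming `RunningSummary` exposes the summary text on its `summary` attribute):

```python
# Read the checkpointed state for this thread
state = graph.get_state(config)

# The full, unmodified history is still stored under "messages"
print(len(state.values["messages"]))

# The running summary (if one was produced) lives under "summary"
if state.values.get("summary"):
    print(state.values["summary"].summary)
```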
## Using in a UI

An important question is how to present messages to the user in your application's UI. We recommend rendering the full, unmodified message history. You may additionally choose to render the summary and the messages that are actually passed to the LLM. We also recommend using separate LangGraph state keys for the full message history (e.g., `"messages"`) and for the summarization results (e.g., `"summary"`). In `SummarizationNode`, the summarization results are stored in a separate state key called `context` (see the example below).
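For example, a UI handler might read the full history from the checkpointer and surface the summary separately. A sketch, where `render_message` and `render_summary` are hypothetical UI helpers, assuming `SummarizationNode` stores its `RunningSummary` under a `running_summary` key inside `context` (as in the example below):

```python
snapshot = graph.get_state(config)

# Render the full, unmodified history to the user
for message in snapshot.values["messages"]:
    render_message(message)  # hypothetical UI helper

# Optionally surface the running summary produced by SummarizationNode
running_summary = snapshot.values.get("context", {}).get("running_summary")
if running_summary is not None:
    render_summary(running_summary.summary)  # hypothetical UI helper
```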
## Using SummarizationNode

You can also separate the summarization into a dedicated node. Let's explore how to modify the example above to use `SummarizationNode` and achieve the same result:

API: ChatOpenAI | StateGraph | START | SummarizationNode | RunningSummary
```python
from typing import Any, TypedDict

from langchain_openai import ChatOpenAI
from langchain_core.messages import AnyMessage
from langgraph.graph import StateGraph, START, MessagesState
from langgraph.checkpoint.memory import InMemorySaver
from langmem.short_term import SummarizationNode, RunningSummary

model = ChatOpenAI(model="gpt-4o")
summarization_model = model.bind(max_tokens=128)


class State(MessagesState):
    context: dict[str, Any]  # (1)!


class LLMInputState(TypedDict):  # (2)!
    summarized_messages: list[AnyMessage]
    context: dict[str, Any]


summarization_node = SummarizationNode(  # (3)!
    token_counter=model.get_num_tokens_from_messages,
    model=summarization_model,
    max_tokens=256,
    max_tokens_before_summary=256,
    max_summary_tokens=128,
)


# IMPORTANT: we're passing a private input state here to isolate the summarization
def call_model(state: LLMInputState):  # (4)!
    response = model.invoke(state["summarized_messages"])
    return {"messages": [response]}


checkpointer = InMemorySaver()
builder = StateGraph(State)
builder.add_node(call_model)
builder.add_node("summarize", summarization_node)
builder.add_edge(START, "summarize")
builder.add_edge("summarize", "call_model")
graph = builder.compile(checkpointer=checkpointer)

# Invoke the graph
config = {"configurable": {"thread_id": "1"}}
graph.invoke({"messages": "hi, my name is bob"}, config)
graph.invoke({"messages": "write a short poem about cats"}, config)
graph.invoke({"messages": "now do the same but for dogs"}, config)
graph.invoke({"messages": "what's my name?"}, config)
```
1. We will keep track of the running summary in the `context` field (expected by `SummarizationNode`).
2. Define a private state that is used only to filter the inputs to the `call_model` node (see the sketch after this list).
3. `SummarizationNode` uses `summarize_messages` under the hood and automatically handles the propagation of the existing summary that we had to do manually in the example above.
4. The model-calling node is now just a single LLM invocation.
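To see why the private input schema matters, compare it with a node that accepts the full graph state. Nothing would stop it from reading the unsummarized history (an illustrative anti-pattern, not part of the example above):

```python
def call_model_unfiltered(state: State):
    # Anti-pattern for this design: invokes the model on the full,
    # unsummarized "messages" history rather than "summarized_messages".
    response = model.invoke(state["messages"])
    return {"messages": [response]}
```

Annotating `call_model` with `LLMInputState` instead lets LangGraph filter the node's input down to exactly the keys the LLM call needs.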
## Using in a ReAct agent

A common use case is summarizing message history in a tool-calling agent. The following example demonstrates how to implement this in a ReAct-style LangGraph agent:

API: ChatOpenAI | tool | StateGraph | START | END | ToolNode | SummarizationNode | RunningSummary
```python
from typing import Any, TypedDict

from langchain_openai import ChatOpenAI
from langchain_core.messages import AnyMessage
from langchain_core.tools import tool
from langgraph.graph import StateGraph, START, END, MessagesState
from langgraph.prebuilt import ToolNode
from langgraph.checkpoint.memory import InMemorySaver
from langmem.short_term import SummarizationNode, RunningSummary


class State(MessagesState):
    context: dict[str, Any]


@tool
def search(query: str):
    """Search the web."""
    if "weather" in query.lower():
        return "The weather is sunny in New York, with a high of 104 degrees."
    elif "broadway" in query.lower():
        return "Hamilton is always on!"
    else:
        raise ValueError("Not enough information")


tools = [search]

model = ChatOpenAI(model="gpt-4o")
summarization_model = model.bind(max_tokens=128)

summarization_node = SummarizationNode(
    token_counter=model.get_num_tokens_from_messages,
    model=summarization_model,
    max_tokens=256,
    max_tokens_before_summary=1024,
    max_summary_tokens=128,
)


class LLMInputState(TypedDict):
    summarized_messages: list[AnyMessage]
    context: dict[str, Any]


def call_model(state: LLMInputState):
    response = model.bind_tools(tools).invoke(state["summarized_messages"])
    return {"messages": [response]}


# Define a router that determines whether to execute tools or exit
def should_continue(state: MessagesState):
    messages = state["messages"]
    last_message = messages[-1]
    if not last_message.tool_calls:
        return END
    else:
        return "tools"


checkpointer = InMemorySaver()
builder = StateGraph(State)
builder.add_node("summarize_node", summarization_node)
builder.add_node("call_model", call_model)
builder.add_node("tools", ToolNode(tools))
builder.set_entry_point("summarize_node")
builder.add_edge("summarize_node", "call_model")
builder.add_conditional_edges("call_model", should_continue, path_map=["tools", END])
builder.add_edge("tools", "summarize_node")  # (1)!
graph = builder.compile(checkpointer=checkpointer)

# Invoke the graph
config = {"configurable": {"thread_id": "1"}}
graph.invoke({"messages": "hi, i am bob"}, config)
graph.invoke({"messages": "what's the weather in nyc this weekend"}, config)
graph.invoke({"messages": "what's new on broadway?"}, config)
```
1. Instead of returning to the LLM after executing tools, we first return to the summarization node.
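As in the earlier examples, you can confirm that context survives summarization with a follow-up question that depends on the first message:

```python
result = graph.invoke({"messages": "what's my name?"}, config)
print(result["messages"][-1].content)  # expected to mention "Bob"
```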