# Delayed Background Memory Processing

When a conversation is active, an agent may receive many messages in quick succession. Instead of processing each message immediately for long-term memory management, you can wait for conversation activity to settle. This guide shows how to use `ReflectionExecutor` to debounce memory processing.
## Problem

Processing memories on every message has drawbacks:

- Redundant work when messages arrive in quick succession
- Incomplete context when processing mid-conversation
- Unnecessary token consumption
`ReflectionExecutor` defers memory processing and cancels redundant work:
```python
from langchain.chat_models import init_chat_model
from langgraph.func import entrypoint
from langgraph.store.memory import InMemoryStore
from langmem import ReflectionExecutor, create_memory_store_manager

# Create memory manager to extract memories from conversations (1)
memory_manager = create_memory_store_manager(
    "anthropic:claude-3-5-sonnet-latest",
    namespace=("memories",),
)
# Wrap memory_manager to handle deferred background processing (2)
executor = ReflectionExecutor(memory_manager)
store = InMemoryStore(
    index={
        "dims": 1536,
        "embed": "openai:text-embedding-3-small",
    }
)
# Chat model used to generate responses
llm = init_chat_model("anthropic:claude-3-5-sonnet-latest")


@entrypoint(store=store)
def chat(message: str):
    response = llm.invoke(message)

    # Format conversation for memory processing
    # Must follow OpenAI's message format
    to_process = {"messages": [{"role": "user", "content": message}] + [response]}

    # Wait before processing. If new messages arrive before the delay
    # elapses, the executor will:
    # 1. Cancel the pending processing task
    # 2. Reschedule with the new messages included
    delay = 0.5  # In practice, choose a longer delay (30-60 min),
                 # depending on app context.
    executor.submit(to_process, after_seconds=delay)
    return response.content
```
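To see the deferred write land, you might run something like the following. This is a hypothetical usage sketch: the sample messages, the short sleep, and the assumption that both calls share one conversation thread are illustrative, and Anthropic and OpenAI credentials must be configured.

```python
import time

# Two messages in quick succession: per the debounce behavior described
# below, the pending task is cancelled when the second message arrives,
# so reflection runs once over the full exchange.
chat.invoke("I prefer dark roast coffee.")
chat.invoke("Actually, make it light roast in the mornings.")

time.sleep(2)  # give the deferred task time to fire (delay above is 0.5 s)

# Inspect what the background run stored in the ("memories",) namespace.
for item in store.search(("memories",)):
    print(item.key, item.value)
```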
1. `create_memory_store_manager` creates a Runnable that extracts memories from a conversation. It operates on OpenAI-format messages.
2. `ReflectionExecutor` processes memories in the background. For each conversation thread, it:
    - maintains a queue of pending memory tasks
    - cancels the pending task when new messages arrive
    - runs only after the specified delay has elapsed

This debouncing ensures you process complete conversation context rather than fragments.
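For intuition about what the executor is doing, here is a minimal sketch of the debounce pattern using Python's `threading.Timer`. It illustrates only the cancel-and-reschedule idea; it is not `ReflectionExecutor`'s actual implementation, and the `Debouncer` class is hypothetical.

```python
import threading
from typing import Any, Callable, Optional


class Debouncer:
    """Run `fn` once, `delay` seconds after the most recent submit.

    Each new submit cancels the pending timer and reschedules with the
    latest payload -- the cancel-and-reschedule idea described above,
    minus per-thread queues and the rest of the executor's machinery.
    """

    def __init__(self, fn: Callable[[Any], None], delay: float) -> None:
        self._fn = fn
        self._delay = delay
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def submit(self, payload: Any) -> None:
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # drop the stale pending run
            self._timer = threading.Timer(self._delay, self._fn, args=(payload,))
            self._timer.start()


# Usage: only the last payload is processed, 0.5 s after it arrives.
debouncer = Debouncer(print, delay=0.5)
debouncer.submit("fragment 1")      # cancelled by the next submit
debouncer.submit("fragments 1+2")   # runs after 0.5 s of quiet
```

Cancelling and restarting the timer keeps only the newest scheduled run alive, which is what lets deferred processing skip redundant mid-conversation work.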