How to extract semantic memories

Need to extract multiple related facts from a conversation? Here's how to use LangMem's collection pattern for semantic memories. For single-document patterns such as user profiles, see Manage user profiles.

Without storage

Extract semantic memories

API: create_memory_manager

from langmem import create_memory_manager # (1)!
from pydantic import BaseModel

class Triple(BaseModel): # (2)!
    """Store all new facts, preferences, and relationships as triples."""
    subject: str
    predicate: str
    object: str
    context: str | None = None

# Configure extraction
manager = create_memory_manager(  
    "anthropic:claude-3-5-sonnet-latest",
    schemas=[Triple], 
    instructions="Extract user preferences and any other useful information",
    enable_inserts=True,
    enable_deletes=True,
)
  1. LangMem has two similar objects for extracting and enriching collections of memories:

    • create_memory_manager: (this example) you control storage and updates
    • create_memory_store_manager: handles memory search, insert/update, and delete operations directly in whatever BaseStore is configured for the graph context

    The latter is built on the former. Both work by prompting an LLM, using parallel tool calls, to extract new memories, update old ones, and (if configured) delete outdated ones.

  2. Here, our custom "Triple" memory schema shapes how memories are extracted. Without context, a memory can become ambiguous when retrieved later:

    {"content": "User said yes"}  # No context; unhelpful
    
    Adding context helps the LLM apply the memory correctly:
    {
        "subject": "user",
        "predicate": "response",
        "object": "yes",
        "context": "When asked about attending team meeting"
    }
    {
        "subject": "user",
        "predicate": "response",
        "object": "no",
        "context": "When asked if they were batman"
    }
    
    In general, it's best to schematize memories to encourage consistent storage of certain fields, or at least to include instructions so that the memories the LLM saves are informative enough to stand on their own.

After a first short interaction, the system has extracted some semantic triples:

# First conversation - extract triples
conversation1 = [
    {"role": "user", "content": "Alice manages the ML team and mentors Bob, who is also on the team."}
]
memories = manager.invoke({"messages": conversation1})
print("After first conversation:")
for m in memories:
    print(m)
# ExtractedMemory(id='f1bf258c-281b-4fda-b949-0c1930344d59', content=Triple(subject='Alice', predicate='manages', object='ML_team', context=None))
# ExtractedMemory(id='0214f151-b0c5-40c4-b621-db36b845956c', content=Triple(subject='Alice', predicate='mentors', object='Bob', context=None))
# ExtractedMemory(id='258dbf2d-e4ac-47ac-8ffe-35c70a3fe7fc', content=Triple(subject='Bob', predicate='is_member_of', object='ML_team', context=None))

The second conversation updates some existing memories. Because we enabled deletion, the manager returns RemoveDoc objects to indicate that a memory should be removed, and a new memory is created in its place. Since this uses the core "functional" API (i.e., it doesn't read from or write to a database), you control what "removal" means: a soft delete, a hard delete, or simply down-weighting the memory.

# Second conversation - update and add triples
conversation2 = [
    {"role": "user", "content": "Bob now leads the ML team and the NLP project."}
]
update = manager.invoke({"messages": conversation2, "existing": memories})
print("After second conversation:")
for m in update:
    print(m)
# ExtractedMemory(id='65fd9b68-77a7-4ea7-ae55-66e1dd603046', content=RemoveDoc(json_doc_id='f1bf258c-281b-4fda-b949-0c1930344d59'))
# ExtractedMemory(id='7f8be100-5687-4410-b82a-fa1cc8d304c0', content=Triple(subject='Bob', predicate='leads', object='ML_team', context=None))
# ExtractedMemory(id='f4c09154-2557-4e68-8145-8ccd8afd6798', content=Triple(subject='Bob', predicate='leads', object='NLP_project', context=None))
# ExtractedMemory(id='f1bf258c-281b-4fda-b949-0c1930344d59', content=Triple(subject='Alice', predicate='manages', object='ML_team', context=None))
# ExtractedMemory(id='0214f151-b0c5-40c4-b621-db36b845956c', content=Triple(subject='Alice', predicate='mentors', object='Bob', context=None))
# ExtractedMemory(id='258dbf2d-e4ac-47ac-8ffe-35c70a3fe7fc', content=Triple(subject='Bob', predicate='is_member_of', object='ML_team', context=None))
existing = [m for m in update if isinstance(m.content, Triple)]
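Since the functional API leaves deletion semantics up to you, one way to apply the manager's output is to treat each RemoveDoc as a soft delete against your own list. A minimal sketch using stand-in dataclasses (ExtractedMemory and RemoveDoc here are simplified illustrations, not LangMem's actual classes):

```python
from dataclasses import dataclass

@dataclass
class RemoveDoc:
    """Stand-in for LangMem's removal marker: points at the memory to drop."""
    json_doc_id: str

@dataclass
class ExtractedMemory:
    """Stand-in for LangMem's extracted-memory wrapper."""
    id: str
    content: object

def apply_updates(existing, update):
    """Merge a manager result into our memory list, honoring RemoveDoc markers."""
    removed_ids = {
        m.content.json_doc_id for m in update if isinstance(m.content, RemoveDoc)
    }
    # Keep existing memories that were not flagged for removal
    kept = [m for m in existing if m.id not in removed_ids]
    # Add genuinely new memories (skip removal markers and echoed-back existing ones)
    existing_ids = {m.id for m in existing}
    new = [
        m for m in update
        if not isinstance(m.content, RemoveDoc) and m.id not in existing_ids
    ]
    return kept + new
```

For a hard delete you would instead drop the rows from your database; for down-weighting you would keep them and record a penalty, which is exactly the flexibility the paragraph above describes.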

The third conversation overwrites even more memories.

# Delete triples about an entity
conversation3 = [
    {"role": "user", "content": "Alice left the company."}
]
final = manager.invoke({"messages": conversation3, "existing": existing})
print("After third conversation:")
for m in final:
    print(m)
# ExtractedMemory(id='7ca76217-66a4-4041-ba3d-46a03ea58c1b', content=RemoveDoc(json_doc_id='f1bf258c-281b-4fda-b949-0c1930344d59'))
# ExtractedMemory(id='35b443c7-49e2-4007-8624-f1d6bcb6dc69', content=RemoveDoc(json_doc_id='0214f151-b0c5-40c4-b621-db36b845956c'))
# ExtractedMemory(id='65fd9b68-77a7-4ea7-ae55-66e1dd603046', content=RemoveDoc(json_doc_id='f1bf258c-281b-4fda-b949-0c1930344d59'))
# ExtractedMemory(id='7f8be100-5687-4410-b82a-fa1cc8d304c0', content=Triple(subject='Bob', predicate='leads', object='ML_team', context=None))
# ExtractedMemory(id='f4c09154-2557-4e68-8145-8ccd8afd6798', content=Triple(subject='Bob', predicate='leads', object='NLP_project', context=None))
# ExtractedMemory(id='f1bf258c-281b-4fda-b949-0c1930344d59', content=Triple(subject='Alice', predicate='manages', object='ML_team', context=None))
# ExtractedMemory(id='0214f151-b0c5-40c4-b621-db36b845956c', content=Triple(subject='Alice', predicate='mentors', object='Bob', context=None))
# ExtractedMemory(id='258dbf2d-e4ac-47ac-8ffe-35c70a3fe7fc', content=Triple(subject='Bob', predicate='is_member_of', object='ML_team', context=None))

For more on semantic memory, see Memory types.

With storage

The same extraction can be managed automatically by LangGraph's BaseStore:

API: init_chat_model | entrypoint | create_memory_store_manager

from langchain.chat_models import init_chat_model
from langgraph.func import entrypoint
from langgraph.store.memory import InMemoryStore
from langmem import create_memory_store_manager

# Set up store and models
store = InMemoryStore(  # (1)!
    index={
        "dims": 1536,
        "embed": "openai:text-embedding-3-small",
    }
)
manager = create_memory_store_manager(
    "anthropic:claude-3-5-sonnet-latest",
    namespace=("chat", "{user_id}", "triples"),  # (2)!
    schemas=[Triple],
    instructions="Extract all user information and events as triples.",
    enable_inserts=True,
    enable_deletes=True,
)
my_llm = init_chat_model("anthropic:claude-3-5-sonnet-latest")
  1. For production deployments, use a persistent store such as AsyncPostgresStore. InMemoryStore works well in development but does not persist data across restarts.

  2. The namespace pattern controls memory isolation:

    # User-specific memories
    ("chat", "user_123", "triples")
    
    # Team-shared knowledge
    ("chat", "team_x", "triples")
    
    # Global memories
    ("chat", "global", "triples")
    
    See Storage System for namespace design patterns.

You can also extract multiple memory types at once:

schemas=[Triple, Preference, Relationship]

Each type can have its own extraction rules and storage pattern. Namespaces let you organize memories by user, team, or domain:

# User-specific memories
("chat", "user_123", "triples")

# Team-shared knowledge
("chat", "team_x", "triples")

# Domain-specific extraction
("chat", "user_123", "preferences")

The {user_id} placeholder is replaced at runtime:

# Extract memories for User A
manager.invoke(
    {"messages": [{"role": "user", "content": "I prefer dark mode"}]},
    config={"configurable": {"user_id": "user-a"}}  # (1)!
)
  1. Uses the namespace ("chat", "user-a", "triples")
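Under the hood, the placeholder substitution amounts to formatting each namespace element with the run's configurable values. A rough illustration (resolve_namespace is a hypothetical helper, not part of LangMem's public API):

```python
def resolve_namespace(template: tuple, config: dict) -> tuple:
    """Fill "{placeholder}" elements of a namespace tuple from config["configurable"].

    Hypothetical helper for illustration; LangMem performs this
    substitution internally when it reads and writes memories.
    """
    values = config.get("configurable", {})
    return tuple(
        part.format(**values) if "{" in part else part
        for part in template
    )

ns = resolve_namespace(
    ("chat", "{user_id}", "triples"),
    {"configurable": {"user_id": "user-a"}},
)
# ns == ("chat", "user-a", "triples")
```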
# Define app with store context
@entrypoint(store=store) # (1)!
def app(messages: list):
    response = my_llm.invoke(
        [
            {
                "role": "system",
                "content": "You are a helpful assistant.",
            },
            *messages
        ]
    )

    # Extract and store triples (Uses store from @entrypoint context)
    manager.invoke({"messages": messages}) 
    return response
  1. @entrypoint provides the BaseStore context:

    • Handles cross-thread coordination
    • Manages connection pooling

    See the BaseStore guide for production setup.

Then run the app:

# First conversation
app.invoke(
    [
        {
            "role": "user",
            "content": "Alice manages the ML team and mentors Bob, who is also on the team.",
        },
    ],
    config={"configurable": {"user_id": "user123"}},
)

# Second conversation
app.invoke(
    [
        {"role": "user", "content": "Bob now leads the ML team and the NLP project."},
    ],
    config={"configurable": {"user_id": "user123"}},
)

# Third conversation
app.invoke(
    [
        {"role": "user", "content": "Alice left the company."},
    ],
    config={"configurable": {"user_id": "user123"}},
)

# Check stored triples
for item in store.search(("chat", "user123")):
    print(item.namespace, item.value)

# Output:
# ('chat', 'user123', 'triples') {'kind': 'Triple', 'content': {'subject': 'Bob', 'predicate': 'is_member_of', 'object': 'ML_team', 'context': None}}
# ('chat', 'user123', 'triples') {'kind': 'Triple', 'content': {'subject': 'Bob', 'predicate': 'leads', 'object': 'ML_team', 'context': None}}
# ('chat', 'user123', 'triples') {'kind': 'Triple', 'content': {'subject': 'Bob', 'predicate': 'leads', 'object': 'NLP_project', 'context': None}}
# ('chat', 'user123', 'triples') {'kind': 'Triple', 'content': {'subject': 'Alice', 'predicate': 'employment_status', 'object': 'left_company', 'context': None}}

See Storage System for namespace patterns. This approach is also compatible with ReflectionExecutor to defer and deduplicate memory processing.

Using a memory manager agent

The techniques above try to manage memories (insertions, updates, and deletions) in a single LLM call, which can be a lot for the model to juggle when there is a large amount of new information. Alternatively, you can create an agent, similar to the one in the quickstart, that is prompted to manage memory over multiple LLM calls. You can still keep this agent separate from your user-facing agent; the extra calls give the LLM more time to process new information and complex updates.

API: init_chat_model | entrypoint | create_react_agent | create_manage_memory_tool | create_search_memory_tool

from langchain.chat_models import init_chat_model
from langgraph.func import entrypoint
from langgraph.prebuilt import create_react_agent
from langgraph.store.memory import InMemoryStore

from langmem import create_manage_memory_tool, create_search_memory_tool

# Set up store and checkpointer
store = InMemoryStore(
    index={
        "dims": 1536,
        "embed": "openai:text-embedding-3-small",
    }
)
my_llm = init_chat_model("anthropic:claude-3-5-sonnet-latest")


def prompt(state):
    """Prepare messages with context from existing memories."""
    memories = store.search(
        ("memories",),
        query=state["messages"][-1].content,
    )
    system_msg = f"""You are a memory manager. Extract and manage all important knowledge, rules, and events using the provided tools.
Existing memories:
<memories>
{memories}
</memories>

Use the manage_memory tool to update and contextualize existing memories, create new ones, or delete old ones that are no longer valid.
You can also expand your search of existing memories to augment using the search tool."""
    return [{"role": "system", "content": system_msg}, *state["messages"]]


# Create the memory extraction agent
manager = create_react_agent(
    "anthropic:claude-3-5-sonnet-latest",
    prompt=prompt,
    tools=[
        # Agent can create/update/delete memories
        create_manage_memory_tool(namespace=("memories",)),
        create_search_memory_tool(namespace=("memories",)),
    ],
)


# Run extraction in background
@entrypoint(store=store)
def app(messages: list):
    response = my_llm.invoke(
        [
            {
                "role": "system",
                "content": "You are a helpful assistant.",
            },
            *messages,
        ]
    )

    # Extract and store triples (Uses store from @entrypoint context)
    manager.invoke({"messages": messages})
    return response


app.invoke(
    [
        {
            "role": "user",
            "content": "Alice manages the ML team and mentors Bob, who is also on the team.",
        }
    ]
)

print(store.search(("memories",)))

# [
#     Item(
#         namespace=["memories"],
#         key="5ca8dacc-7d46-40bb-9b3d-f4c2dc5c4b30",
#         value={"content": "Alice is the manager of the ML (Machine Learning) team"},
#         created_at="2025-02-11T00:28:01.688490+00:00",
#         updated_at="2025-02-11T00:28:01.688499+00:00",
#         score=None,
#     ),
#     Item(
#         namespace=["memories"],
#         key="586783fa-e501-4835-8651-028c2735f0d0",
#         value={"content": "Bob works on the ML team"},
#         created_at="2025-02-11T00:28:04.408826+00:00",
#         updated_at="2025-02-11T00:28:04.408841+00:00",
#         score=None,
#     ),
#     Item(
#         namespace=["memories"],
#         key="19f75f64-8787-4150-a439-22068b00118a",
#         value={"content": "Alice mentors Bob on the ML team"},
#         created_at="2025-02-11T00:28:06.951838+00:00",
#         updated_at="2025-02-11T00:28:06.951847+00:00",
#         score=None,
#     ),
# ]

This approach is also compatible with ReflectionExecutor to defer and deduplicate memory processing.

When to use semantic memories

Semantic memories help an agent learn from conversations. They extract and store meaningful information that may be useful in future interactions. For example, while discussing a project, the agent might remember technical requirements, team structure, or key decisions: anything that provides useful context later.

The goal is to build understanding over time, much as humans do through repeated interactions. Not everything needs to be remembered; focus on information that helps the agent be more useful in future conversations. Semantic memory works best when the agent can save important memories and the close relationships between them, so that it can later recall not just the "what" but also the "why" and the "how".
