算法竞赛编程¶
在本教程中,您将构建一个计算奥林匹克竞赛 Agent,它利用三种互补技术来提高性能:反思、检索和人机协作。这些技术和数据都改编自 Quan Shi、Michael Tang、Karthik Narasimhan 和 Shunyu Yao 的论文《语言模型能解决奥林匹克编程问题吗?》。您可以在以下链接查看他们的论文
您将构建一个能够回答难度递增的编程问题的 Agent 图。
- 反思:在第 1 部分中,您将创建一个零样本工具调用 Agent,并提示它反思测试用例结果,以纠正其初始错误。这类似于论文中报告的在 USACO 基准测试中通过率为 12.38 的 Agent。
- 检索:在第 2 部分中,您将实施一个初始检索步骤,作为 Agent 的“情景记忆”,从我们的编程问题语料库中检索高质量的少样本示例,以帮助解决青铜级别的问题。此 Agent 类似于论文中基准测试为 20.2 的 Agent。
- 人机协作:在第 3 部分中,您将使用
interrupt_after
让用户与 Agent 协同工作,以获得更好的答案。那么,基准性能仅受与其配对的人类竞争力的限制。
您的最终 Agent 图将结构化为下图所示
第 1 部分和第 2 部分类似于论文中基准测试通过率分别为 12.38 和 20.2 的系统。
虽然 LLM 尚未能够自主解决所有这些问题,但我们可以设计一个系统,使其在回答这些问题时远远超过基本 ReAct Agent 的能力。
在深入研究之前,让我们设置好我们的机器。这将涉及安装依赖项、获取数据集以及定义实用程序函数。
设置¶
在本教程中,我们将需要安装一些依赖项,获取奥林匹克竞赛数据集,并定义一个实用程序函数,以帮助运行候选解决方案,以查看它们是否通过了测试用例。
首先,让我们安装所需的软件包并设置我们的 API 密钥
%%capture --no-stderr
%pip install -U langgraph langsmith langchain_anthropic datasets langchain langchainhub
import getpass
import os
def _get_env(var: str):
if not os.environ.get(var):
os.environ[var] = getpass.getpass(f"{var}: ")
_get_env("ANTHROPIC_API_KEY")
设置 LangSmith 以进行 LangGraph 开发
注册 LangSmith 以快速发现问题并提高 LangGraph 项目的性能。LangSmith 允许您使用跟踪数据来调试、测试和监控使用 LangGraph 构建的 LLM 应用程序 — 阅读更多关于如何开始使用 此处 的信息。
数据¶
使用以下工具获取 USACO 基准数据
import os
import zipfile
import datasets
import requests
usaco_url = "https://storage.googleapis.com/benchmarks-artifacts/usaco/usaco_sampled_with_tests.zip"
zip_path = "usaco.zip"
extract_path = "usaco_datasets"
response = requests.get(usaco_url)
with open(zip_path, "wb") as file:
file.write(response.content)
with zipfile.ZipFile(zip_path, "r") as zip_ref:
zip_ref.extractall(extract_path)
os.remove(zip_path)
ds = datasets.load_from_disk(os.path.join(extract_path, "usaco_v3_sampled_with_tests"))
测试评估工具¶
我们还需要一种评估生成的代码的方法。我们将使用这个不安全的代码执行程序来针对我们的测试用例运行生成的代码。注意: 以下代码将在您的本地机器上运行任意代码!请谨慎操作。
import multiprocessing
import queue
import subprocess
import sys
import time
import traceback
multiprocessing.set_start_method("fork", force=True)
# WARNING
# This program exists to execute untrusted model-generated code. Although
# it is highly unlikely that model-generated code will do something overtly
# malicious in response to this test suite, model-generated code may act
# destructively due to a lack of model capability or alignment.
# Users are strongly encouraged to sandbox this evaluation suite so that it
# does not perform destructive actions on their host or network.
# Proceed at your own risk:
def exec_program(q, program, input_data, expected_output, timeout):
try:
start_time = time.time()
process = subprocess.Popen(
[sys.executable, "-c", program],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
)
stdout, stderr = process.communicate(input=input_data, timeout=timeout)
if time.time() - start_time > timeout:
raise TimeoutError("Execution timed out.")
if process.returncode != 0:
q.put(f"failed: {stderr}")
else:
if stdout.strip() == expected_output.strip():
q.put("passed")
else:
q.put(f"wrong answer. Expected '{expected_output}', got '{stdout}'")
except subprocess.TimeoutExpired:
process.kill()
q.put("timed out")
except Exception:
q.put(f"failed: {traceback.format_exc()}")
def check_correctness(
program: str, input_data: str, expected_output: str, timeout: float
) -> str:
q = multiprocessing.Queue()
process = multiprocessing.Process(
target=exec_program, args=(q, program, input_data, expected_output, timeout)
)
process.start()
process.join(timeout=timeout + 1)
if process.is_alive():
process.terminate()
process.join()
result = "timed out"
else:
try:
result = q.get_nowait()
except queue.Empty:
result = "no result returned"
return result
让我们检查一个示例程序和输出,看看它是如何工作的
program_code = "print('hello, world!')"
input_data = ""
expected_output = "hello, world!"
timeout = 2
test_result = check_correctness(program_code, input_data, expected_output, timeout)
print("Example 1: ", test_result)
test_result = check_correctness("print('goodbye')", input_data, "hi there", timeout)
print("Example 2: ", test_result)
第 1 部分:零样本反射¶
在我们的第一部分中,我们将构建一个简单的零样本工具调用 Agent,以尝试解决这些问题。我们将通过添加“推理”字段,在 Agent 的工具调用模式中直接结合一种简单的反思形式。此外,Claude 经过训练,可以在调用任何工具之前使用自由文本进行“推理”。总而言之,这应该会诱导反思性的“思维链”提示。
注意:这在某种程度上偏离了论文的实现,论文的实现使用了显式的反思步骤,并使用了 Reflexion 提示的变体。
在本节结束时,我们将构建一个反思性的零样本编程 Agent,它看起来像下图系统图中标记为“第 1 部分”的部分
状态¶
LangGraph 的主要原语是 StateGraph
,您可以使用它将 Agent 定义为可控的状态机。该图具有执行工作的 node
(python 函数)和定义如何在节点之间路由的 edge
。State
定义了每个节点之间的接口,并携带您的 Agent 需要的所有信息。
下面,为我们的编程奥林匹克竞赛 Agent 定义一个 State
。messages
将跟踪提交序列(和测试用例反馈)作为聊天记录。如果提交通过所有测试用例,则 status
字段将从 in_progress
翻转为 success
。其他字段(test_cases、runtime_limit)由 evaluation
节点用于测试 Agent 的提交。这些值 Agent 本身看不到。
from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph.message import AnyMessage, add_messages
class TestCase(TypedDict):
inputs: str
outputs: str
class State(TypedDict):
# Append-only chat memory so the agent can try to recover from initial mistakes.
messages: Annotated[list[AnyMessage], add_messages]
# From the dataset. These are used for testing.
test_cases: list[TestCase]
runtime_limit: int
status: str
API 参考:add_messages
现在,将数据集转换为我们的图将接受的输入。
input_states = [
{
"messages": [("user", row["description"])],
"test_cases": row["test_cases"],
"runtime_limit": row["runtime_limit"],
"status": "in_progress",
"problem_level": row["problem_level"],
}
for row in ds
]
节点 1:解题器¶
创建一个 solver
节点,该节点提示 LLM “Agent” 使用 writePython 工具 生成提交的代码。
将 Pydantic 与 LangChain 一起使用
本笔记本使用 Pydantic v2 BaseModel
,它需要 langchain-core >= 0.3
。使用 langchain-core < 0.3
将导致因混合使用 Pydantic v1 和 v2 BaseModel
而产生的错误。
from langchain_core.language_models import BaseChatModel
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
class writePython(BaseModel):
"""Write python code that resolves the problem."""
reasoning: str = Field(..., description="Conceptual solution.")
pseudocode: str = Field(..., description="Detailed English pseudocode.")
code: str = Field(..., description="Valid Python 3 solution to the problem")
class Solver:
def __init__(self, llm: BaseChatModel, prompt: ChatPromptTemplate):
self.runnable = prompt | llm.bind_tools([writePython])
def __call__(self, state: State) -> dict:
# Our agent only can see the "messages" and will ignore the test info
return {"messages": [self.runnable.invoke({"messages": state["messages"]})]}
API 参考:BaseChatModel | ChatPromptTemplate
现在,在下面创建解题器。我们将使用 Claude Opus
from langchain import hub
from langchain_anthropic import ChatAnthropic
# For this section, we are testing zero-shot performance and won't have
# any examples. Partial them out to pre-fill the template.
prompt = hub.pull("wfh/usaco-draft-solver").partial(examples="")
print("*" * 35 + "Prompt" + "*" * 35)
prompt.pretty_print()
# Use Haiku if you want to save $$ while (almost) never correctly answering the question
# llm = ChatAnthropic(model="claude-3-haiku-20240307")
llm = ChatAnthropic(model="claude-3-opus-20240229")
solver = Solver(llm, prompt)
API 参考:ChatAnthropic
***********************************Prompt***********************************
================================[1m System Message [0m================================
You are a world-class competitive programmer.
Please reply with a Python 3 solution to the problem below.
First, reason through the problem and conceptualize a solution.
Then write detailed pseudocode to uncover any potential logical errors or omissions.
Finally output the working Python code for your solution, ensuring to fix any errors uncovered while writing pseudocode.
No outside libraries are allowed.[33;1m[1;3m{examples}[0m
=============================[1m Messages Placeholder [0m=============================
[33;1m[1;3m{messages}[0m
print("*" * 34 + " Example " + "*" * 34)
result = solver(
{
"messages": [
(
"user",
"How do I get a perfectly random sample from an infinite stream",
)
]
}
)
result["messages"][0].pretty_print()
# Could expand to include (1)
# 1. Restate the problem in plain English
# 2. Closely following the explanation, restate and explain the solution in plain English
# 3. Write a pseudocode solution
# 4. Output the final Python solution with your solution steps in comments.
********************************** Example **********************************
==================================[1m Ai Message [0m==================================
[{'text': "<thinking>\nTo address this problem, we need to use the writePython function, which requires the following parameters:\n- reasoning: a conceptual solution to the problem\n- pseudocode: detailed pseudocode for the solution\n- code: working Python code implementing the solution\n\nThe key aspects to address in the solution are:\n1. We have an infinite stream, so we can't store all elements. Need an online algorithm.\n2. Need to ensure each element has an equal probability of being in the final sample.\n\nI believe I have enough information to provide values for all the required parameters.\n</thinking>", 'type': 'text'}, {'id': 'toolu_01UqpLYyueky5GtYMidS9oLF', 'input': {'reasoning': 'To get a perfectly random sample of size k from an infinite stream:\n\n1. Store the first k elements in an array (reservoir). \n2. For each ith element after the kth element (i > k):\n - Generate a random integer j between 0 and i (inclusive)\n - If j < k, replace the jth element of the reservoir with the ith element\n3. At the end, the reservoir contains the random sample.\n\nThis works because for any element, when we process the nth element, the probability that it is in the reservoir is:\n- k/n when n <= k (first k elements always selected)\n- k/n * k/(n-1) * k/(n-2) * ... * k/(k+1) = k/n when n > k\n\nSo any element has k/n probability of being in final reservoir, giving a perfectly random sample.', 'pseudocode': '\`\`\`\nfunction selectKItems(stream, k):\n reservoir = [0..k-1] # store first k elements\n\n i = k\n while stream has next item:\n item = stream.next()\n j = random(0, i) # generate random index between 0 and i\n if j < k:\n reservoir[j] = item # replace element at random index with new item\n i += 1\n\n return reservoir\n\`\`\`', 'code': 'import random\n\ndef reservoir_sampling(stream, k):\n reservoir = []\n \n # Store first k elements in reservoir\n for i in range(k):\n reservoir.append(next(stream))\n\n i = k\n for item in stream:\n # Generate random index between 0 and i\n j = random.randint(0, i) \n \n # Replace element at random index with new item\n if j < k:\n reservoir[j] = item\n i += 1\n\n return reservoir'}, 'name': 'writePython', 'type': 'tool_use'}]
节点 2:评估¶
现在定义“evaluate
”节点。此节点获取 solver
提交的代码,并针对我们 State
中的 test_cases
执行它。这使用了我们在上面设置中定义的不安全 check_correctness
实用程序。
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage
# This is the node we will add to the graph.
# Most tool-calling APIs require that the `ToolMessage` contain the ID
# of the
def format_tool_message(response: str, ai_message: AIMessage):
return ToolMessage(
content=response + "\nMake all fixes using the writePython tool.",
tool_call_id=ai_message.tool_calls[0]["id"],
)
def evaluate(state: State):
test_cases = state["test_cases"]
ai_message: AIMessage = state["messages"][-1]
if not ai_message.tool_calls:
return {
"messages": [
HumanMessage(
content="No code submitted. Please try again using the correct python code."
)
]
}
try:
code = ai_message.tool_calls[0]["args"]["code"]
except Exception as e:
return {"messages": [format_tool_message(repr(e), ai_message)]}
num_test_cases = len(test_cases)
succeeded = 0
test_results = []
# TODO: Multiprocess
for test_case in test_cases:
input_data = test_case["inputs"]
expected_output = test_case["outputs"]
test_result = check_correctness(code, input_data, expected_output, timeout)
test_results.append(test_result)
if test_result == "passed":
succeeded += 1
pass_rate = succeeded / num_test_cases if num_test_cases else "N/A"
if pass_rate == 1:
return {"status": "success"}
responses = "\n".join(
[f"<test id={i}>\n{r}\n</test>" for i, r in enumerate(test_results)]
)
response = f"Incorrect submission. Please respond with updated code.\nPass rate: {succeeded}/{num_test_cases}\nResults:\n{responses}"
formatted_message = format_tool_message(response, ai_message)
return {"messages": [formatted_message]}
API 参考:AIMessage | HumanMessage | ToolMessage
创建图¶
现在,将它们放在一起!一旦您定义了每个节点,定义连接性/状态转换就相当容易了。
我们的零样本图定义了一个循环。如果我们可视化数据流,我们希望逻辑为:1. 首先转到 solver
,它尝试第一个解决方案。2. 接下来转到 evaluate
节点,它测试解决方案。3. 如果解决方案通过,则结束,否则,返回到 solver
再次尝试。
在 LangGraph 中,我们使用 conditional_edges
来定义包含条件逻辑的状态转换。下面,定义图,添加一个 control_edge
来处理上面的步骤 (3)。
from langgraph.graph import END, StateGraph, START
builder = StateGraph(State)
builder.add_node("solver", solver)
builder.add_edge(START, "solver")
builder.add_node("evaluate", evaluate)
builder.add_edge("solver", "evaluate")
def control_edge(state: State):
if state.get("status") == "success":
return END
return "solver"
builder.add_conditional_edges("evaluate", control_edge, {END: END, "solver": "solver"})
graph = builder.compile()
API 参考:END | StateGraph | START
from IPython.display import Image, display
try:
display(Image(graph.get_graph().draw_mermaid_png()))
except Exception:
# This requires some extra dependencies and is optional
pass
现在我们已经创建了我们的图,让我们看看它必须解决的问题类型。
input_state = input_states[0].copy()
# We will reduce the test cases to speed this notebook up
input_state["test_cases"] = input_state["test_cases"][:3]
print(input_state["messages"][0][1])
Farmer John has $N$ ($1 \leq N \leq 2 \cdot 10^5$) farms, numbered from $1$ to
$N$. It is known that FJ closes farm $i$ at time $c_i$. Bessie wakes up at time
$S$, and wants to maximize the productivity of her day by visiting as many farms
as possible before they close. She plans to visit farm $i$ on time $t_i + S$.
Bessie must arrive at a farm strictly before Farmer John closes it to actually visit it.
Bessie has $Q$ $(1 \leq Q \leq 2 \cdot 10^5)$ queries. For each query, she gives
you two integers $S$ and $V$. For each query, output whether Bessie can visit at
least $V$ farms if she wakes up at time $S$.
INPUT FORMAT (input arrives from the terminal / stdin):
The first line consists of $N$ and $Q$.
The second line consists of $c_1, c_2, c_3 \dots c_N$ ($1 \leq c_i \leq 10^6$).
The third line consists of $t_1, t_2, t_3 \dots t_N$ ($1 \leq t_i \leq 10^6$).
The next $Q$ lines each consist of two integers $V$ ($1 \leq V \leq N$) and $S$
($1 \leq S \leq 10^6$).
OUTPUT FORMAT (print output to the terminal / stdout):
For each of the $Q$ queries, output YES or NO on a new line.
SAMPLE INPUT:
5 5
3 5 7 9 12
4 2 3 3 8
1 5
1 6
3 3
4 2
5 1
SAMPLE OUTPUT:
YES
NO
YES
YES
NO
For the first query, Bessie will visit the farms at time $t = [9, 7, 8, 8, 13]$,
so she will only get to visit farm $4$ on time before FJ closes the farm.
For the second query, Bessie will not be able to visit any of the farms on time.
For the third query, Bessie will visit farms $3, 4, 5$ on time.
For the fourth and fifth queries, Bessie will be able to visit all but the first
farm on time.
SCORING:
Inputs 2-4: $N,Q\le 10^3$Inputs 5-9: $c_i, t_i \le 20$Inputs 10-17: No additional constraints.
Problem credits: Chongtian Ma
hide_inputs
”并过滤掉 test_cases。所有这些都是可选的,但对于开发很有用。
注意: 我们期望此处出现 GraphRecursionError,因为它无法在分配的步骤数内正确回答问题。
from langchain_core.tracers.context import tracing_v2_enabled
from langsmith import Client
# We don't need to include all the test cases in our traces.
def _hide_test_cases(inputs):
copied = inputs.copy()
# These are tens of MB in size. No need to send them up
copied["test_cases"] = "..."
return copied
client = Client(hide_inputs=_hide_test_cases, hide_outputs=_hide_test_cases)
with tracing_v2_enabled(client=client):
events = graph.stream(input_state)
for event in events:
for value in event.values():
messages = value.get("messages")
if messages:
if isinstance(messages, list):
messages = value["messages"][-1]
print(
"Assistant:",
str(messages.content).replace("\n", "\\n")[:50],
)
API 参考:tracing_v2_enabled
Assistant: [{'text': '<thinking>\nThe key steps to solve this
Assistant: KeyError('code')\nMake all fixes using the writePy
Assistant: [{'id': 'toolu_01KimhKt8aqQjGZJmrHVnAtE', 'input':
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'id': 'toolu_01CMZTqAd7BZQ2nSgtk9djRW', 'input':
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'id': 'toolu_01Kbaq9gX4BnHvps6TMfVGHL', 'input':
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'id': 'toolu_01MiSnpiGK5Yy4Cpp6GGbjmT', 'input':
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'id': 'toolu_01GWuvJezXLMVurUBG84odDP', 'input':
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'id': 'toolu_01W8DGmhcpFVctySmx58scf9', 'input':
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'id': 'toolu_018bhYtCKDK6S4MHiAxUZCrb', 'input':
Assistant: KeyError('code')\nMake all fixes using the writePy
Assistant: [{'id': 'toolu_01LCwaCjX9uZBV3jt9eAkmAa', 'input':
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'id': 'toolu_01WqJvdE2WDeTZXoKp2V7PWb', 'input':
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'id': 'toolu_01DGevkunt9zWx7SVDCHdBuv', 'input':
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'id': 'toolu_013comYKVxNSzTM4ZbH3L3FP', 'input':
Assistant: Incorrect submission. Please respond with updated
---------------------------------------------------------------------------
``````output
GraphRecursionError Traceback (most recent call last)
``````output
Cell In[25], line 17
15 with tracing_v2_enabled(client=client):
16 events = graph.stream(input_state)
---> 17 for event in events:
18 for value in event.values():
19 messages = value.get("messages")
``````output
File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/langgraph/pregel/__init__.py:645, in Pregel.stream(self, input, config, stream_mode, output_keys, input_keys, interrupt_before_nodes, interrupt_after_nodes, debug)
643 break
644 elif step == config["recursion_limit"]:
--> 645 raise GraphRecursionError(
646 f"Recursion limit of {config['recursion_limit']} reached"
647 "without hitting a stop condition. You can increase the "
648 "limit by setting the `recursion_limit` config key."
649 )
651 # before execution, check if we should interrupt
652 if _should_interrupt(
653 checkpoint,
654 interrupt_before_nodes,
655 self.stream_channels_list,
656 next_tasks,
657 ):
``````output
GraphRecursionError: Recursion limit of 25 reachedwithout hitting a stop condition. You can increase the limit by setting the `recursion_limit` config key.
它没有及时解决问题,但这没关系!如果这很容易,这篇论文就会短很多 :)
您可以在提供的链接处查看 Agent 的完整 LangSmith 跟踪。
在下一节中,我们将添加论文中称为“情景记忆”的改进,在这种情况下,它实际上是少样本检索。
第 2 部分:少样本检索¶
即使使用反思性工具调用,我们第 1 部分中的基线 Agent 仍然难以应对这项困难的任务。 “教” LLM 如何更好地执行任务的一种方法是通过演示,也称为“少样本示例”。
USACO 论文的作者所称的“情景记忆”实际上只是在类似示例上进行的少样本提示。
在这种情况下,每个示例都是数据集中的不同问题 + 解决方案。“情景记忆”这个术语是有道理的,如果您假装您的 Agent 已经“解决”了这些问题,并且正在回忆其解决方案。
本节添加了下图“第 2 部分”中的“情景记忆”组件。
请注意,此记忆步骤执行一次,在第 1 部分的零样本循环逻辑之前。步骤如下
- 提示 LLM 生成候选解决方案。
- 使用候选解决方案的文本来检索 N 个最相似的(问题、解决方案)对。
- 将此结果格式化为零样本 Agent 的提示。
下面,让我们将我们的情景记忆实现为检索器。我们将遵循论文的检索器选择并使用 BM25。
状态¶
状态主要从第 1 部分回收利用。添加额外的“candidate”和“examples”字段以存储记忆步骤的信息。
from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph.message import AnyMessage, add_messages
class TestCase(TypedDict):
inputs: str
outputs: str
class State(TypedDict):
# NEW! Candidate for retrieval + formatted fetched examples as "memory"
candidate: AIMessage
examples: str
# Repeated from Part 1
messages: Annotated[list[AnyMessage], add_messages]
test_cases: list[TestCase]
runtime_limit: int
status: str
API 参考:add_messages
节点 1 和 3:草稿 & 解题器¶
让我们创建我们的“Agent”。我们将修改第 1 部分中的 Solver
,以将其重用于 Agent 节点和候选程序生成节点(“draft”)。
from langchain import hub
from langchain_anthropic import ChatAnthropic
class Solver:
def __init__(self, llm: BaseChatModel, prompt: ChatPromptTemplate):
self.runnable = prompt | llm.bind_tools([writePython])
def __call__(self, state: State) -> dict:
# Our agent only can see the "messages" and will ignore the test info
inputs = {"messages": state["messages"]}
has_examples = bool(state.get("examples"))
output_key = "candidate" # Used in the draft node
if has_examples:
output_key = "messages"
# Used in the solve node
inputs["examples"] = state["examples"]
response = self.runnable.invoke(inputs)
if not response.content:
return {
output_key: AIMessage(
content="I'll need to think about this step by step."
)
}
return {output_key: response}
prompt = hub.pull("wfh/usaco-draft-solver")
llm = ChatAnthropic(model="claude-3-opus-20240229")
draft_solver = Solver(llm, prompt.partial(examples=""))
solver = Solver(llm, prompt)
API 参考:ChatAnthropic
节点 2:检索¶
检索节点接收候选解决方案(由“solver”节点生成),使用此解决方案搜索相似的示例,然后将这些示例格式化在消息中。
# We will test our agent on index 0 (the same as above).
# Later, we will test on index 2 (the first 'silver difficulty' question)
test_indices = [0, 2]
train_ds = [row for i, row in enumerate(ds) if i not in test_indices]
test_ds = [row for i, row in enumerate(ds) if i in test_indices]
from langchain_community.retrievers import BM25Retriever
def format_example(row):
question = row["description"]
answer = row["solution"]
return f"""<problem>
{question}
</problem>
<solution>
{answer}
</solution>"""
# Skip our 'test examples' to avoid cheating
# This is "simulating" having seen other in-context examples
retriever = BM25Retriever.from_texts([format_example(row) for row in train_ds])
API 参考:BM25Retriever
现在定义节点。任何节点都可以选择接受第二个 config
位置参数。这包含您在调用图时可以调整的 configurable
参数。例如,我们可以调整为我们的 Agent 检索的前 k
个示例。
from langchain_core.runnables import RunnableConfig
def retrieve_examples(state: State, config: RunnableConfig):
top_k = config["configurable"].get("k") or 2
ai_message: AIMessage = state["candidate"]
if not ai_message.tool_calls:
# We err here. To make more robust, you could loop back
raise ValueError("Draft agent did not produce a valid code block")
code = ai_message.tool_calls[0]["args"]["code"]
examples_str = "\n".join(
[doc.page_content for doc in retriever.invoke(code)[:top_k]]
)
examples_str = f"""
You previously solved the following problems in this competition:
<Examples>
{examples_str}
<Examples>
Approach this new question with similar sophistication."""
return {"examples": examples_str}
API 参考:RunnableConfig
图¶
现在让我们把它们放在一起。该图比第 1 部分中的图稍微复杂一些,因为我们必须将初始的“draft”和“retrieve”节点添加到我们的 Agent 循环中。
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, StateGraph, START
builder = StateGraph(State)
builder.add_node("draft", draft_solver)
builder.add_edge(START, "draft")
builder.add_node("retrieve", retrieve_examples)
builder.add_node("solve", solver)
builder.add_node("evaluate", evaluate)
# Add connectivity
builder.add_edge("draft", "retrieve")
builder.add_edge("retrieve", "solve")
builder.add_edge("solve", "evaluate")
def control_edge(state: State):
if state.get("status") == "success":
return END
return "solve"
builder.add_conditional_edges("evaluate", control_edge, {END: END, "solve": "solve"})
checkpointer = MemorySaver()
graph = builder.compile(checkpointer=checkpointer)
API 参考:MemorySaver | END | StateGraph | START
from IPython.display import Image, display
try:
display(Image(graph.get_graph().draw_mermaid_png()))
except Exception:
# This requires some extra dependencies and is optional
pass
让我们再次尝试这个问题
config = {"configurable": {"thread_id": "question-recall", "k": 3}}
with tracing_v2_enabled(client=client):
events = graph.stream(input_state, config)
for event in events:
for value in event.values():
messages = value.get("messages")
if messages:
if isinstance(messages, list):
messages = value["messages"][-1]
print(
"Assistant:",
str(messages.content).replace("\n", "\\n")[:50],
)
elif value.get("examples"):
print("Retrieved examples:\n\n", value["examples"][:100] + "...")
elif value.get("candidate"):
print(str(value["candidate"].content)[:200])
[{'text': "<thinking>\nThis problem essentially asks to find the number of farms Bessie can visit before they close at each query. The key insights are:\n\n1. Bessie's arrival time at each farm is S +
Retrieved examples:
You previously solved the following problems in this competition:
<Examples>
<problem>
Farmer John...
Assistant: [{'text': "<thinking>\nThe key information given i
恭喜! 您已将“情景记忆”添加到您的 Agent 中,以获取少样本示例并解决这个青铜级别的编程奥林匹克竞赛问题!
但是,我们的 Agent 仍然有限。让我们在一个更具挑战性的 🪙🏆白银✨ 级别的问题上对其进行测试
silver_input = {
"messages": [("user", silver_row["description"])],
"test_cases": silver_row["test_cases"],
"runtime_limit": silver_row["runtime_limit"],
"status": "in_progress",
}
config = {"configurable": {"thread_id": "silver-question-1", "k": 2}}
with tracing_v2_enabled(client=client):
events = graph.stream(silver_input, config)
for event in events:
for value in event.values():
messages = value.get("messages")
if messages:
if isinstance(messages, list):
messages = value["messages"][-1]
print(
"Assistant:",
str(messages.content).replace("\n", "\\n")[:50],
)
elif value.get("examples"):
print("Retrieved examples:\n\n", value["examples"][:100] + "...")
elif value.get("candidate"):
print(str(value["candidate"].content)[:200])
[{'text': "<thinking>\nThe relevant tool for this problem is writePython. It requires the following parameters:\n- reasoning: To solve this problem, we need to simulate the cruise by following the seq
Retrieved examples:
You previously solved the following problems in this competition:
<Examples>
<problem>
Farmer John...
Assistant: [{'text': "<thinking>\nTo solve this problem, we n
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'text': "<thinking>\nAfter reviewing the failed
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'text': "<thinking>\nAfter reviewing the latest
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'text': "<thinking>\nOops, looks like I made a s
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'text': "<thinking>\nHmm, some of the test cases
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'text': '<thinking>\nOops, looks like I accident
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'text': "<thinking>\nLooks like the code is now
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'text': '<thinking>\nOops, looks like I accident
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'text': "<thinking>\nHmm, the optimization to si
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'text': "<thinking>\nOops, I did it again - acci
Assistant: Incorrect submission. Please respond with updated
Assistant: [{'text': "<thinking>\nHmm, the latest code is sti
Assistant: Incorrect submission. Please respond with updated
---------------------------------------------------------------------------
``````output
GraphRecursionError Traceback (most recent call last)
``````output
Cell In[37], line 12
10 with tracing_v2_enabled(client=client):
11 events = graph.stream(silver_input, config)
---> 12 for event in events:
13 for value in event.values():
14 messages = value.get("messages")
``````output
File ~/.pyenv/versions/3.11.2/lib/python3.11/site-packages/langgraph/pregel/__init__.py:645, in Pregel.stream(self, input, config, stream_mode, output_keys, input_keys, interrupt_before_nodes, interrupt_after_nodes, debug)
643 break
644 elif step == config["recursion_limit"]:
--> 645 raise GraphRecursionError(
646 f"Recursion limit of {config['recursion_limit']} reached"
647 "without hitting a stop condition. You can increase the "
648 "limit by setting the `recursion_limit` config key."
649 )
651 # before execution, check if we should interrupt
652 if _should_interrupt(
653 checkpoint,
654 interrupt_before_nodes,
655 self.stream_channels_list,
656 next_tasks,
657 ):
``````output
GraphRecursionError: Recursion limit of 25 reachedwithout hitting a stop condition. You can increase the limit by setting the `recursion_limit` config key.
仍然太难了! 尚未实现 AGI。要详细调查我们 Agent 的轨迹,请查看 完整 LangSmith 跟踪。
我们的 Agent 还不够好,无法实现自主。LangGraph 的优点在于您不必在“自主 Agent”和“简单 DAG”之间做出决定:您可以将控制和用户界面注入到任何可以有效地为您的应用程序带来好处的地方。
第 3 部分:人机协作¶
我们检索增强的 Agent 能够解决 bronze
级别的问题,但在难度更高的 silver 级别的问题上仍然失败。
回顾一下,论文提出了 3 种互补技术来提高性能
- 反思:显式提示 LLM “反思”其错误可以帮助它
- 少样本提示:检索相关的高质量示例作为“记忆”
- 人机协作: 在不给出正确答案的情况下,允许人类帮助 Agent 反思其方法并将其指向更好的方向。
在本节中,我们将添加“human”节点(在下图中标记为“第 3 部分”),完成我们的 Agent 图
从 ML 的角度来看,这有点像 聪明的汉斯,但从应用程序设计者的角度来看,主要目标是实现更高的综合成功率,让人类介入思考和见解是很自然的。
在任何一种情况下,向 LangGraph 实例添加人工检查都不需要额外的代码行。让我们通过指示图在“evaluate
”节点之后 interrupt_after
来让用户有机会修改轨迹来实现这一点。
开始在下面组装您的图。以下部分与我们在第 2 部分中的应用程序相同
# This is all the same as before
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, StateGraph, START
builder = StateGraph(State)
prompt = hub.pull("wfh/usaco-draft-solver")
llm = ChatAnthropic(model="claude-3-opus-20240229", max_tokens_to_sample=4000)
draft_solver = Solver(llm, prompt.partial(examples=""))
builder.add_node("draft", draft_solver)
builder.add_edge(START, "draft")
builder.add_node("retrieve", retrieve_examples)
solver = Solver(llm, prompt)
builder.add_node("solve", solver)
builder.add_node("evaluate", evaluate)
builder.add_edge("draft", "retrieve")
builder.add_edge("retrieve", "solve")
builder.add_edge("solve", "evaluate")
def control_edge(state: State):
if state.get("status") == "success":
return END
return "solve"
builder.add_conditional_edges("evaluate", control_edge, {END: END, "solve": "solve"})
checkpointer = MemorySaver()
API 参考:MemorySaver | END | StateGraph | START
现在通过编译图来完成。设置 interrupt_after=["evaluate"]
以指示 Agent 在继续执行之前等待人工输入。
graph = builder.compile(
checkpointer=checkpointer,
# New: this tells the graph to break any time it goes to the "human" node
interrupt_after=["evaluate"],
)
from IPython.display import Image, display
try:
display(Image(graph.get_graph().draw_mermaid_png()))
except Exception:
# This requires some extra dependencies and is optional
pass
正如您在上面的图中看到的,结构与第 2 部分相同,只是我们在“evaluate
”和“solve
”节点之间插入了一个“human
”断点。
让我们再次尝试这个问题!
config = {"configurable": {"thread_id": "silver-hl-1", "k": 2}}
with tracing_v2_enabled(client=client):
events = graph.stream(silver_input, config)
for event in events:
for value in event.values():
messages = value.get("messages")
if messages:
if isinstance(messages, list):
messages = value["messages"][-1]
print(
"Assistant:",
str(messages.content).replace("\n", "\\n")[:50],
)
elif value.get("examples"):
print("Retrieved examples:\n\n", value["examples"][:100] + "...")
elif value.get("candidate"):
print(str(value["candidate"].content)[:200])
[{'text': "<thinking>\nTo solve this problem, we need to:\n1. Read in the input data - number of ports N, length of direction sequence M, number of repetitions K, the port connections, and the directi
Retrieved examples:
You previously solved the following problems in this competition:
<Examples>
<problem>
Farmer John ...
Assistant: [{'text': '<thinking>\nTo determine where Bessie e
Assistant: Incorrect submission. Please respond with updated
回顾原始问题
Problem 3: Luxury River Cruise [Josh Alman and Nathan Pinsker, 2013]
Farmer John is taking Bessie and the cows on a cruise! They are sailing on a
network of rivers with N ports (1 <= N <= 1,000) labeled 1..N, and Bessie
starts at port 1. Each port has exactly two rivers leading out of it which
lead directly to other ports, and rivers can only be sailed one way.
At each port, the tour guides choose either the "left" river or the "right"
river to sail down next, but they keep repeating the same choices over and
over. More specifically, the tour guides have chosen a short sequence of M
directions (1 <= M <= 500), each either "left" or "right", and have
repeated it K times (1 <= K <= 1,000,000,000). Bessie thinks she is going
in circles -- help her figure out where she ends up!
PROBLEM NAME: cruise
INPUT FORMAT:
* Line 1: Three space-separated integers N, M, and K.
* Lines 2..N+1: Line i+1 has two space-separated integers,
representing the number of the ports that port i's left and
right rivers lead to, respectively.
* Line N+2: M space-separated characters, either 'L' or 'R'. 'L'
represents a choice of 'left' and 'R' represents a choice of
'right'.
SAMPLE INPUT:
4 3 3
2 4
3 1
4 2
1 3
L L R
INPUT DETAILS:
The port numbers are arranged clockwise in a circle, with 'L' being a
clockwise rotation and 'R' being a counterclockwise rotation. The sequence
taken is LLRLLRLLR.
OUTPUT FORMAT:
* Line 1: A single integer giving the number of the port where
Bessie's cruise ends.
SAMPLE OUTPUT:
4
OUTPUT DETAILS:
After the first iteration of the sequence of directions, Bessie is at port
2 (1 -> 2 -> 3 -> 2); after the second, she is at port 3 (2 -> 3 -> 4 ->
3), and at the end she is at port 4 (3 -> 4 -> 1 -> 4).
snapshot = graph.get_state(config)
print(snapshot.values["messages"][-2].content[0]["text"])
print("\n\nCode:\n\n")
print(snapshot.values["messages"][-2].tool_calls[0]["args"]["code"])
<thinking>
To determine where Bessie ends up, we need to:
1. Simulate the cruise by following the sequence of left/right directions
2. Repeat this sequence K times to find the final destination port
The problem provides:
- The number of ports N
- The connections between ports (left and right rivers for each port)
- The sequence of M directions (L or R) to follow
- The number of times K to repeat the sequence
With this information, we have everything needed to simulate the cruise and find the ending port. The key steps will be:
1. Read in the input data to initialize the river connections and direction sequence
2. Iterate K times:
- For each direction in the M-length sequence:
- Move to the next port based on the current port and direction
3. Output the final port number after K iterations
The solution will require loops to repeat the sequence K times and follow the M directions. Since K can be up to 1 billion, simulating all K iterations directly would be too slow. Instead, we can find a pattern in how the port changes after each M-length sequence, and then "fast-forward" by calculating which port we reach after K repetitions of the pattern.
</thinking>
Code:
N, M, K = map(int, input().split())
ports = []
for _ in range(N):
left, right = map(int, input().split())
ports.append((left, right))
directions = input().split()
cur = 1
pattern = []
seen = set()
steps = 0
while cur not in seen:
seen.add(cur)
for d in directions:
steps += 1
if d == 'L':
cur = ports[cur-1][0]
else:
cur = ports[cur-1][1]
pattern.append((cur, steps))
K %= steps
for port, step in pattern:
if step > K:
cur = port
break
K -= step
print(cur)
Incorrect submission. Please respond with updated code.
Pass rate: 4/10
Results:
<test id=0>
wrong answer. Expected '4
', got '3
'
</test>
<test id=1>
wrong answer. Expected '50
', got '2
'
</test>
<t
Agent 需要记住,模拟应该包括周期 + 导致示例的任何步骤。它可以使用“龟兔赛跑”算法进行周期检测,使用模拟路径并在检测到重复时中断,然后
让我们通过更新图状态来告知 Agent 这一点。
updated_config = graph.update_state(
config,
values={
"messages": [
(
"user",
"""Consider breaking down the algorithm into separate parts: reading inputs, detecting cycles using the tortoise and hare algorithm, and determining Bessie's final position by skipping ahead K steps.
Read the inputs into three arrays:
- Two arrays L and R for the ports (adjust for 0-based indexing)
- A third array S for the direction sequence
Optimize by multiplying K by M before the main loop to convert the number of repetitions into the total number of steps.
Use the tortoise and hare algorithm to detect the cycle:
- Define a helper function get_next(v) that returns the next position and direction index
- Initialize two pointers s0 and s1 to (0, 0)
- In each iteration:
- Move s0 by 1 step and s1 by 2 steps using get_next()
- If s0 equals s1, decrement K by 1 and break out of the loop
- Otherwise, decrement K by 1
- After the loop, if K is not 0, there is a cycle
To find the cycle length:
- Initialize a counter variable rho to 1
- Move s0 by 1 step using get_next()
- Enter a loop:
- Move s0 by 1 step using get_next()
- Increment rho
- If s0 equals s1, break out of the loop
Skip ahead by reducing K modulo rho.
Simulate the remaining steps:
- While K > 0, move s0 to the next position using get_next() and decrement K
Print the final position (converted to 1-based indexing).
Pay close attention to the initialization and movement of pointers during cycle detection and length calculation. Ensure that the logic is correct and handles all cases accurately.""",
)
]
},
)
现在图的状态包含我们的新消息。
HumanMessage(content="Consider breaking down the algorithm into separate parts: reading inputs, detecting cycles using the tortoise and hare algorithm, and determining Bessie's final position by skipping ahead K steps.\n\nRead the inputs into three arrays:\n- Two arrays L and R for the ports (adjust for 0-based indexing)\n- A third array S for the direction sequence\n\nOptimize by multiplying K by M before the main loop to convert the number of repetitions into the total number of steps.\n\nUse the tortoise and hare algorithm to detect the cycle:\n- Define a helper function get_next(v) that returns the next position and direction index\n- Initialize two pointers s0 and s1 to (0, 0)\n- In each iteration:\n - Move s0 by 1 step and s1 by 2 steps using get_next()\n - If s0 equals s1, decrement K by 1 and break out of the loop\n - Otherwise, decrement K by 1\n- After the loop, if K is not 0, there is a cycle\n\nTo find the cycle length:\n- Initialize a counter variable rho to 1\n- Move s0 by 1 step using get_next()\n- Enter a loop:\n - Move s0 by 1 step using get_next()\n - Increment rho\n - If s0 equals s1, break out of the loop\n\nSkip ahead by reducing K modulo rho.\n\nSimulate the remaining steps:\n- While K > 0, move s0 to the next position using get_next() and decrement K\n\nPrint the final position (converted to 1-based indexing).\n\nPay close attention to the initialization and movement of pointers during cycle detection and length calculation. Ensure that the logic is correct and handles all cases accurately.", id='98888982-a469-4c5a-ab65-743d2f2608dc')
让我们让 Agent 再次尝试。调用 stream
并使用 None
只是为了使用从内存加载的输入。我们将跳过我们的人工审核,以进行接下来的几次尝试,看看它是否可以自我纠正。
num_trials = 1
with tracing_v2_enabled(client=client):
for _ in range(num_trials):
events = graph.stream(None, updated_config)
for event in events:
for value in event.values():
messages = value.get("messages")
if messages:
if isinstance(messages, list):
messages = value["messages"][-1]
print(
"Assistant:",
str(messages.content).replace("\n", "\\n")[:50],
)
elif value.get("examples"):
print("Retrieved examples:\n\n", value["examples"][:100] + "...")
elif value.get("candidate"):
print(str(value["candidate"].content)[:200])
if graph.get_state(config).values["status"] == "success":
break
print("Continuing...")
Assistant: [{'text': '<thinking>\nThank you for the detailed
Assistant: Incorrect submission. Please respond with updated
Continuing...
好的,Agent 再次尝试了。查看此步骤的 LangSmith 跟踪,以查看其更新。
snapshot = graph.get_state(most_recent_state.config)
ai_message = snapshot.values["messages"][-2]
if ai_message.content:
print(ai_message.content)
print("\n\nCode:\n\n")
print(ai_message.tool_calls[0]["args"]["code"] if ai_message.tool_calls else "N/A")
[{'text': '<thinking>\nThank you for the detailed algorithm breakdown! Let me go through each step to make sure I understand and have the necessary information to implement the solution.\n\nReading inputs:\n- Read N, M, K and store in separate variables\n- Create arrays L and R to store the left and right port connections (adjust for 0-based indexing)\n- Create array S to store the M-length direction sequence \n- Multiply K by M upfront to get the total number of steps\n\nDetecting cycles with tortoise and hare:\n- Define get_next(v) to return the next position and direction index\n - It will use the current position and direction to look up the next port in L/R\n- Initialize two pointers s0 and s1 to (0, 0) \n- Loop until s0 equals s1 or all K steps are taken:\n - Move s0 by 1 step and s1 by 2 steps using get_next()\n - Decrement K\n- After the loop, check if K is 0 to determine if a cycle was found\n\nFinding cycle length:\n- If a cycle was found, initialize rho to 1\n- Move s0 by 1 step \n- Loop until s0 equals s1 again:\n - Move s0 by 1 step and increment rho\n- rho will equal the cycle length\n\nSkipping ahead:\n- Reduce K by taking it modulo rho\n\nSimulating remaining steps:\n- While K is greater than 0:\n - Move s0 using get_next()\n - Decrement K\n- s0 will hold the final position\n\nPrinting result:\n- Add 1 to the final position to convert back to 1-based indexing before printing\n\nThe key aspects are:\n- Handling the input format and 0-based indexing \n- Defining get_next() to handle moving to the next port based on direction\n- Correctly implementing the tortoise and hare cycle detection\n- Finding the cycle length after detection\n- Skipping ahead with modulo and simulating any remaining steps\n- Adjusting the output back to 1-based indexing\n\nI believe I have all the necessary pieces to implement this solution now. Let me code it up using the writePython tool.\n</thinking>', 'type': 'text'}, {'id': 'toolu_01EDrYeHJU7GxApRb1QfMA1b', 'input': {'reasoning': "Here's the problem-solving approach:\n\n1. Read in the input data:\n - N ports, M-length direction sequence, K repetitions\n - L and R arrays for left/right port connections\n - S array for direction sequence\n - Multiply K by M to get total steps\n\n2. Define get_next(v) helper function:\n - Takes current position and direction index\n - Returns next position and incremented direction index\n - Looks up next port in L/R arrays based on current direction\n\n3. Detect cycle using tortoise and hare algorithm:\n - Initialize s0 and s1 pointers to (0, 0)\n - Loop until match or all steps taken:\n - Move s0 by 1 step, s1 by 2 steps\n - Decrement K\n - Check if K is 0 after loop\n\n4. If cycle found, find cycle length:\n - Initialize rho to 1\n - Move s0 by 1 step\n - Loop until s0 equals s1 again:\n - Move s0 and increment rho\n - rho is the cycle length\n\n5. Skip ahead by K % rho steps\n\n6. Simulate remaining steps:\n - While K > 0:\n - Move s0 with get_next()\n - Decrement K\n \n7. Print final position (+1 for 1-based indexing)\n\nKey points:\n- Multiplying K*M avoids nested loop\n- get_next() handles port transitions \n- Tortoise and hare finds cycles\n- Modulo skips ahead in cycle\n- Adjust 0-based indexing for input/output", 'pseudocode': "1. Read input:\n N, M, K = read_ints()\n L = [0] * N\n R = [0] * N\n for i in 0..N-1:\n L[i], R[i] = read_ints()\n S = read_direction_sequence()\n K *= M\n\n2. Define get_next(v):\n def get_next(pos, dir_idx):\n if S[dir_idx] == 'L':\n next_pos = L[pos]\n else:\n next_pos = R[pos]\n next_dir_idx = (dir_idx + 1) % M\n return (next_pos, next_dir_idx)\n\n3. Find cycle:\n s0 = (0, 0)\n s1 = (0, 0) \n while K:\n s0 = get_next(s0[0], s0[1])\n s1 = get_next(s1[0], get_next(s1[0], s1[1])[1])\n K -= 1\n if s0 == s1: break\n if K != 0: no cycle, print s0[0] + 1\n\n4. Find cycle length:\n rho = 1\n s0 = get_next(s0[0], s0[1])\n while s0 != s1:\n s0 = get_next(s0[0], s0[1]) \n rho += 1\n\n5. Skip steps:\n K %= rho\n\n6. Remaining steps: \n while K:\n s0 = get_next(s0[0], s0[1])\n K -= 1\n \n7. Print result:\n print(s0[0] + 1)", 'code': "def read_ints():\n return map(int, input().split())\n\nN, M, K = read_ints()\n\nL = [0] * N\nR = [0] * N\nfor i in range(N):\n L[i], R[i] = read_ints()\n L[i] -= 1\n R[i] -= 1\n\nS = input().split()\n\nK *= M\n\ndef get_next(pos, dir_idx):\n if S[dir_idx] == 'L':\n next_pos = L[pos] \n else:\n next_pos = R[pos]\n next_dir_idx = (dir_idx + 1) % M\n return (next_pos, next_dir_idx)\n\ns0 = (0, 0) \ns1 = (0, 0)\n\nwhile K:\n if s0 == s1: break\n \n s0 = get_next(s0[0], s0[1])\n s1 = get_next(s1[0], get_next(s1[0], s1[1])[1])\n \n K -= 1\n \nif K:\n rho = 1\n s0 = get_next(s0[0], s0[1])\n while s0 != s1:\n s0 = get_next(s0[0], s0[1])\n rho += 1\n \n K %= rho\n \nwhile K: \n s0 = get_next(s0[0], s0[1])\n K -= 1\n \nprint(s0[0] + 1)"}, 'name': 'writePython', 'type': 'tool_use'}]
Code:
def read_ints():
return map(int, input().split())
N, M, K = read_ints()
L = [0] * N
R = [0] * N
for i in range(N):
L[i], R[i] = read_ints()
L[i] -= 1
R[i] -= 1
S = input().split()
K *= M
def get_next(pos, dir_idx):
if S[dir_idx] == 'L':
next_pos = L[pos]
else:
next_pos = R[pos]
next_dir_idx = (dir_idx + 1) % M
return (next_pos, next_dir_idx)
s0 = (0, 0)
s1 = (0, 0)
while K:
if s0 == s1: break
s0 = get_next(s0[0], s0[1])
s1 = get_next(s1[0], get_next(s1[0], s1[1])[1])
K -= 1
if K:
rho = 1
s0 = get_next(s0[0], s0[1])
while s0 != s1:
s0 = get_next(s0[0], s0[1])
rho += 1
K %= rho
while K:
s0 = get_next(s0[0], s0[1])
K -= 1
print(s0[0] + 1)
Incorrect submission. Please respond with updated code.
Pass rate: 3/10
Results:
<test id=0>
passed
</test>
<test id=1>
timed out
</test>
<test id=2>
timed out
</test>
<test id=3>
timed out
</test>
<t
让我们提供更多反馈。
updated_config = graph.update_state(
updated_config,
values={
"messages": [
(
"user",
"""That's better, but you're still getting some errors. Let's double check some things:
1. When calculating the cycle length, make sure the initialization and movement of the pointers is correct. Double-check the logic there and see if you can spot any discrepancies.
2. Check the condition for whether there's a cycle after the main loop to ensure it covers all cases, like if K becomes 0 in the last iteration.
Think step by step through youur implementation and update using the writePython tool.""",
)
]
},
)
既然我们已经提供了此反馈,让我们在再次权衡之前,给 Agent 几次尝试解决问题的机会。
num_trials = 2
with tracing_v2_enabled(client=client):
for _ in range(num_trials):
events = graph.stream(None, updated_config)
for event in events:
for value in event.values():
messages = value.get("messages")
if messages:
if isinstance(messages, list):
messages = value["messages"][-1]
print(
"Assistant:",
str(messages.content).replace("\n", "\\n")[:50],
)
elif value.get("examples"):
print("Retrieved examples:\n\n", value["examples"][:100] + "...")
elif value.get("candidate"):
print(str(value["candidate"].content)[:200])
if graph.get_state(config).values["status"] == "success":
break
print("Continuing...")
结论¶
恭喜您走到最后!在本教程中,您在 LangGraph 中实现了一个能够解决具有挑战性的编程问题的 Agent。您通过利用一些常用技术来提高性能,包括
- 反思:虽然我们没有实现显式的反思步骤,但我们的提示和工具调用旨在鼓励对先前输出进行批判。您在第 1 部分中添加了此功能。
- 检索:Agent 的“情景记忆”从我们的编程问题语料库中检索高质量的少样本示例,以帮助解决青铜级别的问题。在第 2 部分中,您将检索记忆实现为初始步骤。
- 人机协作:LLM 驱动的 Agent 仍然太弱,无法自主回答所有这些问题,但有时,它们可以完成大部分工作,并在人工反馈下获得正确的答案。在第 3 部分中,您在
evaluate
节点上使用了interrupt_after
,然后通过在图上使用update_state
包含了您的反馈。
LLM 无法自主解决所有这些问题,但通过更好的提示和巧妙的工程设计,您可以创建一个能够更可靠地获得正确解决方案的系统。