Relational Data

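In this task, an agent is given access to a set of tools that can be used to make queries across 3 relational tables.

The tables contain information about users, locations and foods. The agent must answer questions about the data using the provided tools.

The underlying data looks like this (showing the first 2 records):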

User data

id	name	email	location	favorite_color	favorite_foods
1	Alice	alice@gmail.com	1	red	[1, 2, 3]
21	Bob	bob@hotmail.com	2	orange	[4, 5, 6]

Location data

id	city	current_time	current_weather
1	New York	2023-11-14 10:30 AM	Partly Cloudy, Temperature: 68°F
2	Los Angeles	2023-11-14 7:45 AM	Sunny, Temperature: 75°F

Food data

id	name	calories	allergic_ingredients
1	Pizza	285	["Gluten", "Dairy"]
2	Chocolate	50	["Milk", "Soy"]

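The tools support two kinds of operations: looking up information by ID (e.g., get_user_email takes a user ID and returns the email) and searching (e.g., find_foods_by_name takes a food name and returns a list of results).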
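To make the two tool styles concrete, here is a minimal sketch that uses plain Python dicts as stand-ins for the tables, filled only with the sample rows shown above. The helper names (`lookup_user_email`, `search_foods_by_name`, `weather_for_user`) and the substring-matching rule are hypothetical and are not the benchmark's actual implementation; they only illustrate how an ID lookup, a search, and a hop across tables fit together.

```python
# Stand-in tables built from the sample rows above (not the full dataset).
USERS = {
    1: {"name": "Alice", "email": "alice@gmail.com", "location": 1},
    21: {"name": "Bob", "email": "bob@hotmail.com", "location": 2},
}
LOCATIONS = {
    1: {"city": "New York", "current_weather": "Partly Cloudy, Temperature: 68°F"},
    2: {"city": "Los Angeles", "current_weather": "Sunny, Temperature: 75°F"},
}
FOODS = {
    1: {"name": "Pizza", "calories": 285},
    2: {"name": "Chocolate", "calories": 50},
}


def lookup_user_email(user_id):
    """ID lookup: one key in, one value out."""
    return USERS[user_id]["email"]


def search_foods_by_name(name):
    """Hypothetical search: case-insensitive substring match on the food name."""
    needle = name.lower()
    return [
        {"id": food_id, "name": food["name"]}
        for food_id, food in FOODS.items()
        if needle in food["name"].lower()
    ]


def weather_for_user(user_id):
    """Follow the reference users.location -> locations.id, then read the weather."""
    location_id = USERS[user_id]["location"]
    return LOCATIONS[location_id]["current_weather"]
```

Answering "what is the weather where Bob lives" needs two hops (user to location, location to weather), which is the kind of multi-step tool use this task is meant to exercise.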

from langchain_benchmarks import registry

For this code to work, please configure the LangSmith environment variables with your credentials.

task = registry["Tool Usage - Relational Data"]
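A small pre-flight check can catch missing credentials before a run starts. The sketch below is an illustration only: the variable name `LANGCHAIN_API_KEY` is an assumption based on common LangSmith setups and may differ for your version, and `missing_langsmith_vars` is a hypothetical helper, not part of any library.

```python
import os

# Assumed variable name; confirm against the LangSmith documentation for your version.
REQUIRED_VARS = ("LANGCHAIN_API_KEY",)


def missing_langsmith_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_langsmith_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
```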

The Environment

Let's check the environment:

env = task.create_environment()
env.tools[:5]

[StructuredTool(name='get_user_name', description="get_user_name(user_id: int) -> str - Get the name of the user with the given user ID.\n\n        Args:\n            user_id: The user's ID.\n\n        Returns:\n            The user's name.", args_schema=<class 'pydantic.v1.main.get_user_nameSchema'>, handle_tool_error=True, func=<function get_available_functions.<locals>.get_user_name at 0x78f30602fec0>),
 StructuredTool(name='list_user_ids', description='list_user_ids() -> List[str] - List all the user IDs.', args_schema=<class 'pydantic.v1.main.list_user_idsSchema'>, handle_tool_error=True, func=<function get_available_functions.<locals>.list_user_ids at 0x78f30602fe20>),
 StructuredTool(name='find_users_by_name', description='find_users_by_name(name: str) -> List[langchain_benchmarks.tool_usage.tasks.relational_data.SearchHit] - Find users with the given name.\n\n        Args:\n            name: The name to search for.\n\n        Returns:\n            The list of matching users.', args_schema=<class 'pydantic.v1.main.find_users_by_nameSchema'>, handle_tool_error=True, func=<function get_available_functions.<locals>.find_users_by_name at 0x78f306058040>),
 StructuredTool(name='find_locations_by_name', description='find_locations_by_name(city: str) -> List[langchain_benchmarks.tool_usage.tasks.relational_data.SearchHit] - Find locations with the given city name.', args_schema=<class 'pydantic.v1.main.find_locations_by_nameSchema'>, handle_tool_error=True, func=<function get_available_functions.<locals>.find_locations_by_name at 0x78f3060580e0>),
 StructuredTool(name='find_foods_by_name', description='find_foods_by_name(food: str) -> List[langchain_benchmarks.tool_usage.tasks.relational_data.SearchHit] - Find foods with the given name.', args_schema=<class 'pydantic.v1.main.find_foods_by_nameSchema'>, handle_tool_error=True, func=<function get_available_functions.<locals>.find_foods_by_name at 0x78f306058180>)]

env.tools[0].invoke({"user_id": 21})

'Bob'

env.tools[3].invoke({"city": "LA"})

[{'id': 2, 'city': 'Los Angeles'},
 {'id': 1, 'city': 'New York'},
 {'id': 3, 'city': 'Chicago'},
 {'id': 4, 'city': 'Houston'},
 {'id': 5, 'city': 'Miami'}]
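Note that the search for "LA" returns every location, with the best match first, rather than only exact matches. One way to get this behavior is to rank all rows by a similarity score. The sketch below uses Jaccard similarity over character sets; this scoring rule, the helper names, and the assumption that the table is stored in ID order are illustrative guesses, not a statement of how the benchmark implements its search.

```python
# City names and IDs taken from the output above; storage order (by ID) is assumed.
LOCATIONS = [
    {"id": 1, "city": "New York"},
    {"id": 2, "city": "Los Angeles"},
    {"id": 3, "city": "Chicago"},
    {"id": 4, "city": "Houston"},
    {"id": 5, "city": "Miami"},
]


def jaccard(a, b):
    """Jaccard similarity between the sets of characters in two strings."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def rank_by_similarity(rows, query, key):
    """Return all rows, most similar first; ties keep their original order."""
    return sorted(rows, key=lambda row: jaccard(row[key], query), reverse=True)


ranked = rank_by_similarity(LOCATIONS, "LA", "city")
print([row["id"] for row in ranked])
```

Under this scoring, only "Los Angeles" shares the characters "L" and "A" with the query, so it ranks first and the remaining cities tie at zero. For an agent, the practical consequence is that the first hit is the best candidate, but a result list is not proof that a match exists.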

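Explore the task

For evaluation, we need an agent factory that will create a new instance of an agent executor for every evaluation run.

We'll use the StandardAgentFactory. See the intro for more information about what it does and/or how to create a custom agent factory.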
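The reason for a factory, rather than a single shared agent, is that each evaluation run should start from a clean state. The sketch below illustrates the pattern with hypothetical stand-ins (`ToyAgent`, `ToyAgentFactory`); it does not reproduce the StandardAgentFactory interface beyond the idea that calling the factory yields a fresh agent.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent executor that accumulates state."""

    instructions: str
    history: list = field(default_factory=list)

    def invoke(self, inputs):
        # Record the question so that state leaking between runs would be visible.
        self.history.append(inputs["question"])
        return {"question": inputs["question"], "questions_seen": len(self.history)}


class ToyAgentFactory:
    """Calling the factory returns a brand-new agent with empty state."""

    def __init__(self, instructions):
        self.instructions = instructions

    def __call__(self):
        return ToyAgent(self.instructions)


factory = ToyAgentFactory("Answer questions using the tools.")
first, second = factory(), factory()
first.invoke({"question": "what is the weather in LA"})
print(second.history)  # the second agent is unaffected by the first
```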

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai.chat_models import ChatOpenAI

from langchain_benchmarks.tool_usage.agents import StandardAgentFactory

model = ChatOpenAI(temperature=0)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "{instructions}"),  # Populated from task.instructions automatically
        ("human", "{question}"),  # Populated from the test data
        (
            "placeholder",
            "{agent_scratchpad}",
        ),  # Scratchpad where the agent does its work (e.g., calling multiple tools)
    ]
)

agent_factory = StandardAgentFactory(task, model, prompt)

from langchain import globals

globals.set_verbose(True)

agent = agent_factory()
agent.invoke({"question": "what is the weather in LA"})

> Entering new AgentExecutor chain...

Invoking: `find_locations_by_name` with `{'city': 'LA'}`

[{'id': 2, 'city': 'Los Angeles'}, {'id': 1, 'city': 'New York'}, {'id': 3, 'city': 'Chicago'}, {'id': 4, 'city': 'Houston'}, {'id': 5, 'city': 'Miami'}]
Invoking: `get_current_weather_for_location` with `{'location_id': 2}`

Sunny, Temperature: 75°F
The weather in Los Angeles is sunny with a temperature of 75°F.

> Finished chain.

{'question': 'what is the weather in LA',
 'output': 'The weather in Los Angeles is sunny with a temperature of 75°F.',
 'intermediate_steps': [(ToolAgentAction(tool='find_locations_by_name', tool_input={'city': 'LA'}, log="\nInvoking: `find_locations_by_name` with `{'city': 'LA'}`\n\n\n", message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_hJrCZgP4eDgaj6s4RtCKXTOo', 'function': {'arguments': '{"city":"LA"}', 'name': 'find_locations_by_name'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls'}, id='run-23ccffb0-3b17-46a4-b42e-5eaa3220b211', tool_calls=[{'name': 'find_locations_by_name', 'args': {'city': 'LA'}, 'id': 'call_hJrCZgP4eDgaj6s4RtCKXTOo'}], tool_call_chunks=[{'name': 'find_locations_by_name', 'args': '{"city":"LA"}', 'id': 'call_hJrCZgP4eDgaj6s4RtCKXTOo', 'index': 0}])], tool_call_id='call_hJrCZgP4eDgaj6s4RtCKXTOo'),
   [{'id': 2, 'city': 'Los Angeles'},
    {'id': 1, 'city': 'New York'},
    {'id': 3, 'city': 'Chicago'},
    {'id': 4, 'city': 'Houston'},
    {'id': 5, 'city': 'Miami'}]),
  (ToolAgentAction(tool='get_current_weather_for_location', tool_input={'location_id': 2}, log="\nInvoking: `get_current_weather_for_location` with `{'location_id': 2}`\n\n\n", message_log=[AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_lopYjo00MF9mZtnHtiisTqyp', 'function': {'arguments': '{"location_id":2}', 'name': 'get_current_weather_for_location'}, 'type': 'function'}]}, response_metadata={'finish_reason': 'tool_calls'}, id='run-9bba5827-d98b-464d-8028-25eb4a05d227', tool_calls=[{'name': 'get_current_weather_for_location', 'args': {'location_id': 2}, 'id': 'call_lopYjo00MF9mZtnHtiisTqyp'}], tool_call_chunks=[{'name': 'get_current_weather_for_location', 'args': '{"location_id":2}', 'id': 'call_lopYjo00MF9mZtnHtiisTqyp', 'index': 0}])], tool_call_id='call_lopYjo00MF9mZtnHtiisTqyp'),
   'Sunny, Temperature: 75°F')]}
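The `intermediate_steps` entry is a list of (action, observation) pairs, which makes it easy to extract the sequence of tool calls the agent made. The sketch below uses `SimpleNamespace` objects as stand-ins for the action objects, relying only on the `tool` and `tool_input` attributes visible in the output above; `tool_trajectory` is a hypothetical helper name.

```python
from types import SimpleNamespace


def tool_trajectory(result):
    """Return [(tool_name, tool_input), ...] in the order the agent called them."""
    return [
        (action.tool, action.tool_input)
        for action, _observation in result["intermediate_steps"]
    ]


# Stand-in result shaped like the output above (observations abbreviated).
example = {
    "intermediate_steps": [
        (
            SimpleNamespace(tool="find_locations_by_name", tool_input={"city": "LA"}),
            [{"id": 2, "city": "Los Angeles"}],
        ),
        (
            SimpleNamespace(
                tool="get_current_weather_for_location", tool_input={"location_id": 2}
            ),
            "Sunny, Temperature: 75°F",
        ),
    ]
}

for name, tool_input in tool_trajectory(example):
    print(name, tool_input)
```

Comparing such a trajectory against an expected sequence of tool calls is one simple way to check whether an agent took a sensible path to its answer.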

Benchmarking

See introduction and benchmark all for information on how to run benchmarks. This notebook is only meant to explain and explore the task.
