How to stream LLM tokens from your graph¶
In this example, we will stream tokens from the language model powering an agent, using a ReAct agent as an example. The tl;dr: use streamEvents (API reference).
Note

If you are using a version of @langchain/core < 0.2.3, when calling chat models or LLMs you need to call await model.stream() within your nodes to get token-by-token streaming events, and aggregate the final output if needed to update the graph state. In later versions of @langchain/core this happens automatically, and you can call await model.invoke() instead.
For more on how to upgrade @langchain/core, check out the instructions here.
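For reference, here is a minimal sketch of what that manual aggregation looks like on older versions of @langchain/core. It assumes a tool-bound model like the boundModel defined later in this guide:

// For @langchain/core < 0.2.3 only: call .stream() inside the node and
// aggregate the chunks so the final message can be written to graph state.
import { AIMessageChunk, BaseMessage } from "@langchain/core/messages";

const callModelLegacy = async (state: { messages: BaseMessage[] }) => {
  const stream = await boundModel.stream(state.messages);
  let aggregated: AIMessageChunk | undefined;
  for await (const chunk of stream) {
    // AIMessageChunk.concat merges content and tool call chunks.
    aggregated = aggregated === undefined ? chunk : aggregated.concat(chunk);
  }
  return { messages: [aggregated] };
};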
This how-to guide closely follows the others in this directory, showing how to incorporate the functionality into a prototypical agent in LangGraph.
Streaming Support

Token streaming is supported by many, but not all, chat models. Check whether your LLM integration supports token streaming here (doc). Note that some integrations may support general token streaming but lack support for streaming tool calls.
Note

In this how-to, we will create our agent from scratch to be transparent (but verbose). You can accomplish similar functionality using the createReactAgent({ llm, tools }) (API doc) constructor. This may be more appropriate if you are used to LangChain's AgentExecutor class.
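As a minimal sketch, the prebuilt equivalent looks like this (using the model and tools we define below):

// Prebuilt equivalent of the agent assembled by hand in this guide.
import { createReactAgent } from "@langchain/langgraph/prebuilt";

const prebuiltAgent = createReactAgent({ llm: model, tools });
// prebuiltAgent.streamEvents(...) can then be used the same way as the
// hand-rolled graph below.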
Setup¶
This guide will use OpenAI's GPT-4o model. We will optionally set our API key for LangSmith tracing, which will give us best-in-class observability.
// process.env.OPENAI_API_KEY = "sk_...";
// Optional, add tracing in LangSmith
// process.env.LANGCHAIN_API_KEY = "ls__...";
// process.env.LANGCHAIN_CALLBACKS_BACKGROUND = "true";
// process.env.LANGCHAIN_TRACING = "true";
// process.env.LANGCHAIN_PROJECT = "Stream Tokens: LangGraphJS";
Define the state¶
The state is the interface for all of the nodes in our graph.
import { Annotation } from "@langchain/langgraph";
import { BaseMessage } from "@langchain/core/messages";
const StateAnnotation = Annotation.Root({
messages: Annotation<BaseMessage[]>({
reducer: (x, y) => x.concat(y),
}),
});
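The reducer determines how each node's return value is merged into the existing state. To make the appending behavior above concrete, here is a quick standalone illustration:

// Node updates are appended to the current message list, not overwritten.
import { HumanMessage, AIMessage } from "@langchain/core/messages";

const reducer = (x: BaseMessage[], y: BaseMessage[]) => x.concat(y);
const merged = reducer([new HumanMessage("Hi")], [new AIMessage("Hello!")]);
// merged now contains both messages, in order.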
Next, set up the tools. For this simple example, we'll define a placeholder search tool that returns a canned response:
import { tool } from "@langchain/core/tools";
import { z } from "zod";
const searchTool = tool((_) => {
// This is a placeholder for the actual implementation
return "Cold, with a low of 3℃";
}, {
name: "search",
description:
"Use to surf the web, fetch current information, check the weather, and retrieve other information.",
schema: z.object({
query: z.string().describe("The query to use in your search."),
}),
});
await searchTool.invoke({ query: "What's the weather like?" });
const tools = [searchTool];
We can now wrap these tools in a prebuilt ToolNode. This object will actually run the tools (functions) whenever they are invoked by our LLM.
import { ToolNode } from "@langchain/langgraph/prebuilt";
const toolNode = new ToolNode(tools);
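To see what the ToolNode does, you can invoke it directly with a state whose last message contains tool calls. This is just an illustrative sketch; in the graph below this happens automatically:

// A ToolNode runs the tools requested by the last AI message and returns
// the results as ToolMessages.
import { AIMessage } from "@langchain/core/messages";

const messageWithToolCall = new AIMessage({
  content: "",
  tool_calls: [
    {
      name: "search",
      args: { query: "weather today" },
      id: "example_call_id",
      type: "tool_call",
    },
  ],
});
// Returns { messages: [ToolMessage] } with the search tool's result.
await toolNode.invoke({ messages: [messageWithToolCall] });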
Next, load the chat model:
import { ChatOpenAI } from "@langchain/openai";
const model = new ChatOpenAI({ model: "gpt-4o", temperature: 0 });
After you've done this, we should make sure the model knows that it has these tools available to call. We can do this by calling bindTools.
const boundModel = model.bindTools(tools);
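You can quickly verify that the bound model now emits a structured tool call instead of a plain text answer:

// The bound model should respond with a tool call for queries that
// require the search tool (a sketch; exact args and ids will vary).
const exampleResponse = await boundModel.invoke([
  { role: "user", content: "What's the weather like today?" },
]);
console.log(exampleResponse.tool_calls);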
Define the graph¶
We can now put it all together.
import { StateGraph, END } from "@langchain/langgraph";
import { AIMessage } from "@langchain/core/messages";
const routeMessage = (state: typeof StateAnnotation.State) => {
const { messages } = state;
const lastMessage = messages[messages.length - 1] as AIMessage;
// If no tools are called, we can finish (respond to the user)
if (!lastMessage?.tool_calls?.length) {
return END;
}
// Otherwise if there is, we continue and call the tools
return "tools";
};
const callModel = async (
state: typeof StateAnnotation.State,
) => {
// For versions of @langchain/core < 0.2.3, you must call `.stream()`
// and aggregate the message from chunks instead of calling `.invoke()`.
const { messages } = state;
const responseMessage = await boundModel.invoke(messages);
return { messages: [responseMessage] };
};
const workflow = new StateGraph(StateAnnotation)
.addNode("agent", callModel)
.addNode("tools", toolNode)
.addEdge("__start__", "agent")
.addConditionalEdges("agent", routeMessage)
.addEdge("tools", "agent");
const agent = workflow.compile();
import * as tslab from "tslab";
const runnableGraph = agent.getGraph();
const image = await runnableGraph.drawMermaidPng();
const arrayBuffer = await image.arrayBuffer();
await tslab.display.png(new Uint8Array(arrayBuffer));
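Before streaming, you can sanity-check the compiled graph with a regular invoke call (a sketch; the exact output depends on the model):

// Run the graph end-to-end without streaming to confirm the wiring works.
const finalState = await agent.invoke({
  messages: [{ role: "user", content: "What's the weather like today?" }],
});
console.log(finalState.messages[finalState.messages.length - 1].content);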
How to stream tool calls¶
You can now run your agent. Let's first look at an example of streaming back intermediate tool calls. Not all providers support this, but some support token-level streaming of tool invocations.
To get the partially populated tool calls, you can access the tool_call_chunks property of the message chunks:
import type { AIMessageChunk } from "@langchain/core/messages";
const eventStream = await agent.streamEvents(
{ messages: [{role: "user", content: "What's the weather like today?" }] },
{
version: "v2",
},
);
for await (const { event, data } of eventStream) {
if (event === "on_chat_model_stream") {
const msg = data.chunk as AIMessageChunk;
if (msg.tool_call_chunks !== undefined && msg.tool_call_chunks.length > 0) {
console.log(msg.tool_call_chunks);
}
}
}
[ { name: 'search', args: '', id: 'call_ziGo5u8fYyqQ78SdLZTEC9Vg', index: 0, type: 'tool_call_chunk' } ]
[ { name: undefined, args: '{"', id: undefined, index: 0, type: 'tool_call_chunk' } ]
[ { name: undefined, args: 'query', id: undefined, index: 0, type: 'tool_call_chunk' } ]
[ { name: undefined, args: '":"', id: undefined, index: 0, type: 'tool_call_chunk' } ]
[ { name: undefined, args: 'current', id: undefined, index: 0, type: 'tool_call_chunk' } ]
[ { name: undefined, args: ' weather', id: undefined, index: 0, type: 'tool_call_chunk' } ]
[ { name: undefined, args: '"}', id: undefined, index: 0, type: 'tool_call_chunk' } ]
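If you want the complete, parsed tool call rather than the raw chunks, you can aggregate the chunks as they arrive. Here is a sketch using the concat method on AIMessageChunk:

// Aggregate streamed tool call chunks back into a full tool call.
let gathered: AIMessageChunk | undefined;
const chunkStream = await agent.streamEvents(
  { messages: [{ role: "user", content: "What's the weather like today?" }] },
  { version: "v2" },
);
for await (const { event, data } of chunkStream) {
  if (event === "on_chat_model_stream") {
    const msg = data.chunk as AIMessageChunk;
    if (msg.tool_call_chunks?.length) {
      gathered = gathered === undefined ? msg : gathered.concat(msg);
    }
  }
}
// Once the stream ends, the aggregated chunk exposes the parsed tool call
// with its complete arguments.
console.log(gathered?.tool_calls);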
Let's now stream the final response to the user, skipping any chunks that are part of a tool call:
const eventStreamFinalRes = await agent.streamEvents(
{ messages: [{ role: "user", content: "What's the weather like today?" }] },
{ version: "v2" });
for await (const { event, data } of eventStreamFinalRes) {
if (event === "on_chat_model_stream") {
const msg = data.chunk as AIMessageChunk;
if (!msg.tool_call_chunks?.length) {
console.log(msg.content);
}
}
}
The weather today is cold , with a low of 3 ℃ .
Other graphs¶
If your graph has multiple model calls in multiple nodes and there is one that is always invoked last, you can distinguish that model by assigning it a run name or a tag. To illustrate this, declare a new graph like so:
const OtherGraphAnnotation = Annotation.Root({
messages: Annotation<BaseMessage[]>({
reducer: (x, y) => x.concat(y),
}),
});
const respond = async (state: typeof OtherGraphAnnotation.State) => {
const { messages } = state;
const model = new ChatOpenAI({ model: "gpt-4o", temperature: 0 });
const responseMessage = await model.invoke(messages);
return {
messages: [responseMessage],
}
};
const summarize = async (state: typeof OtherGraphAnnotation.State) => {
const { messages } = state;
// Assign the final model call a run name
const model = new ChatOpenAI({
model: "gpt-4o",
temperature: 0
}).withConfig({ runName: "Summarizer" });
const userMessage = { role: "human", content: "Now, summarize the above messages" };
const responseMessage = await model.invoke([
...messages,
userMessage,
]);
return {
messages: [userMessage, responseMessage]
};
}
const otherWorkflow = new StateGraph(OtherGraphAnnotation)
.addNode("respond", respond)
.addNode("summarize", summarize)
.addEdge("__start__", "respond")
.addEdge("respond", "summarize")
.addEdge("summarize", "__end__");
const otherGraph = otherWorkflow.compile();
const otherRunnableGraph = otherGraph.getGraph();
const otherImage = await otherRunnableGraph.drawMermaidPng();
const otherArrayBuffer = await otherImage.arrayBuffer();
await tslab.display.png(new Uint8Array(otherArrayBuffer));
Now when we call streamEvents, we can see that we can filter on the run name to only see the final summarization of the current chat history:
const otherEventStream = await otherGraph.streamEvents(
{ messages: [{ role: "user", content: "What's the capital of Nepal?" }] },
{ version: "v2" },
{ includeNames: ["Summarizer"] }
);
for await (const { event, data } of otherEventStream) {
if (event === "on_chat_model_stream") {
console.log(data.chunk.content);
}
}
You asked about the capital of Nepal , and I responded that it is Kathmandu .
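As noted above, you can filter on tags instead of run names. Here is a minimal sketch of that variant, assuming you rebuild the summarize node around a tagged model:

// Variant: tag the final model call instead of assigning a run name...
const taggedModel = new ChatOpenAI({
  model: "gpt-4o",
  temperature: 0,
}).withConfig({ tags: ["summarizer"] });

// ...and then filter the event stream on that tag instead:
const taggedEventStream = await otherGraph.streamEvents(
  { messages: [{ role: "user", content: "What's the capital of Nepal?" }] },
  { version: "v2" },
  { includeTags: ["summarizer"] },
);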