# Adaptive RAG with local LLMs
Adaptive RAG is a RAG strategy that combines (1) query analysis with (2) active/self-corrective RAG.
In the paper, the authors report using query analysis to route across:

- No retrieval
- Single-shot RAG
- Iterative RAG

Let's build on this using LangGraph.
In our implementation, we will route between:

- Web search: for questions related to recent events
- Self-corrective RAG: for questions related to our index
## Setup
First, you'll need to install a few required dependencies.
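With npm, for example (this package list is inferred from the imports used throughout this guide, so adjust it to your setup):

```bash
npm install @langchain/community @langchain/core @langchain/langgraph @langchain/ollama langchain cheerio
```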
For the fallback web search, you'll also need to get a Tavily API key and set it as an environment variable named `TAVILY_API_KEY`.
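For example, you can set it in code before running anything else:

```typescript
// Replace the placeholder with your actual Tavily API key
process.env.TAVILY_API_KEY = "<your-api-key>";
```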
## Models

Next, choose the local models you'll use.
### Local embeddings

We'll use the `mxbai-embed-large` embedding model from Ollama.
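If you haven't downloaded it yet, you can pull it with the Ollama CLI:

```bash
ollama pull mxbai-embed-large
```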
### Local LLM

(1) Download the Ollama app.

(2) Pull a Llama 3 model. You can also try pulling a Mistral model, one of the quantized Cohere Command-R models, or any other model from the Ollama library you'd like to experiment with. Just make sure your computer has enough RAM.
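For example, to pull the Llama 3 model used in the code below:

```bash
ollama pull llama3
```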
## Tracing

Optionally, use LangSmith for tracing (shown at the bottom).
```typescript
// process.env.LANGCHAIN_TRACING_V2 = "true";
// process.env.LANGCHAIN_ENDPOINT = "https://api.smith.langchain.com";
// process.env.LANGCHAIN_API_KEY = "<your-api-key>";
```
## Indexing

Now that you've chosen and set up your local models, load and index some source documents. The code below uses a few of Lilian Weng's blog posts about LLMs and agents as a data source, loads them into a demo `MemoryVectorStore` instance, and then creates a retriever from that vector store for later use.
```typescript
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OllamaEmbeddings } from "@langchain/ollama";

const urls = [
  "https://lilianweng.github.io/posts/2023-06-23-agent/",
  "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
  "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
];

const docs = await Promise.all(urls.map((url) => {
  const loader = new CheerioWebBaseLoader(url);
  return loader.load();
}));
const docsList = docs.flat();

const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 250,
  chunkOverlap: 0,
});
const splitDocs = await textSplitter.splitDocuments(docsList);

const embeddings = new OllamaEmbeddings({
  model: "mxbai-embed-large",
});

// Add to vector store
const vectorStore = await MemoryVectorStore.fromDocuments(
  splitDocs,
  embeddings,
);
const retriever = vectorStore.asRetriever();
```
## Create components

Here, you'll create the components of the graph.

### Question router

First, create a chain that routes incoming questions to your vector store if they relate to LLMs or agents, or to general web search if they don't.
You'll use Ollama's JSON mode to help keep the output format consistent.
```typescript
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { JsonOutputParser } from "@langchain/core/output_parsers";
import { ChatOllama } from "@langchain/ollama";

const jsonModeLlm = new ChatOllama({
  model: "llama3",
  format: "json",
  temperature: 0,
});

const QUESTION_ROUTER_SYSTEM_TEMPLATE =
  `You are an expert at routing a user question to a vectorstore or web search.
Use the vectorstore for questions on LLM agents, prompt engineering, and adversarial attacks.
You do not need to be stringent with the keywords in the question related to these topics.
Otherwise, use web-search. Give a binary choice 'web_search' or 'vectorstore' based on the question.
Return a JSON with a single key 'datasource' and no preamble or explanation.`;

const questionRouterPrompt = ChatPromptTemplate.fromMessages([
  ["system", QUESTION_ROUTER_SYSTEM_TEMPLATE],
  ["human", "{question}"],
]);

const questionRouter = questionRouterPrompt.pipe(jsonModeLlm).pipe(
  new JsonOutputParser(),
);

await questionRouter.invoke({ question: "llm agent memory" });
```
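The exact output depends on your local model, but since "llm agent memory" relates to agents, the router should return something like `{ datasource: "vectorstore" }`.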
### Retrieval grader

Create a grader that checks documents retrieved from our vector store for relevance:
```typescript
const GRADER_TEMPLATE =
  `You are a grader assessing relevance of a retrieved document to a user question.
Here is the retrieved document:
<document>
{content}
</document>
Here is the user question:
<question>
{question}
</question>
If the document contains keywords related to the user question, grade it as relevant.
It does not need to be a stringent test. The goal is to filter out erroneous retrievals.
Give a binary score 'yes' or 'no' to indicate whether the document is relevant to the question.
Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.`;

const graderPrompt = ChatPromptTemplate.fromTemplate(GRADER_TEMPLATE);

const retrievalGrader = graderPrompt.pipe(jsonModeLlm).pipe(
  new JsonOutputParser(),
);

// Test run
const testQuestion = "agent memory";
const docs2 = await retriever.invoke(testQuestion);
await retrievalGrader.invoke({
  question: testQuestion,
  content: docs2[0].pageContent,
});
```
The retrieved document above relates to "agent memory", so the grader marks it as relevant and should return something like `{ score: "yes" }`.
### Generation

Next, create a chain that generates an answer based on the retrieved documents.
```typescript
import * as hub from "langchain/hub";
import { StringOutputParser } from "@langchain/core/output_parsers";
import type { Document } from "@langchain/core/documents";

// https://smith.langchain.com/hub/rlm/rag-prompt
const ragPrompt = await hub.pull("rlm/rag-prompt");

// Post-processing
const formatDocs = (docs: Document[]) => {
  return docs.map((doc) => doc.pageContent).join("\n\n");
};

// Initialize a new model without JSON mode active
const llm = new ChatOllama({
  model: "llama3",
  temperature: 0,
});

// Chain
const ragChain = ragPrompt.pipe(llm).pipe(new StringOutputParser());

// Test run
const testQuestion2 = "agent memory";
const docs3 = await retriever.invoke(testQuestion2);
await ragChain.invoke({ context: formatDocs(docs3), question: testQuestion2 });
```
```text
Based on the provided context, it appears that an agent's memory refers to its ability to record and reflect on past experiences, using both long-term and short-term memory modules. The long-term memory module, or "memory stream," stores a comprehensive list of agents' experiences in natural language, while the reflection mechanism synthesizes these memories into higher-level inferences over time to guide future behavior.
```
### Hallucination grader

Create a chain that reviews the generated answer and checks it for hallucinations. We'll again use JSON mode for this.
```typescript
const HALLUCINATION_GRADER_TEMPLATE =
  `You are a grader assessing whether an answer is grounded in / supported by a set of facts.
Here are the facts used as context to generate the answer:
<context>
{context}
</context>
Here is the answer:
<answer>
{generation}
</answer>
Give a binary score 'yes' or 'no' to indicate whether the answer is grounded in / supported by the set of facts.
Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.`;

const hallucinationGraderPrompt = ChatPromptTemplate.fromTemplate(
  HALLUCINATION_GRADER_TEMPLATE,
);

// Reuse the JSON mode model so the output stays parseable
const hallucinationGrader = hallucinationGraderPrompt.pipe(jsonModeLlm).pipe(
  new JsonOutputParser(),
);

// Test run
const generation2 = await ragChain.invoke({
  context: formatDocs(docs3),
  question: testQuestion2,
});
await hallucinationGrader.invoke({
  context: formatDocs(docs3),
  generation: generation2,
});
```
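If the generation is grounded in the retrieved context, this should return something like `{ score: "yes" }`.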
### Answer grader

Create a chain that checks the final answer for relevance to the question.
```typescript
const ANSWER_GRADER_PROMPT_TEMPLATE =
  `You are a grader assessing whether an answer is useful to resolve a question.
Here is the answer:
<answer>
{generation}
</answer>
Here is the question:
<question>
{question}
</question>
Give a binary score 'yes' or 'no' to indicate whether the answer is useful to resolve a question.
Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.`;

const answerGraderPrompt = ChatPromptTemplate.fromTemplate(
  ANSWER_GRADER_PROMPT_TEMPLATE,
);

const answerGrader = answerGraderPrompt.pipe(jsonModeLlm).pipe(
  new JsonOutputParser(),
);

// Test run
const generation3 = await ragChain.invoke({
  context: formatDocs(docs3),
  question: testQuestion2,
});
await answerGrader.invoke({ question: testQuestion2, generation: generation3 });
```
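As with the other graders, an answer that addresses the question should produce something like `{ score: "yes" }`.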
### Question rewriter

Create a question rewriter. This chain performs query analysis on the user's question and optimizes it for RAG, which helps with difficult queries.
```typescript
const REWRITER_PROMPT_TEMPLATE =
  `You are a question re-writer that converts an input question to a better version that is optimized
for vectorstore retrieval. Look at the initial question and formulate an improved question.
Here is the initial question:
<question>
{question}
</question>
Respond only with an improved question. Do not include any preamble or explanation.`;

const rewriterPrompt = ChatPromptTemplate.fromTemplate(
  REWRITER_PROMPT_TEMPLATE,
);

const rewriter = rewriterPrompt.pipe(llm).pipe(new StringOutputParser());

// Test run
// Test question is "agent memory"
await rewriter.invoke({ question: testQuestion2 });
```
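This should respond with a rewritten, retrieval-optimized version of "agent memory"; the exact phrasing will vary by model.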
### Web search tool

Finally, you'll need a web search tool to handle questions outside the scope of the indexed documents. The code below initializes a search tool powered by Tavily.
```typescript
import { TavilySearchResults } from "@langchain/community/tools/tavily_search";

const webSearchTool = new TavilySearchResults({ maxResults: 3 });
await webSearchTool.invoke("red robin");
```
[{"title":"Family Friendly Burger Restaurant | Red Robin","url":"https://www.redrobin.com/","content":"Red Robin is donating 10¢ to Make-A-Wish ® for every Kids Meal purchased. You can contribute to life-changing wishes by simply purchasing a Kids Meal at Red Robin for Dine-in or To-Go. Join us for a memorable meal or order online and help transform lives, one wish at a time.","score":0.998043,"raw_content":null},{"title":"Red Robin United States of America Directory","url":"https://locations.redrobin.com/locations-list/us/","content":"Maps, Driving Directions and Local Restaurant Information for Red Robin Restaurants in United States","score":0.99786776,"raw_content":null},{"title":"Red Robin Restaurant Locations","url":"https://locations.redrobin.com/","content":"Maps, Driving Directions and Local Restaurant Information for Red Robin","score":0.99718815,"raw_content":null}]
## Graph

Now that you've created all the required components, it's time to capture the flow as a graph.

### Graph state

Define the graph state like this. Since `question` and `generation` are simple strings, we can use `null` as shorthand for the default behavior.
```typescript
import type { Document } from "@langchain/core/documents";
import { Annotation } from "@langchain/langgraph";

// This defines the agent state.
// Returned documents from a node will override the current
// "documents" value in the state object.
const GraphState = Annotation.Root({
  question: Annotation<string>,
  generation: Annotation<string>,
  documents: Annotation<Document[]>({
    reducer: (_, y) => y,
    default: () => [],
  }),
});
```
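For illustration, the `Annotation<string>` shorthand used above behaves roughly like an explicit last-value channel. The sketch below shows a hypothetical `explicitQuestion` channel that is not part of the graph:

```typescript
// Roughly equivalent explicit form of the `question` channel above:
// each update simply overwrites the previous value.
const explicitQuestion = Annotation<string>({
  reducer: (_prev, next) => next,
});
```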
### Prepare the nodes and edges

Let's wrap our components in functions that match the interface LangGraph expects. These functions will handle formatting inputs and outputs.
We'll use some components inside nodes and others to define conditional edges. Each one takes the graph state as a parameter. Nodes return the state properties to update, while conditional edges return the name of the next node to execute.
```typescript
import { Document } from "@langchain/core/documents";

/* ---Nodes--- */

// Retrieve documents for a question
const retrieve = async (
  state: typeof GraphState.State,
): Promise<Partial<typeof GraphState.State>> => {
  console.log("---RETRIEVE---");
  const documents = await retriever.invoke(state.question);
  // Add sources to the state
  return { documents };
};

// RAG generation
const generate = async (
  state: typeof GraphState.State,
): Promise<Partial<typeof GraphState.State>> => {
  console.log("---GENERATE---");
  const generation = await ragChain.invoke({
    context: formatDocs(state.documents),
    question: state.question,
  });
  // Add generation to the state
  return { generation };
};

// Determines whether the retrieved documents are relevant to the question.
const gradeDocuments = async (
  state: typeof GraphState.State,
): Promise<Partial<typeof GraphState.State>> => {
  console.log("---CHECK DOCUMENT RELEVANCE TO QUESTION---");
  // Score each doc
  const relevantDocs: Document[] = [];
  for (const doc of state.documents) {
    const grade: { score: string } = await retrievalGrader.invoke({
      question: state.question,
      content: doc.pageContent,
    });
    if (grade.score === "yes") {
      console.log("---GRADE: DOCUMENT RELEVANT---");
      relevantDocs.push(doc);
    } else {
      console.log("---GRADE: DOCUMENT NOT RELEVANT---");
    }
  }
  return { documents: relevantDocs };
};

// Re-write question
const transformQuery = async (
  state: typeof GraphState.State,
): Promise<Partial<typeof GraphState.State>> => {
  console.log("---TRANSFORM QUERY---");
  const betterQuestion = await rewriter.invoke({ question: state.question });
  return { question: betterQuestion };
};

// Web search based on the re-phrased question
const webSearch = async (
  state: typeof GraphState.State,
): Promise<Partial<typeof GraphState.State>> => {
  console.log("---WEB SEARCH---");
  const stringifiedSearchResults = await webSearchTool.invoke(state.question);
  return {
    documents: [new Document({ pageContent: stringifiedSearchResults })],
  };
};

/* ---Edges--- */

// Decide on the datasource to route the initial question to.
const routeQuestion = async (state: typeof GraphState.State) => {
  const source: { datasource: string } = await questionRouter.invoke({
    question: state.question,
  });
  if (source.datasource === "web_search") {
    console.log(`---ROUTING QUESTION "${state.question}" TO WEB SEARCH---`);
    return "web_search";
  } else {
    console.log(`---ROUTING QUESTION "${state.question}" TO RAG---`);
    return "retrieve";
  }
};

// Decide whether the current documents are sufficiently relevant
// to come up with a good answer.
const decideToGenerate = async (state: typeof GraphState.State) => {
  const filteredDocuments = state.documents;
  // All documents have been filtered as irrelevant.
  // Regenerate a new query and try again.
  if (filteredDocuments.length === 0) {
    console.log(
      "---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---",
    );
    return "transform_query";
  } else {
    // We have relevant documents, so generate answer.
    console.log("---DECISION: GENERATE---");
    return "generate";
  }
};

// Determines whether the generation is grounded in the document and answers question.
const gradeGenerationDocumentsAndQuestion = async (
  state: typeof GraphState.State,
) => {
  const hallucinationGrade: { score: string } = await hallucinationGrader
    .invoke({
      generation: state.generation,
      context: formatDocs(state.documents),
    });
  // Check for hallucination
  if (hallucinationGrade.score === "yes") {
    console.log("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---");
    // Check question answering
    console.log("---GRADING GENERATION vs. QUESTION---");
    const onTopicGrade: { score: string } = await answerGrader.invoke({
      question: state.question,
      generation: state.generation,
    });
    if (onTopicGrade.score === "yes") {
      console.log("---DECISION: GENERATION ADDRESSES QUESTION---");
      return "useful";
    } else {
      console.log("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---");
      return "not_useful";
    }
  } else {
    console.log(
      "---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RETRY---",
    );
    return "not_supported";
  }
};
```
### Build the graph

Now let's build the graph. For fun, let's add a checkpointer and have the compiled graph pause before performing web searches. This simulates asking for permission before running the search.
```typescript
import { END, MemorySaver, START, StateGraph } from "@langchain/langgraph";

const graph = new StateGraph(GraphState)
  .addNode("web_search", webSearch)
  .addNode("retrieve", retrieve)
  .addNode("grade_documents", gradeDocuments)
  .addNode("generate", generate)
  .addNode("transform_query", transformQuery)
  .addConditionalEdges(START, routeQuestion)
  .addEdge("web_search", "generate")
  .addEdge("retrieve", "grade_documents")
  .addConditionalEdges("grade_documents", decideToGenerate)
  .addEdge("transform_query", "retrieve")
  .addConditionalEdges("generate", gradeGenerationDocumentsAndQuestion, {
    not_supported: "generate",
    useful: END,
    not_useful: "transform_query",
  });

const app = graph.compile({
  checkpointer: new MemorySaver(),
  interruptBefore: ["web_search"],
});
```
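Note that `interruptBefore` requires compiling the graph with a checkpointer, since the paused state must be persisted somewhere for the run to be resumable later.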
## Run the graph

Everything is in place! Time to ask some questions. First, try a question related to agents.
```typescript
await app.invoke(
  {
    question: "What are some features of long-term memory?",
  },
  { configurable: { thread_id: "1" } },
);
```
---ROUTING QUESTION "What are some features of long-term memory? TO WEB SEARCH---
{
question: 'What are some features of long-term memory?',
documents: []
}
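The run stops after the routing step: the question was routed to web search, and because the compiled graph interrupts before the `web_search` node, the returned state snapshot contains no documents or generation yet.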
If you ask a question unrelated to agents or LLMs, the graph should fall back to information fetched from the web. As noted above, the graph will pause before executing the web search.
```typescript
await app.invoke(
  {
    question: "Where are the 2024 Euros being held?",
  },
  { configurable: { thread_id: "2" } },
);
```
---ROUTING QUESTION "Where are the 2024 Euros being held? TO WEB SEARCH---
{ question: 'Where are the 2024 Euros being held?', documents: [] }
Invoke the graph again, passing `null` as the input, to resume from the checkpoint and run the web search:

```typescript
await app.invoke(null, { configurable: { thread_id: "2" } });
```
```text
---WEB SEARCH---
---GENERATE---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADING GENERATION vs. QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
{
  question: 'Where are the 2024 Euros being held?',
  generation: 'The 2024 Euros are being held in Germany. The final match will take place at Olympiastadion Berlin on July 14, 2024.',
  documents: [
    Document {
      pageContent: `[{"title":"Where is Euro 2024? Country, host cities and venues","url":"https://www.radiotimes.com/tv/sport/football/euro-2024-location/","content":"Euro 2024 stadiums The Olympiastadion Berlin, the biggest stadium in Germany with a capacity of around 74,000, will host games as well as the final on Sunday, 14th July, 2024.","score":0.99743915,"raw_content":null},{"title":"UEFA EURO 2024 venues - complete list: When and where will the opening ...","url":"https://olympics.com/en/news/uefa-euro-2024-venues-complete-list-when-where-final-opening-game","content":"UEFA EURO 2024 will be held in Germany across June and July, with 10 host cities staging the major football tournament.. It all begins in Munich on June 14, when hosts Germany take on Scotland in the tournament's opening game at Bayern Munich's stadium.. The final takes place a month later on July 14 at Olympiastadion Berlin in the German capital, which hosted the 2006 FIFA World Cup final ...","score":0.9973061,"raw_content":null},{"title":"EURO 2024: All you need to know | UEFA EURO 2024","url":"https://www.uefa.com/euro2024/news/0257-0e13b161b2e8-4a3fd5615e0c-1000--euro-2024-all-you-need-to-know/","content":"Article top media content\\nArticle body\\nWhere will EURO 2024 be held?\\nGermany will host EURO 2024, having been chosen to stage the 17th edition of the UEFA European Championship at a UEFA Executive Committee meeting in Nyon on 27 September 2018. Host cities\\nEURO 2024 fixtures by venue\\nEURO 2024 fixtures by team\\nAlso visit\\nChange language\\nServices links and disclaimer\\n© 1998-2024 UEFA. Where and when will the final of UEFA EURO 2024 be played?\\nBerlin's Olympiastadion will stage the final on Sunday 14 July 2024.\\n The ten venues chosen to host games at the tournament include nine of the stadiums used at the 2006 World Cup plus the Düsseldorf Arena.\\n All you need to know\\nThursday, January 11, 2024\\nArticle summary\\nThree-time winners Germany will stage the UEFA European Championship in 2024.\\n","score":0.99497885,"raw_content":null}]`,
      metadata: {},
      id: undefined
    }
  ]
}
```