
Adaptive RAG with local LLMs

Adaptive RAG is a RAG strategy that combines (1) query analysis with (2) active/self-corrective RAG.

In the paper, the authors report using query analysis to route queries across:

  • No retrieval
  • Single-shot RAG
  • Iterative RAG

Let's build on these ideas using LangGraph.

In our implementation, we will route between:

  • Web search: for questions related to recent events
  • Self-corrective RAG: for questions related to our index

Adaptive RAG graph

Setup

First, you'll need to install a few required dependencies:

npm install cheerio langchain @langchain/community @langchain/ollama @langchain/core

For the fallback web search, you'll also need to sign up for a Tavily API key and set it as an environment variable named TAVILY_API_KEY.
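If you prefer to set it from code rather than your shell, you can assign it at runtime, mirroring the tracing setup shown below (the value here is a placeholder):

// Replace the placeholder with your real Tavily API key.
process.env.TAVILY_API_KEY = "<your-api-key>";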

Models

Next, select the local models you'll use.

Local embeddings

We'll use the mxbai-embed-large embedding model from Ollama.

Local LLM

(1) Download the Ollama app.

(2) Pull a Llama 3 model. You can also try a Mistral model, one of the quantized Cohere Command-R models, or any other model from the Ollama library that you'd like to experiment with, just make sure your computer has enough RAM.

ollama pull llama3
ollama pull mxbai-embed-large

Tracing

Optionally, use LangSmith for tracing (shown below):

// process.env.LANGCHAIN_TRACING_V2 = "true";
// process.env.LANGCHAIN_ENDPOINT = "https://api.smith.langchain.com";
// process.env.LANGCHAIN_API_KEY = "<your-api-key>"

Indexing

Now that you've chosen and set up your local models, load and index some source documents. The code below uses a few of Lilian Weng's blog posts about LLMs and agents as a data source, loads them into a MemoryVectorStore instance for demo purposes, and then creates a retriever from that vector store for later use.

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OllamaEmbeddings } from "@langchain/ollama";

const urls = [
  "https://lilianweng.github.io/posts/2023-06-23-agent/",
  "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
  "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
];

const docs = await Promise.all(urls.map((url) => {
  const loader = new CheerioWebBaseLoader(url);
  return loader.load();
}));

const docsList = docs.flat();

const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 250,
  chunkOverlap: 0,
});

const splitDocs = await textSplitter.splitDocuments(docsList);

const embeddings = new OllamaEmbeddings({
  model: "mxbai-embed-large",
});

// Add to vector store
const vectorStore = await MemoryVectorStore.fromDocuments(
  splitDocs,
  embeddings,
);

const retriever = vectorStore.asRetriever();
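As an optional sanity check, assuming Ollama is running locally with the models pulled above, you can query the retriever directly (the sample query is arbitrary):

// Optional sanity check: fetch indexed chunks for a sample query.
const sampleDocs = await retriever.invoke("What is task decomposition?");
console.log(sampleDocs.length);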

Creating components

Here, you'll create the components of your graph.

Question router

First, create a chain that routes incoming questions to your vector store if they are related to LLMs or agents, and to general web search if they are not.

You'll use Ollama's JSON mode to help keep the output formatting consistent.

import { ChatPromptTemplate } from "@langchain/core/prompts";
import { JsonOutputParser } from "@langchain/core/output_parsers";
import { ChatOllama } from "@langchain/ollama";

const jsonModeLlm = new ChatOllama({
  model: "llama3",
  format: "json",
  temperature: 0,
});

const QUESTION_ROUTER_SYSTEM_TEMPLATE =
  `You are an expert at routing a user question to a vectorstore or web search.
Use the vectorstore for questions on LLM agents, prompt engineering, and adversarial attacks.
You do not need to be stringent with the keywords in the question related to these topics.
Otherwise, use web-search. Give a binary choice 'web_search' or 'vectorstore' based on the question.
Return a JSON with a single key 'datasource' and no preamble or explanation.`;

const questionRouterPrompt = ChatPromptTemplate.fromMessages([
  ["system", QUESTION_ROUTER_SYSTEM_TEMPLATE],
  ["human", "{question}"],
]);

const questionRouter = questionRouterPrompt.pipe(jsonModeLlm).pipe(
  new JsonOutputParser(),
);

await questionRouter.invoke({ question: "llm agent memory" });
{ datasource: 'vectorstore' }
Above, note that you invoked the router with a query related to the knowledge in your vector store, and it responded accordingly. Let's see what happens when you ask an unrelated question:

await questionRouter.invoke({ question: "red robin" });
{ datasource: 'web_search' }
In this case, you can see that execution would be routed to web search instead.

Retrieval grader

Next, create a grader that checks the relevance of documents retrieved from your vector store against the input question:

const GRADER_TEMPLATE =
  `You are a grader assessing relevance of a retrieved document to a user question.
Here is the retrieved document:

<document>
{content}
</document>

Here is the user question:
<question>
{question}
</question>

If the document contains keywords related to the user question, grade it as relevant.
It does not need to be a stringent test. The goal is to filter out erroneous retrievals.
Give a binary 'yes' or 'no' score to indicate whether the document is relevant to the question.
Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.`;

const graderPrompt = ChatPromptTemplate.fromTemplate(GRADER_TEMPLATE);

const retrievalGrader = graderPrompt.pipe(jsonModeLlm).pipe(
  new JsonOutputParser(),
);

// Test run
const testQuestion = "agent memory";

const docs2 = await retriever.invoke(testQuestion);

await retrievalGrader.invoke({
  question: testQuestion,
  content: docs2[0].pageContent,
});
{ score: 'yes' }
You can see that the grader marks the first retrieved document as relevant to "agent memory".
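As a counterpoint, you can grade the same document against an unrelated question. A well-behaved grader should return { score: 'no' } here, though outputs from small local models can vary:

// The question is deliberately unrelated to the retrieved document.
await retrievalGrader.invoke({
  question: "red robin",
  content: docs2[0].pageContent,
});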

Generation

Next, create a chain that generates an answer based on the retrieved documents:

import * as hub from "langchain/hub";
import { StringOutputParser } from "@langchain/core/output_parsers";
import type { Document } from "@langchain/core/documents";

// https://smith.langchain.com/hub/rlm/rag-prompt
const ragPrompt = await hub.pull("rlm/rag-prompt");

// Post-processing
const formatDocs = (docs: Document[]) => {
  return docs.map((doc) => doc.pageContent).join("\n\n");
};

// Initialize a new model without JSON mode active
const llm = new ChatOllama({
  model: "llama3",
  temperature: 0,
});

// Chain
const ragChain = ragPrompt.pipe(llm).pipe(new StringOutputParser());

// Test run
const testQuestion2 = "agent memory";
const docs3 = await retriever.invoke(testQuestion2);

await ragChain.invoke({ context: formatDocs(docs3), question: testQuestion2 });
Based on the provided context, it appears that an agent's memory refers to its ability to record and reflect on past experiences, using both long-term and short-term memory modules. The long-term memory module, or "memory stream," stores a comprehensive list of agents' experiences in natural language, while the reflection mechanism synthesizes these memories into higher-level inferences over time to guide future behavior.

Hallucination grader

Now, create a chain that reviews the generated answer and checks for hallucinations. We'll use JSON mode again here:

const HALLUCINATION_GRADER_TEMPLATE =
  `You are a grader assessing whether an answer is grounded in / supported by a set of facts.
Here are the facts used as context to generate the answer:

<context>
{context} 
</context>

Here is the answer:

<answer>
{generation}
</answer>

Give a binary 'yes' or 'no' score to indicate whether the answer is grounded in / supported by the set of facts.
Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.`;

const hallucinationGraderPrompt = ChatPromptTemplate.fromTemplate(
  HALLUCINATION_GRADER_TEMPLATE,
);

const hallucinationGrader = hallucinationGraderPrompt.pipe(jsonModeLlm).pipe(
  new JsonOutputParser(),
);

// Test run
const generation2 = await ragChain.invoke({
  context: formatDocs(docs3),
  question: testQuestion2,
});

await hallucinationGrader.invoke({ context: formatDocs(docs3), generation: generation2 });
{ score: 'yes' }
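You can also sanity-check this grader with a claim the context does not support. This is a sketch with a made-up answer; a grounded grader should return { score: 'no' }, though local model behavior can vary:

// An unsupported, fabricated claim about the retrieved context.
await hallucinationGrader.invoke({
  context: formatDocs(docs3),
  generation: "Agents store their memories in a relational SQL database.",
});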

Answer grader

Then, create a chain that checks whether the final answer is useful for resolving the original question:

const ANSWER_GRADER_PROMPT_TEMPLATE =
  `You are a grader assessing whether an answer is useful to resolve a question.
Here is the answer:

<answer>
{generation} 
</answer>

Here is the question:

<question>
{question}
</question>

Give a binary score 'yes' or 'no' to indicate whether the answer is useful to resolve the question.
Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.`;

const answerGraderPrompt = ChatPromptTemplate.fromTemplate(
  ANSWER_GRADER_PROMPT_TEMPLATE,
);

const answerGrader = answerGraderPrompt.pipe(jsonModeLlm).pipe(
  new JsonOutputParser(),
);

// Test run
const generation3 = await ragChain.invoke({
  context: formatDocs(docs3),
  question: testQuestion2,
});

await answerGrader.invoke({ question: testQuestion2, generation: generation3 });
{ score: 'yes' }

Question rewriter

Now, create a question rewriter. This chain performs query analysis on the user's question, optimizing it for retrieval to help handle difficult queries:

const REWRITER_PROMPT_TEMPLATE =
  `You are a question re-writer that converts an input question to a better version that is optimized
for vectorstore retrieval. Look at the initial question and formulate an improved question.

Here is the initial question:

<question>
{question}
</question>

Respond only with an improved question. Do not include any preamble or explanation.`;

const rewriterPrompt = ChatPromptTemplate.fromTemplate(
  REWRITER_PROMPT_TEMPLATE,
);

const rewriter = rewriterPrompt.pipe(llm).pipe(new StringOutputParser());

// Test run

// Test question is "agent memory"
await rewriter.invoke({ question: testQuestion2 });
What are memories stored in by agents?

Web search tool

Finally, you'll need a web search tool for handling questions outside the scope of the indexed documents. The code below initializes a search tool powered by Tavily:

import { TavilySearchResults } from "@langchain/community/tools/tavily_search";

const webSearchTool = new TavilySearchResults({ maxResults: 3 });

await webSearchTool.invoke("red robin");
[{"title":"Family Friendly Burger Restaurant | Red Robin","url":"https://www.redrobin.com/","content":"Red Robin is donating 10¢ to Make-A-Wish ® for every Kids Meal purchased. You can contribute to life-changing wishes by simply purchasing a Kids Meal at Red Robin for Dine-in or To-Go. Join us for a memorable meal or order online and help transform lives, one wish at a time.","score":0.998043,"raw_content":null},{"title":"Red Robin United States of America Directory","url":"https://locations.redrobin.com/locations-list/us/","content":"Maps, Driving Directions and Local Restaurant Information for Red Robin Restaurants in United States","score":0.99786776,"raw_content":null},{"title":"Red Robin Restaurant Locations","url":"https://locations.redrobin.com/","content":"Maps, Driving Directions and Local Restaurant Information for Red Robin","score":0.99718815,"raw_content":null}]

Now that you've created all the required components, it's time to capture the flow as a graph.

Graph state

Define the graph state like this. Since question and generation are simple strings, we can declare them as bare Annotation types, shorthand for the default behavior of overwriting with the most recent value.

import type { Document } from "@langchain/core/documents";
import { Annotation } from "@langchain/langgraph";

// This defines the agent state.
// Returned documents from a node will override the current
// "documents" value in the state object.
const GraphState = Annotation.Root({
  question: Annotation<string>,
  generation: Annotation<string>,
  documents: Annotation<Document[]>({
    reducer: (_, y) => y,
    default: () => [],
  }),
});

Preparing nodes and edges

Let's wrap the components in functions that match the interface LangGraph requires. These functions will handle formatting of inputs and outputs.

Some components will be used inside nodes, while others will define conditional edges. Each takes the graph state as a parameter. Nodes return the state properties to update, while conditional edges return the name of the next node to execute.

import { Document } from "@langchain/core/documents";

/* ---Nodes--- */

// Retrieve documents for a question
const retrieve = async (state: typeof GraphState.State): Promise<Partial<typeof GraphState.State>> => {
  console.log("---RETRIEVE---");
  const documents = await retriever.invoke(state.question);
  // Add sources to the state
  return { documents };
};

// RAG generation
const generate = async (state: typeof GraphState.State): Promise<Partial<typeof GraphState.State>> => {
  console.log("---GENERATE---");
  const generation = await ragChain.invoke({
    context: formatDocs(state.documents),
    question: state.question,
  });
  // Add generation to the state
  return { generation };
};

// Determines whether the retrieved documents are relevant to the question.
const gradeDocuments = async (state: typeof GraphState.State): Promise<Partial<typeof GraphState.State>> => {
  console.log("---CHECK DOCUMENT RELEVANCE TO QUESTION---");
  // Score each doc
  const relevantDocs: Document[] = [];
  for (const doc of state.documents) {
    const grade: { score: string } = await retrievalGrader.invoke({
      question: state.question,
      content: doc.pageContent,
    });
    if (grade.score === "yes") {
      console.log("---GRADE: DOCUMENT RELEVANT---");
      relevantDocs.push(doc);
    } else {
      console.log("---GRADE: DOCUMENT NOT RELEVANT---");
    }
  }
  return { documents: relevantDocs };
};

// Re-write question
const transformQuery = async (state: typeof GraphState.State): Promise<Partial<typeof GraphState.State>> => {
  console.log("---TRANSFORM QUERY---");
  const betterQuestion = await rewriter.invoke({ question: state.question });
  return { question: betterQuestion };
};

// Web search based on the re-phrased question
const webSearch = async (state: typeof GraphState.State): Promise<Partial<typeof GraphState.State>> => {
  console.log("---WEB SEARCH---");
  const stringifiedSearchResults = await webSearchTool.invoke(state.question);
  return {
    documents: [new Document({ pageContent: stringifiedSearchResults })],
  };
};

/* ---Edges--- */

// Decide on the datasource to route the initial question to.
const routeQuestion = async (state: typeof GraphState.State) => {
  const source: { datasource: string } = await questionRouter.invoke({
    question: state.question,
  });
  if (source.datasource === "web_search") {
    console.log(`---ROUTING QUESTION "${state.question}" TO WEB SEARCH---`);
    return "web_search";
  } else {
    console.log(`---ROUTING QUESTION "${state.question}" TO RAG---`);
    return "retrieve";
  }
};

// Decide whether the current documents are sufficiently relevant
// to come up with a good answer.
const decideToGenerate = async (state: typeof GraphState.State) => {
  const filteredDocuments = state.documents;
  // All documents have been filtered as irrelevant
  // Regenerate a new query and try again
  if (filteredDocuments.length === 0) {
    console.log(
      "---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---",
    );
    return "transform_query";
  } else {
    // We have relevant documents, so generate answer.
    console.log("---DECISION: GENERATE---");
    return "generate";
  }
};

// Determines whether the generation is grounded in the document and answers question.
const gradeGenerationDocumentsAndQuestion = async (
  state: typeof GraphState.State,
) => {
  const hallucinationGrade: { score: string } = await hallucinationGrader
    .invoke({
      generation: state.generation,
      context: formatDocs(state.documents),
    });
  // Check for hallucination
  if (hallucinationGrade.score === "yes") {
    console.log("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---");
    // Check question answering
    console.log("---GRADING GENERATION vs. QUESTION---");
    const onTopicGrade: { score: string } = await answerGrader.invoke({
      question: state.question,
      generation: state.generation,
    });
    if (onTopicGrade.score === "yes") {
      console.log("---DECISION: GENERATION ADDRESSES QUESTION---");
      return "useful";
    } else {
      console.log("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---");
      return "not_useful";
    }
  } else {
    console.log(
      "---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RETRY---",
    );
    return "not_supported";
  }
};

Building the graph

Now, let's build the graph. For fun, let's also add a checkpointer and have the compiled graph pause before performing a web search. This will simulate asking for permission before taking action.

import { END, MemorySaver, START, StateGraph } from "@langchain/langgraph";

const graph = new StateGraph(GraphState)
  .addNode("web_search", webSearch)
  .addNode("retrieve", retrieve)
  .addNode("grade_documents", gradeDocuments)
  .addNode("generate", generate)
  .addNode("transform_query", transformQuery)
  .addConditionalEdges(START, routeQuestion)
  .addEdge("web_search", "generate")
  .addEdge("retrieve", "grade_documents")
  .addConditionalEdges("grade_documents", decideToGenerate)
  .addEdge("transform_query", "retrieve")
  .addConditionalEdges("generate", gradeGenerationDocumentsAndQuestion, {
    not_supported: "generate",
    useful: END,
    not_useful: "transform_query",
  });

const app = graph.compile({
  checkpointer: new MemorySaver(),
  interruptBefore: ["web_search"],
});
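Before running anything, you can optionally verify the wiring by rendering the compiled graph as a Mermaid diagram. A minimal sketch using the graph-drawing helpers available on compiled LangGraph graphs:

// Print a Mermaid representation of the graph's nodes and edges.
console.log(app.getGraph().drawMermaid());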

Running the graph

You're all set! It's time to ask some questions. First, try a question related to agents:

await app.invoke(
  {
    question: "What are some features of long-term memory?",
  },
  { configurable: { thread_id: "1" } },
);
---ROUTING QUESTION "What are some features of long-term memory? TO WEB SEARCH---
{
  question: 'What are some features of long-term memory?',
  documents: []
}
You can see that your graph correctly routed the query to the vector store and answered the question, filtering out some unnecessary documents along the way.

If you ask a question unrelated to agents or LLMs, the graph should fall back to information gathered from the web. As noted above, the graph will pause before performing the search.

await app.invoke(
  {
    question: "Where are the 2024 Euros being held?",
  },
  { configurable: { thread_id: "2" } },
);
---ROUTING QUESTION "Where are the 2024 Euros being held? TO WEB SEARCH---
{ question: 'Where are the 2024 Euros being held?', documents: [] }
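The empty documents array shows that execution stopped before the web search node ran. Because a checkpointer is attached, the paused state is saved, and you can inspect it to confirm which node will execute next. A minimal sketch, reusing the same thread_id:

// Inspect the saved state for this thread; `next` lists the pending nodes.
const pausedState = await app.getState({ configurable: { thread_id: "2" } });
console.log(pausedState.next); // expected: ["web_search"]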
You can see that the graph paused before running the web search. Now, resume it by invoking the graph with null as the input:

await app.invoke(null, { configurable: { thread_id: "2" } });
---WEB SEARCH---
---GENERATE---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADING GENERATION vs. QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
{
  question: 'Where are the 2024 Euros being held?',
  generation: 'The 2024 Euros are being held in Germany. The final match will take place at Olympiastadion Berlin on July 14, 2024.',
  documents: [
    Document {
      pageContent: `[{"title":"Where is Euro 2024? Country, host cities and venues","url":"https://www.radiotimes.com/tv/sport/football/euro-2024-location/","content":"Euro 2024 stadiums The Olympiastadion Berlin, the biggest stadium in Germany with a capacity of around 74,000, will host games as well as the final on Sunday, 14th July, 2024.","score":0.99743915,"raw_content":null},{"title":"UEFA EURO 2024 venues - complete list: When and where will the opening ...","url":"https://olympics.com/en/news/uefa-euro-2024-venues-complete-list-when-where-final-opening-game","content":"UEFA EURO 2024 will be held in Germany across June and July, with 10 host cities staging the major football tournament.. It all begins in Munich on June 14, when hosts Germany take on Scotland in the tournament's opening game at Bayern Munich's stadium.. The final takes place a month later on July 14 at Olympiastadion Berlin in the German capital, which hosted the 2006 FIFA World Cup final ...","score":0.9973061,"raw_content":null},{"title":"EURO 2024: All you need to know | UEFA EURO 2024","url":"https://www.uefa.com/euro2024/news/0257-0e13b161b2e8-4a3fd5615e0c-1000--euro-2024-all-you-need-to-know/","content":"Article top media content\\nArticle body\\nWhere will EURO 2024 be held?\\nGermany will host EURO 2024, having been chosen to stage the 17th edition of the UEFA European Championship at a UEFA Executive Committee meeting in Nyon on 27 September 2018. Host cities\\nEURO 2024 fixtures by venue\\nEURO 2024 fixtures by team\\nAlso visit\\nChange language\\nServices links and disclaimer\\n© 1998-2024 UEFA. Where and when will the final of UEFA EURO 2024 be played?\\nBerlin's Olympiastadion will stage the final on Sunday 14 July 2024.\\n The ten venues chosen to host games at the tournament include nine of the stadiums used at the 2006 World Cup plus the Düsseldorf Arena.\\n All you need to know\\nThursday, January 11, 2024\\nArticle summary\\nThree-time winners Germany will stage the UEFA European Championship in 2024.\\n","score":0.99497885,"raw_content":null}]`,
      metadata: {},
      id: undefined
    }
  ]
}
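As a final note: if you'd rather watch each node's updates arrive as they happen instead of waiting for the final state, you can stream the graph instead of invoking it. A minimal sketch, assuming a fresh thread (the sample question is arbitrary):

// Stream state updates node by node instead of awaiting the final result.
const stream = await app.stream(
  { question: "What is prompt engineering?" },
  { configurable: { thread_id: "3" }, streamMode: "updates" },
);
for await (const chunk of stream) {
  console.log(chunk);
}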