
Adaptive RAG with local LLMs

Adaptive RAG is a RAG strategy that combines (1) query analysis with (2) active/self-corrective RAG.

In the paper, they report query analysis to route across:

  • No retrieval
  • Single-shot RAG
  • Iterative RAG

Let's build on this using LangGraph.

In our implementation, we will route between:

  • Web search: for questions related to recent events
  • Self-corrective RAG: for questions related to our index

Adaptive RAG graph

Setup

First, you'll need to install some required dependencies:

npm install cheerio langchain @langchain/community @langchain/ollama @langchain/core

For the fallback web search, you'll also need to get a Tavily API key and set it as an environment variable named TAVILY_API_KEY.
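
If you prefer to set it from code instead of your shell, you can assign it to process.env before running anything else (commented out here, mirroring the tracing variables below; the placeholder value is yours to fill in):

// process.env.TAVILY_API_KEY = "<your-tavily-api-key>";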

Models

Next, choose the local models you will use.

Local embeddings

We'll use the mxbai-embed-large embedding model from Ollama.

Local LLM

(1) Download the Ollama app.

(2) Pull a Llama 3 model. You can also try a Mistral model, one of the quantized Cohere Command-R models, or any other model from the Ollama library - just make sure your machine has enough RAM.

ollama pull llama3
ollama pull mxbai-embed-large

Tracing

Optionally, use LangSmith for tracing (shown at the bottom):

// process.env.LANGCHAIN_TRACING_V2 = "true";
// process.env.LANGCHAIN_ENDPOINT = "https://api.smith.langchain.com";
// process.env.LANGCHAIN_API_KEY = "<your-api-key>"

Indexing

Now that you've chosen and set up your local models, load some source documents and build an index. The code below uses a few of Lilian Weng's blog posts about LLMs and agents as a data source, loads them into a demo MemoryVectorStore instance, and then creates a retriever from that vector store for later use.

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OllamaEmbeddings } from "@langchain/ollama";

const urls = [
  "https://lilianweng.github.io/posts/2023-06-23-agent/",
  "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
  "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
];

const docs = await Promise.all(urls.map((url) => {
  const loader = new CheerioWebBaseLoader(url);
  return loader.load();
}));

const docsList = docs.flat();

const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 250,
  chunkOverlap: 0,
});

const splitDocs = await textSplitter.splitDocuments(docsList);

const embeddings = new OllamaEmbeddings({
  model: "mxbai-embed-large",
});

// Add to vector store
const vectorStore = await MemoryVectorStore.fromDocuments(
  splitDocs,
  embeddings,
);

const retriever = vectorStore.asRetriever();
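
As an optional sanity check (not part of the original flow; the sampleDocs name is just for illustration), you can query the retriever directly to confirm the index returns relevant chunks:

const sampleDocs = await retriever.invoke("What is agent memory?");
// Each loaded document records its source URL in metadata
console.log(sampleDocs.map((doc) => doc.metadata.source));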

Creating components

Here, you'll create the components of the graph.

Question router

First, create a chain that routes incoming questions to your vector store if they are related to LLMs or agents, or to a general web search if they are not.

You'll use Ollama's JSON mode to help keep the output format consistent.

import { ChatPromptTemplate } from "@langchain/core/prompts";
import { JsonOutputParser } from "@langchain/core/output_parsers";
import { ChatOllama } from "@langchain/ollama";

const jsonModeLlm = new ChatOllama({
  model: "llama3",
  format: "json",
  temperature: 0,
});

const QUESTION_ROUTER_SYSTEM_TEMPLATE =
  `You are an expert at routing a user question to a vectorstore or web search.
Use the vectorstore for questions on LLM agents, prompt engineering, and adversarial attacks.
You do not need to be stringent with the keywords in the question related to these topics.
Otherwise, use web-search. Give a binary choice 'web_search' or 'vectorstore' based on the question.
Return a JSON with a single key 'datasource' and no preamble or explanation.`;

const questionRouterPrompt = ChatPromptTemplate.fromMessages([
  ["system", QUESTION_ROUTER_SYSTEM_TEMPLATE],
  ["human", "{question}"],
]);

const questionRouter = questionRouterPrompt.pipe(jsonModeLlm).pipe(
  new JsonOutputParser(),
);

await questionRouter.invoke({ question: "llm agent memory" });
{ datasource: 'vectorstore' }
Above, note that you invoked the router with a question related to the knowledge our vector store contains, so it responds accordingly. Here's what happens when you ask about something unrelated:

await questionRouter.invoke({ question: "red robin" });
{ datasource: 'web_search' }
In this case, you can see that execution would be routed to our web search.

Retrieval grader

Create a grader that checks the relevance of documents retrieved from our vector store:

const GRADER_TEMPLATE =
  `You are a grader assessing relevance of a retrieved document to a user question.
Here is the retrieved document:

<document>
{content}
</document>

Here is the user question:
<question>
{question}
</question>

If the document contains keywords related to the user question, grade it as relevant.
It does not need to be a stringent test. The goal is to filter out erroneous retrievals.
Give a binary 'yes' or 'no' score to indicate whether the document is relevant to the question.
Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.`;

const graderPrompt = ChatPromptTemplate.fromTemplate(GRADER_TEMPLATE);

const retrievalGrader = graderPrompt.pipe(jsonModeLlm).pipe(
  new JsonOutputParser(),
);

// Test run
const testQuestion = "agent memory";

const docs2 = await retriever.invoke(testQuestion);

await retrievalGrader.invoke({
  question: testQuestion,
  content: docs2[0].pageContent,
});
{ score: 'yes' }
You can see that it marks the first retrieved document as relevant to "agent memory".
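
It's also worth spot-checking the negative case. This isn't part of the original flow, but invoking the grader with an unrelated question should yield { score: 'no' }:

// An off-topic question - we expect the grader to reject the document
await retrievalGrader.invoke({
  question: "red robin",
  content: docs2[0].pageContent,
});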

Generation

Next, create a chain that generates an answer based on the retrieved documents.

import * as hub from "langchain/hub";
import { StringOutputParser } from "@langchain/core/output_parsers";
import type { Document } from "@langchain/core/documents";

// https://smith.langchain.com/hub/rlm/rag-prompt
const ragPrompt = await hub.pull("rlm/rag-prompt");

// Post-processing
const formatDocs = (docs: Document[]) => {
  return docs.map((doc) => doc.pageContent).join("\n\n");
};

// Initialize a new model without JSON mode active
const llm = new ChatOllama({
  model: "llama3",
  temperature: 0,
});

// Chain
const ragChain = ragPrompt.pipe(llm).pipe(new StringOutputParser());

// Test run
const testQuestion2 = "agent memory";
const docs3 = await retriever.invoke(testQuestion2);

await ragChain.invoke({ context: formatDocs(docs3), question: testQuestion2 });
Based on the provided context, it appears that an agent's memory refers to its ability to record and reflect on past experiences, using both long-term and short-term memory modules. The long-term memory module, or "memory stream," stores a comprehensive list of agents' experiences in natural language, while the reflection mechanism synthesizes these memories into higher-level inferences over time to guide future behavior.

Hallucination grader

Create a chain that reviews the generated answer and checks it for hallucinations. We'll go back to using JSON mode for this one:

const HALLUCINATION_GRADER_TEMPLATE =
  `You are a grader assessing whether an answer is grounded in / supported by a set of facts.
Here are the facts used as context to generate the answer:

<context>
{context} 
</context>

Here is the answer:

<answer>
{generation}
</answer>

Give a binary 'yes' or 'no' score to indicate whether the answer is grounded in / supported by a set of facts.
Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.`;

const hallucinationGraderPrompt = ChatPromptTemplate.fromTemplate(
  HALLUCINATION_GRADER_TEMPLATE,
);

const hallucinationGrader = hallucinationGraderPrompt.pipe(jsonModeLlm).pipe(
  new JsonOutputParser(),
);

// Test run
const generation2 = await ragChain.invoke({
  context: formatDocs(docs3),
  question: testQuestion2,
});

await hallucinationGrader.invoke({ context: formatDocs(docs3), generation: generation2 });
{ score: 'yes' }
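
To see the grader catch an ungrounded answer, you can pass it a fabricated generation. A quick sketch (not part of the original notebook); it should score 'no':

// A deliberately unsupported claim - expected output: { score: 'no' }
await hallucinationGrader.invoke({
  context: formatDocs(docs3),
  generation: "Agents store all memories in a relational database by default.",
});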

Answer grader

Create a chain that checks the relevance of the final answer:

const ANSWER_GRADER_PROMPT_TEMPLATE =
  `You are a grader assessing whether an answer is useful to resolve a question.
Here is the answer:

<answer>
{generation} 
</answer>

Here is the question:

<question>
{question}
</question>

Give a binary score 'yes' or 'no' to indicate whether the answer is useful to resolve a question.
Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.`;

const answerGraderPrompt = ChatPromptTemplate.fromTemplate(
  ANSWER_GRADER_PROMPT_TEMPLATE,
);

const answerGrader = answerGraderPrompt.pipe(jsonModeLlm).pipe(
  new JsonOutputParser(),
);

// Test run
const generation3 = await ragChain.invoke({
  context: formatDocs(docs3),
  question: testQuestion2,
});

await answerGrader.invoke({ question: testQuestion2, generation: generation3 });
{ score: 'yes' }
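
Conversely, an answer that dodges the question should be graded as not useful. Another quick sketch:

// A non-answer - expected output: { score: 'no' }
await answerGrader.invoke({
  question: testQuestion2,
  generation: "I'm sorry, I cannot help with that.",
});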

Question rewriter

Create a question rewriter. This chain performs query analysis on user questions and optimizes them for RAG, helping with difficult queries.

const REWRITER_PROMPT_TEMPLATE =
  `You are a question re-writer that converts an input question to a better version that is optimized
for vectorstore retrieval. Look at the initial question and formulate an improved question.

Here is the initial question:

<question>
{question}
</question>

Respond only with an improved question. Do not include any preamble or explanation.`;

const rewriterPrompt = ChatPromptTemplate.fromTemplate(
  REWRITER_PROMPT_TEMPLATE,
);

const rewriter = rewriterPrompt.pipe(llm).pipe(new StringOutputParser());

// Test run

// Test question is "agent memory"
await rewriter.invoke({ question: testQuestion2 });
What are memories stored in by agents?

Web search tool

Finally, you'll need a web search tool that can handle questions outside the scope of the indexed documents. The code below initializes a Tavily-powered search tool:

import { TavilySearchResults } from "@langchain/community/tools/tavily_search";

const webSearchTool = new TavilySearchResults({ maxResults: 3 });

await webSearchTool.invoke("red robin");
[{"title":"Family Friendly Burger Restaurant | Red Robin","url":"https://www.redrobin.com/","content":"Red Robin is donating 10¢ to Make-A-Wish ® for every Kids Meal purchased. You can contribute to life-changing wishes by simply purchasing a Kids Meal at Red Robin for Dine-in or To-Go. Join us for a memorable meal or order online and help transform lives, one wish at a time.","score":0.998043,"raw_content":null},{"title":"Red Robin United States of America Directory","url":"https://locations.redrobin.com/locations-list/us/","content":"Maps, Driving Directions and Local Restaurant Information for Red Robin Restaurants in United States","score":0.99786776,"raw_content":null},{"title":"Red Robin Restaurant Locations","url":"https://locations.redrobin.com/","content":"Maps, Driving Directions and Local Restaurant Information for Red Robin","score":0.99718815,"raw_content":null}]

Now that you've created all the necessary components, it's time to capture the flow as a graph.

Graph state

Define the graph state like this. The question and generation channels are plain strings whose values are simply overwritten by each update, while documents uses a reducer so that documents returned from a node replace the current list:

import type { Document } from "@langchain/core/documents";
import { Annotation } from "@langchain/langgraph";

// This defines the agent state.
// Returned documents from a node will override the current
// "documents" value in the state object.
const GraphState = Annotation.Root({
  question: Annotation<string>,
  generation: Annotation<string>,
  documents: Annotation<Document[]>({
    reducer: (_, y) => y,
    default: () => [],
  }),
});

Preparing nodes and edges

Let's wrap our components in functions that match the interface LangGraph requires. These functions will handle formatting inputs and outputs.

We'll use some components in nodes and others to define conditional edges. Both take the graph state as a parameter. Nodes return the state properties to update, while conditional edges return the name of the next node to execute.

import { Document } from "@langchain/core/documents";

/* ---Nodes--- */

// Retrieve documents for a question
const retrieve = async (state: typeof GraphState.State): Promise<Partial<typeof GraphState.State>> => {
  console.log("---RETRIEVE---");
  const documents = await retriever.invoke(state.question);
  // Add sources to the state
  return { documents };
};

// RAG generation
const generate = async (state: typeof GraphState.State): Promise<Partial<typeof GraphState.State>> => {
  console.log("---GENERATE---");
  const generation = await ragChain.invoke({
    context: formatDocs(state.documents),
    question: state.question,
  });
  // Add generation to the state
  return { generation };
};

// Determines whether the retrieved documents are relevant to the question.
const gradeDocuments = async (state: typeof GraphState.State): Promise<Partial<typeof GraphState.State>> => {
  console.log("---CHECK DOCUMENT RELEVANCE TO QUESTION---");
  // Score each doc
  const relevantDocs: Document[] = [];
  for (const doc of state.documents) {
    const grade: { score: string } = await retrievalGrader.invoke({
      question: state.question,
      content: doc.pageContent,
    });
    if (grade.score === "yes") {
      console.log("---GRADE: DOCUMENT RELEVANT---");
      relevantDocs.push(doc);
    } else {
      console.log("---GRADE: DOCUMENT NOT RELEVANT---");
    }
  }
  return { documents: relevantDocs };
};

// Re-write question
const transformQuery = async (state: typeof GraphState.State): Promise<Partial<typeof GraphState.State>> => {
  console.log("---TRANSFORM QUERY---");
  const betterQuestion = await rewriter.invoke({ question: state.question });
  return { question: betterQuestion };
};

// Web search based on the re-phrased question
const webSearch = async (state: typeof GraphState.State): Promise<Partial<typeof GraphState.State>> => {
  console.log("---WEB SEARCH---");
  const stringifiedSearchResults = await webSearchTool.invoke(state.question);
  return {
    documents: [new Document({ pageContent: stringifiedSearchResults })],
  };
};

/* ---Edges--- */

// Decide on the datasource to route the initial question to.
const routeQuestion = async (state: typeof GraphState.State) => {
  const source: { datasource: string } = await questionRouter.invoke({
    question: state.question,
  });
  if (source.datasource === "web_search") {
    console.log(`---ROUTING QUESTION "${state.question} TO WEB SEARCH---`);
    return "web_search";
  } else {
    console.log(`---ROUTING QUESTION "${state.question} TO RAG---`);
    return "retrieve";
  }
};

// Decide whether the current documents are sufficiently relevant
// to come up with a good answer.
const decideToGenerate = async (state: typeof GraphState.State) => {
  const filteredDocuments = state.documents;
  // All documents have been filtered as irrelevant
  // Regenerate a new query and try again
  if (filteredDocuments.length === 0) {
    console.log(
      "---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---",
    );
    return "transform_query";
  } else {
    // We have relevant documents, so generate answer.
    console.log("---DECISION: GENERATE---");
    return "generate";
  }
};

// Determines whether the generation is grounded in the document and answers question.
const gradeGenerationDocumentsAndQuestion = async (
  state: typeof GraphState.State,
) => {
  const hallucinationGrade: { score: string } = await hallucinationGrader
    .invoke({
      generation: state.generation,
      context: formatDocs(state.documents),
    });
  // Check for hallucination
  if (hallucinationGrade.score === "yes") {
    console.log("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---");
    // Check question answering
    console.log("---GRADING GENERATION vs. QUESTION---");
    const onTopicGrade: { score: string } = await answerGrader.invoke({
      question: state.question,
      generation: state.generation,
    });
    if (onTopicGrade.score === "yes") {
      console.log("---DECISION: GENERATION ADDRESSES QUESTION---");
      return "useful";
    } else {
      console.log("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---");
      return "not_useful";
    }
  } else {
    console.log(
      "---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RETRY---",
    );
    return "not_supported";
  }
};

Building the graph

Now we build the graph. For fun, let's add a checkpointer and have the compiled graph pause before performing a web search. This will simulate asking for permission.

import { END, MemorySaver, START, StateGraph } from "@langchain/langgraph";

const graph = new StateGraph(GraphState)
  .addNode("web_search", webSearch)
  .addNode("retrieve", retrieve)
  .addNode("grade_documents", gradeDocuments)
  .addNode("generate", generate)
  .addNode("transform_query", transformQuery)
  .addConditionalEdges(START, routeQuestion)
  .addEdge("web_search", "generate")
  .addEdge("retrieve", "grade_documents")
  .addConditionalEdges("grade_documents", decideToGenerate)
  .addEdge("transform_query", "retrieve")
  .addConditionalEdges("generate", gradeGenerationDocumentsAndQuestion, {
    not_supported: "generate",
    useful: END,
    not_useful: "transform_query",
  });

const app = graph.compile({
  checkpointer: new MemorySaver(),
  interruptBefore: ["web_search"],
});
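
Optionally, you can render the compiled graph to double-check the wiring. A minimal sketch, assuming a recent @langchain/langgraph version where getGraph() and drawMermaid() are available:

// Print a Mermaid representation of the graph's nodes and edges
console.log(app.getGraph().drawMermaid());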

Running the graph

You're all set! Time to ask some questions. First, try asking a question related to agents:

await app.invoke(
  {
    question: "What are some features of long-term memory?",
  },
  { configurable: { thread_id: "1" } },
);
---ROUTING QUESTION "What are some features of long-term memory? TO WEB SEARCH---
{
  question: 'What are some features of long-term memory?',
  documents: []
}
Note that here the local model actually routed this question to web search rather than the vector store, and because of the interrupt we added, the graph paused before running the search and returned with no documents. Routing quality depends on your local model; when the router does choose the vector store, the graph retrieves documents, filters out irrelevant ones, and generates an answer.

If you ask about something unrelated to agents or LLMs, the graph should fall back to information gathered from the web. As noted above, the graph will pause before executing the search:

await app.invoke(
  {
    question: "Where are the 2024 Euros being held?",
  },
  { configurable: { thread_id: "2" } },
);
---ROUTING QUESTION "Where are the 2024 Euros being held? TO WEB SEARCH---
{ question: 'Where are the 2024 Euros being held?', documents: [] }
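
Before resuming, you can optionally inspect the checkpointed state to confirm which node will run next (a sketch using the checkpointer we configured above):

const snapshot = await app.getState({ configurable: { thread_id: "2" } });
// snapshot.next should list "web_search" as the pending node
console.log(snapshot.next);
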
You can see that the graph paused before running the web search. Now resume it by invoking the graph with null:

await app.invoke(null, { configurable: { thread_id: "2" } });
---WEB SEARCH---
---GENERATE---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADING GENERATION vs. QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
{
  question: 'Where are the 2024 Euros being held?',
  generation: 'The 2024 Euros are being held in Germany. The final match will take place at Olympiastadion Berlin on July 14, 2024.',
  documents: [
    Document {
      pageContent: `[{"title":"Where is Euro 2024? Country, host cities and venues","url":"https://www.radiotimes.com/tv/sport/football/euro-2024-location/","content":"Euro 2024 stadiums The Olympiastadion Berlin, the biggest stadium in Germany with a capacity of around 74,000, will host games as well as the final on Sunday, 14th July, 2024.","score":0.99743915,"raw_content":null},{"title":"UEFA EURO 2024 venues - complete list: When and where will the opening ...","url":"https://olympics.com/en/news/uefa-euro-2024-venues-complete-list-when-where-final-opening-game","content":"UEFA EURO 2024 will be held in Germany across June and July, with 10 host cities staging the major football tournament.. It all begins in Munich on June 14, when hosts Germany take on Scotland in the tournament's opening game at Bayern Munich's stadium.. The final takes place a month later on July 14 at Olympiastadion Berlin in the German capital, which hosted the 2006 FIFA World Cup final ...","score":0.9973061,"raw_content":null},{"title":"EURO 2024: All you need to know | UEFA EURO 2024","url":"https://www.uefa.com/euro2024/news/0257-0e13b161b2e8-4a3fd5615e0c-1000--euro-2024-all-you-need-to-know/","content":"Article top media content\\nArticle body\\nWhere will EURO 2024 be held?\\nGermany will host EURO 2024, having been chosen to stage the 17th edition of the UEFA European Championship at a UEFA Executive Committee meeting in Nyon on 27 September 2018. Host cities\\nEURO 2024 fixtures by venue\\nEURO 2024 fixtures by team\\nAlso visit\\nChange language\\nServices links and disclaimer\\n© 1998-2024 UEFA. Where and when will the final of UEFA EURO 2024 be played?\\nBerlin's Olympiastadion will stage the final on Sunday 14 July 2024.\\n The ten venues chosen to host games at the tournament include nine of the stadiums used at the 2006 World Cup plus the Düsseldorf Arena.\\n All you need to know\\nThursday, January 11, 2024\\nArticle summary\\nThree-time winners Germany will stage the UEFA European Championship in 2024.\\n","score":0.99497885,"raw_content":null}]`,
      metadata: {},
      id: undefined
    }
  ]
}