
Adaptive RAG with local LLMs

Adaptive RAG is a RAG strategy that combines (1) query analysis with (2) active/self-corrective RAG.

In the paper, the authors report using query analysis to route queries across:

  • No retrieval
  • Single-shot RAG
  • Iterative RAG

Let's build on these ideas using LangGraph.

In our implementation, we will route between:

  • Web search: for questions related to recent events
  • Self-corrective RAG: for questions related to our index

Adaptive RAG graph

Setup

First, you'll need to install a few required dependencies:

npm install cheerio langchain @langchain/community @langchain/ollama @langchain/core

For the fallback web search, you'll also need to sign up for a Tavily API key and set it as an environment variable named TAVILY_API_KEY.
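If you prefer to set it from code rather than your shell, you can assign it at runtime, mirroring the tracing setup shown below (the value here is a placeholder):

// Replace the placeholder with your real Tavily API key.
process.env.TAVILY_API_KEY = "<your-api-key>";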

Models

Next, select the local models you'll use.

Local embeddings

We'll use the mxbai-embed-large embedding model from Ollama.

Local LLM

(1) Download the Ollama app.

(2) Pull a Llama 3 model. You can also try a Mistral model, one of the quantized Cohere Command-R models, or any other model from the Ollama library that you'd like to experiment with, just make sure your computer has enough RAM.

ollama pull llama3
ollama pull mxbai-embed-large

Tracing

Optionally, use LangSmith for tracing (shown below):

// process.env.LANGCHAIN_TRACING_V2 = "true";
// process.env.LANGCHAIN_ENDPOINT = "https://api.smith.langchain.com";
// process.env.LANGCHAIN_API_KEY = "<your-api-key>"

Indexing

Now that you've chosen and set up your local models, load and index some source documents. The code below uses a few of Lilian Weng's blog posts about LLMs and agents as a data source, loads them into a MemoryVectorStore instance for demo purposes, and then creates a retriever from that vector store for later use.

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OllamaEmbeddings } from "@langchain/ollama";

const urls = [
  "https://lilianweng.github.io/posts/2023-06-23-agent/",
  "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
  "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
];

const docs = await Promise.all(urls.map((url) => {
  const loader = new CheerioWebBaseLoader(url);
  return loader.load();
}));

const docsList = docs.flat();

const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 250,
  chunkOverlap: 0,
});

const splitDocs = await textSplitter.splitDocuments(docsList);

const embeddings = new OllamaEmbeddings({
  model: "mxbai-embed-large",
});

// Add to vector store
const vectorStore = await MemoryVectorStore.fromDocuments(
  splitDocs,
  embeddings,
);

const retriever = vectorStore.asRetriever();
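As an optional sanity check, assuming Ollama is running locally with the models pulled above, you can query the retriever directly (the sample query is arbitrary):

// Optional sanity check: fetch indexed chunks for a sample query.
const sampleDocs = await retriever.invoke("What is task decomposition?");
console.log(sampleDocs.length);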

Creating components

Here, you'll create the components of your graph.

Question router

First, create a chain that routes incoming questions to your vector store if they are related to LLMs or agents, and to general web search if they are not.

You'll use Ollama's JSON mode to help keep the output formatting consistent.

import { ChatPromptTemplate } from "@langchain/core/prompts";
import { JsonOutputParser } from "@langchain/core/output_parsers";
import { ChatOllama } from "@langchain/ollama";

const jsonModeLlm = new ChatOllama({
  model: "llama3",
  format: "json",
  temperature: 0,
});

const QUESTION_ROUTER_SYSTEM_TEMPLATE =
  `You are an expert at routing a user question to a vectorstore or web search.
Use the vectorstore for questions on LLM agents, prompt engineering, and adversarial attacks.
You do not need to be stringent with the keywords in the question related to these topics.
Otherwise, use web-search. Give a binary choice 'web_search' or 'vectorstore' based on the question.
Return a JSON with a single key 'datasource' and no preamble or explanation.`;

const questionRouterPrompt = ChatPromptTemplate.fromMessages([
  ["system", QUESTION_ROUTER_SYSTEM_TEMPLATE],
  ["human", "{question}"],
]);

const questionRouter = questionRouterPrompt.pipe(jsonModeLlm).pipe(
  new JsonOutputParser(),
);

await questionRouter.invoke({ question: "llm agent memory" });
{ datasource: 'vectorstore' }
Above, note that you invoked the router with a query related to the knowledge in your vector store, and it responded accordingly. Let's see what happens when you ask an unrelated question:

await questionRouter.invoke({ question: "red robin" });
{ datasource: 'web_search' }
In this case, you can see that execution would be routed to web search instead.

Retrieval grader

Next, create a grader that checks the relevance of documents retrieved from your vector store against the input question:

const GRADER_TEMPLATE =
  `You are a grader assessing relevance of a retrieved document to a user question.
Here is the retrieved document:

<document>
{content}
</document>

Here is the user question:
<question>
{question}
</question>

If the document contains keywords related to the user question, grade it as relevant.
It does not need to be a stringent test. The goal is to filter out erroneous retrievals.
Give a binary 'yes' or 'no' score to indicate whether the document is relevant to the question.
Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.`;

const graderPrompt = ChatPromptTemplate.fromTemplate(GRADER_TEMPLATE);

const retrievalGrader = graderPrompt.pipe(jsonModeLlm).pipe(
  new JsonOutputParser(),
);

// Test run
const testQuestion = "agent memory";

const docs2 = await retriever.invoke(testQuestion);

await retrievalGrader.invoke({
  question: testQuestion,
  content: docs2[0].pageContent,
});
{ score: 'yes' }
You can see that the grader marks the first retrieved document as relevant to "agent memory".
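As a counterpoint, you can grade the same document against an unrelated question. A well-behaved grader should return { score: 'no' } here, though outputs from small local models can vary:

// The question is deliberately unrelated to the retrieved document.
await retrievalGrader.invoke({
  question: "red robin",
  content: docs2[0].pageContent,
});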

Generation

Next, create a chain that generates an answer based on the retrieved documents:

import * as hub from "langchain/hub";
import { StringOutputParser } from "@langchain/core/output_parsers";
import type { Document } from "@langchain/core/documents";

// https://smith.langchain.com/hub/rlm/rag-prompt
const ragPrompt = await hub.pull("rlm/rag-prompt");

// Post-processing
const formatDocs = (docs: Document[]) => {
  return docs.map((doc) => doc.pageContent).join("\n\n");
};

// Initialize a new model without JSON mode active
const llm = new ChatOllama({
  model: "llama3",
  temperature: 0,
});

// Chain
const ragChain = ragPrompt.pipe(llm).pipe(new StringOutputParser());

// Test run
const testQuestion2 = "agent memory";
const docs3 = await retriever.invoke(testQuestion2);

await ragChain.invoke({ context: formatDocs(docs3), question: testQuestion2 });
Based on the provided context, it appears that an agent's memory refers to its ability to record and reflect on past experiences, using both long-term and short-term memory modules. The long-term memory module, or "memory stream," stores a comprehensive list of agents' experiences in natural language, while the reflection mechanism synthesizes these memories into higher-level inferences over time to guide future behavior.

Hallucination grader

Now, create a chain that reviews the generated answer and checks for hallucinations. We'll use JSON mode again here:

const HALLUCINATION_GRADER_TEMPLATE =
  `You are a grader assessing whether an answer is grounded in / supported by a set of facts.
Here are the facts used as context to generate the answer:

<context>
{context} 
</context>

Here is the answer:

<answer>
{generation}
</answer>

Give a binary 'yes' or 'no' score to indicate whether the answer is grounded in / supported by the set of facts.
Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.`;

const hallucinationGraderPrompt = ChatPromptTemplate.fromTemplate(
  HALLUCINATION_GRADER_TEMPLATE,
);

const hallucinationGrader = hallucinationGraderPrompt.pipe(jsonModeLlm).pipe(
  new JsonOutputParser(),
);

// Test run
const generation2 = await ragChain.invoke({
  context: formatDocs(docs3),
  question: testQuestion2,
});

await hallucinationGrader.invoke({ context: formatDocs(docs3), generation: generation2 });
{ score: 'yes' }
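You can also sanity-check this grader with a claim the context does not support. This is a sketch with a made-up answer; a grounded grader should return { score: 'no' }, though local model behavior can vary:

// An unsupported, fabricated claim about the retrieved context.
await hallucinationGrader.invoke({
  context: formatDocs(docs3),
  generation: "Agents store their memories in a relational SQL database.",
});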

Answer grader

Then, create a chain that checks whether the final answer is useful for resolving the original question:

const ANSWER_GRADER_PROMPT_TEMPLATE =
  `You are a grader assessing whether an answer is useful to resolve a question.
Here is the answer:

<answer>
{generation} 
</answer>

Here is the question:

<question>
{question}
</question>

Give a binary score 'yes' or 'no' to indicate whether the answer is useful to resolve the question.
Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.`;

const answerGraderPrompt = ChatPromptTemplate.fromTemplate(
  ANSWER_GRADER_PROMPT_TEMPLATE,
);

const answerGrader = answerGraderPrompt.pipe(jsonModeLlm).pipe(
  new JsonOutputParser(),
);

// Test run
const generation3 = await ragChain.invoke({
  context: formatDocs(docs3),
  question: testQuestion2,
});

await answerGrader.invoke({ question: testQuestion2, generation: generation3 });
{ score: 'yes' }

Question rewriter

Now, create a question rewriter. This chain performs query analysis on the user's question, optimizing it for retrieval to help handle difficult queries:

const REWRITER_PROMPT_TEMPLATE =
  `You are a question re-writer that converts an input question to a better version that is optimized
for vectorstore retrieval. Look at the initial question and formulate an improved question.

Here is the initial question:

<question>
{question}
</question>

Respond only with an improved question. Do not include any preamble or explanation.`;

const rewriterPrompt = ChatPromptTemplate.fromTemplate(
  REWRITER_PROMPT_TEMPLATE,
);

const rewriter = rewriterPrompt.pipe(llm).pipe(new StringOutputParser());

// Test run

// Test question is "agent memory"
await rewriter.invoke({ question: testQuestion2 });
What are memories stored in by agents?

Web search tool

Finally, you'll need a web search tool for handling questions outside the scope of the indexed documents. The code below initializes a search tool powered by Tavily:

import { TavilySearchResults } from "@langchain/community/tools/tavily_search";

const webSearchTool = new TavilySearchResults({ maxResults: 3 });

await webSearchTool.invoke("red robin");
[{"title":"Family Friendly Burger Restaurant | Red Robin","url":"https://www.redrobin.com/","content":"Red Robin is donating 10¢ to Make-A-Wish ® for every Kids Meal purchased. You can contribute to life-changing wishes by simply purchasing a Kids Meal at Red Robin for Dine-in or To-Go. Join us for a memorable meal or order online and help transform lives, one wish at a time.","score":0.998043,"raw_content":null},{"title":"Red Robin United States of America Directory","url":"https://locations.redrobin.com/locations-list/us/","content":"Maps, Driving Directions and Local Restaurant Information for Red Robin Restaurants in United States","score":0.99786776,"raw_content":null},{"title":"Red Robin Restaurant Locations","url":"https://locations.redrobin.com/","content":"Maps, Driving Directions and Local Restaurant Information for Red Robin","score":0.99718815,"raw_content":null}]

Now that you've created all the required components, it's time to capture the flow as a graph.

Graph state

Define the graph state like this. Since question and generation are simple strings, we can declare them as bare Annotation types, shorthand for the default behavior of overwriting with the most recent value.

import type { Document } from "@langchain/core/documents";
import { Annotation } from "@langchain/langgraph";

// This defines the agent state.
// Returned documents from a node will override the current
// "documents" value in the state object.
const GraphState = Annotation.Root({
  question: Annotation<string>,
  generation: Annotation<string>,
  documents: Annotation<Document[]>({
    reducer: (_, y) => y,
    default: () => [],
  }),
});

Preparing nodes and edges

Let's wrap the components in functions that match the interface LangGraph requires. These functions will handle formatting of inputs and outputs.

Some components will be used inside nodes, while others will define conditional edges. Each takes the graph state as a parameter. Nodes return the state properties to update, while conditional edges return the name of the next node to execute.

import { Document } from "@langchain/core/documents";

/* ---Nodes--- */

// Retrieve documents for a question
const retrieve = async (state: typeof GraphState.State): Promise<Partial<typeof GraphState.State>> => {
  console.log("---RETRIEVE---");
  const documents = await retriever.invoke(state.question);
  // Add sources to the state
  return { documents };
};

// RAG generation
const generate = async (state: typeof GraphState.State): Promise<Partial<typeof GraphState.State>> => {
  console.log("---GENERATE---");
  const generation = await ragChain.invoke({
    context: formatDocs(state.documents),
    question: state.question,
  });
  // Add generation to the state
  return { generation };
};

// Determines whether the retrieved documents are relevant to the question.
const gradeDocuments = async (state: typeof GraphState.State): Promise<Partial<typeof GraphState.State>> => {
  console.log("---CHECK DOCUMENT RELEVANCE TO QUESTION---");
  // Score each doc
  const relevantDocs: Document[] = [];
  for (const doc of state.documents) {
    const grade: { score: string } = await retrievalGrader.invoke({
      question: state.question,
      content: doc.pageContent,
    });
    if (grade.score === "yes") {
      console.log("---GRADE: DOCUMENT RELEVANT---");
      relevantDocs.push(doc);
    } else {
      console.log("---GRADE: DOCUMENT NOT RELEVANT---");
    }
  }
  return { documents: relevantDocs };
};

// Re-write question
const transformQuery = async (state: typeof GraphState.State): Promise<Partial<typeof GraphState.State>> => {
  console.log("---TRANSFORM QUERY---");
  const betterQuestion = await rewriter.invoke({ question: state.question });
  return { question: betterQuestion };
};

// Web search based on the re-phrased question
const webSearch = async (state: typeof GraphState.State): Promise<Partial<typeof GraphState.State>> => {
  console.log("---WEB SEARCH---");
  const stringifiedSearchResults = await webSearchTool.invoke(state.question);
  return {
    documents: [new Document({ pageContent: stringifiedSearchResults })],
  };
};

/* ---Edges--- */

// Decide on the datasource to route the initial question to.
const routeQuestion = async (state: typeof GraphState.State) => {
  const source: { datasource: string } = await questionRouter.invoke({
    question: state.question,
  });
  if (source.datasource === "web_search") {
    console.log(`---ROUTING QUESTION "${state.question}" TO WEB SEARCH---`);
    return "web_search";
  } else {
    console.log(`---ROUTING QUESTION "${state.question}" TO RAG---`);
    return "retrieve";
  }
};

// Decide whether the current documents are sufficiently relevant
// to come up with a good answer.
const decideToGenerate = async (state: typeof GraphState.State) => {
  const filteredDocuments = state.documents;
  // All documents have been filtered as irrelevant
  // Regenerate a new query and try again
  if (filteredDocuments.length === 0) {
    console.log(
      "---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---",
    );
    return "transform_query";
  } else {
    // We have relevant documents, so generate answer.
    console.log("---DECISION: GENERATE---");
    return "generate";
  }
};

// Determines whether the generation is grounded in the document and answers question.
const gradeGenerationDocumentsAndQuestion = async (
  state: typeof GraphState.State,
) => {
  const hallucinationGrade: { score: string } = await hallucinationGrader
    .invoke({
      generation: state.generation,
      context: formatDocs(state.documents),
    });
  // Check for hallucination
  if (hallucinationGrade.score === "yes") {
    console.log("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---");
    // Check question answering
    console.log("---GRADING GENERATION vs. QUESTION---");
    const onTopicGrade: { score: string } = await answerGrader.invoke({
      question: state.question,
      generation: state.generation,
    });
    if (onTopicGrade.score === "yes") {
      console.log("---DECISION: GENERATION ADDRESSES QUESTION---");
      return "useful";
    } else {
      console.log("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---");
      return "not_useful";
    }
  } else {
    console.log(
      "---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RETRY---",
    );
    return "not_supported";
  }
};

Building the graph

Now, let's build the graph. For fun, let's also add a checkpointer and have the compiled graph pause before performing a web search. This will simulate asking for permission before taking action.

import { END, MemorySaver, START, StateGraph } from "@langchain/langgraph";

const graph = new StateGraph(GraphState)
  .addNode("web_search", webSearch)
  .addNode("retrieve", retrieve)
  .addNode("grade_documents", gradeDocuments)
  .addNode("generate", generate)
  .addNode("transform_query", transformQuery)
  .addConditionalEdges(START, routeQuestion)
  .addEdge("web_search", "generate")
  .addEdge("retrieve", "grade_documents")
  .addConditionalEdges("grade_documents", decideToGenerate)
  .addEdge("transform_query", "retrieve")
  .addConditionalEdges("generate", gradeGenerationDocumentsAndQuestion, {
    not_supported: "generate",
    useful: END,
    not_useful: "transform_query",
  });

const app = graph.compile({
  checkpointer: new MemorySaver(),
  interruptBefore: ["web_search"],
});
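Before running anything, you can optionally verify the wiring by rendering the compiled graph as a Mermaid diagram. A minimal sketch using the graph-drawing helpers available on compiled LangGraph graphs:

// Print a Mermaid representation of the graph's nodes and edges.
console.log(app.getGraph().drawMermaid());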

Running the graph

You're all set! It's time to ask some questions. First, try a question related to agents:

await app.invoke(
  {
    question: "What are some features of long-term memory?",
  },
  { configurable: { thread_id: "1" } },
);
---ROUTING QUESTION "What are some features of long-term memory? TO WEB SEARCH---
{
  question: 'What are some features of long-term memory?',
  documents: []
}
You can see that your graph correctly routed the query to the vector store and answered the question, filtering out some unnecessary documents along the way.

If you ask a question unrelated to agents or LLMs, the graph should fall back to information gathered from the web. As noted above, the graph will pause before performing the search.

await app.invoke(
  {
    question: "Where are the 2024 Euros being held?",
  },
  { configurable: { thread_id: "2" } },
);
---ROUTING QUESTION "Where are the 2024 Euros being held? TO WEB SEARCH---
{ question: 'Where are the 2024 Euros being held?', documents: [] }
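The empty documents array shows that execution stopped before the web search node ran. Because a checkpointer is attached, the paused state is saved, and you can inspect it to confirm which node will execute next. A minimal sketch, reusing the same thread_id:

// Inspect the saved state for this thread; `next` lists the pending nodes.
const pausedState = await app.getState({ configurable: { thread_id: "2" } });
console.log(pausedState.next); // expected: ["web_search"]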
You can see that the graph paused before running the web search. Now, resume it by invoking the graph with null as the input:

await app.invoke(null, { configurable: { thread_id: "2" } });
---WEB SEARCH---
---GENERATE---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADING GENERATION vs. QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
{
  question: 'Where are the 2024 Euros being held?',
  generation: 'The 2024 Euros are being held in Germany. The final match will take place at Olympiastadion Berlin on July 14, 2024.',
  documents: [
    Document {
      pageContent: `[{"title":"Where is Euro 2024? Country, host cities and venues","url":"https://www.radiotimes.com/tv/sport/football/euro-2024-location/","content":"Euro 2024 stadiums The Olympiastadion Berlin, the biggest stadium in Germany with a capacity of around 74,000, will host games as well as the final on Sunday, 14th July, 2024.","score":0.99743915,"raw_content":null},{"title":"UEFA EURO 2024 venues - complete list: When and where will the opening ...","url":"https://olympics.com/en/news/uefa-euro-2024-venues-complete-list-when-where-final-opening-game","content":"UEFA EURO 2024 will be held in Germany across June and July, with 10 host cities staging the major football tournament.. It all begins in Munich on June 14, when hosts Germany take on Scotland in the tournament's opening game at Bayern Munich's stadium.. The final takes place a month later on July 14 at Olympiastadion Berlin in the German capital, which hosted the 2006 FIFA World Cup final ...","score":0.9973061,"raw_content":null},{"title":"EURO 2024: All you need to know | UEFA EURO 2024","url":"https://www.uefa.com/euro2024/news/0257-0e13b161b2e8-4a3fd5615e0c-1000--euro-2024-all-you-need-to-know/","content":"Article top media content\\nArticle body\\nWhere will EURO 2024 be held?\\nGermany will host EURO 2024, having been chosen to stage the 17th edition of the UEFA European Championship at a UEFA Executive Committee meeting in Nyon on 27 September 2018. Host cities\\nEURO 2024 fixtures by venue\\nEURO 2024 fixtures by team\\nAlso visit\\nChange language\\nServices links and disclaimer\\n© 1998-2024 UEFA. Where and when will the final of UEFA EURO 2024 be played?\\nBerlin's Olympiastadion will stage the final on Sunday 14 July 2024.\\n The ten venues chosen to host games at the tournament include nine of the stadiums used at the 2006 World Cup plus the Düsseldorf Arena.\\n All you need to know\\nThursday, January 11, 2024\\nArticle summary\\nThree-time winners Germany will stage the UEFA European Championship in 2024.\\n","score":0.99497885,"raw_content":null}]`,
      metadata: {},
      id: undefined
    }
  ]
}
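As a final note: if you'd rather watch each node's updates arrive as they happen instead of waiting for the final state, you can stream the graph instead of invoking it. A minimal sketch, assuming a fresh thread (the sample question is arbitrary):

// Stream state updates node by node instead of awaiting the final result.
const stream = await app.stream(
  { question: "What is prompt engineering?" },
  { configurable: { thread_id: "3" }, streamMode: "updates" },
);
for await (const chunk of stream) {
  console.log(chunk);
}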