In the rapidly evolving world of Large Language Models (LLMs), providing relevant context is key to getting accurate and helpful responses. While LLMs are incredibly powerful, they don’t inherently know everything about specific, niche topics or the latest information on a website. This is where Retrieval Augmented Generation (RAG) comes into play, allowing us to feed specific, up-to-date information to our LLMs.
This blog post will walk you through building a basic RAG system using LangChain.js to scrape content from a website, embed it, store it, and then use it to answer questions with a Google Gemini model.
Why RAG? The Problem of Context
Imagine asking an LLM about the specific features of a new product launched last week, or details from your company’s internal documentation. Without explicit context, the LLM might hallucinate, provide general information, or simply state it doesn’t know. RAG solves this by:
- Retrieving relevant information from a knowledge base (in our case, a scraped website).
- Augmenting the LLM’s prompt with this retrieved context.
- Generating a more informed and accurate answer.
The Tools We’ll Use
Our system leverages several powerful components from the LangChain.js ecosystem:
- LangChain.js: A framework for developing applications powered by language models.
- ChatGoogleGenerativeAI: Our chosen LLM, specifically Google’s Gemini 2.0 Flash model.
- CheerioWebBaseLoader: A document loader that uses Cheerio to efficiently scrape HTML content from web pages.
- RecursiveCharacterTextSplitter: Breaks down large documents into smaller, manageable chunks.
- GoogleGenerativeAIEmbeddings: Converts text chunks into numerical vector representations (embeddings).
- MemoryVectorStore: An in-memory database to store and search our text embeddings.
- createStuffDocumentsChain & createRetrievalChain: LangChain utilities to orchestrate the flow of retrieving documents and combining them with our LLM prompt.
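Before diving in, install the packages used throughout this post (exact versions evolve quickly, so treat this as a starting point rather than a pinned setup):
npm install langchain @langchain/core @langchain/community @langchain/google-genai cheerio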
Step-by-Step Implementation
Let’s break the process down step by step, following the JavaScript code for each stage:
1. Initialize the Language Model
First, we set up our LLM. We’re using gemini-2.0-flash for its speed and cost-effectiveness, with a temperature of 0 for more deterministic (less creative) answers, which is often preferred for factual Q&A.
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
const model = new ChatGoogleGenerativeAI({
model: "gemini-2.0-flash",
temperature: 0
});
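By default, ChatGoogleGenerativeAI reads your API key from the GOOGLE_API_KEY environment variable (you can also pass apiKey explicitly). A quick sanity check before building the full pipeline might look like this:
// Optional: confirm the model and API key work before wiring up the chain
const test = await model.invoke("Say hello in one short sentence.");
console.log(test.content);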
2. Define the Prompt Template
A prompt template structures how the LLM will receive the retrieved context and the user’s question. This ensures the LLM understands its role: to answer based only on the provided content.
import { ChatPromptTemplate } from "@langchain/core/prompts";
const prompt = ChatPromptTemplate.fromTemplate(
"Answer users question based on the provided content.\n\n" +
"Content: {context}\n\n" +
"Question: {input}"
);
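To see exactly what the model will receive, you can render the template with made-up values (the strings below are placeholders for illustration only):
// Preview the final prompt by formatting the template with hypothetical values
const rendered = await prompt.format({
  context: "Cheerio is a fast, lean implementation of core jQuery for the server.",
  input: "What is Cheerio?",
});
console.log(rendered);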
3. Create the Document Combination Chain
This chain takes the retrieved documents and “stuffs” them into the prompt, preparing them for the LLM.
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
const chain = await createStuffDocumentsChain({
llm: model,
prompt: prompt
});
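This chain is itself runnable, so you can test it in isolation by handing it documents directly, before any retriever exists. A minimal sketch using a hand-made Document:
import { Document } from "@langchain/core/documents";
// Exercise the combination chain with a single hypothetical document
const testAnswer = await chain.invoke({
  input: "What is Cheerio?",
  context: [new Document({ pageContent: "Cheerio parses HTML with a jQuery-like API." })],
});
console.log(testAnswer); // a plain string answer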
4. Load Data from a Website
Here’s where the web scraping happens! CheerioWebBaseLoader fetches the content from a specified URL.
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
const loader = new CheerioWebBaseLoader("https://js.langchain.com/docs/integrations/document_loaders/web_loaders/web_cheerio/");
const docs = await loader.load();
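load() returns an array of Document objects, which is why it’s handed straight to the splitter in the next step. A quick inspection shows what was scraped:
// One Document per page; pageContent holds the extracted text
console.log(docs.length);
console.log(docs[0].pageContent.slice(0, 200)); // preview the first 200 characters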
5. Split the Document into Chunks
Large documents need to be broken down. The RecursiveCharacterTextSplitter intelligently splits text, ensuring chunks are of a manageable size for embedding and that some overlap exists to maintain context across splits.
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
const splitter = new RecursiveCharacterTextSplitter({
chunkSize: 200, // Max characters per chunk
chunkOverlap: 20 // Overlap to prevent loss of context
});
const splitDoc = await splitter.splitDocuments(docs);
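Each chunk is itself a Document that keeps the original page’s metadata, so you can always trace a chunk back to its source:
console.log(splitDoc.length);         // how many chunks were produced
console.log(splitDoc[0].pageContent); // first chunk, at most ~200 characters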
6. Generate Embeddings
Embeddings are numerical representations of text that capture its semantic meaning. GoogleGenerativeAIEmbeddings transforms our text chunks into these vectors.
import { GoogleGenerativeAIEmbeddings } from "@langchain/google-genai";
const embedding = new GoogleGenerativeAIEmbeddings();
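An embedding is just an array of numbers. You can see one by embedding a single string yourself (this also calls the Google API, so the same key applies):
// Embed one string and inspect the resulting vector
const vector = await embedding.embedQuery("What is Cheerio?");
console.log(vector.length); // the embedding's dimensionality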
7. Create an In-Memory Vector Store
The MemoryVectorStore is where our embedded text chunks live. It allows us to quickly search for chunks that are semantically similar to a given query.
import { MemoryVectorStore } from "langchain/vectorstores/memory";
const vectorStore = await MemoryVectorStore.fromDocuments(splitDoc, embedding);
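fromDocuments embeds every chunk and indexes it in one step. You can also query the store directly; the retriever we build next is essentially a thin wrapper around this kind of search:
// Find the 2 chunks most similar to a query string
const similar = await vectorStore.similaritySearch("HTML parsing with Cheerio", 2);
similar.forEach((d) => console.log(d.pageContent));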
8. Set Up the Retriever
The retriever’s job is to query the vectorStore and fetch the most relevant document chunks. We configure it to retrieve the top 2 (k: 2) most similar documents.
const retriever = vectorStore.asRetriever({
k: 2,
});
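Because retrievers in recent LangChain.js versions are runnable, you can check what context the LLM will actually see before wiring up the full chain:
// Returns the top-2 chunks for the query (k was set above)
const relevantDocs = await retriever.invoke("What is Cheerio?");
relevantDocs.forEach((d) => console.log(d.pageContent));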
9. Build the Retrieval Chain
This is the final orchestration step. createRetrievalChain combines our document combination chain with the retriever, creating an end-to-end pipeline for answering questions based on retrieved context.
import { createRetrievalChain } from "langchain/chains/retrieval";
const retrieverChain = await createRetrievalChain({
combineDocsChain: chain,
retriever: retriever
});
10. Invoke the Chain with a Question
Finally, we can ask our system a question! The retrieverChain will handle finding the relevant information and passing it to the LLM to generate an answer.
const response = await retrieverChain.invoke({
input: "What is Cheerio?",
});
console.log(response);
When you run this code, the response object will contain the LLM’s answer, which is derived from the content scraped from the LangChain.js documentation about Cheerio.
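More precisely, createRetrievalChain returns an object bundling the original input, the retrieved context documents, and the generated answer, so you’ll usually log the answer field on its own:
console.log(response.answer);  // the generated answer text
console.log(response.context); // the Document chunks that were retrieved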
Conclusion
This example demonstrates the power of LangChain.js in building sophisticated LLM applications. By combining web scraping with vector embeddings and retrieval, you can create highly context-aware Q&A systems capable of answering questions based on specific, up-to-date information from virtually any website. This opens up a world of possibilities for custom chatbots, knowledge assistants, and more!