I have been working on GenAI applications for the past few months and decided to build something using RAG. I could not think of an interesting data source, so I decided to use my own blog content as the data for my RAG chatbot. Please note that the following content is almost the same as my Medium blog - https://medium.com/@varunarora1408/building-rag-chatbot-on-my-personal-blog-ff89ac772c69
Here is the rough idea I had for this application -
Before we can do any RAG, we need to get the data into the vector database. For that, we need to scrape the post links from the blog, process their HTML, and then load the processed chunks into the vector database.
Scraping
- Scrape the blog website to get all the post URLs
- Filter out duplicate URLs and links not related to posts
- Store these in a pkl file to use later for vectorization
Code -
import requests
from bs4 import BeautifulSoup
import re
import pickle
from dotenv import load_dotenv

load_dotenv()


def getBlogLinks(url='https://varunarora14.github.io/'):
    source_code = requests.get(url)
    soup = BeautifulSoup(source_code.content, 'lxml')
    links = []

    def extractUrls(candidates):
        # keep only hrefs that are absolute http(s) URLs
        for item in candidates:
            match = re.search(r"(?P<url>https?://[^\s]+)", item)
            if match is not None:
                links.append(match.group("url"))

    # collect hrefs from the landing page
    extractUrls([str(a.get('href')) for a in soup.find_all('a', href=True)])

    # crawl the discovered links as well and collect their hrefs,
    # capping the total count so we don't wander off into external (e.g. Google) sites
    MAX_LINKS = 162
    i = 0
    while i < len(links) and len(links) <= MAX_LINKS:
        try:
            page = requests.get(links[i])
            page_soup = BeautifulSoup(page.content, 'lxml')
            extractUrls([str(a.get('href')) for a in page_soup.find_all('a', href=True)])
        except Exception as e:
            print(e)
        i += 1

    uniqueLinks = list(set(links))
    # extract only posts
    blogLinks = [link for link in uniqueLinks if "varunarora14.github.io/posts/" in link]

    with open('blog_urls.pkl', 'wb') as f:
        pickle.dump(blogLinks, f)

# getBlogLinks()
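As an optional sanity check (not part of the original script), you can load the pickle back and eyeball a few of the scraped URLs:
import pickle

with open('blog_urls.pkl', 'rb') as f:
    urls = pickle.load(f)
print(len(urls), urls[:3])  # expect varunarora14.github.io/posts/... links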
Data Chunking and Vectorization
- For loading the HTML pages, I used langchain's AsyncHtmlLoader, and RecursiveCharacterTextSplitter for splitting documents into chunks. Chunking keeps each piece of context small enough to fit comfortably in the LLM's context window (see the small example after this list)
- Html2TextTransformer was used to strip HTML tags from each document chunk, and chunks that end up empty after tag removal are filtered out. The remaining chunks/docs are stored in a pkl file to load into the vector DB
- The Hugging Face embedding model all-MiniLM-L6-v2 is used to create a vector for each chunk, and the vectors are stored in a FAISS vector database (I definitely think the embedding model can be improved next time)
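Before the full pipeline, here is a minimal sketch of what the splitter does with chunk_size=1000 and chunk_overlap=200; sample_text below is just a stand-in for a real post:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
sample_text = "Kubernetes pods are the smallest deployable units in a cluster. " * 100  # stand-in for a post
chunks = splitter.split_text(sample_text)
# each chunk is at most ~1000 characters and consecutive chunks share ~200 characters,
# so nothing important gets cut off right at a chunk boundary
print(len(chunks), [len(c) for c in chunks[:3]])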
Code -
import pickle
from langchain_community.document_loaders import AsyncHtmlLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_transformers import Html2TextTransformer
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import FAISS


def storeBlogDocs():
    with open('blog_urls.pkl', 'rb') as f:
        links = pickle.load(f)

    # load every blog page and split it into overlapping chunks
    loader = AsyncHtmlLoader(links)
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    docs = loader.load_and_split(splitter)

    # strip HTML tags from each chunk
    html2text = Html2TextTransformer()
    docsTransformed = html2text.transform_documents(docs)
    print("len of transformed docs:", len(docsTransformed))

    # ignore chunks left with no useful information, i.e. empty page content after tag removal
    filteredDocs = [d for d in docsTransformed if d.page_content != '\n']
    print("len of filtered docs:", len(filteredDocs))

    with open('filtered_blog_docs.pkl', 'wb') as f:
        pickle.dump(filteredDocs, f)


def createVectorDatabase():
    # can replace with - BAAI/bge-base-en-v1.5
    embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

    with open('filtered_blog_docs.pkl', 'rb') as f:
        filteredDocs = pickle.load(f)

    vstore = FAISS.from_documents(documents=filteredDocs, embedding=embedding_function)
    # save vectordb locally
    vstore.save_local(folder_path="faissdb_1000", index_name="blog")
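As a quick optional check that the index was built correctly, you can load it back and run a similarity search against it; the query string here is just an example:
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import FAISS

embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
vstore = FAISS.load_local(folder_path="faissdb_1000", index_name="blog",
                          embeddings=embedding_function, allow_dangerous_deserialization=True)
for d in vstore.similarity_search("How do Kubernetes statefulsets work?", k=3):
    print(d.metadata["source"])  # the blog post each matching chunk came from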
RAG
Now, for performing RAG, I decided to use two advanced techniques — Query Decomposition and RAG Fusion. RAG Fusion is quite fast and gives good quality responses, while Query Decomposition gives me the best quality responses at the cost of more time, so why not implement both?
RAG Fusion
For each user question, we first validate it using a custom class called LLMResponse, a Pydantic model through which we ask the LLM to classify whether the question is related to our blog content. The isValidResponse field of the parsed response decides whether we proceed with the pipeline or return an errorMessage instead. Refer to this part again in the code.
The next step is the RAG pipeline. Here is the flow -
- Generate multiple queries from the initial user question
- Filter those queries to remove any empty lines the LLM might produce
- Do a similarity search for each generated query and combine the retrieved documents into a list of result lists
- Now perform document re-ranking using reciprocal rank fusion (RRF), where documents that appear in more of the result lists, and at higher ranks, score higher than the rest (see the toy example after this list)
- Pass these re-ranked documents as context to the LLM along with the initial user question and get the response
- Parse the response into a JSON object containing the question, the answer and sources, where sources are the blog URLs from which the answer was generated.
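Before the full code, here is a toy illustration of the reciprocal rank fusion scoring used in the re-ranking step. It is not part of the pipeline; the document ids and rankings are made up, and k=60 is the usual constant:
# two generated queries returned these hypothetical ranked lists of docs
results = [["doc_a", "doc_b", "doc_c"], ["doc_b", "doc_a", "doc_d"]]
k = 60
scores = {}
for ranking in results:
    for rank, doc in enumerate(ranking):
        # each appearance contributes 1 / (rank + k); earlier ranks contribute more
        scores[doc] = scores.get(doc, 0) + 1 / (rank + k)
print(sorted(scores.items(), key=lambda x: x[1], reverse=True))
# doc_a and doc_b appear in both lists near the top, so they outrank doc_c and doc_d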
Code -
import os
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.load import loads, dumps
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain.vectorstores import FAISS
from pydantic import BaseModel, Field, ValidationError
import instructor
import google.generativeai as genai

load_dotenv()
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# instructor client used only for validating whether the question is on-topic
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",  # model defaults to "gemini-pro"
    ),
    mode=instructor.Mode.GEMINI_JSON,
)


class LLMResponse(BaseModel):
    question: str
    isValidResponse: bool = Field(description="True if the question is related to Kubernetes, AWS, linux, docker, networking or related topics, False otherwise")


def initGeminiLLM():
    llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash-latest", temperature=0.5, max_retries=3)
    return llm
def loadVectorDB(folder_path):
    embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
    vstore2 = FAISS.load_local(folder_path=folder_path, index_name="blog", embeddings=embedding_function, allow_dangerous_deserialization=True)
    return vstore2


def filterQueries(queries):
    # drop empty lines the LLM may have produced around the alternate questions
    filtered_queries = [d.strip() for d in queries if d.strip() != '']
    print("filtered queries:", filtered_queries)
    print("len of filtered queries:", len(filtered_queries))
    return filtered_queries if len(filtered_queries) <= 3 else filtered_queries[-3:]  # keep the last 3


def getMultiQueryChain(llm):
    multi_question_template = """
    You are an AI assistant and your task is to generate three different versions of the given question to retrieve relevant documents from a vector database. By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations of distance based similarity search.
    Provide these alternative questions separated by newlines and do not generate any extra result other than the alternate questions.

    Question: {question}
    """
    multi_question_prompt = PromptTemplate(template=multi_question_template, input_variables=["question"])
    generate_multi_queries = multi_question_prompt | llm | StrOutputParser() | (lambda x: x.split("\n"))
    return generate_multi_queries


def reciprocalRankFusion(results: list[list], k=60):
    fused_scores = {}
    for docs in results:
        for rank, doc in enumerate(docs):
            # serialize the document so it can be used as a dictionary key
            doc_str = dumps(doc)
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            fused_scores[doc_str] += 1 / (rank + k)

    # no need to keep the scores since the docs are only used as context
    reranked_results = [
        loads(doc)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]
    return reranked_results


def getMultiQueryDocs(llm, retriever, question: str):
    print("initial question:", question)
    generate_multi_queries = getMultiQueryChain(llm=llm)
    # generate alternate queries, retrieve docs for each, then fuse the rankings
    retrieval_chain = generate_multi_queries | filterQueries | retriever.map() | reciprocalRankFusion
    return retrieval_chain.invoke(question)
def parseConversationalFusion(conversation):
    user_query = conversation["data"]["question"]
    llm_answer = conversation["answer"]

    # count how many of the fused context docs came from each source URL
    mapper = {}
    for doc in conversation["data"]["context"]:
        if doc.metadata["source"] not in mapper:
            mapper[doc.metadata["source"]] = 0
        mapper[doc.metadata["source"]] += 1

    sourceMappings = sorted(mapper.items(), key=lambda x: x[1], reverse=True)
    print(sourceMappings)
    sources = [s[0] for s in sourceMappings][:4]  # top 4 sources
    print(sources)

    res = {
        "question": user_query,
        "answer": llm_answer,
        "sources": sources
    }
    return res
def getFusionLLMResponse(llm, retriever, question: str):
    # 1. validate that the question is on-topic using the instructor client
    try:
        validatorResponse = client.chat.completions.create(
            response_model=LLMResponse,
            messages=[
                {"role": "user", "content": f"Validate the question: `{question}`"}
            ]
        )
    except ValidationError as ve:
        # if the structured response cannot be parsed, treat the question as invalid
        validatorResponse = LLMResponse(question=question, isValidResponse=False)
        print("Validation error:", str(ve))

    if not validatorResponse.isValidResponse:
        resp = {
            "question": question,
            "isValidResponse": validatorResponse.isValidResponse,
            "errorMessage": "Sorry, your query is not related to any of the blog contents"
        }
        return resp

    # 2. run the RAG fusion chain
    prompt_template = """
    You are a technical expert assisting with descriptive answers based on provided tech blog content. Carefully analyze the context retrieved from the blog to answer the following question with precision. If the context includes relevant code snippets, provide them exactly as presented. When the context doesn't contain an answer, give a response based on your own knowledge. Be concise, but thorough, prioritizing accuracy from the documents. Clearly distinguish between information from the context and your own knowledge if used.

    Context:
    {context}

    Question:
    {question}"""
    prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])

    def formatDocs(docs):
        return "\n\n".join(doc.page_content for doc in docs)

    rag_chain_from_docs = (
        {
            "context": lambda x: formatDocs(x["data"]["context"]),
            "question": lambda x: x["data"]["question"]
        }
        | prompt
        | llm
        | StrOutputParser()
    )

    rag_chain_with_source = RunnableParallel(
        {"data": RunnablePassthrough()}
    ).assign(answer=rag_chain_from_docs)

    res = rag_chain_with_source.invoke({
        "context": getMultiQueryDocs(llm=llm, retriever=retriever, question=question),
        "question": question
    })

    return parseConversationalFusion(res)


llm = initGeminiLLM()
print("llm initiated")

vstore2 = loadVectorDB(folder_path="RAG/faissdb_1000")
print("vector store loaded")

user_question = "What are statefulsets and volumes? Give code for same"
retriever2 = vstore2.as_retriever()
print(getFusionLLMResponse(llm=llm, retriever=retriever2, question=user_question))
Query Decomposition RAG
As with RAG Fusion, each user question is first validated using the same LLMResponse Pydantic model, where the LLM classifies whether the question is related to our blog content; the isValidResponse field decides whether we continue or return an errorMessage instead. Refer to this part again in the code.
The next step is the RAG pipeline. Here is the flow -
- Generate multiple sub-queries from the initial user question
- Filter those queries to remove any empty lines the LLM might produce
- Now, for each sub-query, perform a plain RAG call and store its source URLs inside the sources_list list.
- Store each sub-question and its answer in another list, which will now be used as the context
- The function format_qa_pairs takes these lists of questions and answers and formats them into a single string, since they (rather than raw document chunks) are used as the RAG context (see the small example after this list).
- The source URLs are passed through re-ranking using a custom Reciprocal Rank Fusion (RRF) that ranks sources by how often and how early they occur. The top 4 of these sources are kept to be returned alongside the answer
- The RAG chain is called with the new context created via format_qa_pairs and the initial question, and the LLM-generated response is stored.
- The question, LLM response and source URLs are then combined into a dictionary that is returned when the function queryDecompostionRAG is called.
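To make the context shape concrete, here is roughly what the formatted Q&A string ends up looking like for two made-up sub-questions (the question and answer strings below are purely illustrative):
questions = ["What is a StatefulSet?", "How are volumes attached to pods?"]  # made-up sub-questions
answers = ["A StatefulSet manages stateful pods with stable identities ...",  # made-up sub-answers
           "Volumes are declared in the pod spec and mounted into containers ..."]
context = ""
for i, (q, a) in enumerate(zip(questions, answers)):
    context += f"Question {i}: {q}\nAnswer {i}: {a}\n\n"
print(context.strip())
# Question 0: What is a StatefulSet?
# Answer 0: A StatefulSet manages stateful pods with stable identities ...
# Question 1: How are volumes attached to pods?
# Answer 1: Volumes are declared in the pod spec and mounted into containers ...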
Code -
import os
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.load import loads, dumps
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain.vectorstores import FAISS
from pydantic import BaseModel, Field, ValidationError
import instructor
import google.generativeai as genai

load_dotenv()
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# instructor client used only for validating whether the question is on-topic
client = instructor.from_gemini(
    client=genai.GenerativeModel(
        model_name="models/gemini-1.5-flash-latest",  # model defaults to "gemini-pro"
    ),
    mode=instructor.Mode.GEMINI_JSON,
)


class LLMResponse(BaseModel):
    question: str
    isValidResponse: bool = Field(description="True if the question is related to Kubernetes, AWS, linux, docker, networking or related topics, False otherwise")


def initGeminiLLM():
    llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash-latest", temperature=0.5, max_retries=3)
    return llm


def loadVectorDB(folder_path):
    embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
    vstore2 = FAISS.load_local(folder_path=folder_path, index_name="blog", embeddings=embedding_function, allow_dangerous_deserialization=True)
    return vstore2


def filterQueries(queries):
    # drop empty lines the LLM may have produced around the alternate questions
    filtered_queries = [d.strip() for d in queries if d.strip() != '']
    print("filtered queries:", filtered_queries)
    print("len of filtered queries:", len(filtered_queries))
    return filtered_queries if len(filtered_queries) <= 3 else filtered_queries[-3:]  # keep the last 3


def getMultiQueryChain(llm):
    multi_question_template = """
    You are an AI assistant and your task is to generate three different versions of the given question to retrieve relevant documents from a vector database. By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations of distance based similarity search.
    Provide these alternative questions separated by newlines and do not generate any extra result other than the alternate questions.

    Question: {question}
    """
    multi_question_prompt = PromptTemplate(template=multi_question_template, input_variables=["question"])
    generate_multi_queries = multi_question_prompt | llm | StrOutputParser() | (lambda x: x.split("\n"))
    return generate_multi_queries
def formatDocs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


def reciprocalRankFusionSources(results: list[list], k=60):
    '''
    Fuse the per-sub-question source lists and sort sources by how often and how early they occur
    '''
    fused_scores = {}
    for sources in results:
        for rank, source in enumerate(sources):
            if source not in fused_scores:
                fused_scores[source] = 0
            fused_scores[source] += 1 / (rank + k)

    # no need to keep the scores since the sources are only returned as citations
    reranked_results = [
        sourceUrl
        for sourceUrl, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]
    return reranked_results[:4]  # return the top 4 sources only
def retrieveSubquestionsRAG(question, sub_question_generator_chain, vectorStore):
    """Run plain RAG on each sub-question"""
    prompt_template = """
    You are a technical expert assisting with answers based on provided tech blog content. Carefully analyze the context retrieved from the blog to answer the following question with precision. If the context includes relevant code snippets, provide them exactly as presented. When the context doesn't contain an answer, give a response based on your own knowledge. Be concise, but thorough, prioritizing accuracy from the documents. Clearly distinguish between information from the context and your own knowledge if used.

    Context:
    {context}

    Question:
    {question}"""
    prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])

    subQuestions = sub_question_generator_chain.invoke({"question": question})
    print("initial question: ", question)
    print("sub-questions: ", subQuestions)

    # hold the answer and the retrieved sources for each sub-question
    subqueryAnswers = []
    sources_list = []
    for sub_question in subQuestions:
        # retrieve documents for each sub-question
        retrieved_docs = vectorStore.similarity_search(sub_question)
        print(retrieved_docs)
        sources = [d.metadata['source'] for d in retrieved_docs]
        sources_list.append(sources)

        # use the retrieved documents and the sub-question in a plain RAG chain
        subqueryAnswer = (prompt | llm | StrOutputParser()).invoke({"context": formatDocs(retrieved_docs),
                                                                    "question": sub_question})
        subqueryAnswers.append(subqueryAnswer)

    rerankedSourcesFiltered = reciprocalRankFusionSources(sources_list)
    # sub-questions and their answers are returned as lists, plus the re-ranked sources
    return subqueryAnswers, subQuestions, rerankedSourcesFiltered


def format_qa_pairs(questions, answers):
    """Format Q and A pairs to be passed as context"""
    print("answers lens", len(answers))
    formatted_string = ""
    for i, (question, answer) in enumerate(zip(questions, answers)):
        formatted_string += f"Question {i}: {question}\nAnswer {i}: {answer}\n\n"
    print("formatted context: ", formatted_string.strip())
    return formatted_string.strip()
def queryDecompostionRAG(question, llm, vectorStore):
    # 1. validate that the question is on-topic using the instructor client
    try:
        validatorResponse = client.chat.completions.create(
            response_model=LLMResponse,
            messages=[
                {"role": "user", "content": f"Validate the question: `{question}`"}
            ]
        )
    except ValidationError as ve:
        # if the structured response cannot be parsed, treat the question as invalid
        validatorResponse = LLMResponse(question=question, isValidResponse=False)
        print("Validation error:", str(ve))

    if not validatorResponse.isValidResponse:
        resp = {
            "question": question,
            "isValidResponse": validatorResponse.isValidResponse,
            "errorMessage": "Sorry, your query is not related to any of the blog contents"
        }
        return resp

    # 2. decompose the question, answer each sub-question, and re-rank the sources
    generate_multi_queries = getMultiQueryChain(llm=llm)
    sub_question_generator_chain = generate_multi_queries | filterQueries
    subqueryAnswers, subQuestions, rerankedSourcesFiltered = retrieveSubquestionsRAG(question=question, sub_question_generator_chain=sub_question_generator_chain, vectorStore=vectorStore)

    # 3. use the sub-question/answer pairs as context to answer the original question
    contextData = format_qa_pairs(questions=subQuestions, answers=subqueryAnswers)

    template = """Using the following set of Q&A pairs as context:

    {context}

    Synthesize a clear and comprehensive answer to the original question: "{question}".
    Ensure the answer is based on the context provided, integrating relevant information from the Q&A pairs to address the main question accurately.
    """
    prompt = PromptTemplate(template=template, input_variables=["context", "question"])

    queryDecompositionChain = (
        prompt
        | llm
        | StrOutputParser()
    )
    llmAnswer = queryDecompositionChain.invoke({"context": contextData, "question": question})

    return {
        "question": question,
        "answer": llmAnswer,
        "sources": rerankedSourcesFiltered
    }


llm = initGeminiLLM()
print("llm initiated")

user_question = "What are statefulsets and volumes? Give code for same"
vstore = loadVectorDB(folder_path="faissdb_1000")
print("vector store loaded")
print(queryDecompostionRAG(question=user_question, llm=llm, vectorStore=vstore))
Now the next goal is to deploy it via Streamlit. For this, we create the file .streamlit/secrets.toml and enter the following API keys -
HUGGINGFACEHUB_API_TOKEN = "HFACE_API_KEY"
GOOGLE_API_KEY = "GEMINI_API_KEY"
The Hugging Face API key has to be retrieved from here — https://huggingface.co/settings/tokens and the Gemini API key from here — https://aistudio.google.com/app/apikey
Note — These API keys are free of cost
After this has been done, copy the code from here — https://github.com/VarunArora14/Personal-Blog-Chatbot/tree/master/
The app/pages folder contains the same API code as above, wrapped with Streamlit methods for caching and chat support to make it ready to run. Refer to the example below for the Query Decomposition page.
Code -
import streamlit as st
from langchain.vectorstores import FAISS
from langchain_google_genai import ChatGoogleGenerativeAI
import os
from dotenv import load_dotenv
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.load import loads, dumps
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from pydantic import BaseModel, AfterValidator, Field, ValidationError
from typing_extensions import Annotated
import json
st.set_page_config(
    page_title="Personal Blog Chatbot",
    page_icon="🐳",
)

client, llm, vstore = None, None, None

if "llm" not in st.session_state:
    st.error("LLM NOT IN STATE! Go to Welcome and reload!")
else:
    llm = st.session_state['llm']
    client = st.session_state['client']

if "vstore" not in st.session_state:
    st.error("VSTORE NOT IN STATE! Go to Welcome and reload!")
else:
    vstore = st.session_state["vstore"]
# --------------------------------------
# QUERY DECOMPOSITION API CODE COPY PASTED
# --------------------------------------
st.markdown("""
# 🧩 Query Decomposition: Breaking Down Complex Questions 💡
Ever asked a question that was a bit too complex? No problem! 🤔
With **Query Decomposition**:
- I take **multi-part or complicated queries** and break them into **smaller, manageable sub-questions** 🔨.
- Each sub-question is answered individually, and then I piece everything back together to provide a **complete and accurate response** 🔗.
This approach helps ensure I cover all aspects of your question—even the tough ones—by tackling it step by step! 🎯
Try asking something complex and see how I break it down into clear, easy-to-digest answers! 🧠
---
""")
# Initialize chat history
if "decomposition_messages" not in st.session_state:
    st.session_state.decomposition_messages = []

# Display chat messages from history on app rerun
for message in st.session_state.decomposition_messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Accept user input
if prompt := st.chat_input("What is up?"):
    # Add user message to chat history
    st.session_state.decomposition_messages.append({"role": "user", "content": prompt})
    # Display user message in chat message container
    with st.chat_message("user"):
        st.markdown(prompt)

    # Display assistant response in chat message container
    with st.chat_message("assistant"):
        if len(st.session_state.decomposition_messages):
            response = queryDecompostionRAG(question=st.session_state.decomposition_messages[-1]["content"], llm=llm, vectorStore=vstore)

            # invalid questions return an errorMessage instead of an answer
            if response.get("isValidResponse") is False:
                answer = response["errorMessage"]
            else:
                answer = response['answer'] + "\n\nSource: " + ", ".join(x for x in response['sources'])

            st.write(answer)
            st.session_state.decomposition_messages.append({"role": "assistant", "content": answer})
The main file to run is app/Welcome.py, where initialisation and caching of resources takes place. The st.cache_resource decorator makes sure that resources initialised inside the decorated functions are not re-initialised on every rerun.
I used this for the initialisation of my vector database and LLM client, and stored them in session state for the other pages to use.
Code -
import streamlit as st
from langchain.vectorstores import FAISS
from langchain_google_genai import ChatGoogleGenerativeAI
import os
from dotenv import load_dotenv
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
import instructor
import google.generativeai as genai
st.set_page_config(
    page_title="Personal Blog Chatbot",
    page_icon="🐳",
)


@st.cache_resource
def initGeminiLLM():
    # read the Gemini key from .streamlit/secrets.toml
    os.environ["GOOGLE_API_KEY"] = st.secrets["GOOGLE_API_KEY"]
    genai.configure(api_key=st.secrets["GOOGLE_API_KEY"])
    client = instructor.from_gemini(
        client=genai.GenerativeModel(
            model_name="models/gemini-1.5-flash-latest",  # model defaults to "gemini-pro"
        ),
        mode=instructor.Mode.GEMINI_JSON,
    )
    llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash-latest", temperature=0.5, max_retries=3)
    return client, llm


@st.cache_resource
def loadVectorDB(folder_path="RAG/faissdb_1000"):
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = st.secrets["HUGGINGFACEHUB_API_TOKEN"]
    embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
    vstore2 = FAISS.load_local(folder_path=folder_path, index_name="blog", embeddings=embedding_function, allow_dangerous_deserialization=True)
    return vstore2


# to be used by fusion and query decomposition pages
if "llm" not in st.session_state:
    st.session_state["client"], st.session_state["llm"] = initGeminiLLM()

if "vstore" not in st.session_state:
    st.session_state["vstore"] = loadVectorDB()
st.write("# 🤖 Welcome to the Personal Blog Chatbot! 🌐")
st.sidebar.success("Choose Fusion or Query Decomposition to start asking")
st.markdown(
"""
Curious how AI can help you find the most relevant information from multiple sources? You've come to the right place! This chatbot is powered by Retrieval-Augmented Generation (RAG), a cutting-edge approach where I blend smart document search with advanced AI responses.
✨ How it works:
1. I gather knowledge by scraping and parsing documents from my blog posts at https://varunarora14.github.io 📄.
2. Then, using **Query Decomposition** and **RAG Fusion**, I break down complex queries to deliver precise, well-rounded answers 🎯. You should try both and decide which one you like more😉
3. Plus, I cite my sources—so you can trust where the information comes from 📚.
Type in your question and let me guide you through the answers! 🌟
---
"""
)
Now that we have gone through the core functionality, let’s run this code.
Create a virtual environment at the root of the folder and install the packages from requirements.txt. Confirm the secrets.toml file exists in the .streamlit folder.
At the root of the folder, run the command streamlit run app/Welcome.py
Congrats, you should have your Streamlit app running. I have put the vector DB and the API code in the same folder to make it easy to deploy on Streamlit.
Here is my deployed website — https://personal-blog-chatbot.streamlit.app/
Screenshots
- Home page
- RAG Fusion question — gives source URLs from my blog
- Query Decomposition question answered with sources
References
Refer to the source code repo here — https://github.com/VarunArora14/Personal-Blog-Chatbot
Do go through my blog if you are interested in Kubernetes, AWS or Linux commands — https://varunarora14.github.io
Please let me know if you have any questions. You can connect with me on LinkedIn here — https://www.linkedin.com/in/varun-arora-335225192/
Glad you made it all the way here. Happy reading!