Vanilla RAG
RAG (Retrieval-Augmented Generation) works by retrieving documents relevant to the user's question, combining them with the question in a prompt, and passing that prompt to a large language model (LLM) so it can generate a more accurate, grounded answer.
Here's a simple guide to building a RAG pipeline from scratch:
- Data Loading: Gather and load the documents you want to use for answering questions.
- Chunking and Embedding: Split the documents into smaller chunks and convert them into numerical vectors (embeddings) that capture their meaning (a minimal chunking sketch follows this list).
- Vector Store: Create a LanceDB table to store and manage these vectors for quick access during retrieval.
- Retrieval & Prompt Preparation: When a question is asked, find the most relevant document chunks in the table and prepare a prompt that combines those chunks with the question.
- Answer Generation: Send the prepared prompt to an LLM to generate a detailed and accurate answer (sketched after the table-definition snippet below).
Here's a code snippet for defining a table with the Embedding API, which simplifies the process by handling embedding extraction and querying in one step.
import pandas as pd
import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry

db = lancedb.connect("/tmp/db")
# The embedding function is attached to the schema, so LanceDB embeds
# both ingested text and search queries automatically.
model = get_registry().get("sentence-transformers").create(name="BAAI/bge-small-en-v1.5", device="cpu")

class Docs(LanceModel):
    text: str = model.SourceField()  # raw text to be embedded
    vector: Vector(model.ndims()) = model.VectorField()  # auto-populated embedding

table = db.create_table("docs", schema=Docs)

# `chunks` is assumed to be a list of text strings (see the chunking sketch above)
df = pd.DataFrame({"text": chunks})
table.add(data=df)

query = "What is the issue date of the lease?"
# The query string is embedded with the same model; results come back as dicts
actual = table.search(query).limit(1).to_list()[0]
print(actual["text"])
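To close the loop, the retrieved chunks are stitched into a prompt and sent to an LLM. A minimal sketch, reusing table and query from the snippet above and assuming the OpenAI Python client (any chat-completion API can be substituted; the model name is illustrative):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Retrieve the top matching chunks and stitch them into a context block
context = "\n\n".join(r["text"] for r in table.search(query).limit(3).to_list())

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; any chat model works
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)

Constraining the model to answer only from the retrieved context is what grounds the answer in your documents rather than in the model's parametric memory.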
Check the Colab notebook for the complete code.