Here's an example where LangChain uses Ollama to answer questions from a custom knowledge base (like a set of text documents or PDFs) using Retrieval-Augmented Generation (RAG).
🔑 What Will This Example Do?
- Load custom documents into a knowledge base.
- Convert documents into text chunks.
- Embed text chunks using LangChain's Embeddings.
- Store embeddings in a local vector database (FAISS).
- Use Ollama to answer user questions by retrieving relevant chunks.
Prerequisites
- Install Ollama and pull a model (e.g., mistral):

brew install ollama
ollama pull mistral

- Install the required Python packages (add openai if you keep the OpenAIEmbeddings used in the example below):

pip install langchain pypdf faiss-cpu
Code Example
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings # You can switch to a local embedding model if needed
from langchain.llms import Ollama
from langchain.chains import RetrievalQA
# 1. Load Custom Knowledge Base (PDF)
pdf_loader = PyPDFLoader("knowledge_base.pdf")
documents = pdf_loader.load()
# 2. Split Text into Chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)
# 3. Embed Chunks and Store in FAISS
embeddings = OpenAIEmbeddings()  # Requires an OpenAI API key; replace with a local embedding model (e.g., OllamaEmbeddings) if privacy is key
db = FAISS.from_documents(chunks, embeddings)
# 4. Setup Ollama LLM
llm = Ollama(model="mistral") # You can also use "llama2" or others
# 5. Create Retrieval-based Question Answering Chain
retriever = db.as_retriever()
qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever, chain_type="stuff")
# 6. Ask Questions
query = "What is the main conclusion of the document?"
answer = qa_chain.run(query)
print(f"Q: {query}")
print(f"A: {answer}")
🔑 How It Works:
- The PDF is loaded into text format.
- Text is split into smaller chunks (500 characters each, with a 50-character overlap).
- Each chunk is converted into a vector embedding and stored in FAISS.
- When a question is asked, the system retrieves the most relevant chunks via similarity search (a quick way to inspect this step is sketched after this list).
- The retrieved chunks are passed to Ollama to generate a natural language answer.
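If you want to see exactly which chunks the similarity search returns before they reach the LLM, you can query the FAISS index directly. This is a minimal sketch that reuses the db and query objects from the example above; the k=3 value and the 200-character preview are illustrative choices, not requirements:

# Retrieve the 3 most similar chunks for the question
top_chunks = db.similarity_search(query, k=3)
for i, doc in enumerate(top_chunks, start=1):
    print(f"--- Chunk {i} ---")
    print(doc.page_content[:200])  # preview the first 200 characters of each chunk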
📌 Optional Improvements
- Use Sentence Transformers or Hugging Face embeddings instead of OpenAI for full local privacy (a sketch follows this list).
- Filter the retrieved context (for example, cap the number of retrieved chunks or apply a similarity threshold) so irrelevant text stays out of the prompt.
- Enable streaming with:
for chunk in qa_chain.stream(query):
print(chunk, end="")
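Here is a minimal sketch of the first two improvements, reusing the chunks and llm objects from the main example. It assumes the sentence-transformers package is installed; the model name all-MiniLM-L6-v2 and k=3 are illustrative choices:

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Local sentence-transformer embeddings -- nothing leaves your machine
local_embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = FAISS.from_documents(chunks, local_embeddings)

# Cap retrieval at the 3 most similar chunks to keep irrelevant context out of the prompt
retriever = db.as_retriever(search_kwargs={"k": 3})
qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever, chain_type="stuff")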
When to Use This Setup?
| Use Case | Recommendation |
|---|---|
| Private Data | ✅ Best choice (No cloud APIs) |
| Large Documents | ✅ Works well |
| Small Models | ✅ Mistral, LLaMA |
| Local Setup | ✅ Full local pipeline |