Build a Private Local Chatbot with LangChain, ChromaDB and Ollama
Learn to create a fully offline, private chatbot on your own machine using LangChain, ChromaDB, and a local LLM. This step-by-step guide covers document ingestion, retrieval-augmented generation, conversational memory, and a Gradio UI.
June 2026 · 8 min read
You don’t need a Silicon Valley budget or a team of PhDs to build a useful chatbot. With open-source tools like LangChain, ChromaDB, and a local LLM, you can create a private, custom assistant that runs on your own machine.
The best part? It’s fully under your control. No API bills, no data leaving your machine, no rate-limit headaches.
Here’s how to get started, step by step.
Choose Your Stack
The core ingredients are simple:
- A large language model (LLM) – This powers the conversation. Options include Llama 3, Mistral, or Phi-3. All run locally via ollama or llama.cpp.
- An orchestration framework – LangChain or LlamaIndex handles prompt construction, memory, and tool use.
- A vector database – ChromaDB or Qdrant stores your custom knowledge (documents, FAQs, manuals) so the bot can answer specific questions.
- A frontend – Gradio or Streamlit gives you a quick UI. Or run it as a REST API with FastAPI.
Why local? You own the data, nothing is sent to a third party, and once the model is downloaded no internet connection is required.
Step 1: Run a Local LLM
The fastest way is ollama:
# Install (macOS/Linux/WSL)
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model (Phi-3 Mini, about 3.8B parameters, is fast on a laptop)
ollama pull phi3
# Test it
ollama run phi3 "Hello. Who are you?"
That’s it. You now have a chatbot on your machine. But the raw model knows nothing about your own data, so its answers stay generic.
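Besides the command line, Ollama serves a local HTTP API, which is what Python libraries talk to. The sketch below assumes the server is running on its default address, http://localhost:11434, and uses the /api/generate endpoint with streaming turned off. The helper names build_payload and ask are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "phi3") -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def ask(prompt: str, model: str = "phi3", timeout: float = 120.0) -> str:
    """Send one prompt to the local server and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With "stream": False the reply is a single JSON object whose
        # "response" field holds the generated text.
        return json.loads(response.read())["response"]
```

With the server up, ask("Hello. Who are you?") should return the same kind of reply as the ollama run command above.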
Step 2: Load Your Knowledge Base
This is where the chatbot becomes yours. You feed it documents, PDFs, or even a folder of text files.
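from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
# Load every .txt file under the folder; TextLoader reads plain text
# without needing the optional "unstructured" package
loader = DirectoryLoader("./my_knowledge/", glob="**/*.txt", loader_cls=TextLoader)
docs = loader.load()
# Split into chunks of about 500 characters; the 50-character overlap
# keeps sentences that straddle a boundary retrievable
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
# Embed each chunk and index it in an in-memory Chroma collection
# (phi3 is reused here for simplicity; a dedicated embedding model usually retrieves better)
embeddings = OllamaEmbeddings(model="phi3")
vectorstore = Chroma.from_documents(chunks, embeddings)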
Now your bot has a searchable brain. The embeddings convert meaning into numbers so the database can find the most relevant passages when someone asks a question.
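To make "meaning into numbers" concrete, here is a toy sketch of similarity search. It assumes plain word counts as a stand-in for a real embedding model, so toy_embed, top_k, and the sample passages are all hypothetical. A real vector store compares dense vectors produced by the model, but the ranking idea is the same.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedding model: plain word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (1.0 = same direction)."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, passages: list[str], k: int = 3) -> list[str]:
    """Return the k passages most similar to the query, best first."""
    query_vec = toy_embed(query)
    ranked = sorted(
        passages,
        key=lambda p: cosine_similarity(query_vec, toy_embed(p)),
        reverse=True,
    )
    return ranked[:k]


# Made-up sample passages, for illustration only.
passages = [
    "to reset your password open the settings page",
    "invoices are sent on the first day of each month",
    "contact support to change your billing address",
]
best = top_k("how do i reset my password", passages, k=1)
```

Here best holds only the password passage, because it is the only one sharing words with the query. Learned embeddings go further and can match passages that use different words for the same idea.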
Step 3: Build the Retrieval-Augmented Generation (RAG) Chain
Your bot won’t just guess answers. It will retrieve relevant chunks from your knowledge base and feed them to the LLM as context.
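from langchain.chains import RetrievalQA
from langchain_community.llms import Ollama
# A low temperature keeps answers close to the retrieved text
llm = Ollama(model="phi3", temperature=0.2)
# Fetch the 3 most similar chunks for each question
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True,  # also return the chunks that were used
)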
When a user asks “How do I reset my password?”, the chain finds the three most relevant text chunks about password resets from your documents, passes them to the LLM, and the LLM composes an answer grounded in those facts.
Hallucinations don’t disappear, but grounding each answer in retrieved text makes them far less likely.
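Under the hood, the default "stuff" chain type simply pastes the retrieved chunks into one prompt ahead of the question. The sketch below shows that idea in plain Python. The template wording, the build_rag_prompt name, and the sample chunks are assumptions for illustration, not LangChain's actual prompt.

```python
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the user's question into one prompt."""
    # Number each chunk so the model (and you) can tell the passages apart.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up chunks standing in for what the retriever would return.
prompt = build_rag_prompt(
    "How do I reset my password?",
    ["Open Settings and choose Reset password.", "Reset links expire after one hour."],
)
```

Printing prompt shows exactly what the model would see, which is a handy first check when an answer looks wrong: the problem is often that the retrieved chunks were not the right ones.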
Step 4: Add Conversational Memory
Right now, the bot only sees the current question. To hold a real conversation, add memory:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
# Keep the full chat history and expose it under the key the chain expects
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
)
conversation = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
)
Now the bot remembers what you said three messages ago.
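Roughly speaking, the conversational chain uses the stored history to rewrite each follow-up into a standalone question before it searches your documents. Here is a minimal sketch of that idea. BufferMemory, build_condense_prompt, and the prompt wording are hypothetical stand-ins, not LangChain's own classes or templates.

```python
class BufferMemory:
    """Hypothetical minimal stand-in for a conversation buffer."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def save(self, question: str, answer: str) -> None:
        """Record one completed exchange."""
        self.turns.append((question, answer))

    def as_text(self) -> str:
        """Render the whole history as alternating speaker lines."""
        lines = []
        for question, answer in self.turns:
            lines.append(f"Human: {question}")
            lines.append(f"Assistant: {answer}")
        return "\n".join(lines)


def build_condense_prompt(memory: BufferMemory, follow_up: str) -> str:
    """Ask the model to turn a follow-up into a self-contained question."""
    return (
        "Given the conversation so far and a follow-up question, "
        "rewrite the follow-up as a standalone question.\n\n"
        f"Chat history:\n{memory.as_text()}\n\n"
        f"Follow-up: {follow_up}\n"
        "Standalone question:"
    )


memory = BufferMemory()
memory.save("How do I reset my password?", "Use the link on the login page.")
condense_prompt = build_condense_prompt(memory, "How long is it valid?")
```

A buffer like this keeps every turn, so a long chat can eventually outgrow the model's context window. If that becomes a problem, a windowed memory that keeps only the last few exchanges is one option to look at.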
Step 5: Frontend — Make It Interactive
The quickest UI is with Gradio:
import gradio as gr
def chat(message, history):
    # Gradio passes its own history, but the chain's memory already tracks it
    result = conversation.invoke({"question": message})
    return result["answer"]
gr.ChatInterface(chat).launch()
Run that script and open the local URL it prints in your browser. You’ll see a clean chat interface talking to your locally running bot.
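If the model server is down or the user submits an empty message, the bare chat function raises an error in the UI. One possible hardening is sketched below. It assumes only that the chain exposes invoke() and returns a dict with an "answer" key, as in Step 4; make_chat_fn and FakeChain are hypothetical names used so the snippet runs without a model.

```python
def make_chat_fn(chain):
    """Wrap a chain in a function with the (message, history) signature."""

    def chat(message, history):
        # Guard against empty or whitespace-only input.
        if not message or not message.strip():
            return "Please type a question."
        try:
            result = chain.invoke({"question": message})
        except Exception as error:  # e.g. the local model server is not running
            return f"Something went wrong: {error}"
        return result["answer"]

    return chat


class FakeChain:
    """Hypothetical test double with the same invoke() shape as the chain."""

    def invoke(self, inputs):
        return {"answer": f"You asked: {inputs['question']}"}


chat = make_chat_fn(FakeChain())
reply = chat("What is RAG?", [])
```

In the real app you would pass make_chat_fn(conversation) to gr.ChatInterface instead of the fake chain.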
Going Further (Optional)
- Tool use: Give your bot the ability to call APIs, search the web, or query a SQL database using LangChain tools.
- Multimodal: Use Llama 3.2 Vision to analyze images.
- Streaming: Output tokens as they’re generated for a more natural feel.
- Voice: Add speech-to-text with Whisper and text-to-speech with piper or Coqui.
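As a starting point for the streaming item, here is a sketch of reading a streamed reply. It assumes Ollama's /api/generate endpoint with streaming enabled, which sends newline-delimited JSON objects carrying a "response" fragment and a "done" flag. The iter_tokens helper is a made-up name, and the sample lines are hand-written in that shape rather than captured from a real run.

```python
import json
from typing import Iterable, Iterator


def iter_tokens(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from newline-delimited JSON chunks."""
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        chunk = json.loads(raw)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            break  # the final chunk marks the end of the reply


# Hand-written sample lines; not captured output.
sample = [
    b'{"response": "Hel", "done": false}\n',
    b'{"response": "lo", "done": false}\n',
    b'{"response": "", "done": true}\n',
]
streamed = "".join(iter_tokens(sample))
```

Against a live server, the same generator could consume the HTTP response object line by line, printing each fragment as it arrives instead of joining them at the end.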
What You End Up With
A completely offline, private chatbot that:
- Answers questions using your own documents
- Remembers conversation context
- Costs $0 in API charges
- Can run on a laptop (8GB+ RAM recommended for 3B models)
No cloud. No vendor lock-in. Just a terminal, Python, and a few pip install commands.
The hardest part is deciding what to teach your bot first.