Run Teapot LLM on Windows: Step by Step Installation Guide
Teapot LLM is an open-source language model with approximately 800 million parameters, fine-tuned on synthetic data and optimized to run locally on resource-constrained devices such as smartphones and CPUs.
Developed by the community, Teapot LLM is designed to perform a variety of tasks, including hallucination-resistant Question Answering (QnA), Retrieval-Augmented Generation (RAG), and JSON extraction.
Key Features
- Hallucination Resistance: Teapot LLM is trained to only answer questions using context from provided documents, reducing the likelihood of generating inaccurate or irrelevant responses.
- Retrieval-Augmented Generation: The model can determine which documents are relevant before answering a question, ensuring responses are based on the most pertinent information.
- Information Extraction: Teapot LLM can extract structured information from context using predefined JSON structures, making it useful for parsing documents.
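To illustrate the idea behind JSON extraction, here is a minimal plain-Python sketch that pulls values for a predefined schema out of free text. This is only an illustration of the concept using regular expressions; it is not the teapotai API, which performs extraction with the model itself.

```python
import json
import re

def extract_fields(text, schema):
    # Toy extractor: for each schema field, pull the first regex match
    # from the text, or None if the field is absent.
    result = {}
    for field, pattern in schema.items():
        match = re.search(pattern, text)
        result[field] = match.group(1) if match else None
    return result

text = "The Eiffel Tower stands 330 meters tall and was completed in 1889."
schema = {
    "height_m": r"(\d+)\s*meters",
    "year_completed": r"completed in (\d{4})",
}
print(json.dumps(extract_fields(text, schema)))
# → {"height_m": "330", "year_completed": "1889"}
```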
Training Details
Teapot LLM is fine-tuned from flan-t5-large on a synthetic dataset of LLM tasks generated using DeepSeek-V3. The training process involves:
- Dataset: A ~10MB synthetic dataset consisting of QnA pairs with a variety of task-specific formats.
- Methodology: The model is trained to mimic task-specific output formats and is scored based on its ability to output relevant, succinct, and verifiable answers.
- Hardware: Trained for approximately 10 hours on an A100 GPU provided by Google Colab.
- Hyperparameters: Various learning rates were used, and the model was monitored to ensure task-specific performance without catastrophic forgetting.
System Requirements
Before installing Teapot LLM, ensure your system meets the following requirements:
Hardware
- CPU: A modern multi-core processor (Intel i5/i7 or AMD Ryzen recommended).
- GPU: NVIDIA RTX GPU with at least 8 GB VRAM for optimal performance (optional for CPU-only inference).
- RAM: Minimum 16 GB; 32 GB or more recommended for larger models.
- Storage: SSD with at least 100 GB free space for model files and dependencies.
Software
- Operating System: Windows 10 or later.
- Python: Version 3.10 or higher.
- CUDA Toolkit: Version 12.8 or higher (for GPU acceleration).
- Docker (Optional): For containerized setups.
Installation Methods
1. Using Docker Containers
Docker simplifies the setup process by bundling dependencies into containers.
- Install Docker Desktop for Windows.
- Create Directories:
Create directories to store model files and configurations:
mkdir ollama-files open-webui-files
- Start the Teapot LLM Container:
docker run -d -p 4000:8080 -v /path/to/ollama-files:/root/.ollama -v /path/to/open-webui-files:/app/backend/data --name teapot-webui --restart always ghcr.io/open-webui/open-webui:teapot
- Access the Web Interface:
Open your browser and navigate to http://localhost:4000 to interact with the model.
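If you prefer Docker Compose, the docker run command above can be expressed as a Compose file. This is a sketch that simply mirrors the ports, volumes, and restart policy from that command:

```yaml
# docker-compose.yml -- equivalent to the docker run command above
services:
  teapot-webui:
    image: ghcr.io/open-webui/open-webui:teapot
    container_name: teapot-webui
    restart: always
    ports:
      - "4000:8080"
    volumes:
      - ./ollama-files:/root/.ollama
      - ./open-webui-files:/app/backend/data
```

Start it with `docker compose up -d` from the directory containing the file.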
2. Native Installation
For users preferring a direct installation without containers:
- Install Python:
- Download Python from the official website.
- During installation, check "Add Python to PATH."
- Install CUDA Toolkit (if using GPU):
- Download from NVIDIA's website and follow the installation instructions.
- Clone the Teapot Repository:
git clone https://github.com/teapot-ai/teapot.git
cd teapot
- Install Dependencies using PowerShell:
./setup_env.ps1
- Run the Model:
python main.py --model teapot --port 8080
3. Using Llamafile
Llamafile simplifies running LLMs by bundling them into single executables.
- Download the Teapot Llamafile Executable:
Obtain it from the official release page.
- Launch the Application:
Double-click the .exe file to start the application.
- Interact with the Model:
Use the provided web interface or command-line prompts to work with Teapot LLM.
Getting Started
To use Teapot LLM, you can leverage the teapotai library, which simplifies model integration into production environments. Here’s a basic example of using Teapot LLM for general question answering:
from teapotai import TeapotAI
# Sample context
context = """
The Eiffel Tower is a wrought iron lattice tower in Paris, France. It was designed by Gustave Eiffel and completed in 1889.
It stands at a height of 330 meters and is one of the most recognizable structures in the world.
"""
teapot_ai = TeapotAI()
answer = teapot_ai.query(
    query="What is the height of the Eiffel Tower?",
    context=context
)
print(answer) # Output: "The Eiffel Tower stands at a height of 330 meters."
For more advanced use cases, such as Retrieval-Augmented Generation, Teapot LLM can be used with multiple documents to answer questions based on the most relevant information.
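The document-selection step in RAG can be sketched in plain Python with a simple word-overlap score. This is only a toy illustration of "determine which documents are relevant before answering"; the actual library presumably uses learned embeddings rather than word overlap.

```python
def score(query, doc):
    # Toy relevance score: count words shared between query and document.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def top_document(query, documents):
    # Select the document with the highest overlap score.
    return max(documents, key=lambda d: score(query, d))

docs = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the tallest mountain on Earth.",
]
print(top_document("Where is the Eiffel Tower located?", docs))
```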
Optimization Techniques
1. GPU Acceleration
Leverage NVIDIA TensorRT for faster inference:
- Install TensorRT: Follow NVIDIA's guidelines for installation.
Configure Teapot to Use GPU:
python main.py --model teapot --gpu
2. Quantization
Reduce model size by quantizing weights (e.g., converting to INT8). This process can greatly improve performance on machines with limited resources while maintaining acceptable accuracy.
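The core idea of symmetric INT8 quantization can be shown in a few lines of plain Python: pick one scale so the largest weight maps to 127, round, and store the small integers plus the scale. Real toolchains quantize tensors per channel with calibration, but the arithmetic is the same in spirit:

```python
def quantize_int8(weights):
    # Symmetric INT8 quantization: map floats into [-127, 127] with one scale.
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q_weights, scale):
    # Recover approximate float weights from the stored integers.
    return [q * scale for q in q_weights]

weights = [0.5, -1.27, 0.031, 0.9]
q, scale = quantize_int8(weights)   # q = [50, -127, 3, 90]
approx = dequantize(q, scale)       # close to the original weights
```

Each weight now needs one byte instead of four, at the cost of small rounding error.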
3. Batch Processing
Increase batch sizes for tasks like text generation to improve throughput and overall efficiency.
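Batching amounts to grouping prompts into fixed-size chunks and feeding each chunk to the model in one call. The chunking itself is trivial; this sketch assumes nothing about teapotai's interface, whose actual batching support may differ:

```python
def batched(items, batch_size):
    # Split a list of prompts into fixed-size batches for higher throughput.
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

prompts = [f"Question {i}" for i in range(7)]
batches = batched(prompts, 3)
print([len(b) for b in batches])  # → [3, 3, 1]
```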
Practical Coding Examples of Teapot LLM
Example 1: General Question Answering (QnA)
In this example, we showcase how to use Teapot LLM to answer questions based on a provided context. The model is optimized for conversational responses and is trained to avoid answering questions beyond the given context, thereby reducing hallucinations.
from teapotai import TeapotAI
# Sample context about the Eiffel Tower
context = """
The Eiffel Tower is a wrought iron lattice tower in Paris, France. It was designed by Gustave Eiffel and completed in 1889.
It stands at a height of 330 meters and is one of the most recognizable structures in the world.
"""
# Initialize TeapotAI
teapot_ai = TeapotAI()
# Get the answer using the provided context
answer = teapot_ai.query(
    query="What is the height of the Eiffel Tower?",
    context=context
)
print(answer) # Expected Output: "The Eiffel Tower stands at a height of 330 meters."
# Example demonstrating hallucination resistance:
context_without_height = """
The Eiffel Tower is a wrought iron lattice tower in Paris, France. It was designed by Gustave Eiffel and completed in 1889.
"""
answer = teapot_ai.query(
    query="What is the height of the Eiffel Tower?",
    context=context_without_height
)
print(answer) # Expected Output: "I don't have information on the height of the Eiffel Tower."
Example 2: Chat with Retrieval-Augmented Generation (RAG)
This example illustrates how to use Teapot LLM with Retrieval-Augmented Generation (RAG) to automatically select the most relevant documents before generating an answer. This approach is particularly useful when you have multiple documents and need the model to extract the most pertinent information.
from teapotai import TeapotAI
# Sample documents about various famous landmarks
documents = [
"The Eiffel Tower is located in Paris, France. It was built in 1889 and stands 330 meters tall.",
"The Great Wall of China is a historic fortification that stretches over 13,000 miles.",
"The Amazon Rainforest is the largest tropical rainforest in the world, covering over 5.5 million square kilometers.",
"The Grand Canyon is a natural landmark located in Arizona, USA, carved by the Colorado River.",
"Mount Everest is the tallest mountain on Earth, located in the Himalayas along the border between Nepal and China.",
"The Colosseum in Rome, Italy, is an ancient amphitheater known for its gladiator battles.",
"The Sahara Desert is the largest hot desert in the world, located in North Africa.",
"The Nile River is the longest river in the world, flowing through northeastern Africa.",
"The Empire State Building is an iconic skyscraper in New York City that was completed in 1931 and stands at 1454 feet tall."
]
# Initialize TeapotAI with documents for RAG
teapot_ai = TeapotAI(documents=documents)
# Start a chat session with a retrieval prompt
answer = teapot_ai.chat([
    {
        "role": "system",
        "content": "You are an agent designed to answer facts about famous landmarks."
    },
    {
        "role": "user",
        "content": "What landmark was constructed in the 1800s?"
    }
])
print(answer) # Expected Output: "The Eiffel Tower was constructed in the 1800s."
Additional Tips and Best Practices
Using a Virtual Environment
Creating a virtual environment is a best practice to manage project dependencies effectively. Use the following commands to set up a virtual environment:
python -m venv teapot-env
teapot-env\Scripts\activate  # On macOS/Linux use: source teapot-env/bin/activate
Keeping TeapotAI Updated
Always ensure you have the latest version of TeapotAI to take advantage of new features and improvements:
pip install --upgrade teapotai
Saving and Loading Models with Precomputed Embeddings
To reduce loading times, you can save a TeapotAI instance with precomputed embeddings using Python’s pickle module:
import pickle
# Save the TeapotAI model to a file
with open("teapot_ai.pkl", "wb") as f:
    pickle.dump(teapot_ai, f)
# Load the saved TeapotAI model
with open("teapot_ai.pkl", "rb") as f:
    loaded_teapot_ai = pickle.load(f)
# Verify the loaded model works as expected
print(len(loaded_teapot_ai.documents)) # Expected Output: Number of documents, e.g., 9
loaded_teapot_ai.query("What city is the Eiffel Tower in?") # Expected Output: "The Eiffel Tower is located in Paris, France."
Applications
Teapot LLM is particularly useful for:
- Conversational QnA: Providing friendly, conversational answers using context and documents as references.
- Document Parsing: Efficiently extracting information from documents in various formats.
- Educational Tools: Assisting in teaching core computer science subjects by generating examples and visualizing step-by-step logic.
Troubleshooting Common Issues
- Docker Container Not Starting: Ensure Docker Desktop is running and properly configured.
- Python Path Errors: Verify that Python is correctly added to the system PATH.
- Insufficient VRAM: Switch to CPU inference if your GPU resources are inadequate.
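The first two checks above can be automated with a short stdlib-only script. It reports whether the running Python meets the 3.10 requirement and whether python and docker are resolvable on PATH (it does not probe GPU/VRAM, which needs vendor tooling such as nvidia-smi):

```python
import shutil
import sys

def environment_report():
    # Quick checks mirroring the troubleshooting list above.
    return {
        "python_3_10_plus": sys.version_info >= (3, 10),
        "python_on_path": (shutil.which("python") is not None
                           or shutil.which("python3") is not None),
        "docker_on_path": shutil.which("docker") is not None,
    }

for check, ok in environment_report().items():
    print(f"{check}: {'OK' if ok else 'MISSING'}")
```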
Limitations
While Teapot LLM excels in question answering and information extraction, it is not intended for code generation, creative writing, or critical decision-making applications. Additionally, Teapot LLM has been trained primarily on English and may not perform well in other languages.
Conclusion
Running Teapot LLM locally on Windows offers unparalleled flexibility, enhanced privacy, and significant cost savings for developers and AI enthusiasts alike. Whether you choose Docker containers, native installation, or executables like Llamafile, this guide provides the steps needed for a smooth setup process.