How to Run Teapot LLM on Ubuntu: An Installation Guide

Teapot LLM is an open-source, lightweight language model optimized for resource-constrained devices. This guide will walk you through the process of setting up and running Teapot LLM on an Ubuntu system, covering everything from system preparation to running the model locally.

Whether you’re building a conversational AI, performing retrieval-augmented generation, or extracting structured information, this guide ensures a smooth and efficient deployment.

What is Teapot LLM?

Teapot LLM is a compact language model (~800 million parameters) designed to run efficiently on devices with limited computational power. Key features include:

  • Hallucination Resistance: Provides contextually accurate answers based solely on the given input.
  • Retrieval-Augmented Generation (RAG): Combines document retrieval with text generation to answer complex queries.
  • Information Extraction: Outputs structured data such as JSON and numerical values.
  • Conversational Q&A: Delivers user-friendly, document-based answers.

System Requirements

Before proceeding, ensure your Ubuntu system meets the following requirements:

  1. Operating System: Ubuntu 20.04 or later.
  2. Hardware:
    • Minimum 4 GB RAM.
    • CPU with AVX2 support (GPU optional but recommended for faster inference).
  3. Dependencies:
    • Python 3.8 or higher.
    • Docker (optional for containerized deployment).
    • NVIDIA GPU with CUDA drivers (if using GPU acceleration).
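You can confirm the hardware requirements from a terminal before installing anything. This is a small sketch that assumes a standard Linux `/proc/cpuinfo` and the `free` utility, both present on Ubuntu:

```shell
#!/bin/sh
# Check for the AVX2 instruction set (needed for efficient CPU inference).
if grep -q avx2 /proc/cpuinfo; then
  avx2_status="supported"
else
  avx2_status="not supported"
fi
echo "AVX2: $avx2_status"

# Report total memory (at least 4 GB RAM is recommended).
free -h | awk '/^Mem:/ {print "Total RAM:", $2}'
```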

Step-by-Step Installation Guide

Step 1: Prepare Your Ubuntu System

Update and Upgrade

Update your system packages with:

sudo apt update && sudo apt upgrade -y

Install Essential Tools

Install tools like curl, git, and build-essential:

sudo apt install curl git build-essential -y

Install Python

Ensure Python 3.8 or higher is installed:

sudo apt install python3 python3-pip -y

Verify the installation:

python3 --version
pip3 --version
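As an extra sanity check, a short Python snippet can confirm programmatically that the interpreter meets the 3.8 minimum:

```python
import sys

# Teapot LLM's tooling expects Python 3.8 or higher.
REQUIRED = (3, 8)
meets_requirement = sys.version_info[:2] >= REQUIRED
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"{'OK' if meets_requirement else 'too old, please upgrade'}")
```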

Step 2: Install Teapot LLM

Using Hugging Face Transformers

Teapot LLM can be loaded with the Hugging Face Transformers library.

Install the Transformers library:

pip3 install transformers

Then load the Teapot LLM model in Python:

from transformers import pipeline

teapot_ai = pipeline("text2text-generation", model="teapotai/teapotllm")

context = """
The Eiffel Tower is a wrought iron lattice tower in Paris, France. It was designed by Gustave Eiffel and completed in 1889.
It stands at a height of 330 meters and is one of the most recognizable structures in the world.
"""

question = "What is the height of the Eiffel Tower?"
answer = teapot_ai(context + "\n" + question)
print(answer[0]['generated_text'])

This script initializes Teapot LLM and queries it for information from the supplied context.
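The pipeline call simply concatenates the context and the question with a newline. If you query the model repeatedly, it can help to wrap that convention in a small helper; this is a sketch of that pattern, not part of the Transformers API:

```python
def build_prompt(context: str, question: str) -> str:
    """Join context and question with a newline, matching the pipeline example."""
    return context.strip() + "\n" + question.strip()

prompt = build_prompt(
    "The Eiffel Tower stands at a height of 330 meters.",
    "What is the height of the Eiffel Tower?",
)
print(prompt)
```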

Step 3: Dockerized Deployment (Optional)

For a containerized setup, Docker can simplify installation and management.

Install Docker

Install Docker on your Ubuntu system:

sudo apt install docker.io -y
sudo systemctl enable docker --now

Verify Docker installation:

docker --version

Run Teapot LLM in a Container

Use a pre-built container image, or build your own for Teapot LLM.

Pull the container image:

docker pull teapotai/teapotllm:latest

Run the container:

docker run -d --name teapotllm -p 8080:8080 teapotai/teapotllm:latest

Access the model via http://localhost:8080.
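Before sending requests, you can verify that the mapped port is reachable. This stdlib sketch assumes the service listens on TCP port 8080 as mapped above; on a machine where the container is not running, it simply reports that the port is closed:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

reachable = port_open("localhost", 8080)
print("Container reachable on 8080" if reachable else "Port 8080 not open yet")
```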

Step 4: GPU Acceleration (Optional)

If you have an NVIDIA GPU, leverage CUDA for faster inference.

Install NVIDIA Drivers and CUDA Toolkit

Add NVIDIA's package repository:

sudo add-apt-repository ppa:graphics-drivers/ppa -y && sudo apt update

Install the drivers:

sudo apt install nvidia-driver-470 -y

Install the CUDA Toolkit:

sudo apt install nvidia-cuda-toolkit -y

Verify the installation (a reboot may be required before the GPU is reported):

nvidia-smi
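Once the driver packages are in place, you can confirm that the driver utilities are on the PATH before wiring the GPU into Docker. This check is portable: on a machine without NVIDIA drivers it simply reports them as absent:

```shell
#!/bin/sh
# Check whether the NVIDIA driver utilities are installed and on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
  gpu_tools="present"
else
  gpu_tools="absent"
fi
echo "nvidia-smi: $gpu_tools"
```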

Enable GPU in Docker Container

Run the container with GPU support:

docker run --gpus all -d --name teapotllm_gpu -p 8080:8080 teapotai/teapotllm:latest

Step 5: Web Interface Setup

For ease of use, you can set up a web-based interface.

Install Open WebUI

Clone the Open WebUI repository:

git clone https://github.com/open-webui/open-webui.git && cd open-webui

Build and run the interface:

docker-compose up -d

Access the interface at http://localhost:3000.

Step 6: Testing and Usage

Test the model by providing context and questions either via Python scripts or through the web interface.

Example query in Python:

context = "The Great Wall of China is over 13,000 miles long."
question = "How long is the Great Wall of China?"
answer = teapot_ai(context + "\n" + question)
print(answer[0]['generated_text'])

Expected output:

The Great Wall of China is over 13,000 miles long.

Practical Coding Examples of Teapot LLM

Example 1: General Question Answering (Q&A)

In this example, we will use Teapot LLM to answer questions based on a provided context. The model is optimized to respond conversationally and is trained to avoid answering questions that can't be answered from the given context, reducing hallucinations.

from teapotai import TeapotAI

# Sample context
context = """
The Eiffel Tower is a wrought iron lattice tower in Paris, France. It was designed by Gustave Eiffel and completed in 1889.
It stands at a height of 330 meters and is one of the most recognizable structures in the world.
"""

# Initialize TeapotAI
teapot_ai = TeapotAI()

# Get the answer using the provided context
answer = teapot_ai.query(
    query="What is the height of the Eiffel Tower?",
    context=context
)
print(answer)  # Output: "The Eiffel Tower stands at a height of 330 meters." 

# Example of hallucination resistance
context_without_height = """
The Eiffel Tower is a wrought iron lattice tower in Paris, France. It was designed by Gustave Eiffel and completed in 1889.
"""

answer = teapot_ai.query(
    query="What is the height of the Eiffel Tower?",
    context=context_without_height
)
print(answer)  # Output: "I don't have information on the height of the Eiffel Tower." 

Example 2: Chat with Retrieval-Augmented Generation (RAG)

In this example, we will use Teapot LLM with Retrieval-Augmented Generation (RAG) to determine which documents are relevant before answering a question. This is useful when you have multiple documents and want the model to answer based on the most relevant ones.

from teapotai import TeapotAI

# Sample documents
documents = [
    "The Eiffel Tower is located in Paris, France. It was built in 1889 and stands 330 meters tall.",
    "The Great Wall of China is a historic fortification that stretches over 13,000 miles.",
    "The Amazon Rainforest is the largest tropical rainforest in the world, covering over 5.5 million square kilometers.",
    "The Grand Canyon is a natural landmark located in Arizona, USA, carved by the Colorado River.",
    "Mount Everest is the tallest mountain on Earth, located in the Himalayas along the border between Nepal and China.",
    "The Colosseum in Rome, Italy, is an ancient amphitheater known for its gladiator battles.",
    "The Sahara Desert is the largest hot desert in the world, located in North Africa.",
    "The Nile River is the longest river in the world, flowing through northeastern Africa.",
    "The Empire State Building is an iconic skyscraper in New York City that was completed in 1931 and stands at 1454 feet tall."
]

# Initialize TeapotAI with documents for RAG
teapot_ai = TeapotAI(documents=documents)

# Get the answer using RAG
answer = teapot_ai.chat([
    {
        "role": "system",
        "content": "You are an agent designed to answer facts about famous landmarks."
    },
    {
        "role": "user",
        "content": "What landmark was constructed in the 1800s?"
    }
])
print(answer)  # Output: "The Eiffel Tower was constructed in the 1800s." 
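The features list also mentions structured information extraction. The exact TeapotAI extraction API is not covered in this guide, but once the model returns JSON, the standard library is enough to consume it; the raw_output string below is a hypothetical model response used only for illustration:

```python
import json

# Hypothetical JSON emitted by an information-extraction query
# (the actual TeapotAI extraction call is not shown in this guide).
raw_output = '{"name": "Eiffel Tower", "height_meters": 330, "completed": 1889}'

data = json.loads(raw_output)
print(f"{data['name']} is {data['height_meters']} m tall, "
      f"completed in {data['completed']}.")
```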

Additional Tips

Saving and Loading Models: You can save a model with pre-computed embeddings to reduce loading times. TeapotAI is pickle-compatible and can be saved and loaded as shown below.

import pickle

# Pickle the TeapotAI model to a file with pre-computed embeddings
with open("teapot_ai.pkl", "wb") as f:
    pickle.dump(teapot_ai, f)

# Load the pickled model
with open("teapot_ai.pkl", "rb") as f:
    loaded_teapot_ai = pickle.load(f)

# You can now use the loaded instance as you would normally
print(len(loaded_teapot_ai.documents))  # Number of documents with precomputed embeddings (9 in the example above)
loaded_teapot_ai.query("What city is the Eiffel Tower in?")  # Output: "The Eiffel Tower is located in Paris, France." 

Updating TeapotAI: Ensure you have the latest version of TeapotAI by running:

pip install --upgrade teapotai

Using a Virtual Environment: It's good practice to create a virtual environment for your project to manage dependencies. You can create one using:

python3 -m venv teapot-env
source teapot-env/bin/activate

These examples demonstrate how to use Teapot LLM for general question answering and retrieval-augmented generation. By leveraging these capabilities, you can build robust applications for a variety of natural language processing tasks.

Troubleshooting

  • Model Not Loading:
    Ensure all dependencies are installed correctly and verify internet connectivity for downloading models.
  • GPU Not Detected:
    Check your NVIDIA driver installation using nvidia-smi.

  • Web Interface Issues:
    Verify that the Docker containers are running with:

docker ps

Conclusion

Running Teapot LLM on Ubuntu offers a cost-effective and efficient way to deploy a versatile language model locally. Whether you’re experimenting with conversational Q&A, retrieval-augmented generation, or information extraction, this guide provides all the necessary steps for a seamless setup.
