Running DeepSeek Janus Pro 1B on Hugging Face

DeepSeek Janus Pro 1B is a compact multimodal model that unifies image understanding and text-to-image generation in a single 1B-parameter network. This guide walks you through setting it up with Hugging Face tooling and using both capabilities.

1. Environment Setup

Prerequisites

  • Python 3.8+ and pip installed.
  • A Hugging Face account (to access models and libraries).
  • GPU recommended (for faster inference).

Step 1: Install Libraries

Install core dependencies via pip:

pip install transformers torch accelerate  # Core libraries (Janus is autoregressive, so diffusers is not needed)
pip install -U datasets huggingface_hub  # Optional: data handling and the Hub CLI
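
An optional sanity check confirms the install and whether PyTorch can see a GPU (the versions printed will vary):

python -c "import torch, transformers; print(transformers.__version__, torch.__version__, torch.cuda.is_available())"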

Step 2: Clone the Repository

The Python examples below import the janus package that ships with the official repository, so clone and install it:

git clone https://github.com/deepseek-ai/Janus.git
cd Janus && pip install -e .  # Installs the janus package plus model-specific dependencies
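
To verify the package is importable (the class names below are the ones used in the upstream repo):

python -c "from janus.models import VLChatProcessor, MultiModalityCausalLM; print('janus OK')"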

2. Running the Model

Text-to-Image Generation

Janus exposes its unified model through the MultiModalityCausalLM class. Note that this class lives in the janus package from the cloned repository, not in transformers itself; the Hugging Face checkpoint is loaded via AutoModelForCausalLM with trust_remote_code:

import torch
from transformers import AutoModelForCausalLM
from janus.models import VLChatProcessor  # from the cloned Janus repo

model_path = "deepseek-ai/Janus-Pro-1B"

# Load the chat processor (tokenizer + image handling) and the model
processor = VLChatProcessor.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
model = model.to(torch.bfloat16).cuda().eval()  # on CPU-only machines, use .float() and drop .cuda()

Image generation is not a single generate() call: the model autoregressively samples discrete image tokens (with classifier-free guidance) and decodes them back to pixels with its vision decoder. The reference implementation of this loop ships with the repository as generation_inference.py.
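The simplest way to exercise that loop is to run the reference script from inside the cloned repo (the prompt is hardcoded in the script, so edit it first; the output directory below is the upstream default and may change):

python generation_inference.py  # samples image tokens and decodes them to images
ls generated_samples/           # default output directory in the upstream script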

Text Generation (Chat/QA)

For text-only tasks, the generic text-generation pipeline will not recognize Janus's custom architecture. Instead, reuse the model loaded above and drive its underlying language model directly (in the Janus codebase the inner LLM is exposed as model.language_model; treat that attribute name as repo-specific):

# Reuses `processor` and `model` from the loading snippet above
prompt = "Explain quantum computing simply:"
inputs = processor.tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.language_model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
)
print(processor.tokenizer.decode(outputs[0], skip_special_tokens=True))
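
Sampling parameters control the determinism/creativity trade-off; a quick comparison using the same inputs:

# Greedy decoding: reproducible, more literal
greedy = model.language_model.generate(**inputs, max_new_tokens=200, do_sample=False)

# Higher-temperature sampling: more varied phrasing
creative = model.language_model.generate(
    **inputs, max_new_tokens=200, do_sample=True, temperature=0.9, top_p=0.95
)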

3. Advanced Features & Optimization

WebGPU Browser Integration

Run Janus Pro 1B in-browser with Transformers.js v3 (the @huggingface/transformers package; the older @xenova/transformers package predates Janus support). Browser inference loads ONNX weights; the example below assumes the onnx-community conversion of the checkpoint:

import { AutoProcessor, MultiModalityCausalLM } from '@huggingface/transformers';

const model_id = 'onnx-community/Janus-Pro-1B-ONNX';  // ONNX conversion for in-browser use
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await MultiModalityCausalLM.from_pretrained(model_id);
// Generate images/text directly in the browser (WebGPU where available)
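
The import assumes the package is installed in your project:

npm install @huggingface/transformers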

Quantization for Low-Memory Devices

Reduce VRAM usage with 4-bit quantization via bitsandbytes (pip install bitsandbytes; NVIDIA GPUs only):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/Janus-Pro-1B",
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="auto",
)
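
transformers models expose get_memory_footprint(), which is handy for confirming the savings:

print(f"{model.get_memory_footprint() / 1e9:.2f} GB")  # approximate weight memory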

4. Troubleshooting

Issue                   Solution
CUDA out of memory      Load in half precision (bf16/fp16) or use 4-bit quantization.
Slow inference          Use device_map="auto"; optionally wrap the model with torch.compile(model).
Model not found         Log in with huggingface-cli login and double-check the model id.
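
Logging in stores an access token locally; create a token in your Hugging Face account settings first:

huggingface-cli login  # paste a token with read access when prompted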

5. Key Notes

  1. Hardware Requirements:
    • The 1B-parameter model runs on GPUs with 8GB+ VRAM (e.g., NVIDIA T4, RTX 3060).
    • CPU-only setups work but are far slower (on the order of seconds per token).
  2. Multimodal Flexibility:
    • Pass images alongside the conversation through the processor for image-to-text tasks such as captioning (see the sketch after this list).
    • Adjust temperature (0.1–1.0) to balance creativity vs. determinism.
  3. Ethical Use:
    • Comply with the DeepSeek model license published on the model card and avoid harmful content generation.
  4. Hugging Face Spaces Demo:
    • To try the model without any local setup, use one of the community demo Spaces on Hugging Face.
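
A minimal image-captioning sketch, condensed from the inference example in the upstream repository (the <image_placeholder> tag, role names, and helpers such as load_pil_images and prepare_inputs_embeds come from the janus package and may change upstream):

import torch
from transformers import AutoModelForCausalLM
from janus.models import VLChatProcessor
from janus.utils.io import load_pil_images

model_path = "deepseek-ai/Janus-Pro-1B"
processor = VLChatProcessor.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
model = model.to(torch.bfloat16).cuda().eval()

conversation = [
    {
        "role": "<|User|>",
        "content": "<image_placeholder>\nDescribe this image.",
        "images": ["./cityscape.png"],  # path to any local image
    },
    {"role": "<|Assistant|>", "content": ""},
]

# Prepare joint image + text inputs, then decode a caption with the inner LLM
pil_images = load_pil_images(conversation)
inputs = processor(conversations=conversation, images=pil_images, force_batchify=True).to(model.device)
inputs_embeds = model.prepare_inputs_embeds(**inputs)

outputs = model.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=inputs.attention_mask,
    pad_token_id=processor.tokenizer.eos_token_id,
    max_new_tokens=256,
    do_sample=False,
)
print(processor.tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True))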

Additional Resources

For further assistance or updates, see:

  • Janus GitHub repository: https://github.com/deepseek-ai/Janus
  • Model card: https://huggingface.co/deepseek-ai/Janus-Pro-1B

By following these steps, you should be able to install and run DeepSeek Janus Pro 1B with Hugging Face tooling!