Run Nari Dia 1.6B on Mac: Installation Guide

Nari Dia 1.6B is an advanced, open-source text-to-speech (TTS) model developed by Nari Labs. With 1.6 billion parameters, it is designed to generate highly realistic, multi-speaker conversational audio.
Its open weights and code, Apache 2.0 license, and dialogue-centric features make it a compelling alternative to commercial TTS services like ElevenLabs.
What is Nari Dia 1.6B?
- Model Size: 1.6 billion parameters, optimized for capturing intricate speech patterns.
- Dialogue Generation: Supports scripts with multiple speakers using simple tags (e.g., [S1], [S2]).
- Non-Verbal Communication: Can generate sounds like laughter, coughs, and throat clearing when specified in the input.
- Audio Conditioning: Allows users to influence voice output via audio samples, enabling emotion and tone control.
- Open Source: Released under Apache 2.0, with open weights and code available on Hugging Face.
- Language Support: Currently, only English is supported [1].
Hardware and Software Requirements
A. Hardware Requirements
- GPU Dependency: Dia 1.6B is designed for CUDA-enabled NVIDIA GPUs and needs about 10GB of GPU memory for full performance. This typically means a mid-range to high-end GPU (e.g., RTX 3070/4070 or better) [1].
- Mac Hardware: Macs, including Apple Silicon models (M1, M2, M3), do not natively support CUDA, so Dia 1.6B cannot run at full GPU speed on a Mac.
- CPU Support: Official CPU support is planned but not yet available. Once it arrives, expect significantly slower performance than on a GPU [1].
- RAM: At least 16GB of system RAM is recommended, especially if running in a virtualized or emulated environment.
B. Software Requirements
- Operating System: macOS (latest version recommended for compatibility and security) [2].
- Python: Python 3.8 or later.
- Git: For cloning the repository.
- uv (Recommended): A fast Python package manager (pip install uv) [1].
- PyTorch: The model requires PyTorch 2.0+ and CUDA 12.6 for GPU acceleration. On Mac, PyTorch can be installed for CPU, but without CUDA support.
- Other Dependencies: Hugging Face Transformers, Gradio, and Descript Audio Codec (handled by the setup script).
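The software requirements above can be checked before installing anything. The following is a minimal sketch using only the Python standard library; the `preflight` helper and its field names are hypothetical and are not part of the Dia repository.

```python
import platform
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version taken from the requirements list above


def preflight():
    """Report whether the basic tools from the requirements list are present."""
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "git_found": shutil.which("git") is not None,
        "uv_found": shutil.which("uv") is not None,  # uv is optional
        "os": platform.system(),     # "Darwin" on macOS
        "arch": platform.machine(),  # "arm64" on Apple Silicon
    }


if __name__ == "__main__":
    for name, value in preflight().items():
        print(f"{name}: {value}")
```

If `git_found` is False, the Homebrew step later in this guide installs it.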
Challenges of Running Dia 1.6B on Mac
A. Lack of Native CUDA Support
Apple Silicon and Intel-based Macs do not natively support NVIDIA’s CUDA, which is required for optimal Dia 1.6B performance. This means:
- No direct GPU acceleration: Running on Mac will default to CPU, resulting in much slower inference.
- Workarounds: Advanced users may attempt to use cloud GPUs, Docker containers with GPU passthrough (on supported hardware), or emulation/virtualization, but these are complex and not officially supported.
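To see which of these cases applies on a given machine, you can ask PyTorch directly. This is a minimal sketch, not Dia's own device-selection code: `pick_device` is a hypothetical helper, and it falls back to "cpu" when PyTorch is not installed yet.

```python
def pick_device():
    """Return "cuda" when PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch  # only importable once the dependencies are installed
    except ImportError:
        return "cpu"
    # torch.cuda.is_available() is False on Macs, which have no CUDA support.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Inference would run on: {pick_device()}")
```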
B. Alternatives for Mac Users
- Try Online Demo: Use the Hugging Face ZeroGPU Space for a cloud-hosted demo without local setup.
- Wait for CPU/Quantized Versions: Nari Labs plans to release CPU-compatible and quantized versions, which will lower hardware requirements and improve accessibility for Mac users.
- Explore Orpheus.CPP: Other TTS models like Orpheus.CPP can be run on Mac CPUs, though with different features and quality.
Step-by-Step Installation Guide
A. Preparing Your Mac
- Update macOS: Ensure your system is up to date for best compatibility and security.
- Install Homebrew (if not already installed):
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
- Install Python and Git:
```bash
brew install python git
```
- (Optional) Install uv:
```bash
pip3 install uv
```
B. Clone the Dia Repository
Open Terminal and run:
```bash
git clone https://github.com/nari-labs/dia.git
cd dia
```
C. Set Up Python Environment
Using uv (Recommended):
```bash
uv run app.py
```
- The first run will install all dependencies and download the model weights. This may take some time.
Manual Setup (Alternative):
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python app.py
```
D. Running the Application
- The application will launch a Gradio web interface, allowing you to enter text, select speakers, and generate audio.
- On Mac, expect slower performance since inference will run on CPU.
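Because the first launch also downloads the model weights, the web interface may take a while to respond. The sketch below polls a local URL until it answers. It assumes Gradio's usual default address, http://127.0.0.1:7860, which `app.py` may override; `wait_for_server` is a hypothetical helper, so use whatever URL the terminal actually prints.

```python
import http.client
import time
import urllib.request

ASSUMED_URL = "http://127.0.0.1:7860"  # assumption: Gradio's usual default address


def wait_for_server(url=ASSUMED_URL, timeout=600.0, interval=5.0):
    """Poll url until it returns HTTP 200 or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # server not up yet (URLError is a subclass of OSError)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    ready = wait_for_server()
    print("Interface is up" if ready else "Timed out waiting for the interface")
```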
Using Dia 1.6B: Features and Workflow
A. Dialogue Input Format
- Use tags to designate speakers:
```text
[S1] Hello, how are you?
[S2] I'm fine, thank you! (laughs)
```
- Non-verbal cues (e.g., (laughs), (coughs)) are supported in the script.
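When a script is assembled programmatically, the tag format above is easy to get wrong by hand. Here is a minimal sketch of a formatter; `format_dialogue` is a hypothetical helper and not part of Dia. It produces only the text you would paste into the interface.

```python
def format_dialogue(turns):
    """Build a tagged script from (speaker_number, text) pairs."""
    lines = []
    for speaker, text in turns:
        if not isinstance(speaker, int) or speaker < 1:
            raise ValueError(f"speaker must be a positive integer, got {speaker!r}")
        # Non-verbal cues such as (laughs) stay inline in the text.
        lines.append(f"[S{speaker}] {text.strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_dialogue([
        (1, "Hello, how are you?"),
        (2, "I'm fine, thank you! (laughs)"),
    ]))
```

The article only shows [S1] and [S2], so treat higher speaker numbers as untested.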
B. Audio Conditioning
- Upload a short audio sample to clone a voice or set emotional tone.
- This feature enables custom voices and expressive speech output.
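Before uploading a sample, it can help to confirm how long it actually is. The sketch below reads the duration of a WAV file with Python's standard `wave` module. It assumes the sample is an uncompressed WAV file; the article does not say which formats or lengths Dia accepts, and `wav_duration_seconds` is a hypothetical helper.

```python
import wave


def wav_duration_seconds(path):
    """Return the length of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # Duration is the number of audio frames divided by frames per second.
        return wav_file.getnframes() / wav_file.getframerate()


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        print(f"{wav_duration_seconds(sys.argv[1]):.2f} seconds")
    else:
        print("usage: python check_sample.py sample.wav")
```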
C. Gradio UI
- The Gradio interface provides fields to input text, select speaker tags, upload audio samples, and listen to generated speech.
Performance Considerations
- CPU Inference: On Mac, generation will be significantly slower than on a CUDA-enabled GPU. Expect long wait times for audio synthesis, especially for longer scripts.
- Resource Usage: The model is memory-intensive; ensure you have sufficient RAM and disk space.
- Future Improvements: Quantized and CPU-optimized versions are expected to improve performance and accessibility for Mac users [1].
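A quick way to check the memory and disk points above is to read them from the operating system. This is a minimal sketch using the standard library. The 16GB figure comes from the hardware requirements earlier in this guide, the article gives no disk space figure, and `resource_report` is a hypothetical helper.

```python
import os
import shutil

RECOMMENDED_RAM_GB = 16  # from the hardware requirements earlier in this guide


def resource_report(path="."):
    """Report free disk space at path and total system RAM, both in GiB."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    try:
        ram_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
        ram_gb = ram_bytes / 1024**3
    except (AttributeError, ValueError, OSError):
        ram_gb = None  # this platform does not expose the values via sysconf
    return {
        "disk_free_gb": round(free_gb, 1),
        "ram_gb": None if ram_gb is None else round(ram_gb, 1),
        "ram_ok": None if ram_gb is None else round(ram_gb) >= RECOMMENDED_RAM_GB,
    }


if __name__ == "__main__":
    for name, value in resource_report().items():
        print(f"{name}: {value}")
```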
Troubleshooting and Tips
- Dependency Issues: Ensure all required Python packages are installed. If issues arise, try updating pip and reinstalling dependencies.
- Model Download Problems: Check your internet connection and available disk space.
- Slow Performance: This is expected on Mac due to lack of GPU acceleration. Consider using cloud-based inference for faster results.
- Audio Output Issues: Verify your Mac’s sound settings and output device configuration.
Alternative Approaches for Mac Users
A. Use Hugging Face ZeroGPU Space
- No installation required.
- Try Dia 1.6B online with limited usage [1].
B. Wait for CPU/Quantized Support
- Monitor Nari Labs’ announcements for updates on CPU-compatible releases.
C. Explore Other TTS Options
- Orpheus.CPP: Can run on Mac CPU, supports text-to-speech with less hardware demand, though with different feature sets.
- Other Open-Source TTS Models: Research alternatives that are optimized for CPU or Apple Silicon.
Conclusion
Running Nari Dia 1.6B on a Mac is possible, but with significant performance limitations due to the lack of native CUDA GPU support. The model’s open-source nature and advanced dialogue capabilities make it an exciting tool for developers, researchers, and hobbyists interested in TTS and voice cloning.
For the best experience, use a CUDA-enabled GPU on a supported system, or leverage cloud-based demos until Mac-optimized versions are available.