DeepSeek V3.2 API Guide: Using deepseek-chat and deepseek-reasoner with the OpenAI SDK

The DeepSeek API is a drop-in replacement for the OpenAI API. Change your base URL and model name, and every Python or Node.js app built against OpenAI's SDK starts routing requests to DeepSeek V3.2 instead. This DeepSeek API tutorial walks you through the full setup — from generating an API key to handling the reasoning tokens that deepseek-reasoner produces — so you can integrate either model into production code today.

Why Use the DeepSeek API in Your Projects

DeepSeek V3.2 competes with GPT-4o and Claude Sonnet on most developer benchmarks while costing a fraction of what those APIs charge per million tokens. More importantly for teams already on OpenAI: the DeepSeek API is fully compatible with the OpenAI REST spec. There is no new SDK to learn, no new message format, and no migration script. You point your existing client at https://api.deepseek.com and the rest stays the same.

The API exposes two models, each serving a distinct use case:

deepseek-chat vs deepseek-reasoner: What's the Actual Difference

deepseek-chat is DeepSeek V3.2 in standard mode. It behaves like any capable large language model — fast, cost-efficient, suited to classification, summarisation, code generation, and general Q&A. No special output structure, no extra tokens.

deepseek-reasoner is DeepSeek V3.2 in thinking mode. Before it writes the final answer, the model produces a full chain-of-thought trace stored in a separate reasoning_content field. This makes it significantly stronger on multi-step problems: maths, complex code review, legal analysis, and anything that benefits from explicit intermediate reasoning. The trade-off is latency and cost — reasoning tokens are generated and billed in addition to the answer tokens.

For a deeper look at how DeepSeek's reasoning and chat variants compare on benchmarks, see our DeepSeek V3 vs DeepSeek R1 technical comparison.

Step 1 — Get Your API Key

  1. Go to platform.deepseek.com and create an account.
  2. Navigate to API Keys in the left sidebar.
  3. Click Create new secret key, copy it immediately — it is shown only once.
  4. Store it as an environment variable. Never hardcode it in source files.
# Linux / macOS
export DEEPSEEK_API_KEY="sk-..."

# Windows (PowerShell)
$Env:DEEPSEEK_API_KEY = "sk-..."

Add the export to your .bashrc, .zshrc, or your CI/CD secrets store so it persists across sessions.

Step 2 — Configure the OpenAI SDK

You do not need to install a DeepSeek-specific package. The standard OpenAI SDK handles everything once you override base_url.

Python

pip install openai
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

That is the entire configuration. Every client.chat.completions.create() call you write from this point targets the DeepSeek API.

Node.js / JavaScript

npm install openai
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: "https://api.deepseek.com",
});

The rest of this guide uses Python examples, but every pattern translates directly to the JavaScript SDK — the method names and response shapes are identical.

Your First DeepSeek API Call

The following example sends a single user message to deepseek-chat and prints the response:

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what a context window is in one sentence."},
    ],
)

print(response.choices[0].message.content)

The response object is identical to an OpenAI response. response.choices[0].message.content holds the assistant text. response.usage breaks down prompt tokens, completion tokens, and total tokens for billing.
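Because the usage block follows the OpenAI schema, you can log it per call. The usage_summary helper below is our own sketch, not part of either SDK:

```python
def usage_summary(usage):
    """Format a chat completion's usage block as a one-line billing summary."""
    return (f"prompt={usage.prompt_tokens} "
            f"completion={usage.completion_tokens} "
            f"total={usage.total_tokens}")

# With a live response: print(usage_summary(response.usage))
```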

Switching to deepseek-reasoner is a one-word change:

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "user", "content": "What is the derivative of x^3 + 2x?"},
    ],
)

# The final answer
print(response.choices[0].message.content)

# The full chain-of-thought trace (unique to deepseek-reasoner)
print(response.choices[0].message.reasoning_content)

Streaming Responses from the DeepSeek API

For interactive applications, streaming avoids the long blank wait before any output appears. Set stream=True and iterate over the returned chunks:

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Write a Python function to flatten a nested list."},
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

When streaming with deepseek-reasoner, the response interleaves two fields in sequence — delta.reasoning_content arrives first (the thinking phase), followed by delta.content (the answer). The hasattr guard below prevents an AttributeError on chunks whose delta does not carry a reasoning_content attribute:

stream = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "user", "content": "Prove that sqrt(2) is irrational."},
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    # reasoning_content is only present on deepseek-reasoner responses
    if hasattr(delta, "reasoning_content") and delta.reasoning_content:
        print(delta.reasoning_content, end="", flush=True)
    elif delta.content:
        print(delta.content, end="", flush=True)
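If you need both channels as complete strings once streaming finishes, accumulate them in separate buffers. collect_stream is a sketch of that pattern (the name is ours, not the SDK's):

```python
def collect_stream(stream):
    """Drain a deepseek-reasoner stream into (reasoning, answer) strings."""
    reasoning_parts, answer_parts = [], []
    for chunk in stream:
        delta = chunk.choices[0].delta
        # reasoning_content arrives during the thinking phase, content afterwards
        if getattr(delta, "reasoning_content", None):
            reasoning_parts.append(delta.reasoning_content)
        elif delta.content:
            answer_parts.append(delta.content)
    return "".join(reasoning_parts), "".join(answer_parts)
```

This keeps the display loop above unchanged while giving you the full trace for logging.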

Using deepseek-reasoner: Thinking Tokens and the Multi-Turn Pitfall

The reasoning_content field is the most powerful — and most mishandled — part of the deepseek-reasoner API. Here is what developers get wrong in multi-turn conversations.

Critical: Never include reasoning_content from a previous assistant turn when constructing the next request's message list. The API returns a 400 error if it detects reasoning_content in the input messages. Strip it before building the next turn.

Correct multi-turn pattern:

messages = [{"role": "user", "content": "What is 17 x 23?"}]

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages,
)

assistant_msg = response.choices[0].message

# Build next turn: only include role + content, NOT reasoning_content
messages.append({
    "role": "assistant",
    "content": assistant_msg.content,
    # Do NOT append reasoning_content here -- it causes a 400 error
})

messages.append({"role": "user", "content": "Now multiply that result by 4."})

response2 = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages,
)

print(response2.choices[0].message.content)

If you are storing conversation history in a database, save reasoning_content separately for observability or debugging — but never rehydrate it into the messages array sent to the API.
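A small sanitiser makes this hard to get wrong: strip the field from every stored turn before resending. sanitize_history is our own helper, shown as a sketch:

```python
def sanitize_history(messages):
    """Remove reasoning_content from stored turns before sending them back."""
    return [
        {k: v for k, v in turn.items() if k != "reasoning_content"}
        for turn in messages
    ]

# messages = sanitize_history(stored_messages)  # then pass to the API
```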

For details on how DeepSeek V3.2 performs across different API providers and which endpoint configurations give the best throughput, see our DeepSeek V3.2 API providers and performance guide.

JSON Mode and Function Calling

JSON Mode

To guarantee that the model returns a valid JSON string, set response_format to json_object and include the word "json" in your system or user prompt. Without the prompt keyword, the model may ignore the format instruction.

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {
            "role": "system",
            "content": "Return a JSON object with keys: name, language, stars.",
        },
        {
            "role": "user",
            "content": "Describe the FastAPI framework as json.",
        },
    ],
    response_format={"type": "json_object"},
)

import json
data = json.loads(response.choices[0].message.content)
print(data)

Function Calling

DeepSeek V3.2 supports OpenAI-style tool use in both deepseek-chat and deepseek-reasoner modes. Define tools as JSON Schema objects and pass them in the tools parameter:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Returns current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "What is the weather in Berlin?"}],
    tools=tools,
    tool_choice="auto",
)

tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name)       # get_weather
print(tool_call.function.arguments)  # {"city": "Berlin"}

After receiving the tool call, execute your function, then append a tool role message with the result and make a second API call to get the model's final response. The pattern is identical to OpenAI function calling.
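The round trip can be sketched as follows. tool_result_message is our own helper, and the message shape follows the standard OpenAI tool-calling convention:

```python
import json

def tool_result_message(tool_call, result):
    """Wrap a local function's result as a tool-role message for the follow-up call."""
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    }

# messages.append(response.choices[0].message)  # the assistant's tool-call turn
# messages.append(tool_result_message(tool_call, {"temp_c": 18, "sky": "overcast"}))
# final = client.chat.completions.create(model="deepseek-chat", messages=messages)
```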

Best Practices: Temperature, Context, and Error Handling

  • Temperature for deepseek-chat: Default is 1.0. For deterministic tasks (code, structured data), use 0.0 to 0.3. For creative tasks, 1.0 to 1.3.
  • Temperature for deepseek-reasoner: The recommended value is 0.6. In thinking mode, top_p, presence_penalty, and frequency_penalty are accepted by the API but have no effect on output.
  • System prompts with deepseek-reasoner: Keep them short and direct. The model reasons natively — complex chain-of-thought prompting instructions add noise without improving output.
  • Context window: Both deepseek-chat and deepseek-reasoner share a 128K token context window. For long documents, chunk and summarise rather than stuffing the full context — reasoning token costs scale with context length.
  • Error handling: Wrap calls in try/except and handle 429 (rate throttling during high traffic) with exponential backoff. DeepSeek does not impose hard rate limits but the API can slow under load.
import time
from openai import RateLimitError

def call_with_backoff(client, **kwargs):
    """Retry a chat completion on 429s with exponential backoff (1s, 2s, 4s)."""
    for attempt in range(4):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            if attempt == 3:
                raise  # give up after the final retry
            time.sleep(2 ** attempt)

If you are deploying DeepSeek API calls as part of a serverless application, our guide on integrating DeepSeek with Vercel covers environment variable handling and edge function patterns.

DeepSeek API Pricing and Rate Limits

DeepSeek's API pricing is substantially lower than equivalent OpenAI models. The approximate current rates are listed below — verify at platform.deepseek.com/pricing before committing to production budgets.

  • deepseek-chat (V3.2): ~$0.28 per 1M input tokens / ~$0.42 per 1M output tokens
  • deepseek-reasoner (V3.2): ~$0.55 per 1M input tokens / ~$2.19 per 1M output tokens (reasoning tokens billed as output)

For cost-sensitive workloads, use deepseek-chat for classification, summarisation, and standard generation. Reserve deepseek-reasoner for tasks where the quality improvement from chain-of-thought reasoning is measurable and justifies the higher output token cost.

DeepSeek does not publish hard rate limits. The platform attempts to serve all requests, but during peak usage you may see increased latency. Implement the retry pattern above, and monitor response.usage to track token consumption per call.
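To track spend per call, multiply the usage counts by the published rates. The figures below are the approximate rates quoted above — verify them against the pricing page before budgeting:

```python
# $ per 1M input / output tokens -- approximate, check platform.deepseek.com/pricing
RATES = {
    "deepseek-chat": (0.28, 0.42),
    "deepseek-reasoner": (0.55, 2.19),  # reasoning tokens billed as output
}

def estimate_cost(model, prompt_tokens, completion_tokens):
    """Rough dollar cost of one call from its usage block."""
    inp, out = RATES[model]
    return (prompt_tokens * inp + completion_tokens * out) / 1_000_000

# estimate_cost("deepseek-reasoner", usage.prompt_tokens, usage.completion_tokens)
```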

For a broader look at what the current V3.2 model is capable of across different tasks, see the DeepSeek V3.2 benchmarks and feature guide.