How to Install and Run MiniMax M2.7 for Coding and AI Agents: Benchmarks and Tests
MiniMax M2.7 is a new AI model focused on coding, agents, and complex workflows. It competes with top models on engineering and agent benchmarks while keeping a lower token price.
This guide explains what MiniMax M2.7 is, how to install and run it, and how it performs on public tests. You will also see comparisons, pricing details, and an end‑to‑end demo you can follow.
What Is MiniMax M2.7
MiniMax M2.7 is a large language model from MiniMax, built for real‑world software engineering and productivity tasks. MiniMax reports that M2.7 scored 56.22 percent on the SWE‑Pro coding benchmark and 55.6 percent on VIBE‑Pro, which measure end‑to‑end software tasks.
MiniMax designed M2.7 to improve its own training process through a loop of experiments and feedback, which they call model self‑evolution. Reinforcement learning runs let the model update its internal “agent harness” and skills based on experiment results.
The model also targets professional office work. On the GDPval‑AA evaluation for office productivity, MiniMax reports an ELO score of 1495, which they state is the highest among open models.
Key Features
- Strong coding and debugging: M2.7 scores 56.22 percent on SWE‑Pro, a benchmark that tests real software bug fixes and code tasks.
- End‑to‑end project delivery: On VIBE‑Pro, it reaches 55.6 percent, which covers full project‑building workflows from requirements to working code.
- Deep system understanding: On Terminal Bench 2, which focuses on terminal‑based system debugging, M2.7 scores 57.0 percent.
- Office document editing: It improves complex editing for Excel, PowerPoint, and Word, and reaches 1495 ELO on GDPval‑AA office tasks.
- Tool and agent performance: In MMClaw tests with 40 complex skills over 2,000 tokens each, the model keeps 97 percent skill compliance. Toolathon results show 46.3 percent accuracy, placing it among top tool‑using agents.
- Agent coding benchmarks: On PinchBench, an OpenClaw coding‑agent benchmark, M2.7 reaches a best score of 86.2 percent, close to Claude Opus 4.6 and other top models.
- Custom agent behavior: Kilo Bench results show a 47 percent task pass rate, with a tendency to read more context before acting, which helps on complex problems.
- Long context window: AI SDK data lists a 204,800‑token context window, which supports large codebases and long conversations. A context window is the maximum amount of text (input plus output) the model can handle in one session.
- API and tooling compatibility: MiniMax exposes M2.7 through Anthropic‑style and OpenAI‑style APIs, so many existing tools can swap it in as a drop‑in model.
- Competitive price: MiniMax pay‑as‑you‑go pricing sets MiniMax‑M2.7 at 0.30 USD per million input tokens and 1.20 USD per million output tokens. A token is a short chunk of text, often a few characters or part of a word.
- Reasoning and intelligence index: Artificial Analysis reports a composite Intelligence Index score of 50, far above its reported average of 20 for comparable models.
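As a quick sanity check on the pay‑as‑you‑go prices quoted above, here is a short Python sketch; the helper name is my own, and the 2.8M‑token figure is the per‑trial input consumption Kilo reports later in this article.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price: float = 0.30,
                      output_price: float = 1.20) -> float:
    """Pay-as-you-go cost in USD, using per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price \
         + (output_tokens / 1_000_000) * output_price

# A heavy agent trial: 2.8M input tokens and an assumed 50k output tokens.
cost = estimate_cost_usd(2_800_000, 50_000)
print(f"{cost:.2f} USD")  # about 0.90 USD per trial
```

Even a context‑hungry trial stays under a dollar at these rates, which is the practical meaning of "competitive price" here.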
How to Install or Set Up
MiniMax M2.7 runs through cloud APIs, so you install the client and configure endpoints, not the model weights. Below is a basic setup path for most developers.
Step 1: Create a MiniMax Account
- Go to the MiniMax developer platform and sign up with email or supported login providers.
- Complete identity checks if the platform asks for them, for billing and abuse control.
- Log in to the dashboard and open the API or “Models” section.
Step 2: Get an API Key
- In the dashboard, locate the API Keys area under settings or models.
- Create a new key for server use and copy it to a safe place.
- Store this key in environment variables, for example MINIMAX_API_KEY, not in code.
An API key is a secret string that identifies your account for each request.
Step 3: Choose the Right Endpoint
MiniMax offers different endpoints based on region and protocol.
- For international users, the standard base URL is https://api.minimax.io/v1 for OpenAI‑style chat APIs.
- For users in mainland China, the base URL is https://api.minimaxi.com/v1.
- For Anthropic‑compatible setups, some tools use https://api.minimax.io/anthropic.
Step 4: Configure an SDK or Client
Many tools treat M2.7 as an Anthropic‑compatible or OpenAI‑compatible model.
- For AI coding tools that rely on Anthropic variables, MiniMax shows an example configuration with ANTHROPIC_BASE_URL set to https://api.minimax.io/anthropic and the model name MiniMax-M2.7.
- For generic OpenAI‑style clients, set the base URL to https://api.minimax.io/v1 and use MiniMax-M2.7 as the model name.
An SDK is a ready‑made library that wraps HTTP calls and helps with authentication, retries, and streaming.
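To make Step 4 concrete, here is a minimal Python sketch that assembles OpenAI‑style client settings from an environment variable. The helper name minimax_config is my own; the base URL and model name are the values quoted above.

```python
import os

def minimax_config(api_key: str) -> dict:
    """Builds connection settings for an OpenAI-style client (hypothetical helper)."""
    return {
        "base_url": "https://api.minimax.io/v1",  # international endpoint
        "api_key": api_key,
        "model": "MiniMax-M2.7",
    }

# Read the key from the environment rather than hard-coding it.
cfg = minimax_config(os.environ.get("MINIMAX_API_KEY", ""))
```

Any OpenAI‑compatible SDK can then be pointed at cfg["base_url"] with cfg["api_key"] and cfg["model"].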
Step 5: Optional – Use a Hosted Toolchain
You can avoid direct API setup in some cases.
- AI SDK playground lists MiniMax M2.7 with built‑in pricing and context settings, so you can swap it in by model name.
- Some coding tools, such as OpenCode and other mini IDE agents, include direct provider options for MiniMax.
Step 6: Confirm Access
- Use a small test prompt from your terminal or tool of choice.
- Check that the model responds within allowed time and does not return authentication errors.
- If you see HTTP errors, confirm base URL, API key, and model name fields in your config.
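The confirmation step can be scripted with only the Python standard library. This sketch builds the request first so you can inspect the URL, key, and model name before sending; the endpoint and model name are the ones listed in Steps 3 and 4.

```python
import json
import urllib.request

def build_ping_request(api_key: str) -> urllib.request.Request:
    """Builds a minimal chat request to verify the base URL, key, and model name."""
    body = {
        "model": "MiniMax-M2.7",
        "messages": [{"role": "user", "content": "Reply with the word OK."}],
        "max_tokens": 8,
    }
    return urllib.request.Request(
        "https://api.minimax.io/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it (a network call that costs a few tokens):
#   with urllib.request.urlopen(build_ping_request(key), timeout=30) as resp:
#       print(resp.status)  # 200 means the URL, key, and model name line up
```

An HTTP 401 or 403 here points at the API key; a 404 usually means a wrong base URL or model name.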
How to Run or Use It
This section shows how to run MiniMax M2.7 through a standard chat‑style API and how to design prompts for real tasks.
Basic Chat Completion Call (Conceptual)
You can call M2.7 through any HTTP client that supports JSON.
Here is a typical request shape in plain language, based on the OpenAI‑style API structure.
- URL: https://api.minimax.io/v1/chat/completions (for international users).
- Headers: Authorization: Bearer <MINIMAX_API_KEY> plus Content-Type: application/json.
- Body fields: model ("MiniMax-M2.7"), messages (a list with a system prompt and a user prompt), max_tokens (the response limit), and temperature (controls output variety).
A system prompt is a short message that defines the role and rules for the model.
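Assuming the response follows the usual OpenAI‑style shape (check MiniMax's documentation for the exact field names), extracting the reply and token usage looks like this:

```python
# A mock response in the common OpenAI-style shape (field names are assumptions).
response = {
    "choices": [
        {"message": {"role": "assistant", "content": "A context window is..."}}
    ],
    "usage": {"prompt_tokens": 25, "completion_tokens": 7},
}

reply = response["choices"][0]["message"]["content"]
used = response["usage"]["prompt_tokens"] + response["usage"]["completion_tokens"]
```

Tracking the usage field per call is the simplest way to keep an eye on the token costs discussed above.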
Example: General Coding Assistant
You might send a request like this in JSON form (described, not exact code).
- System message: “You are a senior software engineer. Explain changes before you show code.”
- User message: “I have a failing test in my payment service. Here is the stack trace and code snippet …”
M2.7 then reads the whole context, analyzes dependencies, and proposes edits.
Kilo’s analysis notes that the model tends to read surrounding files and trace call chains before writing changes, which helps on hard bugs.
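A small helper can assemble that request body; the structure mirrors the system and user messages above, and the function name and section markers are my own.

```python
def build_debug_messages(stack_trace: str, code_snippet: str) -> list:
    """Combines a system role with a sectioned user prompt for a debugging task."""
    system = ("You are a senior software engineer. "
              "Explain changes before you show code.")
    user = (
        "I have a failing test in my payment service.\n\n"
        "### Stack trace\n" + stack_trace + "\n\n"
        "### Code\n" + code_snippet
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

Clear section markers in the user message make it easier for the model to separate logs from code when it traces the failure.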
Example: Multi‑File Refactor with Agents
M2.7 is used inside agent frameworks such as OpenClaw and custom agent harnesses to perform multi‑step coding workflows.
A typical agent loop:
- The agent sends the current task, file list, and constraints to M2.7.
- M2.7 decides which files to inspect and which functions to modify.
- The agent executes those edits and re‑runs tests.
- M2.7 reviews new logs and continues until tests pass or a limit triggers.
This style matches the behavior seen in SWE‑Pro, VIBE‑Pro, and Kilo Bench evaluations, where M2.7 works across many files and test runs.
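The loop above can be sketched in plain Python. The three callables stand in for your model client, file writer, and test runner; all of the names are hypothetical.

```python
def run_agent_loop(task, call_model, apply_edits, run_tests, max_rounds=5):
    """Ask the model for edits, apply them, re-run tests, and repeat until green."""
    history = [task]                      # task description, file list, constraints
    for round_no in range(1, max_rounds + 1):
        edits = call_model(history)       # model picks files and proposes edits
        apply_edits(edits)                # harness writes the edits to disk
        passed, logs = run_tests()        # re-run the test suite
        if passed:
            return round_no               # number of rounds it took to go green
        history.append(logs)              # feed failures back as new context
    return None                          # round limit hit without passing
```

In practice you would also cap token spend per round, since M2.7 tends to read a lot of context in each trial.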
Example: Office Automation
For office tasks, the prompt can include instructions for Excel formulas, slide outlines, or Word document edits.
Workflow:
- Provide the current document structure or a sample in text.
- Ask M2.7 to add or edit sections, tables, or bullet points.
- Feed its output into a script that writes actual .xlsx, .pptx, or .docx files.
MiniMax reports that M2.7 handles multi‑round editing and keeps high‑fidelity output for Office files.
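When the model returns tabular content as a pipe‑delimited markdown table, a small parser can turn it into cell rows before a writer library such as openpyxl or python-docx produces the actual file. This parser is a simplified sketch and ignores escaped pipes.

```python
def parse_markdown_table(text: str) -> list:
    """Turns a pipe-delimited markdown table into a list of cell rows (sketch)."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue                      # skip blank lines
        if set(line.replace("|", "").strip()) <= {"-", ":", " "}:
            continue                      # skip the header separator row
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        rows.append(cells)
    return rows
```

The resulting rows map directly onto spreadsheet rows or document table cells in whichever writer library you use.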
Runtime Tips
- Keep prompts focused on one task at a time, but include needed context such as key files or requirements.
- Use the long context window for cross‑file tasks, but watch token counts, since Kilo’s testing shows M2.7 tends to consume around 2.8 million input tokens per trial.
- For sensitive or production workloads, start with smaller max tokens and increase after you confirm cost and stability.
Benchmark Results
The table below collects public benchmark numbers that include MiniMax M2.7.
MiniMax M2.7 Benchmark Scores
- SWE‑Pro: 56.22 percent
- VIBE‑Pro: 55.6 percent
- Terminal Bench 2: 57.0 percent
- GDPval‑AA (office tasks): 1495 ELO
- MMClaw skill compliance: 97 percent
- Toolathon: 46.3 percent
- PinchBench (best score): 86.2 percent
- Kilo Bench task pass rate: 47 percent
- Artificial Analysis Intelligence Index: 50
These scores place M2.7 in the top group for real‑world coding agents and strong among reasoning models in its price class.
Testing Details
MiniMax and third‑party teams use a mix of synthetic and real‑world tasks to test M2.7.
Software Engineering Benchmarks
- SWE‑Pro measures how often the model fixes real GitHub issues by editing multiple files and passing tests.
- VIBE‑Pro covers full project flows, including planning, coding, and validation tasks for complex systems.
- Terminal Bench 2 focuses on shell‑level debugging, log inspection, and command design in realistic production‑style setups.
MiniMax reports that M2.7 matches or approaches top closed models on SWE‑Pro and similar tasks.
Office and Knowledge Work
GDPval‑AA is a benchmark for office work such as document editing, spreadsheet operations, and presentation changes.
M2.7’s ELO 1495 result suggests stronger performance than other open models in document processing settings, according to MiniMax.
Tool and Agent Evaluations
- MMClaw tests 40 complex skills, each with long prompts, and measures how well the model follows skill definitions.
- Toolathon evaluates tool‑using agents across many tools and scenarios; M2.7’s 46.3 percent score places it with top tool‑use models.
In these tests, M2.7 must call tools in the correct order, pass the right arguments, and maintain memory across long runs.
Independent Benchmarks
Artificial Analysis runs a unified Intelligence Index across many tasks and reports a score of 50 for M2.7, well above the average of 20 for similar models.
They also report a generation speed near 49.9 tokens per second and note that the model tends to use more tokens than peers during evaluations.
Kilo’s independent tests on PinchBench and Kilo Bench highlight M2.7’s deep reading behavior and strong success on hard coding tasks that require broad context.
Comparison Table
This table compares MiniMax M2.7 with several strong coding and agent models on public metrics and pricing.
MiniMax M2.7 vs Other Coding / Agent Models
- MiniMax M2.7: 86.2 percent best score on PinchBench; 0.30 USD per 1M input tokens and 1.20 USD per 1M output tokens.
- Claude Opus 4.6: PinchBench score in the same range as M2.7, per Kilo's results; 5.00 USD per 1M input tokens and 25.00 USD per 1M output tokens.
From this view, M2.7 sits close to top‑tier models on PinchBench while keeping a much lower token price than Claude Opus 4.6.
Pricing Table
MiniMax provides both pay‑as‑you‑go pricing and subscription “Token Plan” options for M2.7, plus some free paths through partners.
MiniMax M2.7 Pricing Overview
- Pay‑as‑you‑go: 0.30 USD per 1M input tokens, 1.20 USD per 1M output tokens.
- Token Plan subscriptions: fixed request allowances and high‑speed tiers for larger teams.
- Free paths: limited access through partner playgrounds and free tiers.
For many developers, pay‑as‑you‑go M2.7 plus occasional high‑speed use covers most needs.
Larger teams can cap costs with fixed request plans and high‑speed tiers.
USP — What Makes MiniMax M2.7 Different
MiniMax M2.7 focuses on deep, context‑heavy workflows rather than short, single‑response chats.
Kilo’s analysis shows that the model reads more files and explores more paths in hard coding tasks, which lets it solve problems that other models miss.
Pros and Cons
Pros
- Strong results on SWE‑Pro, VIBE‑Pro, Terminal Bench 2, and other real engineering benchmarks.
- High office productivity performance with 1495 ELO on GDPval‑AA and strong multi‑round document editing.
- Excellent tool and agent behavior, with 97 percent skill adherence in MMClaw and good Toolathon accuracy.
- Near‑top PinchBench and Kilo Bench scores for coding agents, close to premium closed models.
- Long context window around 204,800 tokens for large codebases and long workflows.
- Pay‑as‑you‑go pricing much lower than frontier closed models such as Claude Opus 4.6.
- Fits into Anthropic‑style and OpenAI‑style APIs, which eases integration with many existing tools.
Cons
- Output speed near 49.9 tokens per second, which is slower than the median speed for similar reasoning models.
- Higher token usage per task in some agent runs; Kilo reports around 2.8 million input tokens per trial.
- Many benchmark claims, such as SWE‑Pro and VIBE‑Pro scores, come from MiniMax’s own tests and still need wider independent verification.
- Model weights are not fully open for local offline use; most users depend on cloud APIs or managed platforms.
Quick Comparison Chart
This chart summarizes where MiniMax M2.7 fits best compared to nearby options.
Recommended Use by Task Type
Demo or Real‑World Example
This demo shows how you might use MiniMax M2.7 to fix a real bug in a web service through an API‑driven workflow.
Scenario
You have a backend service with a failing integration test after a change in the payment module. You want an agent that can read the test failure, inspect code, and propose a patch that passes tests.
Step 1: Collect Inputs
- Gather the failing test output, including stack trace and error message.
- Export the main payment service file and any helper modules that touch the payment logic.
- Note database schema details and external API constraints that the code must respect.
Step 2: Design the Prompt
Prepare a structured prompt for M2.7 that includes:
- A short task description: “Find and fix the bug that breaks this payment test.”
- The failing test log.
- The content of the payment service file and any related modules.
M2.7 has been shown to read surrounding files and understand dependencies before writing fixes, which aligns with this task style.
Step 3: Send the Request
Through your chat‑completion client:
- Set the system message to define the role, such as “You are a senior backend engineer who writes safe, tested changes.”
- Put the combined logs and code into the user message, marked with clear sections.
- Limit max_tokens to a safe value for cost while leaving enough room for explanations and patches.
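Under those constraints, the request body for this demo might look like the following sketch; the helper name, section markers, and the token limit are illustrative choices, not MiniMax defaults.

```python
def build_bugfix_request(test_log: str, service_code: str) -> dict:
    """Assembles the chat body for the payment-bug demo (hypothetical helper)."""
    user = (
        "Find and fix the bug that breaks this payment test.\n\n"
        "## Failing test log\n" + test_log + "\n\n"
        "## Payment service code\n" + service_code
    )
    return {
        "model": "MiniMax-M2.7",
        "messages": [
            {"role": "system",
             "content": "You are a senior backend engineer who writes safe, "
                        "tested changes."},
            {"role": "user", "content": user},
        ],
        "max_tokens": 4096,  # room for an explanation plus the patch
    }
```

Sending this body to the chat‑completions endpoint from the earlier setup section completes the step.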
Step 4: Review the Response
M2.7 should:
- Explain what caused the failure based on the stack trace and code path.
- Point to the exact function or block that needs adjustment.
- Propose a patch with code changes and sometimes new tests.
You then:
- Apply the patch in a feature branch.
- Run tests in your CI or local environment.
- If tests still fail, send new logs back to M2.7 for another iteration.
This loop mirrors the workflows behind benchmarks like SWE‑Pro and Terminal Bench, where the model edits code and uses tests as feedback.
Step 5: Automate with an Agent Harness
For repeatable use:
- Wrap this process in a small agent script that:
- Calls M2.7 with logs and code.
- Applies edits to files.
- Runs tests and collects output.
- Store run history to track which prompts and patches lead to stable fixes.
MiniMax’s own harness work and MMClaw tests show that M2.7 handles multi‑step agent plans with high skill adherence, so this approach maps well to its strengths.
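Storing run history, as the last step suggests, can be as simple as appending one JSON line per run; the file layout and field names here are my own.

```python
import json
import time

def log_run(path: str, prompt_summary: str, patch: str, tests_passed: bool) -> None:
    """Appends one agent run to a JSONL history file for later review."""
    record = {
        "timestamp": time.time(),
        "prompt": prompt_summary,
        "patch": patch,
        "passed": tests_passed,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

A JSONL file keeps every run greppable, which makes it easy to see which prompt shapes lead to stable fixes over time.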
Conclusion
MiniMax M2.7 targets developers and teams that need strong coding agents, tool use, and office automation at a moderate price per token.
Its benchmark scores on SWE‑Pro, PinchBench, and other evaluations show that it sits close to premium closed models in many real tasks.
If your main need is deep analysis across large codebases or complex workflows, M2.7 is a strong candidate to test in your stack.
FAQ
1. Is MiniMax M2.7 open source?
No, MiniMax M2.7 is accessed through cloud APIs, not through fully open weights you can host yourself.
2. Can I use M2.7 for general chat?
Yes, but the model focuses on coding, agents, and productivity tasks, so it shines most in structured workflows rather than small talk.
3. How expensive is M2.7 compared to Claude Opus?
M2.7 costs about 0.30 USD per 1M input tokens and 1.20 USD per 1M output tokens, while Claude Opus 4.6 costs 5.00 and 25.00 USD for the same amounts.
4. Does M2.7 support long documents and big codebases?
Yes, it supports a context window around 204,800 tokens, which fits large repositories and long reports in a single session.
5. Where should I start if I just want to try it?
Use the MiniMax free tier or a playground like AI SDK or Puter.js, set MiniMax-M2.7 as the model, and try a small coding or document task.