How to Install and Run MiniMax M2.7 for Coding and AI Agents: Benchmarks and Tests
MiniMax M2.7 is a new AI model focused on coding, agents, and complex workflows. It competes with top models on engineering and agent benchmarks while keeping a lower token price.
This guide explains what MiniMax M2.7 is, how to install and run it, and how it performs on public tests. You will also see comparisons, pricing details, and an end‑to‑end demo you can follow.
What Is MiniMax M2.7
MiniMax M2.7 is a large language model from MiniMax, built for real‑world software engineering and productivity tasks. MiniMax reports that M2.7 scored 56.22 percent on the SWE‑Pro coding benchmark and 55.6 percent on VIBE‑Pro, which measure end‑to‑end software tasks.
MiniMax designed M2.7 to improve its own training process through a loop of experiments and feedback, which they call model self‑evolution. Reinforcement learning runs let the model update its internal “agent harness” and skills based on experiment results.
The model also targets professional office work. On the GDPval‑AA evaluation for office productivity, MiniMax reports an ELO score of 1495, which they state is the highest among open models.
Key Features
- Strong coding and debugging: M2.7 scores 56.22 percent on SWE‑Pro, a benchmark that tests real software bug fixes and code tasks.
- End‑to‑end project delivery: On VIBE‑Pro, it reaches 55.6 percent, which covers full project‑building workflows from requirements to working code.
- Deep system understanding: On Terminal Bench 2, which focuses on terminal‑based system debugging, M2.7 scores 57.0 percent.
- Office document editing: It improves complex editing for Excel, PowerPoint, and Word, and reaches 1495 ELO on GDPval‑AA office tasks.
- Tool and agent performance: In MMClaw tests with 40 complex skills over 2,000 tokens each, the model keeps 97 percent skill compliance. Toolathon results show 46.3 percent accuracy, placing it among top tool‑using agents.
- Agent coding benchmarks: On PinchBench, an OpenClaw coding‑agent benchmark, M2.7 reaches a best score of 86.2 percent, close to Claude Opus 4.6 and other top models.
- Custom agent behavior: Kilo Bench results show a 47 percent task pass rate, with a tendency to read more context before acting, which helps on complex problems.
- Long context window: AI SDK data lists a 204,800‑token context window, which supports large codebases and long conversations. A context window is the maximum amount of text (input plus output) the model can handle in one session.
- API and tooling compatibility: MiniMax exposes M2.7 through Anthropic‑style and OpenAI‑style APIs, so many existing tools can swap it in as a drop‑in model.
- Competitive price: MiniMax pay‑as‑you‑go pricing sets MiniMax‑M2.7 at 0.30 USD per million input tokens and 1.20 USD per million output tokens. A token is a short chunk of text, often a few characters or part of a word.
- Reasoning and intelligence index: Artificial Analysis reports a composite Intelligence Index score of 50, far above its reported average of 20 for comparable models.
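As a quick sanity check on the pay‑as‑you‑go prices quoted above, here is a short Python sketch; the helper name is my own, and the 2.8M‑token figure is the per‑trial input consumption Kilo reports later in this article.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price: float = 0.30,
                      output_price: float = 1.20) -> float:
    """Pay-as-you-go cost in USD, using per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price \
         + (output_tokens / 1_000_000) * output_price

# A heavy agent trial: 2.8M input tokens and an assumed 50k output tokens.
cost = estimate_cost_usd(2_800_000, 50_000)
print(f"{cost:.2f} USD")  # about 0.90 USD per trial
```

Even a context‑hungry trial stays under a dollar at these rates, which is the practical meaning of "competitive price" here.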
How to Install or Set Up
MiniMax M2.7 runs through cloud APIs, so you install the client and configure endpoints, not the model weights. Below is a basic setup path for most developers.
Step 1: Create a MiniMax Account
- Go to the MiniMax developer platform and sign up with email or supported login providers.
- Complete identity checks if the platform asks for them, for billing and abuse control.
- Log in to the dashboard and open the API or “Models” section.
Step 2: Get an API Key
- In the dashboard, locate the API Keys area under settings or models.
- Create a new key for server use and copy it to a safe place.
- Store this key in environment variables, for example MINIMAX_API_KEY, not in code.
An API key is a secret string that identifies your account for each request.
Step 3: Choose the Right Endpoint
MiniMax offers different endpoints based on region and protocol.
- For international users, the standard base URL is https://api.minimax.io/v1 for OpenAI‑style chat APIs.
- For users in mainland China, the base URL is https://api.minimaxi.com/v1.
- For Anthropic‑compatible setups, some tools use https://api.minimax.io/anthropic.
Step 4: Configure an SDK or Client
Many tools treat M2.7 as an Anthropic‑compatible or OpenAI‑compatible model.
- For AI coding tools that rely on Anthropic variables, MiniMax shows an example configuration with ANTHROPIC_BASE_URL set to https://api.minimax.io/anthropic and the model name MiniMax-M2.7.
- For generic OpenAI‑style clients, set the base URL to https://api.minimax.io/v1 and use MiniMax-M2.7 as the model name.
An SDK is a ready‑made library that wraps HTTP calls and helps with authentication, retries, and streaming.
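To make Step 4 concrete, here is a minimal Python sketch that assembles OpenAI‑style client settings from an environment variable. The helper name minimax_config is my own; the base URL and model name are the values quoted above.

```python
import os

def minimax_config(api_key: str) -> dict:
    """Builds connection settings for an OpenAI-style client (hypothetical helper)."""
    return {
        "base_url": "https://api.minimax.io/v1",  # international endpoint
        "api_key": api_key,
        "model": "MiniMax-M2.7",
    }

# Read the key from the environment rather than hard-coding it.
cfg = minimax_config(os.environ.get("MINIMAX_API_KEY", ""))
```

Any OpenAI‑compatible SDK can then be pointed at cfg["base_url"] with cfg["api_key"] and cfg["model"].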
Step 5: Optional – Use a Hosted Toolchain
You can avoid direct API setup in some cases.
- AI SDK playground lists MiniMax M2.7 with built‑in pricing and context settings, so you can swap it in by model name.
- Some coding tools, such as OpenCode and other mini IDE agents, include direct provider options for MiniMax.
Step 6: Confirm Access
- Use a small test prompt from your terminal or tool of choice.
- Check that the model responds within allowed time and does not return authentication errors.
- If you see HTTP errors, confirm base URL, API key, and model name fields in your config.
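The confirmation step can be scripted with only the Python standard library. This sketch builds the request first so you can inspect the URL, key, and model name before sending; the endpoint and model name are the ones listed in Steps 3 and 4.

```python
import json
import urllib.request

def build_ping_request(api_key: str) -> urllib.request.Request:
    """Builds a minimal chat request to verify the base URL, key, and model name."""
    body = {
        "model": "MiniMax-M2.7",
        "messages": [{"role": "user", "content": "Reply with the word OK."}],
        "max_tokens": 8,
    }
    return urllib.request.Request(
        "https://api.minimax.io/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it (a network call that costs a few tokens):
#   with urllib.request.urlopen(build_ping_request(key), timeout=30) as resp:
#       print(resp.status)  # 200 means the URL, key, and model name line up
```

An HTTP 401 or 403 here points at the API key; a 404 usually means a wrong base URL or model name.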
How to Run or Use It
This section shows how to run MiniMax M2.7 through a standard chat‑style API and how to design prompts for real tasks.
Basic Chat Completion Call (Conceptual)
You can call M2.7 through any HTTP client that supports JSON.
Here is a typical request shape in plain language, based on the OpenAI‑style API structure.
- URL: https://api.minimax.io/v1/chat/completions (for international users).
- Headers: Authorization: Bearer <MINIMAX_API_KEY> plus Content-Type: application/json.
- Body fields: model ("MiniMax-M2.7"), messages (a list with a system prompt and a user prompt), max_tokens (the response limit), and temperature (controls output variety).
A system prompt is a short message that defines the role and rules for the model.
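Assuming the response follows the usual OpenAI‑style shape (check MiniMax's documentation for the exact field names), extracting the reply and token usage looks like this:

```python
# A mock response in the common OpenAI-style shape (field names are assumptions).
response = {
    "choices": [
        {"message": {"role": "assistant", "content": "A context window is..."}}
    ],
    "usage": {"prompt_tokens": 25, "completion_tokens": 7},
}

reply = response["choices"][0]["message"]["content"]
used = response["usage"]["prompt_tokens"] + response["usage"]["completion_tokens"]
```

Tracking the usage field per call is the simplest way to keep an eye on the token costs discussed above.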
Example: General Coding Assistant
You might send a request like this in JSON form (described, not exact code).
- System message: “You are a senior software engineer. Explain changes before you show code.”
- User message: “I have a failing test in my payment service. Here is the stack trace and code snippet …”
M2.7 then reads the whole context, analyzes dependencies, and proposes edits.
Kilo’s analysis notes that the model tends to read surrounding files and trace call chains before writing changes, which helps on hard bugs.
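A small helper can assemble that request body; the structure mirrors the system and user messages above, and the function name and section markers are my own.

```python
def build_debug_messages(stack_trace: str, code_snippet: str) -> list:
    """Combines a system role with a sectioned user prompt for a debugging task."""
    system = ("You are a senior software engineer. "
              "Explain changes before you show code.")
    user = (
        "I have a failing test in my payment service.\n\n"
        "### Stack trace\n" + stack_trace + "\n\n"
        "### Code\n" + code_snippet
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

Clear section markers in the user message make it easier for the model to separate logs from code when it traces the failure.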
Example: Multi‑File Refactor with Agents
M2.7 is used inside agent frameworks such as OpenClaw and custom agent harnesses to perform multi‑step coding workflows.
A typical agent loop:
- The agent sends the current task, file list, and constraints to M2.7.
- M2.7 decides which files to inspect and which functions to modify.
- The agent executes those edits and re‑runs tests.
- M2.7 reviews new logs and continues until tests pass or a limit triggers.
This style matches the behavior seen in SWE‑Pro, VIBE‑Pro, and Kilo Bench evaluations, where M2.7 works across many files and test runs.
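The loop above can be sketched in plain Python. The three callables stand in for your model client, file writer, and test runner; all of the names are hypothetical.

```python
def run_agent_loop(task, call_model, apply_edits, run_tests, max_rounds=5):
    """Ask the model for edits, apply them, re-run tests, and repeat until green."""
    history = [task]                      # task description, file list, constraints
    for round_no in range(1, max_rounds + 1):
        edits = call_model(history)       # model picks files and proposes edits
        apply_edits(edits)                # harness writes the edits to disk
        passed, logs = run_tests()        # re-run the test suite
        if passed:
            return round_no               # number of rounds it took to go green
        history.append(logs)              # feed failures back as new context
    return None                          # round limit hit without passing
```

In practice you would also cap token spend per round, since M2.7 tends to read a lot of context in each trial.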
Example: Office Automation
For office tasks, the prompt can include instructions for Excel formulas, slide outlines, or Word document edits.
Workflow:
- Provide the current document structure or a sample in text.
- Ask M2.7 to add or edit sections, tables, or bullet points.
- Feed its output into a script that writes actual .xlsx, .pptx, or .docx files.
MiniMax reports that M2.7 handles multi‑round editing and keeps high‑fidelity output for Office files.
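When the model returns tabular content as a pipe‑delimited markdown table, a small parser can turn it into cell rows before a writer library such as openpyxl or python-docx produces the actual file. This parser is a simplified sketch and ignores escaped pipes.

```python
def parse_markdown_table(text: str) -> list:
    """Turns a pipe-delimited markdown table into a list of cell rows (sketch)."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue                      # skip blank lines
        if set(line.replace("|", "").strip()) <= {"-", ":", " "}:
            continue                      # skip the header separator row
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        rows.append(cells)
    return rows
```

The resulting rows map directly onto spreadsheet rows or document table cells in whichever writer library you use.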
Runtime Tips
- Keep prompts focused on one task at a time, but include needed context such as key files or requirements.
- Use the long context window for cross‑file tasks, but watch token counts, since Kilo’s testing shows M2.7 tends to consume around 2.8 million input tokens per trial.
- For sensitive or production workloads, start with smaller max tokens and increase after you confirm cost and stability.
Benchmark Results
The table below collects public benchmark numbers that include MiniMax M2.7.
MiniMax M2.7 Benchmark Scores
- SWE‑Pro: 56.22 percent
- VIBE‑Pro: 55.6 percent
- Terminal Bench 2: 57.0 percent
- GDPval‑AA (office tasks): 1495 ELO
- MMClaw skill compliance: 97 percent
- Toolathon: 46.3 percent
- PinchBench (best score): 86.2 percent
- Kilo Bench task pass rate: 47 percent
- Artificial Analysis Intelligence Index: 50
These scores place M2.7 in the top group for real‑world coding agents and strong among reasoning models in its price class.
Testing Details
MiniMax and third‑party teams use a mix of synthetic and real‑world tasks to test M2.7.
Software Engineering Benchmarks
- SWE‑Pro measures how often the model fixes real GitHub issues by editing multiple files and passing tests.
- VIBE‑Pro covers full project flows, including planning, coding, and validation tasks for complex systems.
- Terminal Bench 2 focuses on shell‑level debugging, log inspection, and command design in realistic production‑style setups.
MiniMax reports that M2.7 matches or approaches top closed models on SWE‑Pro and similar tasks.
Office and Knowledge Work
GDPval‑AA is a benchmark for office work such as document editing, spreadsheet operations, and presentation changes.
M2.7’s ELO 1495 result suggests stronger performance than other open models in document processing settings, according to MiniMax.
Tool and Agent Evaluations
- MMClaw tests 40 complex skills, each with long prompts, and measures how well the model follows skill definitions.
- Toolathon evaluates tool‑using agents across many tools and scenarios; M2.7’s 46.3 percent score places it with top tool‑use models.
In these tests, M2.7 must call tools in the correct order, pass the right arguments, and maintain memory across long runs.
Independent Benchmarks
Artificial Analysis runs a unified Intelligence Index across many tasks and reports a score of 50 for M2.7, well above the average of 20 for similar models.
They also report a generation speed near 49.9 tokens per second and note that the model tends to use more tokens than peers during evaluations.
Kilo’s independent tests on PinchBench and Kilo Bench highlight M2.7’s deep reading behavior and strong success on hard coding tasks that require broad context.
Comparison Table
This table compares MiniMax M2.7 with several strong coding and agent models on public metrics and pricing.
MiniMax M2.7 vs Other Coding / Agent Models
- MiniMax M2.7: 86.2 percent best score on PinchBench; 0.30 USD per 1M input tokens and 1.20 USD per 1M output tokens.
- Claude Opus 4.6: PinchBench score in the same range as M2.7, per Kilo's results; 5.00 USD per 1M input tokens and 25.00 USD per 1M output tokens.
From this view, M2.7 sits close to top‑tier models on PinchBench while keeping a much lower token price than Claude Opus 4.6.
Pricing Table
MiniMax provides both pay‑as‑you‑go pricing and subscription “Token Plan” options for M2.7, plus some free paths through partners.
MiniMax M2.7 Pricing Overview
- Pay‑as‑you‑go: 0.30 USD per 1M input tokens, 1.20 USD per 1M output tokens.
- Token Plan subscriptions: fixed request allowances and high‑speed tiers for larger teams.
- Free paths: limited access through partner playgrounds and free tiers.
For many developers, pay‑as‑you‑go M2.7 plus occasional high‑speed use covers most needs.
Larger teams can cap costs with fixed request plans and high‑speed tiers.
USP — What Makes MiniMax M2.7 Different
MiniMax M2.7 focuses on deep, context‑heavy workflows rather than short, single‑response chats.
Kilo’s analysis shows that the model reads more files and explores more paths in hard coding tasks, which lets it solve problems that other models miss.
Pros and Cons
Pros
- Strong results on SWE‑Pro, VIBE‑Pro, Terminal Bench 2, and other real engineering benchmarks.
- High office productivity performance with 1495 ELO on GDPval‑AA and strong multi‑round document editing.
- Excellent tool and agent behavior, with 97 percent skill adherence in MMClaw and good Toolathon accuracy.
- Near‑top PinchBench and Kilo Bench scores for coding agents, close to premium closed models.
- Long context window around 204,800 tokens for large codebases and long workflows.
- Pay‑as‑you‑go pricing much lower than frontier closed models such as Claude Opus 4.6.
- Fits into Anthropic‑style and OpenAI‑style APIs, which eases integration with many existing tools.
Cons
- Output speed near 49.9 tokens per second, which is slower than the median speed for similar reasoning models.
- Higher token usage per task in some agent runs; Kilo reports around 2.8 million input tokens per trial.
- Many benchmark claims, such as SWE‑Pro and VIBE‑Pro scores, come from MiniMax’s own tests and still need wider independent verification.
- Model weights are not fully open for local offline use; most users depend on cloud APIs or managed platforms.
Quick Comparison Chart
This chart summarizes where MiniMax M2.7 fits best compared to nearby options.
Recommended Use by Task Type
Demo or Real‑World Example
This demo shows how you might use MiniMax M2.7 to fix a real bug in a web service through an API‑driven workflow.
Scenario
You have a backend service with a failing integration test after a change in the payment module. You want an agent that can read the test failure, inspect code, and propose a patch that passes tests.
Step 1: Collect Inputs
- Gather the failing test output, including stack trace and error message.
- Export the main payment service file and any helper modules that touch the payment logic.
- Note database schema details and external API constraints that the code must respect.
Step 2: Design the Prompt
Prepare a structured prompt for M2.7 that includes:
- A short task description: “Find and fix the bug that breaks this payment test.”
- The failing test log.
- The content of the payment service file and any related modules.
M2.7 has been shown to read surrounding files and understand dependencies before writing fixes, which aligns with this task style.
Step 3: Send the Request
Through your chat‑completion client:
- Set the system message to define the role, such as “You are a senior backend engineer who writes safe, tested changes.”
- Put the combined logs and code into the user message, marked with clear sections.
- Limit max_tokens to a safe value for cost while leaving enough room for explanations and patches.
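Under those constraints, the request body for this demo might look like the following sketch; the helper name, section markers, and the token limit are illustrative choices, not MiniMax defaults.

```python
def build_bugfix_request(test_log: str, service_code: str) -> dict:
    """Assembles the chat body for the payment-bug demo (hypothetical helper)."""
    user = (
        "Find and fix the bug that breaks this payment test.\n\n"
        "## Failing test log\n" + test_log + "\n\n"
        "## Payment service code\n" + service_code
    )
    return {
        "model": "MiniMax-M2.7",
        "messages": [
            {"role": "system",
             "content": "You are a senior backend engineer who writes safe, "
                        "tested changes."},
            {"role": "user", "content": user},
        ],
        "max_tokens": 4096,  # room for an explanation plus the patch
    }
```

Sending this body to the chat‑completions endpoint from the earlier setup section completes the step.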
Step 4: Review the Response
M2.7 should:
- Explain what caused the failure based on the stack trace and code path.
- Point to the exact function or block that needs adjustment.
- Propose a patch with code changes and sometimes new tests.
You then:
- Apply the patch in a feature branch.
- Run tests in your CI or local environment.
- If tests still fail, send new logs back to M2.7 for another iteration.
This loop mirrors the workflows behind benchmarks like SWE‑Pro and Terminal Bench, where the model edits code and uses tests as feedback.
Step 5: Automate with an Agent Harness
For repeatable use:
- Wrap this process in a small agent script that:
- Calls M2.7 with logs and code.
- Applies edits to files.
- Runs tests and collects output.
- Store run history to track which prompts and patches lead to stable fixes.
MiniMax’s own harness work and MMClaw tests show that M2.7 handles multi‑step agent plans with high skill adherence, so this approach maps well to its strengths.
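Storing run history, as the last step suggests, can be as simple as appending one JSON line per run; the file layout and field names here are my own.

```python
import json
import time

def log_run(path: str, prompt_summary: str, patch: str, tests_passed: bool) -> None:
    """Appends one agent run to a JSONL history file for later review."""
    record = {
        "timestamp": time.time(),
        "prompt": prompt_summary,
        "patch": patch,
        "passed": tests_passed,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

A JSONL file keeps every run greppable, which makes it easy to see which prompt shapes lead to stable fixes over time.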
Conclusion
MiniMax M2.7 targets developers and teams that need strong coding agents, tool use, and office automation at a moderate price per token.
Its benchmark scores on SWE‑Pro, PinchBench, and other evaluations show that it sits close to premium closed models in many real tasks.
If your main need is deep analysis across large codebases or complex workflows, M2.7 is a strong candidate to test in your stack.
FAQ
1. Is MiniMax M2.7 open source?
No, MiniMax M2.7 is accessed through cloud APIs, not through fully open weights you can host yourself.
2. Can I use M2.7 for general chat?
Yes, but the model focuses on coding, agents, and productivity tasks, so it shines most in structured workflows rather than small talk.
3. How expensive is M2.7 compared to Claude Opus?
M2.7 costs about 0.30 USD per 1M input tokens and 1.20 USD per 1M output tokens, while Claude Opus 4.6 costs 5.00 and 25.00 USD for the same amounts.
4. Does M2.7 support long documents and big codebases?
Yes, it supports a context window around 204,800 tokens, which fits large repositories and long reports in a single session.
5. Where should I start if I just want to try it?
Use the MiniMax free tier or a playground like AI SDK or Puter.js, set MiniMax-M2.7 as the model, and try a small coding or document task.