Run Schematron-3B Guide (2026): Local Web Scraping AI
Learn how to install, run, benchmark, and compare Schematron-3B, a 3B local AI model for HTML‑to‑JSON web scraping. Includes setup steps, code demo, benchmarks, pricing, and competitor comparison.
1. What Is Schematron‑3B?
Schematron‑3B is a 3‑billion‑parameter language model specialized for turning messy HTML pages into clean, strictly schema‑valid JSON.
Instead of trying to chat, translate, code, and scrape all at once (like general LLMs), it focuses on one thing: reliable web data extraction.
Key ideas:
- Schema‑first extraction: you provide a JSON Schema (the structure of the data you want), plus HTML. The model outputs JSON that always matches the schema.
- Long context (up to 128K tokens): it can handle very long, noisy pages without truncation, such as product lists, full article pages, or documentation.
- Small but specialized: at 3B parameters, it runs locally on ordinary hardware but reaches extraction quality close to much larger models.
Benchmarks show:
- Schematron‑3B scores 4.41/5 on structured extraction quality. The 8B version scores 4.64/5, only 0.1 behind GPT‑4.1 (4.74) on the same task.
- When used as an HTML‑to‑JSON extractor in a web‑augmented QA pipeline, accuracy jumps from 8.54% (using only GPT‑5 Nano) to 82.87% when Schematron handles structured extraction.
This makes Schematron‑3B a strong fit if the main goal is:
- Scraping e‑commerce sites for prices and product details
- Aggregating articles, listings, or reviews into a database
- Feeding clean structured data into downstream LLMs or analytics systems
2. Core Features of Schematron‑3B
2.1 Schema‑First, 100% JSON‑Conformant Outputs
The model is trained to always obey a given JSON Schema. You describe the fields and types you want (strings, numbers, arrays, nested objects), then feed that schema plus HTML.
Benefits:
- No extra “cleanup” step to remove explanation text or hallucinated fields
- Safer to plug directly into downstream systems that expect strict JSON
- Easier validation: standard JSON Schema validators can be used to check outputs
According to the model card, the output is strict JSON that conforms to your schema, with no conversational fluff. This is reinforced in community demos which show the model returning just the data fields requested.
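The validation claim above can be sketched with a minimal stdlib‑only checker. This is an illustrative helper (the name `conforms` and its simplified semantics are assumptions, not part of any library); a real pipeline would use a full validator such as the `jsonschema` package:

```python
import json

def conforms(instance, schema: dict) -> bool:
    """Minimal sketch of JSON Schema checking: required fields present and
    primitive types matching. Use a real validator (e.g. `jsonschema`) in production."""
    type_map = {
        "string": str, "number": (int, float), "integer": int,
        "boolean": bool, "array": list, "object": dict,
    }
    if not isinstance(instance, dict):
        return False
    # Every required field must be present.
    for field in schema.get("required", []):
        if field not in instance:
            return False
    # Every declared field that is present must have the declared type.
    for field, spec in schema.get("properties", {}).items():
        if field in instance:
            expected = type_map.get(spec.get("type"))
            if expected and not isinstance(instance[field], expected):
                return False
    return True

product_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
    "required": ["name", "price"],
}

print(conforms({"name": "Widget", "price": 9.99}, product_schema))  # True
print(conforms({"name": "Widget"}, product_schema))                 # False: price missing
```

Because the model's output is already strict JSON, a check like this (or a full validator) is the only post‑processing step you need before handing data to downstream systems.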
2.2 Long‑Context HTML Handling (Up to 128K Tokens)
Schematron models support context windows up to 128K tokens.
This matters because:
- Real‑world HTML often includes huge navigation menus, tracking scripts, and long product lists.
- General LLM APIs can struggle or become expensive when fed raw 100K+ token HTML.
The model is trained with curriculum strategies specifically to remain accurate at long contexts, and benchmarks confirm that it maintains quality even at those lengths.
2.3 Cost‑Performance “King” for Web Scraping
The 3B variant is described as the “cost‑performance king” in the Schematron family: it delivers nearly the same extraction quality as 8B, at about half the inference cost.
From internal benchmarks and public discussion:
- Schematron‑3B: recommended default for most scraping and ingestion work
- Schematron‑8B: marginally higher quality for edge cases, but ~2× cost per request
A Reddit engineering report shows that processing 1 million pages per day with a frontier model (GPT‑5) would cost roughly 20,000 USD, while using Schematron‑8B brings that down to about 480 USD, and Schematron‑3B to around 240 USD for the same workload. That is roughly 40–80× cheaper than frontier APIs for this specific task.
2.4 Local‑Friendly Deployment
Schematron‑3B is available as:
- An open‑source model on Hugging Face (`inference-net/Schematron-3B`)
- A backend for MCP (Model Context Protocol) servers using MLX on Apple Silicon
- A model that can run in popular local LLM GUIs such as LM Studio (demonstrated for the 8B version at around 8 GB RAM on a Mac)
General VRAM guidance suggests that 3–4B models run comfortably on entry‑level GPUs (3–4 GB VRAM) at moderate context windows, with CPU‑only setups also possible at lower speed. Given the 8B variant is reported to run locally on a Mac with about 8 GB of RAM, the 3B variant is even more accessible for local setups.
3. Quick Comparison Chart
A short, high‑level comparison of Schematron‑3B vs other options for HTML‑to‑JSON extraction:
Note: API models can also scrape HTML, but they are not schema‑first, and cost can be much higher at scraping scale.
4. How to Install and Run Schematron‑3B Locally
This section focuses on a practical, step‑by‑step setup for local use.
4.1 Option 1 – Quick Test via Hugging Face / Hosted API
If you only want to test the model before installing it:
- The model is hosted as `inference-net/Schematron-3B` on Hugging Face.
- Some third‑party sites list it with “free API” or quick‑start links for testing.
- The creators also offer a serverless API with $10 free credits, so you can try it without infrastructure.
This is useful to prototype schemas and prompts before committing to a full local deployment.
4.2 Option 2 – Python + Hugging Face Transformers (Local)
Prerequisites:
- Python 3.10+
- `torch` installed (with CUDA or Metal/MPS if you want GPU acceleration)
- `transformers` and optional quantization libraries
1. Create a virtual environment
```bash
python -m venv schematron-env
source schematron-env/bin/activate  # On Windows: schematron-env\Scripts\activate
```
2. Install dependencies
```bash
pip install "torch" "transformers" "accelerate" "sentencepiece" "lxml"
# Optional for 4-bit quantization:
pip install "bitsandbytes"
```
3. Download the model
Using transformers in code automatically pulls from Hugging Face:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "inference-net/Schematron-3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
)
```
The model card confirms it accepts HTML plus JSON Schema and outputs strict JSON.
5. Demo: Extracting Product Data from HTML
Below is a simplified example inspired by the official demos and video walkthroughs.
5.1 Define Your JSON Schema
The schema tells the model exactly what fields to extract.
```python
import json

product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price"],
}
```
5.2 Clean the HTML (Optional but Recommended)
The YouTube demo uses lxml to remove scripts and styles before sending HTML to the model.
```python
from lxml import html, etree

def clean_html(raw_html: str) -> str:
    doc = html.fromstring(raw_html)
    etree.strip_elements(doc, "script", "style", "noscript", with_tail=False)
    return etree.tostring(doc, encoding="unicode")
```
5.3 Build the Prompt and Run Inference
```python
from transformers import pipeline

def build_prompt(schema: dict, html_text: str) -> str:
    return f"""
You are an HTML-to-JSON extraction model.
- Input: HTML of a web page.
- Goal: Return ONLY valid JSON that strictly conforms to this JSON Schema:
{json.dumps(schema, indent=2)}
HTML:
<document>
{html_text}
</document>
Return only JSON. No explanation.""".strip()

generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=False,  # Deterministic output, as recommended in the demos
)

raw_html = open("sample_product_page.html").read()
cleaned_html = clean_html(raw_html)
prompt = build_prompt(product_schema, cleaned_html)
output = generator(prompt)[0]["generated_text"]

# Extract the JSON segment (if needed)
start = output.find("{")
end = output.rfind("}")
json_str = output[start:end + 1]

product = json.loads(json_str)
print(product)
```
In real demos, this approach extracts field values like product name and price from arbitrary product pages and returns valid JSON that exactly matches the schema.
6. Installing Schematron‑3B via MCP
If you use an AI IDE or agent framework that supports MCP (Model Context Protocol), a dedicated Schematron MCP server exists.
According to the MCP project:
- The server runs Schematron‑3B locally using MLX (Apple Silicon‑optimized inference).
- It exposes a tool for “HTML‑to‑JSON extraction” so agents can call it with HTML + schema and get JSON back.
- The model path can be configured via the `SCHEMATRON_MODEL_PATH` environment variable for custom installations.
This is ideal if:
- You want your main LLM (e.g., GPT‑4.1) to call Schematron as a tool for scraping.
- You prefer a GUI or agent interface instead of writing raw Python scripts.
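As an illustration, an MCP client configuration for such a server might look like the following. The server command and package name here are placeholders, not documented values — consult the Schematron MCP project’s README for the real ones; only the `SCHEMATRON_MODEL_PATH` variable is mentioned above:

```json
{
  "mcpServers": {
    "schematron": {
      "command": "uvx",
      "args": ["schematron-mcp-server"],
      "env": {
        "SCHEMATRON_MODEL_PATH": "/path/to/models/Schematron-3B"
      }
    }
  }
}
```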
7. Hardware Requirements and Performance
7.1 RAM and VRAM Considerations
Key data points from public sources:
- 8B Schematron has been demonstrated running locally on Mac via LM Studio using around 8 GB of RAM, which is in the same range as a heavy browser session.
- General VRAM guidance shows that 3–4B parameter models work on entry‑level GPUs with 3–4 GB VRAM at moderate context windows (e.g., 4K tokens), while longer contexts demand more memory.
Implications for Schematron‑3B:
- For small to medium pages (few thousand tokens), a mid‑range GPU or even CPU‑only setup is realistic.
- For very long pages or heavy batch jobs at 128K context, a stronger GPU or more system RAM will help.
- Quantized variants (e.g., 4‑bit) can reduce memory significantly while keeping most of the accuracy, similar to other 3B models in the ecosystem.
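As a back‑of‑the‑envelope check (a rule‑of‑thumb sketch, not vendor guidance): weight memory is roughly parameter count times bytes per weight, plus some overhead for activations and KV cache at moderate context lengths:

```python
def est_memory_gb(params_billion: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough weight-memory estimate: params * bytes/weight, plus ~20%
    overhead for activations and KV cache at moderate context lengths."""
    bytes_total = params_billion * 1e9 * (bits_per_weight / 8)
    return round(bytes_total * overhead / 1e9, 1)

print(est_memory_gb(3, 16))  # fp16: ~7.2 GB
print(est_memory_gb(3, 4))   # 4-bit quantized: ~1.8 GB
```

These numbers are consistent with the guidance above: a 4‑bit quantized 3B model fits comfortably on an entry‑level GPU, while fp16 wants a mid‑range card or unified memory on Apple Silicon. Long 128K contexts add KV‑cache memory on top of this estimate.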
7.2 Speed vs. Frontier Models
The Reddit benchmark describes:
- Average 0.54 seconds per page for Schematron models vs about 6 seconds per page for GPT‑5 on the same workload, roughly 10× faster per page.
- In addition, because Schematron outputs compact JSON instead of long raw HTML, downstream LLMs process far fewer tokens, saving more time and money.
8. Benchmarks and How to Test Schematron‑3B Yourself
8.1 Published Benchmarks
From model card and blog analysis:
- Extraction quality (0–5 rating, LLM‑as‑judge):
- Schematron‑3B: 4.41
- Schematron‑8B: 4.64
- GPT‑4.1: 4.74 (reference frontier model)
Web‑augmented QA (SimpleQA pipeline):
| Setup | Accuracy |
|---|---|
| GPT‑5 Nano alone (no Schematron) | 8.54% |
| GPT‑5 Nano + Schematron‑8B extraction | 82.87% |
| GPT‑5 Nano + Schematron + SERP provider | 64.2% |
| GPT‑5 Nano + Schematron + Exa provider | 82.9% |
| Gemini 2.5 Flash baseline | 80.61% |
| GPT‑4.1 + Schematron‑8B | 85.58% |
This shows that:
- Structured JSON extraction is the key factor in accuracy (jump from 8.54% to 82.87%).
- Specialized extraction models like Schematron can outperform much larger general LLMs (e.g., Schematron‑8B vs Gemini 2.5 Flash on this task).
8.2 How to Create Your Own Benchmark
To test Schematron‑3B in your environment:
- Build a small dataset
- 50–200 HTML pages from real target websites (e‑commerce, blogs, job boards).
- For each page, manually label ground‑truth JSON with the fields you care about.
- Define JSON Schemas
- One schema per page type (e.g., `Product`, `Article`, `JobListing`).
- Use simple types first: strings, numbers, booleans, arrays.
- Write a test harness
- For each page: clean HTML → build prompt → call Schematron‑3B → parse JSON.
- Validate outputs using a JSON Schema validator.
- Track metrics
- Schema adherence rate: % of outputs that pass JSON Schema validation (expect near 100% with proper prompting).
- Field‑level accuracy: precision/recall or simple match rate for each field.
- Latency: time per page, with and without batching.
- Cost: if using API, track tokens or request cost.
- Compare to alternatives
- Run the same benchmark using a general LLM like GPT‑4.1, prompting it to “output JSON only”.
- Count how often it breaks schema, adds extra text, or misses fields.
This method gives a realistic view of how Schematron‑3B performs on your HTML and schema design.
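A minimal scorer for the field‑level accuracy metric in step 4 might look like this (the function name and the exact‑match definition are illustrative assumptions; real harnesses often add fuzzy matching for prices and dates):

```python
def field_accuracy(predicted: dict, truth: dict) -> dict:
    """Score one page: a field counts as correct only when the predicted
    value exactly matches the labeled ground truth."""
    correct = sum(1 for k, v in truth.items() if predicted.get(k) == v)
    return {
        "match_rate": correct / len(truth) if truth else 1.0,
        "missing": [k for k in truth if k not in predicted],
        "extra": [k for k in predicted if k not in truth],
    }

truth = {"name": "Widget", "price": 9.99, "currency": "USD"}
pred = {"name": "Widget", "price": 9.99, "in_stock": True}
print(field_accuracy(pred, truth))
# match_rate ~0.67, missing: ['currency'], extra: ['in_stock']
```

Averaging `match_rate` over the whole labeled set, alongside schema adherence and latency, gives a single comparable number per model and per schema design.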
9. How Schematron‑3B Differs from Competitors
9.1 Versus General LLMs (GPT‑4.x, GPT‑5, Gemini, etc.)
General LLMs:
- Can parse HTML but are not specialized for it.
- Prone to adding explanations, comments, or partial JSON that breaks downstream tools.
- Have higher per‑token costs, especially at long context windows.
Schematron‑3B:
- Trained explicitly on HTML‑to‑JSON with schema adherence as a hard constraint.
- Optimized prompts and training ensure only JSON output, with no narrative text.
- Much cheaper and faster at scale, up to 40–80× lower cost for large scraping workloads.
9.2 Versus Traditional Scraping Tools (BeautifulSoup, XPath, Regex)
Traditional tools like BeautifulSoup or XPath:
- Require brittle, site‑specific selectors. Small HTML changes can break them.
- Need manual maintenance for each site and layout.
- Struggle with semantic understanding (e.g., guessing which span is “current price”).
Schematron‑3B:
- Uses language understanding to adapt to different HTML structures and naming patterns.
- You describe what you want (via schema), not how to locate it in DOM nodes.
- Handles messy or inconsistent markup across multiple sites more robustly.
Traditional tools remain useful for pre‑cleaning or handling highly structured sites, but Schematron is more resilient for messy, modern web pages.
9.3 Unique Selling Points (USP) of Schematron‑3B
- Schema‑First, Reliable JSON: Guarantees schema‑conformant JSON, which is rare among LLM‑based scrapers.
- Long‑Context HTML Support: Robust handling of up to 128K tokens allows entire pages or multi‑page content to be processed in one go.
- Cost‑Performance Leadership: Delivers near‑frontier quality at a fraction of the cost and resource usage, especially at scraping scale.
- Local‑First, Privacy‑Friendly: Runs fully on local hardware (CPU, GPU, or Apple Silicon/MLX), keeping raw HTML and extracted data inside your infrastructure.
- Ecosystem Integration: Available on Hugging Face, usable via MCP, and integrated into multiple toolchains for web‑augmented LLM systems.
10. Pricing Overview
Schematron‑3B itself is distributed as an open‑source model; the “price” depends on how you use it:
10.1 Self‑Hosted / Local
- Model cost: free to download (standard open‑source distribution on Hugging Face).
- Infrastructure cost: your hardware, electricity, and any cloud VMs.
- Best for: high‑volume scraping where API costs would be large, and for strict data privacy.
Given benchmark data, a self‑hosted Schematron deployment can reduce scraping costs dramatically compared with calling frontier APIs repeatedly.
10.2 Serverless API from the Creators
The creators offer:
- A serverless API with $10 free credits to start.
- A pricing structure that, according to their benchmarks, allows processing 1M pages daily for around 240–480 USD with Schematron models, compared to ~20,000 USD with GPT‑5 on the same workload.
This keeps operational overhead low while still taking advantage of the specialized model.
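To put those figures in per‑page terms, here is the simple arithmetic on the numbers quoted above:

```python
def cost_per_page(usd_per_day: float, pages_per_day: int = 1_000_000) -> float:
    """Per-page cost for a 1M-pages/day workload."""
    return usd_per_day / pages_per_day

# Daily cost figures quoted in the benchmarks above:
for label, daily_usd in [("GPT-5", 20_000), ("Schematron-8B", 480), ("Schematron-3B", 240)]:
    print(f"{label}: ${cost_per_page(daily_usd):.5f} per page")

print(round(20_000 / 240))  # frontier-to-3B cost ratio, ~83x
```

That is $0.02 per page with a frontier model versus roughly $0.0002–0.0005 per page with Schematron, which is where the 40–80× figure comes from.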
11. Practical Testing Scenarios
To get a realistic feel for Schematron‑3B, consider testing it with these practical scenarios:
11.1 E‑Commerce Price Monitoring
- Goal: Extract product name, brand, price, currency, availability, rating, and URL.
- Steps:
- Crawl product pages from several competitor sites.
- Write a generic `Product` JSON Schema that works across sites.
- Run Schematron‑3B extraction and compare to manual ground truth.
- Track schema adherence, price accuracy, and extraction latency.
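A starting point for the generic `Product` schema in this scenario might look like the following (field names are illustrative choices; adapt them to your target sites):

```python
# Generic cross-site product schema: only the fields that almost every
# product page exposes are required; the rest are optional.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "brand": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string"},
        "availability": {"type": "string"},
        "rating": {"type": "number"},
        "url": {"type": "string"},
    },
    "required": ["name", "price", "url"],
}
```

Keeping the `required` list short makes one schema reusable across sites that expose different subsets of fields.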
11.2 News or Blog Aggregation
- Goal: Turn arbitrary article pages into structured records: title, author, date, tags, summary.
- Steps:
- Use a long context window for pages with heavy navigation and comments.
- Provide a schema including a `summary` field to compress main ideas.
- Compare outputs to a general LLM’s extraction for accuracy and JSON cleanliness.
11.3 Knowledge Base Ingestion
- Goal: Build a structured knowledge base from documentation pages.
- Steps:
- Design a schema for `Section`, `CodeExample`, `FAQItem`, etc.
- Feed full documentation pages to Schematron‑3B.
- Use the resulting JSON as input for search, RAG, or analytics.
Each scenario helps measure not only extraction quality but also long‑term maintainability. Because the model understands semantics, changes in HTML layout often require zero or minimal maintenance compared to XPath rules.
12. Best Practices for Using Schematron‑3B
- Invest time in schema design: Clear and minimal schemas usually give the most stable results. Avoid unnecessary nested fields at the beginning.
- Keep temperature low (0–0.2): Official demos set temperature to zero to ensure deterministic and repeatable JSON outputs.
- Always validate outputs: Run JSON Schema validation as a post‑step. This both catches errors and provides feedback for prompt/schema tweaks.
- Pre‑clean HTML: Remove scripts, styles, and irrelevant elements using `lxml` or another parser. Benchmarks and demos show this makes extraction more reliable.
- Monitor for drift: Websites change. Periodically re‑sample pages and re‑evaluate accuracy so you can adjust schemas or prompts if needed.
13. When to Choose Schematron‑3B vs Schematron‑8B
Choose Schematron‑3B when:
- Running on modest hardware or local machines.
- Doing standard scraping: products, articles, listings, with relatively predictable structure.
- Optimizing for cost and throughput.
Consider Schematron‑8B when:
- You see frequent edge‑case failures with 3B on complex or very noisy pages.
- You can afford slightly higher compute costs for a small quality lift.
- You need the strongest possible extraction in a mission‑critical pipeline.
The model authors themselves recommend Schematron‑3B as the default and only switching to 8B for special cases.
14. FAQ
Q1. Do I need a powerful GPU to run Schematron‑3B locally?
Not necessarily. Community guidance shows 3–4B models running on entry‑level GPUs and even CPUs at smaller context windows, though a GPU improves speed.
Q2. Can Schematron‑3B scrape any website out of the box?
It can parse any HTML, but you must define a JSON Schema and prompt for your use case. Different sites may need slightly different schemas.
Q3. How is Schematron‑3B better than using GPT‑4 or GPT‑5 for scraping?
It is specialized for HTML‑to‑JSON, produces schema‑conformant JSON, and can be 40–80× cheaper at large scale than frontier APIs while keeping similar extraction quality.
Q4. Is the model safe to use with sensitive data?
When run locally or on your own servers, raw HTML and JSON never leave your infrastructure, which is better for privacy than remote APIs.
Q5. Can I combine Schematron‑3B with other LLMs?
Yes. A common pattern is: search → HTML pages → Schematron‑3B → structured JSON → a general LLM like GPT‑4.1 for reasoning or answer synthesis, greatly boosting accuracy.
15. Summary
Schematron‑3B is a specialized local AI model designed to convert messy, real‑world HTML into clean, schema‑valid JSON. It combines:
- Schema‑first training with 100% JSON conformity goals
- Long‑context resilience up to 128K tokens
- Cost and speed advantages over frontier API models, especially at scale
- Flexibility to run locally via Python, MCP servers, or GUI tools, including on consumer‑grade hardware.
For teams serious about web scraping, ingestion, or building web‑grounded AI agents, Schematron‑3B provides a modern alternative to fragile XPaths and expensive general LLM calls. By carefully designing schemas, validating outputs, and benchmarking against your real pages, it is possible to build a robust, cost‑effective, and privacy‑friendly web data pipeline.