ordalis.io
Accuracy

Extraction benchmarks

Most extraction vendors don't publish accuracy numbers. We do: every month we run the same benchmark on synthetic documents that mirror real invoices, bank statements, contracts, engagement letters, Schedule K-1s, and audit reports.

100%
Required-field F1 on the PDF path (template document types)
Measured April 2026 · Gemma 4 26B · Docling + PyMuPDF parser chain · 14-fixture benchmark

By document type

Document type          Fixtures  Required-field F1  Overall F1  Avg latency
Invoice                2         100%               93%         1.4s
Bank statement         2         100%               89%         2.1s
Contract               2         100%               91%         1.8s
Engagement letter      2         100%               94%         1.3s
Schedule K-1           2         100%               95%         1.6s
Audit report           2         100%               88%         2.0s
Custom (no template)   2         95%                82%         1.9s

What we measure

Required-field F1

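Each document template defines a set of required fields (for an invoice: invoice number, total amount, and issue date). Required-field F1 is the harmonic mean of precision and recall over those fields, with fuzzy matching so that differently formatted currency amounts and dates still count as correct.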
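To make the scoring concrete, here is a minimal Python sketch of required-field F1. It assumes a wrong value counts as both a false positive and a false negative, and that "fuzzy matching" means normalizing dates and currency amounts to a canonical form before an exact comparison. The `normalize` and `required_field_f1` names, the accepted formats, and the field names are all hypothetical; this is not the Ordalis harness's actual code.

```python
import re
from datetime import datetime

# Hypothetical accepted formats; the real fuzzy-matching rules are not published here.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d %B %Y")
CURRENCY = re.compile(r"^(?:[$€£]|USD|EUR|GBP)?\s*(-?\d[\d,]*(?:\.\d+)?)\s*(?:USD|EUR|GBP)?$")


def normalize(value: str) -> str:
    """Map equivalent date and currency spellings to one canonical string."""
    text = value.strip()
    for fmt in DATE_FORMATS:  # try dates first so "04/01/2026" isn't read as a number
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            pass
    match = CURRENCY.match(text)
    if match:  # "$1,200.00" and "1200 USD" both become "1200.00"
        return f"{float(match.group(1).replace(',', '')):.2f}"
    return text.casefold()  # everything else: case-insensitive exact match


def required_field_f1(expected: dict, predicted: dict, required: list[str]) -> float:
    """F1 over required fields only (assumption: a wrong value is both an FP and an FN)."""
    tp = fp = fn = 0
    for field in required:
        gold, pred = expected.get(field), predicted.get(field)
        if pred is None:
            if gold is not None:
                fn += 1  # missed a required value
        elif gold is None:
            fp += 1  # returned a value that has no ground truth
        elif normalize(str(pred)) == normalize(str(gold)):
            tp += 1
        else:
            fp += 1  # wrong value hurts precision...
            fn += 1  # ...and recall
    if tp == 0:
        return 0.0 if (fp or fn) else 1.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, an extraction that returns `1200 USD` for an expected `$1,200.00` and `04/01/2026` for `2026-04-01` still scores as fully correct, while a missing issue date lowers recall but not precision.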

Overall F1

F1 across every extracted field, including optional fields and nested line items. This is a harder target: hallucinated fields count against precision, and missed rows count against recall.
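A similar sketch for overall F1, assuming nested output is flattened into path/value pairs (for example `line_items[0].amount`) and compared as sets. Under that assumption a hallucinated field adds one false positive, and a missed row adds one false negative per cell. Rows are aligned by position, which is a simplification; a production scorer would more likely match rows by content. The `flatten` and `overall_f1` names are hypothetical.

```python
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, str]:
    """Turn nested dicts/lists into {"line_items[0].amount": "40.00", ...}."""
    out: dict[str, str] = {}
    if isinstance(obj, dict):
        for key, val in obj.items():
            out.update(flatten(val, f"{prefix}.{key}" if prefix else key))
    elif isinstance(obj, list):
        for i, val in enumerate(obj):  # rows aligned by index (simplifying assumption)
            out.update(flatten(val, f"{prefix}[{i}]"))
    elif obj is not None:
        out[prefix] = str(obj).strip().casefold()
    return out


def overall_f1(expected: dict, predicted: dict) -> float:
    """F1 over every (path, value) pair, optional fields and line items included."""
    gold = set(flatten(expected).items())
    pred = set(flatten(predicted).items())
    tp = len(gold & pred)
    fp = len(pred - gold)  # hallucinated fields and wrong values
    fn = len(gold - pred)  # missed fields, including every cell of a missed row
    if tp == 0:
        return 0.0 if (fp or fn) else 1.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


expected = {"invoice_number": "INV-7",
            "line_items": [{"sku": "A", "amount": "40.00"}, {"sku": "B", "amount": "60.00"}]}
predicted = {"invoice_number": "INV-7", "po_number": "PO-1",  # po_number is hallucinated
             "line_items": [{"sku": "A", "amount": "40.00"}]}  # second row is missed
score = overall_f1(expected, predicted)
```

Here one hallucinated `po_number` and one missed line item give precision 3/4 and recall 3/5, so `score` comes out to about 0.67 even though every value that was returned for the first row is correct.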

Latency

End-to-end wall-clock time from upload to final JSON output, covering parsing, optional PII redaction, AI extraction, and schema validation.
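One way to take this measurement, sketched in Python under stated assumptions: each stage is a function from dict to dict, a single `time.perf_counter` interval brackets the whole chain so optional redaction is included whenever it runs, and the clock starts once the document bytes are in hand (the published figure starts at upload). The stage functions and `build_pipeline` are hypothetical toy stand-ins, not Ordalis's parser, redactor, or model calls.

```python
import re
import time
from statistics import mean
from typing import Callable

Stage = Callable[[dict], dict]


# Hypothetical toy stages; each takes the document dict and returns an updated copy.
def parse(doc: dict) -> dict:
    return {**doc, "text": doc["raw"].decode("utf-8")}


def redact_pii(doc: dict) -> dict:
    # Masks US SSN-shaped strings only; real PII redaction covers far more.
    return {**doc, "text": re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN]", doc["text"])}


def extract(doc: dict) -> dict:
    # Toy "extraction": treat the last token as the total.
    return {**doc, "fields": {"total": doc["text"].split()[-1]}}


def validate(doc: dict) -> dict:
    if "total" not in doc["fields"]:
        raise ValueError("schema validation failed: missing total")
    return doc


def build_pipeline(redact: bool) -> list[Stage]:
    return [parse, *([redact_pii] if redact else []), extract, validate]


def run_timed(doc: dict, stages: list[Stage]) -> tuple[dict, float]:
    """Run every stage in order; return the output and end-to-end wall-clock seconds."""
    start = time.perf_counter()
    for stage in stages:
        doc = stage(doc)
    return doc, time.perf_counter() - start


def average_latency(docs: list[dict], stages: list[Stage]) -> float:
    """Mean end-to-end latency over a set of documents, as in the Avg latency column."""
    return mean(run_timed(doc, stages)[1] for doc in docs)
```

Timing the whole chain rather than summing per-stage timers keeps any overhead between stages in the number, which is what an end-to-end figure should capture.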

Comparing to other tools

When vendors publish accuracy numbers (rarely), we benchmark against those. When they don't, we run each competitor's API on our own fixtures. The table below shows our measurements on our fixtures; we encourage you to run your own.

Vendor                                    Invoice F1  Bank statement F1  Contract F1  Custom schema support
Ordalis (template)                        100%        100%               100%         Yes
Ordalis (custom)                          95%         95%                95%          Yes
Azure Form Recognizer (prebuilt-invoice)  93%         n/a                n/a          Limited
AWS Textract (queries)                    88%         72%                68%          Via query
Google Document AI (specialized)          91%         85%                70%          Per-processor

Methodology disclosure. All numbers are measured on synthetic fixtures from the benchmark suite. Competitor tests use each vendor's recommended prebuilt model or its custom-schema equivalent. We regenerate the fixtures every month with a new seed, so we can't tune against a fixed set and game our own benchmark. Run your own test with our sample corpus against your target vendor, and we'll publish the results.
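Seeded regeneration can be sketched as follows, assuming fixtures come from a seeded pseudo-random generator: the same seed reproduces a fixture exactly, and a new monthly seed produces fresh values. `make_invoice`, `monthly_seed`, and the year-month seed scheme are hypothetical; the actual generator lives in `benchmarks/synthetic/generate.js` and is not shown here.

```python
import random
from datetime import date, timedelta


def make_invoice(seed: int) -> dict:
    """Build one synthetic invoice; identical seeds yield identical invoices."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    items = [
        {"description": rng.choice(["Consulting", "Hosting", "Audit support", "Licenses"]),
         "amount": round(rng.uniform(50, 5000), 2)}
        for _ in range(rng.randint(1, 5))
    ]
    return {
        "invoice_number": f"INV-{rng.randint(1000, 9999)}",
        "issue_date": (date(2026, 1, 1) + timedelta(days=rng.randint(0, 364))).isoformat(),
        "line_items": items,
        # Ground truth is derived from the generated rows, so labels are always consistent.
        "total_amount": round(sum(item["amount"] for item in items), 2),
    }


def monthly_seed(year: int, month: int) -> int:
    # Hypothetical scheme: April 2026 -> 202604.
    return year * 100 + month
```

Because the ground-truth labels are produced by the same code that produces the document, regenerating with a new seed changes every value without requiring anyone to relabel fixtures by hand.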

Reproduce this

The benchmark harness is open — clone the repo and run:

git clone https://github.com/Ikaikaalika/ordalis
cd ordalis
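# Generate the synthetic fixtures, then render them as PDFs
node benchmarks/synthetic/generate.js
node benchmarks/synthetic/generate-pdfs.js
# Run extraction on the synthetic-pdf set (replace sk_live_... with your API key)
ORDALIS_API_KEY=sk_live_... node benchmarks/run.js --set synthetic-pdf
# Score the most recent run
node benchmarks/evaluate.js --run latest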

A full run costs about $0.14 in Cloudflare Workers AI usage, which fits within the free allowance for most users.