Document extraction: four main approaches with a 1000x cost difference
I looked at the four main ways to turn unstructured documents into structured data: full LLM inference, fine-tuned small models, template-based extraction, and cloud OCR services.
The cost difference is huge: template-based extraction costs $0.001 per document, while full LLM inference costs $5 to $15 per document. That's a 1000x+ difference.
Most companies waste money by treating all documents the same. Document classification upfront can cut costs by 85%+ while maintaining flexibility for edge cases.
What I learned
Cloud OCR services (Azure Document Intelligence, AWS Textract, Google Document AI) cost $1.50 per 1,000 pages for basic OCR. They're fully managed, pre-trained on common document types, and great for MVPs.
Recent benchmarks: Gemini 2.0 Pro achieved 100% item extraction accuracy at $0.0045 per invoice, while AWS and Azure cost $0.01 per invoice. Azure's asynchronous processing delivers 87% cost savings: 30 pages processed async cost $0.045 versus $0.30 synchronously.
The downside: per-page costs compound quickly at volume, and Azure's custom extraction models run $50 per 1,000 pages.
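As a minimal sketch of the managed-service path, here's Azure's prebuilt invoice model via the azure-ai-formrecognizer Python SDK; the endpoint, key, and filename are placeholders:

```python
# pip install azure-ai-formrecognizer
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

# Placeholders: point these at your own resource.
client = DocumentAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

with open("invoice.pdf", "rb") as f:
    # begin_analyze_document starts a long-running operation:
    # the asynchronous pattern behind the savings noted above.
    poller = client.begin_analyze_document("prebuilt-invoice", document=f)
result = poller.result()

for doc in result.documents:
    vendor = doc.fields.get("VendorName")
    total = doc.fields.get("InvoiceTotal")
    print(vendor.value if vendor else None, total.value if total else None)
```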
Fine-tuned small models (7B parameter models like Llama 3.1, Mistral 7B) cost $0.00368 per 1,000 tokens for inference after training.
Real benchmarks: LLaMA-3 8B achieved 76.6% accuracy with no fine-tuning at all, matching fine-tuned LLaMA-2 70B. After fine-tuning on just 861 samples, LLaMA-2 7B jumped from 47.6% to 61.5% accuracy with a 47.78% reduction in hallucinations.
Cost of training: under $2 for QLoRA on A100 GPUs (46 minutes for Mistral 7B). Inference hosting runs $288 to $530 per month on cloud GPUs, which breaks even against GPT-4 API pricing at roughly 1 million documents per year.
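That breakeven is simple arithmetic worth spelling out. A quick sketch, where the per-document API cost is an assumed figure you should replace with your own observed numbers:

```python
# Rough breakeven: self-hosted fine-tuned 7B vs. a pay-per-call API.
# API_COST_PER_DOC is an assumed GPT-4-class figure, not a quoted price.
GPU_HOSTING_PER_MONTH = 530.0  # upper end of the $288-530/month range above
API_COST_PER_DOC = 0.0075      # assumed average API cost per document

annual_hosting = GPU_HOSTING_PER_MONTH * 12         # $6,360/year
breakeven_docs = annual_hosting / API_COST_PER_DOC  # where the two cost curves cross
print(f"Breakeven at ~{breakeven_docs:,.0f} documents/year")  # ~848,000
```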
Template-based extraction costs fractions of a cent per document, but requires building templates upfront. Modern tools reach F1 scores of 1.0 with sub-second latency on known formats.
PyMuPDF scored F1 between 0.983 and 0.993 on government, legal, and financial documents. Camelot excelled at table extraction, with a 0.828 F1 score on complex government tenders. Processing speed: structured parsing takes 0.3 to 1.6 seconds per document versus 33.9 seconds for multimodal LLM approaches, roughly 54x faster.
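For a known layout, template extraction is little more than reading text from fixed coordinates. A minimal PyMuPDF sketch; the field boxes describe a hypothetical invoice layout, not any real form:

```python
# pip install pymupdf
import fitz  # PyMuPDF

# Hypothetical template: field name -> bounding box (x0, y0, x1, y1)
# in page coordinates, measured once from a sample of the known form.
INVOICE_TEMPLATE = {
    "invoice_number": fitz.Rect(400, 50, 560, 75),
    "total": fitz.Rect(400, 700, 560, 725),
}

def extract_with_template(path: str, template: dict) -> dict:
    doc = fitz.open(path)
    page = doc[0]
    # Clip text extraction to each field's box. No model inference runs,
    # which is why this path finishes in well under a second.
    fields = {name: page.get_text(clip=rect).strip()
              for name, rect in template.items()}
    doc.close()
    return fields

print(extract_with_template("invoice.pdf", INVOICE_TEMPLATE))
```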
Azure Document Intelligence requires only 3 training + 3 test documents for template model creation, with the first 10 hours of neural training free.
Full LLM inference (Claude 3.5 Sonnet, GPT-4o, Gemini 2.0 and Gemini 2.5) costs $0.005-0.02 per typical invoice. It handles any format without training, adapts to changes, and can reason about context.
Production benchmarks: Claude and GPT-4o hit 92–95% accuracy on line items and 95–98% on overall invoice extraction. Claude processes in 200 to 300 milliseconds; GPT-4o takes 1 to 30 seconds depending on complexity.
Cost optimization: prompt caching cuts the cost of repeated prompt content by 90%, and Batch API processing halves the price of non-urgent workloads. With caching, Claude costs $30 to $90 per 10,000 invoices a month; GPT-4o runs $50 to $180.
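A sketch of the caching pattern with Anthropic's Python SDK: the long, stable part of the prompt (extraction instructions and schema, elided here as a stand-in) is marked cacheable, so repeat calls reuse it at the discounted rate and only the per-invoice text is billed in full:

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

EXTRACTION_INSTRUCTIONS = "..."  # long, stable: field definitions, JSON schema, examples

def extract_invoice(invoice_text: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system=[{
            "type": "text",
            "text": EXTRACTION_INSTRUCTIONS,
            # Cache the stable prefix; subsequent calls read it from cache.
            "cache_control": {"type": "ephemeral"},
        }],
        messages=[{"role": "user", "content": invoice_text}],
    )
    return response.content[0].text
```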
The hybrid strategy
The best-performing setup puts a classifier up front that routes each document to the cheapest adequate extractor, as shown in the October 2024 Hybrid OCR-LLM Framework study:
- Standard forms (60%) → Table-based extraction (F1=1.0, 0.3s latency)
- Semi-structured (30%) → PaddleOCR + table method (F1=0.997, 0.6s)
- Novel formats (10%) → Multimodal LLM (F1=0.999, 34s)
Real-world impact: Asian Paints cut processing time from 5 minutes to 30 seconds per document (10 times faster), saving 192 person-hours a month and finding $47,000 in vendor overcharges.
The filename classification optimization: lightweight classifiers working on filenames alone reach 96.7% accuracy at 442x the speed of full content analysis, routing 80%+ of documents down fast paths before any expensive model is invoked.
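Such a fast path can be as small as a character n-gram model over filenames. A sketch with scikit-learn, where the labels and training filenames are purely illustrative:

```python
# pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative training data: filenames labeled with a routing class.
filenames = ["acme_invoice_2024_001.pdf", "po_38211_vendor.pdf",
             "scan_20240312_receipt.jpg", "contract_draft_v3.docx"]
labels = ["standard_form", "standard_form", "semi_structured", "novel"]

# Character n-grams tolerate the messy token boundaries in filenames,
# and the whole pipeline classifies in microseconds with no content parsing.
router = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
router.fit(filenames, labels)

print(router.predict(["globex_invoice_2025_117.pdf"]))  # -> ['standard_form']
```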
This lowers the blended cost to $1.50 per document, down from $10 for pure LLM. That's an 85% drop in cost while still keeping flexibility.
How to choose
More than 10,000 documents per month: use fine-tuned models or templates for common document types. Mistral 7B trains in 46 minutes for $1.46 on RunPod and reaches 85% of GPT-4's accuracy at one-eighth the cost.
Fewer than 10,000 documents per month: use cloud OCR services for speed. For custom extractors, Google's first 1,000 documents are free, then $30 per 1,000 pages.
Accuracy critical: Template extraction with rules. Azure supports up to 500 trained models in composed architectures with incremental training on misclassified documents.
Format highly variable: LLM-based extraction. Claude 3.5 Sonnet handles 100-page PDFs up to 30MB within a 200K token context window, eliminating preprocessing; a minimal call is sketched below.
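A sketch of that zero-preprocessing path, sending a PDF straight to the API as a base64 document block (older SDK versions gated this behind a beta flag; the filename and prompt are placeholders):

```python
# pip install anthropic
import base64

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("long_contract.pdf", "rb") as f:
    pdf_b64 = base64.standard_b64encode(f.read()).decode()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            # The PDF goes in whole: no OCR, splitting, or layout parsing.
            {"type": "document",
             "source": {"type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_b64}},
            {"type": "text", "text": "Extract every line item as JSON."},
        ],
    }],
)
print(response.content[0].text)
```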
The winning architecture
Don't pick one approach. Route intelligently:
IF standard_form → Template (F1=1.0, 0.3s, $0.001)
ELIF semi_structured → Fine-tuned 7B (F1=0.997, 0.6s, $0.03)
ELSE → LLM fallback (F1=0.999, 34s, $10)
Blended cost: $1.50/doc vs $10 pure LLM = 85% savings
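In Python, the router itself is only a few lines. This is a skeleton, not a drop-in implementation: classify() (e.g., the filename classifier above), extract_with_template() (the PyMuPDF sketch), and the other extractors stand in for the components described in this post:

```python
def route_document(doc_path: str) -> dict:
    """Send each document to the cheapest extractor likely to handle it."""
    doc_class = classify(doc_path)  # fast classifier runs first, on every doc
    if doc_class == "standard_form":
        return extract_with_template(doc_path)       # F1=1.0, ~0.3s, ~$0.001
    elif doc_class == "semi_structured":
        return extract_with_finetuned_7b(doc_path)   # F1=0.997, ~0.6s, ~$0.03
    else:
        return extract_with_llm(doc_path)            # F1=0.999, ~34s, ~$10
```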
The main point
Through smart routing, best-in-class AP departments drive cost per invoice down to $2.78, far below the industry average of $9.40, processing invoices 78% cheaper and 82% faster than their peers.
The market data backs this up: document extraction is projected to grow from $10.57 billion in 2025 to $66.68 billion by 2032, a 30.6% CAGR, driven by companies adopting intelligent routing rather than defaulting to expensive LLMs for everything.
Tools and Resources
Open-source PDF parsing:
- PyMuPDF - Fastest overall performance
- Camelot - Best for table extraction
- pdfplumber - Granular control and debugging
- Docling - 97.9% accuracy on complex tables
Fine-tuning frameworks:
- LLaMA-Factory - Supports 100+ models
- Unsloth - 2x faster training, 70% less VRAM
Cloud platforms:
- Modal Labs - Serverless ML deployment
- RunPod - GPU cloud for training
- Replicate - Host and run models at scale
RAG frameworks:
- LangChain - 100+ document loaders
- LlamaIndex - 160+ data loaders, cleaner API
Key research papers:
- Comparative Study of PDF Parsing Tools (Oct 2024)
- LLaMA Fine-tuning Impact on Hallucinations (June 2025)
- PDF Data Extraction Benchmark 2025
- Invoice Processing Benchmark Research
Official documentation: