Document processing: where a cloud LLM bill quietly becomes the largest line item
· Avery NXR
Nearly every operations team processes documents.
A finance team processes invoices, expense reports, purchase orders, vendor statements. A legal team processes contracts, NDAs, redlines, amendments. An insurance team processes claims, declarations, supporting documentation. A logistics team processes bills of lading, customs forms, delivery receipts. The list goes on for every operational function in every industry.
For most of recorded history, this work has been done by humans, with software helping but not deciding. In the past three years, a quiet shift has happened. A large share of this work is now being done by AI — specifically, by a cloud LLM with a vision modality, called from a workflow tool, processing each document one at a time.
This works. It also produces a bill that, in many of the teams we have talked to, is now the largest single line item in their AI tooling budget.
The shape of the cost
Here is a worked example. A mid-sized finance team processes five thousand invoices per month. Each invoice is one to three pages. Each one needs to be read, understood, classified, and have its key fields extracted into a structured format the ERP system can consume.
A typical cloud LLM call to do this work uses about six thousand input tokens (the document image, encoded as tokens) and produces about five hundred output tokens (the structured JSON). At current frontier pricing, roughly $3 per million input tokens and $15 per million output tokens, each invoice costs about $0.026 to process: about $0.018 for the input and $0.0075 for the output.
Five thousand invoices per month: about $130 per month. That is not a number that gets anyone's attention.
But finance teams do not process five thousand invoices per month forever. They scale. When the same team is processing fifty thousand invoices per month — a level many mid-market companies reach — the bill is $1,300 per month, or about $15,600 per year, for one workflow at one company. Add in expense reports, purchase orders, and vendor statements, and the number is two to three times that.
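The arithmetic above fits in a few lines of Python. This is a sketch using the token counts and prices quoted in the example; the function and variable names are ours, for illustration only. The exact per-invoice figure is $0.0255, so the results come out slightly below the rounded numbers in the text.

```python
def cost_per_document(input_tokens: int, output_tokens: int,
                      input_price_per_million: float,
                      output_price_per_million: float) -> float:
    """Cloud cost in dollars for one document, given per-million-token prices."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Figures from the worked example: 6,000 input tokens, 500 output tokens,
# $3 per million input tokens, $15 per million output tokens.
per_invoice = cost_per_document(6_000, 500, 3.0, 15.0)

monthly_5k = per_invoice * 5_000     # five thousand invoices per month
monthly_50k = per_invoice * 50_000   # fifty thousand invoices per month
annual_50k = monthly_50k * 12

print(f"per invoice:      ${per_invoice:.4f}")
print(f"5,000 per month:  ${monthly_5k:,.2f}")
print(f"50,000 per month: ${monthly_50k:,.2f}")
print(f"50,000 per year:  ${annual_50k:,.2f}")
```

Changing the token counts or prices shows how sensitive the bill is to document length: a three-page invoice that needs twice the input tokens adds roughly 70 percent to the per-invoice cost.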
We have talked to teams where the document-processing bill is north of $50,000 per year. We have talked to teams where it is the largest software cost in the department. The line item creeps in slowly because it grows with volume, and most teams do not notice it has compounded until the year-end review.
Why this is a perfect local-SLM workflow
Document processing has nearly every property that makes a workflow well-suited to a specialized local model.
It is narrow. The model needs to know one thing, how to read documents of a specific shape, and nothing else. A model trained specifically on invoice patterns will, in most cases, outperform a general-purpose model on invoice extraction.
It is repetitive. The same shape of document, the same shape of output, the same shape of decision, repeated thousands of times per month. This is exactly the kind of repetition a small, specialized model thrives on.
It is high-volume. The savings scale linearly with the number of documents: every document processed locally is a document not on the cloud bill. At volumes like the fifty thousand invoices per month above, the break-even point on a one-time license can be measured in weeks, depending on the license price.
It is privacy-sensitive. Invoices contain vendor identities, line items, payment terms. Contracts contain customer names, deal sizes, special clauses. Many compliance regimes — finance, healthcare, defense — are uncomfortable sending these documents to a third-party cloud LLM. A local model removes the question entirely; the document does not leave the machine.
It is latency-tolerant, but not insensitive. Most document-processing workflows are batch, so two seconds versus two hundred milliseconds does not matter much per document. But across fifty thousand documents per month, the total wall-clock time of the batch does matter; a faster per-document time means the day's pipeline completes earlier.
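Two of the properties above, high volume and latency tolerance, reduce to simple arithmetic. The sketch below assumes a hypothetical up-front cost of $1,000 for license and hardware (not a quoted price), takes the $1,300 monthly cloud bill from the fifty-thousand-invoice example, and assumes documents are processed sequentially unless a worker count is given. All names are illustrative.

```python
def break_even_months(upfront_cost: float, monthly_cloud_cost: float) -> float:
    """Months until the avoided cloud spend equals the up-front cost."""
    if monthly_cloud_cost <= 0:
        raise ValueError("monthly_cloud_cost must be positive")
    return upfront_cost / monthly_cloud_cost


def batch_hours(documents: int, seconds_per_document: float, workers: int = 1) -> float:
    """Wall-clock hours for a batch split evenly across workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return documents * seconds_per_document / workers / 3600


HYPOTHETICAL_UPFRONT_COST = 1_000.0  # assumption: license plus hardware
MONTHLY_CLOUD_COST = 1_300.0         # from the 50,000-invoice example

months = break_even_months(HYPOTHETICAL_UPFRONT_COST, MONTHLY_CLOUD_COST)
weeks = months * 52 / 12             # average weeks per month

slow_batch = batch_hours(50_000, 2.0)   # two seconds per document
fast_batch = batch_hours(50_000, 0.2)   # two hundred milliseconds per document

print(f"break-even: {months:.2f} months (about {weeks:.1f} weeks)")
print(f"batch at 2 s/doc:   {slow_batch:.1f} hours")
print(f"batch at 0.2 s/doc: {fast_batch:.1f} hours")
```

Under these assumptions the monthly batch takes more than a day of sequential compute at two seconds per document and under three hours at two hundred milliseconds, which is the sense in which latency still matters for a batch workload.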
What this looks like in practice
A team operating in this workflow with a local SLM has a setup that looks like this.
The model is fine-tuned for the specific document type the team processes — invoices in their format, contracts in their language, claims in their structure. The model lives on a server the team owns, or, in some configurations, on the operator's local machine.
The pipeline ingests documents, runs them through the model, validates the output, and writes the structured data to the downstream system. The flow is identical to a cloud-based pipeline — just with the inference step running on the team's hardware instead of in someone else's data center.
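The four steps (ingest, infer, validate, write) can be sketched as follows. This is a minimal illustration, not a reference implementation: the required fields are a hypothetical schema, `stub_extract` stands in for the local model call, and the "downstream system" is a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

REQUIRED_FIELDS = ("vendor", "invoice_number", "total")  # hypothetical schema


@dataclass
class PipelineResult:
    doc_id: str
    fields: dict
    errors: list = field(default_factory=list)


def validate(fields: dict) -> list:
    """Return a list of problems; an empty list means the output is usable."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in fields]
    total = fields.get("total")
    if total is not None and (not isinstance(total, (int, float)) or total < 0):
        errors.append("total must be a non-negative number")
    return errors


def run_pipeline(documents: Iterable[tuple],
                 extract: Callable[[str], dict],
                 write: Callable[[str, dict], None]) -> list:
    """Ingest each document, run inference, validate, and write valid rows."""
    results = []
    for doc_id, content in documents:
        fields = extract(content)      # inference step: local model or cloud call
        errors = validate(fields)
        if not errors:
            write(doc_id, fields)      # only validated output reaches downstream
        results.append(PipelineResult(doc_id, fields, errors))
    return results


def stub_extract(content: str) -> dict:
    """Stand-in for the model: parses 'key: value' lines from plain text."""
    fields = {}
    for line in content.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    if "total" in fields:
        try:
            fields["total"] = float(fields["total"])
        except ValueError:
            pass  # keep the raw string; validate() will flag it
    return fields


erp_rows = {}  # stand-in for the downstream system


def write_to_erp(doc_id: str, fields: dict) -> None:
    erp_rows[doc_id] = fields


documents = [
    ("inv-001", "vendor: Acme\ninvoice_number: 42\ntotal: 19.99"),
    ("inv-002", "vendor: Acme\ntotal: n/a"),
]
results = run_pipeline(documents, stub_extract, write_to_erp)
```

Because the inference step is just a function passed to `run_pipeline`, swapping a cloud call for a local model changes one argument and nothing else in the flow.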
The economics flip. The team pays for the model, under a one-time or annual license, and for the hardware to run it. They do not pay per document. The cost is bounded: it does not grow with each document, and rises only if volume outgrows the hardware.
The privacy story improves. The documents do not cross any organizational boundary. Audit logs of the inference are local; they can be inspected, retained, and presented to compliance as needed.
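A local audit log can be as simple as an append-only file of JSON records. The sketch below is one possible shape, not a compliance recommendation: the record fields, the model version string, and the choice to store a SHA-256 hash of the document rather than its contents are all our assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def append_audit_record(log_path, document_bytes: bytes,
                        model_version: str, output: dict) -> dict:
    """Append one inference record to a local JSON Lines file and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Hash identifies the document without copying its contents into the log.
        "document_sha256": hashlib.sha256(document_bytes).hexdigest(),
        "model_version": model_version,
        "output": output,
    }
    with Path(log_path).open("a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_audit_records(log_path) -> list:
    """Load every record from the log, for inspection or export."""
    with Path(log_path).open("r", encoding="utf-8") as log_file:
        return [json.loads(line) for line in log_file if line.strip()]
```

Calling `append_audit_record("audit.jsonl", pdf_bytes, "invoice-model-v1", fields)` after each inference leaves a file that stays on the team's own disk and can be filtered by hash or timestamp when compliance asks; the file name and version label here are placeholders.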
Why most teams haven't done this yet
The economic argument is clear. The privacy argument is clear. Yet most teams are still running their document processing through a cloud LLM. Why?
The honest answer is that the tooling is not there yet. Building a local-SLM pipeline for document processing requires either (a) buying a vertical product from a vendor that has trained the right model and packaged it well, or (b) doing significant in-house ML work to train and deploy your own.
The first option is mostly absent from the market today. There are a handful of vertical tools for specific document types — receipt parsers, invoice extractors — but the broader category is unaddressed. The second option requires a level of ML investment most operations teams do not have.
This gap is what we expect to close in the next twenty-four months. The technology to train specialized models is maturing. The hardware to run them affordably is widely available. The business model for selling them — flat-rate, perpetual, like Avery NXR — is becoming familiar.
The Avery Software connection
Avery NXR is not a document-processing tool. It scaffolds Next.js applications. We mention it here because the architectural pattern is the same.
In both cases — code scaffolding and document processing — the work is narrow, repetitive, high-volume, privacy-sensitive, and well-suited to a specialized model rather than a generalist. The economics that make Avery NXR a better-than-cloud choice for Next.js scaffolding are the same economics that would make a local document-processing tool a better-than-cloud choice for invoice extraction.
We are not announcing a document-processing product. We are observing that the pattern generalizes, and that document processing is one of the workflows where the generalization will eventually arrive. Whether it arrives from us or from another team building in this space, the underlying logic — narrow, local, flat-rate, auditable — is going to be the right shape.
The cloud LLM bill is going to keep growing. The local-SLM alternatives are going to keep maturing. At some point in the next few years, every operations team that processes documents at volume is going to do the math and switch.
That is the change we are watching for.