Avery.Software — Native Execution Runtime

Insurance claims processing: where every claim is a regulated AI workflow

2026-05-28 · Avery NXR

Insurance carriers process an enormous volume of claims. A mid-sized property and casualty insurer handles hundreds of thousands of claims per year. A large multiline carrier handles millions. Each claim moves through a pipeline that has been thoroughly transformed by AI in the past three years.

The pipeline now looks roughly like this. A claim comes in. AI triages it by complexity and likely severity. AI extracts the structured information from the various documents the claimant submitted — police reports, medical records, repair estimates, photographs. AI flags potential fraud signals. AI compares the claim against the policy to determine coverage. AI drafts adjuster summaries, settlement offers, and communications to the claimant. AI updates the claims management system with structured data and recommendations.
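The stages above can be sketched as an ordered list of functions that each record a result on the claim. This is only an illustrative skeleton: the `Claim` fields, the stage names, and `call_model` (a stand-in that returns a placeholder string instead of calling any real model) are all assumptions made for the example, not a description of any carrier's system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Claim:
    claim_id: str
    documents: List[str]
    # Filled in stage by stage as the claim moves through the pipeline.
    results: Dict[str, str] = field(default_factory=dict)


def call_model(task: str, claim: Claim) -> str:
    """Hypothetical stand-in for an LLM call; returns a placeholder string."""
    return f"{task} result for {claim.claim_id} ({len(claim.documents)} docs)"


# Stage names mirror the steps described in the text (names are illustrative).
STAGE_NAMES = [
    "triage",
    "extraction",
    "fraud_signals",
    "coverage_check",
    "drafting",
    "cms_update",
]


def make_stage(name: str) -> Callable[[Claim], str]:
    def stage(claim: Claim) -> str:
        return call_model(name, claim)
    return stage


PIPELINE: List[Tuple[str, Callable[[Claim], str]]] = [
    (name, make_stage(name)) for name in STAGE_NAMES
]


def process(claim: Claim) -> Claim:
    """Run every stage in order, storing each result under its stage name."""
    for name, stage in PIPELINE:
        claim.results[name] = stage(claim)
    return claim
```

Each entry in `PIPELINE` corresponds to one model call per claim, which is why the per-claim token count adds up across the lifecycle.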

Every step of that pipeline is, in most current implementations, a cloud LLM call. The bill is real, the regulation is real, and for insurance specifically the case for moving inference onto local infrastructure is closer to mandatory than to optional.

The math

A representative mid-sized carrier handles, say, two hundred thousand claims per year. Each claim moves through five to ten AI operations across its lifecycle. The token consumption varies by operation, but a reasonable aggregate is around eighty thousand input tokens and four thousand output tokens per claim across the full pipeline.

At frontier pricing, that's roughly $0.30 per claim. Across two hundred thousand claims per year, that comes to about $60,000 per year for one carrier.

The numbers scale with volume. A large multiline carrier handling two million claims per year pays about $600,000 per year for the AI layer alone. Reinsurance companies, third-party administrators, and very large primary carriers can be at multiples of that.
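The arithmetic behind these figures can be reproduced in a few lines. The token counts come from the text. The per-token prices are an assumption: the post does not state them, and $3 per million input tokens and $15 per million output tokens are simply one price point that yields the quoted $0.30 per claim.

```python
# Token counts per claim, as stated in the text.
INPUT_TOKENS_PER_CLAIM = 80_000
OUTPUT_TOKENS_PER_CLAIM = 4_000

# ASSUMPTION: illustrative prices in USD per million tokens (not from the post).
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def cost_per_claim(
    input_tokens: int = INPUT_TOKENS_PER_CLAIM,
    output_tokens: int = OUTPUT_TOKENS_PER_CLAIM,
    input_price: float = INPUT_USD_PER_MTOK,
    output_price: float = OUTPUT_USD_PER_MTOK,
) -> float:
    """Cloud inference cost in USD for one claim across the full pipeline."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def annual_cost(claims_per_year: int) -> float:
    """Annual cloud inference cost in USD at the assumed prices."""
    return claims_per_year * cost_per_claim()


print(f"per claim:       ${cost_per_claim():.2f}")         # $0.30
print(f"200,000 claims:  ${annual_cost(200_000):,.0f}")    # $60,000
print(f"2,000,000 claims: ${annual_cost(2_000_000):,.0f}")  # $600,000
```

Under these assumed prices, input tokens account for $0.24 of the $0.30, so the document-heavy extraction steps dominate the bill.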

These figures exclude the upstream data ingestion, the underlying scoring models, the policy administration systems, and the claims management software. The AI augmentation layer on top is the line item we're examining.

Why insurance is structurally a local-SLM case

Insurance claims processing has every property that favors local inference, with several at the extreme end of the spectrum.

The work is narrow. Each carrier has its own claim types, policy structures, coverage language, and adjustment philosophies. A model trained on the carrier's own claims corpus outperforms a general model on the carrier's own work.

The work is repetitive. The same shape of claim, the same shape of documentation, the same shape of decision, repeated hundreds of thousands of times per year. Specialization compounds.

The volume is enormous, and it scales with the carrier's growth and with broader insurance market activity.

The privacy posture is structurally restrictive. Insurance is a heavily regulated industry in every major jurisdiction. State insurance regulators in the US, the FCA in the UK, BaFin in Germany, equivalent bodies across markets — all have positions on AI use, model governance, data residency, and explainability. Sending claim documents (which contain PII, medical information, financial information, and sometimes more sensitive material) to a third-party cloud LLM creates a regulatory posture that is harder to defend every year.

The latency story matters in first-notice-of-loss workflows where a customer is on the phone reporting a claim and the AI is producing real-time guidance for the agent.

The audit trail matters acutely. Every claim decision is potentially auditable by regulators, by reinsurers, by external auditors, and in extreme cases by litigation. The audit trail an AI-augmented workflow produces is now part of the evidence record. A black-box cloud-LLM workflow produces an inferior audit trail compared to a local model that writes structured decisions alongside its work.

What changes with local inference

A claims workflow on a local SLM looks like this.

A model is fine-tuned on the carrier's claims corpus — historical claims, policy language, adjustment patterns, fraud cases, settlement histories. The fine-tuning is done in a compliance-controlled environment.

The model runs on infrastructure the carrier controls — on-premises, in a regulated private cloud, or in a sovereign cloud that meets state insurance regulator requirements. The deployment is documented, audited, and approved.

Claims flow through the pipeline. The model produces structured decisions, recommendations, and drafts at each step. The audit trail is local, versioned, and reviewable.
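As a rough illustration of what a local, versioned, reviewable audit entry might contain, the sketch below builds one structured record per model decision and serializes it as a line of JSON. The field names, the `audit_record` and `to_jsonl_line` helpers, and the choice to store a SHA-256 hash of the prompt are assumptions for the example, not a regulatory schema.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict


def audit_record(
    claim_id: str, step: str, model_version: str, prompt: str, decision: str
) -> Dict[str, str]:
    """Build one audit entry for a single model decision (hypothetical schema)."""
    return {
        "claim_id": claim_id,
        "step": step,
        # Pinning the model version makes the decision attributable later.
        "model_version": model_version,
        # Hash of the exact input, so the record can be matched to its source
        # text without duplicating sensitive content in the log.
        "input_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "decision": decision,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def to_jsonl_line(record: Dict[str, str]) -> str:
    """Serialize a record as one JSON line with stable key order."""
    return json.dumps(record, sort_keys=True) + "\n"


def append_record(path: str, record: Dict[str, str]) -> None:
    """Append a record to a local JSON Lines file on carrier-controlled storage."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(to_jsonl_line(record))
```

A call such as `append_record("audit.jsonl", audit_record("C-1", "coverage_check", "claims-slm-v3", prompt_text, "covered"))` would add one line per decision; the file name and model version string here are placeholders.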

The cost flips from per-claim to fixed. The carrier pays for the model, the hardware, the integrations. Claim volume can grow without the bill spiking.
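The shift from per-claim to fixed cost can be expressed as a break-even volume. In the sketch below the $0.30 per-claim cloud cost is the figure from earlier in the post, while the annual fixed cost is a pure placeholder: the post gives no number for the model, hardware, or integrations, so the value is there only to show the shape of the calculation.

```python
def cloud_annual_cost(claims_per_year: float, cost_per_claim_usd: float) -> float:
    """Per-claim pricing: the bill grows linearly with claim volume."""
    return claims_per_year * cost_per_claim_usd


def local_annual_cost(annual_fixed_cost_usd: float) -> float:
    """Fixed pricing: the bill does not depend on claim volume."""
    return annual_fixed_cost_usd


def break_even_claims(annual_fixed_cost_usd: float, cost_per_claim_usd: float) -> float:
    """Claim volume per year at which cloud and local costs are equal."""
    if cost_per_claim_usd <= 0:
        raise ValueError("cost_per_claim_usd must be positive")
    return annual_fixed_cost_usd / cost_per_claim_usd


CLOUD_COST_PER_CLAIM = 0.30        # from the post
PLACEHOLDER_FIXED_COST = 150_000   # ASSUMPTION: hypothetical, not an estimate

volume = break_even_claims(PLACEHOLDER_FIXED_COST, CLOUD_COST_PER_CLAIM)
print(f"break-even at about {volume:,.0f} claims per year")
```

Above the break-even volume every additional claim adds to the cloud bill but not to the local one. The comparison covers inference cost only and leaves out the compliance and audit considerations discussed elsewhere in the post.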

The regulator's questions become easier to answer. "Where does the data live? How is the model governed? What's the audit trail?" Each of these has a straightforward answer when the data, the model, and the logs all sit on infrastructure the carrier controls.

What the regulator wants to see

For insurance specifically, the regulatory pressure on AI use is intensifying. The NAIC has issued model guidance on AI governance. Several states have passed or are considering legislation. The EU AI Act lists AI systems used for risk assessment and pricing in life and health insurance as high-risk.

The questions regulators ask in AI audits map cleanly onto the local-vs-cloud architectural distinction. Where is the inference happening? How can the carrier demonstrate model governance? Where are the training data, the inference logs, the audit trail? A carrier running on cloud LLMs is structurally weaker on every one of these questions than a carrier running on local infrastructure.

The carriers that move to local inference early will have an easier time with regulators in the next several years. The carriers that stay on cloud will be answering harder questions in every audit.

Where the cloud LLM is still possible

A few cases.

For workflows operating only on de-identified data, the data-protection constraints (the insurance counterpart of a HIPAA business associate agreement) are less binding. Some research-and-analytics use cases can be designed this way.

For jurisdictions with looser regulatory frameworks, the privacy compulsion is weaker. Cost may still favor local inference, but how hard the regulator pushes depends on where the carrier operates.

For pilot deployments and early validation, where the volume doesn't justify the infrastructure investment and the carrier has explicitly accepted the compliance risk of the pilot, a cloud LLM remains workable.

For high-volume, regulator-relevant claims processing at any serious carrier, the case for local is overwhelming.

The pattern, in insurance

Avery NXR is not an insurance tool; it scaffolds Next.js applications. But the same architectural pattern repeats here.

Insurance claims processing is a narrow, repetitive, extreme-volume, extreme-privacy, regulator-relevant, audit-critical workload. The economics, the privacy story, and the regulatory pressure all point toward local inference as the right architecture. The latency story is real in real-time workflows. The audit trail story is real for every workflow.

The InsurTech vendors that build excellent claims AI on local infrastructure — with appropriate fine-tuning, regulator-friendly deployment models, and evidence packages for state insurance department audits — will own the institutional segment of this market. The cloud-LLM-default products will hold a portion of the market until the regulatory pressure forces the architectural conversation.

We expect this shift to happen relatively quickly in insurance, because the regulatory pressure is intense, the audit trail story is concrete, and the cost scale is large. Carriers that move first will be ahead on regulatory standing, cost, and operational quality simultaneously.