Avery.Software — Native Execution Runtime

Healthcare clinical documentation: where AI cost meets HIPAA

2026-05-27 · Avery NXR

Healthcare has been one of the most thoroughly AI-instrumented industries in the past three years, and most of that instrumentation has happened in clinical documentation.

The work covered by "clinical documentation" is broader than the term suggests. AI-generated visit notes from ambient audio captured during patient encounters. EHR record summarization for the next provider in the patient's chain of care. Prior authorization paperwork drafted from the patient's history. Discharge instructions personalized for the patient's specific situation. Coding suggestions for billing. Patient communication drafts. Each of these workflows has gone from a research demo to a deployed product in a remarkably short time.

The bill is real, and the regulatory constraints are real, and the case for moving inference local is — in healthcare specifically — closer to mandatory than to optional.

The math on a real-sized provider organization

A representative integrated healthcare provider — a multi-specialty practice or a regional health system — sees a large volume of patient encounters every day.

A mid-sized practice with three hundred providers, each seeing twenty patients per workday, has six thousand encounters per workday, or about a hundred and fifty thousand encounters per month (assuming roughly twenty-five clinic days a month). Larger systems are multiples of this.

For each encounter, a typical AI workflow does several things: transcribe the audio (if applicable), generate a structured clinical note from the transcript, suggest billing codes, draft any necessary patient communications, and summarize the encounter for the EHR. The downstream AI layer (everything after transcription) uses about ten thousand input tokens and one thousand output tokens per encounter, which at frontier-model API pricing comes to about $0.045 per encounter.
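The $0.045 figure is consistent with, for example, rates of $3 per million input tokens and $15 per million output tokens. The sketch below assumes those rates purely for illustration; they are not quoted from any particular provider, and real pricing varies by model and changes over time.

```python
# Illustrative frontier-model API rates in US dollars per million tokens.
# ASSUMPTION: chosen to match the post's $0.045 figure, not quoted from any provider.
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00


def cost_per_encounter(input_tokens: int, output_tokens: int,
                       input_rate: float = INPUT_USD_PER_MILLION,
                       output_rate: float = OUTPUT_USD_PER_MILLION) -> float:
    """Dollar cost of the post-transcription AI layer for one encounter."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Roughly 10,000 input and 1,000 output tokens per encounter, as in the text.
    print(f"${cost_per_encounter(10_000, 1_000):.3f} per encounter")  # $0.045 per encounter
```

Input tokens dominate here (two thirds of the cost), so trimming the context sent per encounter moves the bill more than shortening the note.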

A hundred and fifty thousand encounters per month at $0.045 each is $6,750 per month, or about $81,000 per year, for one mid-sized practice. A larger health system handling a million encounters per month (about two thousand providers at the same per-provider rate) is at $540,000 per year.
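Scaling the per-encounter figure by volume is simple arithmetic, sketched below. The sketch assumes about twenty-five clinic days a month (the figure that turns six thousand encounters a day into roughly a hundred and fifty thousand a month). At twenty encounters per provider per workday, a million encounters a month corresponds to about two thousand providers.

```python
from dataclasses import dataclass

COST_PER_ENCOUNTER_USD = 0.045  # post-transcription AI layer, per the estimate in the text
CLINIC_DAYS_PER_MONTH = 25      # ASSUMPTION: turns 6,000/day into ~150,000/month


@dataclass
class Organization:
    name: str
    providers: int
    encounters_per_provider_per_day: int

    def encounters_per_month(self) -> int:
        return self.providers * self.encounters_per_provider_per_day * CLINIC_DAYS_PER_MONTH

    def annual_cost_usd(self) -> float:
        # Linear in volume: every encounter is billed at the same per-token rate.
        return self.encounters_per_month() * COST_PER_ENCOUNTER_USD * 12


if __name__ == "__main__":
    for org in (Organization("mid-sized practice", 300, 20),
                Organization("larger health system", 2_000, 20)):
        print(f"{org.name}: {org.encounters_per_month():,} encounters/month, "
              f"${org.annual_cost_usd():,.0f}/year")
```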

These numbers exclude the transcription cost itself (separate vendor, similar order of magnitude) and the prior-authorization-paperwork workload, which adds another significant chunk for any practice with meaningful payer interaction.

Why healthcare is structurally a local-SLM case

The standard properties for local-SLM suitability are all present, and the privacy and regulatory ones are more extreme here than in any other workload we've looked at.

The work is narrow. The model needs to know one practice's specialties, one practice's patient population, one practice's documentation conventions. A model trained on the practice's own clinical corpus is likely to outperform a general medical model on the practice's own work.

The work is repetitive. The same shape of encounter, the same shape of note, the same shape of code suggestion, repeated across roughly a hundred patient interactions per provider per week.

The volume is high. The cost scales linearly with patient volume, which scales with the practice's growth.

The privacy story is, in healthcare, not a "story": it is law. HIPAA requires a covered entity to have a Business Associate Agreement with any third party that creates, receives, maintains, or transmits Protected Health Information on its behalf. The BAAs that cloud LLM providers offer cover specific models, specific endpoints, and specific use cases, and many clinical documentation workflows do not fit cleanly within them. The risk of non-compliance is large, and the penalties are explicit.
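One way a team might enforce BAA scope in software is a default-deny check: PHI may go to a cloud endpoint only if the exact combination of model, endpoint, and use case is one the signed BAA covers. Everything in the sketch is hypothetical (the class, the model name, the example.com URL, the use-case labels), and what a BAA actually covers is a legal question to settle with counsel, not something code can infer. The point it illustrates is that changing any one of the three takes a workflow out of coverage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaaScope:
    """One (model, endpoint, use case) combination a signed BAA covers.

    HYPOTHETICAL structure for illustration; frozen so instances are hashable.
    """
    model: str
    endpoint: str
    use_case: str


# HYPOTHETICAL record of what a practice's compliance team confirmed is covered.
COVERED = frozenset({
    BaaScope("vendor-model-a", "https://api.example.com/v1/chat", "visit-note-drafting"),
})


def may_send_phi(model: str, endpoint: str, use_case: str) -> bool:
    """Default deny: allow PHI out only for an exact covered combination."""
    return BaaScope(model, endpoint, use_case) in COVERED


if __name__ == "__main__":
    print(may_send_phi("vendor-model-a", "https://api.example.com/v1/chat", "visit-note-drafting"))  # True
    print(may_send_phi("vendor-model-a", "https://api.example.com/v1/chat", "prior-auth-drafting"))  # False
```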

The latency story is real in the live-documentation case. When ambient audio is being processed in real time during a patient encounter, the AI is producing draft notes the provider may glance at during the visit. Two-second cloud latency is too slow for that interaction; two hundred milliseconds is fast enough.

What changes with local inference

A clinical documentation workflow on a local SLM looks like this.

A model is fine-tuned on the practice's clinical corpus — historical notes, encounter patterns, the specific vocabulary the practice uses, the documentation style each specialty prefers. The fine-tuning is done in a controlled environment that itself meets the HIPAA requirements.

The model runs on infrastructure the practice owns — on-premises servers, or in a HIPAA-compliant private cloud the practice has direct control over. The deployment is documented, audited, and approved.

Patient encounters flow through the audio pipeline, which feeds a local transcription model, which feeds the local clinical-documentation model, which produces structured notes that flow into the EHR. Nothing crosses the security boundary.
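The flow described in that paragraph can be sketched as a chain of stages with one guard: before any PHI is processed, the pipeline checks that every stage runs inside the practice's boundary and refuses to start otherwise. The stage functions are hypothetical stand-ins that return placeholder data; in a real deployment each would call a locally served model or the EHR's integration interface.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Stage:
    name: str
    run: Callable[[Any], Any]
    inside_boundary: bool  # True if the stage runs on infrastructure the practice controls


def run_pipeline(stages: list[Stage], payload: Any) -> Any:
    # Check the whole chain before touching any PHI, not stage by stage.
    outside = [s.name for s in stages if not s.inside_boundary]
    if outside:
        raise PermissionError(f"stages outside the security boundary: {outside}")
    for stage in stages:
        payload = stage.run(payload)
    return payload


# HYPOTHETICAL stand-ins for the local transcription model, the local
# clinical-documentation model, and the EHR write.
def transcribe(audio: bytes) -> str:
    return "patient reports three days of cough"  # placeholder transcript


def draft_note(transcript: str) -> dict:
    return {"subjective": transcript, "assessment": "pending provider review"}


def to_ehr(note: dict) -> dict:
    return {"status": "queued_for_ehr", **note}


PIPELINE = [
    Stage("local-transcription", transcribe, inside_boundary=True),
    Stage("local-clinical-note", draft_note, inside_boundary=True),
    Stage("ehr-write", to_ehr, inside_boundary=True),
]

if __name__ == "__main__":
    print(run_pipeline(PIPELINE, b"\x00" * 16_000))
```

Checking the whole chain up front means a misconfigured stage fails before any audio is transcribed, rather than halfway through an encounter.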

For real-time use during patient encounters, the inference happens close enough to the encounter to be useful in flow — typically on a server at the practice, with low-latency network to the provider's device.

The cost flips. The practice pays for the model, the hardware, and the integration, and encounter volume can grow without the bill moving until the hardware's capacity is reached.
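The shape of that comparison is easy to sketch: cloud cost is linear in volume, while local cost is a step function that stays flat until a deployment unit's capacity is used up. The local dollar and capacity figures below are hypothetical placeholders, not estimates from this post; substitute real quotes before drawing conclusions.

```python
import math


def cloud_annual_cost(encounters_per_year: int, usd_per_encounter: float) -> float:
    # Per-token pricing: cost grows linearly with volume.
    return encounters_per_year * usd_per_encounter


def local_annual_cost(encounters_per_year: int, usd_per_unit_per_year: float,
                      encounters_per_unit_per_year: int) -> float:
    # Step function: flat until one deployment unit's capacity is used up,
    # then another unit (more servers or accelerators) is added.
    units = max(1, math.ceil(encounters_per_year / encounters_per_unit_per_year))
    return units * usd_per_unit_per_year


if __name__ == "__main__":
    # HYPOTHETICAL local figures for illustration only: $40,000 per unit per year
    # (amortized hardware, model, integration, operations), 2,000,000 encounters
    # per unit per year.
    for volume in (600_000, 1_800_000, 3_600_000):
        cloud = cloud_annual_cost(volume, 0.045)
        local = local_annual_cost(volume, 40_000, 2_000_000)
        print(f"{volume:>9,} encounters/yr: cloud ${cloud:>9,.0f}  local ${local:>9,.0f}")
```

With these placeholder numbers, the mid-sized practice's 1,800,000 encounters a year fit in one unit, and doubling volume doubles the cloud bill but adds only one more unit. Whether local wins in practice depends entirely on the real fixed cost.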

The audit trail that compliance wants

A specific benefit in healthcare: the audit trail.

A local deployment can write a structured log of every decision the model makes: what it considered, what it produced, and what confidence it had. For HIPAA audits, for medical malpractice considerations, and for internal quality review, this audit trail is a useful artifact.
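The log is something the deployment has to be built to write; the model does not produce it on its own. Below is a minimal sketch with a hypothetical schema. It uses a hash chain (one common tamper-evidence technique, not something this post prescribes) so an auditor can detect after-the-fact edits. Because entries contain model output, which is PHI, the log itself has to live inside the same boundary.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class AuditRecord:
    # HYPOTHETICAL schema; field names are illustrative, not a standard.
    encounter_id: str
    model_version: str
    input_sha256: str   # digest of the exact context the model was given
    output_text: str    # what the model produced (this is PHI)
    confidence: float   # e.g. mean token probability; one possible signal
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())
    prev_hash: str = ""


def _digest(record: AuditRecord) -> str:
    # Canonical JSON so the same record always hashes the same way.
    payload = json.dumps(asdict(record), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash,
    so editing any entry, or deleting one that has entries after it,
    makes verify() fail."""

    def __init__(self) -> None:
        self.entries: list[tuple[AuditRecord, str]] = []

    def append(self, record: AuditRecord) -> str:
        record.prev_hash = self.entries[-1][1] if self.entries else GENESIS
        digest = _digest(record)
        self.entries.append((record, digest))
        return digest

    def verify(self) -> bool:
        prev = GENESIS
        for record, digest in self.entries:
            if record.prev_hash != prev or _digest(record) != digest:
                return False
            prev = digest
        return True


if __name__ == "__main__":
    log = AuditLog()
    context = "transcript text for encounter enc-001"
    log.append(AuditRecord("enc-001", "notes-model-v3",
                           hashlib.sha256(context.encode("utf-8")).hexdigest(),
                           "draft note text", 0.93))
    print(log.verify())  # True
    log.entries[0][0].output_text = "edited after the fact"
    print(log.verify())  # False
```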

A cloud LLM-based workflow has, at best, a chat log of what was sent and what came back. The audit trail of decisions is opaque. A local-SLM workflow has, by design, the full decision record. The auditor can see what the model did, the lawyer can see what evidence the practice has, the quality team can see where the model's errors cluster.

This isn't a marketing point; it is an operational requirement at most serious healthcare providers.

Where the cloud LLM is still possible

A few cases where cloud-LLM-based healthcare AI is still defensible.

For workflows that touch only de-identified data. Some research-oriented or population-health workflows can be designed to remove PHI before any data leaves the controlled environment. Data properly de-identified under HIPAA (by the Safe Harbor or Expert Determination method) is no longer PHI, so the BAA requirement does not attach to it.

For workflows in jurisdictions outside HIPAA. Healthcare AI deployments outside the US operate under different regulatory frameworks: some are less restrictive for this use, and some, such as the GDPR's treatment of health data as a special category, are stricter. The local-SLM argument may still be the right one for cost reasons, but the privacy compulsion is jurisdiction-dependent.

For pilot deployments and early validation work where the volume doesn't justify the infrastructure investment, and where the practice has accepted the compliance risk of the pilot.

For everything else — the high-volume, PHI-touching, regulator-relevant clinical documentation work that constitutes the bulk of healthcare AI — the local-SLM case is overwhelming.
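The three exceptions above amount to a simple screening rule, sketched below as a function. The attribute names are hypothetical, and a True result means only that a cloud workflow is worth evaluating, not that it is compliant; local law, contracts, and cost still need checking.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    # HYPOTHETICAL attributes a compliance review might record per workflow.
    name: str
    touches_phi: bool           # False only if data is de-identified before it leaves
    subject_to_hipaa: bool      # False for deployments under other jurisdictions
    is_pilot: bool
    pilot_risk_accepted: bool   # documented acceptance of the pilot's compliance risk


def cloud_may_be_defensible(w: Workflow) -> bool:
    """True if the workflow matches one of the three exceptions; else stay local."""
    if not w.touches_phi:
        return True   # de-identified data only
    if not w.subject_to_hipaa:
        return True   # different regime; privacy compulsion is jurisdiction-dependent
    return w.is_pilot and w.pilot_risk_accepted  # low-volume validation work


if __name__ == "__main__":
    note_drafting = Workflow("ambient visit notes", True, True, False, False)
    cohort_study = Workflow("de-identified cohort analysis", False, True, False, False)
    print(cloud_may_be_defensible(note_drafting))  # False: stays local
    print(cloud_may_be_defensible(cohort_study))   # True
```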

The pattern, at maximum strength

Avery NXR is a native execution runtime. It is not a healthcare tool. But the architectural pattern repeats here, at its strongest.

Healthcare clinical documentation is the workload where the cost case is real, the privacy case is mandated by law, the regulatory case is intense, and the latency case is meaningful. Every dimension that favors local inference is present and strong.

The healthcare AI vendors that build excellent vertical tools on local infrastructure — with appropriate fine-tuning, HIPAA-compliant deployment, audit-trail evidence — will own this category. The cloud-LLM-default products are operating on borrowed time. The architectural shift is not a question of if, but of when, and which vendors are ready when it happens.

We expect this shift to be relatively rapid in healthcare compared to other categories, because the regulatory clarity is sharper here than in most operational AI domains. HIPAA isn't ambiguous; the BAA constraints are well-understood; the financial and reputational penalties for compliance failures are explicit. The institutions that move to local inference first will be ahead on cost, ahead on privacy, and ahead on regulatory standing simultaneously.