Meeting transcription and summarization: the workload every company is over-spending on

2026-05-26 · Avery NXR

Three years ago, AI in meetings was a curiosity — a Chrome extension that someone on the team had installed, producing transcripts of variable quality that mostly nobody read.

Today, AI in meetings is a default. Every internal call, every customer call, every interview is being recorded, transcribed, summarized into action items, and pushed to a CRM or a project management tool or a Slack channel. For knowledge workers, this is one of the most visible AI integrations in their daily work.

It is also one of the workloads where the cloud LLM bill is large, growing, and overlooked.

The economics of "transcribe everything"

A typical knowledge worker is in three to six meetings per workday. If every meeting is transcribed and summarized, that is three to six AI workflows per worker per day.

The transcription part is relatively cheap; speech-to-text has been a solved problem for a while and the per-minute cost is low. The expensive part is what happens after transcription — summarization, action-item extraction, sentiment analysis, key quote identification, follow-up email drafting.

For a company with five hundred knowledge workers, three to six runs per worker puts the daily volume at fifteen hundred to three thousand meeting-AI runs. A typical meeting transcript is fifteen hundred to four thousand words, depending on length. After transcription, summarization passes through a cloud LLM with about ten thousand input tokens (the transcript plus context) and five hundred output tokens (the summary, action items, and quote highlights). At frontier pricing, that is about $0.038 per meeting.

At the top of that range, three thousand meetings per workday at $0.038 each is about $114 per day, or $22,800 per year over roughly 200 working days. For a larger company with five thousand knowledge workers, the bill is closer to $228,000 per year. We have talked to companies where the meeting-AI line item is among the top five SaaS expenses.
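The arithmetic above is easy to rerun with your own numbers. The sketch below is a minimal cost model, not a pricing reference. The token counts come from the paragraph above; the per-million-token prices ($3 input, $15 output) and the 200 working days are our assumptions, chosen because they reproduce the roughly $0.038 per meeting and the annual totals quoted here. Substitute your provider's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeetingCostModel:
    input_tokens: int = 10_000        # transcript plus context (from the text above)
    output_tokens: int = 500          # summary, action items, quotes (from the text above)
    usd_per_m_input: float = 3.00     # ASSUMED price per million input tokens
    usd_per_m_output: float = 15.00   # ASSUMED price per million output tokens
    workdays_per_year: int = 200      # ASSUMED; implied by $114/day -> $22,800/year

    def per_meeting(self) -> float:
        """Cloud LLM cost of summarizing one meeting, in USD."""
        total = (self.input_tokens * self.usd_per_m_input
                 + self.output_tokens * self.usd_per_m_output)
        return total / 1_000_000

    def per_year(self, meetings_per_day: int) -> float:
        """Annual summarization cost for a given daily meeting volume, in USD."""
        return self.per_meeting() * meetings_per_day * self.workdays_per_year


model = MeetingCostModel()
print(f"per meeting: ${model.per_meeting():.4f}")
print(f"per year at 3,000 meetings/day: ${model.per_year(3000):,.0f}")
```

Because the code does not round the per-meeting figure up to $0.038, its annual total comes out slightly below the $22,800 quoted above. The difference is rounding, not a different model.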

These numbers exclude the transcription cost itself, which is of the same order of magnitude. Total meeting-AI cost at a midmarket company is often north of $50,000 per year; at a large company, north of $500,000 per year.

Why this is a near-perfect local-SLM workload

Meeting summarization has the full set of properties that make a workflow well-suited to local inference.

It is narrow. The model needs to know one thing: how to read a transcript of a business conversation and produce a structured summary. A model trained on the company's own meeting transcripts will usually outperform a general-purpose model on the company's own meetings. The specific vocabulary, the specific projects, the specific people who tend to be in calls together: all of it becomes context the model can use.

It is repetitive. The same shape of input, the same shape of output, the same shape of decision, repeated thousands of times per day. Specialization compounds.

It is high-volume. The cost scales linearly with meeting volume, which scales linearly with company size. As long as the company keeps growing, the cloud-LLM bill keeps growing.

It is privacy-sensitive in a more acute way than most workloads in this series. Meetings contain everything. Internal strategy. Personnel discussions. Customer information. Pricing negotiations. Performance reviews. Compensation conversations. Mergers. Layoffs.

Companies have wildly varying tolerance for sending this content to third-party cloud LLMs. Some are comfortable because the contracts say the right things. Some are uncomfortable but accept the risk because the productivity gain is too large to walk away from. Some refuse on principle.

A local-inference architecture removes the third-party exposure. The meeting content does not leave the company's systems. The model produces the summary locally; the summary lives in the systems the company already owns.

It is latency-tolerant in the batch case (transcribe and summarize after the meeting ends) but latency-sensitive in the live case (real-time transcription with live action-item suggestions). Both modes are well-served by a local model; the cloud model struggles with the live case because of the round-trip latency.

What the architecture looks like

A meeting-AI workflow on a local SLM has a configuration that looks like this.

A small specialized model — fine-tuned on the company's own meeting transcripts, where available, or on a curated corpus of business conversations as a starting point — runs on infrastructure the company controls. For a smaller company, that can be a single mid-grade GPU server. For larger deployments, it scales to a small cluster.

The inference happens on the company's side of the security boundary. Transcripts come in from the audio pipeline, get summarized by the local model, and the summaries go out to the downstream tools (Slack, CRM, project management). The raw transcript never crosses the boundary; only the summaries the company chooses to publish do.
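The flow just described can be sketched in a few lines. This is an illustration of the shape, not anyone's production design. Every name here (MeetingSummary, process_meeting, fake_local_summarizer) is hypothetical, and the summarizer is a trivial stand-in so the sketch runs without a model installed. In a real deployment, that function would call the local model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MeetingSummary:
    summary: str
    action_items: List[str] = field(default_factory=list)
    quotes: List[str] = field(default_factory=list)


# Hypothetical interfaces: a summarizer turns transcript text into a
# MeetingSummary; a sink delivers one summary to one downstream tool.
Summarizer = Callable[[str], MeetingSummary]
Sink = Callable[[MeetingSummary], None]


def process_meeting(transcript: str, summarize: Summarizer,
                    sinks: List[Sink]) -> MeetingSummary:
    """Summarize inside the boundary, then fan out only the summary."""
    result = summarize(transcript)
    for deliver in sinks:
        deliver(result)  # sinks never receive the raw transcript
    return result


def fake_local_summarizer(transcript: str) -> MeetingSummary:
    """Stand-in for the local model: first line as summary, 'will' lines as actions."""
    lines = [ln.strip() for ln in transcript.splitlines() if ln.strip()]
    actions = [ln for ln in lines if "will" in ln.lower()]
    return MeetingSummary(summary=lines[0] if lines else "", action_items=actions)


delivered: List[MeetingSummary] = []  # stands in for Slack, CRM, and so on
out = process_meeting(
    "Dana: Q3 launch is on track.\nLee: I will send the pricing draft Friday.",
    fake_local_summarizer,
    [delivered.append],
)
print(out.action_items)
```

The design point is that the sinks only ever see a MeetingSummary. The transcript is an argument to the summarizer and nothing else, which makes the boundary easy to audit.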

In the live case (meetings where the model produces real-time suggestions during the call), the model can run on the workstation of the person running the meeting for the fastest possible response. Latency can drop from 1-2 seconds to under 200 ms, which is the difference between "the AI is keeping up with the conversation" and "the AI is one beat behind."

What gets better, beyond cost

The cost savings are the obvious story. The under-told story is that a model trained on the company's meetings is a meaningfully better tool.

A general-purpose cloud LLM, summarizing a meeting, produces a competent summary. But it doesn't know which people in the meeting are senior, which initiatives are strategic priorities, which acronyms are internal jargon for what. It writes summaries that sound like a generic AI summarized a generic meeting.

A model trained on the company's transcripts knows the org chart, the product names, the strategic priorities. The summaries are richer, the action items are better-attributed, the quote highlights are more relevant. The summaries sound like they came from someone who was in the meeting.

Across hundreds of meetings per day at a real company, the quality difference is enormous. People actually read the summaries. The action items get followed up on. The CRM stays current because the summaries that flow into it are actually useful.

What we won't pretend

We are not going to pretend that the local-SLM case for meeting AI is uncomplicated.

The fine-tuning is non-trivial. The training data — historical meeting transcripts — exists at most companies but is typically scattered, inconsistent, and full of PII that has to be handled carefully before it can be used for training. Getting a good local model from a company's own meeting history is real work.
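To make the PII point concrete, here is a deliberately small sketch of a first redaction pass over a transcript before it goes anywhere near a training set. It is an assumption-laden illustration, not a complete scrubber. The two patterns cover only simple email addresses and US-style ten-digit phone numbers, and the placeholder tokens are our own invention. Names, addresses, account numbers, and anything context-dependent need more than regular expressions.

```python
import re

# ASSUMED patterns: simple emails and US-style phone numbers only.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text: str) -> str:
    """Replace matched emails and phone numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


sample = "Reach Sam at sam.lee@example.com or 555-867-5309 after the call."
print(redact(sample))
```

Note that "Sam" survives the pass. That is the part that makes this real work: person names and project names are exactly the vocabulary the fine-tuned model benefits from, so deciding what to strip and what to keep is a policy question as much as a technical one.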

The deployment is non-trivial. Running inference on a small cluster requires engineering attention. The companies that do this well today have invested significantly in their infrastructure.

The local model gives up some breadth. For unusual meeting types — a one-off training session, a board meeting with unfamiliar participants, a brainstorm in a domain the model wasn't trained on — a frontier cloud LLM may produce a better summary. A well-designed pipeline routes the unusual cases to the cloud and the routine cases to the local model.
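A routing rule of that kind can be very simple. The sketch below is one hypothetical way to write it. The meeting types, the 0.5 threshold, and the idea of scoring a meeting by the share of participants the model has seen before are all our assumptions, not a description of any shipping product. The cloud_allowed flag is there so that meetings marked too sensitive to leave the building stay local even when they are unusual.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class MeetingInfo:
    meeting_type: str                # e.g. "standup", "board"
    known_participant_ratio: float   # share of attendees seen in training data


# ASSUMED values; tune these against your own meeting mix.
ROUTINE_TYPES: Set[str] = {"standup", "one_on_one", "customer_call", "sprint_review"}
MIN_KNOWN_RATIO = 0.5


def choose_backend(info: MeetingInfo, cloud_allowed: bool = True) -> str:
    """Return "local" for routine or cloud-restricted meetings, else "cloud"."""
    routine = (info.meeting_type in ROUTINE_TYPES
               and info.known_participant_ratio >= MIN_KNOWN_RATIO)
    if routine or not cloud_allowed:
        return "local"
    return "cloud"


print(choose_backend(MeetingInfo("standup", 0.9)))
print(choose_backend(MeetingInfo("board", 0.2)))
print(choose_backend(MeetingInfo("board", 0.2), cloud_allowed=False))
```

Defaulting to local whenever the cloud is not allowed means the privacy policy wins over summary quality, which matches the priorities described above.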

For most companies in most weeks, the local-SLM architecture is the right choice — but it is not the easier choice.

The pattern, once more

Avery NXR is not a meeting AI tool. It scaffolds Next.js applications. The architectural pattern that makes Avery NXR work for code scaffolding is the same one that makes a local SLM the right tool for meeting summarization.

Meetings are where a lot of confidential, high-volume, repetitive AI work is happening today. They are also where the local-SLM case is, in some sense, most urgent — because the privacy stakes are the highest. Customer information, internal strategy, personnel decisions: these are the things companies most want to keep inside their own systems.

The companies that build excellent meeting AI on local infrastructure — with sensible fine-tuning, sensible deployment, sensible business models — are going to win this category. The cloud-LLM-first incumbents will hold the market for another year or two, until the operational and privacy math gets too obvious to ignore.

We will be watching. We may, eventually, build in this space. For now, we are pointing at it and noting that the pattern continues to repeat.