Email processing: when your inbox becomes a recurring AI invoice

2026-05-26 · Avery NXR

Email is the operational lifeblood of most companies. It is also, increasingly, where a quiet share of the company's AI budget is being spent.

The list of things AI is now doing inside email systems is long. Classifying inbound messages by intent. Routing them to the right department or queue. Summarizing long threads into a few bullet points for a busy executive. Drafting first-pass replies for support, sales, and HR teams. Extracting structured data from emails — order confirmations, shipping notifications, calendar invites — and writing it back to the systems that care. Detecting and flagging suspicious or off-policy messages.

Almost every one of these workflows is, in the current default implementation, calling a cloud LLM for each email that flows through the system. The bill is real, and it is growing in lockstep with email volume — which means it never stops growing.

The math, on a real-sized company

A mid-sized company — let's say five hundred employees — generates a remarkable amount of email. Internal messages, customer correspondence, vendor coordination, automated alerts. A conservative estimate of the email volume crossing some kind of AI-assisted workflow at such a company is one million messages per month. Larger companies are several times that.

Of those million emails, perhaps 60 percent are classified or routed by a model, 20 percent are summarized in some form, and 5 percent get a model-drafted first-pass reply. The token math for each operation looks roughly like this.

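A classification or routing call uses about 1,500 input tokens (the email body and metadata) and produces a few dozen output tokens. At frontier pricing (the figures here are consistent with roughly $3 per million input tokens and $15 per million output tokens), that is about $0.005 per email. Across 600,000 messages, that comes to $3,000 per month.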

A summarization call uses about 3,000 input tokens (a longer thread) and produces 200 output tokens. Roughly $0.012 per email. Across 200,000 messages, $2,400 per month.

A reply-drafting call uses about 4,000 input tokens (the thread plus context) and produces 400 output tokens. Roughly $0.018 per email. Across 50,000 messages, $900 per month.

Total for this one company: about $6,300 per month, or $75,600 per year — just for the AI layer in the email system. And those numbers grow as the company grows, as email volume grows, and as the team adds new AI-assisted workflows.
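The arithmetic above is easy to rerun with your own volumes. The sketch below is one way to do it, not a pricing tool: the shares and per-email costs are the estimates from this post, while the names (`Workflow`, `cost_from_tokens`) and the token rates of $3 and $15 per million are assumptions, picked because they reproduce the summarization and drafting figures.

```python
from dataclasses import dataclass

# Assumed rates in USD per million tokens. The post only says "frontier
# pricing"; these values are an assumption that happens to reproduce the
# per-email figures for summarization and reply drafting.
INPUT_RATE_PER_M = 3.00
OUTPUT_RATE_PER_M = 15.00

MONTHLY_EMAILS = 1_000_000  # the post's estimate for a 500-person company


@dataclass(frozen=True)
class Workflow:
    name: str
    share: float           # fraction of monthly emails hitting this workflow
    cost_per_email: float  # rounded per-call cost from the post, in USD


WORKFLOWS = [
    Workflow("classify_or_route", 0.60, 0.005),
    Workflow("summarize", 0.20, 0.012),
    Workflow("draft_reply", 0.05, 0.018),
]


def cost_from_tokens(input_tokens: int, output_tokens: int) -> float:
    """Per-call cost in USD under the assumed token rates."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000


def monthly_cost(workflow: Workflow, monthly_emails: int = MONTHLY_EMAILS) -> float:
    """Monthly spend for one workflow at the given email volume."""
    return monthly_emails * workflow.share * workflow.cost_per_email


def total_monthly_cost(monthly_emails: int = MONTHLY_EMAILS) -> float:
    """Monthly spend across all workflows."""
    return sum(monthly_cost(w, monthly_emails) for w in WORKFLOWS)


if __name__ == "__main__":
    for w in WORKFLOWS:
        print(f"{w.name}: ${monthly_cost(w):,.0f} per month")
    total = total_monthly_cost()
    print(f"total: ${total:,.0f} per month, ${total * 12:,.0f} per year")
```

Because every term is linear in `monthly_emails`, doubling the volume doubles the total, which is the growth the rest of this post is concerned with.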

Why email is a near-perfect local-SLM workload

Almost every property that makes a workflow well-suited to a specialized local model is present in email processing.

It is narrow. A model that knows your company's email patterns (the products you sell, the vendors you work with, the internal jargon your team uses) can match or outperform a general-purpose model on these routine operations. That specialization is bought at the cost of breadth, and breadth is something email processing rarely needs.

It is repetitive. The same shape of message, the same shape of decision, repeated millions of times per month. Specialization compounds.

It is high-volume. The cloud bill scales linearly with volume, which means the local-vs-cloud math gets more favorable as the company grows. The teams with the strongest case for switching are the ones processing the most email.

It is privacy-sensitive. Email contains everything. Personal information, financial data, customer correspondence, internal strategy. Sending every email through a third-party cloud LLM is a posture some companies are uncomfortable with even when the contracts say the right things. A local model removes the question.

It is latency-sensitive in the interactive cases. When a sales rep is in their inbox looking at a draft reply, a 200 ms suggestion feels instant; a 2-second suggestion feels slow. The same applies when a CS agent is triaging a queue. The batch operations are latency-tolerant, but a meaningful share of email operations are interactive.

What the architecture looks like

A team running email processing on a local SLM has a setup that looks like this.

The model is fine-tuned on the company's own email corpus — historical messages, the structure of the company's products and services, the patterns of inbound and outbound communication. Fine-tuning is a one-time investment that pays back over the life of the model.

The model runs on infrastructure the company owns. For an interactive workflow — a CS agent in a ticketing tool — it can run on the agent's machine for the fastest possible response. For a batch workflow — overnight classification of incoming messages — it runs on a server.

The cost flips from per-message to fixed. The team pays once for the model and once for the hardware (or uses the desk setup it already has), and volume can grow ten-fold, within the capacity of that hardware, without the bill moving. Compare that with the cloud model, where ten-fold volume growth means ten-fold cost growth.
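The fixed-versus-linear comparison can be made concrete with a break-even calculation. This is a minimal sketch under stated assumptions: the $6,300 baseline is the monthly estimate from earlier in this post, while the function names and the upfront and running costs in the usage example are hypothetical placeholders, not measured figures. It also assumes the local hardware has enough capacity for the volume in question.

```python
import math
from typing import Optional

CLOUD_MONTHLY_BASELINE = 6_300.0  # USD, the estimate from earlier in the post


def cloud_monthly(volume_multiplier: float,
                  baseline: float = CLOUD_MONTHLY_BASELINE) -> float:
    """Per-message pricing: the bill scales linearly with volume."""
    return baseline * volume_multiplier


def breakeven_months(upfront_cost: float,
                     local_monthly_running: float,
                     volume_multiplier: float = 1.0) -> Optional[int]:
    """Whole months until cloud spend avoided covers the upfront local cost.

    Returns None when local running costs are not below the cloud bill,
    in which case the upfront cost is never recovered.
    """
    saving = cloud_monthly(volume_multiplier) - local_monthly_running
    if saving <= 0:
        return None
    return math.ceil(upfront_cost / saving)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    upfront = 50_000          # fine-tuning plus hardware, USD
    running = 1_300           # power, maintenance, USD per month
    print(breakeven_months(upfront, running))                        # today's volume
    print(breakeven_months(upfront, running, volume_multiplier=10))  # ten-fold volume
```

With these placeholder inputs the payback period shrinks as volume grows, because the cloud side of the comparison scales with `volume_multiplier` and the local side does not.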

Where cloud LLMs still win

We want to be honest about the cases where the cloud LLM is the right tool for email work.

Open-ended drafting where the model needs to reason about a novel situation. A specialized local model can handle 95 percent of email drafting, but for the 5 percent of cases that require unusual reasoning, a frontier model is genuinely better.

Multi-modal workflows where the email includes attachments — images, PDFs, spreadsheets — that need to be analyzed alongside the text. A general-purpose multi-modal model is currently better at this than a narrow text-focused local model.

One-off explorations and prototypes. If you are just figuring out whether AI email processing is worth doing at all, a cloud LLM with a fast iteration loop is a reasonable starting point. You can move local once you've validated the workflow.

For everything else, meaning the large majority of email operations that are routine, repetitive, and high-volume, the local-SLM case is strong and getting stronger.
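Taken together with the architecture section, these exceptions amount to a routing policy. The sketch below shows one possible shape for it. Every name here (`EmailTask`, `Backend`, `choose_backend`, and the flags) is hypothetical, and deciding when a message "needs novel reasoning" is left to the caller, since this post does not prescribe a method for that.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    LOCAL_WORKSTATION = "local_workstation"  # interactive, lowest latency
    LOCAL_SERVER = "local_server"            # batch work
    CLOUD = "cloud"                          # the exceptions listed above


@dataclass(frozen=True)
class EmailTask:
    operation: str                        # e.g. "classify", "summarize", "draft_reply"
    interactive: bool                     # a person is waiting on the result
    has_attachments_to_analyze: bool = False
    needs_novel_reasoning: bool = False   # hypothetical flag set by the caller
    prototype: bool = False               # workflow not yet validated


def choose_backend(task: EmailTask) -> Backend:
    """Send the exceptions to the cloud and everything else to a local model."""
    if (task.prototype
            or task.has_attachments_to_analyze
            or task.needs_novel_reasoning):
        return Backend.CLOUD
    if task.interactive:
        return Backend.LOCAL_WORKSTATION
    return Backend.LOCAL_SERVER


if __name__ == "__main__":
    overnight = EmailTask("classify", interactive=False)
    agent_draft = EmailTask("draft_reply", interactive=True)
    invoice_pdf = EmailTask("summarize", interactive=False,
                            has_attachments_to_analyze=True)
    for task in (overnight, agent_draft, invoice_pdf):
        print(task.operation, "->", choose_backend(task).value)
```

Keeping the policy in one small function makes it cheap to tighten later, for example by dropping the `prototype` branch once a workflow has been validated.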

Why this hasn't happened yet, broadly

The answer is the same as for document processing: the tooling is not yet there for most teams. Building a fine-tuned email model in-house requires ML investment that most ops teams do not have. Buying a vertical product is possible in a narrow band, since there are decent local-first tools for some specific email use cases, but the broader category is largely unaddressed.

We expect this to change. The training and deployment patterns are maturing. The economics are obvious to anyone who looks at the bill. The privacy story resonates with every legal team that has thought about it.

The companies that switch first will save substantial money. The companies that switch last will pay a recurring tax to cloud providers for work that did not need to be done in the cloud in the first place.

The pattern, again

Avery NXR is built for Next.js scaffolding, not email processing. But the pattern is the same: a narrow, repetitive, high-volume, privacy-sensitive, latency-relevant workload. The economics that make a local SLM the right choice for one are the same economics that make it the right choice for the other.

We are not announcing an email tool. We are pointing out — for the third time in this series, and not for the last — that the pattern generalizes across many of the operational workflows that companies currently route through cloud LLMs by default.

The question is not whether the work will move local. The question is when, and which teams move first.