Log analysis and observability: the AI workload that can't go to the cloud
Avery NXR
Most of the use cases in this series describe workloads that could plausibly run on a cloud LLM, but where a local SLM is more cost-effective at scale.
Log analysis is different. For most companies, running it on a cloud LLM is not just more expensive; it is economically impossible. The volume is too high: if you sent every log line through a frontier model, the bill would dwarf the rest of the engineering organization's budget.
This is the workload where the local-SLM case is least controversial, and where the lack of mature tooling is most conspicuous.
The volume problem
A medium-sized engineering organization produces a lot of logs. A small SRE-led service might emit a few hundred megabytes of logs per day; a large distributed system can produce terabytes per day. The exact figures span several orders of magnitude from company to company, but the conclusion does not change: logs are one of the highest-volume data streams inside any technology company.
Consider a midmarket SaaS company with a moderately complex backend. Five hundred gigabytes of logs per day is a reasonable estimate. That is about a billion log lines, with an average size of five hundred bytes each.
If you wanted to run every log line through a cloud LLM to enrich it — classify it, summarize it, link it to related events, suggest a remediation — the token cost would be staggering. Each log line, with surrounding context, is maybe two hundred input tokens. A billion lines per day is two hundred billion tokens per day. At $3 per million tokens, that is $600,000 per day, or roughly $220 million per year.
For one company. For one workload. Obviously, nobody does this. The bill is so large that it does not get proposed.
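The arithmetic is easy to check. Here is a minimal Python sketch that uses only the illustrative figures from this section (500 GB per day, 500-byte lines, 200 tokens per line, $3 per million input tokens). None of them are measurements, so substitute your own:

```python
# Back-of-envelope cloud cost for enriching every log line.
# All inputs are the illustrative figures from the text, not measurements.
BYTES_PER_DAY = 500e9          # 500 GB of logs per day
BYTES_PER_LINE = 500           # average log line size in bytes
TOKENS_PER_LINE = 200          # the line plus surrounding context
USD_PER_MILLION_TOKENS = 3.0   # assumed input-token price


def daily_cloud_cost(bytes_per_day: float, bytes_per_line: float,
                     tokens_per_line: float, usd_per_million: float) -> float:
    """Dollars per day to push every log line through a per-token API."""
    lines = bytes_per_day / bytes_per_line
    tokens = lines * tokens_per_line
    return tokens / 1e6 * usd_per_million


daily = daily_cloud_cost(BYTES_PER_DAY, BYTES_PER_LINE,
                         TOKENS_PER_LINE, USD_PER_MILLION_TOKENS)
annual = daily * 365
print(f"lines/day: {BYTES_PER_DAY / BYTES_PER_LINE:,.0f}")
print(f"cost/day:  ${daily:,.0f}")
print(f"cost/year: ${annual:,.0f}")
```

Every term is linear, so halving the token count per line or the price halves the bill. It still lands around $110 million a year.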
What companies actually do is run AI on a small, filtered subset of their logs — the ones that hit a threshold, the ones that are flagged anomalous, the ones from a critical path. The subset is small enough to be affordable, but the cost of being selective is that the AI never sees most of the data, and so it misses patterns that would only be visible across the whole stream.
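A minimal sketch of what that selection step typically looks like, assuming a simple record type. The `LogLine` fields, the service names, and the 1,000 ms threshold are hypothetical placeholders, not any real product's schema:

```python
from dataclasses import dataclass


@dataclass
class LogLine:
    service: str
    level: str               # e.g. "INFO", "WARN", "ERROR"
    latency_ms: float
    flagged_anomalous: bool  # set by an upstream rule or detector


# Hypothetical selection rules; real deployments tune these per system.
CRITICAL_SERVICES = {"checkout", "auth"}
LATENCY_THRESHOLD_MS = 1000.0


def worth_sending_to_llm(line: LogLine) -> bool:
    """Keep a line only if it hits a threshold, was flagged, or is an error on the critical path."""
    return (
        line.latency_ms >= LATENCY_THRESHOLD_MS
        or line.flagged_anomalous
        or (line.service in CRITICAL_SERVICES and line.level == "ERROR")
    )


def coverage(lines: list[LogLine]) -> float:
    """Fraction of the stream that the model actually gets to see."""
    if not lines:
        return 0.0
    return sum(worth_sending_to_llm(line) for line in lines) / len(lines)
```

Everything `worth_sending_to_llm` rejects never reaches the model. That rejected traffic is the blind spot: patterns that only show up across the whole stream.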
Why this is the most obvious local-SLM workload
Log analysis has the highest-volume, highest-cost cloud profile of any workflow we have looked at. It also has every property that makes a workload well-suited to local inference.
It is narrow. Logs from a given system have a specific shape: specific service names, specific error codes, specific patterns of upstream and downstream events. A model trained on that company's log patterns is likely to outperform a general model that has to figure out the conventions from scratch.
It is repetitive. The same shape of log line, the same shape of error, the same shape of incident, repeated millions of times across the stream. Specialization compounds.
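You can see this repetition directly. Mask the variable fields (IDs, numbers, durations) and count the distinct shapes that remain. This is a simple regex-masking sketch, not how any particular model learns log structure, and the mask patterns are assumptions that would need tuning for real log formats:

```python
import re
from collections import Counter

# Illustrative masks for variable fields, applied in order.
# UUIDs go first so their digits are not masked piecemeal as numbers.
_MASKS = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"), "<UUID>"),
    (re.compile(r"\b\d+(?:\.\d+)?ms\b"), "<DURATION>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def template_of(line: str) -> str:
    """Replace variable fields so lines with the same shape map to the same key."""
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    return line


def shape_counts(lines: list[str]) -> Counter:
    """How many raw lines fall under each template."""
    return Counter(template_of(line) for line in lines)
```

On production logs, the number of distinct templates is usually tiny compared with the number of lines. That gap is the repetition a specialized model can exploit. This crude version also masks status codes, which a real pipeline would probably want to keep.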
It is so high-volume that the cloud LLM cost is, as shown above, economically infeasible. There is no realistic architecture in which you process every log line through a cloud LLM; the bill simply does not allow it. A local SLM, whose cost is fixed up to the capacity of the hardware it runs on, is the only architecture in which full-stream AI analysis is possible.
It is privacy-sensitive in a specific way. Logs at most companies contain a mix of operational data (request IDs, status codes, latency numbers) and sensitive data (user IDs, sometimes PII, sometimes secrets that should have been scrubbed but were not). Sending all of this to a third-party cloud LLM is uncomfortable even at companies that are otherwise comfortable with cloud AI; it is unacceptable at regulated companies.
It is latency-relevant. For real-time incident response — when an alert fires and an engineer is trying to understand what happened — the difference between a 200ms summarization and a 2-second one is the difference between staying in flow and being interrupted.
What "AI on every log line" enables
Once a local SLM is processing the whole log stream, a few things become possible that were not possible before.
Real-time anomaly explanation. Every log line is contextualized against the patterns the model has learned. When something unusual happens, the model can offer a structured hypothesis (what changed, which upstream dependencies are involved, what similar incidents looked like) within a fraction of a second, on infrastructure you run yourself.
Cross-stream pattern detection. With AI looking at every log line from every service, the model can see patterns that span services and time windows. A correlation between a deployment in service A and a latency spike in service B, three hops downstream, becomes visible — without anyone having to know to look for it.
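Here is a toy version of that cross-service correlation. It assumes deployments and latency spikes have already been extracted from the logs as events. It counts every (deployed service, spiking service) pair that co-occurs within a time window, so nobody has to name the pair in advance. Plain co-occurrence counting stands in here for whatever a model would surface. The sketch ignores the dependency graph, so it cannot tell one hop from three, and it shows correlation, not causation. `Event` and the 600-second window are assumptions:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: float      # seconds since epoch
    service: str
    kind: str      # "deploy" or "latency_spike" in this sketch


def deploy_spike_pairs(events: list[Event], window_s: float = 600.0) -> Counter:
    """Count (deployed service, spiking service) pairs where a spike follows a deploy within window_s.

    Every pair of services is considered, so no pair has to be specified up front.
    """
    deploys = [e for e in events if e.kind == "deploy"]
    spikes = [e for e in events if e.kind == "latency_spike"]
    pairs: Counter = Counter()
    for spike in spikes:
        for deploy in deploys:
            follows = 0 <= spike.ts - deploy.ts <= window_s
            if follows and deploy.service != spike.service:
                pairs[(deploy.service, spike.service)] += 1
    return pairs
```

Pairs that recur across many deploys are the candidates worth a human's attention; a single co-occurrence is usually noise.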
Cheap retrospective analysis. Because the cost of running AI on logs is fixed rather than per-line, going back and re-running analysis with a new question costs almost nothing beyond machine time. Consider a question like "Show me every incident in the last six months where service X's error rate exceeded a threshold immediately after a deployment of service Y." On a cloud-LLM architecture, that query is unaffordable. On a local architecture, it is a batch job on hardware you already own.
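A retrospective question like this is cheapest when the model has already written structured records alongside the raw logs. Answering it is then a scan over those records, not a fresh pass over every line. A minimal sketch, assuming a hypothetical per-window `Annotation` record. The field names, and reading "immediately after" as `within_s` seconds, are illustrative choices:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical per-window record the model writes alongside the raw logs."""
    ts: float            # window start, seconds since epoch
    service: str
    error_rate: float    # fraction of requests that failed in the window
    deployed: bool       # whether a deploy of this service landed in the window


def errors_after_deploys(records: list[Annotation], x: str, y: str,
                         threshold: float, within_s: float) -> list[float]:
    """Timestamps where service x exceeded threshold within within_s seconds after a deploy of y."""
    deploy_times = [r.ts for r in records if r.service == y and r.deployed]
    hits = []
    for r in records:
        if r.service != x or r.error_rate <= threshold:
            continue
        if any(0 <= r.ts - t <= within_s for t in deploy_times):
            hits.append(r.ts)
    return sorted(hits)
```

Asking a different question means writing a different filter over the same records. The model only has to be re-run when the question needs information the records do not already capture.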
Audit and compliance. Logs are the canonical record of what a system did. Running them through a local model that writes structured analysis alongside them creates an audit trail that is itself local, itself versioned, and itself inspectable.
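Here is one way such an audit trail could look. Each analysis is stored as a JSON line that records which model version wrote it and a SHA-256 hash of the exact log line it describes, so the pairing can be checked later. The record layout and the `MODEL_VERSION` string are hypothetical:

```python
import hashlib
import json

MODEL_VERSION = "log-slm-v1"  # hypothetical identifier for the local model build


def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def audit_record(raw_line: str, analysis: dict, model_version: str = MODEL_VERSION) -> str:
    """Serialize one analysis as a JSON line tied to the exact log line it describes."""
    record = {
        "source_sha256": _sha256(raw_line),
        "model_version": model_version,
        "analysis": analysis,
    }
    return json.dumps(record, sort_keys=True)


def verify(raw_line: str, record_json: str) -> bool:
    """Check that a stored analysis still matches the log line it claims to describe."""
    return json.loads(record_json)["source_sha256"] == _sha256(raw_line)
```

Because the records are plain JSON lines, they can sit next to the logs under the same retention and access controls, and a diff between two model versions' output is an ordinary text diff.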
The cost flip
The cost story for log analysis on a local SLM is unusually clean.
A reasonable mid-grade GPU server can process hundreds of millions of log lines per day, so a handful of them covers the billion-line example above. The capital cost of each server is in the tens of thousands of dollars; the operational cost (power, cooling, maintenance) is in the low thousands of dollars per year.
Compare that to the hypothetical $220 million annual cloud bill, which works out to about $25,000 per hour. Even a handful of servers pays for itself within the first day of operation.
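How quickly the hardware pays for itself depends mostly on how many servers the stream needs. Here is a minimal sketch. The assumed values are picked from inside the ranges given above: 300 million lines per server per day, $50,000 per server, and $3,000 per server per year to run. None of these are quotes or benchmarks:

```python
import math

CLOUD_USD_PER_DAY = 600_000          # from the cloud estimate earlier in this piece
LINES_PER_DAY = 1_000_000_000

# Assumptions chosen inside the ranges the text gives:
LINES_PER_SERVER_PER_DAY = 300_000_000   # "hundreds of millions"
SERVER_CAPEX_USD = 50_000                # "tens of thousands"
SERVER_OPEX_USD_PER_YEAR = 3_000         # "low thousands"


def break_even_hours(lines_per_day: float, lines_per_server: float, capex: float,
                     opex_per_year: float, cloud_per_day: float) -> tuple[int, float]:
    """Servers needed, and hours of avoided cloud spend needed to cover their purchase."""
    servers = math.ceil(lines_per_day / lines_per_server)
    cloud_per_hour = cloud_per_day / 24
    opex_per_hour = servers * opex_per_year / (365 * 24)
    hours = servers * capex / (cloud_per_hour - opex_per_hour)
    return servers, hours


servers, hours = break_even_hours(LINES_PER_DAY, LINES_PER_SERVER_PER_DAY,
                                  SERVER_CAPEX_USD, SERVER_OPEX_USD_PER_YEAR,
                                  CLOUD_USD_PER_DAY)
print(f"{servers} servers, break-even after {hours:.1f} hours")
```

With these assumptions the stream needs four servers and the script prints `4 servers, break-even after 8.0 hours`. Doubling the server price or halving per-server throughput roughly doubles that figure, but it stays measured in hours.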
This is not a marginal economic case. It is not "the local model is somewhat cheaper at scale." It is "the local model is the only architecture in which this workload is economically viable at all."
Why this hasn't happened yet
The same reason every other case in this series has not happened yet: the tooling is not mature.
Training a model that knows your specific log patterns is non-trivial. Deploying a model that can keep up with a multi-terabyte-per-day log stream requires engineering work. Integrating that model into the observability tools your team already uses requires significant glue code.
The companies that solve this end-to-end — train the model, package it well, integrate it with the major observability platforms — are going to find a large market. The cost asymmetry is so extreme that the savings, on any deployment beyond a few months, pay for substantial engineering investment.
We are not building a log analysis tool. Avery NXR is a Next.js scaffolding tool. But the gap in the market here is so large that we expect somebody will build the right product for it within the next eighteen months. When they do, every team running a serious observability stack will switch.
The general lesson
The general lesson from looking at log analysis is that the cloud-LLM economic model breaks down at extreme volume. There is some threshold above which the bill becomes infeasible. Below that threshold, the cloud LLM is fine and often the right choice. Above it, the only architecture that works is local.
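The threshold can be written down directly. It is the daily line volume at which a year of per-token cloud charges equals a year of local cost. Here is a minimal sketch. The $21,900 annual local cost is a hypothetical figure covering hardware amortization plus running costs. The formula deliberately leaves out the engineering cost of building the local pipeline, which the previous section argues is the real barrier:

```python
def threshold_lines_per_day(local_cost_per_year: float,
                            tokens_per_line: float,
                            usd_per_million_tokens: float) -> float:
    """Daily line volume at which a year of cloud token charges equals a year of local cost.

    Ignores the engineering cost of building the local pipeline and assumes the
    local hardware has enough capacity for the volume in question.
    """
    cloud_cost_per_line = tokens_per_line * usd_per_million_tokens / 1e6
    return local_cost_per_year / (365 * cloud_cost_per_line)


# Hypothetical local budget: amortized hardware plus running costs.
LOCAL_COST_PER_YEAR = 21_900
print(f"{threshold_lines_per_day(LOCAL_COST_PER_YEAR, 200, 3.0):,.0f} lines/day")
```

At 200 tokens per line and $3 per million tokens, this example crosses over at about 100,000 lines per day. That is a volume many log pipelines pass early in a company's life.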
Different workloads cross that threshold at different points. Log analysis crosses it at the volume of a small startup. Email processing crosses it for a midmarket company. Document processing crosses it for a regulated enterprise.
But every operational workload eventually crosses some version of this threshold, given enough growth. The companies that recognize the architecture pivot early — and build their workflows on local infrastructure — keep their costs flat. The companies that recognize it late pay a recurring tax to cloud providers for work that did not need to be in the cloud.
We think log analysis is the canary. It is the workload where the threshold is crossed earliest, the math is most obvious, and the lack of good tooling is most conspicuous. Watching this category mature will give a preview of how the broader operational-AI ecosystem ends up structured.