Code review automation: when every PR runs up the meter
Avery NXR
A few years ago, code review was a fully human activity. Engineers wrote pull requests. Other engineers read them. Comments happened, debates happened, code got merged or sent back for changes.
The pattern is changing fast. Modern engineering teams are layering AI into the code review process — not to replace human reviewers, but to do a first pass that catches the obvious issues before a human sees the PR. The AI looks for typos, unhandled errors, security anti-patterns, missing tests, style violations, and dozens of other categories of routine issue. The human reviewer then focuses on the architectural and judgment questions that the model isn't equipped to handle.
This is a real productivity gain. It is also, in the cloud-LLM-default implementation, a workload that costs more than people realize.
The math, on a productive team
A productive engineering team of fifty engineers ships a lot of pull requests. A reasonable average is two to three PRs per engineer per week, which is a hundred to a hundred and fifty PRs per week for the team; take a hundred and twenty as the working figure. Across a fifty-week working year, that is six thousand PRs.
Each PR, in a typical AI-augmented review workflow, gets passed through one or more cloud LLM operations. The model reads the diff, the surrounding code, the PR description, and sometimes the linked tickets or design documents. It then produces structured review comments — a list of issues, ranked by severity, with line references.
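The "structured review comments" described above can be pictured as a small data structure. The following is a minimal sketch, not any particular tool's schema: the names ReviewComment, Severity, and rank_comments are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Illustrative levels; a higher value means a more urgent issue.
    INFO = 1
    WARNING = 2
    ERROR = 3


@dataclass(frozen=True)
class ReviewComment:
    # One issue reported by the model, tied to a file and line in the diff.
    path: str
    line: int
    severity: Severity
    message: str


def rank_comments(comments):
    """Order comments most severe first, then by file path and line number."""
    return sorted(comments, key=lambda c: (-c.severity, c.path, c.line))


if __name__ == "__main__":
    found = [
        ReviewComment("api/handlers.py", 88, Severity.WARNING, "Missing test for error path"),
        ReviewComment("api/handlers.py", 41, Severity.ERROR, "Unhandled exception from parse()"),
        ReviewComment("README.md", 3, Severity.INFO, "Typo in heading"),
    ]
    for c in rank_comments(found):
        print(f"[{c.severity.name}] {c.path}:{c.line} {c.message}")
```

Keeping severity as an ordered enum rather than a free-text label means the ranking is a plain sort, and downstream tools can filter on a threshold.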
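A reasonable token budget per PR is twenty thousand input tokens (the diff plus context) and eight hundred output tokens (the review comments). At frontier pricing, that comes to about $0.072 per PR; rates of $3 per million input tokens and $15 per million output tokens, for example, give $0.06 plus $0.012.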
Six thousand PRs per year at $0.072 is about $432 per year. That sounds small. It is small, for a fifty-engineer team. For a five-hundred-engineer org, it is $4,320 per year. For a five-thousand-engineer org, $43,200 per year.
The numbers are not the largest in this series. But there is a multiplier we haven't accounted for yet: the AI review is often run multiple times per PR, once when the PR is opened, once after each push, and once before merge. The actual bill is two to four times the naive calculation, and at a five-thousand-engineer org that works out to roughly $86,000 to $173,000 per year, approaching or crossing into six figures.
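The arithmetic in this section is easy to check with a few lines of code. This is a sketch under stated assumptions: the per-token rates ($3 and $15 per million tokens) are not quoted in the post but are one pair of rates that reproduces its $0.072 figure, and the 120 PRs per engineer per year comes from six thousand PRs divided by fifty engineers. The function names are illustrative.

```python
# Assumed rates (USD per million tokens); they reproduce the $0.072 per-PR figure.
USD_PER_M_INPUT = 3.00
USD_PER_M_OUTPUT = 15.00

# 6,000 PRs per year across 50 engineers.
PRS_PER_ENGINEER_PER_YEAR = 120


def cost_per_review(input_tokens=20_000, output_tokens=800):
    """Cost in USD of a single AI review pass over one PR."""
    return (input_tokens * USD_PER_M_INPUT + output_tokens * USD_PER_M_OUTPUT) / 1_000_000


def annual_cost(engineers, runs_per_pr=1):
    """Yearly review bill in USD for an org, given how often each PR is re-reviewed."""
    prs = engineers * PRS_PER_ENGINEER_PER_YEAR
    return prs * runs_per_pr * cost_per_review()


if __name__ == "__main__":
    for engineers in (50, 500, 5_000):
        naive = annual_cost(engineers)
        low = annual_cost(engineers, runs_per_pr=2)
        high = annual_cost(engineers, runs_per_pr=4)
        print(f"{engineers:>5} engineers: naive ${naive:,.0f}, "
              f"with re-runs ${low:,.0f} to ${high:,.0f}")
```

Changing runs_per_pr is the quickest way to see how sensitive the bill is to review-on-every-push policies.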
Why this is a strong local-SLM workload
Code review is narrower than it looks at first. The model needs to know one thing well, how to read code in the language and framework the team uses, and little else. For routine review, a model trained on the team's own codebase patterns has a built-in advantage over a general-purpose model on the team's own code.
It is repetitive. The same shape of issue, the same shape of code-review comment, repeated thousands of times across the year. A model that has seen this team's PRs improves at this team's patterns.
It is high-volume in a way that matters more than the raw bill suggests. Code review is a critical-path workflow; a PR that waits for review is a PR that isn't merged, and that is lost engineering velocity. A local model that responds in two hundred milliseconds delivers review comments while the engineer is still in flow on the PR. A cloud model that takes one to three seconds per analysis pushes the workflow toward asynchronous use: the engineer moves to another task and comes back later.
It is privacy-sensitive in a specific way. The code in a private repository is one of the most sensitive things a company owns. Sending every diff to a third-party cloud LLM is a posture not every CTO is comfortable with — and at companies with proprietary algorithms, trade secrets, or unreleased product code, it is a non-starter.
It is latency-sensitive in the interactive case. When an engineer pushes a commit and waits to see what the AI review says, the wait time is friction they feel directly. Two hundred milliseconds feels instant; two seconds feels like the wrong tool for the job.
What "review on every commit" enables
The interesting thing about a local AI code review workflow is what becomes possible once the marginal cost of each review drops to effectively zero.
Review on every commit, not just on PR. With a local model, the engineer can get AI feedback on the very first commit of a feature branch — long before they would have opened a PR. The feedback loop tightens from "open PR, wait for AI, address comments, push again" to "commit, see comments inline, fix as I go."
Review on speculative changes. The engineer can ask the model to review changes they haven't even committed yet — staged changes, working-tree changes, partial work. The cloud-LLM model makes this prohibitively expensive at scale; the local-SLM model makes it free.
Per-engineer customization. The local model can be fine-tuned on the patterns the team — or even the individual engineer — has historically been corrected on. The model learns the team's coding standards as a side effect of being trained on the team's PRs.
Continuous baseline review. Run the model over the existing codebase, in the background, looking for issues that were merged in before the AI review process was in place. With cloud LLMs, this kind of retrospective work is too expensive to do often. With a local model, it is free overnight work.
These workflows are not feasible in a cloud-LLM-first architecture because the cost of running them at the necessary volume is too high. A local-SLM-first architecture makes them straightforward.
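Reviewing staged or uncommitted work, as described above, mostly comes down to collecting the right diff and handing it to the model. The sketch below assumes a hypothetical run_model callable standing in for whatever local inference interface a team uses; the prompt wording and the character cap are likewise assumptions. The git invocations themselves are standard.

```python
import subprocess

# Assumed cap on how much diff text is sent to the model in one request.
MAX_DIFF_CHARS = 60_000


def staged_diff(repo_dir="."):
    """Return the diff of staged changes (index versus HEAD) as text."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--unified=3"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def working_tree_diff(repo_dir="."):
    """Return the diff of unstaged working-tree changes as text."""
    result = subprocess.run(
        ["git", "diff", "--unified=3"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_prompt(diff, max_chars=MAX_DIFF_CHARS):
    """Wrap a diff in a review instruction; return None when there is nothing to review."""
    if not diff.strip():
        return None
    header = "Review this diff. List issues with file, line, and severity.\n\n"
    return header + diff[:max_chars]


def review(diff, run_model, max_chars=MAX_DIFF_CHARS):
    """Run a review through run_model, a hypothetical callable wrapping the local model."""
    prompt = build_prompt(diff, max_chars)
    if prompt is None:
        return []
    return run_model(prompt)


if __name__ == "__main__":
    # Placeholder model that reports nothing; swap in a real local inference call.
    comments = review(staged_diff(), run_model=lambda prompt: [])
    print(f"{len(comments)} comment(s) on staged changes")
```

Wired into a pre-commit hook or an editor save action, the same review function covers the per-commit and speculative cases; only the function that produces the diff changes.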
What the architecture looks like
A code review workflow on a local SLM has a configuration like this.
A model — fine-tuned on the team's codebase, language conventions, and historical review comments — runs on the engineer's workstation, the team's CI infrastructure, or both.
For interactive review (every commit, every save in some configurations), the model on the workstation provides feedback in real time. For PR-level review, the same model (or a slightly different one, if the team chooses) runs in CI when the PR is opened or updated.
The output flows into the existing code review tools: GitHub PR comments, GitLab discussion threads, Bitbucket inline comments. The integration looks identical to a cloud-LLM-based tool; the only difference is that the inference happens locally.
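As one illustration of that last step, a model finding can be turned into the request body for an inline GitHub PR comment. This is a sketch only: the function name is hypothetical, and the field names (body, commit_id, path, line, side) reflect our understanding of GitHub's REST endpoint for pull request review comments, so they should be checked against the current API documentation before use. Nothing here sends a request.

```python
import json


def to_github_review_comment(path, line, message, commit_sha):
    """Build a request body for an inline comment on the new side of a PR diff."""
    return {
        "body": message,
        "commit_id": commit_sha,  # the commit the comment refers to
        "path": path,             # file path relative to the repository root
        "line": line,             # line number in the diff
        "side": "RIGHT",          # RIGHT targets the changed (new) version
    }


if __name__ == "__main__":
    payload = to_github_review_comment(
        path="api/handlers.py",
        line=41,
        message="Unhandled exception from parse()",
        commit_sha="0000000000000000000000000000000000000000",  # placeholder SHA
    )
    print(json.dumps(payload, indent=2))
```

Because the payload is built the same way regardless of where the finding came from, swapping a cloud model for a local one does not change this part of the integration.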
The cost flips. The team pays once for the model and once for the hardware, which it largely has already, since every engineer has a workstation. After that, the marginal cost of a review is close to zero: electricity and upkeep rather than per-token fees.
When cloud LLMs are still the right call
There are a few cases where cloud-LLM-based review wins.
For very small teams (under ten engineers), the volume is too low to justify the infrastructure investment. The cloud LLM works fine.
For polyglot teams that work across many languages and frameworks, training a single local model that covers all of them is hard. The cloud LLM's breadth helps.
For code review tasks that require reasoning about open-ended problems — "is this architecture decision correct?" — the local model may not have the breadth. A well-designed pipeline routes the architecture questions to the cloud and the routine review to local.
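The routing idea can be sketched in a few lines. Everything here is an assumption made for illustration: the keyword list is a crude stand-in for whatever classifier a real pipeline would use, and Route and choose_route are hypothetical names.

```python
from enum import Enum


class Route(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


# Illustrative phrases that suggest an open-ended design question.
ARCHITECTURE_HINTS = ("architecture", "design decision", "trade-off", "tradeoff")


def choose_route(request, language, local_languages):
    """Pick where a review request runs.

    Requests in a language the local model was not trained on, or that look
    like open-ended architecture questions, go to the cloud; routine review
    stays local.
    """
    if language not in local_languages:
        return Route.CLOUD
    text = request.lower()
    if any(hint in text for hint in ARCHITECTURE_HINTS):
        return Route.CLOUD
    return Route.LOCAL


if __name__ == "__main__":
    supported = {"python", "typescript"}
    for request, language in [
        ("Check this diff for unhandled errors", "python"),
        ("Is this architecture decision correct?", "python"),
        ("Check this diff for style violations", "haskell"),
    ]:
        print(f"{choose_route(request, language, supported).value}: {request} ({language})")
```

The language check also covers the polyglot case: anything outside the local model's training set falls through to the cloud by default.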
For everything else — the typical engineering team, working in a typical language stack, on a typical codebase — the local-SLM case is strong.
The Avery NXR connection, again
We have noted this connection in every post of this series, and we will note it again: Avery NXR is a code scaffolding tool, not a code review tool. The architectural pattern is the same.
The case for a specialized local model — narrower, faster, cheaper, more private, more idiomatic — that we make for code scaffolding is the same case that applies to code review. Different workflow. Same shape of solution.
We are not building a code review product. The pattern, however, is going to produce a category of products in the next eighteen months, and the companies that build them well will find willing customers among every engineering team that has noticed the cloud-LLM review bill creeping into the six-figure range.
The teams that recognize the pattern early — both the builders of the tools and the engineering orgs that adopt them — are going to be ahead of the cost curve while the rest of the industry is still defaulting to cloud.