Avery.Software — Native Execution Runtime

The cost of an AI-generated app, over a year

2026-05-25 · Avery NXR

The cost of any individual AI coding session is small. A few cents on a frontier model. Maybe a dollar if the session is long. It is easy to look at the per-prompt number and conclude that AI coding tools are essentially free.

The receipt at the end of the year tells a different story. We ran the numbers for a five-person engineering team. Here is what we found.

The setup

Imagine a team of five engineers, each shipping code daily, each using an AI coding assistant for roughly a third of their workday. That is a conservative estimate — for a lot of teams now, the ratio is higher.

Each engineer runs, on average, eighty AI sessions per workday. A session is anywhere from one prompt to twenty. The average session uses about ten thousand input tokens (the prompt history plus the project files the model needs to see) and about two thousand output tokens (the generated code, the explanation, the diff).

We are going to be charitable to the cloud model on these numbers. Real teams use larger contexts and longer sessions.

The annual bill on a cloud frontier model

At current pricing for a frontier cloud LLM — roughly $3 per million input tokens and $15 per million output tokens — each session costs about $0.06.
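The per-session figure is simple to check. Here is a minimal sketch, assuming the prices and token counts quoted above; the function and constant names are ours, for illustration only.

```python
# Assumed prices, taken from the rough figures quoted in this post.
INPUT_PRICE_PER_MILLION = 3.00    # USD per million input tokens
OUTPUT_PRICE_PER_MILLION = 15.00  # USD per million output tokens


def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one session at the assumed per-token prices."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    cost = session_cost(input_tokens=10_000, output_tokens=2_000)
    print(f"${cost:.2f} per session")  # prints "$0.06 per session"
```

Input and output each contribute about three cents here, so the output side matters as much as the context even though it is a fifth of the tokens.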

Eighty sessions per engineer per day. Twenty workdays per month. Five engineers.

That comes out to roughly $96 per engineer per month, or about $1,152 per engineer per year. For a team of five, that is $480 a month, and the annual bill is around $5,760.
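For readers who want to plug in their own team's numbers, the roll-up can be sketched as below. It assumes usage is uniform across engineers and months, which real teams will not match exactly; the `TeamUsage` type and its field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamUsage:
    """Usage assumptions for a team on a metered cloud model (illustrative)."""
    sessions_per_engineer_per_day: int
    workdays_per_month: int
    engineers: int
    cost_per_session: float  # USD


def monthly_cost_per_engineer(usage: TeamUsage) -> float:
    """USD one engineer spends in a month."""
    return (usage.sessions_per_engineer_per_day
            * usage.workdays_per_month
            * usage.cost_per_session)


def annual_team_cost(usage: TeamUsage) -> float:
    """USD the whole team spends in twelve months."""
    return monthly_cost_per_engineer(usage) * 12 * usage.engineers


if __name__ == "__main__":
    # The setup described in this post.
    usage = TeamUsage(
        sessions_per_engineer_per_day=80,
        workdays_per_month=20,
        engineers=5,
        cost_per_session=0.06,
    )
    print(f"per engineer per month: ${monthly_cost_per_engineer(usage):,.2f}")
    print(f"team per year:          ${annual_team_cost(usage):,.2f}")
```

Changing any one field shows how sensitive the annual figure is to that assumption.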

If the team leans harder on the AI tool (twice the sessions, or much larger contexts), the number roughly doubles. We have seen teams report monthly bills above $2,000 per engineer on frontier models.

This is a real number. It is not the kind of number that hides in the SaaS budget. It shows up in the cap table conversation.

The annual bill on a local SLM

Avery NXR ships with a local Small Language Model. The model runs on the engineer's existing laptop. There is no per-token cost. There is no API.

The cost structure is:

  • The one-time price of Avery NXR (a flat-rate license, no per-prompt billing).
  • The electricity to run the laptop's CPU/GPU during inference, which is a rounding error compared to the laptop's normal power draw.
  • The time to download the initial model weights, once.

For a team of five, the annual bill is the license cost, full stop. There is no usage curve. There is no surprise invoice at the end of a heavy month. The cost does not scale with how much the team uses the tool.

This is the economic argument for local SLMs in one sentence: a sunk cost is not a recurring cost.
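One way to make the sunk-versus-recurring comparison concrete is a break-even calculation. The sketch below assumes a single up-front license price and a steady monthly cloud spend. The prices in the example are placeholders, not Avery NXR's actual pricing.

```python
import math


def breakeven_months(license_price: float, monthly_cloud_spend: float) -> int:
    """Whole months of metered cloud spend needed to match a one-time license price."""
    if license_price < 0:
        raise ValueError("license_price must not be negative")
    if monthly_cloud_spend <= 0:
        raise ValueError("monthly_cloud_spend must be positive")
    return math.ceil(license_price / monthly_cloud_spend)


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    months = breakeven_months(license_price=500.0, monthly_cloud_spend=100.0)
    print(f"break-even after {months} months")  # prints "break-even after 5 months"
```

After the break-even month, every further month of usage adds to the metered bill but adds nothing to the flat-rate one.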

Where the cloud model is worth the bill

We are not arguing the cloud model is overpriced. For some workloads, the annual bill is well worth it.

If your team's AI coding work is wide-ranging — Python, Rust, Go, frontend, backend, devops, all of it — a frontier model is the right tool. The breadth of the frontier model is what you are paying for.

If your team needs to reason about large, novel refactors across thousands of files — the kind of task where context size is the bottleneck — a frontier model with a million-token context is genuinely useful. Avery NXR's local SLM is not going to compete on that workload.

If your team's AI work is narrow and bounded — scaffolding new applications, generating CRUD layers, wiring auth and billing — the breadth of the frontier model is wasted. You are paying for capacity you are not using.

The economic case for the local SLM is strongest when the work is narrow. The economic case for the frontier model is strongest when the work is broad. Most real teams have both kinds of work, which is why we expect most teams to use both kinds of tools.

What this means for the budget conversation

The interesting move in the budget conversation is not "switch from the cloud model to the SLM." It is "switch the narrow work to the SLM and keep the cloud model for the broad work."
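In practice, the split comes down to a routing decision per task. Here is a minimal sketch, assuming each task arrives with a label. The labels and the `choose_backend` helper are hypothetical and are not part of Avery NXR or any cloud provider's API.

```python
from enum import Enum


class Backend(Enum):
    LOCAL_SLM = "local_slm"
    CLOUD_FRONTIER = "cloud_frontier"


# Hypothetical labels for the narrow, bounded work this post describes.
NARROW_TASKS = frozenset({"scaffold_app", "crud_layer", "auth", "billing", "dashboard"})


def choose_backend(task_kind: str) -> Backend:
    """Send narrow tasks to the local model and everything else to the cloud model."""
    if task_kind in NARROW_TASKS:
        return Backend.LOCAL_SLM
    return Backend.CLOUD_FRONTIER


if __name__ == "__main__":
    for kind in ("crud_layer", "cross_repo_refactor"):
        print(kind, "->", choose_backend(kind).value)
```

Defaulting unknown tasks to the cloud model is a deliberate choice in this sketch: open-ended work is where the broader model earns its bill.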

A team doing this saves the bulk of the bill on the narrow work, which is also the work that runs most frequently. Scaffolding a new app, adding auth, generating a CRUD layer, building a dashboard — these are the high-frequency operations, and they are what eats most of the token budget on a cloud plan.

After the switch, the cloud bill drops dramatically — sometimes by ninety percent. The cloud model is still there for the times the team needs it. But the daily, high-frequency work is now sunk cost, not recurring cost.
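The size of the drop depends on how much of the token spend was narrow work to begin with. The sketch below assumes the cloud bill scales linearly with the share of spend left on the cloud. The starting bill is an illustrative figure, and the ninety percent share is the upper end mentioned above, not a measured result.

```python
def remaining_cloud_bill(original_bill: float, local_share: float) -> float:
    """Cloud bill left after moving `local_share` of token spend to the local model."""
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    return original_bill * (1.0 - local_share)


if __name__ == "__main__":
    # Illustrative: a $1,000 bill with ninety percent of spend moved to the local model.
    print(f"${remaining_cloud_bill(1_000.0, 0.9):,.2f}")  # prints "$100.00"
```

A team whose narrow work is only half of its spend would see half the bill remain, so it is worth measuring the share before budgeting around it.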

We expect this two-model split to become the default in the next two years. The frontier model for the open-ended work. The local SLM for the narrow, high-frequency work. The bill at the end of the year reflects the actual breadth of what each model was needed for.

A closing note for solo developers

The numbers above are for teams. For a solo developer, the scale is smaller, but the shape is the same.

A solo developer running heavy AI coding on a frontier model spends $100 to $300 a month, depending on intensity. Over a year, that is $1,200 to $3,600 — a real chunk of an indie hacker's runway.

For a solo developer building Next.js applications with Avery NXR, the cost is the license. No monthly bill. No metering. No watching the token counter while you iterate. The tool that scaffolds the application is sunk cost, and the developer's runway lasts longer.

That is local SLM economics in one paragraph. Not a manifesto. Just a smaller invoice.