Resume screening and candidate matching: where AI cost and PII risk compound together

2026-05-26 · Avery NXR

Recruiting has been quietly transformed by AI in the past three years.

Most modern recruiting pipelines now run AI on inbound resumes. The model extracts structured information (name, work history, education, skills), classifies the candidate against a role, ranks them against other applicants, drafts initial outreach, and feeds the structured data into the applicant tracking system (ATS). For high-volume roles at high-volume companies, the AI does the first-pass filter on a candidate pool that no human team could realistically read in full.

This is a real productivity win. It is also a workload that combines high cost, dense PII, and emerging regulatory pressure — making it one of the strongest cases for local inference in any operational domain.

The volume problem

Modern recruiting is high-volume in a way that surprises people who haven't looked recently.

A growing technology company with an active recruiting function — say, fifty open roles at any given time — receives roughly ten to twenty thousand inbound resumes per month. For a single popular role (an engineering position at a recognized company, a marketing role at a hot startup), inbound can be one to five thousand resumes per role per week.

Across a year, a midmarket company is processing well over a hundred thousand resumes. A large company is processing several million.

Each resume gets put through an AI workflow that does several things: parse the document, extract structured fields, classify against the role requirements, score the match, possibly draft outreach for high-matches, possibly suggest interview questions, possibly check for resume inconsistencies.

A reasonable token budget per resume is eight thousand input tokens (the resume text plus job description and context) and six hundred output tokens (the structured output). At frontier pricing, that works out to about $0.033 per resume.

A midmarket company processing a hundred thousand resumes per year pays about $3,300 per year just for the AI layer. A large company processing five million pays $165,000 per year. We have talked to enterprise talent acquisition functions whose AI bill is north of $500,000 per year.
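The arithmetic behind those figures can be sketched in a few lines. The per-token prices below ($3 per million input tokens, $15 per million output tokens) are an assumption, chosen because they reproduce the $0.033 figure; actual frontier pricing varies by provider and changes over time. The function and constant names are illustrative.

```python
# Assumed prices in USD per million tokens; not taken from any
# specific provider's price list.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def cost_per_resume(input_tokens: int = 8_000, output_tokens: int = 600) -> float:
    """Cloud inference cost in USD for one resume at the assumed prices."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


def annual_cost(resumes_per_year: int) -> float:
    """Annual cloud inference cost in USD for a given resume volume."""
    return resumes_per_year * cost_per_resume()


if __name__ == "__main__":
    print(f"per resume: ${cost_per_resume():.3f}")
    for volume in (100_000, 5_000_000):
        print(f"{volume:>9,} resumes/year: ${annual_cost(volume):,.0f}")
```

Changing the token budget or the prices moves the totals linearly, which is why the bill tracks hiring volume so closely.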

These numbers are not enormous on their own. But they grow with hiring volume — and the recruiting market is unusually cyclical. A company in a hiring sprint can see its AI bill triple in a quarter. A company that doubles in headcount over a year sees its annual recruiting AI bill move proportionally.

The PII story

Resumes are not just generic documents. They are concentrated PII.

Every resume contains the candidate's name, contact information, education history (often with dates that imply age), work history (with company names and tenure), and frequently more — addresses, work authorization status, sometimes age, sometimes ethnicity-suggesting details, sometimes disability disclosures, sometimes religious or political affiliations.

For some categories of role and candidate, the resume also includes information that is explicitly regulated. Medical professionals' resumes contain identifiers that interact with healthcare regulations. Defense workers' resumes contain clearance information. Government workers' resumes touch various reporting requirements.

Sending all of this data to a third-party cloud LLM is something not every talent acquisition (TA) leader, legal team, or regulator is comfortable with.

The PII story is compounded by regulation aimed specifically at AI in hiring. The European Union's AI Act classifies AI systems used in employment decisions as "high-risk," which triggers significant compliance obligations. A number of US states and cities have introduced or enacted rules requiring bias audits for AI hiring tools; New York City's Local Law 144 is one example. Several jurisdictions require explicit candidate consent for automated decision-making in recruiting.

Companies running recruiting AI through third-party cloud LLMs are signing up for a compliance posture that is going to be increasingly difficult to defend over the next few years. The architecture that puts inference inside the company's own controlled environment dramatically simplifies the regulatory conversation.

Why this is a strong local-SLM workload

The standard properties for local-SLM suitability are present, and several are unusually strong.

The work is narrow. Reading resumes against role requirements is a specific, well-bounded task. A model trained on the company's own historical resumes, hiring outcomes, and successful employee profiles can outperform a general-purpose model on that company's own screening work.

The work is repetitive. The same shape of input (a resume), the same shape of output (structured fields and a match score), repeated tens of thousands of times. Specialization compounds.
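To make "the same shape of output" concrete, here is a minimal sketch of what one screening record might look like. The field names, the 0-to-1 score range, and the `ScreeningResult` type are assumptions for illustration, not a schema from any particular ATS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningResult:
    """One model output per resume; every field name here is hypothetical."""
    candidate_name: str
    role_id: str
    match_score: float                   # assumed range: 0.0 (no match) to 1.0
    skills: tuple[str, ...] = ()
    work_history: tuple[str, ...] = ()
    education: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Reject malformed model output before it reaches the ATS.
        if not 0.0 <= self.match_score <= 1.0:
            raise ValueError(f"match_score out of range: {self.match_score}")


def rank(results: list[ScreeningResult]) -> list[ScreeningResult]:
    """Order candidates for one role from strongest to weakest match."""
    return sorted(results, key=lambda r: r.match_score, reverse=True)
```

A fixed record like this is what makes specialization pay off: the model only ever has to produce this one shape, and anything that fails validation can be caught before it is written downstream.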

The volume is meaningful and scales with company growth in a way that makes the cloud bill keep climbing.

The privacy posture, as discussed above, is materially stronger with local inference. PII risk is reduced. Regulatory compliance is simpler.

The latency matters less than in other workloads (resume screening is mostly batch, not interactive), but it still matters in a few cases — the recruiter looking at a candidate profile and waiting for the AI's match analysis, for instance.

What changes with local inference

A local-inference recruiting workflow looks like this.

A model is fine-tuned on the company's own hiring data — historical resumes, hiring outcomes, performance data of placed employees where it is available and ethically usable. The fine-tune captures the company's specific signals — what historical patterns predict good hires for which roles, which credentials matter for which functions, what the local hiring committee tends to weigh.

The model runs on the company's own infrastructure. Resumes flow in from the careers site, the ATS, sourcing tools, and recruiter inboxes. The local model processes them and writes structured data and match scores into the ATS. The recruiter sees the analysis the way they would see analysis from any other source.

The cost flips from per-resume to fixed. Hiring volume can spike — a new product launch, an expansion into a new market, a sudden funding round — without the AI bill spiking with it.
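A rough break-even check makes the fixed-versus-variable tradeoff concrete. The $0.033 per-resume figure comes from the estimate earlier in this post; the annual fixed cost of local infrastructure is a placeholder to replace with your own number, not a quoted price.

```python
CLOUD_COST_PER_RESUME = 0.033  # USD, from the token-budget estimate earlier in the post


def break_even_resumes(annual_fixed_cost: float,
                       cloud_cost_per_resume: float = CLOUD_COST_PER_RESUME) -> float:
    """Annual resume volume above which a fixed-cost local deployment is cheaper."""
    if cloud_cost_per_resume <= 0:
        raise ValueError("cloud_cost_per_resume must be positive")
    return annual_fixed_cost / cloud_cost_per_resume


def cheaper_option(resumes_per_year: int, annual_fixed_cost: float) -> str:
    """Compare a flat annual local cost against the per-resume cloud cost."""
    cloud = resumes_per_year * CLOUD_COST_PER_RESUME
    return "local" if annual_fixed_cost < cloud else "cloud"


# Hypothetical example: $20,000/year all-in for hardware, power, and upkeep.
print(round(break_even_resumes(20_000)))
print(cheaper_option(100_000, 20_000))
print(cheaper_option(5_000_000, 20_000))
```

On cost alone, the crossover depends heavily on the fixed-cost figure you plug in. What the fixed model buys regardless is predictability: a hiring spike changes utilization, not the invoice.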

The privacy story improves. The PII does not leave the company. The data flow is auditable. The compliance conversation is simpler.

The bias-audit story improves. Bias audits of recruiting AI systems are easier to do when the model is yours, the training data is yours, and the inference is yours. The cloud-LLM-based system requires assertions about a model you don't own; the local-SLM-based system makes the audit a fully in-house exercise.
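One common starting point for such an audit is the selection-rate impact ratio: each group's selection rate divided by the highest group's rate, often compared against the four-fifths (0.8) rule of thumb. The sketch below assumes you hold, for audit purposes, a group label and a pass/fail screening outcome per candidate. The function names and the toy data are illustrative, and a real audit would involve more than this one metric.

```python
from collections import defaultdict


def selection_rates(outcomes: list[tuple[str, bool]]) -> dict[str, float]:
    """Fraction of candidates advanced by the screen, per group."""
    totals: dict[str, int] = defaultdict(int)
    selected: dict[str, int] = defaultdict(int)
    for group, advanced in outcomes:
        totals[group] += 1
        if advanced:
            selected[group] += 1
    return {g: selected[g] / totals[g] for g in totals}


def impact_ratios(outcomes: list[tuple[str, bool]]) -> dict[str, float]:
    """Each group's selection rate relative to the highest-rate group."""
    rates = selection_rates(outcomes)
    best = max(rates.values())
    if best == 0:
        raise ValueError("no candidates were selected in any group")
    return {g: r / best for g, r in rates.items()}


# Toy data: (group label, advanced past screening?)
sample = ([("A", True)] * 40 + [("A", False)] * 60
          + [("B", True)] * 25 + [("B", False)] * 75)
ratios = impact_ratios(sample)
flagged = [g for g, r in ratios.items() if r < 0.8]  # four-fifths rule of thumb
print(ratios, flagged)
```

With local inference, the inputs to this calculation (the model's decisions and the data it was trained on) are already in-house, so the audit can be rerun whenever the model or the applicant mix changes.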

Where the cloud LLM is still defensible

There are a few cases where cloud-LLM-based recruiting AI is still the right answer.

For very small recruiting functions — say, a startup with a handful of roles and a few hundred resumes per month — the volume doesn't justify the local infrastructure.

For highly multilingual recruiting situations, such as multinational companies hiring across many languages, a multilingual cloud model may be easier to deploy than a comparable local-SLM stack. (Multilingual local models exist, though, and the gap is closing.)

For recruiting workflows that lean heavily on reasoning about non-resume content — analyzing portfolio code, evaluating writing samples, scoring creative work — the breadth of a frontier model still helps. The local-SLM case is strongest for the structured part of recruiting (matching, scoring, ranking) and weaker for the open-ended part (evaluating creative output).

Outside those cases, for a recruiting function of any meaningful size at a company subject to any of the emerging AI-in-hiring regulations, the local-SLM case is strong and getting stronger.

The pattern, the eighth time

Avery NXR is a Next.js scaffolding tool, not a recruiting product. The architectural pattern repeats anyway.

Resume screening is narrow, repetitive, high-volume, PII-dense, regulation-sensitive work. The economics that favor a specialized local model for Next.js scaffolding are the same economics that favor a specialized local model for resume screening. The privacy and regulatory story makes the case stronger here than in most other workloads.

The companies that build excellent local-inference recruiting tools — with appropriate fine-tuning, bias-audit frameworks, ATS integrations, and business models — are going to find buyers in every TA function of any meaningful size. The cloud-LLM-default products will hold the market until the regulatory pressure and the cost compounding force the conversation.

The architectural shift is going to happen here, as it is going to happen across most operational AI workloads, in the next few years. The question is which companies build the right products in time to lead the shift, and which companies wait to follow.