Extract from a PDF / Word / image (layout-aware)
3 min · 6 steps
The 'File Extract' node reads a document on disk (PDF, Word, image, or scan), runs layout-aware OCR, and optionally fills typed fields. For text already in the run, use 'Extract from text'.
Use File Extract (palette: AI → File Extract — node kind `file.extract`) when the input is a FILE on disk. It runs Kreuzberg layout-aware OCR (PaddleOCR / PP-Structure when the bundled engine supports it, falling back to Tesseract) and exposes the document's text, tables, and layout metadata. You then choose what to produce.
For text that's already in the run (an email body, a web page, an upstream node's text output), use Extract from text instead. It skips OCR and chunks the text, so it handles input of any size.
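The engine fallback described above (PaddleOCR when the bundled engine supports it, otherwise Tesseract) can be pictured with a minimal Python sketch. This is not the node's actual implementation: `pick_ocr_engine`, the engine name strings, and the `paddle_available` flag are hypothetical names chosen for illustration.

```python
def pick_ocr_engine(requested: str, paddle_available: bool) -> str:
    """Return the OCR engine to use (hypothetical helper, not a product API).

    'auto' prefers PaddleOCR when it is available and falls back to
    Tesseract otherwise. An explicit engine name is returned unchanged.
    """
    choice = requested.strip().lower()
    if choice == "auto":
        return "paddleocr" if paddle_available else "tesseract"
    if choice in ("paddleocr", "tesseract"):
        return choice
    raise ValueError(f"unknown OCR engine: {requested!r}")


# Auto mode on a machine without PaddleOCR falls back to Tesseract.
engine = pick_ocr_engine("auto", paddle_available=False)
```

Forcing an engine bypasses the fallback entirely, which is why the sketch returns an explicit choice without checking availability.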
Steps
- Drop a 'File Extract' node onto the canvas.
From the Builder palette: AI → File Extract.
- Bind the file path.
Type an absolute path, click Browse… to pick a file, or reference an upstream step's output: `{{nodes.fetch_attachment.output.path}}`. Common upstreams: an email-trigger that writes attachment paths, an HTTP request that downloaded a PDF, an `fs.read` that surfaced a path.
- Choose what to produce.
The 'What to produce' dropdown has three modes. 'Text + tables only' emits the document text, tables, and layout metadata with no field fill. 'Map fields' asks the AI to align the document's labeled regions to your fields and copies the values programmatically; this is the most reliable path on small local models. 'Prompt-driven' extracts your fields, guided by a free-form prompt. Map and Prompt both reveal the typed fields editor (same name + type + description rows as Extract from text).
- Pick a vision model for image-only documents.
If the document is image-only (a scanned page or photo with no extractable text), the AI needs a vision-capable model to read it. Set 'Vision model' to a local model or, with Consult Mode on in Settings, a cloud vision model from a connected provider. Text documents and PDFs with embedded text don't need this. The Overview tab surfaces a non-blocking 'Pick a vision model' reminder until one is set.
- Tune OCR under Advanced (optional).
The OCR engine defaults to Auto (PaddleOCR when available, else Tesseract). Under Advanced you can force a specific engine, set the OCR language (`eng`, `eng+spa`, …), toggle ML table-layout detection, or force OCR on PDFs whose embedded text the AI fails to pick up.
- Run the agent.
The first call may take a few seconds while Kreuzberg routes the file through OCR. The Runs tab streams progress; the node's output panel shows the structured object once it lands (or the records table when 'Multiple records' is on in Map / Prompt mode).
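To make the file-path binding in step 2 more concrete, here is a minimal Python sketch of how a reference such as `{{nodes.fetch_attachment.output.path}}` could be resolved against upstream outputs. It only illustrates the dotted-path idea. The `resolve_refs` function, the shape of the `context` dictionary, and the sample path are assumptions for illustration, not the product's actual resolver or data model.

```python
import re
from typing import Any

# Matches {{ dotted.path }} placeholders; the syntax is assumed from the
# example reference in step 2.
_REF = re.compile(r"\{\{\s*([A-Za-z0-9_.]+)\s*\}\}")


def resolve_refs(template: str, context: dict[str, Any]) -> str:
    """Replace each {{a.b.c}} placeholder with context['a']['b']['c'].

    Hypothetical helper: raises KeyError when a segment is missing, so a
    typo in a node id fails loudly instead of yielding an empty path.
    """
    def lookup(match: re.Match) -> str:
        value: Any = context
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return _REF.sub(lookup, template)


# Hypothetical run state: an upstream node that wrote an attachment to disk.
context = {"nodes": {"fetch_attachment": {"output": {"path": "/tmp/invoice.pdf"}}}}
resolved = resolve_refs("{{nodes.fetch_attachment.output.path}}", context)
```

In this sketch `resolved` holds the upstream node's path, the value the 'File Extract' node would then read from disk.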
Live recipes need the desktop
This article is a static preview. The in-app Help sidecar inside Avery NXR can fire each step against your live project — install the desktop to use it interactively.