Use the following methods to extract structured data from documents.

Layout-based methods

| Method | Notes |
| --- | --- |
| Box | Extracts contents from boxes with continuous borders. |
| Checkbox | Extracts true/false for the selection status of checkboxes. |
| Column | Extracts text aligned in a column, from an anchor down to the bottom of the page. |
| Document Range | Extracts text in a range, or extracts image metadata (coordinates). Simpler alternative to the advanced Paragraph method. |
| Fixed Table | Extracts tables where column headings never vary. |
| Intersection | Extracts a target line at the intersection of a horizontal line defined by one anchor and a vertical line defined by a second anchor. |
| Label | Extracts a line of text that is close to another line. |
| Nearest Checkbox | Extracts true/false for the selection status of the checkbox nearest to the anchor. |
| Paragraph | Extracts paragraphs that partially span the page width, for example from columnar layouts. |
| Passthrough | Extracts anchor text, optionally using RegEx. |
| Regex | Extracts text matching RegEx. Use RegEx capturing groups with this method, in combination with the Passthrough method, to clean up extracted data. |
| Region | Extracts data from a rectangular region defined by coordinates. Faster alternative to the Box method. |
| Row | Extracts text aligned in a row. |
| Signature | Extracts true/false for the signed status of a region. |
| Text Table | Extracts tables using solely text-positioning data (fast but limited). |
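
The Regex method's note about capturing groups can be illustrated outside the product. The following is a minimal sketch in plain Python using the standard `re` module; it is not the configuration syntax of the methods listed above. The sample line, the pattern, and the `extract_policy_number` helper are all hypothetical and exist only to show how a capturing group keeps the useful part of a matched line and discards the rest.

```python
import re
from typing import Optional

# Hypothetical raw line, standing in for text returned unmodified by an anchor match.
raw_line = "Policy No.: AB-123456 (renewal)"

# The parentheses form a capturing group: the whole pattern locates the line,
# but only the group (the policy number itself) is kept.
POLICY_PATTERN = re.compile(r"Policy No\.:\s*([A-Z]{2}-\d{6})")


def extract_policy_number(line: str) -> Optional[str]:
    """Return the captured policy number, or None if the line does not match."""
    match = POLICY_PATTERN.search(line)
    return match.group(1) if match else None


policy_number = extract_policy_number(raw_line)
print(policy_number)
```

Without the capturing group, the match would include the "Policy No.:" label; with it, only the identifier is returned, which is the kind of cleanup the note describes.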

Large language model (LLM)-based methods

See LLM-based methods.