Use the following methods to extract structured data from documents.

Layout-based methods

| Method | Notes |
| --- | --- |
| Box | Extracts contents from boxes with continuous borders. |
| Checkbox | Extracts true/false for the selection status of checkboxes. |
| Column | Extracts text aligned in a column, from an anchor down to the bottom of the page. |
| Document Range | Extracts text in a range, or extracts image metadata (coordinates). Simpler alternative to the advanced Paragraph method. |
| Fixed Table | Extracts tables where column headings never vary. |
| Intersection | Extracts a target line at the intersection of a horizontal line defined by one anchor and a vertical line defined by a second anchor. |
| Label | Extracts a line of text that is close to another line. |
| Nearest Checkbox | Extracts true/false for the selection status of the checkbox nearest to the anchor. |
| Paragraph | Extracts paragraphs that partially span the page width, for example from columnar layouts. |
| Passthrough | Extracts anchor text, optionally using RegEx. |
| Regex | Extracts text matching RegEx. Use capturing groups with this method, in combination with the Passthrough method, to clean up extracted data. |
| Region | Extracts data from a rectangular region defined by coordinates. Faster alternative to the Box method. |
| Row | Extracts text aligned in a row. |
| Signature | Extracts true/false for the signed status of a region. |
| Text Table | Extracts tables using solely text-positioning data (fast but limited). |
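
The Regex entry mentions using capturing groups to clean up extracted data. As a rough illustration of that idea only, the following Python sketch uses the standard `re` module; it is not the product's configuration syntax, and the sample text, the pattern, and the helper name `extract_first_group` are all hypothetical.

```python
import re

# Hypothetical raw text, standing in for what an anchor-based method might return.
raw = "Policy number: AB-1234567 (renewed)"

# The capturing group (in parentheses) isolates the identifier, so the
# label before it and the note after it are left out of the result.
pattern = re.compile(r"Policy number:\s*([A-Z]{2}-\d{7})")


def extract_first_group(text, compiled_pattern):
    """Return the first capturing group of the first match, or None."""
    match = compiled_pattern.search(text)
    return match.group(1) if match else None


policy_number = extract_first_group(raw, pattern)
print(policy_number)
```

Returning only the captured group, rather than the whole match, is what removes the surrounding label text from the extracted value.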
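
The geometry behind the Intersection entry can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the product's implementation: the `Line` type, its bounding-box fields, and the `intersection` function are hypothetical, and the sketch assumes each line of text comes with page coordinates where `top` is smaller than `bottom`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    """Hypothetical line of text with a bounding box in page coordinates."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def intersection(lines: List[Line], row_anchor: Line, column_anchor: Line) -> Optional[Line]:
    """Return the line whose box contains the crossing point of the two anchors."""
    # Horizontal line: passes through the vertical center of the row anchor.
    y = (row_anchor.top + row_anchor.bottom) / 2
    # Vertical line: passes through the horizontal center of the column anchor.
    x = (column_anchor.left + column_anchor.right) / 2
    for line in lines:
        if line is row_anchor or line is column_anchor:
            continue  # the anchors themselves are never the target
        if line.left <= x <= line.right and line.top <= y <= line.bottom:
            return line
    return None


# Hypothetical page: a column heading, two row labels, and two values.
q1 = Line("Q1", 200, 10, 240, 25)
revenue = Line("Revenue", 10, 50, 80, 65)
costs = Line("Costs", 10, 80, 60, 95)
lines = [q1, revenue, Line("1,200", 195, 50, 245, 65), costs, Line("900", 195, 80, 245, 95)]

target = intersection(lines, row_anchor=revenue, column_anchor=q1)
print(target.text if target else None)
```

Using the anchor centers keeps the sketch short; a real extractor would likely need some tolerance for lines that are slightly misaligned with the anchors.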

Large language model (LLM)-based methods

See LLM-based methods.