Multi-document extractions
Sometimes a single file contains multiple documents (a “portfolio”). For example, a portfolio file can contain an invoice, a tax document, and a contract.
Sensible recommends extracting each document in a portfolio using its own document type, so you can write validations for each type. For example, use an “income tax” doc type and an “invoice” doc type, rather than creating a “combined_tax_and_invoice” doc type.
To extract from a portfolio, take the following steps:
- Specify fingerprints to configure how Sensible segments the portfolio into documents. Fingerprints test for text matches on first pages, last pages, and other page types.
- Create an extraction request by taking the following steps:
  - Indicate the file is a portfolio:
    - Sensible app: Click the Portfolio button on the Extract tab.
    - SDKs: Specify the Document Types parameter in the Extract method.
    - API: Use one of the Portfolio extraction endpoints.
  - In the request, specify the doc types that exist in the portfolio. For example, using the API, `"types": ["insurance_quote", "insurance_loss_run"]`. The extraction response includes document extractions and their page ranges in the portfolio.
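As a rough illustration of the request step, the following Python sketch builds a JSON body that lists the doc types in a portfolio. Only the `types` field comes from the text above. The `document_url` field name, the example URL, and the helper function are assumptions made for illustration, so check the API reference for the exact request shape before relying on them.

```python
import json


def build_portfolio_request_body(document_url, doc_types):
    """Return a JSON string for a portfolio extraction request body (illustrative)."""
    if not doc_types:
        raise ValueError("A portfolio request needs at least one doc type")
    # "types" is the field shown in the text above.
    # "document_url" is an assumed field name, not confirmed by this topic.
    return json.dumps({"document_url": document_url, "types": list(doc_types)})


body = build_portfolio_request_body(
    "https://example.com/portfolio.pdf",  # placeholder URL
    ["insurance_quote", "insurance_loss_run"],
)
print(body)
```

The sketch only builds the body and doesn't send anything, so it stays independent of any particular endpoint or authentication scheme.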
Examples
The following example shows extracting three one-page documents from a portfolio. The portfolio contains two car insurance quotes and one loss run.
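To make the idea of segmentation concrete, here is a toy Python sketch that splits a three-page portfolio into documents by testing each page for first-page text. It is not Sensible's segmentation logic. The function, the marker strings, and the page texts are all made up for illustration, and it models only first-page matches.

```python
def segment_portfolio(pages, first_page_markers):
    """Split page texts into documents, starting a new one on each first-page match.

    pages: list of page texts, in order.
    first_page_markers: dict mapping a doc type to text expected on its first page.
    Returns (doc_type, start_page, end_page) tuples with 0-based, inclusive pages.
    """
    documents = []
    for index, text in enumerate(pages):
        lowered = text.lower()
        matched = next(
            (doc_type for doc_type, marker in first_page_markers.items()
             if marker.lower() in lowered),
            None,
        )
        if matched is not None:
            # A first-page match starts a new document.
            documents.append([matched, index, index])
        elif documents:
            # No match: treat the page as a continuation of the current document.
            documents[-1][2] = index
        # Pages that appear before any match are skipped.
    return [tuple(doc) for doc in documents]


# Hypothetical page texts standing in for two quotes and one loss run.
pages = [
    "AnyCo auto insurance quote for policy A",
    "AnyCo auto insurance quote for policy B",
    "AnyCo loss run report: claims history",
]
markers = {"auto_insurance_quote": "insurance quote", "loss_run": "loss run"}
segments = segment_portfolio(pages, markers)
print(segments)
```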
Config
Document type 1
- doc type: “auto_insurance_quote”
- config name: “anyco_quote”
- config content:
The config is the same as the one used in the Getting started with layout-based extractions topic, with the addition of the following fingerprint:
Document type 2
- doc type: “loss_run”
- config name: “anyco_claims”
- config content:
The config is the same as the one used in the Sections topic, with the addition of the following fingerprint:
Example document
| Example document | Download link |
| --- | --- |
Output
For the preceding configurations, doc types, and example document portfolio, the following asynchronous request returns a list of document extractions:
- Make an extraction request. For example, through the API:
- This request returns an extraction ID. Use it to retrieve the extractions by replacing `YOUR_EXTRACTION_ID` with the returned ID in the following example code:
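As a hedged sketch of the retrieval step, the following Python code builds (but doesn't send) a GET request for an extraction ID. The base URL, the `/documents/{id}` path, and the bearer-token header are assumptions here rather than details confirmed by this topic, and `YOUR_API_KEY` is a placeholder, so verify them against the API reference.

```python
import urllib.request

# Assumed base URL; confirm against the API reference.
API_BASE = "https://api.sensible.so/v0"


def build_retrieval_request(extraction_id, api_key):
    """Build a GET request for the results of an asynchronous extraction."""
    url = f"{API_BASE}/documents/{extraction_id}"  # assumed path
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        method="GET",
    )


request = build_retrieval_request("YOUR_EXTRACTION_ID", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` after replacing both placeholders with real values.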
The response contains extractions from three documents:
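One way to inspect such a response is to list each document's type and page range. The Python sketch below assumes the response has a `documents` array whose items carry `documentType`, `startPage`, and `endPage` fields, with 0-based pages. These field names, the page indexing, and the sample data are illustrative assumptions, not the actual response for this example.

```python
def summarize_documents(response):
    """Return (doc_type, start_page, end_page) for each document in a response."""
    summaries = []
    for doc in response.get("documents", []):
        # Field names are assumed; adjust them to match the real response.
        summaries.append(
            (doc.get("documentType"), doc.get("startPage"), doc.get("endPage"))
        )
    return summaries


# Hand-written sample shaped like the three one-page documents in this example.
sample_response = {
    "documents": [
        {"documentType": "auto_insurance_quote", "startPage": 0, "endPage": 0},
        {"documentType": "auto_insurance_quote", "startPage": 1, "endPage": 1},
        {"documentType": "loss_run", "startPage": 2, "endPage": 2},
    ]
}

summary = summarize_documents(sample_response)
for doc_type, start, end in summary:
    print(f"{doc_type}: pages {start} to {end}")
```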