fingerprints for: | notes |
---|---|
standalone documents | Improve performance by testing a document for matching text to decide whether to run or skip a config in a given document type. By skipping configs that fail a fingerprint, you save processing time. This matters most when a config contains computationally expensive operations such as LLM-based methods, selective OCR, table recognition, or box recognition methods. To test for matching text at the field level instead of the document type level, specify field fallbacks. For more information, see Field query object. |
portfolios | A portfolio contains multiple documents combined into one file, such as an invoice, a contract, and a tax form. Sensible uses fingerprints to segment a portfolio into documents. Fingerprints test for matching text that characterizes first, last, or other pages for documents in the portfolio. For more information, see Multi-document extraction. |
… `"page" : "any"`.
key | value | description for portfolios |
---|---|---|
match (required) | a string, a Match object, or an array of Match objects | Specifies the text to match for the test. |
offset | integer | Specifies where to start or end the document segment, as an offset in pages relative to the first or last page defined by the Match parameter. For example, suppose you specify that the page containing the phrase "A summary of your rights" is the first page of a segment, and Sensible finds that match on zero-indexed page 3 of a portfolio: - specifying `"offset": -1` starts the document segment on page 2 of the portfolio. - specifying `"offset": 1` starts the document segment on page 4 of the portfolio. |
page | first, last, every, any | Configure with one of the following enums: - first: The first page of a document segment must meet the match criteria. - last: The last page of a document segment must meet the match criteria. If you specify last, you must pair it with a different page type, such as every. - every: Every page in the document segment must meet the match criteria. If you define this page type, you must pair it with a different page type, such as last. - any: Any page in the document segment can meet the criteria. Notes: - For an example, see Multi-document extraction. - If you reuse the same config between portfolios and standalone documents, then for standalone document extractions, Sensible ignores the configured value of this parameter. |
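
As a minimal sketch of how the keys in the table above might combine for a portfolio, the following config fragment pairs a `first` page test that uses an offset with an `every` page test. The enclosing `fingerprint` object and `tests` array are assumptions about the surrounding config layout that the table does not show, and the second match string, "Anyco Insurance", is a hypothetical placeholder. Check both against the Sensible reference before relying on them.

```json
{
  "fingerprint": {
    "tests": [
      {
        "page": "first",
        "match": "A summary of your rights",
        "offset": -1
      },
      {
        "page": "every",
        "match": "Anyco Insurance"
      }
    ]
  }
}
```

With this sketch, if Sensible finds "A summary of your rights" on zero-indexed page 3 of the portfolio, the offset of -1 starts the document segment on page 2.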
- … `first` page test.
- … `last` page test and an `every` page test.
- … `any` page test unless other page types fail to segment the document.
- … `first` page type and wording A, and specify a second test with a `first` page type and wording B.
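
The two-test pattern for wording A and wording B might look like the following sketch. As an assumption, it places both tests in a `tests` array inside a `fingerprint` object, a layout this section does not spell out. The strings "wording A" and "wording B" stand in for the alternate phrases you expect on a document's first page.

```json
{
  "fingerprint": {
    "tests": [
      {
        "page": "first",
        "match": "wording A"
      },
      {
        "page": "first",
        "match": "wording B"
      }
    ]
  }
}
```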