| Option | Configurable for | Notes |
|---|---|---|
| OCR Level parameter | document types | Use this option to configure the criteria by which Sensible determines whether a whole document requires OCR. |
| OCR preprocessor | configs | Use this option to OCR specified pages or page ranges in a document. |
| OCR Engine parameter | document types | Use this option to choose your OCR provider, for example, Amazon, Google, or Microsoft. |
- Sensible converts supported Microsoft Office file types into PDFs.
- Sensible transforms the bytes of the document into raw text and determines whether the document needs OCR:
  - If the file type is an image (for example, PNG), Sensible runs OCR for the whole document, as specified by the document type’s OCR Engine parameter.
  - (Configurable) If the file is a PDF, Sensible processes the file as specified by the document type’s OCR Level and OCR Engine parameters. For more information, see the OCR Level parameter in the table of options.
- (Configurable) After additional intervening steps, Sensible applies your configured preprocessors, including the OCR preprocessor. This preprocessor runs for documents that don’t trigger whole-document OCR in a previous step.
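
The order of decisions in the preceding steps can be sketched in Python. This is an illustrative model only, not Sensible's implementation or API: the names `plan_ocr`, `OcrPlan`, `pdf_needs_ocr`, and `preprocessor_pages`, and the file-type sets, are hypothetical. The `pdf_needs_ocr` flag stands in for whatever the document type's OCR Level parameter decides.

```python
from dataclasses import dataclass, field

# Illustrative file-type sets (assumptions, not Sensible's supported list).
IMAGE_TYPES = {"png"}
OFFICE_TYPES = {"docx"}


@dataclass
class OcrPlan:
    """Hypothetical summary of which OCR path a document takes."""
    whole_document_ocr: bool
    preprocessor_pages: list = field(default_factory=list)


def plan_ocr(file_type, pdf_needs_ocr, preprocessor_pages):
    """Model the decision order described in the steps above.

    pdf_needs_ocr: assumed outcome of the OCR Level criteria for a PDF.
    preprocessor_pages: pages an OCR preprocessor is configured to target.
    """
    file_type = file_type.lower()

    # Step 1: supported Office files are converted to PDF first.
    if file_type in OFFICE_TYPES:
        file_type = "pdf"

    # Step 2a: images always get whole-document OCR.
    if file_type in IMAGE_TYPES:
        return OcrPlan(whole_document_ocr=True)

    # Step 2b: for PDFs, the OCR Level criteria decide on whole-document OCR.
    if pdf_needs_ocr:
        return OcrPlan(whole_document_ocr=True)

    # Step 3: the OCR preprocessor applies only when whole-document OCR
    # was not triggered in an earlier step.
    return OcrPlan(whole_document_ocr=False,
                   preprocessor_pages=list(preprocessor_pages))
```

For example, `plan_ocr("pdf", False, [0, 1])` models a PDF with usable embedded text where only the first two pages are sent to the OCR preprocessor, while `plan_ocr("png", False, [0, 1])` models an image, for which the preprocessor pages are ignored because the whole document is already OCR'd.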
Notes
- For more information about OCR versus embedded text extraction, see Solving direct text extraction from PDFs.
- For information about extracting data from non-text images, such as photographs, charts, or illustrations, see the Query Group method’s Multimodal Engine parameter. You can use the Multimodal Engine parameter as an alternative to OCR to extract from poor-quality text images, such as handwriting.