File types

Sensible supports the following file types:

Operation | PDF | Microsoft Word (DOC and DOCX) | Microsoft Excel (XLSX) | Image formats (JPEG, PNG, and TIFF)
Sensible app’s Extract tab
Single-file extraction with SDKs or API
Portfolio extraction with SDKs or API
Classification with SDKs or API
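
As a rough illustration, a client could check a file's extension against the formats listed above before uploading it. This is a minimal sketch, not part of any Sensible SDK: the function name `is_supported_file_type` is hypothetical, the `.jpg` and `.tif` spellings are assumed aliases, and the check does not capture which operation supports which format.

```python
from pathlib import Path

# Extensions drawn from the formats named in the table header above.
# Assumption: ".jpg" and ".tif" are accepted as aliases of JPEG and TIFF.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".xlsx",
    ".jpeg", ".jpg", ".png", ".tiff", ".tif",
}


def is_supported_file_type(path: str) -> bool:
    """Return True if the file extension matches a listed format.

    Hypothetical helper: it inspects only the file name, not the
    file contents, and ignores per-operation differences.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_file_type("claim.DOCX")` passes the check, while `is_supported_file_type("notes.txt")` does not.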

File sizes

Sensible supports the following file sizes:

Operation | Size limit for /extract/{doc-type} API endpoint | Size limit for asynchronous calls
Single-document file extraction | under 4.5 MB, or under 30 seconds processing time | 6 GB
Portfolio extraction | n/a | 6 GB
Classification | 4.5 MB | 4.5 MB
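
The limits above can be turned into a pre-flight check that suggests which call style fits a file. The sketch below is an illustration under stated assumptions: the function `allowed_call_styles` and the operation labels are hypothetical, MB and GB are treated as decimal units, and "under 4.5 MB" is treated as a strict bound while the other limits are treated as inclusive. It cannot account for the 30-second processing-time limit, which depends on the document rather than on its size.

```python
# Assumption: decimal units (1 MB = 1,000,000 bytes).
MB = 1_000_000
GB = 1_000 * MB

SYNC_LIMIT_BYTES = int(4.5 * MB)   # /extract/{doc-type} endpoint limit
ASYNC_LIMIT_BYTES = 6 * GB         # asynchronous extraction limit


def allowed_call_styles(operation: str, size_bytes: int) -> list[str]:
    """Return the call styles ("sync", "async") the size limits permit.

    Hypothetical helper. `operation` is one of "single", "portfolio",
    or "classification" (labels chosen for this sketch).
    """
    if operation == "single":
        if size_bytes < SYNC_LIMIT_BYTES:
            return ["sync", "async"]
        if size_bytes <= ASYNC_LIMIT_BYTES:
            return ["async"]
        return []
    if operation == "portfolio":
        # The table lists no synchronous limit (n/a) for portfolios.
        return ["async"] if size_bytes <= ASYNC_LIMIT_BYTES else []
    if operation == "classification":
        # The same 4.5 MB limit applies to both call styles.
        return ["sync", "async"] if size_bytes <= SYNC_LIMIT_BYTES else []
    raise ValueError(f"unknown operation: {operation!r}")
```

An empty list means the file exceeds every listed limit for that operation. A small file can still fail a synchronous extraction if processing takes longer than 30 seconds, so a caller may want to fall back to an asynchronous call in that case.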

Notes

  • When extracting from image file formats, Sensible ignores any OCR or OCR preprocessor settings that you configure in the document type or SenseML configuration. For more information about OCR, see OCR level.
  • For DOC and DOCX documents, Sensible converts the document to PDF before processing it.
  • For XLSX documents, Sensible converts the document to PDF. To style the document, Sensible:
    • Discards truncated text in cells. To retain the text, reformat or resize the cells in Excel so the text is visible.
    • Converts sheets to pages by scaling text so that all sheets have the same width and by breaking long sheets into consecutive pages.
    • Adds the sheet name as a header on each page.
  • For TIFF documents, SenseML methods that attempt to render pages return an error, including:
    • Pixel-based methods, such as Box, Checkbox, Signature, and image coordinates returned by the Document Range method
    • Key/Value method
    • Fixed Table method with the Stop parameter specified. Use the Text Table method as an alternative.