Extract data from a document (sync)
Note: Use this endpoint for testing. Use the asynchronous extraction endpoints for production.
Extract data from a local document synchronously.
To explore this endpoint, use this interactive API reference, or use one of the following options:
- For a quick “hello world” response to this endpoint, see the API quickstart.
- For a step-by-step tutorial about calling this endpoint, see Try synchronous extraction.
- Run this endpoint in the Sensible Postman collection.
There are two options for posting the document bytes:
- (Often preferred) Specify the non-encoded document bytes as the entire request body, and specify the Content-Type header, for example, “application/pdf” or “image/jpeg”.
- Base64-encode the document bytes, specify them in a body “document” field, and specify application/json for the Content-Type header.
For a list of supported document file types, see Supported file types.
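As an illustration of the two posting options, the following minimal Python sketch builds the headers and body for each one without sending anything. The helper names and the placeholder bytes are hypothetical; only the Content-Type values and the “document” field come from the description above.

```python
import base64
import json


def raw_bytes_request(document: bytes, content_type: str) -> tuple[dict, bytes]:
    """Option 1: send the non-encoded bytes as the entire request body."""
    headers = {"Content-Type": content_type}
    return headers, document


def base64_json_request(document: bytes) -> tuple[dict, bytes]:
    """Option 2: Base64-encode the bytes into a JSON body "document" field."""
    encoded = base64.b64encode(document).decode("ascii")
    body = json.dumps({"document": encoded}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return headers, body


# Placeholder bytes standing in for the contents of a real file.
sample = b"%PDF-1.7 placeholder bytes"

raw_headers, raw_body = raw_bytes_request(sample, "application/pdf")
json_headers, json_body = base64_json_request(sample)
```

The raw-bytes option avoids the size overhead of Base64 encoding, which may be one reason it is often preferred.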
Authorizations
Sensible uses API keys to authenticate requests. Keep your API keys secure and do not share them in publicly accessible areas such as GitHub or client-side code. Authentication to the API is performed via Bearer Authentication. Provide your API key as the bearer auth value.
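A minimal sketch of supplying the key as a bearer value, assuming the key is kept out of source code in an environment variable. The variable name SENSIBLE_API_KEY and the helper name are hypothetical.

```python
import os


def bearer_headers(api_key: str) -> dict:
    """Build the Authorization header for Bearer Authentication."""
    if not api_key:
        raise ValueError("API key is empty")
    return {"Authorization": f"Bearer {api_key}"}


# SENSIBLE_API_KEY is a hypothetical variable name chosen for this sketch.
api_key = os.environ.get("SENSIBLE_API_KEY", "")
headers = bearer_headers(api_key) if api_key else None
```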
Path Parameters
Type of document to extract from. Create your custom type in the Sensible app (for example, rate_confirmation, certificate_of_insurance, or home_inspection_report).
To quickly test this endpoint using the Try It button in this interactive explorer, use the senseml_basics tutorial document type with this example document.
As a convenience, Sensible automatically detects the best-fit extraction from among the extraction queries ("configs") in the document type.
For example, if you create an auto_insurance_quotes document type, you can add carrier 1, carrier 2, and carrier 3 configs to the document type in the Sensible app. Then, you can extract data from all these carriers using the same document type, without specifying the carrier in the API request.
Query Parameters
If you specify development, Sensible extracts preferentially using config versions published to the development environment in the Sensible app. The extraction runs all configs in the doc type before picking the best fit. For each config, Sensible falls back to the production version if no development version of the config exists.
Allowed values: production, development
If you specify the filename of the document using this parameter, then Sensible returns the filename in the extraction response.
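The following sketch shows one way to assemble the request URL with the path and query parameters. The base URL and the parameter name environment are assumptions not stated on this page (document_name appears in the response description); verify both against the interactive reference before relying on them.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Assumed base URL for this endpoint; not stated in the text above.
BASE_URL = "https://api.sensible.so/v0/extract"

ALLOWED_ENVIRONMENTS = ("production", "development")


def extraction_url(document_type: str,
                   environment: Optional[str] = None,
                   document_name: Optional[str] = None) -> str:
    """Build the URL, omitting any query parameter that is not set."""
    params = {}
    if environment is not None:
        if environment not in ALLOWED_ENVIRONMENTS:
            raise ValueError(f"unsupported environment: {environment!r}")
        params["environment"] = environment  # assumed parameter name
    if document_name is not None:
        params["document_name"] = document_name
    url = f"{BASE_URL}/{quote(document_type, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url


url = extraction_url("senseml_basics", "development", "my file.pdf")
```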
Body
The body is of type file.
Response
Unique ID for the extraction, used to retrieve the extraction
Date and time Sensible created the initial empty extraction and set its status to WAITING.
Unique user-friendly name for a document type
Status of the extraction:
- WAITING: Sensible created an initial empty extraction and is waiting for the document.
- PROCESSING: Sensible received the document and is extracting data.
- FAILED: The extraction failed.
- COMPLETE: The extraction is complete.
Allowed values: WAITING, PROCESSING, COMPLETE, FAILED
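Client code might model the four status values as an enumeration, as in the sketch below. The response key "status" is an assumption, since this page lists field descriptions without their names, and treating COMPLETE and FAILED as terminal is an interpretation of the list above.

```python
from enum import Enum


class ExtractionStatus(str, Enum):
    WAITING = "WAITING"
    PROCESSING = "PROCESSING"
    COMPLETE = "COMPLETE"
    FAILED = "FAILED"


# Statuses after which no further change is expected (an interpretation).
TERMINAL = {ExtractionStatus.COMPLETE, ExtractionStatus.FAILED}


def is_finished(response: dict) -> bool:
    """Return True when the extraction has completed or failed.

    Raises ValueError for a status outside the four documented values.
    """
    # "status" is an assumed key name for the status field.
    return ExtractionStatus(response["status"]) in TERMINAL
```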
Name of the "configuration", a collection of SenseML queries for extracting document data.
Data extracted from the document, structured as an array of fields. Configure the verbosity parameter in the SenseML configuration to return extraction metadata, such as:
- page numbers
- the bounding polygons that define line coordinates
- confidence scores for text that Sensible OCR'd
For more information, see Verbosity.
Extracted fields that failed the validation rules you write in the Sensible app.
Metadata about the PDF file, for example author, authoring tool, and modified date.
Summary of the extracted fields that fail validation rules you write in the Sensible app.
Extraction error messages.
Date and time Sensible set the extraction's status to COMPLETE.
Metadata about how Sensible scores configs against the document to extract from. By default, Sensible compares all configs in the document type, then chooses the best extraction using fingerprints, scores, or a combination of the two. When two extractions tie by score and fingerprints, Sensible chooses the first configuration in alphabetical order. For more information, see fingerprints.
Total number of pages in the document.
Name of the environment to which the configuration used by this extraction was published.
If you specify the filename of the document using the document_name parameter, then Sensible displays the name in extraction history in the Sensible app and returns the name in the extraction response.
The coverage score measures how fully an extraction captured all your target data in the document. It's a percentage comparing non-null, validated fields to total fields returned by a config for a document. For example, a coverage score of 70% for an extraction with no validation errors means that 30% of fields were null. For more information about scoring, see Monitoring extractions.
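To make the arithmetic concrete, here is a rough sketch of the coverage calculation as described above. It is not Sensible's implementation: the function name, the field names, and the choice to count a field as covered only when it is non-null and absent from the set of validation failures are assumptions.

```python
def coverage_score(fields: dict, failed_validation: frozenset = frozenset()) -> float:
    """Percentage of fields that are non-null and did not fail validation."""
    total = len(fields)
    if total == 0:
        return 0.0
    covered = sum(
        1
        for name, value in fields.items()
        if value is not None and name not in failed_validation
    )
    return 100.0 * covered / total


# Ten hypothetical fields, three of them null, no validation errors.
example_fields = {f"field_{i}": "value" for i in range(7)}
example_fields.update({f"missing_{i}": None for i in range(3)})

score = coverage_score(example_fields)
```

With these inputs the score is 70.0, matching the example of an extraction with no validation errors in which 30% of fields are null.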