Extract data from a document (sync)

curl --request POST \
  --url https://api.sensible.so/v0/extract/{document_type} \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: image/jpeg' \
  --data-binary '@<path_to_document>'

{
  "id": "246a6f60-0e5b-11eb-b720-295a6fba723e",
  "created": "2022-10-31T16:27:53.433Z",
  "type": "auto_insurance_quotes_all_carriers",
  "status": "COMPLETE",
  "configuration": "config_for_x_company",
  "parsed_document": {
    "policy_number": {
      "type": "number",
      "value": 123456789,
      "lines": [
        {
          "text": "123456789",
          "page": 0,
          "boundingPolygon": [
            {
              "x": 6.458,
              "y": 2.601
            },
            {
              "x": 7.354,
              "y": 2.601
            },
            {
              "x": 7.354,
              "y": 2.767
            },
            {
              "x": 6.458,
              "y": 2.767
            }
          ]
        }
      ]
    },
    "name_insured": {
      "type": "string",
      "value": "Petar Petrov",
      "lines": [
        {
          "text": "Petar Petrov",
          "page": 0,
          "boundingPolygon": [
            {
              "x": 1,
              "y": 5.515
            },
            {
              "x": 1.935,
              "y": 5.515
            },
            {
              "x": 1.935,
              "y": 5.674
            },
            {
              "x": 1,
              "y": 5.674
            }
          ]
        }
      ]
    }
  },
  "validations": [
    {
      "description": "Policy number must be 11 digits",
      "severity": "error"
    },
    {
      "description": "Company email must be in format string@string",
      "severity": "skipped",
      "message": "Missing prerequisites - company_email"
    }
  ],
  "file_metadata": {
    "metadata": {},
    "error": "Error retrieving PDF metadata: Invalid PDF structure",
    "info": {
      "author": "Jay S. Schiller",
      "title": "file123",
      "creator": "macOS Version 11.2 (Build 20D64) Quartz PDFContext",
      "producer": "Preview",
      "creation_date": "2022-08-02T18:09:31.000Z",
      "modification_date": "2022-08-03T15:09:23.000Z",
      "error": "<string>"
    }
  },
  "validation_summary": {
    "fields": 6,
    "fields_present": 4,
    "errors": 0,
    "warnings": 1,
    "skipped": 1
  },
  "errors": [
    {
      "field_id": "phone_number",
      "message": "ConfigurationError: width <=0",
      "type": "configuration"
    }
  ],
  "completed": "2022-10-31T16:27:53.741Z",
  "classification_summary": [
    {
      "configuration": "config_for_x_company",
      "fingerprints": 2,
      "fingerprints_present": 2,
      "score": {
        "value": 3,
        "fields_present": 4,
        "penalities": 0.5
      }
    },
    {
      "configuration": "acme_co",
      "fingerprints": 2,
      "fingerprints_present": 2,
      "score": {
        "value": 0,
        "fields_present": 2,
        "penalities": 1.5
      }
    }
  ],
  "page_count": 100,
  "environment": "development",
  "document_name": "example.pdf",
  "coverage": 0.75
}
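
As a rough sketch of how a client might read a response shaped like the example above, the following Python maps each field ID in parsed_document to its extracted value. The helper name field_values is hypothetical, the sample is trimmed from the example response, and the handling of null fields is an assumption rather than documented behavior. It only covers single-value fields like the two shown here.

```python
# Trimmed copy of the example response above (bounding polygons omitted).
SAMPLE_RESPONSE = {
    "status": "COMPLETE",
    "parsed_document": {
        "policy_number": {"type": "number", "value": 123456789},
        "name_insured": {"type": "string", "value": "Petar Petrov"},
    },
}


def field_values(response: dict) -> dict:
    """Map each field ID in parsed_document to its extracted value.

    Hypothetical helper, not part of any Sensible SDK.
    """
    values = {}
    for field_id, field in (response.get("parsed_document") or {}).items():
        # Assumption: a field may be null (None) when nothing was extracted.
        values[field_id] = field.get("value") if isinstance(field, dict) else None
    return values


if __name__ == "__main__":
    print(field_values(SAMPLE_RESPONSE))
```

For the trimmed sample, field_values returns the policy number 123456789 and the insured name "Petar Petrov" keyed by their field IDs.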

Note: Use this endpoint for testing. Use the asynchronous extraction endpoints for production.

Extract data from a local document synchronously.

To explore this endpoint, use this interactive API reference, or use one of the following options:

For a quick “hello world” response from this endpoint, see the API quickstart.
For a step-by-step tutorial about calling this endpoint, see Try synchronous extraction.
To run this endpoint in Postman, use the Sensible Postman collection.

There are two options for posting the document bytes:

(Often preferred) Specify the non-encoded document bytes as the entire request body, and specify the Content-Type header, for example, “application/pdf” or “image/jpeg”.
Base64-encode the document bytes, specify them in a “document” field in the body, and specify application/json for the Content-Type header.

For a list of supported document file types, see Supported file types.
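
The two posting options can be sketched in Python as follows. This is a minimal illustration using only the standard library. The function names raw_bytes_request and base64_request are hypothetical, and the code builds the requests without sending them.

```python
import base64
import json
import urllib.request

API_URL = "https://api.sensible.so/v0/extract/{document_type}"


def raw_bytes_request(document_type: str, token: str, data: bytes,
                      content_type: str) -> urllib.request.Request:
    """Option 1: the non-encoded document bytes are the entire request body."""
    return urllib.request.Request(
        API_URL.format(document_type=document_type),
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,  # for example "application/pdf"
        },
    )


def base64_request(document_type: str, token: str,
                   data: bytes) -> urllib.request.Request:
    """Option 2: Base64-encode the bytes into a JSON "document" field."""
    body = json.dumps({"document": base64.b64encode(data).decode("ascii")})
    return urllib.request.Request(
        API_URL.format(document_type=document_type),
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

To send either request, pass it to urllib.request.urlopen and decode the JSON response body. Option 1 avoids the size overhead of Base64 encoding, which may be one reason it is often preferred.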

POST /extract/{document_type}

Authorizations

Authorization

string

header

required

Sensible uses API keys to authenticate requests. Keep your API keys secure and do not share them in publicly accessible areas such as GitHub or client-side code. Authentication to the API is performed via Bearer Authentication. Provide your API key as the bearer auth value.
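
One way to keep the key out of source code is to read it from the environment when building the header. The sketch below assumes an environment variable named SENSIBLE_API_KEY. That name and the helper bearer_header are illustrative choices, not something Sensible defines.

```python
import os


def bearer_header(env_var: str = "SENSIBLE_API_KEY") -> dict:
    """Build the Authorization header from an API key held in the environment.

    The variable name is a hypothetical convention, not a Sensible requirement.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {env_var} to your Sensible API key")
    return {"Authorization": f"Bearer {api_key}"}
```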

Path Parameters

document_type

string

required

Type of document to extract from. Create your custom type in the Sensible app (for example, rate_confirmation, certificate_of_insurance, or home_inspection_report).

To quickly test this endpoint using the Try It button in this interactive explorer, use the senseml_basics tutorial document type with this example document.

As a convenience, Sensible automatically detects the best-fit extraction from among the extraction queries ("configs") in the document type. For example, if you create an auto_insurance_quotes document type, you can add carrier 1, carrier 2, and carrier 3 configs to the document type in the Sensible app. Then, you can extract data from all these carriers using the same document type, without specifying the carrier in the API request.

Query Parameters

environment

enum<string>

default:production

If you specify development, Sensible extracts preferentially using config versions published to the development environment in the Sensible app. The extraction runs all configs in the document type before picking the best fit. For each config, Sensible falls back to the production version if no development version of the config exists.

Available options: production, development

document_name

string

If you specify the filename of the document using this parameter, then Sensible returns the filename in the extraction response.
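
The path and query parameters above combine into a request URL roughly as follows. This is a sketch with a hypothetical helper name, extract_url. It validates environment against the two documented options and omits any parameter that is not set, so the server default applies.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://api.sensible.so/v0/extract"


def extract_url(document_type: str,
                environment: Optional[str] = None,
                document_name: Optional[str] = None) -> str:
    """Build the sync extraction URL from the path and query parameters."""
    if environment is not None and environment not in ("production", "development"):
        raise ValueError("environment must be 'production' or 'development'")
    params = {}
    if environment is not None:
        params["environment"] = environment
    if document_name is not None:
        params["document_name"] = document_name
    # Percent-encode the document type so it stays a single path segment.
    url = f"{BASE_URL}/{quote(document_type, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url
```

For example, extract_url("auto_insurance_quotes", "development", "example.pdf") yields https://api.sensible.so/v0/extract/auto_insurance_quotes?environment=development&document_name=example.pdf.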

Body

The body is of type file.

Response

200

application/json

The structured data extracted from the document.

The response is of type object.
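
Because a response can have status COMPLETE and still carry field errors or failed validations, a caller may want to inspect those arrays before trusting the data. The sketch below uses the field names from the example response on this page. The helper name extraction_problems is hypothetical, and treating only the "error" severity as a problem is an assumption about how a client might triage results.

```python
# Trimmed from the example response on this page.
SAMPLE_RESPONSE = {
    "status": "COMPLETE",
    "validations": [
        {"description": "Policy number must be 11 digits", "severity": "error"},
        {
            "description": "Company email must be in format string@string",
            "severity": "skipped",
            "message": "Missing prerequisites - company_email",
        },
    ],
    "errors": [
        {
            "field_id": "phone_number",
            "message": "ConfigurationError: width <=0",
            "type": "configuration",
        }
    ],
}


def extraction_problems(response: dict) -> list:
    """Collect human-readable problems from a sync extraction response."""
    problems = []
    if response.get("status") != "COMPLETE":
        problems.append(f"status is {response.get('status')}")
    for err in response.get("errors") or []:
        problems.append(f"{err.get('field_id')}: {err.get('message')}")
    for validation in response.get("validations") or []:
        # Assumption: only "error" severity should block downstream use.
        if validation.get("severity") == "error":
            problems.append(f"validation failed: {validation.get('description')}")
    return problems


if __name__ == "__main__":
    for problem in extraction_problems(SAMPLE_RESPONSE):
        print(problem)
```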
