- SenseML is for advanced config authoring. For a simpler authoring experience, use Sensible Instruct. For more information about SenseML versus Sensible Instruct, see Choosing extraction strategy. For authoring in Sensible Instruct, see Getting started.
- If you instead want to explore without much explanation, then sign up for an account and check out our interactive in-app tutorials in the `sensible_instruct_basics` document type.
- If you want a quick “hello world” API response, see the API quickstart.
Get structured data from an auto insurance quote
Let’s get started with SenseML! If you can write basic SQL queries, you can write SenseML queries. SenseML shields you from the underlying complexities of PDFs, so you can write queries that are visually and logically clear to a human programmer. In this tutorial, you’ll:
- Write a collection of queries (a “config”) to extract structured data from an example auto insurance document
- Learn how the config works, including key concepts like lines, anchors, and methods
- Test the config by running it against a second, similar auto insurance document
- Use the API to integrate your Sensible config with your application
- Validate extractions in production by using JsonLogic to define expected extracted values and flag unexpected values as warnings or errors
Get an account
- Get an account at sensible.so. If you don’t have an account, you can still read along to get a rough idea of how things work.
- Log into the Sensible app.
Configure the extraction
- In the Document Types tab, click New document type to create a new document type and name it “auto_insurance_quote”. Leave the defaults and click Create.

- To upload an example document for your document type, take the following steps:
- Download the following document:
| Example document | Download link |
- As the following screenshot shows, click the auto_insurance_quote document type you created, click the Reference documents tab, and click Upload document:
- In the file upload dialog, choose the generic car insurance quote you downloaded in a previous step.
- To create a configuration for your document type, take the following steps:
- In your auto_insurance_quote document type, click the Configurations tab.
- On the tab, click Create configuration, name it “anyco” (for the fictional company providing the quote), and click Create.
- To edit your anyco configuration, click it. When the configuration opens, you see an empty config pane on the left, the document in the middle, and an empty output pane on the right:

Extract data
For this tutorial, you’ll extract these fields:
- a couple of premiums
- the policy number
- the policy period
- Paste this config into the left pane in the editor to extract the data:

- For a deep dive on how the config works, see the following section.
- If you want to skip ahead and try out the API, see Integrate with your application.
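As a rough sketch of the config to paste in the first step above, the following assembles the fields that the later sections of this tutorial describe. The ids `policy_period`, `comprehensive_premium`, `property_liability_premium`, and `policy_number` come from this tutorial. The query ids, the `"type": "string"` on each query, and the `"property damage liability"` anchor text are assumptions made for illustration, so adjust them to match your document.

```json
{
  "fields": [
    {
      "method": {
        "id": "queryGroup",
        "queries": [
          {
            "id": "bodily_injury_premium",
            "description": "bodily injury premium",
            "type": "string"
          },
          {
            "id": "customer_service_phone",
            "description": "customer service phone number",
            "type": "string"
          }
        ]
      }
    },
    {
      "id": "policy_period",
      "anchor": "policy period",
      "method": {
        "id": "label",
        "position": "right"
      }
    },
    {
      "id": "comprehensive_premium",
      "anchor": "comprehensive",
      "type": "currency",
      "method": {
        "id": "row",
        "tiebreaker": "second"
      }
    },
    {
      "id": "property_liability_premium",
      "anchor": "property damage liability",
      "type": "currency",
      "method": {
        "id": "row",
        "tiebreaker": "second"
      }
    },
    {
      "id": "policy_number",
      "anchor": {
        "match": {
          "type": "startsWith",
          "text": "policy number"
        }
      },
      "method": {
        "id": "box"
      }
    }
  ]
}
```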
How layout-based extraction works
This guide focuses on layout-based document extraction, which works as follows:
- Each “field” is a basic query unit in Sensible. Each field outputs a piece of data from the document that you want to extract. Sensible uses the field `id` as the key in the key/value JSON output. For more information, see Field.
- Sensible searches first for a text “anchor” because it’s a computationally quick way to narrow down the location of the target data to extract. An anchor is text that always occurs close to your target text. Without it, Sensible wouldn’t know which page to search in for your target text. For more information about defining complex anchors, see Anchor.
- Then, Sensible uses a “method” to expand its search out from the anchor and extract the data you want. For more information about methods, see Methods.
Type of method | Explanation | Description |
---|---|---|
Layout | How it works: Label method | Grab info immediately proximate to labeling text. |
Layout | How it works: Row method | Grab info from a cell in a row. |
Layout | How it works: Box method | Grab info from a box. |
Natural-language | How it works: Query method | Ask a free-text question about simple information in the document. |
How it works: Query method
The easiest way to start extracting simple information is to ask a natural-language question. For example, to extract the bodily injury liability, ask: `bodily injury premium`. You can group together other queries if the answers are located within a page or two of each other in the document. For example, in the group, the config also queries for the insurer's customer service phone number.
To experiment, add a query of your own, such as `"street address for the Anyco insurance company"`, and see what you get. For easy authoring, try out this method in Sensible’s visual authoring tool.
LLM-based methods such as the Query Group method can run up against limitations with complex document formatting. In such cases, combine LLM-based methods with layout-based methods in the same document extraction configuration.
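As a sketch of the query group described in this section, the following field groups the two queries from the config and adds the street address question as a third. All three query ids and the `"type": "string"` settings are assumptions chosen for illustration.

```json
{
  "method": {
    "id": "queryGroup",
    "queries": [
      {
        "id": "bodily_injury_premium",
        "description": "bodily injury premium",
        "type": "string"
      },
      {
        "id": "customer_service_phone",
        "description": "customer service phone number",
        "type": "string"
      },
      {
        "id": "insurer_street_address",
        "description": "street address for the Anyco insurance company",
        "type": "string"
      }
    ]
  }
}
```

Each query `id` becomes a key in the extracted output, in the same way as the `id` of a layout-based field.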
Let’s look next at several simple layout-based methods.
How it works: Label method
To extract the policy period from the document:
- The anchor (`"policy period"`) is text that’s pretty close to the text to extract, so it can serve as a “label” for that text (`"id": "label"`).
- The text to extract is to the right of the anchor (`"position": "right"`).
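Put together, the parameters in the preceding list suggest a field like the following sketch, using the `policy_period` id that appears later in this tutorial:

```json
{
  "id": "policy_period",
  "anchor": "policy period",
  "method": {
    "id": "label",
    "position": "right"
  }
}
```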
Key concept: lines
See those gray boxes around the text in the following image?

How it works: Row method
To extract the comprehensive premium of $150:
- The anchor text (`"comprehensive"`) is part of a row of lines (`"id": "row"`).
- The returned value is a currency (`"type": "currency"`). For other data types you can define, see Field query object.
- The text to extract is the second line in the row after the anchor (`"tiebreaker": "second"`). Use tiebreakers to select lines in rows, for example maximum and minimum values (`>` and `<`).
- By default, the Row method extracts values to the right of the anchor. You can override the default by specifying `"position": "left"`.
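The parameters in the preceding list combine into a field like the following sketch. The `comprehensive_premium` id is the one that the validation examples in this tutorial refer to. Note that `type` sits on the field, while `tiebreaker` sits inside the method.

```json
{
  "id": "comprehensive_premium",
  "anchor": "comprehensive",
  "type": "currency",
  "method": {
    "id": "row",
    "tiebreaker": "second"
  }
}
```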
Why does `"tiebreaker": "second"` select $150, when $250 is the second line after the anchor (the first line is `............`)?
The reason is that `"tiebreaker": "second"` evaluates after the data type specified in the field, `"type": "currency"`. Instead of looking for the second line after the anchor in general, Sensible looks for the second line that contains a currency. Convenient, right?
Key concept: visualize anchors and matches
In the app, you can visually inspect anchors and methods by looking at their color coding:
- Orange boxes show lines matched by the Anchor object.
- Blue boxes show lines matched by the Method object.
- Dotted blue boxes show lines discarded by the Method object.
Seeing the entire method match in the app can help you troubleshoot unexpected output.

"tiebreaker": "second"
.
How it works: Box method
To extract the policy number from this document:
- The anchor is inside a box (`"id": "box"`).
- The anchor text is `policy number`.
- The anchor line is a little more complex than previous examples, because it also defines a match type (`"type": "startsWith"`). You can write a simpler string anchor as `"anchor": "policy number"`, or you can expand to complex anchors. For more information, see Anchor object.
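As a sketch, a field matching the preceding description could look like the following. The `policy_number` id is the one used in the validation examples in this tutorial. Nesting the match type and text under a `match` key follows the Anchor object's expanded form, so check the Anchor reference if your config behaves differently.

```json
{
  "id": "policy_number",
  "anchor": {
    "match": {
      "type": "startsWith",
      "text": "policy number"
    }
  },
  "method": {
    "id": "box"
  }
}
```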
Advanced queries
You can get more advanced with this auto insurance config. For example:
- You can use a Column method to return all the listed premiums (15, $130).
- The limits listed in the table are tricky for the Row method to capture since they can be a variable number of lines. Row methods depend on strict horizontal alignment of lines, so Sensible extracts the first line. Instead, use the Table method to more reliably capture the data in each cell of the whole table. Or, use an `xRangeFilter` parameter in the Document Range method to capture the limits.
- What if the document listed emails, and you just wanted to capture all those emails? You could use a regular expression (regex) in a `"match": "all"` anchor coupled with a Passthrough method, or the Regex method.
- You can split the policy period into two dates, either by using the Split computed field method, or by setting the Date type on the field and using a tiebreaker.
Test the config
Before integrating the config with an application and writing validation tests against it, double-check the config by uploading another quote.
- Repeat the steps in the previous section to upload a second generic car insurance quote:
| auto_insurance_anyco_2 | Download link |
- Click the anyco config, select the “auto_insurance_anyco_2” document, and look at the output. Unlike the first document, the policy period takes up two lines, so Sensible misses the end year (2021):

To capture the whole policy period, you can use a method that spans multiple lines, such as:
- Document Range method
- Region method
Because the text to extract shares a line with the anchor (`"policy period"`), you need to specify to:
- include the anchor with `"includeAnchor": true`
- filter out unwanted text in the anchor (the words “Policy period”) with a Word Filters parameter.
- Replace the `policy_period` field in the Sensible app with a field that uses these parameters.


- Click Publish configuration and choose Production to save your changes to the config.
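A minimal sketch of the replacement `policy_period` field from the preceding steps, assuming you choose the Document Range option. The `includeAnchor` and Word Filters (`wordFilters`) settings come from the steps above. The `stop` match is a placeholder assumption: set it to text that reliably follows the policy period in your documents.

```json
{
  "id": "policy_period",
  "anchor": "policy period",
  "method": {
    "id": "documentRange",
    "includeAnchor": true,
    "wordFilters": ["policy period"],
    "stop": {
      "type": "startsWith",
      "text": "comprehensive"
    }
  }
}
```

After you replace the field, rerun the extraction against both example documents to confirm that the two-line policy period now includes the end year and that the one-line version still extracts correctly.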
Integrate with your application
When you’re ready to integrate with your application, enable using the config with the Sensible SDKs or API by taking the following steps:
- Click Publish configuration. The config is still a work in progress, so click Development. Now you can use the query parameter `env=development` to test the integration before you go to production.
- Use the Sensible SDKs or API to integrate with your application.
Validate extractions in production
In a previous section, you tested a couple of documents manually. Now it’s time to scale up and quality control the extractions by writing tests that run for all API extractions in a doc type. Use JsonLogic to validate that the extracted information makes sense for the car insurance document:
- Test that the property damage liability premium is cheaper than the comprehensive premium:
{"<":[{"var":"property_liability_premium.value"},{"var":"comprehensive_premium.value"}]}
- Test that the policy number is a nine-digit number:
{"match":[{"var":"policy_number.value"},"\\d{9}"]}
To add these tests:
- In the auto_insurance_quote document type, click Create validation. Add the following input to the dialog:
- Set the Severity to Warning
- Set the Description to “prop. damage is less than comprehensive”
- Set the Condition to:
{"<":[{"var":"property_liability_premium.value"},{"var":"comprehensive_premium.value"}]}

- Click Create.
- Repeat the previous steps to create another validation with the following settings:
- Set the Severity to Error
- Set the Description to “policy number is a nine-digit number”
- Set the Condition to:
{"match":[{"var":"policy_number.value"},"\\d{9}"]}
- To test the validations with a document that’s missing information, try out an API call with the following example document that has these errors:
- the policy number is missing
- the property damage liability premium is $200 more than the comprehensive premium
| auto_insurance_anyco_3 | Download link |
Next
- Check out the SenseML method reference docs to write your own extractions
- Learn more about validations to test the quality of your extractions in production