April 2024
In the last month, we released support for GPT-4 Vision, so you can extract information from non-text images in documents. We also released a new recommended query groups feature, so you can upload a document and extract data from it with a few button clicks instead of authoring your own LLM prompts. We added a new filterable extraction status, deprecated several methods, and made several improvements to advanced configurability.
New feature: GPT-4 Vision support for extracting data from non-text images
With the Query Group method’s new Multimodal Engine parameter, you can ask questions about non-text images embedded in documents. For example, for a real estate offering memorandum containing a photo of a property, you can ask questions like “Does the house pictured have trees on the property?” You can also use the Multimodal Engine to extract from complex text layouts, such as handwriting. For examples, see the Query Group method.
New feature: Automatic extraction with recommended query groups
You can now automatically extract data from a document without configuring queries manually. Sensible generates LLM-based queries based on the current page you’re viewing. For example, if you’re looking at a lease page summarizing the rents, clicking auto-generate can automatically target relevant data, create prompts, and extract structured data like monthly_rent or late_fee. Reuse your automatically generated prompts to extract from similar documents.
For more information, see the release announcement post.
Deprecation: LLM-based methods replace Invoice, Key-Value, and TFIDF methods
The Invoice, Key-Value, and TFIDF methods are now deprecated. To replicate these methods’ functionality, use LLM-based methods.
Improvement: Keyset pagination for List Extractions API endpoint
You can now page through results from the List Extractions endpoint using keyset pagination instead of date-based navigation. Get the next page of results using the continuation_token query parameter, and configure the page size with the new limit parameter. The date range parameters are now optional. The cutoff_date parameter is now deprecated.
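As a rough illustration of keyset pagination, the following Python sketch follows continuation tokens until no more pages remain. Only the continuation_token and limit parameter names come from this release note. The fetch_page callable, the "extractions" response key, and the assumption that the response returns the next token under "continuation_token" are hypothetical placeholders; check the List Extractions API reference for the actual request and response shapes.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical page shape (an assumption, not the documented response):
# {"extractions": [...], "continuation_token": "<opaque string>"}
Page = Dict[str, Any]


def iter_extractions(
    fetch_page: Callable[[Dict[str, Any]], Page],
    limit: int = 50,
) -> Iterator[Dict[str, Any]]:
    """Yield extractions page by page, following continuation tokens.

    fetch_page is a caller-supplied function that sends the given query
    parameters to the List Extractions endpoint and returns the parsed
    response as a dict.
    """
    token: Optional[str] = None
    while True:
        params: Dict[str, Any] = {"limit": limit}
        if token is not None:
            # Ask for the page that follows the last one received.
            params["continuation_token"] = token
        page = fetch_page(params)
        yield from page.get("extractions", [])
        token = page.get("continuation_token")
        if not token:
            # No token means there are no further pages.
            break
```

Passing the HTTP call in as fetch_page keeps the loop independent of any particular client library, so you can exercise it against an in-memory stub before pointing it at the real endpoint.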
Improvement: New PROCESSING status for extractions
In addition to filtering extractions by the WAITING, FAILED, and COMPLETED statuses, you can filter by the new PROCESSING status in the Sensible app or with the List Extractions endpoint. The status indicates that Sensible received the document and is working on the extraction.
Improvement: Round Currency and Number types
You can round extracted Currency- and Number-typed values to a specified number of decimal places using these types’ new Round To parameter. For example, configuring "roundTo": 2 rounds the number 5.919 to 5.92.
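To make the arithmetic concrete, here is a minimal Python sketch of rounding to a fixed number of decimal places. It is not Sensible's implementation: the round_to helper is a hypothetical name, and the half-up tie-breaking rule is an assumption, since this note doesn't state how ties are handled.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to(value: str, places: int) -> Decimal:
    """Round a decimal string to the given number of decimal places.

    Ties round half-up here, which is an assumption for illustration.
    """
    quantum = Decimal(10) ** -places  # places=2 gives Decimal("0.01")
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


print(round_to("5.919", 2))  # prints 5.92
```

The sketch uses Decimal instead of float so that values such as currency amounts round on their decimal representation, not on a binary approximation.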
Improvement: Advanced Document Range configuration
The new Stop Offset Y parameter adds advanced configurability to the Document Range method. Use the parameter to offset the end of the range up or down the page from the range’s Stop line.
UX improvement: Advanced config-level options in Sensible Instruct
The LLM Engine parameter we released for the List method is now available in the Sensible Instruct editor in addition to the SenseML editor.
The Page Span Threshold parameter we released for the NLP Table method is now available in the Sensible Instruct editor in addition to the SenseML editor.