List
Extracts repeating data in a document, such as the work history or skills on a resume, the vehicles on an auto insurance policy, or the line items on an invoice. It can find these facts in paragraphs of free text or in more structured layouts, such as key/value pairs or tables.
For tips on authoring this method in Sensible Instruct, see List tips.
Advantages
- Low code. Describe what you want to extract in prompts for a large language model (LLM).
- Can reformat or filter extracted data based on your prompts.
- Doesn’t require an anchor.
Limitations
- Sensible can extract up to 20 pages for a single list field. If the list exceeds that limit, Sensible truncates the list.
- For highly complex repeating layouts, such as insurance loss run documents, use the Sections method.
How it works
For more information about how this method works, see Notes.
Parameters
Note: For the full list of parameters available for this method, see Global parameters for methods. The following table only shows parameters most relevant to or specific to this method.
Note: You can configure some of the following parameters in both the NLP preprocessor and in a field’s method. If you configure both, the field’s parameter overrides the NLP preprocessor’s parameter. For more information, see Advanced prompt configuration.
key | value | description |
---|---|---|
id (required) | list | The Anchor parameter is optional for fields that use this method. If you specify an anchor: - Sensible ignores the anchor if it’s present in the document. - Sensible returns null for the field if the anchor isn’t present in the document. |
description (required) | string | A prompt describing the list’s subject matter as a whole. |
properties (required) | array | An array of objects with the following parameters: - id (required): A user-friendly ID for the data in the extraction output. - description (required): A prompt describing the list item that you want to extract. The prompt can include instructions to reformat or filter the data. For example, provide prompts like “transaction amount. return the absolute value” or “vehicle make (not model)”. - type: The list item’s type. For more information, see types. |
(Deprecated) promptIntroduction | string | (Deprecated) Overwrites the introductory text at the beginning of the full prompt that Sensible submits to the LLM for this field. |
llmEngine | fast, thorough. default: fast | Specifies the LLM to which Sensible submits the full prompt, and affects the number of chunks that Sensible submits to the LLM. If the Fast option results in incomplete extractions for multi-page lists, use Thorough as an alternative. - fast: Sensible uses a faster LLM (GPT-3.5 Turbo) and can submit a smaller number of chunks than specified by the Chunk Count parameter. - thorough: Sensible uses a slower LLM (GPT-4 Turbo) and submits exactly the number of chunks specified by the Chunk Count parameter. Sensible can take several minutes to return the list. For more information, see Notes. |
contextDescription | | For information about this parameter, see Advanced prompt configuration. |
pageHinting | | For information about this parameter, see Advanced prompt configuration. |
chunkCount | 20 | For information about this parameter, see Advanced prompt configuration. |
chunkSize | 1 | For information about this parameter, see Advanced prompt configuration. |
chunkOverlapPercentage | 0 | For information about this parameter, see Advanced prompt configuration. |
pageRange | | For information about this parameter, see Advanced prompt configuration. |
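
As a rough illustration of how the parameters in the preceding table fit together, the following Python sketch builds a field definition as a dictionary and prints it as JSON. The field ID, the prompts, the property IDs, the surrounding `fields` array, and the `missing_required` helper are all hypothetical and chosen for illustration. The sketch also assumes that method parameters sit inside the field's `method` object and that `llmEngine` takes a plain string value.

```python
import json

# Hypothetical field definition. The IDs and prompts are illustrative
# assumptions, not taken from the documentation's own example.
list_field = {
    "id": "menu_items",
    "method": {
        "id": "list",
        "description": "dishes offered on a menu",
        "properties": [
            {"id": "dish_name", "description": "name of the dish"},
            {"id": "price", "description": "price of the dish. return the absolute value"},
        ],
        # Assumed string form of the LLM Engine parameter.
        "llmEngine": "thorough",
        "chunkCount": 20,
    },
}


def missing_required(method: dict) -> list:
    """Return the names of required parameters that are absent (hypothetical helper)."""
    missing = [key for key in ("id", "description", "properties") if key not in method]
    for index, prop in enumerate(method.get("properties", [])):
        for key in ("id", "description"):
            if key not in prop:
                missing.append(f"properties[{index}].{key}")
    return missing


config = {"fields": [list_field]}
print(json.dumps(config, indent=2))
```

Running a check such as `missing_required(list_field["method"])` before submitting a config is one way to catch an omitted description early. It only covers the parameters that the table marks as required.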
Examples
The following example uses the List method to extract the items listed on a menu.
Config
Example document
The following image shows the example document used with this example config:
Example document | Download link |
---|
Output
Notes
For an overview of how the List method works, see the following steps:
- Sensible finds the chunks of the document that most likely contain your target data:
  - Sensible concatenates all your property descriptions with your overall list description.
  - Sensible splits the document into equal-sized chunks.
  - Sensible scores your concatenated list descriptions against each chunk.
  - Sensible selects a number of the top-scoring chunks:
    - If you specify Thorough for the LLM Engine parameter, the Chunk Count parameter determines the number of top-scoring chunks Sensible selects to submit to the LLM.
    - If you specify Fast for the LLM Engine parameter:
      1. Sensible selects a number of top-scoring chunks as determined by the Chunk Count parameter.
      2. To improve performance, Sensible removes chunks that are significantly less relevant from the list of top-scoring chunks. The number of chunks Sensible submits to the LLM can therefore be smaller than the number specified by the Chunk Count parameter.
- To avoid exceeding the LLM’s token limits, Sensible batches the chunks into groups by page number. Sensible batches a maximum of 20 pages. The chunks in each page group can be non-consecutive in the document.
- For each page group, Sensible submits a full prompt to the LLM that includes the pages’ chunks as context, page-hinting data, and your prompts. For information about the LLM, see the LLM Engine parameter. For more information about the full prompt, see Advanced prompt configuration. The full prompt instructs the LLM to create a list formatted as a table, based on the context.
- Sensible concatenates the results from the LLM for each page group and returns a list, formatted as a table.
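
The preceding steps can be sketched in Python as follows. This is a simplified illustration under several assumptions, not Sensible's implementation. The word-overlap `score` function stands in for a scoring method that the steps don't describe, and the `relative_cutoff` threshold is an invented way to model removing "significantly less relevant" chunks. The sketch also reads the 20-page maximum as a per-group limit, and `ask_llm` is a hypothetical callable standing in for the LLM request. All function and parameter names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    page: int  # page on which the chunk starts
    text: str


def score(prompt: str, chunk: Chunk) -> float:
    """Toy relevance score: fraction of prompt words found in the chunk (assumption)."""
    words = set(prompt.lower().split())
    chunk_words = set(chunk.text.lower().split())
    return len(words & chunk_words) / len(words) if words else 0.0


def select_chunks(list_description, property_descriptions, chunks,
                  chunk_count=20, engine="fast", relative_cutoff=0.5):
    """Concatenate the prompts, score every chunk, and keep the top-scoring ones."""
    prompt = " ".join([list_description, *property_descriptions])
    scored = sorted(((score(prompt, c), c) for c in chunks),
                    key=lambda pair: pair[0], reverse=True)
    top = scored[:chunk_count]
    if engine == "fast" and top:
        # Assumed pruning rule: drop chunks scoring far below the best chunk.
        best = top[0][0]
        top = [pair for pair in top if pair[0] >= best * relative_cutoff]
    return [c for _, c in top]


def group_by_pages(chunks, max_pages=20):
    """Batch chunks into groups, each covering at most max_pages distinct pages."""
    pages = sorted({c.page for c in chunks})
    ordered = sorted(chunks, key=lambda c: c.page)
    groups = []
    for start in range(0, len(pages), max_pages):
        group_pages = set(pages[start:start + max_pages])
        groups.append([c for c in ordered if c.page in group_pages])
    return groups


def extract_list(groups, ask_llm):
    """Call the (hypothetical) LLM once per page group and concatenate the rows."""
    rows = []
    for group in groups:
        rows.extend(ask_llm(group))
    return rows


# Usage with made-up chunk text.
chunks = [
    Chunk(0, "Welcome to our restaurant opening hours"),
    Chunk(1, "Margherita pizza price 12 dish"),
    Chunk(2, "Pasta dish name carbonara price 14"),
]
selected = select_chunks("menu dishes", ["dish name", "price"], chunks, chunk_count=3)
groups = group_by_pages(selected, max_pages=20)
```

In this toy run, the Fast setting drops the chunk that shares no words with the prompts, so fewer chunks than `chunk_count` reach the LLM. Passing `engine="thorough"` keeps all three.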