key | value | description |
---|---|---|
type (required) | mergeLines | merges lines distributed along a horizontal axis. |
directlyAdjacentThreshold (required) | number >= 0.16 | Usually, it’s recommended to leave the default for this parameter (0.16).Sensible uses the default setting for this parameter to transform separate tokens output from Google OCR into lines.This parameter specifies the fraction of line height under which to merge two adjacent lines distributed along an x-axis without a space. For example, at 0.16, this preprocessor merges two lines separated by a small gap whose width is less than 16% of the line height. Choosing a larger number merges more aggressively. |
adjacentThreshold (required) | number >= 0.6 | Corrects oversplit lines. Specifies the fraction of line height under which to merge two adjacent lines distributed along an x-axis with a space. The built-in merger uses 0.6, so choosing a larger number merges more aggressively. For an example, see the Examples section. |
yOverlapThreshold | number between 0 and 1.0. default: 1.0 | Merges lines that aren’t perfectly aligned at the same height on the page. Specifies the y overlap above which the Merge Lines preprocessor merges two adjacent lines. Y overlap is the section of the joint y-axis range of two lines that’s occupied by both lines. For example, if two lines share the same minimum and maximum y-axis values, their overlap is 1. If one line’s extent is from 0 to 10 and the other line’s extent is from 2 to 12 on the y-axis, their overlap is .667 (8 / 12). For an example, see the Examples section. |
minXGapThreshold | number in inches | Configure this parameter if two lines overlap on an x-axis. The default behavior is to merge these overlapping lines into one line. To split them instead, set a cap on the amount of allowable overlap. For example:0 - splits lines if their line boundaries are touching but not overlapping.0.1 - splits lines if their boundaries overlap a little, up to 0.1 inches.2.0 - splits lines even when they overlap a lot, up to 2.0 inches.For an example, see the Examples section. |
Name (First, Middle, Last, Suffix, Trust or Custodian)
isn’t one line, but is instead split on words.
SOLUTION
CONFIG
Example document | Download link |
---|
"adjacentThreshold": 0.1
to see oversplit lines."adjacentThreshold": 2.0
to see aggressively merged lines."yOverlapThreshold": 0.2
to observe how lines with misaligned heights (like the email address) merges more aggressively.Example document | Download link |
---|
Supplementary underinsured/uninsured motorist coverage500,000 USD Combined single limit incl. umbl
).
The Min X Gap Threshold preserves the intended document formatting, which is a two-column table. By preserving this format, you can consistently use the Row method on the table in this document, as well as in other examples of this table in documents in which the lines don’t overlap.
Config
Example document | Download link |
---|
"yOverlapThreshold":1.0
or leave its value unspecified, then you set "adjacentThreshold"
to 0.6 or higher.
In this situation, "directlyAdjacentThreshold"
and "adjacentThreshold"
have no effect if both their values are less than 0.6. In other words, the following configuration has no effect: