- oversplit lines
- lines overlapping on the x-axis
- “jittery” lines misaligned on the y-axis
Examples
Notes
Parameters
key | value | description |
---|---|---|
type (required) | mergeLines | merges lines distributed along a horizontal axis. |
directlyAdjacentThreshold (required) | number >= 0.16 | Usually, it’s recommended to leave the default for this parameter (0.16).Sensible uses the default setting for this parameter to transform separate tokens output from Google OCR into lines.This parameter specifies the fraction of line height under which to merge two adjacent lines distributed along an x-axis without a space. For example, at 0.16, this preprocessor merges two lines separated by a small gap whose width is less than 16% of the line height. Choosing a larger number merges more aggressively. |
adjacentThreshold (required) | number >= 0.6 | Corrects oversplit lines. Specifies the fraction of line height under which to merge two adjacent lines distributed along an x-axis with a space. The built-in merger uses 0.6, so choosing a larger number merges more aggressively. For an example, see the Examples section. |
yOverlapThreshold | number between 0 and 1.0. default: 1.0 | Merges lines that aren’t perfectly aligned at the same height on the page. Specifies the y overlap above which the Merge Lines preprocessor merges two adjacent lines. Y overlap is the section of the joint y-axis range of two lines that’s occupied by both lines. For example, if two lines share the same minimum and maximum y-axis values, their overlap is 1. If one line’s extent is from 0 to 10 and the other line’s extent is from 2 to 12 on the y-axis, their overlap is .667 (8 / 12). For an example, see the Examples section. |
minXGapThreshold | number in inches | Configure this parameter if two lines overlap on an x-axis. The default behavior is to merge these overlapping lines into one line. To split them instead, set a cap on the amount of allowable overlap. For example:0 - splits lines if their line boundaries are touching but not overlapping.0.1 - splits lines if their boundaries overlap a little, up to 0.1 inches.2.0 - splits lines even when they overlap a lot, up to 2.0 inches.For an example, see the Examples section. |
Examples
Handwriting OCR
Use the Merge Lines preprocessor to clean up OCRed handwriting text. This preprocessor is useful for Google OCR, which by default groups text into words rather than lines. PROBLEM Without a Merge Line preprocessor, the placeholder handwritten data in an example document is oversplit by Google OCR:
Name (First, Middle, Last, Suffix, Trust or Custodian)
isn’t one line, but is instead split on words.
SOLUTION
CONFIG
JSON

Example document | Download link |
---|

JSON
- set
"adjacentThreshold": 0.1
to see oversplit lines. - set
"adjacentThreshold": 2.0
to see aggressively merged lines. - revert Adjacent Threshold to the original setting, then set
"yOverlapThreshold": 0.2
to observe how lines with misaligned heights (like the email address) merges more aggressively.
Oversplit lines
PROBLEM The following image shows oversplit lines. For example, Sensible splits the phrase “premium driver discount” into three lines even though the human eye perceives it as one phrase:
JSON

JSON
Jittery lines on a y-axis
The following example shows using the Y Overlap parameter to correct vertical misalignment or “jitter” in lines (for example, as the result of a low-quality scan or because of handwriting). ConfigJSON

Example document | Download link |
---|
JSON
Overlapping lines on an x-axis
The following example shows using the Min X Gap Threshold parameter to extract overlapping text in a poorly formatted document. In this example, the built-in behavior without a Min X Gap Threshold is to merge the overlapping lines into one line (Supplementary underinsured/uninsured motorist coverage500,000 USD Combined single limit incl. umbl
).
The Min X Gap Threshold preserves the intended document formatting, which is a two-column table. By preserving this format, you can consistently use the Row method on the table in this document, as well as in other examples of this table in documents in which the lines don’t overlap.
Config
JSON

Example document | Download link |
---|
JSON
Notes
Because the Merge Lines preprocessor evaluates after the built-in line merger, there are limitations to the combinations of parameter values you can set: yOverlapThreshold In general, when you set"yOverlapThreshold":1.0
or leave its value unspecified, then you set "adjacentThreshold"
to 0.6 or higher.
In this situation, "directlyAdjacentThreshold"
and "adjacentThreshold"
have no effect if both their values are less than 0.6. In other words, the following configuration has no effect:
JSON