"type":"address"
recognizes addresses such as:
"block_format": false
:
Behavior | Block | In-line |
---|---|---|
Newlines optional | yes | yes |
Trailing or leading non-address text allowed in starting or ending address lines | no | yes |
Non-address text allowed between address elements | no | no |
key | value | description |
---|---|---|
id (required) | compose | |
types (required) | array of type objects | Each type in a compose array takes the output of the previous type as its input. |
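
The chaining described for the compose array can be pictured with a minimal Python sketch. It is an illustration only, not Sensible's implementation: the `Recognizer` alias, the `first_group` helper, and the invoice patterns are hypothetical names invented for this example.

```python
import re
from typing import Callable, List, Optional

# Hypothetical stand-in for a type: takes text, returns recognized text or None.
Recognizer = Callable[[str], Optional[str]]


def first_group(pattern: str) -> Recognizer:
    """Build a recognizer that returns the first capturing group of a regex."""
    compiled = re.compile(pattern)

    def recognize(text: str) -> Optional[str]:
        match = compiled.search(text)
        return match.group(1) if match else None

    return recognize


def compose(types: List[Recognizer]) -> Recognizer:
    """Chain recognizers so each one receives the previous one's output."""

    def recognize(text: str) -> Optional[str]:
        current: Optional[str] = text
        for recognizer in types:
            if current is None:
                return None  # an earlier step recognized nothing, so stop
            current = recognizer(current)
        return current

    return recognize


invoice_digits = compose([
    first_group(r"Invoice\s+#?([A-Z]+-\d+)"),  # step 1: isolate the invoice ID
    first_group(r"-(\d+)$"),                    # step 2: keep only its digits
])
```

Here the second recognizer never sees the original line, only the text that the first one returned, which is the property the compose description relies on.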
one million or 123456789.
key | value | description |
---|---|---|
id (required) | currency | |
requireCurrencySymbol | boolean. Default: false | Requires a currency symbol preceding the amount. |
currencySymbol | string or object. Default: $ | The text to recognize as a currency symbol, for example "€" or "EURO". The text must precede the amount. This parameter sets the Unit parameter in the output. To specify multiple currencies to recognize, use this parameter to specify a lookup table. The table maps source text to the Unit parameter. For example, the following lookup table recognizes currency codes and symbols for dollars and euros, and outputs symbols to the Unit parameter: "currencySymbol": {"$": "$", "€": "€", "USD": "$", "EUR": "€", "default": "€"}. If the source text doesn't include a currency symbol, Sensible uses the default specified in the lookup table. If the lookup table doesn't include a default, Sensible falls back to the $ symbol. |
requireThousandsSeparator | boolean. Default: false | Requires a thousands separator in numbers with a thousands place. |
thousandsSeparator | string. Default: , | The separator to require, for example, a period (.). |
decimalSeparator | string. Default: . | For numbers with a decimal place, specify the separator, for example, a comma (,). |
maxDecimalDigits | number. Default: 4 | The maximum number of decimal digits to recognize. |
maxValue | number. Default: infinity | The maximum currency amount to recognize. Use this to extract an amount with a known range. For example, use it as an alternative to the Tiebreaker parameter, or to extract one currency amount among several returned by a method like the Document Range or Box method. |
minValue | number. Default: -infinity | The minimum currency amount to recognize. Use this to extract an amount with a known range. |
relaxedWithCents | boolean. Default: false | Use this parameter when poor-quality scans or photographed documents result in erroneous OCR output for the decimal separator or thousands separator. If true, Sensible overrides all other Currency type parameters, outputs USD currency, and recognizes the following number format as a currency: (1) any number of digits mixed with <fuzzySeparator> characters, followed by (2) one <fuzzySeparator> character, followed by (3) two digits (for the cents). A <fuzzySeparator> character is any of the following common erroneous OCR outputs for a period or comma: period (.), comma (,), semicolon (;), colon (:), space, or underscore (_). For example, if you set this parameter to true, then for the erroneous OCR output "7.859:36", Sensible returns: {"source": "7.859:36", "type": "currency", "unit": "$", "value": 7859.36} |
accountingNegative | default, anyParentheses, bothParentheses, suffixNegativeSign. Default: null | Replaces the deprecated Accounting Currency type. Specifies which accounting sign conventions to recognize for negative numbers. null - Sensible recognizes negative numbers as described in the preceding formats recognized section. bothParentheses - Sensible assigns a negative value to a number prefixed and suffixed by parentheses. anyParentheses - Sensible assigns a negative value to a number that includes any parenthesis as a suffix or prefix. Use this option to handle OCR errors, where an opening or closing parenthesis can be incorrectly recognized as another character. suffixNegativeSign - Sensible assigns a negative value to a number suffixed by a negative sign. default - Replaces the behavior of the Accounting Currency type for backward compatibility. The equivalent of bothParentheses and suffixNegativeSign. |
alwaysNegative | boolean | If true, Sensible assigns a negative value to a number and ignores sign symbols in the document. For example, use this to capture values in the debit column of an accounting document, where negative signs are omitted. |
removeSpaces | boolean | Removes whitespace in a line for better currency recognition. For example, changes the line 12.45. |
roundTo | number of decimal places to round to | Rounds to the specified decimal place. For example, if you specify "roundTo": 3, then Sensible rounds 0.1234 to 0.123. If you specify "roundTo": 2 and "decimalSeparator": ",", then Sensible rounds 5,249 to 5,25. |
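
The number format described for relaxedWithCents can be sketched as a single regular expression. This is a minimal Python illustration of the format as stated in the table, not Sensible's code: the function name `relaxed_with_cents` and the exact regex are assumptions made for this example.

```python
import re
from typing import Optional

# Characters the table lists as common erroneous OCR outputs for a period or
# comma: period, comma, semicolon, colon, space, underscore.
FUZZY_SEPARATOR = r"[.,;: _]"

# Digits mixed with fuzzy separators, then one fuzzy separator, then two digits.
RELAXED_CURRENCY = re.compile(rf"([0-9.,;: _]+){FUZZY_SEPARATOR}([0-9]{{2}})")


def relaxed_with_cents(source: str) -> Optional[dict]:
    """Return a currency-like result for OCR-damaged amounts, else None."""
    match = RELAXED_CURRENCY.fullmatch(source)
    if not match:
        return None
    # Drop every fuzzy separator from the whole-number part.
    whole = re.sub(r"[^0-9]", "", match.group(1))
    if not whole:
        return None  # separators only, no digits before the cents
    cents = match.group(2)
    return {
        "source": source,
        "type": "currency",
        "unit": "$",  # the table states that this mode outputs USD
        "value": float(f"{whole}.{cents}"),
    }
```

Because the first group is greedy, the last separator in the text is always treated as the decimal separator, so every earlier separator is discarded as a thousands separator regardless of which character OCR produced.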
key | value | description |
---|---|---|
id (required) | custom | |
pattern (required) | Valid JS regex | JavaScript-flavored regular expression. Returns the first capturing group. Double escape special characters since the regex is in a JSON object. For example, use \\s, not \s, to represent a whitespace character. Sensible doesn't validate regular expressions for custom types. |
flags | JS-flavored regex flags | Flags to apply to the regex, for example, "i" for case-insensitive. |
matchMultipleLines | boolean. Default: false | If true, matches regular expressions that span multiple lines. To enable this behavior, Sensible joins the lines returned by the method using whitespaces as the separators, and runs the regular expression on the joined text. ^ matches the start of the first line returned by the method, and $ matches the end of the last line. For example, ^[0-9 ]+$ matches all the joined text returned by the method, if all the characters are digits or whitespaces. |
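
The effect of matchMultipleLines can be shown with a short Python sketch that joins lines with a space before running the pattern. It is a sketch under assumptions: `match_custom` is a hypothetical helper, it uses Python's `re` module rather than a JavaScript regex engine, and returning the whole match when the pattern has no capturing group is a choice made for this example.

```python
import re
from typing import List, Optional


def match_custom(lines: List[str], pattern: str, flags: str = "",
                 match_multiple_lines: bool = False) -> Optional[str]:
    """Run a pattern per line, or once over the whitespace-joined lines."""
    compiled = re.compile(pattern, re.IGNORECASE if "i" in flags else 0)
    # Joined mode: one candidate, so ^ and $ anchor to the first and last line.
    candidates = [" ".join(lines)] if match_multiple_lines else lines
    for text in candidates:
        match = compiled.search(text)
        if match:
            # Prefer the first capturing group; fall back to the whole match
            # (an assumption for patterns that define no group).
            return match.group(1) if compiled.groups else match.group(0)
    return None


lines = ["Policy number:", "12345 678"]
pattern = r"number:\s+([0-9 ]+)$"
```

With these inputs the pattern finds nothing line by line, because the label and the digits sit on different lines, and succeeds only once the lines are joined.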
format | example | example output |
---|---|---|
"%b-%d[a-z]-%y$" | JAN-31st-22, February-3rd-21 | "value": "2022-01-31T00:00:00.000Z" |
"%y%M%D" | 800325 | "value": "1980-03-25T00:00:00.000Z" |
"%b\\\\%d\\\\%Y" | JAN\31\2022 | "value": "2022-01-31T00:00:00.000Z" |
"%b\\s*?%Y" | jan 2022 | "value": "2022-01-01T00:00:00.000Z" |
key | value | description |
---|---|---|
id (required) | date | Returns datetime. Sensible outputs the time as midnight UTC. |
format | JS regex or array of JS regexes | Custom date formats override the defaults listed in the simple syntax section. See the following table for a list of the field descriptors. The field descriptors are concise syntax for regular expressions. You can use JavaScript-flavored regular expressions ("regex") with these field descriptors to define custom date formats. Double escape special characters since the regex is in a JSON object (for example, use \\s, not \s, to represent a whitespace character). |
field descriptor | regex | notes | example |
---|---|---|---|
%b | for each month, a case-insensitive pattern like january OR jan\.? | Abbreviated month name, with or without periods, or full month name. | Jan, Feb, …, Dec. January, February, …, December |
%y | [0-9]{2} | Two-digit year. Values in the range 69–99 refer to years in the twentieth century (1969–1999); values in the range 00–68 refer to years in the twenty-first century (2000–2068). Tips: If you want to recognize two-digit years and exclude four-digit years, add an end-of-line regex special character ($) so that you don't incorrectly match dates like 02/03/1998 as 2019-02-03T00:00:00.000Z. If you want to match both two- and four-digit years, you don't need the $ character. Instead you need to specify the four-digit format first, for example, ["%b-%d-%Y","%b-%d-%y"]. | 00, 01, …, 99 |
%Y | [0-9]{4} | Four-digit year (year with century as a decimal number). | 2013, 2019, etc. |
%m | [0-9]{1,2} | The month number, unpadded or zero-padded. | 1, …, 12 or 01, …, 12 |
%M | [0-9]{2} | Two-digit ("zero-padded") month number (01-12). | 01, …, 12 |
%d | [0-9]{1,2} | The day number, unpadded or zero-padded. | 1, …, 31 or 01, …, 31 |
%D | [0-9]{2} | Two-digit ("zero-padded") day number (01-31). | 01, …, 31 |
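
The field descriptors can be read as shorthand that expands into a larger regular expression. The following Python sketch illustrates that idea together with the two-digit year rule from the table. It is an approximation under assumptions: the names `expand_format` and `parse_date` are hypothetical, the quantifiers ({2}, {4}) are inferred from the descriptor descriptions, and Python's `re` module stands in for a JavaScript regex engine.

```python
import re
from datetime import datetime, timezone
from typing import List, Optional, Union

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
ABBREVIATIONS = [name[:3] for name in MONTHS]

# Full month name, or its three-letter abbreviation with an optional period.
MONTH_REGEX = "(?:" + "|".join(rf"{name}|{name[:3]}\.?" for name in MONTHS) + ")"

# Each descriptor expands to a named group so the parts can be read back.
DESCRIPTORS = {
    "b": rf"(?P<b>{MONTH_REGEX})",
    "y": r"(?P<y>[0-9]{2})",
    "Y": r"(?P<Y>[0-9]{4})",
    "m": r"(?P<m>[0-9]{1,2})",
    "M": r"(?P<M>[0-9]{2})",
    "d": r"(?P<d>[0-9]{1,2})",
    "D": r"(?P<D>[0-9]{2})",
}


def expand_format(fmt: str) -> str:
    """Replace each field descriptor with its regex; leave other regex as is."""
    return re.sub(r"%([bYymMdD])", lambda mo: DESCRIPTORS[mo.group(1)], fmt)


def parse_date(text: str, formats: Union[str, List[str]]) -> Optional[datetime]:
    """Try formats in order. Assumes each has a year and a month descriptor."""
    if isinstance(formats, str):
        formats = [formats]
    for fmt in formats:  # earlier formats win, so list four-digit years first
        match = re.search(expand_format(fmt), text, re.IGNORECASE)
        if not match:
            continue
        parts = match.groupdict()
        if "Y" in parts:
            year = int(parts["Y"])
        else:
            two_digit = int(parts["y"])
            # 69-99 map to 1969-1999, 00-68 map to 2000-2068.
            year = 1900 + two_digit if two_digit >= 69 else 2000 + two_digit
        if "b" in parts:
            month = ABBREVIATIONS.index(parts["b"].lower()[:3]) + 1
        else:
            month = int(parts.get("m") or parts["M"])
        day = int(parts.get("d") or parts.get("D") or 1)  # no day: use the 1st
        return datetime(year, month, day, tzinfo=timezone.utc)  # midnight UTC
    return None
```

In Python source the formats are written as raw strings, for example `parse_date("jan 2022", r"%b\s*?%Y")`, so the double escaping required inside a JSON object doesn't apply here. Calling `parse_date("02/03/1998", "%m/%d/%y")` reproduces the mismatch described in the %y tips, while adding `$` or listing the `%Y` format first avoids it.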
key | value | description |
---|---|---|
id (required) | name | |
capitalization | allCaps, firstLetter. Default: no change to source capitalization | Formats the output in all uppercase, or with the first letter of each word capitalized. |
3.061.534,45. Configure the Currency type instead.
key | value | description |
---|---|---|
id (required) | number | |
roundTo | number of decimal places to round to | Rounds to the specified decimal place. For example, if you specify "roundTo": 3, then Sensible rounds 0.1234 to 0.123. |
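
The roundTo examples can be reproduced with Python's `decimal` module. This sketch assumes half-up rounding, which the source doesn't state, and `round_to` is a hypothetical helper. Its `decimal_separator` argument mirrors the Currency type's decimalSeparator example and ignores thousands separators.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to(source: str, places: int, decimal_separator: str = ".") -> str:
    """Round a numeric string to a number of decimal places (half-up assumed)."""
    normalized = source.replace(decimal_separator, ".")
    quantum = Decimal(1).scaleb(-places)  # places=3 gives Decimal("0.001")
    rounded = Decimal(normalized).quantize(quantum, rounding=ROUND_HALF_UP)
    # Write the result back with the separator that the source text used.
    return str(rounded).replace(".", decimal_separator)
```

Working on decimal strings rather than binary floats keeps values such as 5.249 exact before rounding.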
"annotateSuperscriptAndSubscript": true
, Sensible formats the footnote symbols to indicate they're superscripted, for example, [^1]:
key | value | description |
---|---|---|
id (required) | paragraph | |
annotateSuperscriptAndSubscript | boolean. Default: false | When true: - Sensible annotates superscript and subscript text with [^…] and [_…], respectively. - Sensible annotates end-of-page breaks with [EOP]. |
allNewlines | boolean. Default: false | When true, Sensible inserts a newline (\n) in the output for every line break in the document text, and two newlines (\n\n) for every paragraph break. When false, Sensible inserts a newline for every paragraph break. |
paragraphBreakThreshold | default: 0.4 | By default, Sensible detects paragraph breaks when the vertical gap between two lines is larger than 40% of the font height of the output line. Use this parameter to change the percentage. |
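
The interaction between paragraphBreakThreshold and allNewlines can be sketched in Python as follows. The `Line` geometry fields and the `join_paragraph_lines` helper are hypothetical. The sketch also assumes that the threshold is measured against the font height of the later of the two lines, and that ordinary line breaks become spaces when allNewlines is false. The source states neither assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    text: str
    top: float     # hypothetical: distance from the page top to the line's top
    height: float  # hypothetical: font height of the line


def join_paragraph_lines(lines: List[Line], threshold: float = 0.4,
                         all_newlines: bool = False) -> str:
    """Join lines, marking paragraph breaks where the vertical gap is large."""
    if not lines:
        return ""
    parts = [lines[0].text]
    for previous, current in zip(lines, lines[1:]):
        gap = current.top - (previous.top + previous.height)
        is_paragraph_break = gap > threshold * current.height
        if is_paragraph_break:
            separator = "\n\n" if all_newlines else "\n"
        else:
            # Ordinary line break: a newline only when all_newlines is set.
            separator = "\n" if all_newlines else " "
        parts.append(separator + current.text)
    return "".join(parts)


sample = [
    Line("First line", top=0, height=10),
    Line("continues here.", top=11, height=10),  # gap 1, below 40% of 10
    Line("New paragraph.", top=30, height=10),   # gap 9, above 40% of 10
]
```

Raising the threshold makes the sketch merge more lines into one paragraph, and lowering it splits text more eagerly.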
() or with the minus sign (-).
Recognizes digits in USA decimal notation (for example, 1,500.06):