Classifying documents by type
You can classify a document by its similarity to each document type you define in your Sensible account. For example, if you define a bank_statements type and a tax_forms type in your account, you can classify 1040 forms, 1099 forms, Bank of America statements, Chase statements, and other documents into those two types. In this scenario, for a 2023-1-1_bankofamerica_statement_jon_doe.pdf document, Sensible:
- Classifies this document into the bank_statements document type.
- Classifies the statement doc by its similarity to reference documents in the bank_statements document type. The highest score is for a Bank of America sample statement.
- Provides metadata for the classification, including similarity scores for this document compared to each document type in your Sensible account and to each reference document in the bank_statements type.
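As a rough illustration of how you might read the similarity scores described above, the following sketch picks the highest-scoring document type and reference document. The dictionary layout, the key names (type_scores, reference_scores), and the score values are assumptions made up for this example; they are not Sensible's documented response schema, so adapt them to the metadata you actually receive.

```python
# Hypothetical classification metadata, for illustration only.
# The key names and score values below are assumptions, not Sensible's schema.
classification = {
    "type_scores": [
        {"name": "bank_statements", "score": 0.92},
        {"name": "tax_forms", "score": 0.31},
    ],
    "reference_scores": [
        {"name": "chase_sample_statement", "score": 0.74},
        {"name": "bank_of_america_sample_statement", "score": 0.95},
    ],
}


def best_match(scores):
    """Return the entry with the highest similarity score."""
    return max(scores, key=lambda entry: entry["score"])


best_type = best_match(classification["type_scores"])
best_reference = best_match(classification["reference_scores"])

print(f"Document type: {best_type['name']}")
print(f"Closest reference document: {best_reference['name']}")
```

With the sample values above, the document lands in the bank_statements type and its closest reference document is the Bank of America sample, which mirrors the scenario described in the list.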
Use document classification:
- In an extraction workflow. For example, determine which documents to extract data from before calling a Sensible extraction endpoint.
- Outside an extraction workflow. For example, determine where to route each document or to label each document in a system of record.
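The two uses above can be sketched as a small routing function that takes a classified document type and its similarity score, then decides whether to send the document on for extraction, route it to a destination, or hold it for review. Everything here is hypothetical: the ROUTES table, the MIN_SCORE threshold, and the destination labels are placeholders chosen for illustration, not values that Sensible prescribes.

```python
# Hypothetical routing table: document type -> destination label.
ROUTES = {
    "bank_statements": "extract:bank_statements",
    "tax_forms": "archive:tax_forms",
}

# Hypothetical threshold; tune it against your own documents.
MIN_SCORE = 0.5


def route_document(doc_type: str, score: float) -> str:
    """Pick a destination for a classified document.

    Falls back to manual review when the score is below the threshold
    or when no route is configured for the document type.
    """
    if score < MIN_SCORE or doc_type not in ROUTES:
        return "manual_review"
    return ROUTES[doc_type]


print(route_document("bank_statements", 0.92))
print(route_document("tax_forms", 0.31))
```

Keeping the fallback explicit means a low-confidence or unrecognized document is never silently sent to an extraction endpoint or filed under the wrong label.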
To improve classification results, Sensible recommends that each document type include a set of reference documents that represents the diversity you expect to see in that type. To use a document type for classification, Sensible requires that the type contain at least one reference document.
To classify documents, use the Sensible API or SDKs.