Unstructured Document Classification and Data Extraction APIs
Peladon Unstructured Classification and Data Extraction APIs are designed to handle the most challenging classification and extraction problems with thousands of different document layouts. They eliminate manual sorting of documents prior to scanning and make placement of barcodes or other special identification symbols on document pages unnecessary.
Designed as service APIs, they automatically distribute incoming images across all available processors, balance the computational load, and assign images to classes of similar layouts for data capture.
The system does not require training with presorted images of documents. It is ready to process previously unseen images on their first presentation.
Unstructured data is captured most reliably when the layout of the document is known. If a document can be identified as having one of the known, automatically stored layouts, its data can be extracted far more easily than with a blind approach, whose only available information is the type of data sought.
When document identification and classification are completely automated, data capture guided by the image layout is much more accurate and can handle tables, keywords, and data whose format is known.
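The idea of layout-assisted capture can be illustrated with a minimal Python sketch. It does not use the Peladon APIs. Every name in it (Word, Region, Layout, classify, extract) is hypothetical, and the keyword-overlap matching is only a stand-in assumption for whatever classification method the product actually uses. The sketch shows why a known layout helps: once a page is matched to a stored layout, each field is read from a known region instead of being searched for blindly.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Word:
    """A recognized word and the position of its center on the page."""
    text: str
    x: float
    y: float


@dataclass
class Region:
    """Axis-aligned rectangle in page coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, word: Word) -> bool:
        return self.x0 <= word.x <= self.x1 and self.y0 <= word.y <= self.y1


@dataclass
class Layout:
    """Hypothetical stored layout: anchor keywords plus field regions."""
    name: str
    anchors: FrozenSet[str]
    fields: Dict[str, Region]


def classify(words: List[Word], layouts: List[Layout],
             min_score: float = 0.5) -> Optional[Layout]:
    """Return the layout whose anchors best match the page, or None."""
    page_terms = {w.text.lower() for w in words}
    best, best_score = None, 0.0
    for layout in layouts:
        if not layout.anchors:
            continue  # a layout without anchors cannot be scored
        score = len(layout.anchors & page_terms) / len(layout.anchors)
        if score > best_score:
            best, best_score = layout, score
    return best if best_score >= min_score else None


def extract(words: List[Word], layout: Layout) -> Dict[str, str]:
    """Read each field from the region the layout assigns to it."""
    result = {}
    for name, region in layout.fields.items():
        inside = sorted((w for w in words if region.contains(w)),
                        key=lambda w: (w.y, w.x))
        result[name] = " ".join(w.text for w in inside)
    return result


# Illustrative data only: one stored layout and one incoming page.
INVOICE = Layout(
    name="invoice_a",
    anchors=frozenset({"invoice", "total", "due"}),
    fields={
        "invoice_number": Region(400, 40, 600, 60),
        "total": Region(400, 700, 600, 720),
    },
)
PAGE = [
    Word("Invoice", 50, 50), Word("INV-1001", 450, 50),
    Word("Total", 50, 710), Word("Due", 100, 710), Word("128.50", 450, 710),
]

matched = classify(PAGE, [INVOICE])
captured = extract(PAGE, matched) if matched else {}
```

A page that matches no stored layout returns None from classify, which is the point at which a blind, type-driven search would have to take over.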
Data Capture
• Tabular data, single fields, and line-item data
• Assisted by page layout
• Flexible and powerful
• Designed for maximum efficiency
• Designed for easy integration of application-supplied custom validation processes
• Available as an API and as a fully integrated system
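As an illustration of application-supplied validation, the following Python sketch shows one way an application might attach its own checks to captured fields. It is an assumption about the general pattern, not the Peladon interface: validate_fields, check_amount, and check_required are hypothetical names invented for this example.

```python
from typing import Callable, Dict, Optional

# A validator takes a captured value and returns an error message or None.
Validator = Callable[[str], Optional[str]]


def validate_fields(fields: Dict[str, str],
                    validators: Dict[str, Validator]) -> Dict[str, str]:
    """Run each application-supplied validator; collect errors by field."""
    errors = {}
    for name, check in validators.items():
        message = check(fields.get(name, ""))
        if message is not None:
            errors[name] = message
    return errors


def check_amount(value: str) -> Optional[str]:
    """Hypothetical rule: the value must parse as a non-negative number."""
    try:
        amount = float(value)
    except ValueError:
        return "not a number"
    return None if amount >= 0 else "negative amount"


def check_required(value: str) -> Optional[str]:
    """Hypothetical rule: the value must not be blank."""
    return None if value.strip() else "missing value"


errors = validate_fields(
    {"total": "128.50", "invoice_number": ""},
    {"total": check_amount, "invoice_number": check_required},
)
```

Keeping validators as plain callables keyed by field name lets an application add or replace rules without changing the capture step.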
Document Classification
• Completely automated process
• Insensitivity to most noise present in document images
• No appreciable deterioration of speed with increasing number of templates
• Handling of portrait- and landscape-oriented documents of arbitrary sizes
• Effortless scalability from a few thousand documents a month to millions of documents a day
• Simple and intuitive modular structure for ease of integration in large applications
• User-settable parameters for page separation

