Unstructured Document Classification and Data Extraction APIs  

Peladon Unstructured Classification and Data Extraction APIs are designed to handle the most challenging classification and extraction problems with thousands of different document layouts. They eliminate manual sorting of documents prior to scanning and make placement of barcodes or other special identification symbols on document pages unnecessary.

Designed as service APIs, they automatically distribute the incoming images to all available processors, balance the computational load and assign images to classes of similar layouts for data capture.
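As an illustration of the load-balancing idea only, the sketch below assigns each incoming image to whichever worker currently has the least accumulated work. It is not the Peladon API: the `distribute` function, the per-image cost estimate, and the file names are all hypothetical, and greedy least-loaded assignment is just one common way to balance a batch.

```python
import heapq
from collections import defaultdict


def distribute(images, num_workers):
    """Greedy least-loaded assignment (an assumed strategy, not Peladon's).

    `images` is a list of (name, estimated_cost) pairs. Each image goes to
    the worker with the smallest accumulated cost so far.
    """
    # Heap entries are (accumulated_cost, worker_id).
    heap = [(0, worker) for worker in range(num_workers)]
    heapq.heapify(heap)
    assignment = defaultdict(list)
    for name, cost in images:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(name)
        heapq.heappush(heap, (load + cost, worker))
    return dict(assignment)


# Hypothetical batch: (image name, estimated cost such as page count).
batch = [("a.tif", 5), ("b.tif", 1), ("c.tif", 1), ("d.tif", 3)]
plan = distribute(batch, num_workers=2)
```

In this example the one expensive image occupies the first worker while the three cheaper images share the second, so both end up with the same total estimated cost.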

The system does not require training with presorted document images. It can process previously unseen images the first time they are presented.
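One way to picture classification without a training set is incremental grouping: a page joins an existing layout class if it is similar enough to that class's stored layout, and otherwise starts a new class. The sketch below is a simplified illustration under stated assumptions, not a description of how the Peladon engine works. The layout signature (a set of coarse grid cells containing text), the Jaccard similarity measure, the `LayoutClasses` name, and the `threshold` value are all hypothetical.

```python
def jaccard(a, b):
    """Similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class LayoutClasses:
    """Groups page signatures into layout classes as pages arrive."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # hypothetical tuning parameter
        self.prototypes = []        # one stored signature per known class

    def classify(self, signature):
        """Return a class id, creating a new class if nothing is close enough."""
        best_id, best_sim = None, 0.0
        for class_id, proto in enumerate(self.prototypes):
            sim = jaccard(signature, proto)
            if sim > best_sim:
                best_id, best_sim = class_id, sim
        if best_id is not None and best_sim >= self.threshold:
            return best_id
        # First presentation of a new layout: store it as a new class.
        self.prototypes.append(frozenset(signature))
        return len(self.prototypes) - 1


classes = LayoutClasses()
# Hypothetical signatures: (row, column) grid cells that contain text.
invoice_a = {(0, 0), (0, 3), (2, 1), (5, 0)}
invoice_b = {(0, 0), (0, 3), (2, 1), (5, 1)}  # same layout, one block shifted
letter = {(1, 1), (3, 1), (4, 1)}
ids = [classes.classify(s) for s in (invoice_a, invoice_b, letter)]
```

Here the two invoices land in the same class and the letter opens a second one, without any presorted examples being supplied in advance.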

The best approach to capturing unstructured data relies on information about the layout of the document. If the document can be identified as having one of the known, automatically stored layouts, the data can be extracted far more easily than with a blind approach, whose only available information is the type of data sought.

When document identification and classification are completely automated, data capture that uses the image layout is much more accurate and can handle tables, keywords, and data whose format is known.
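To make the contrast with a blind approach concrete, the sketch below shows how a known layout can drive extraction: once a page is matched to a stored layout, each field is read from the region where that layout places it. This is a minimal illustration, not the product's interface. The `Word` type, the `INVOICE_LAYOUT` regions, the field names, and the sample values are all hypothetical, and the words are assumed to come from an OCR step that reports positions as fractions of the page size.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge, as a fraction of page width
    y: float  # top edge, as a fraction of page height


# Hypothetical stored layout: field name -> (x0, y0, x1, y1) region.
INVOICE_LAYOUT = {
    "invoice_number": (0.70, 0.05, 1.00, 0.10),
    "total": (0.70, 0.90, 1.00, 0.95),
}


def extract_fields(words, layout):
    """For each field, join the words whose origin falls inside its region."""
    fields = {}
    for name, (x0, y0, x1, y1) in layout.items():
        hits = [w for w in words if x0 <= w.x < x1 and y0 <= w.y < y1]
        hits.sort(key=lambda w: (w.y, w.x))  # top-to-bottom, left-to-right
        fields[name] = " ".join(w.text for w in hits)
    return fields


# Hypothetical OCR output for one page.
page = [
    Word("INV-1042", 0.75, 0.06),
    Word("Acme", 0.10, 0.06),
    Word("118.40", 0.80, 0.92),
]
result = extract_fields(page, INVOICE_LAYOUT)
```

A blind extractor would have to search the whole page for anything that looks like an invoice number or an amount; with the layout known, words outside the expected regions (such as "Acme" here) are simply ignored.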

DocXP Data Capture

Data Capture

• Tabular or single fields, line item data
• Assisted by page layout
• Flexible and powerful
• Designed for maximum efficiency
• Designed for easy integration of application-supplied custom validation processes
• Available as an API and as a fully integrated system

Document Classification

• Completely automated process
• Insensitivity to most noise present in document images
• No appreciable loss of speed as the number of templates increases
• Handling of portrait- and landscape-oriented documents of arbitrary sizes
• Effortless scalability from a few thousand documents a month to millions of documents a day
• Simple and intuitive modular structure for ease of integration in large applications
• User-settable parameters for page separation