Classification deals with the categorization of objects. In the world of process automation and digitization, we often think of these objects as complete documents that need to be classified. Of course, it is important to understand what the type of a document is...
Document Separation Revisited
One of the most frequently overlooked and genuinely difficult problems in document automation, and one that is also very annoying in daily processing, is automatically separating a stack of documents into single meaningful documents and assigning each to a document class. The goal is to simply scan the whole stack and have an intelligent algorithm separate it. Fortunately, this is readily available today in the Skilja technology stack as a built-in feature of the Laera classifier. That does not mean it is easy. Managing several interdependent classification and separation steps in a stable and reliable way requires considerable experience and infrastructure. This is what Laera provides out of the box.
Confusion Matrix
Understanding the quality of an automatic classification system is crucial for its acceptance and for any attempt to improve it over time. Quality means that we need to look at errors and at the recognition rate. In classification terms these values are...
The Magic of Online-Learning
Wouldn’t it be nice if your AI-enabled document processing system continuously took input from user interactions and used it to improve recognition quality over time? And nobody would have to take care of this, even in the case of...
Auto Classification and Bias
Personal bias and individual opinions are a serious problem in standardized business processing when they influence the outcome of a process and the decisions made. Nobody wants to be subject to random changes in the outcome of a personal request, and yet it...
What is a good classifier? (4/4)
This is the fourth and final post on the characteristics of a good content-based classifier. In previous posts we focused on presenting statistical results and on comparing the Skilja Classifier with a plain-vanilla naïve Bayes classifier in Recall-Precision...
Classification and Context
This situation probably sounds familiar: you meet a person and are sure that you know him or her well and have already seen him or her many times, but you cannot remember who it is. More precisely, you cannot put the person into a context...
What is a good classifier? (3/4)
In recent articles about classifier quality we focused on the overall statistical results, using either the precision-recall graph or the inverted precision graph. While these are very good tools to predict the overall quality of a classification...
Auto-Classification Technologies and RFID Smart Docs
Editor’s note: This is a guest post by Cláudio Chaves of TCG Brazil. Recent advances in auto-classification technologies, as described in this blog, have substantially reduced manual labor for several companies related to physical...
What is a good classifier? (2/4)
In our short series on classification quality, a recent post used the precision-recall graph to show the difference between a very good and a so-so classifier. This graphical representation is very common and easy to understand...
What is a good classifier? (1/4)
Auto-classification can assign categories, and hence meaning, to documents with unprecedented speed and quality. The technology for auto-classification has been developed over the last 15 years, from the first tentative rule-based systems to elaborate...
Why Semantics is Important for Classification
Many automatic classification systems in use today rely on a pure bag-of-words approach to find the relevant features that determine the meaning of a document. Few use correlation and collocation to account for the fact that words have a different meaning based...
Generic and specific classification
In principle, it is possible to train on enough representative samples to create a classification scheme that is fully generic for a specific purpose. This is what humans do all the time. Reading a text and correctly classifying it manually into a given...
Classification methods
Classification tries to mimic human understanding. Several methods have been developed to achieve what we as humans can do almost effortlessly. These methods can be divided into two groups. Rule-based classification: Rule-based systems are...
Taxonomy and Hierarchies
Classification can be defined as modeling real objects via a simplified mathematical representation consisting of a set of characteristic features. The goal of such a description is to collect objects that are quite similar into one group. A big benefit of the...
10 rules for creating a successful mailroom classification project
Over the last few years, automatic, context-based classification for mailrooms has proven to deliver significant ROI and faster processes. But we have also seen failures and disappointments. I have managed and monitored many of these projects in the past and...
Measuring Classification Quality
Whether for an active production system or while the classification scheme is being set up, it is very important to measure classification quality. The goal is to make as few classification errors as possible (also called false positives), as these can severely...
Classification of – Chairs
If you have followed my presentations in the past, you know that document classification closely corresponds to concept formation in the human mind. Concepts represent classes of real-life objects. We are able to recognize concepts and group objects...
Document Understanding Primer
For a long time, document understanding has been a research topic in computer science. We have seen conferences discussing concepts and approaches for using computers and machine learning to understand documents. This topic also appears quite often in proceedings on...