We all know that our language is fluid and words can change their meaning over time. Words become extinct and new words are created, but more often existing words are adapted to new circumstances. It is interesting to see how this happens over the years, but...
On the Benefits of Page Classification
Classification deals with the categorization of objects. In our process automation and digitization world, we often think of the objects as complete documents that need to be classified. Of course, it is important to understand what the type of a document is...
Document Separation Revisited
One of the most frequently overlooked and genuinely difficult problems in document automation, and a real annoyance in daily processing, is automatically separating a stack of documents into individual meaningful documents and assigning each one to a document class. The goal is simply to scan the whole stack and have it separated by an intelligent algorithm. Fortunately, this is readily available today in the Skilja technology stack as a built-in feature of the Laera classifier. That does not mean it is easy. It takes considerable experience and infrastructure to manage several interdependent classification and separation steps in a stable and reliable way. This is what Laera provides out of the box.
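To make the separation idea concrete, here is a minimal sketch of one simple rule for splitting a stack into documents. It assumes a hypothetical page classifier has already produced, for every page, a predicted class and a flag saying whether the page looks like the first page of a document. The names `PagePrediction`, `Document` and `separate` are illustrative only; this is not Laera's API or its actual method, which manages far more (confidence values, interdependent steps) than this rule does.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PagePrediction:
    page_no: int
    doc_class: str       # class predicted for this page (hypothetical classifier output)
    is_first_page: bool  # hypothetical flag: page looks like the start of a document


@dataclass
class Document:
    doc_class: str
    pages: List[int] = field(default_factory=list)


def separate(stack: List[PagePrediction]) -> List[Document]:
    """Split a scanned stack into documents.

    A new document starts when a page is predicted to be a first page
    or when its class differs from the class of the current document.
    """
    documents: List[Document] = []
    for page in stack:
        starts_new = (
            not documents
            or page.is_first_page
            or page.doc_class != documents[-1].doc_class
        )
        if starts_new:
            documents.append(Document(page.doc_class))
        documents[-1].pages.append(page.page_no)
    return documents


# Usage: five scanned pages containing two invoices and one contract.
stack = [
    PagePrediction(1, "invoice", True),
    PagePrediction(2, "invoice", False),
    PagePrediction(3, "invoice", True),
    PagePrediction(4, "contract", False),
    PagePrediction(5, "contract", False),
]
for doc in separate(stack):
    print(doc.doc_class, doc.pages)
```

Note how page 3 starts a second invoice purely because of its first-page flag, while page 4 starts a new document because its class changes. In practice each of these per-page decisions is itself uncertain, which is why reliable separation is harder than this rule suggests.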
Confusion Matrix
Understanding the quality of an automatic classification system is crucial for its acceptance and any attempt to improve it over time. Quality means that we need to look at errors and at the recognition rate. In classification terms these values are...
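As an illustration of how errors and recognition rates can be read off a confusion matrix, here is a minimal sketch in plain Python. The sample labels are made up for the example. Per-class precision and recall are shown because they are standard measures (and the ones used in the precision-recall graphs discussed elsewhere on this blog); the helper names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def confusion_matrix(actual: List[str], predicted: List[str]) -> Counter:
    """Count (actual, predicted) pairs; off-diagonal cells are errors."""
    return Counter(zip(actual, predicted))


def precision_recall(cm: Counter, label: str) -> Tuple[float, float]:
    """Per-class precision and recall derived from the confusion matrix."""
    tp = cm[(label, label)]
    predicted_as_label = sum(n for (a, p), n in cm.items() if p == label)
    actually_label = sum(n for (a, p), n in cm.items() if a == label)
    precision = tp / predicted_as_label if predicted_as_label else 0.0
    recall = tp / actually_label if actually_label else 0.0
    return precision, recall


# Made-up sample: true classes versus classifier output for six documents.
actual = ["invoice", "invoice", "invoice", "letter", "letter", "contract"]
predicted = ["invoice", "invoice", "letter", "letter", "invoice", "contract"]

cm = confusion_matrix(actual, predicted)
labels = sorted(set(actual) | set(predicted))
print("actual \\ predicted", labels)
for a in labels:
    # One row per actual class; the diagonal entry counts correct results.
    print(a, [cm[(a, p)] for p in labels])
for label in labels:
    p, r = precision_recall(cm, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

Reading the matrix row by row shows not only how many documents were misclassified but also which classes are confused with each other, here invoices and letters.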
Process as a Service
Imagine that you have created a powerful process for superb document automation using all kinds of advanced recognition, image processing and AI technologies. With these technologies it is possible to automate almost any document-driven process that involves...
Reading Medical Reports
Medical reports are complex documents written by doctors who use their own specific language and style to express not only facts but also hypotheses and suggestions. They are intended to be read by other doctors or experts who have a deep knowledge of the subject...
The Magic of Online-Learning
Wouldn’t it be nice if your AI-enabled document processing system continuously took the input from user interactions and used this information to improve recognition quality over time? And nobody would have to take care of this – even in the case of...
A New Approach to OCR Quality
The approach to improving OCR on a given document is very similar to the human ability to adapt one's cognition to a specific sample. Just imagine that you see a document with very difficult handwriting. In the beginning you will be able to distinguish...
Auto Classification and Bias
Personal bias and individual opinions are a big issue in standardized business processing if they happen to influence the outcome of a process and the decisions made. Nobody wants to be subject to random changes in the outcome of a personal request – and yet it...
What is a good classifier? (4/4)
This is the fourth and final post on the characteristics of a good content-based classifier. In previous posts we focused on the presentation of statistical results and on a comparison of the Skilja Classifier with a plain vanilla naïve Bayes classifier in Recall-Precision...
Classification and Context
This is a situation that probably sounds familiar to you: You meet a person and you are sure that you know him/her well and that you have already seen him or her many times – but you cannot remember who it is. More precisely – you cannot put the person into a context....
What is a good classifier? (3/4)
In recent articles about classifier quality we have focused on the overall statistical results. For this we have used either the precision-recall graph or the inverted precision graph. While these are very good tools to predict the overall quality of a classification...
What is a good classifier? (2/4)
In our small series about classification quality we have used the precision-recall graph to show the difference between a very good and a so-so classifier in a recent post that you can find here. This graphical representation is very common and easy to understand....
OCR on Historical Documents
Skilja is proud to announce that we have received a grant from the European Union supporting a research and development project to improve OCR on historical documents. The grant is provided through the Eurostars program of the European Union. This program supports...
Visual Classifiers From Random Images
Now this is an interesting experiment that takes us very close to the point where machine classification and human imagination meet. As described in previous posts, auto-classification algorithms use features that are extracted from the objects to be...
What is a good classifier? (1/4)
Auto-classification is able to assign categories, and hence meaning, to documents with unprecedented speed and quality. The technology for auto-classification has been developed over the last 15 years – from the first tentative rule-based systems to elaborate...
Read Faster with Text Streaming
An interesting new approach to human document understanding is presented by the Boston-based startup “Spritz”. They believe that human understanding of text (i.e. reading) is slowed down by eye movement across the text. Therefore they have developed a new...
Practical Semantics
As I have pointed out in previous posts, the difficulty for software to really understand the English language lies mainly in the ambiguity of words and their related meanings. The technology to resolve this difficulty is called semantic analysis. I recently...
Visiting AIIM 2013
AIIM is the community that provides education, research, and best practices on information management and collaboration. This year the AIIM community met in New Orleans, the city of music and French Creole architecture. New Orleans is also called the “Big...
Why Semantics is Important for Classification
Many automatic classification systems today use a pure bag-of-words approach to find the relevant features that determine the meaning of a document. Few use correlation and collocation to account for the fact that words have a different meaning based...