One of my fundamental beliefs is that software needs to be self-adaptive to user behavior, or "learning," in order to mimic the complex business processes each of us executes on a daily basis. For a long time we have seen that rule-based systems go only so far and then get stuck, simply because of the sheer complexity of the task they have to solve. Therefore, from the beginning we have put our eggs in this basket and consistently added learning capabilities to document recognition software wherever it made sense. This is why some of the products have been called "intelligent," which gives rise to the well-known term IDR (Intelligent Document Recognition).
But what kind of intelligence is this? Let's take a look at document classification. In document classification, software categorizes documents into several hundred categories (or classes) based on features found on the document. These features have been determined automatically beforehand, using a representative sample set for each class. What these features are is difficult to describe without mathematics. Therefore, what I have tried in the past is to explain them to my audience by comparing the process to the way babies learn to speak.
When a baby learns to speak, it is doing nothing other than classification. Segmenting the world into concepts and attaching a word to each concept is classification, and it is the basis of language (and of thinking, as some philosophers would say). At some stage a baby is able to distinguish a car from a tree and will say so: "Car!" "Tree!" And the parents are proud. But babies do not use rules to accomplish this. They very quickly learn to distinguish relevant features from irrelevant ones and to assign all varieties of cars to the concept "car," irrespective of color, brand, size and so on. And this is exactly how learn-by-example software works.
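To make the "learning from examples" part a bit more concrete, here is a minimal Python sketch. The class names and sample documents are invented, and real IDR products use far richer features (layout, graphical elements, word positions), but it shows the basic idea: discriminative word features are derived automatically from a representative sample set per class instead of being hand-coded as rules.

```python
from collections import Counter

# Hypothetical, tiny "representative sample sets": one list of example
# documents per class. Real systems use hundreds of samples per class.
samples = {
    "invoice":  ["invoice number total amount due payment",
                 "invoice date total net amount vat payment due"],
    "order":    ["purchase order quantity item delivery date",
                 "order confirmation item quantity price delivery"],
    "reminder": ["payment reminder overdue invoice please pay",
                 "final reminder outstanding amount overdue payment"],
}

def word_frequencies(texts):
    """Relative frequency of each word across a class's sample set."""
    counts = Counter(word for text in texts for word in text.split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

per_class = {label: word_frequencies(texts) for label, texts in samples.items()}

def discriminative_features(label, top_n=5):
    """Words that are frequent in this class but rare in all the others."""
    others = [freqs for other, freqs in per_class.items() if other != label]
    scores = {
        word: freq - max(freqs.get(word, 0.0) for freqs in others)
        for word, freq in per_class[label].items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

for label in samples:
    print(label, "->", discriminative_features(label))
```

Nobody told the program which words matter for which class; it worked that out from the examples, just as the baby works out which properties make a car a car.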
Imagine my surprise when I recently found an article published in Scientific American by psychology professor Alison Gopnik of UC Berkeley on "How Babies Think". See here for the full text. It was like hearing myself speak while trying to explain a content classifier, only from a very different perspective.
In particular Professor Gopnik writes: “Computer scientists and philosophers have begun to use mathematical ideas about probability to understand the powerful learning abilities of scientists – and children. A whole new approach to developing computer programs for machine learning uses what are called probabilistic models, also known as Bayesian models or Bayes nets.” (Scientific American July 2010, page 80)
A standard classifier uses feature detection and a variation of Bayesian networks to do classification. And it relies on training and learning, just as you would help a baby learn to speak. The learning classifier is the baby computer. Judging by its capabilities, I would say it is about five years old, which happens to be the number of years this technology has been on the market.
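For readers who want to see what the Bayesian part looks like in practice, here is a simplified sketch using naive Bayes, the most basic member of the family of probabilistic models Professor Gopnik refers to. The products themselves use more elaborate variants, and the classes and training texts below are again invented. The sketch learns word probabilities per class from a few labeled samples and then applies Bayes' rule to a new document.

```python
import math
from collections import Counter

# Hypothetical training samples: (class label, document text).
training = [
    ("invoice", "invoice number total amount due payment"),
    ("invoice", "invoice date net amount vat payment due"),
    ("order",   "purchase order quantity item delivery date"),
    ("order",   "order confirmation item quantity price delivery"),
]

# Learn P(class) and P(word | class) with add-one (Laplace) smoothing.
class_counts = Counter(label for label, _ in training)
word_counts = {label: Counter() for label in class_counts}
for label, text in training:
    word_counts[label].update(text.split())
vocabulary = {word for counts in word_counts.values() for word in counts}

def log_prob(label, text):
    """log P(class) plus the sum of log P(word | class) for the document."""
    total_words = sum(word_counts[label].values())
    score = math.log(class_counts[label] / len(training))
    for word in text.split():
        score += math.log(
            (word_counts[label][word] + 1) / (total_words + len(vocabulary))
        )
    return score

def classify(text):
    """Pick the class with the highest posterior probability (Bayes' rule)."""
    return max(class_counts, key=lambda label: log_prob(label, text))

print(classify("please send payment for the open invoice amount"))  # -> invoice
print(classify("confirming your order and the delivery date"))      # -> order
```

The principle is the same as in the products: probabilities learned from examples replace hand-written rules.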
And there is another nugget in the article:
“A learning strategy has many advantages, but until learning takes place, you are helpless. Evolution solves this problem with a division of labor between babies and adults”.
It is the same with any classifier: you need a certain amount of pretraining before you start. This happens during the project setup phase, where the vendor's and the customer's experts set up and train the basic classification scheme.
After that, the "baby" is ready to start talking and can take over your document processing needs.
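In code, this division of labor might look roughly like the following sketch. It assumes scikit-learn is available and again uses invented class names and documents: the experts pretrain the classifier on labeled samples during project setup, and afterwards it keeps learning from operator corrections while it processes documents.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical classification scheme agreed on during project setup.
classes = ["invoice", "order", "reminder"]

# Stateless feature extraction, so new documents never require re-fitting the
# vectorizer; alternate_sign=False keeps counts non-negative for the model.
vectorizer = HashingVectorizer(n_features=2**16, alternate_sign=False)
model = MultinomialNB()

# Setup phase: experts label a representative sample set and pretrain.
setup_texts = [
    "invoice number total amount due payment",
    "purchase order quantity item delivery date",
    "payment reminder overdue invoice please pay",
]
setup_labels = ["invoice", "order", "reminder"]
model.partial_fit(vectorizer.transform(setup_texts), setup_labels, classes=classes)

# Production: the classifier processes documents...
print(model.predict(vectorizer.transform(["final reminder outstanding payment"])))

# ...and keeps learning whenever an operator corrects or confirms a result.
model.partial_fit(vectorizer.transform(["order confirmation item quantity price"]),
                  ["order"])
```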