Archive
Generic and specific classification
In principle it is possible to train on enough representative samples to create a classification scheme that is totally generic for a specific purpose. This is what humans do all the time. Reading a text and correctly classifying it manually into a given...
What is the world?
Editor’s note: This is a guest post from Prof. Dr. Jürgen Lenerz from the University of Cologne. When asked what we talk about, we might say that we talk about “the world”. But what is the world? Perhaps we would say that the world somehow exists outside...
Classification methods
Classification tries to mimic human understanding. Several methods have been developed in the past to achieve what we as humans can do almost effortlessly. These methods can be divided into two groups. Rule-based classification: Rule-based systems are...
Skilja New Offices
Internal information/personal note: We are pleased to announce that Skilja has opened a subsidiary in Croatia. The office is located in the beautiful city of Novigrad/Cittanova in Istria, which is in the north of Croatia and close to the Slovenian/Italian border. This...
New Artificial Intelligence – NAI
Editor’s note: This is a guest post from Süleyman Arayan, Founder & CEO of the German ITyX Group. Recently, science has made enormous progress in the field of processing natural-language texts. Thus a new generation of artificial intelligence achieves recognition...
Taxonomy and Hierarchies
Classification can be defined as modeling real objects via a simplified mathematical representation consisting of a set of characteristic features. The goal of such a description is to collect objects that are quite similar into one group. A big benefit of the...
What do we talk about?
Editor’s note: This is a guest post from Prof. Dr. Jürgen Lenerz from the University of Cologne. We, i.e. adult and sound human beings, are able to talk about everything we want to talk about. But what do we want to talk about? We want to talk about our thoughts...
Visiting Text Analytics Summit
Text Analytics is a part of the discipline that we call Document Understanding on this site. Originally focused on text mining, text analytics is moving more and more toward real-time analysis of documents to create actionable insight. This year’s conference in Boston was the...
Semantic Technologies for Document Understanding
Document understanding is about understanding text and determining its meaning and the entities therein. For unstructured text, the meaning of entities or even simple words is very often not clearly determinable and can be highly ambiguous. Take a simple number in a text – this...
The Paper Challenge
In one of my favorite movies, “Brazil” by Terry Gilliam, a man is suffocated by paper. “Brazil” is a rather bizarre adaptation of the novel “1984” by George Orwell, with many ideas about how not only society but also technology could evolve much differently than we all...
Visiting Enterprise Search Summit
Enterprise Search Summit (ESS2012) is a conference for business professionals in the field of professional search applications that takes place in May in New York. It was obvious from this year’s agenda that search technology needs to go far beyond simply indexing...
Visiting Docville May 2012
I’m just back from the international Docville meeting in Brussels. Docville is a community of professionals in the ECM and capture industry, organized and facilitated by Michael Ziegler, which already has more than 800 members on LinkedIn....
10 rules for creating a successful mailroom classification project
Automatic, context-based classification for mailrooms has proven over the last few years to generate significant ROI and to accelerate processes. But we have also seen failures and disappointments. I have managed and monitored many of these projects in the past and...
Visiting Social Media Analytics Summit
I am just back from an interesting conference in San Francisco on social media analytics. It is a rather small conference; however, the market it deals with is growing rapidly. You might ask what social media analytics has to do with document understanding...
Measuring Classification Quality
For an active production system, but also while the classification scheme is being set up, it is very important to measure the quality of the classification. The goal is to make as few classification errors as possible (also called false positives), as these can severely...
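As a rough illustration of what such a measurement might look like, here is a minimal sketch, assuming single-label classification where each document receives exactly one predicted class and a manually verified true class is available. It counts, per class, the documents wrongly assigned to that class (false positives) and derives precision and recall. The function name `classification_quality` and the sample labels are hypothetical, not taken from any Skilja product.

```python
from collections import Counter

def classification_quality(pairs):
    """Per-class precision, recall and false positives (hypothetical helper).

    `pairs` is an iterable of (true_label, predicted_label) tuples.
    Assumes single-label classification: a misclassified document counts as a
    false positive for the predicted class and a false negative for the true class.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in pairs:
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1   # document wrongly put into `pred`
            fn[true] += 1   # document missed by its real class
    report = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        predicted = tp[label] + fp[label]  # everything assigned to this class
        actual = tp[label] + fn[label]     # everything that truly belongs here
        report[label] = {
            "precision": tp[label] / predicted if predicted else 0.0,
            "recall": tp[label] / actual if actual else 0.0,
            "false_positives": fp[label],
        }
    return report

# Hypothetical mailroom sample: (verified class, class assigned by the system)
sample = [
    ("invoice", "invoice"),
    ("invoice", "invoice"),
    ("invoice", "complaint"),
    ("complaint", "complaint"),
    ("address_change", "invoice"),
]

for label, scores in classification_quality(sample).items():
    print(label, scores)
```

Precision per class directly reflects how many false positives a class collects, which is the error type the excerpt singles out; recall shows how many documents of a class were missed.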
Faults and Tolerance
Humans have a remarkable capability to compensate for noisy signals and incomplete information. We are able to distinguish and recognize relevant information even when the signal-to-noise ratio is extremely low. Missing data is reconstructed from context knowledge or...
Visiting AIIM 2012
AIIM is the community that provides education, research, and best practices on information management and collaboration. This year the AIIM community met in San Francisco. For the first time in 10 years it was held in a conference format with general...
Structured and Unstructured – what is this?
When you have been involved in plans or projects for automated document processing, you have surely been exposed to the distinction between structured and unstructured information. And you might have gathered an understanding of what this means. But what does it really...
Classification of – Chairs
If you have followed my presentations in the past, you know that document classification closely corresponds to concept creation in the human mind. The concepts represent classes of real-life objects. We are able to recognize concepts and group objects...
Document Understanding Primer
For a long time, document understanding has been a research topic in computer science. We have seen conferences discussing concepts and approaches to use computers and machine learning for understanding documents. Quite often this topic also appears in proceedings on...