This is a guest post from Süleyman Arayan, Founder & CEO of the German ITyX Group.
Science has recently made enormous progress in the processing of natural-language text. As a result, a new generation of artificial intelligence achieves recognition rates in clearly defined thematic areas (domains) that were previously out of reach for traditional keyword-based methods. This “New Artificial Intelligence” (NAI) is a highly efficient development based on a modern form of simulation. It is capable of learning and adopts the behavior of human experts in assessing and processing data in documents or files. Given enough examples to imitate, the automation of recurring, similar transactions becomes an exploitable advantage in efficient document understanding.
By “training” within a suitable content environment, NAI dynamically develops digital structures that operate similarly to a human brain. NAI therefore does not “understand” written content (this would presuppose human consciousness) but sticks to imitation only. The potential innovation of NAI lies in the fact that traditional AI-based models could only be used if the content or the processing was understood and could be described causally. With NAI, however, captured document content can be imitated without being understood.
Bayesian filtering, the k-nearest-neighbor algorithm (KNN), Support Vector Machines (SVM): all of these algorithms share one major problem, namely the representation of text-based content as vectors. Although semantically representative, these vectors lose the word order of the original text. Instead, only the mere presence or the count of individual words is recorded. This approach is bound to be insufficient in some cases, even for the best methods, since the semantics and the predicate structure of the text get lost.
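The loss of word order can be seen in a minimal Python sketch. It assumes the simplest possible vector representation, a plain word count; the function name `bag_of_words` and the two sample sentences are invented for illustration and are not taken from any ITyX product.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count how often each lowercase word occurs; word order is discarded."""
    return Counter(text.lower().split())


# Two sentences with opposite meanings (illustrative examples only).
a = "the customer cancels the contract with the provider"
b = "the provider cancels the contract with the customer"

# Both sentences contain exactly the same words the same number of times,
# so their word-count vectors are identical although the meaning differs.
print(bag_of_words(a) == bag_of_words(b))  # True
print(a == b)                              # False
```

Any classifier that sees only these counts cannot tell who cancels the contract, which is exactly the information a mailroom system would need.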
The right combination
Through research projects with leading German universities and research institutes, ITyX has worked since 1999 with all known methods of automatic document and data processing across various structures and channels. Some new methods have been developed that dynamically incorporate the decisions of human staff into the classification decisions. A basic finding: each methodological approach described has its “domain” and thus its “raison d’être”.
A completely “domain-independent” content categorization can only be achieved by blending the results of the individual methods and calculating the true categories from them. In particular, knowledge of the methods and their typical weaknesses (failures) is needed for correction. Simple statistical approaches fail, as they multiply the errors made in single domains. As a result, the overall performance of a digital mailroom (DMR) system is bound to decrease. The future of automatic text evaluation is scientific classification, driven by a combination with self-adapting procedures. In the end, human intervention is a must to drive NAI to its best.
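One possible way to take each method's typical weaknesses into account when blending results is a reliability-weighted vote. The following Python sketch is an assumption for illustration, not the procedure ITyX uses: the method names, the categories, the `RELIABILITY` table and its numbers are all invented, and in practice such values would have to come from human-verified documents.

```python
from collections import defaultdict

# Hypothetical reliability of each method per category, e.g. the share of
# correct predictions on human-verified documents. All numbers are invented.
RELIABILITY = {
    "bayes": {"invoice": 0.9, "complaint": 0.3},
    "knn":   {"invoice": 0.7, "complaint": 0.4},
    "svm":   {"invoice": 0.9, "complaint": 0.8},
}


def blend(votes: dict) -> str:
    """Return the category with the highest reliability-weighted vote.

    `votes` maps a method name to the category that method predicted.
    """
    scores = defaultdict(float)
    for method, category in votes.items():
        # A vote counts only as much as the method is trusted for the
        # category it claims to have recognized.
        scores[category] += RELIABILITY[method][category]
    return max(scores, key=scores.get)


# Two methods that are weak on complaints outvote the third in a plain
# majority vote, but not once their known weakness is weighted in.
votes = {"bayes": "complaint", "knn": "complaint", "svm": "invoice"}
print(blend(votes))  # invoice
```

A plain majority vote would return "complaint" here, carrying the errors of the two weaker methods into the final result.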
Since 1996, Süleyman Arayan, Founder & CEO of the German ITyX Group, has been actively developing linguistic software solutions for mailroom automation, knowledge management and response management. ITyX is a leading provider of intelligent ECM solutions based on auto-adaptive AI features. http://www.ityx.co.uk/