Guest Post | Skilja

IDP: Solving bot illiteracy in the digital workforce – Part 2

Guest — Wed, 21 Oct 2020 10:12:00 +0000

Editor’s note: This is a guest post from Jupp Stöpetie

In this post we examine the role of Intelligent Document Processing (IDP) relative to Robotic Process Automation (RPA) and how these technologies drive Digital Transformation when combined. Part one looked at what is driving and enabling Digital Transformation. Part two is dedicated to RPA and why IDP is essential.

Robotic Process Automation

Over the past three or four years, RPA has become the tool of choice for many companies that want to automate repetitive tasks. RPA vendors claim that their systems are easy to set up and maintain without any need for coding. They promote the idea that RPA systems can be operated by business managers after some lightweight training. Basically, every manager can create bots that replace human workers. This has led to the wide adoption of RPA systems in businesses. There is no longer any need for complicated, lengthy and costly automation projects managed by IT departments. RPA puts automation and the use of AI in the hands of business managers. Today the workforce in companies is increasingly a combination of humans and robots. But of course things are more complicated when you look a bit deeper than the RPA marketing collateral. What happens, for example, when we need to retrieve information from a document? That is no problem for human workers. But bots have no humanlike reading skills. They can only “read” structured data such as files and databases, because it is easy to tell a bot exactly where to find each piece of data.

Why bots can’t read documents

A document can be seen as a container for content. The content is made up of static data plus explicit and hidden information describing the relationships between the data, which gives the document its meaning. A document also has metadata: the properties of the container itself. Content is represented in documents so that humans can process it. Note that this has an important implication: most documents were never designed to be read by bots. When data must be extracted, it doesn’t matter to human workers whether a document has a fixed structure as in a form, a semi-structure as in an invoice, or no structure as in a contract. With proper instructions, human workers will be able to find the data they are looking for, although processing time may vary significantly depending on how much structure there is. Bots, however, have no cognitive reading skills. And adding OCR and data capture technology to an RPA solution is often not enough to make bots really skilled at processing documents.

Why OCR and Data Capture often do not offer the right reading skills for bots

The short answer: these technologies fall short because they were not developed for RPA. OCR was not designed to understand content. OCR is a technology for converting pixels into characters. Most OCR packages can also convert document images (scans) into text files while recreating the original layout. Data Capture systems use OCR technology and also many other AI technologies. Data capture systems were designed for extracting data from large volumes of documents with the highest possible accuracy. Neither OCR nor data capture were designed for RPA users to teach their bots how to read stuff.

The users who usually set up data capture installations are engineers who know exactly how to use all the levers and parameters of these systems. They will often create scripts or even add custom code in order to achieve the highest possible accuracy. The initial setup effort is high, but that makes a lot of sense. The ROI of data capture installations doesn’t have to come quickly, because these installations are almost always set up to run for many years.

Batch-oriented data capture systems, although very powerful when set up correctly, are logically not the first choice of RPA users who need to add document processing capabilities to their bots. These users are looking for simple, easy, fast and flexible functionality. The volumes they need to process are small. They also have a greater need for systems that learn while doing, because the full range of variability in the documents will not be available at the start of the project. Often they also need a higher level of intelligence because they want to automate tasks that were formerly performed by humans. And when they design new processes, RPA users want to create intelligent bots that behave just like humans. What RPA users cannot handle, and what would break the RPA paradigm, is reading skills for bots that come at the expense of heavy investment and special technical skills such as solution design, coding and production testing.

Note that where RPA systems are used for processing large volumes of documents, it makes sense to use data capture systems to extract data from these documents, because in these cases efficiency, and therefore accuracy, will most likely play a significant role. Processing large volumes of documents is data capture’s sweet spot.

What RPA users really need when they have to process documents is something that may look a bit like OCR and data capture but is much smarter, because it draws on a much broader set of AI technologies, and that is at the same time easier to use.

IDP: a new product category for a new market

The rapid spread of RPA installations in companies over the past three to four years has led to increasingly high demand for easy, simple, flexible yet powerful intelligent data capture solutions. This fast-growing demand has spawned a new generation of companies that have gone down a different path, using different technologies from those the incumbents in the data capture market have relied on for more than 20 years. These new systems are all based on the idea that Deep Neural Networks and other forms of Machine Learning are better and much easier ways to fulfill the needs of RPA users: all you need is a lot of samples to train your neural networks, and off you go. What is unclear at this stage, however, is whether ML can actually deliver the accuracy that is needed when bots with humanlike reading skills execute mission-critical tasks. When deep neural networks make mistakes, you cannot go in and correct them; these systems are black boxes. It is noteworthy that all incumbents are updating their existing offerings by adding ML technologies, especially with the goal of becoming better at processing unstructured documents. And it does not seem a ridiculous assumption that companies with many years of experience developing document processing systems have an advantage over competitors that are fully ML-focused. It looks quite plausible that incumbents are better placed to marry old and new AI technologies, based on their solid understanding of how to build robust document processing systems.

All these efforts by old and new companies to develop document processing skills for RPA bots have led to a new category of products that cater to the digital transformation market rather than to the traditional capture market. The emergence of these products prompted Everest Group, a leading management consulting and research firm, to define a new product category: Intelligent Document Processing, or IDP.

Everest Group defines IDP as any software product or solution that captures data from documents (e.g., email, text, pdf, and scanned documents), categorizes, and extracts relevant data for further processing using AI technologies such as computer vision, OCR, Natural Language Processing (NLP), and machine/deep learning. These solutions are typically non-invasive and can be integrated with internal applications, systems, and other automation platforms.

About OCR and Online-learning

In many blogs about IDP, authors set up a false opposition between OCR as the old-fashioned, unintelligent way of extracting data and modern AI-based extraction methods that are state of the art and intelligent. First of all, as I pointed out, OCR is not data extraction. OCR is one of many AI technologies that are used in data capture systems. And there are different ways to build intelligence into systems that help to find and interpret data in documents. Machine learning may perform better on unstructured documents. When dealing with forms and semi-structured documents, systems that use templates, classifiers and other AI technologies, including machine learning, will almost always outperform systems that are based solely on machine learning. Note that these comprehensive systems, which use ML in a smart way, learn from automated feedback: from users correcting mistakes, and from the results of successful and correct classification and extraction. This generates additional knowledge (expanding the space) and statistics on the usage of existing knowledge. See The Magic of Online-Learning.

What is the ideal IDP solution?

The challenge for intelligent document processing is that there seems to be no single ideal approach. Depending on the type of documents, the volumes that have to be processed and the importance of accuracy, ML, more traditional AI approaches or a blend of the two will be the best choice. For example, when document volumes are large, accuracy is important. When a shared service centre processes 50 million documents a year, improving accuracy from, say, 95% to 96% is significant: it reduces the number of documents that have to be corrected by 500k. Another case where accuracy is critical is straight-through processing.
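
The arithmetic behind the shared service centre example can be checked with a few lines of Python. This is only a sketch: it assumes that accuracy is measured per document (a document is either fully correct or needs correction), and the function name is made up for illustration.

```python
def documents_needing_correction(total_documents: int, accuracy: float) -> int:
    """Number of documents that still need manual correction.

    Assumption: accuracy is a per-document rate between 0 and 1.
    """
    return round(total_documents * (1.0 - accuracy))


yearly_volume = 50_000_000  # figure taken from the text

before = documents_needing_correction(yearly_volume, 0.95)
after = documents_needing_correction(yearly_volume, 0.96)
saved = before - after

print(f"Corrections at 95%: {before:,}")
print(f"Corrections at 96%: {after:,}")
print(f"Fewer corrections per year: {saved:,}")
```

One percentage point looks small, but at this volume it is the difference between 2.5 million and 2 million documents that a human has to touch.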

It seems that the best option for customers is to adopt an IDP platform that enables them to operate different solutions, or combinations of such solutions, while shielding users from their complexity. Vinna is such a platform; it even allows crowdsourcing (Human In The Loop Verification) to be added without users ever being aware of the complexity of what goes on under the hood.

———————————————————————————————

Jupp Stöpetie, as CEO of ABBYY Europe, established ABBYY’s presence in the Western European markets, growing the brand and market presence to a leadership status over more than 25 years. His experience includes founding and growing companies and managing all levels of business operations, sales, and marketing. Jupp left ABBYY in spring 2020 and now works as an independent consultant based in Munich, Germany.

IDP: Solving bot illiteracy in the digital workforce – Part 1

Guest — Wed, 15 Jul 2020 10:11:47 +0000

Editor’s note: This is a guest post from Jupp Stöpetie

In this post we examine the role of Intelligent Document Processing (IDP) relative to Robotic Process Automation (RPA) and how these technologies drive Digital Transformation when combined. In part one we look at what is driving and enabling Digital Transformation. Part two then is dedicated to RPA and why IDP is essential.

Digital Transformation

Companies all over the world are redesigning and digitizing their businesses at an ever faster pace and they have many reasons for doing so:

They want to serve their customers faster and better.
They understand that automation and the use of AI technologies improve innovation, agility, scalability and cost-efficiency.
They appreciate that, in contrast to human labor-intensive processes, digital processes
- take less time to design
- need less capital investment
- are faster to deploy
- and (much) less costly to run.

The above is commonly referred to as Digital Transformation. Citing Salesforce’s definition: “Digital Transformation (DX) is the process of using digital technologies to create new — or modify existing — business processes, culture, and customer experiences to meet changing business and market requirements.“

The main drivers of Digital Transformation are ongoing globalisation and the Fourth Industrial Revolution, which have led to dramatically increased levels of competition. DX greatly improves the agility of businesses so they can adapt to changes in the market much faster than ever before. Changing or even completely redesigning digital processes is a lot easier and comes at much lower cost, which significantly increases a business’s competitiveness. Note that even terminating digitised processes comes at a much lower cost than when a lot of capital investment and labor is involved. Basically, businesses have no choice: they must become digital. And those who are slow to change find themselves in an increasingly disadvantageous position. New businesses nowadays will always start with a digital concept in mind and will thereby avoid manual processes wherever that is possible and makes sense.

Market Data on Digital Transformation

Worldwide spending on technologies and services that enable the digital transformation (DX) of business practices, products, and organizations is forecast to reach $2.3 trillion in 2023, according to a new update to the International Data Corporation (IDC) Worldwide Semiannual Digital Transformation Spending Guide. DX spending is expected to steadily expand throughout the 2019-2023 forecast period, achieving a five-year compound annual growth rate of 17.1%. “We are approaching an important milestone in DX investment with our forecast showing the DX share of total worldwide technology investment hitting 53% in 2023,” said Craig Simpson, research manager with IDC’s Customer Insights and Analysis Group.

In its July report, Grand View Research stated that the global Robotic Process Automation market, a segment of the overall DX market, was valued at USD 1.40 billion in 2019 and is projected to grow at a compound annual growth rate (CAGR) of 40.6% from 2020 to 2027.
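
For readers who want to see what such growth rates mean, compound annual growth can be sketched as follows. The function names are illustrative, and the sketch assumes simple annual compounding; it does not reproduce the exact methodology of IDC or Grand View Research.

```python
def project_value(start_value: float, annual_rate: float, years: int) -> float:
    """Value after compounding `annual_rate` once per year for `years` years."""
    return start_value * (1.0 + annual_rate) ** years


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Growth factor implied by a 17.1% CAGR sustained over five years
five_year_factor = project_value(1.0, 0.171, 5)
print(f"Five years at 17.1% multiplies spending by {five_year_factor:.2f}")

# One year of 40.6% growth on the 2019 RPA market value (USD billions)
rpa_next_year = project_value(1.40, 0.406, 1)
print(f"USD 1.40 billion grows to about USD {rpa_next_year:.2f} billion after one year")
```

A rate of 17.1% sustained for five years more than doubles the starting value, which is why even modest-sounding CAGR figures describe a very fast-moving market.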

Why is Digital Transformation taking place now?

In the last decade we have seen a tsunami of digital transformation projects, driven by an accelerating desire of companies to increase competitiveness and innovation. But of course one could argue that businesses have always had that desire. The question one could ask is: why is all that happening now? Haven’t companies been automating for decades already? Yes, but not at the current pace.

Note: at this stage it is unclear how the Covid-19 pandemic will influence market dynamics. It seems however unlikely that the need for companies to digitize their businesses will slow down. On the contrary, it is much more likely that the opposite will happen.

What has made things totally different from say ten years ago and has really enabled Digital Transformation to take place is the enormous progress in both computer science and in the computer industry. Compared with say 10 years ago we see:

an enormous increase in computational power
the rise of super powerful algorithms, algorithms that need incredible amounts of computational power which is now available
vastly improved connectivity both in speed and access points and it is improving at an accelerating pace (5G)
smartphones: in 2009, 170 million smartphones were sold; in 2020, an estimated 1.5 billion units will be sold. That makes for an estimated 3.5 billion people owning a smartphone in 2020
huge amounts of money, disproportionate compared to the rest of the economy, that have been invested in tech companies. Successful tech companies have rewarded their investors with multiples that dwarf those of any other industry.
and, last but not least, an ever-growing appetite of consumers and businesses for more, better and faster service anywhere at any time.

Some examples of how robotic process automation and document processing drive Digital Transformation in companies

A large international pharmaceutical company wanted to capture all details from their purchase orders and perform lookups and validation using their ERP. Tasks were automated using an RPA system. An intelligent document processing system was needed to read the data from the POs.

A large financial services company wanted to use RPA to automate their KYC process, which involves capturing and verifying driver’s licences, passports, etc. Early in the process they found that they also needed an intelligent document processing system.

A global logistics company wanted to automate their invoice processing (millions of documents). Their RPA system was found to be up to the challenge. But the company initially backed off because of the complexity of extracting data from millions of semi-structured invoices with a wide range of varying layouts. Only after a proof of concept showed that there was an intelligent document processing system on the market that was up to the challenge did the company proceed with the project.

In part 2 of this post we will discuss how the combination of RPA with IDP works and which challenges need to be considered.

———————————————————————————————

Give Workflow a Miss – Go for Real Automation

skiljaadmin — Fri, 22 Apr 2016 16:33:45 +0000

Editor’s note: This is a guest post from Richard Cop from Interact Consulting

Give workflows a miss: a provocative claim in the age of business process automation. Whenever we think of automated processing, we think in terms of how information gets to the relevant people quickly and efficiently. But especially when it comes to the automated processing of vendor invoices, this is the wrong approach. The goal should be to automate processing as far as possible, so that only a small number of invoices actually have to be sent on a control and approval journey through a workflow.

All beginnings are easy
Whenever we address the subject of automated invoice processing in a first meeting with a customer, our counterpart talks about workflow. Traditionally, the incoming electronic or scanned image of an invoice is received in accounting. Accounting then determines the responsible person, sends them the invoice and waits for the invoice to be released. After its return to accounting, booking, filing and payment are carried out, and that is all.

This may be a good approach when you are dealing with very few invoices, maybe five to at most ten per day, that is, fewer than 2,000 invoices per year. With this volume, an existing e-mail system and a bit of organizational aptitude should be sufficient to keep track. However, as soon as the number doubles, triples or even quadruples, such an approach quickly reaches its limits, because all of a sudden a member of the accounting staff will spend most of the working day sending invoices on their electronic journey and checking the returned approvals.

Therefore, when invoice volumes reach 10,000 or more, a better alternative has to be considered: one that avoids wait time and provides a full overview of all invoices in circulation. So at first glance, implementing a workflow system seems obvious: a system that makes it possible to track the invoice distribution and intervene in the process at any time. However, is this sufficient? Or is it rather a missed opportunity?

Invoices are a bonanza
A workflow can be a good thing, but honestly: Does it not seem absurd to you, too, to send an invoice that you receive on a monthly basis to the same person time and again and subsequently use the same old data over and over again for accounting? In this case, would it not be much more feasible if the system carried out such processes on its own?

To enable an intelligent processing system to do so, it has to be provided with high-quality data at a very early stage. Therefore, at the beginning of an automation process, paper invoices have to be scanned and recognized or automatically read, while the data records delivered with electronic invoices have to be converted into a standard format, so that all further processing steps can be executed uniformly.

Once the invoice data is available in a consistent electronic form, it can be processed automatically. The objective of optimized business process automation has to be to store knowledge in such a way that data can be processed with a high degree of automation, in other words, without human intervention whenever possible.

When analyzing your everyday invoices, you will quickly realize that a large share of them recur. This means that information and knowledge on how to deal with these invoices is present within your organization. This knowledge may be stored in systems or in the heads of your staff. If this process is to be automated, it is important to use this information and to prepare it in such a way that the processing system can make all necessary decisions largely independently, based on this knowledge.

For example, there is project ABC, and it is known that the invoices of supplier XYZ always belong to this project and are always booked the same way. These invoices are also always checked and approved by the same person in the same department. The binding source for checking the invoice is the corresponding contract negotiated with supplier XYZ for project ABC.

It does not make any sense to enter this information into the accounting system time and again whenever the invoice comes in. In addition, it is equally useless to e-mail this invoice to the ever-same person hoping she or he will return the approval before the cash discount period has expired.

In this case, we are speaking of an automated workflow in which the basic information about a vendor is stored in the processing system and used automatically as an accounting and approval template whenever such an invoice comes in. However, processing invoices this way requires that invoice recognition is carried out consistently, for example using the unique vendor master record of the accounting system.
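
As a rough illustration of such a vendor template, consider the following Python sketch. Everything in it is hypothetical: the vendor ID "V-XYZ", the account number, the approver and the function name are invented for the example and do not refer to any particular accounting system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BookingTemplate:
    project: str
    account: str
    approver: str


# Hypothetical template store, keyed by the unique vendor master record ID
TEMPLATES = {
    "V-XYZ": BookingTemplate(project="ABC", account="6300", approver="j.doe"),
}


def propose_booking(vendor_id: str, amount: float) -> Optional[dict]:
    """Pre-fill the booking from the vendor template, or return None
    if the vendor is unknown and the invoice needs manual handling."""
    template = TEMPLATES.get(vendor_id)
    if template is None:
        return None
    return {
        "vendor_id": vendor_id,
        "amount": amount,
        "project": template.project,
        "account": template.account,
        "approver": template.approver,
    }


print(propose_booking("V-XYZ", 1250.00))   # pre-filled booking for supplier XYZ
print(propose_booking("V-NEW", 99.00))     # None: no template, goes to a person
```

The lookup only works if every incoming invoice is reliably mapped to the same vendor ID, which is why consistent recognition against the vendor master record matters.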

Why do contracts exist?
The cynical answer could be: so that they can be filed away somewhere and forgotten until someday the big search begins. Well, hopefully not, but in most of the companies that we have met so far, there is no structured contract management, or only a very limited one. Because of this, the contracted goods and services are often only vaguely known, and simple information such as the contract period or notice period is not available at short notice. If contracts are recorded in a system in a structured way, the information they contain can be used to automate the processing of all invoices from the respective contracting party. Generally, terms of credit, rates and conditions, installment plans, periods, budgets et cetera are defined in the contract, so that the electronically available invoice data can be matched with the respective contract information.

Generally, if there is an existing leasing or rental agreement it is known when the respective invoices come in, which amounts have to be paid and how they have to be recorded. In case no deviation from the contract is discovered, there is absolutely no need for a check and approval process. Letting these invoices be checked and approved by the ever-same person time and again and to record them in the ever-same way is nothing more than a waste of precious time.

For contracts with variable invoice amounts, similar procedures can be applied. For example, checking the often lengthy telephone bills is worthwhile only when there is a peculiarity or deviation, such as an extraordinary invoice amount. Therefore, effectively automating the processing of invoices with a contract reference requires such procedures, in order to process these recurring invoices fully automatically and to detect deviations, so that your specialists can systematically focus on these peculiarities.
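
A deviation check of this kind can be sketched in a few lines of Python. The relative tolerance, the amounts and the function name are assumptions made for illustration; a real system would take the expected amount and the allowed range from the contract record.

```python
def deviates_from_contract(
    invoice_amount: float,
    expected_amount: float,
    tolerance: float = 0.0,
) -> bool:
    """Return True if the invoice differs from the contractually expected
    amount by more than the given relative tolerance (0.25 means 25%)."""
    allowed = abs(expected_amount) * tolerance
    return abs(invoice_amount - expected_amount) > allowed


# Fixed-amount contract (for example a lease): any difference is a deviation
lease_ok = not deviates_from_contract(1200.00, 1200.00)

# Variable-amount contract (for example a telephone bill): allow 25% either way
usual_bill_ok = not deviates_from_contract(340.00, 300.00, tolerance=0.25)
unusual_bill_flagged = deviates_from_contract(900.00, 300.00, tolerance=0.25)

print(lease_ok, usual_bill_ok, unusual_bill_flagged)
```

Only the invoice flagged in the last line would be sent to a specialist; the other two can be booked without anyone looking at them.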

An order is already entered in the books
The potential of automating invoices with an order reference goes even further. Many organizations operate standardized purchasing systems that manage the orders of goods and services. Generally, requisition, approval, the quotation process and finally the placing of the purchase order with the supplier are all managed by such systems. An approval process for the desired order is an essential part of this procedure. Furthermore, all relevant information regarding prices and conditions, accounting, cost centers, projects et cetera is already stored when the order is placed. Therefore, it does not make any sense at all to run through these steps all over again and to manually check and record all the information again when the invoice arrives.

Using automated order matching, the processing system checks whether there is a deviation between invoice and order. If no deviation is detected, there is no reason to manually check the invoice and run through an approval process all over again. This, too, is a complete waste of precious time.
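
The idea of order matching can be illustrated with a simplified Python sketch. It assumes that order and invoice lines can be matched by an item identifier and that prices are stored in cents; the data model and names are hypothetical and much simpler than a real purchasing system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Line:
    item: str
    quantity: int
    unit_price_cents: int


def find_deviations(order_lines: List[Line], invoice_lines: List[Line]) -> List[str]:
    """Compare each invoice line with the matching order line and
    describe every deviation found. An empty list means a clean match."""
    ordered = {line.item: line for line in order_lines}
    deviations = []
    for line in invoice_lines:
        expected = ordered.get(line.item)
        if expected is None:
            deviations.append(f"{line.item}: not on the order")
        elif line.quantity != expected.quantity:
            deviations.append(
                f"{line.item}: quantity {line.quantity}, ordered {expected.quantity}"
            )
        elif line.unit_price_cents != expected.unit_price_cents:
            deviations.append(
                f"{line.item}: price {line.unit_price_cents}, ordered {expected.unit_price_cents}"
            )
    return deviations


order = [Line("paper-a4", 10, 450), Line("toner-bk", 2, 7900)]
clean_invoice = [Line("paper-a4", 10, 450), Line("toner-bk", 2, 7900)]
odd_invoice = [Line("paper-a4", 10, 500), Line("stapler", 1, 1200)]

print(find_deviations(order, clean_invoice))  # empty list: book automatically
print(find_deviations(order, odd_invoice))    # two deviations: send to a person
```

An invoice with an empty deviation list can be posted without a second approval, because the order it refers to has already been approved.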

The workflow spiral
And yet it moves, the workflow spiral. However, only for the few invoices that are sent by (still) unknown suppliers or do not have either contract or order reference. As far as we know, in a well-positioned organization these kinds of invoices only represent a very small part that often lies in a single-digit percent range. Making a workflow system the heart of an automation project because of this small amount of invoices and thereby renouncing the actual automation potential is as wrong as missing cash discount periods. In short, it is a missed opportunity.

(This post appeared first in Interact Magazin on the Interact homepage in German)

####

DR. RICHARD COP is co-founder and CEO of Interact Consulting: «Which impact does technology have on our society and which impact do we have on technology? These questions take center stage for me and I also dealt with them in my dissertation on the process of structural change in telecommunication. I want to optimize processes, harmonize technology, organization and people, and above all, promote interaction. Therefore, as entrepreneur, today I design fully integrated processes together with our customers.»

Auto-Classification Technologies and RFID Smart Docs

Alexander — Sat, 10 Jan 2015 13:16:43 +0000

Editor’s note: This is a guest post from Cláudio Chaves from TCG Brazil

Recent advances in auto-classification technologies, as described in this blog, have substantially reduced the manual labor that several companies spend on the physical preparation, classification and separation of documents in their operations. Although these advances have achieved tangible results in optimizing document-centric workflows, there is still a gap when it comes to classifying and tracking paper documents. This is especially important in some countries where physical documents are subject to different retention policies based on legal requirements. A certain number of documents must be retained physically for a varying number of years, depending on the document type determined by classification.

Some capture applications are able to identify document types using a barcode at scan time and, using auto-classification or the barcode content, apply different rules to separate and classify the document images. In the digital world everything is straightforward and works pretty well, but if you need to track and trace the same documents physically until the final archiving step is completed, it always becomes a challenge, especially in a large-scale operation with tons of documents.

RFID is an acronym for Radio Frequency Identification, and it is also considered a generic term denoting the ability to identify an object remotely. It means that the information is transmitted via radio waves and does not require line-of-sight or contact between the reader and the tags. RFID technology provides great benefits through the combined use of a barcode, a microchip and an antenna, encapsulated into a tag, also called smart label. The radio waves are sent from a reader and then picked up by a tag that signals back its unique number called EPC (Electronic Product Code). The presence of a tagged folder/document is seen at a reader’s specific location, and this information can be reported to the tracking software that updates a records management database, an ECM repository or even a document capture platform.

Due to its high reading speed and its capacity to identify an item even without visual access to the document, RFID technology makes it possible to quickly read a stack of tagged folders and documents even when they are stored inside a cardboard box. In this way, it is possible to perform an automatic check-in of a ton of documents without any human intervention. It is also possible to inspect document containers such as cardboard boxes and folders at the receiving and delivery points, checking whether all the required document classes are really there.
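
Such a completeness check at a receiving point could look roughly like the following Python sketch. It assumes that the reader delivers a list of EPC strings and that a lookup table maps each EPC to a document class; the EPC values and class names are invented placeholders, not real encodings.

```python
from typing import Dict, Iterable, Set


def missing_document_classes(
    required_classes: Iterable[str],
    tag_reads: Iterable[str],
    class_by_epc: Dict[str, str],
) -> Set[str]:
    """Return the required document classes for which no tag was read.

    Unknown EPCs (tags that are not in the lookup table) are ignored.
    """
    seen = {class_by_epc[epc] for epc in tag_reads if epc in class_by_epc}
    return set(required_classes) - seen


# Hypothetical lookup table maintained by the tracking software
class_by_epc = {
    "EPC-0001": "contract",
    "EPC-0002": "id-copy",
    "EPC-0003": "signature-card",
}

required = ["contract", "id-copy", "signature-card"]
reads_from_box = ["EPC-0001", "EPC-0003", "EPC-9999"]  # one tag missing, one unknown

print(missing_document_classes(required, reads_from_box, class_by_epc))
```

Because the tags are read through the closed box, this check can run at the moment the box passes the reader, before anyone opens it.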

Just like barcodes, RFID technology allows a free-form data encoding schema to be stored in the EPC memory. However, it is always advisable to use a standard in order to avoid proprietary encoding. There are now a few international standards available for different types of objects, such as fixed assets, returnable assets, trade items, documents, etc. These standards have been developed by GS1, an international non-profit association, with the aim of improving efficiency, item visibility and interoperability across the whole chain.

The EPC data schema GDTI (Global Document Type Identifier) specified by GS1, was developed to identify documents, including the class or type of each document. GDTI can be encoded in a 1D/2D barcode, stored into an EPC memory or printed directly on the document. Companies can use the GDTI as a method for identification and registration of documents and related events. They can also use the GDTI for information retrieval, document tracking, electronic archiving or even to prevent fraud and document falsification.
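
To give an idea of how a GDTI can be validated in software, here is a small Python sketch. It rests on assumptions that should be confirmed against the GS1 General Specifications: that a GDTI starts with 12 data digits (company prefix plus document type) followed by one check digit, optionally followed by a serial component, and that the check digit uses the standard GS1 modulo-10 scheme with alternating weights 3 and 1. The sample digits are for illustration only and do not identify a real document type.

```python
def gs1_check_digit(data_digits: str) -> int:
    """Standard GS1 modulo-10 check digit: weights 3, 1, 3, 1, ...
    applied from the rightmost data digit towards the left."""
    if not (data_digits.isascii() and data_digits.isdigit()):
        raise ValueError("data_digits must contain only the digits 0-9")
    total = 0
    for position, char in enumerate(reversed(data_digits)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_gdti(gdti: str) -> bool:
    """Validate the numeric part of a GDTI (assumed: first 13 characters).
    Any serial component after position 13 is ignored in this sketch."""
    numeric_part = gdti[:13]
    if len(numeric_part) != 13:
        return False
    if not (numeric_part.isascii() and numeric_part.isdigit()):
        return False
    return gs1_check_digit(numeric_part[:12]) == int(numeric_part[12])


print(gs1_check_digit("400638133393"))       # check digit for the sample digits
print(is_valid_gdti("4006381333931"))        # numeric part only
print(is_valid_gdti("4006381333931ABC123"))  # with a made-up serial component
print(is_valid_gdti("4006381333932"))        # wrong check digit
```

A check like this catches most single-digit typing and reading errors before a document identifier is written to a tag or stored in an archive.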

All these standards were specified based on the EPCGlobal framework, which describes the relation between different RFID components such as hardware, software and data interfaces. In the context of this article, we are referring to passive RFID. This technology does not use batteries and works in the UHF frequency range. Based on this specification, objects can be identified not only in the near field but also in the far field, at distances of up to 10 meters, depending on the type of object, the tag, the antenna and the reader.

The combination of auto-classification technologies and RFID-tagged documents makes it possible to match the classification results physically and logically. Given the physically determined document class, it is possible to choose the most appropriate document container (e.g. cardboard box, folder, etc.) and pass the parent document class to the image/content classification engine to perform a deeper classification.

At the end of the process, we can match the results and track both versions (image and paper) during the entire flow. This can be achieved without physical contact with the paper document. Imagine that you get a box of paper from a remote location for archiving. If the documents have been classified and RFID encoded, then within a second you can check the completeness of the physical archive and stow them away. If all of this happens within your capture process automation system, you have a tight combination of auto-classification with physical sorting of paper – solving this last obstacle to full automation.

There are still several interesting RFID use cases for documents, such as automatic check in/out, hunting, inventory, exits, etc. which will become more and more popular very soon with the decreasing costs of the technology and the advances of new concepts like IoT (Internet of Things).

Links:
RFID: http://en.wikipedia.org/wiki/Radio-frequency_identification
GS1: http://www.gs1.org/about/overview
EPCGlobal Framework: http://www.gs1.org/gsmp/kc/epcglobal
GDTI: http://www.gs1.org/barcodes/technical/idkeys/gdti

####
Cláudio Chaves is Managing Director at TCG Brasil in Santana de Parnaíba, São Paulo, Brazil. He has many years of experience in document processing applications, especially in the South American market.

New Artificial Intelligence – NAI

Alexander — Fri, 03 Aug 2012 14:53:07 +0000

Editor’s note: This is a guest post from Süleyman Arayan, Founder & CEO of German ITyX Group

Science has recently made enormous progress in the processing of natural-language texts. As a result, a new generation of artificial intelligence achieves recognition rates in clearly defined thematic areas (domains) that were previously impossible to reach with traditional keyword-based methodologies. This “New Artificial Intelligence” (NAI) is a highly efficient development based on a modern form of simulation. It is capable of learning and adopts the behavior of human experts in assessing and processing data in documents or files. With enough imitation, automating recurring, similar transactions yields an exploitable advantage in the efficiency of document understanding.

By “training” within a suitable content environment, NAI dynamically develops digital structures that operate similarly to a human brain. NAI does not “understand” written content (this would presuppose a human consciousness) but sticks to imitation only. The potential innovation of NAI lies in the fact that traditional AI-based rule models could only be used if the content or processing was understood and causally describable. With NAI, however, captured document content can be imitated without being understood.

Bayesian filtering, the k-nearest-neighbor algorithm (KNN), Support Vector Machines (SVM): all these algorithms share a major problem, namely the representation of text-based content as vectors. Although semantically representative, these vectors lose the word order of the original text. Instead, only the mere presence or the number of occurrences of single words is counted and recorded. This approach is bound to be insufficient in several cases, even for the best methods, since the semantics and predicate structure of the text get lost.
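
A small Python sketch illustrates the loss of word order. It uses a plain word count (a "bag of words") as a stand-in for the vector representation. The helper name `bag_of_words` and the two sample sentences are made up for this illustration.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count how often each word occurs, ignoring the order of the words."""
    return Counter(text.lower().split())


first = "the customer cancels the contract"
second = "the contract cancels the customer"

# The sentences differ in meaning, yet their word counts are identical.
print(first == second)
print(bag_of_words(first) == bag_of_words(second))
```

Any classifier that sees only these counts cannot tell who cancels what, which is exactly the kind of case in which the predicate of the text gets lost.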

The right combination

Through research projects with leading German universities and research institutes, ITyX has worked since 1999 with all known methods of automatic document and data processing across various structures and channels. Some new methods have been developed that dynamically incorporate the decisions of human staff into the classification decisions. A basic finding: each methodological approach described has its “domain” and thus its “raison d’être”.

A complete “domain-independent” content categorization can only be achieved by blending the results and calculating the true categories. In particular, knowledge of the methods and their typical weaknesses (failures) is needed for correction. Simple statistical approaches fail, as they multiply the errors in single domains, and as a result the overall performance of a digital mailroom (DMR) system inevitably decreases. The future of automatic text evaluation is scientific classification, driven by a combination with self-adapting procedures. In the end, human intervention is a must to drive NAI to its best.
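
One very simple way to picture such a blending is a vote in which each method's score is weighted by how reliable that method is known to be for the category in question. The Python sketch below is a generic illustration under that assumption and is not the ITyX procedure. The function name `blend`, the default weight of 0.5 and all scores and reliabilities are hypothetical.

```python
from __future__ import annotations


def blend(predictions: dict[str, dict[str, float]],
          reliability: dict[tuple[str, str], float],
          default_weight: float = 0.5) -> tuple[str, dict[str, float]]:
    """Blend category scores from several methods, weighted per method and category."""
    if not predictions:
        raise ValueError("at least one method must supply predictions")
    totals: dict[str, float] = {}
    for method, scores in predictions.items():
        for category, score in scores.items():
            # A low weight encodes a known weakness of this method for this category.
            weight = reliability.get((method, category), default_weight)
            totals[category] = totals.get(category, 0.0) + weight * score
    best = max(totals, key=totals.get)
    return best, totals


# Hypothetical scores from three methods for one incoming document.
predictions = {
    "bayes": {"invoice": 0.9, "complaint": 0.1},
    "knn": {"invoice": 0.3, "complaint": 0.7},
    "svm": {"invoice": 0.4, "complaint": 0.6},
}
# Hypothetical reliabilities, e.g. measured against decisions made by human staff.
reliability = {
    ("bayes", "invoice"): 0.2,
    ("knn", "complaint"): 0.9,
    ("svm", "complaint"): 0.8,
}
best, totals = blend(predictions, reliability)
print(best, totals)
```

In this sketch the confident but unreliable "invoice" vote is outweighed by the two methods that are known to be strong on complaints. The weights would have to be maintained by feedback from human staff.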

####

Since 1996 Süleyman Arayan, Founder & CEO of German ITyX Group, has been actively developing linguistic software solutions for Mailroom automation, Knowledge and Response Management. ITyX is a leading provider of intelligent ECM solutions based on auto-adaptive AI features. http://www.ityx.co.uk/

What do we talk about?

Alexander — Mon, 25 Jun 2012 14:58:00 +0000

Editor’s note: This is a guest post from Prof. Dr. Jürgen Lenerz from University of Cologne

We, i.e. adult and sound human beings, are able to talk about everything we want to talk about. But what do we want to talk about? We want to talk about our thoughts (feelings, experiences etc., i.e. about our mental states).

What are we able to think about? We think about situations in the world of our thoughts, which may be the real world (whatever this is) or a world in our imagination. Worlds may be explained by means of set theory: a world consists of individuals, sets of individuals, sets of sets of individuals, and further sets (of sets), etc.

An individual (in the sense of logical semantics) is what we somehow conceive of as a separate object with certain properties. This is rather vague but close enough to the point. So, Barack Obama is an individual, as is the left front wheel of my car, my weight or my memory. As a rule, in English, we use noun phrases (my car, the weight of ..., etc.) when we want to refer to individuals. In semantics, we say that a noun phrase (in the singular) denotes an individual.

We also think of properties of individuals. A property is described by the set of all and only those individuals which share this property. In English, adjectives or (intransitive) verbs denote such properties (soft, green, liquid,…to sleep, to work, to dance…).

We also think of relations between individuals: local, temporal or comparative relations, etc. Prepositions are a good example. The preposition on denotes the relation between an individual (the book) and a place, often referred to by another individual (the table): the book (is) on the table. Other expressions for such relations are comparative adjectives (X (is) bigger than Y) or some transitive verbs (John loves Mary).
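
For readers who like to see this spelled out, the set-theoretic picture can be mimicked directly with Python sets. This is only a toy sketch: the tiny world, the names `has_property` and `stands_in`, and the choice of strings for individuals are assumptions made for illustration.

```python
# A tiny world: individuals are represented by plain strings.
individuals = {"book", "table", "john", "mary"}

# A property is the set of all and only those individuals that share it.
asleep = {"john"}

# A relation is a set of ordered pairs of individuals.
on = {("book", "table")}
loves = {("john", "mary")}


def has_property(individual: str, prop: set) -> bool:
    """An individual has a property if it is a member of the property's set."""
    return individual in prop


def stands_in(relation: set, x: str, y: str) -> bool:
    """X stands in a relation to Y if the ordered pair (X, Y) is in the relation's set."""
    return (x, y) in relation


print(has_property("john", asleep))       # John sleeps
print(stands_in(on, "book", "table"))     # the book is on the table
print(stands_in(loves, "mary", "john"))   # the order of the pair matters
```

Because the pairs are ordered, John loves Mary does not entail Mary loves John, just as the book being on the table does not put the table on the book.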

These simple properties and relations may denote what we may call a state of something.

States are somehow stable for a certain period of time, but they may change. So, a flower may wilt (i.e. be in the state of bloom first and then wilted) or someone may fall in love (not be in love first and then be in love later). These are changes of states, or processes (like the process of aging). So, processes are changes from one state to another. The relation BECOME relates both states. In English, we find some intransitive verbs (to wilt, to grow, to dawn, …) or more complex constructions in which the predicate BECOME is visible in the additional verb (to grow older, to fall in love, to become bald, …).

States or processes may in turn be caused by some other state or process. We call these activities. We may say that X CAUSES Y, X being an individual or a state or a process and Y being a state or process which X brings about. If X brings about Y intentionally, we call X an agent, and we may express this with an additional predicate X DO CAUSE Y.

(There are more finely grained distinctions, depending on the temporal structure of the event (aspect or lexical aspect etc.), but this is the general picture.)

The lexical semantics and the syntax of these concepts may be dealt with in a different blog.

####

Prof. Dr. Jürgen Lenerz, born 1945, was professor of German Linguistics at the University of Cologne from 1985 until his retirement in 2011. His main interests are in the interaction of syntax, semantics, intonation and information structure in natural languages.
