skiljaadmin | Skilja

Auto Classification and Bias

skiljaadmin — Tue, 22 Jul 2025 12:01:16 +0000

Personal bias and individual opinions are a big issue in standardized business processing if they happen to influence the outcome of a process and the decisions made. Nobody wants to be subject to random changes in the outcome of a personal request – and yet it happens, because humans have a bias in how they see facts, based on their education, cultural background and even the mood they happen to be in at a certain time of the week. So in addition to different people making different decisions, you can even expect the same person to make different decisions during the week. You just look at a task differently on Monday morning than on Friday evening. The reason is so-called priming, which happens to all of us day by day through our experience, knowledge, physical condition, context and many other small factors.

In a recent article on the meaning of words we have shown how the sound of words influences our perception. There are many more linguistic associations that influence the way we think and behave, and they introduce bias. If, for example, I tell you that I am driving north across hilly terrain, would you expect the trip to be mostly uphill or downhill? In fact, most people associate moving north with going uphill and moving south with going downhill. An interesting study by psychologists Leif D. Nelson from UC San Diego and Joseph Simmons from Yale shows that these associations can actually be measured and produce some strange biases: people think it will take longer to travel north than south, that it will cost more to ship to a northern than to a southern location, and that a moving company will charge more for northward moves than for southward moves. A similar study concluded that people assume property is more valuable when it sits in the northern part of town. Of course, these opinions stem from the decision of the ancient Greeks to draw maps of the world with the north above the south. But it also shows clearly how much we are biased by our language, and north/south is only one of many linguistic associations we are exposed to.

Ancient mapmakers introduced the north/south association unwittingly, but lawyers do have an intention when they describe car accidents. While the defense might call a car accident “contact”, the plaintiff might say one car “smashed” the other. Elizabeth Loftus and John Palmer showed in a classic experiment that these labels really matter. They had a group of students watch the same series of traffic accidents and then asked them to estimate the speed of the cars when the accident occurred. When the scene was described as the cars having “contacted” one another, the students' average speed estimate was thirty-two miles an hour, whereas it was forty miles an hour when the cars were said to have “smashed” into one another. In another experiment, 14% of participants incorrectly remembered seeing shattered glass when told that the cars “hit” one another, whereas 32% made the same error when told that the cars “smashed” into one another. This shows that even a single word can change how people remember an event they witnessed themselves, making it very clear how priming can bias our decisions.

This brings us back to auto-classification. A classifier like the Skilja Content Classifier is trained with representative samples that are collected by a group of people. When applied to a set of documents, it will then make the same decision over and over again. Through machine learning, it represents the average opinion about the content of a document and will repeat it without tiring: the same on Monday morning and Friday evening, at a speed of several hundred thousand pages per hour. It will make errors (based on statistics), but not more than a human. And the errors are reproducible and can be corrected if necessary. If you have ever thought about compliance, this is a good example, because compliance does not say that you cannot make errors. It says that you need a reproducible, documented procedure for how you store and treat your documents. Auto-classification can help to achieve this goal. It is a great tool for boosting productivity, but it is even more helpful for avoiding bias, irreproducible results and non-compliance.

How Meanings of Words Change

skiljaadmin — Fri, 02 Dec 2022 11:57:32 +0000

We all know that our language is fluid and that words can change their meaning over time. Words become extinct and new words are created, but more often existing words are adapted to new circumstances. It is interesting to see how this happens over the course of years, but sometimes words change their meaning overnight.

In a study on “Statistically Significant Detection of Linguistic Change”, published last year (available online from arxiv.org), the researchers Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi and Steven Skiena used data mining to reveal the linguistic earthquakes that constantly change our language. Their findings are very interesting for anybody who works professionally with text analytics, as they reveal a lot about how semantics work in our language. Kulkarni et al. tracked these linguistic changes by mining corpora such as Google Books, movie reviews from Amazon and, of course, Twitter.

In pre-internet times, the usage and meaning of words changed relatively slowly. This can be seen in the metamorphosis of the word “gay” from its social meaning in the fifties to its purely sexual-orientation meaning in our time. This is nicely displayed in the word cloud view below:

A faster change occurred in the 1970s to the word “mouse”, when it gained the new meaning of “computer input device”. Later, within only a few years, the word “windows” became established internationally as the name of the Microsoft operating system.

Today the meaning of a word can change almost instantly. Before October 2012, the word “sandy” was an adjective meaning “covered in or consisting mostly of sand”. Then Hurricane “Sandy” approached. Almost overnight, this word gained an additional meaning as a proper noun for one of the costliest storms in US history.

Now this might not sound like a big deal to us, but just imagine the insurance industry and the thousands of e-Mails they suddenly receive referring to damage caused by Sandy! If they are using a static or rule-based classification system, they could easily misclassify these messages.

So this is a big challenge for automatic classification systems that have been trained with a machine learning algorithm on a specific set of documents containing words with a specific meaning. When these meanings change, or the usage of words suddenly widens, the classifier will make wrong decisions. The only way to cope with this problem in a living, productive system is continuous learning. The system must learn from user corrections (supervised learning), but also from good classifications that contain some new aspects and features. This is called unsupervised learning, and it is especially important because the number of documents that can be used for training this way is much higher than the number of manual corrections. Through unsupervised learning, or better, enhancement (which, by the way, also includes forgetting), the classification system will be able to cope with meanings that change over time. Abrupt changes as mentioned above will lead to a drop in the classification rate, from which the classifier will recover within a few days, just as humans also need a short time to adapt.
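To make learning from corrections, learning from confident results and forgetting a bit more concrete, here is a minimal Python sketch. It is not the Skilja Content Classifier algorithm: it substitutes a toy term-weight classifier that uses self-training (learning from its own high-confidence predictions) as the unsupervised part and an exponential decay of all weights as forgetting. The class name `DecayingTermClassifier`, the `decay` factor and the `confidence_threshold` are hypothetical choices for illustration.

```python
from collections import defaultdict


class DecayingTermClassifier:
    """Toy incremental classifier with forgetting (hypothetical, not the Skilja engine)."""

    def __init__(self, decay=0.99, confidence_threshold=0.6):
        self.decay = decay  # every update multiplies all existing weights by this factor
        self.confidence_threshold = confidence_threshold  # minimum confidence for self-training
        self.weights = defaultdict(lambda: defaultdict(float))  # class -> term -> weight

    def predict(self, tokens):
        """Return (label, confidence); (None, 0.0) if no token is known to any class."""
        scores = {}
        for label, terms in self.weights.items():
            norm = sum(terms.values()) or 1.0
            scores[label] = sum(terms.get(t, 0.0) for t in tokens) / norm
        total = sum(scores.values())
        if total == 0:
            return None, 0.0
        best = max(scores, key=scores.get)
        return best, scores[best] / total

    def _update(self, tokens, label):
        for terms in self.weights.values():
            for t in terms:
                terms[t] *= self.decay  # forgetting: old evidence slowly fades
        for t in tokens:
            self.weights[label][t] += 1.0

    def learn_correction(self, tokens, label):
        """Supervised: a user corrected or confirmed this label."""
        self._update(tokens, label)

    def learn_unlabeled(self, tokens):
        """Unsupervised enhancement: learn only from confident predictions."""
        label, confidence = self.predict(tokens)
        if label is not None and confidence >= self.confidence_threshold:
            self._update(tokens, label)
        return label, confidence


clf = DecayingTermClassifier()
clf.learn_correction(["storm", "damage", "roof"], "claims")
clf.learn_correction(["invoice", "payment", "due"], "billing")
print(clf.predict(["sandy"]))                            # (None, 0.0): "sandy" is unknown
print(clf.learn_unlabeled(["sandy", "storm", "damage"]))  # ('claims', 1.0)
print(clf.predict(["sandy"])[0])                         # claims: the new usage was picked up
```

In this toy run the word "sandy" is unknown until a confident prediction on a document that also mentions "storm" and "damage" adds it to the claims class; the decay factor controls how quickly old evidence is forgotten.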

On the Benefits of Page Classification

skiljaadmin — Thu, 29 Sep 2022 14:37:47 +0000

Classification deals with the categorization of objects. In our world of process automation and digitization, we often think of the objects to be classified as complete documents. Of course, it is important to understand the type of a document, and automatic classification can determine exactly this. But documents in a business context are normally complex and not homogeneous. When you receive a multipage document, you will typically browse through it to understand what it is about. A document in an envelope or a manila folder that you receive on your desk may consist of an opening letter, some notes, then the really important document, for example a court order, and maybe some attached standard forms. To understand which process to initiate and what to do with the document, you will therefore look at the individual pages and determine from their content what it is all about. Two or more processes may even originate from different pages within one document, where you might need to answer a request on one page and execute a payment from another.

Laera Classifier – Page Classification for Claims Processing

This is exactly what page classification in document understanding provides automatically. Instead of looking at the document as a whole, the algorithm classifies it page by page and derives decisions from the results. This is much more granular than classifying only the complete document. And it is different from automatic document separation, which physically splits the document. Of course, separation is another option based on page results, but it is error-prone and risky, as the document may be split incorrectly. Often separation is not necessary at all, and it is sufficient to structure and digitize the document page by page to achieve the intended process goals.

Page classification requires a solid infrastructure and an understanding of physical documents. We provide this with the Laera Classification Framework, which inherently understands structured documents. Going even further would be paragraph and sentence classification, but this will be a topic for another article. In Laera you can simply define a page classification scheme alongside the document classification. And you can even use the page classification results to determine the document type (e.g. by majority rule or by priority rule).
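The two rules for deriving a document type from page results can be sketched in a few lines. This is a minimal illustration, assuming page types are plain strings and photos are ignored in the majority vote; the function names, page-type labels and priority list are hypothetical and not Laera's API.

```python
from collections import Counter


def doc_type_by_majority(page_types, ignore=("photo",)):
    """Most frequent page type, skipping types in `ignore`.
    Ties go to the type encountered first (Counter.most_common ordering)."""
    counts = Counter(t for t in page_types if t not in ignore)
    return counts.most_common(1)[0][0] if counts else None


def doc_type_by_priority(page_types, priority):
    """First type in `priority` (highest first) that occurs on any page."""
    present = set(page_types)
    return next((t for t in priority if t in present), None)


pages = ["covering_letter", "expertise", "expertise", "calculation",
         "photo", "photo", "photo"]
print(doc_type_by_majority(pages))  # expertise
print(doc_type_by_priority(pages, ["attorney_letter", "expertise", "covering_letter"]))  # expertise
```

A majority rule reflects what most of the document is about, while a priority rule lets a single decisive page (say, an attorney's letter) determine the type even if it is outnumbered.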

The example above shows a real-life project that has been in production for more than a year.

In this case the customer receives thousands of car insurance claims per day. These are documents of 10 to 50 pages that contain many different kinds of pages, for example:

Covering letter or e-mail (“Anschreiben”)
Attorney’s letter
Expertise (“Gutachten”)
Calculation of repair (“Kalkulation”)
Declaration of Assignment (“Abtretungserklärung”)
Photos

Laera Classifier is able to determine all of these types automatically with a recognition rate in the high 90 percent range. Photo detection tags all photos and hides them from the subsequent recognition steps, as they would otherwise needlessly burden OCR and extraction. The page classification results allow the document to be structured and reordered optimally for the subsequent extraction of data from the different page types. Being able to define a specific extraction for each page type leads to a significant increase in extraction quality and speed. It also greatly eases the task of the clerks in the subsequent process steps, as they receive an already structured document (in this case an assembled PDF) with tags and always in the same order.
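The reordering and photo-hiding step can be sketched as follows, assuming every page has already been classified. The canonical order, the page-type labels and the function `structure_claim` are hypothetical illustrations, not the configuration used in the project described above.

```python
# Hypothetical page-type labels and canonical order for the claims example.
CANONICAL_ORDER = ["covering_letter", "attorney_letter", "expertise",
                   "calculation", "assignment", "photo"]


def structure_claim(pages):
    """pages: list of (page_number, page_type) tuples from page classification.
    Returns (ordered_pages, ocr_pages): all pages in canonical order, and the
    subset passed on to OCR/extraction (photos are hidden)."""
    rank = {page_type: i for i, page_type in enumerate(CANONICAL_ORDER)}
    # Unknown page types go to the end; sorted() is stable, so pages of the
    # same type keep their original relative order.
    ordered = sorted(pages, key=lambda p: rank.get(p[1], len(CANONICAL_ORDER)))
    ocr_pages = [p for p in ordered if p[1] != "photo"]
    return ordered, ocr_pages


pages = [(1, "covering_letter"), (2, "photo"), (3, "calculation"),
         (4, "expertise"), (5, "photo"), (6, "expertise")]
ordered, ocr_pages = structure_claim(pages)
print([n for n, _ in ordered])    # [1, 4, 6, 3, 2, 5]
print([n for n, _ in ocr_pages])  # [1, 4, 6, 3]
```

Because the pages are only reordered and tagged, never cut apart, a wrong page type affects the position of one page rather than splitting the document incorrectly.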

In this way, page classification plays an important role in streamlining the process, getting a bit closer to the way a person would look at the document and work from it.

Confusion Matrix

skiljaadmin — Wed, 10 Aug 2022 11:01:43 +0000

Understanding the quality of an automatic classification system is crucial for its acceptance and for any attempt to improve it over time. Quality means that we need to look at both the errors and the recognition rate. In classification terms these values are called precision and recall. For a given class, precision is the fraction of the documents assigned to the class by the classifier that actually belong to it, a/(a+b); recall is the fraction of the documents that should be in the class that were actually assigned to it, a/(a+c). In a previous post (Measuring Classification Quality) we have already discussed these measures and why they are so important. It is easy to depict them in a graphical visualization:
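As a quick worked example of the two formulas, with made-up counts for one class (not taken from the article): a documents correctly assigned, b documents wrongly imported into the class, c documents of the class exported to other classes. Returning 0.0 for an empty denominator is an assumed convention.

```python
def precision(a: int, b: int) -> float:
    """a: documents correctly assigned to the class; b: documents wrongly imported into it."""
    return a / (a + b) if a + b else 0.0


def recall(a: int, c: int) -> float:
    """a: documents correctly assigned to the class; c: documents of the class exported to others."""
    return a / (a + c) if a + c else 0.0


# Illustrative numbers: 90 correct, 10 imported, 30 exported.
print(precision(90, 10))  # 0.9
print(recall(90, 30))     # 0.75
```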

While these values might appear a little abstract, their advantage is that they are independent of the size of the set. But it might be more intuitive to talk about the actual number of documents that are imported into a class from other classes (set b) or exported and lost from the class (set c). This makes it obvious that recall and precision are related: if no threshold is applied, the overall precision and recall have the same value, because every document that is imported into one class must have been lost from another class. It also makes it easy to look at particular problem classes with a lot of imports (attractors) or exports (donors).
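The bookkeeping of imports and exports can be sketched in Python as follows. The documents and counts are invented for illustration. Every off-diagonal document counts once as an export of its actual class and once as an import of its predicted class, so the totals are equal and, when every document is assigned a class, overall precision equals overall recall.

```python
from collections import Counter


def confusion_matrix(pairs):
    """pairs: iterable of (actual_class, predicted_class). Rows = actual, columns = predicted."""
    return Counter(pairs)


def imports_exports(matrix, classes):
    """Count off-diagonal documents per class."""
    imports = {c: 0 for c in classes}  # set b: wrongly moved INTO the class
    exports = {c: 0 for c in classes}  # set c: lost FROM the class
    for (actual, predicted), n in matrix.items():
        if actual != predicted:
            exports[actual] += n
            imports[predicted] += n
    return imports, exports


def overall_precision_recall(matrix, classes):
    imports, exports = imports_exports(matrix, classes)
    correct = sum(matrix[(c, c)] for c in classes)  # the diagonal
    precision = correct / (correct + sum(imports.values()))
    recall = correct / (correct + sum(exports.values()))
    return precision, recall


classes = ["acq", "earn", "grain"]
pairs = ([("acq", "acq")] * 8 + [("acq", "earn")] * 2 +
         [("earn", "earn")] * 9 + [("earn", "acq")] * 1 + [("grain", "grain")] * 5)
m = confusion_matrix(pairs)
print(imports_exports(m, classes))           # ({'acq': 1, 'earn': 2, 'grain': 0}, {'acq': 2, 'earn': 1, 'grain': 0})
print(overall_precision_recall(m, classes))  # (0.88, 0.88)
```

Per class the values differ ("earn" is an attractor here, "acq" a donor), but summed over all classes imports and exports must balance.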

For a classification system, these values can be depicted in a so-called confusion matrix (also known as a contingency table or an error matrix), showing all relations between classes at a glance.

Our classification designer in the Skilja Content Classification system has a built-in visualization that lets you easily see the migration of documents into other classes. As an example, we have used our popular Reuters newswire test set and arranged the classes in 7 hierarchical groups. If you run a 90:10 split benchmark on all 5917 documents (which fortunately takes only a few seconds, because SCC is so incredibly fast), the confusion matrix obtained for the 51 classes looks as follows:

Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. Of course the user interface allows you to zoom in to look at the details.

The correctly classified documents are summed up on the diagonal, the exports are on the right upper side and the imports on the left lower side. In our case you see quite a few exports from the class “acq” (news on acquisitions) to “earn” (earnings). But this is to be expected, as these classes are close in topic, and a report on an acquisition often talks about the same topics (shares, revenue, board) as an earnings report. The user can now click on the box of the 57 exported documents in this display, open them in a list and review them to improve classification if desired. This makes it easy to drill down into the results and see exactly what can be improved. You will never achieve 100% precision, but remember that manual human classification also achieves only about 95% on average, as experiments have shown.

When the classes are organized in a hierarchy, the confusion matrix by Skilja also allows you to collapse the nodes and look at upper levels only. In this case the values of the hidden subclasses are summed up and shown for the parent class.

The diagonal now has two values. For example, 4,320 of the finance documents have been correctly classified, but 175 have been exported/imported within the finance category. Often you are only interested in the migration between the main parent classes, while errors under one parent are less problematic.
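Collapsing a hierarchy amounts to summing leaf cells into parent cells while keeping track of the errors that stay inside one parent, which gives the second diagonal value. A minimal sketch with a hypothetical parent mapping and invented counts; the function names are illustrative, not part of SCC.

```python
from collections import Counter


def collapse(matrix, parent_of):
    """Sum a leaf-level confusion matrix {(actual, predicted): count} into parent classes.
    Returns (parent_matrix, within_parent); within_parent counts leaf errors that
    stay inside the same parent."""
    parent_matrix = Counter()
    within_parent = Counter()
    for (actual, predicted), n in matrix.items():
        pa, pp = parent_of[actual], parent_of[predicted]
        parent_matrix[(pa, pp)] += n
        if pa == pp and actual != predicted:
            within_parent[pa] += n
    return parent_matrix, within_parent


def diagonal_values(parent_matrix, within_parent, parent):
    """The two diagonal values: (correct at leaf level, migrated within the parent)."""
    total = parent_matrix[(parent, parent)]
    return total - within_parent[parent], within_parent[parent]


leaf = Counter({("acq", "acq"): 8, ("acq", "earn"): 2, ("earn", "earn"): 9,
                ("earn", "acq"): 1, ("grain", "grain"): 5, ("grain", "acq"): 1})
parent_of = {"acq": "finance", "earn": "finance", "grain": "commodities"}
pm, within = collapse(leaf, parent_of)
print(diagonal_values(pm, within, "finance"))  # (17, 3)
print(pm[("commodities", "finance")])          # 1: a migration between parents
```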

Typically an organisation can assign a cost to each export and import. The cost can be different for each pair of classes where this happens. Migrations within a set of subclasses are often not very expensive if, for example, they concern documents that are processed by the same department anyway. On the other hand, an import into a class that leads to an automatic payment can be very expensive. This can be mitigated by assigning different thresholds to such classes, which SCC allows. The confusion matrix allows you to find out where these need to be applied. But the matrix can also be exported, and you can apply your own cost matrix to the results to determine which improvements make sense. We are currently working with a real client on a case study that shows these numbers in a real-world example at an insurance company. When available, this study will be published here. Stay tuned!
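Applying your own cost matrix to an exported confusion matrix could look like the sketch below. The class names, the costs and the flat default cost of 1.0 per misrouted document are assumptions for illustration only.

```python
def error_cost(matrix, cost, default_cost=1.0):
    """matrix: {(actual, predicted): count}; cost: {(actual, predicted): cost per document}.
    Returns (total_cost, cells ranked by cost, most expensive first)."""
    per_cell = {}
    for (actual, predicted), n in matrix.items():
        if actual == predicted:
            continue  # correct decisions cost nothing in this simple model
        per_cell[(actual, predicted)] = n * cost.get((actual, predicted), default_cost)
    ranked = sorted(per_cell.items(), key=lambda kv: kv[1], reverse=True)
    return sum(per_cell.values()), ranked


matrix = {("correspondence", "correspondence"): 100,
          ("complaint", "correspondence"): 10,
          ("correspondence", "payment_request"): 2,
          ("complaint", "complaint"): 40}
# Hypothetical: a wrong import into a class that triggers a payment is expensive.
cost = {("correspondence", "payment_request"): 50.0}
total, ranked = error_cost(matrix, cost)
print(total)      # 110.0
print(ranked[0])  # (('correspondence', 'payment_request'), 100.0)
```

Although only 2 documents were misrouted into the payment class versus 10 cheaper migrations, the ranking puts the payment class first, which is where a stricter threshold would pay off.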

Process as a Service

skiljaadmin — Wed, 05 May 2021 13:57:54 +0000

Imagine that you have created a powerful process for superb document automation using all kinds of advanced recognition, image processing and AI technologies. With these technologies it is possible today to automate almost any document-driven process that involves repetitive cognitive tasks like classification, indexing and decision making. VINNA by Skilja is a powerful platform that enables and orchestrates the LAERA components by Skilja to perform all these miracles. In addition, VINNA plugs in many other powerful tools from other technology companies, like barcode recognition, office format conversion, e-Mail normalization, PDF/A generation etc.

Now that this process is built and works well, the question arises how to integrate the new capabilities into your line-of-business applications and existing processes. File import and export is insecure and outdated. Full integration requires IT effort that might not be available.

Fortunately, there is a solution 🙂: by plugging in an Event Driven Activity (EDA), you can make ANY process accessible through a standard web service protocol. Simply by adding the EDA to an existing process, you make it callable as a RESTful service. The EDA can be the only starting point of a process, but it can also be added alongside existing starters like file importers, message queues or IMAP collectors that pump documents into the same process.

Typically you add one EDA starter and one or several EDA Reporters or Listeners. These are accessed by the EDA Consumer through a simple web protocol. The Consumer can be either a web page (as in the example below) or a Windows application, like an RPA client that automates the cognitive task scheduling. Both are provided as sample source code with your installation. At any stage of the process, the Consumer can optionally be notified via an event about the progress of the work item. When processing is finished, the results are retrieved either through an event or from a queue that is queried via REST.
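From the Consumer's side, the pattern is: submit a work item, then either wait for an event or poll the result queue. The sketch below shows only the shape of such a client. The base URL, the `/processes/{process}/workitems` path, the bearer-token header and the `state`/`result` fields are hypothetical placeholders, not the documented VINNA EDA interface; the sample source code shipped with the installation defines the real protocol. The request is built but not sent, and the polling function takes the transport as a parameter so it can be exercised without a server.

```python
import time
import urllib.request

# Hypothetical endpoint; the real EDA URL and paths come from your installation.
BASE_URL = "https://vinna.example.com/eda"


def build_submit_request(process, document_bytes, token):
    """Build (but do not send) a POST that starts a work item via a hypothetical EDA starter."""
    return urllib.request.Request(
        f"{BASE_URL}/processes/{process}/workitems",
        data=document_bytes,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/pdf"},
        method="POST",
    )


def poll_result(fetch_status, work_item_id, interval=1.0, max_attempts=30):
    """Poll until the work item is finished.
    fetch_status(work_item_id) -> dict with a 'state' key (hypothetical schema);
    it is injected so the transport can be swapped (urllib, requests, a test stub)."""
    for _ in range(max_attempts):
        status = fetch_status(work_item_id)
        if status.get("state") == "finished":
            return status.get("result")
        time.sleep(interval)
    raise TimeoutError(f"work item {work_item_id} not finished after {max_attempts} polls")


req = build_submit_request("classify", b"%PDF-1.7 ...", "secret-token")
print(req.get_method(), req.full_url)  # POST https://vinna.example.com/eda/processes/classify/workitems

# A stub transport standing in for the REST queue query.
states = iter([{"state": "running"}, {"state": "finished", "result": {"class": "invoice"}}])
print(poll_result(lambda _id: next(states), "42", interval=0))  # {'class': 'invoice'}
```

Because the Consumer only sees "submit" and "result", the steps inside the process can change without touching this client.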

VINNA Process as a Service schematically

The process can be deployed anywhere; there is zero setup. Simply use the URL of the service and, of course, the credentials for the secure and encrypted communication with the authentication service. The process can sit anywhere in the cloud and be used by any client worldwide that has access.

Through EDA, VINNA is used completely in slave mode, and the consumer is shielded from any complexity of the process.

The process can contain any number of steps and routes, including manual correction, approvals or even sending tasks to the crowd. The Consumer sees none of this; it is simply notified of the results through a standard API, irrespective of the process. So when the process is changed and enhanced, the Consumer can stay as it is. This is the true power of process as a service: “technology under the hood”, especially when combined with online learning that continuously improves the results.

Vinna Process as a Service for RPA with Classification Example

A nice example is shown in the graphic above. The Consumer (in this case any RPA client) chooses to use a classification service by selecting the process (through EDA) and the classification project that should be used. In this case both the classification activity and the classification Web designer access the same classification model in the database. Therefore an admin user can even modify the classification model and the taxonomy, or create a new one, while the process itself and the call to use it stay unchanged. This process also contains a manual correction step (Batch Review) that is used conditionally in case of uncertain results. The system can even send a link to a correction task as a web page, so that the user can make any decisions or corrections to submitted tasks herself without having to install anything. The corrected results are in turn aggregated and used for online learning. If several hundred users are using this process, it will quickly optimize the results automatically without any further effort. And at all times the actual processing can happen anywhere in the world on any cloud server.
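The conditional use of Batch Review can be thought of as a confidence threshold per class, which also ties in with the stricter thresholds for expensive classes discussed in the confusion matrix post. A minimal sketch; the threshold values, class names and result fields are hypothetical.

```python
def route(result, thresholds, default_threshold=0.8):
    """Return "auto" if the classification is confident enough for its class, else "batch_review"."""
    threshold = thresholds.get(result["label"], default_threshold)
    return "auto" if result["confidence"] >= threshold else "batch_review"


def partition(results, thresholds, default_threshold=0.8):
    """Split a batch into automatically accepted results and tasks for manual review."""
    auto, review = [], []
    for r in results:
        target = auto if route(r, thresholds, default_threshold) == "auto" else review
        target.append(r)
    return auto, review


# Hypothetical: a class that triggers payments gets a stricter threshold.
thresholds = {"payment_order": 0.98}
results = [
    {"id": 1, "label": "invoice", "confidence": 0.95},
    {"id": 2, "label": "payment_order", "confidence": 0.90},
    {"id": 3, "label": "invoice", "confidence": 0.50},
]
auto, review = partition(results, thresholds)
print([r["id"] for r in auto], [r["id"] for r in review])  # [1] [2, 3]
```

The corrected results from the review list are what would then feed back into online learning.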

This is the power of process as a service.

Reading Medical Reports

skiljaadmin — Fri, 27 Dec 2019 14:08:21 +0000

Medical reports are complex documents written by doctors who use their specific language and style to express not only facts but also hypotheses and suggestions. They are intended to be read by other doctors or experts who have deep knowledge of the subject at hand and can make judgements based on what they learn. In the end, the information contained therein is of vital importance for many decisions that have to be taken: medication, change of lifestyle, possible surgery, but also the cost of insurance.

So reading medical reports using artificial intelligence and machine learning is a challenging task. The use case described here uses Laera Information Extraction to assist experts at a life insurance company in calculating the health risk of individuals. In the end the decision needs to be taken by doctors, but the system can greatly assist them in sifting through the amount of text provided, because medical reports attached to a life insurance application can easily run to hundreds of pages.

Laera Information Extraction uses advanced AI methods to assess the risks contained in these reports and points them out to the experts. In a first step, the diagnoses are extracted based on the common ICD-10 code. But a simple word search is not enough, because each diagnosis needs to be put into its context. Here Laera's ability to assign multiple roles to an entity becomes very useful. Once a critical diagnosis is found, Laera makes an assessment based on several categories:

Is the polarity negative or positive? In most cases, symptoms mentioned in the reports are excluded, and therefore the diagnosis is negative. These findings are of no interest (of course the patient is happy about that). Only the positively confirmed ones are relevant for the risk assessment.
Is the diagnosis for the present or the past? Many reports contain a lot of history of what happened in the past. While the history might be interesting, the main focus is on the current situation.
Is the diagnosis for the person herself or maybe for the family? Family history (“father had a heart attack”) might be important but needs to be assessed differently.
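The three questions above can be thought of as attributes attached to each extracted diagnosis. In Laera these roles are assigned by trained models, not by rules, so the following sketch only illustrates how such attributes might be used downstream to separate the findings; the `Finding` structure, its field values and the `triage` function are hypothetical, and the ICD-10 codes are example inputs only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One extracted diagnosis with its context roles (hypothetical structure)."""
    icd10: str        # ICD-10 code of the diagnosis
    polarity: str     # "positive" (asserted) or "negative" (excluded)
    time: str         # "present" or "past"
    experiencer: str  # "self" or "family"


def triage(findings):
    """Drop excluded findings; split the rest into current, history and family groups."""
    current, history, family = [], [], []
    for f in findings:
        if f.polarity != "positive":
            continue  # excluded diagnoses are irrelevant for the risk assessment
        if f.experiencer == "family":
            family.append(f)  # assessed differently
        elif f.time == "present":
            current.append(f)  # main focus
        else:
            history.append(f)
    return current, history, family


findings = [Finding("I10", "positive", "present", "self"),
            Finding("E11.9", "negative", "present", "self"),
            Finding("J45.9", "positive", "past", "self"),
            Finding("I21.9", "positive", "past", "family")]
current, history, family = triage(findings)
print([f.icd10 for f in current], [f.icd10 for f in history], [f.icd10 for f in family])
# ['I10'] ['J45.9'] ['I21.9']
```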

Finding Diagnoses, polarity and roles

Laera intelligent extraction performs all these tasks and analyzes all pages in milliseconds using semantic and structural methods:

Find all diagnoses and symptoms
Find the polarity (is the diagnosis excluded or asserted?)
Determine the context (role, e.g. self or family)
Classify the paragraph by relevance
Present and auto-summarize results in ICD-10 terms
Highlight relevant areas in the document for quick visual confirmation
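The summarization step groups the findings into the ICD-10 hierarchy. A minimal sketch, assuming each finding is a (code, polarity) pair and using only the 3-character ICD-10 category as the grouping level (a full ICD-10 tree also has blocks and chapters above it); the function name is hypothetical and the codes are example inputs only.

```python
from collections import defaultdict


def icd10_summary(findings):
    """findings: iterable of (icd10_code, polarity) pairs.
    Returns {category: {polarity: sorted unique codes}}, keyed by the
    3-character ICD-10 category, e.g. "I21" for "I21.9"."""
    tree = defaultdict(lambda: defaultdict(set))
    for code, polarity in findings:
        tree[code[:3]][polarity].add(code)
    return {cat: {pol: sorted(codes) for pol, codes in pols.items()}
            for cat, pols in sorted(tree.items())}


findings = [("I21.9", "positive"), ("I21.4", "negative"),
            ("E11.9", "positive"), ("I21.9", "positive")]
for category, by_polarity in icd10_summary(findings).items():
    print(category, by_polarity)
```

Duplicate mentions of the same code across pages collapse into one entry, which keeps the summary short even for reports with hundreds of pages.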

Of course, just to make sure this does not get lost, the roles and assignments are not defined by rules but trained using machine learning from a few hundred labeled examples.

Summary of symptoms with polarity in ICD-10 tree

The customer using this system was able to reduce the time spent assessing an application by more than 50%, as the experts get a prepared data set that allows them to quickly jump to the relevant sections and make the decision.

It is also important to note that this is not a hard-coded special solution but an example of the application of the Laera Information Extraction product. Any other industry can use this approach to solve its specific requirements. Examples range from contract management to court documents, but of course standard extraction tasks can also be easily solved with Laera Information Extraction.

If you are interested in learning more about this use case or the application of AI extraction in your specific domain, please let us know via e-Mail at info(at)skilja.com and we will be happy to provide more information.

What’s another Year?

skiljaadmin — Tue, 20 Feb 2018 15:14:41 +0000

Now going into our seventh year of concentrated product development, we at Skilja are still full of innovation and ideas to create something really unique. In the last few months our main subject, “Artificial Intelligence”, has become an absolute top topic in the news: something we have quietly been doing for a long time has now reached the covers of magazines and the pages of newspapers. (As an example, look at this nice cover from The New Yorker.) This makes us proud, as we have ignited some of this new thinking, and we are confident that we will play a small but nevertheless important role in the years to come.

Skilja has the expertise and the knowledge to make artificial intelligence, or as we prefer to say “cognitive automation”, really work. This is now proven in numerous projects that we and our valued partners and integrators have realized in the past.

At the same time, every engine needs a chassis and wheels to be able to run. Last year was therefore also marked by the market introduction of our great new service-oriented and cloud-enabled document processing platform “Vinna”. The first few dozen projects have been delivered, and customers are happy with the stability, scalability and ease of deployment into their system environments. Especially for enterprise deployment, Vinna offers unique support for our customers. We will discuss the concepts of deployment and of managing multiple tenants across environments and in the cloud in an upcoming post.

With the platform now readily available and solving the basic problems of process, storage, security, formats etc., our machine learning and intelligent algorithms can be applied to real-world tasks even more easily. Some examples of what we and our partners did in the last year:

Claims management with automatic document separation and distribution of incoming e-Mail claims
Digitizing complete archives of contracts for more transparency on their content
Processing reports that are then inserted into an existing portal using robotic process automation (RPA)
Automatic comparison of insurance terms in old contracts with the current terms to provide a suggestion for upgrading the contract or to detect legal loopholes
Detection of duplicates, id-cards and photographs in incoming mail (scanned or e-Mail)

This is a small glimpse of what has happened and what is happening, and we hope that you, our readers, will not hesitate to reach out and make contact. We have a network of good partners worldwide and are happy to make the connection, or to work with you directly.

So what’s another year? It is amazing what can be achieved in this time span… More is coming in the next months – stay tuned.

Happy Birthday Skilja!

skiljaadmin — Tue, 28 Mar 2017 15:41:35 +0000

On a cold winter day in January 2012 we created Skilja – the Document Understanding Company – and incorporated it in Freiburg, Germany. It is hard to believe that 5 years have passed since then; we celebrated our 5th birthday on January 20, 2017. Happy birthday, Skilja!

From the beginning, the goal was to create a unique set of software solutions to solve the task of automatic document understanding using cognitive algorithmic methods. To achieve this goal, and to do it better once more, we drew on more than 20 years of experience in creating innovative software products, often pioneering and paving the way to new approaches. The current wave of so-called intelligent software and artificial intelligence, driven by the need to digitize business processes, came our way, and it is exactly where Skilja is positioned and optimally equipped to deliver on the promises made by others, because we have a fantastic team that knows what is needed and how it is done.

In the last five years Skilja has grown to a considerable size of now 15 highly skilled engineers, all of them with long experience and knowledge in document processing or other key areas like neural networks, statistical machine learning and image processing. We have a great partner network that uses our software components and solutions, so our team can concentrate fully and exclusively on developing the latest and greatest additions to cognitive intelligence that are needed to streamline processes. Skilja does not market much of this publicly; we prefer to deliver. A few glimpses of what has been achieved over the last years, described through real production projects:

Classification of incoming mail at one of the biggest insurance companies in Germany, processing more than 100,000 documents per day with supreme quality and unrivaled speed.
Complete solution for claims processing in car insurance, developed together with a leading BPO that processes and evaluates claims for more than 20 insurance companies in Germany.
E-Mail classification of orders and complaints for one of the larger online shopping platforms.
Automatic classification and document structuring of medical records within life insurance applications for a big insurance company, leading to a significant speed-up of their underwriting processes.
Automation of car loan processing for a South American bank, using content-based and ID-card classification to identify the necessary documents within a car loan and structure them, leading to a significant reduction of the effort for checking a loan for validity and completeness.

All of these projects have been achieved by our partners and we are proud that we were able to help them to be successful and give them the competitive edge in these cases.

We are looking forward to the next 5 years, we are still innovating and still have a bag full of good ideas and projects that we enjoy working on in this time. See you in 2022!

Vinna – 4th Generation Document Processing Platform

skiljaadmin — Fri, 27 May 2016 15:43:22 +0000

We are glad to introduce the latest version of our software “Vinna”. Vinna is Icelandic for “work”, and the name is self-explanatory, as Vinna is a fourth-generation platform for digital document processing.

Vinna was presented last week at the open house conference in Berlin that Skilja organized together with our dedicated partner ScaleHub (www.scalehub.com) and our partners TCG (www.tcgprocess.com) and Eucon (www.eucon.de).

Vinna Dimensions

The three dimensions of Vinna are:

Process: Definition of the flow of actions according to BPMN
Documents: Richly structured and open model for storage and enrichment of documents and other media
Activities: Process modules which are individually configured or created

Vinna is an open and process-oriented platform, which defines a process exactly the way it is optimally operated in a company. If you wish to process documents in batches, create batches. If you want to work in a case-centric or document-centric way, use these entities in the process. The design of the platform allows full flexibility, with any number of activities carried out in the process, including arbitrary routing decisions based on intermediate results.

Vinna is based on a data and document model that is stored completely and transaction-safely in a modern SQL database. Thanks to the modular architecture of the communication layer, Vinna can be scaled and distributed optimally. Processes with several hundred thousand items per day or more are easily manageable with intelligent load balancing.

Of course the architecture is service oriented (SOA) and can be deployed at will in the cloud or on premise or in mixed environments. All communication between services and databases is transaction based, securely encrypted and uses standard protocols. The platform is multi-tenant capable, supports staging, versioning, assisted deployment and provides built in SLA monitoring of all processes.

Vinna Tools

As part of the platform, three powerful tools with a pure HTML GUI are provided:

Process Designer: Definition of processes and activity management according to BPMN
Process Monitor: Operational view on running processes and SLA monitoring
System Monitor: Technical view of running services, the current state of system messages, and the current throughput per service and activity

Vinna Modules

Vinna modules are activities in the process. The platform comes with a number of standard activities, such as import, export, full-text OCR and PDF creation. Our own classification components and technology from selected third parties are available as options. In addition, individual activities can be created to customer requirements, or by customers themselves using the open .NET API or the REST interface.

Give Workflow a Miss – Go for Real Automation

skiljaadmin — Fri, 22 Apr 2016 16:33:45 +0000

Editor’s note: This is a guest post by Dr. Richard Cop of Interact Consulting.

Give workflows a miss: a provocative claim in the age of business process automation. Whenever we think of automated processing, we think in terms of how information reaches the relevant people quickly and efficiently. But especially for the automated processing of vendor invoices, this is the wrong approach: processing has to be automated as far as possible, so that only a small number of invoices actually has to be sent on a checking and approval journey through a workflow.

All beginnings are easy
Whenever we raise the subject of automated invoice processing in a first meeting with a customer, our counterpart talks workflow. Traditionally, the incoming electronic invoice or scanned invoice image is received by accounting, which determines the responsible person, sends the invoice on and waits for its release. Once it returns to accounting, booking, filing and payment are carried out, and that is all.

This may be a good approach when you are dealing with very few invoices, say five to at most ten per day, or fewer than 2,000 per year. At this volume, an existing e-mail system and a bit of organizational skill should be enough to keep track. However, as soon as the number doubles, triples or even quadruples, this approach quickly reaches its limits, because all of a sudden a member of the accounting staff spends most of the working day sending invoices on their electronic journey and checking the returned approvals.

Therefore, at volumes of 10,000 invoices or more, a better alternative has to be considered: one that avoids waiting time and gives a full overview of all invoices in circulation. At first glance, implementing a workflow system seems the obvious choice, since it makes it possible to track where invoices are and to intervene in the process at any time. But is this sufficient? Or is it rather a missed opportunity?

Invoices are a bonanza
A workflow can be a good thing, but honestly: doesn't it seem absurd to you, too, to send an invoice you receive every month to the same person time and again, and then to use the same old data over and over for booking? Wouldn't it be far more sensible if the system carried out such processes on its own?

To enable an intelligent processing system to do so, it has to be supplied with high-quality data at a very early stage. At the start of the automated process, paper invoices therefore have to be scanned and recognized (read automatically), while the data records delivered with electronic invoices have to be converted into a standard format, so that all further processing steps can be executed uniformly.
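To make "converting into a standard format" concrete, here is a minimal Python sketch that maps two differently shaped inputs onto one common invoice record. Both input layouts are assumptions for illustration: a hypothetical OCR result with German-style number and date strings, and a hypothetical structured e-invoice record with ISO dates. The field names do not come from any specific product or standard.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import Any, Dict


@dataclass(frozen=True)
class Invoice:
    """Common internal format; `source` is kept for auditing but ignored in comparisons."""
    vendor_id: str
    invoice_no: str
    invoice_date: date
    total: Decimal
    currency: str
    source: str = field(default="", compare=False)


def from_ocr(fields: Dict[str, str]) -> Invoice:
    # Hypothetical OCR output: all strings, amounts like "1.234,56", dates like "15.03.2016".
    total = Decimal(fields["amount"].replace(".", "").replace(",", "."))
    day, month, year = (int(part) for part in fields["date"].split("."))
    return Invoice(fields["vendor"], fields["number"], date(year, month, day),
                   total, fields.get("currency", "EUR"), source="scan")


def from_electronic(record: Dict[str, Any]) -> Invoice:
    # Hypothetical e-invoice record: ISO dates and dot-decimal amounts.
    return Invoice(record["seller_id"], record["id"],
                   date.fromisoformat(record["issue_date"]),
                   Decimal(str(record["payable_amount"])), record["currency"],
                   source="electronic")
```

Once both channels produce the same `Invoice` type, every later step (vendor lookup, contract or order matching) only has to handle one format. Converting amounts with `Decimal` rather than `float` avoids rounding surprises in monetary values.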

Once the invoice data is available in a consistent electronic form, it can be processed automatically. The goal of optimized business process automation must be to store knowledge in such a way that data can be processed with a high degree of automation, in other words, without human intervention wherever possible.

When analyzing your everyday invoices, you will quickly realize that a large share of them recur. This means that the information and knowledge on how to deal with these invoices already exists within your organization. This knowledge may reside in systems or in the heads of your staff. If this process is to be automated, it is important to capture this information and prepare it so that the processing system can make all necessary decisions largely on its own, based on this knowledge.

For example, there is project ABC, and it is known that the invoices of supplier XYZ always belong to this project and are always booked the same way. At the same time, these invoices are always checked and approved by the same person in the same department. The binding reference for checking the invoices is the contract negotiated with supplier XYZ for project ABC.

It makes no sense to enter this information into the accounting system every time the invoice comes in. Nor is there any point in e-mailing the invoice to the same person each time in the hope that she or he will return the approval before the cash discount period expires.

In this case, we are talking about an automated workflow in which a vendor's basic information is stored in the processing system and used automatically as a booking and approval template whenever such an invoice comes in. Processing invoices this way, however, requires that invoice recognition is carried out consistently, for example by identifying the vendor through its unique vendor master record in the accounting system.
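The booking and approval template can be pictured as a lookup keyed by the vendor master record ID. In this minimal Python sketch, the template fields, the vendor ID `XYZ-4711`, the account, cost center and approver values are all hypothetical; a real system would load templates from the accounting system rather than a hard-coded dictionary.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Optional


@dataclass(frozen=True)
class VendorTemplate:
    """Hypothetical booking and approval template stored once per vendor."""
    gl_account: str
    cost_center: str
    project: str
    approver: str


# Hypothetical template store, keyed by the vendor master record ID.
TEMPLATES: Dict[str, VendorTemplate] = {
    "XYZ-4711": VendorTemplate(gl_account="6300", cost_center="CC-200",
                               project="ABC", approver="project.lead.abc"),
}


def prebook(vendor_id: str, net_amount: Decimal) -> Optional[Dict[str, object]]:
    """Build a booking proposal from the vendor template, or None if there is none."""
    template = TEMPLATES.get(vendor_id)
    if template is None:
        return None  # unknown vendor: no stored knowledge, so a person has to decide
    return {
        "vendor_id": vendor_id,
        "amount": net_amount,
        "gl_account": template.gl_account,
        "cost_center": template.cost_center,
        "project": template.project,
        "approver": template.approver,
    }
```

The lookup only works if recognition reliably yields the vendor master record ID; a misread vendor falls through to `None` instead of being booked against the wrong template.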

Why do contracts exist?
The cynical answer might be: so that they can be filed away somewhere and forgotten until one day the big search begins. Hopefully not, but most of the companies we have met so far have no structured contract management, or only a very limited one. As a result, the contracted goods and services are often only vaguely known, and simple information such as the contract term or notice period is not available at short notice. If contracts are recorded in a system in a structured way, the information they contain can be used to automate the processing of all invoices from the respective contracting party. Payment terms, rates and conditions, installment plans, periods, budgets and so on are generally defined in the contract, so the electronically available invoice data can be matched against the corresponding contract information.

If there is a leasing or rental agreement, for example, it is known when the invoices will arrive, which amounts have to be paid and how they have to be booked. If no deviation from the contract is found, there is no need whatsoever for a checking and approval process. Having the same person check and approve these invoices time and again, and booking them the same way each time, is nothing but a waste of precious time.
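For a fixed-amount contract such as a lease, the check against the contract can be as simple as comparing vendor, date and amount. This minimal Python sketch assumes a hypothetical contract record with a fixed monthly amount and a term; real contracts carry more conditions (installment plans, price escalation, budgets) that would add further checks.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class Contract:
    """Hypothetical structured contract record with a fixed monthly amount."""
    vendor_id: str
    monthly_amount: Decimal
    start: date
    end: date


@dataclass(frozen=True)
class Invoice:
    vendor_id: str
    invoice_date: date
    total: Decimal


def contract_deviations(inv: Invoice, contract: Contract) -> List[str]:
    """Return a list of deviations; an empty list means the invoice matches the contract."""
    issues: List[str] = []
    if inv.vendor_id != contract.vendor_id:
        issues.append("vendor does not match contract")
    if not (contract.start <= inv.invoice_date <= contract.end):
        issues.append("invoice date outside contract term")
    if inv.total != contract.monthly_amount:
        issues.append(f"amount {inv.total} differs from contracted {contract.monthly_amount}")
    return issues
```

An empty result means the invoice can be booked without an approval round; any entry in the list is exactly what a specialist needs to look at.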

For contracts with variable invoice amounts, similar procedures can be applied. For example, the often extensive telephone bills only need to be checked when there is an anomaly, that is, a deviation such as an unusual invoice amount. Effectively automating the processing of invoices with a contract reference therefore requires such procedures, both to process these recurring invoices fully automatically and to detect deviations, so that your specialists can systematically focus on these anomalies.
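One simple way to flag an "unusual invoice amount" for a variable bill is to compare it with the median of earlier invoices from the same contract. In this minimal Python sketch, the 25 % relative tolerance and the minimum of three earlier invoices are arbitrary assumptions, not values from the post; in practice they would be tuned per contract type.

```python
from decimal import Decimal
from statistics import median
from typing import Sequence


def is_unusual(amount: Decimal, history: Sequence[Decimal],
               tolerance: Decimal = Decimal("0.25"), min_history: int = 3) -> bool:
    """Flag an amount that deviates from the historical median by more than `tolerance`."""
    if len(history) < min_history:
        return True  # too little history to judge: let a specialist look at it
    typical = median(history)
    if typical == 0:
        return amount != 0
    # Relative deviation from the typical amount, e.g. 0.30 means 30 % off.
    return abs(amount - typical) / typical > tolerance
```

With earlier phone bills of 120.40, 131.10, 118.75 and 125.00 (median 122.70), a new bill of 127.90 passes automatically, while a bill of 480.00 is flagged. The median is used instead of the mean so that a single past outlier does not shift the baseline much.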

An order is already entered in the books
The potential for automating invoices with an order reference goes even further. Many organizations operate standardized purchasing systems that manage the ordering of goods and services. Such systems typically handle the requisition, its approval, the request for quotation and finally the placement of the purchase order with the supplier. Approval of the requested order is an essential part of this procedure. Moreover, all relevant information on prices and conditions, account assignment, cost centers, projects and so on is already stored when the order is placed. It therefore makes no sense at all to run through these steps again and to check and record all the information manually once more when the invoice arrives.

Using automated order matching, the processing system checks whether the invoice deviates from the order. If no deviation is detected, there is no reason to check the invoice manually and run through an approval process all over again. That, too, would be a complete waste of precious time.
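Order matching can be sketched as a line-by-line comparison of invoice and purchase order. This minimal Python sketch makes several assumptions not stated in the post: lines are matched by item number, each item appears only once per order, partial invoicing is allowed (invoiced quantity may be lower than ordered), and prices must match within a configurable tolerance. It also does not track quantities already invoiced on earlier invoices for the same order.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List


@dataclass(frozen=True)
class Line:
    item: str
    quantity: int
    unit_price: Decimal


def match_order(invoice_lines: List[Line], order_lines: List[Line],
                price_tolerance: Decimal = Decimal("0.00")) -> List[str]:
    """Compare invoice lines with purchase order lines; an empty list means no deviation."""
    ordered: Dict[str, Line] = {line.item: line for line in order_lines}
    deviations: List[str] = []
    for line in invoice_lines:
        po = ordered.get(line.item)
        if po is None:
            deviations.append(f"{line.item}: not on the purchase order")
            continue
        if line.quantity > po.quantity:
            deviations.append(f"{line.item}: invoiced {line.quantity}, ordered {po.quantity}")
        if abs(line.unit_price - po.unit_price) > price_tolerance:
            deviations.append(f"{line.item}: price {line.unit_price} vs ordered {po.unit_price}")
    return deviations
```

Because prices, account assignment and approval were already settled when the order was placed, an empty deviation list is enough to book the invoice directly.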

The workflow spiral
And yet it moves, the workflow spiral, but only for the few invoices that come from (as yet) unknown suppliers or have neither a contract nor an order reference. As far as we know, in a well-positioned organization these invoices make up only a very small share, often in the single-digit percent range. Making a workflow system the heart of an automation project because of this small share of invoices, and thereby forgoing the actual automation potential, is as wrong as missing cash discount periods. In short, it is a missed opportunity.
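The triage described in this post can be summarized as a small routing function. In this minimal Python sketch, the three route names are hypothetical, and a deviation list of `None` stands for "no contract or order reference", while an empty list means "reference found, no deviation".

```python
from enum import Enum
from typing import List, Optional


class Route(Enum):
    AUTO_POST = "post automatically"
    EXCEPTION = "specialist reviews the deviations"
    WORKFLOW = "classic approval workflow"


def route_invoice(vendor_known: bool,
                  contract_deviations: Optional[List[str]],
                  order_deviations: Optional[List[str]]) -> Route:
    """Decide how an invoice is processed; None means there is no such reference."""
    checks = [d for d in (contract_deviations, order_deviations) if d is not None]
    if not vendor_known or not checks:
        # Unknown supplier, or neither contract nor order reference: the workflow case.
        return Route.WORKFLOW
    if any(checks):  # at least one non-empty deviation list
        return Route.EXCEPTION
    return Route.AUTO_POST
```

Only the `WORKFLOW` branch needs an approval journey; the bulk of recurring invoices should end up in `AUTO_POST`, with `EXCEPTION` covering the deviations your specialists focus on.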

(This post first appeared in German in Interact Magazin on the Interact homepage.)

####

DR. RICHARD COP is co-founder and CEO of Interact Consulting: «What impact does technology have on our society, and what impact do we have on technology? These questions are central for me, and I also addressed them in my dissertation on structural change in telecommunications. I want to optimize processes, harmonize technology, organization and people, and above all promote interaction. That is why, as an entrepreneur, I now design fully integrated processes together with our customers.»