Technology | Skilja

The End of OCR: How Word-Level Understanding Changes Everything

Alexander — Thu, 30 Oct 2025 16:10:19 +0000

For more than half a century, OCR—Optical Character Recognition—has meant one thing: machines deciphering single characters. From early template-matching in the 1960s to the statistical engines of the 2000s, OCR has always approached reading as a mechanical process. It segmented text into character-shaped fragments and tried to analyze what each one meant, letter by letter. But that era is now ending. Modern systems don’t read characters at all—not individually. They read words, phrases, and even meaning. OCR has evolved into something fundamentally different, and this shift delivers a new level of accuracy, fluidity, and naturalness that previous generations could not reach.

This is what Skilja has done with Lesa, our deep-learning, transformer-based system designed to read like humans do: holistically, contextually, and intelligently.

A Brief Look Back: From Characters to Context

Traditional OCR started as an analog process (hence the O for Optical) and went through several stages:

1950s–1980s: Rigid template matching, effective only with perfect typewritten pages.
1990s: Feature extraction and early machine learning, better but still brittle.
2000s–2010s: Statistical modeling and improved analysis workflows, good enough for books, printed forms and constrained hand print, but always character-bound.

Even at its best, classical OCR remained a guessing game. It mistook 1 for l, turned smudges into glyphs, and struggled with anything outside its narrow expectations. Especially handwriting.

The problem was not the algorithms; it was the focus on characters. Humans read words, not characters, as anyone who has watched a child learn to read will confirm.

The Deep Learning Shift: From Decoding Shapes to Understanding Language

Transformers changed everything. Instead of interpreting characters, transformer-based models interpret sequences, context, and linguistic probability. They don’t see text as isolated shapes but as parts of sentences, paragraphs, and concepts.

This allows Lesa to:

recognize entire words, not just letters,
use surrounding text as context,
maintain coherence across full pages,
and adapt to different visual styles.

Reading becomes a language-understanding task, not a pixel-decoding task.

Lesa: Built for Word-Level Intelligence

Lesa is designed from the ground up to treat a document as a linguistic landscape. Using a transformer architecture trained on diverse text and real-world images, Lesa doesn’t ask, “What is this character?” It asks, “What does this word say—and how does it fit into the sentence?” This matters since now we have:

Far fewer errors: No more character-by-character error cascades.
Natural output: Clean spacing, correct punctuation, coherent text.
Font and layout robustness: Works across stylized text, messy receipts, tables, multi-column pages.

One of the most transformative outcomes of word-level understanding is that constrained handwriting (the kind found in forms, notes, medical records, delivery slips, and corporate paperwork) now works much better. Messy capital letters, box-filled handwriting, and half-printed forms suddenly become highly readable. Handwriting recognition is now within reach and can be applied routinely.

Calling this the “end of OCR” isn’t hype. It’s a technical recognition of what has changed:

Traditional OCR = character recognition
Modern OCR = language understanding

Lesa is part of a new generation of systems that read documents the way humans do—by interpreting words, not decoding symbols.

OCR as we knew it is over. Something better has replaced it. Lesa represents this new era.

Intelligent Document Processing and Enterprise Security

Alexander — Sun, 22 Jun 2025 11:15:06 +0000

For those of us who have long worked in Intelligent Document Processing (IDP), or Capture as it was simply called before, it is a very pleasant observation that IDP, which has been around for a long time, is attracting more and more interest in general CEO-level discussions and is now seen as an integral part of process optimization.

On the one hand, this is due to the rise of AI technologies and the resulting understanding of what can be achieved with algorithms that mimic human understanding. AI has arrived in the mainstream (and even dominates mainstream discussions) and is now generally understood to be capable of performing cognitive tasks that humans perform. We have known and preached this for a long time, but it is a good development that our former niche is becoming standard.

On the other hand, IDP is becoming more and more integrated into the main business processes. In the past, Capture was almost always departmental. It was only allowed in a (badly lit) corner of an enterprise, mainly because no Capture system was able to fully integrate into enterprise IT and, most importantly, comply with all security rules established for enterprises.

This has changed for the better in the past few years. At Skilja we have invested a lot of effort in strictly following all security requirements so that Vinna (our enterprise platform) can become a part of enterprise IT. On the one hand, this means that during development we run frequent security screenings and penetration tests to ensure the utmost security of the software. For example, Vinna and Skilja software have been Veracode Verified for many years. On the other hand, we independently make sure to follow all defined industry standards very closely.

The most important aspect of being allowed to run in an enterprise is authentication and authorization. Typically an enterprise will not (or only grudgingly) allow an application to store user names or passwords outside of its internal identity management software. A platform should not have its own user management. This is a no-go for many customers. Therefore Vinna has from the beginning used roles that are then mapped to users in the enterprise user directory. In Vinna 3.0 the user still had to enter the password, which was sent (encrypted) to the authentication backend.

Since version 3.1 our Vinna platform has used the OAuth2 protocol for the authorization of users. OAuth2 itself does not directly deal with the authentication of users and clients. Instead the authentication backend is required to grant authorization and thus access. OAuth2 is supported by many backends, for example Microsoft Azure AD and Keycloak. All communication with the backend (Resource Server) is bundled in the Skilja Authorization Server, which is used by all Vinna Platform services, Clients and Activities.

Vinna provides authentication via different methods:

Authorization Code Flow with PKCE via a web browser
User name and password authentication via the password grant flow
Client authentication via the client credentials grant flow

Authorization Code Flow with PKCE (Proof Key for Code Exchange) is the current best practice when logging into any client application because it avoids entrusting a client application with user credentials.

A user who wants to log into a web site is redirected to the login page of the Skilja Authorization Server. If the user has not yet authenticated to the Skilja Authorization Server, they enter a user name and password into the login form. After the credentials have been verified, the user is redirected back to the web site where they started, along with an authorization code. The web site then exchanges this authorization code with the Authorization Server for an access token. The authorization code part prevents user credentials from ever being entered into a potentially untrusted client application, and because the access token is obtained in this separate exchange, it is never passed around in redirect URLs shown in the browser's address bar, which prevents accidental token leakage by copying and pasting the URL. The PKCE part binds the authorization code to a one-time secret (the code verifier) known only to the client that started the flow, so an intercepted authorization code cannot be redeemed by anyone else.
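The PKCE part of the flow can be illustrated with a few lines of Python. This is a minimal sketch of the S256 method as defined in RFC 7636, not code taken from Vinna or the Skilja Authorization Server; the function names are our own and chosen for illustration only.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    """Create a random, URL-safe code verifier (43 characters for 32 bytes)."""
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(code_verifier: str) -> str:
    """Derive the S256 challenge: BASE64URL(SHA256(verifier)) without padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verifier_matches(code_verifier: str, stored_challenge: str) -> bool:
    """The check an authorization server performs when the code is redeemed."""
    return secrets.compare_digest(make_code_challenge(code_verifier), stored_challenge)
```

The client sends only the challenge with the initial redirect and keeps the verifier to itself. When it later exchanges the authorization code for a token it presents the verifier, and the server recomputes the challenge and compares.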

The client credentials grant flow listed above allows client credentials to be registered with the authorization service, along with the claims that each client ID may receive. This type of authorization is intended for machine-to-machine communication and not for typical user interactions. In this flow, a client ID and matching client secret are sent to the authorization service, which issues an access token. Once the access token expires, the client ID and client secret can be used again to obtain a new access token. In some cases a refresh token is also made available that can be used to acquire a new access token.
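To make the machine-to-machine flow more concrete, here is a small sketch of how a token request for the client credentials grant is typically put together according to the OAuth2 specification (RFC 6749, section 4.4). The client ID, secret and scope in the usage note are invented for illustration, nothing is sent over the network, and the actual request format expected by the Skilja Authorization Server may differ.

```python
import base64
from urllib.parse import urlencode


def build_token_request(client_id: str, client_secret: str, scope: str = ""):
    """Return (headers, body) for a client credentials token request.

    The client authenticates with HTTP Basic; the grant type goes into
    the form-encoded body.
    """
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope
    return headers, urlencode(form).encode("ascii")


def needs_new_token(issued_at: float, expires_in: float, now: float, leeway: float = 30.0) -> bool:
    """True once the token is expired or about to expire (within `leeway` seconds)."""
    return now >= issued_at + expires_in - leeway
```

A service would call `build_token_request("my-activity-server", "my-secret", "my.scope")`, POST the result to the token endpoint, and repeat the call whenever `needs_new_token` reports that the cached token is about to expire.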

Overall, this new architecture of authorization and authentication allows an enterprise to integrate Vinna and all IDP activities that run within Vinna into its environment and make them available to all users. This opens up many more options for using IDP in its processes than if they were isolated in their own network.

On the Benefits of Page Classification

skiljaadmin — Thu, 29 Sep 2022 14:37:47 +0000

Classification deals with the categorization of objects. In our process automation and digitization world, we often think of the objects as complete documents that need to be classified. Of course, it is important to understand what type a document is, and automatic classification can determine exactly this. But documents in a business context are normally complex and not homogeneous. As a person, when you get a multipage document you will typically browse through it to see what is in it and understand what it is about. A document in an envelope or a manila folder that you receive on your desk may consist of an opening letter, some notes, then the really important document, for example a court order, and maybe some attached standard forms. To understand which process to initiate and what to do with the document, you will therefore look at the pages and decide from their content what this is all about. Maybe even two or more processes originate from different pages within one document, where you might need to answer a request from one page and execute a payment from another page.

Laera Classifier – Page Classification for Claims Processing

This is exactly what page classification in document understanding is able to provide automatically. Instead of looking at the document as a whole, the algorithm will classify page by page and derive decisions from the results. This is much more granular than taking only the complete document. And it is different from automatic document separation, which physically splits the document. Of course, separation is another option based on page results, but it is error prone and risky, as the document may be ripped apart incorrectly. Often this is not necessary at all, and it is sufficient to structure and digitize the document page-wise to achieve the intended process goals.

Page classification requires a solid infrastructure and understanding of physical documents. We provide this with the Laera Classification Framework that inherently understands structured documents. Going even further would be paragraph and sentence classification but this will be a topic for another article. In Laera you can simply define a page classification scheme alongside the document classification. And you can even use the page classification results to determine the document type (e.g. by majority rule or by priority rule).
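The two rules mentioned here for deriving a document type from page results can be sketched in a few lines. This is only an illustration of the idea, not how Laera implements it; the class names are made up.

```python
from collections import Counter
from typing import List, Optional


def document_type_by_majority(page_classes: List[str]) -> str:
    """Majority rule: the class assigned to the most pages wins.

    On a tie, the class that appears first in the document is returned.
    """
    return Counter(page_classes).most_common(1)[0][0]


def document_type_by_priority(page_classes: List[str], priority: List[str]) -> Optional[str]:
    """Priority rule: the first class in `priority` that occurs on any page wins."""
    present = set(page_classes)
    for cls in priority:
        if cls in present:
            return cls
    return None  # none of the prioritized classes was found
```

With the page results `["cover_letter", "expert_report", "expert_report", "photo", "assignment"]`, the majority rule yields `expert_report`, while a priority list starting with `assignment` would type the whole document as an assignment even though only one page carries that class.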

An example of a real-life project that has been in production for more than a year is shown above.

In this case the customer receives thousands of car insurance claims per day. These are 10 to 50 page documents that contain many different kinds of pages, for example:

Covering letter or e-mail (“Anschreiben”)
Attorney’s letter
Expert report (“Gutachten”)
Repair cost calculation (“Kalkulation”)
Declaration of Assignment (“Abtretungserklärung”)
Photos

Laera Classifier is able to automatically determine all of these types with a rate in the high 90% range. Photo detection tags all photos and hides them from the following recognition steps, as they would otherwise unnecessarily hold up the OCR and extraction steps. The page classification results make it possible to structure and reorder the document in an optimal way for subsequent extraction of data from the different page types. Being able to define specific extraction for each page type leads to a significant increase in extraction quality and speed. It also greatly eases the task for the clerks in the subsequent process steps, as they already receive a structured document (in this case an assembled PDF) with tags and always in the same order.

In this way page classification plays an important role in streamlining the process, getting a bit closer to the way a person would look at the document and work from it.

Document Separation Revisited

Alexander — Thu, 08 Sep 2022 09:50:54 +0000

One of the frequently overlooked and really difficult problems in document automation, which is also really annoying in daily processing, is the automatic separation of a stack of documents into single meaningful documents and assignment to a document class. In traditional scanning processes this is often achieved by manual preparation of the paper and sticking a barcode as a document separator on each first page. But this is labor intensive and error prone. In addition as we are going more and more digital, even with paper based processes, normally the processing facility does not have access to the paper any more. So the goal would be to simply scan the whole stack and have it separated by an intelligent algorithm.

Fortunately this is readily available today, for example from the Skilja technology stack as a built-in feature of the Laera classifier. This does not mean it is easy. It requires quite some experience and infrastructure to manage several interdependent steps of classification and separation in a stable and reliable way. This is what Laera provides out of the box.

How does document structuring work in principle? Well, in exactly the same way (our credo!) as a human would do it. Go through the stack page by page, determine what page type it is, if it is related to the previous page or if a new topic/form starts. Then check page numbers for security if they are present. If in doubt, go back one or a few pages to check back and then make your decision to separate.

Laera Document Separation

In an AI classifier such as Laera, this is built into a sequence of algorithms. The system is trained on a sample that is already correctly separated. Laera learns for each page whether it is a first, a middle, an end or a single page. The user does not have to specify this explicitly, as the Laera AI finds it out automatically from the samples and hides this complexity from the users. The training interface just requires you to drop the single documents into the training set. It is not required to specify an exact number of pages (range) for each document type. Laera automatically takes into account that these can vary for each document type. However, if you know the page count you can also restrict the allowed pages, for example for forms that always consist of a single page.

Laera will then learn the structure and apply it to the whole document stack of unseparated single pages during runtime. Each page is analyzed. A second classifier (we could call it a “meta-classifier”) will then take these results and find the most probable separation based on the trained model. So even if a first page has not been identified as a first page there is a chance that the meta-classifier still will see it as more probable to be a first page and correctly separate. A third classifier will then determine the document type for the separated documents. As usual in Laera all this is very fast and a separation of a stack of 200 pages with 150 document types takes less than 30 seconds in total.
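The idea behind the second step, finding the most probable separation rather than trusting each page result on its own, can be illustrated with a small dynamic-programming sketch. This is explicitly not Laera's algorithm, which is not described here. It is a generic Viterbi-style search, assuming each page comes with a probability for the four roles (first, middle, end, single) and that only structurally valid role sequences are allowed.

```python
import math

# Which page role may follow which (a document is first..end, or one single page).
ALLOWED = {
    "first": ("middle", "end"),
    "middle": ("middle", "end"),
    "end": ("first", "single"),
    "single": ("first", "single"),
}
START = ("first", "single")   # roles that open a document
FINISH = ("end", "single")    # roles that close a document


def most_probable_roles(page_scores):
    """page_scores: one dict per page mapping role -> probability.

    Returns the valid role sequence with the highest joint probability.
    """
    def logp(p):
        return math.log(max(p, 1e-9))  # avoid log(0)

    # best[role] = (log score, path) of the best valid path ending in that role
    best = {r: (logp(page_scores[0].get(r, 0.0)), [r]) for r in START}
    for scores in page_scores[1:]:
        nxt = {}
        for prev, (score, path) in best.items():
            for role in ALLOWED[prev]:
                cand = score + logp(scores.get(role, 0.0))
                if role not in nxt or cand > nxt[role][0]:
                    nxt[role] = (cand, path + [role])
        best = nxt
    finals = [best[r] for r in FINISH if r in best]
    return max(finals, key=lambda t: t[0])[1]


def document_starts(roles):
    """Indices of pages where a new document begins."""
    return [i for i, r in enumerate(roles) if r in START]
```

In a stack where the third page individually scores slightly higher as "middle" than as "first", but the page before it is clearly an "end" page, the search still places a separator before the third page, because a middle page cannot follow an end page.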

The example below shows the results from the separation of a mortgage application stack with 153 pages and classification into 244 document types. The horizontal lines indicate the separators found, and the “New page” column shows the new numbering of pages in the separated documents.

Laera Mortgage Separation Result

The detail view of the separation result for one page nicely shows how the separation algorithm came to a decision for the first page of a URLA, supported in addition by the page count detected on the page (“Page 1 of 9”).

Laera Mortgage Separation Details

Training of this model takes about 10 minutes so it is easy to frequently test and refine it. All this can be done by the end user and does not need an AI engineer.

Quality is very important, and Laera makes sure to bias towards precision so that no errors are made, allowing the workflow to show unconfident separations to a user for decision. In a project that was done 18 months ago for a large Swiss insurance company, Laera achieved an automation rate of 87% with an error rate (false positives) of 0.14%. Of course each separation result still needs to be checked, and the correction results will be used by Laera online learning to improve the model.

But overall the reduction of work in separation and the increase in quality is very measurable and yields huge benefits. All this is available either on premise or as cloud service to be used through RPA or RESTful API in any backend. Let us know if you are interested and we can show you a demo. Also a setup with your own documents is easily achievable with little effort. Contact is info (at) skilja.com.

Confusion Matrix

skiljaadmin — Wed, 10 Aug 2022 11:01:43 +0000

Understanding the quality of an automatic classification system is crucial for its acceptance and for any attempt to improve it over time. Quality means that we need to look at errors and at the recognition rate. In classification terms these values are called precision and recall. Precision gives the percentage of documents that have been classified correctly with respect to all documents assigned to a class by the classifier, a/(a+b); recall is the number of documents correctly classified into a class with respect to the total number of documents that should be in this class, a/(a+c). In a previous post (Measuring Classification Quality) we have already discussed these and how important they are. It is easy to depict them in a graphical visualization:

While these values might appear a little abstract, their advantage is that they are independent of the size of the set. But it might be more intuitive to talk about the actual number of documents that are imported into a class from other classes (set b) or exported and lost from the class (set c). It then becomes obvious that recall and precision are related and have the same overall value if no threshold is applied, as every document that is imported into a class must have been lost from another class. It also makes it easy to look at particular problem classes with a lot of imports (attractors) or exports (donors).
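Expressed as code, the two measures are one-liners. The sketch below uses the same letters as the definitions above: a is the number of correctly classified documents of a class, b the number imported from other classes, and c the number exported and lost to other classes. The counts in the usage note are invented.

```python
def precision(a: int, b: int) -> float:
    """Share of documents assigned to the class that really belong there: a / (a + b)."""
    return a / (a + b) if (a + b) else 0.0


def recall(a: int, c: int) -> float:
    """Share of documents belonging to the class that were found: a / (a + c)."""
    return a / (a + c) if (a + c) else 0.0
```

For a class with 90 correct documents, 10 imports and 30 exports, `precision(90, 10)` is 0.9 and `recall(90, 30)` is 0.75: what the classifier puts into the class is mostly right, but a quarter of the documents that belong there end up elsewhere.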

For a classification system these values can be depicted in a so-called confusion matrix (also known as a contingency table or an error matrix) showing all relations between classes at a glance.

Our classification designer in the Skilja Content Classification system has a built in visualization that lets you easily see the migration of documents into other classes. As an example we have used our popular Reuters news wire test set and arranged the classes in 7 hierarchical groups. If you run a 90:10 split benchmark on all 5917 documents (which fortunately only takes a few seconds because the SCC is so incredibly fast) the confusion matrix obtained for the 51 classes looks as follows:

Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. Of course the user interface allows you to zoom in to look at the details.

The correctly classified documents are summed up on the diagonal, the exports are on the upper right side and the imports on the lower left side. In our case you see quite some exports from the class “acq”, which is news on acquisitions, to “earn”, which is earnings. But this is to be expected, as these classes are close by topic and a report on an acquisition often talks about the same topics (shares, revenue, board) as an earnings report. The user can now use this display to click on the box of the 57 exported documents, open them in a list and review them to improve classification if desired. This makes it easy to drill down into the results and see exactly what can be improved. You will never achieve 100% precision, but remember that manual human classification also only achieves 95% on average, as proven in experiments.
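How imports and exports fall out of a confusion matrix can be shown with a small sketch. It follows the convention described above (rows are actual classes, columns are predicted classes) and uses invented counts, not the results of the Reuters benchmark.

```python
def imports_exports(matrix, labels):
    """Per class: correct (diagonal), exported (rest of row), imported (rest of column)."""
    result = {}
    for i, label in enumerate(labels):
        correct = matrix[i][i]
        exported = sum(matrix[i]) - correct                 # actual class i, predicted elsewhere
        imported = sum(row[i] for row in matrix) - correct  # predicted class i, actually elsewhere
        result[label] = {"correct": correct, "exported": exported, "imported": imported}
    return result


# Illustrative counts only: rows = actual class, columns = predicted class.
example_labels = ["acq", "earn", "crude"]
example_matrix = [
    [200, 40, 3],
    [10, 300, 0],
    [2, 1, 50],
]
example_result = imports_exports(example_matrix, example_labels)
```

In this example "acq" exports 43 documents and "earn" imports 41, which would make "acq" a donor and "earn" an attractor. Summed over all classes, exports and imports are both 56, since every exported document is an import somewhere else.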

When the classes are organized in a hierarchy, the confusion matrix by Skilja also allows you to collapse the nodes and look at upper levels only. In this case the values of the hidden subclasses are summed up and shown for the parent class.

The diagonal has two values now. For example 4,320 of the finance documents have been correctly classified but 175 have been exported/imported within the finance category. Often you are only interested in the migration between the main parent classes, while errors under one parent are less problematic.

Typically an organisation can assign a cost to each export and import. The cost can be different for each pair of classes where this happens. Migrations within a set of subclasses are often not very expensive if they relate, for example, to documents that are processed in the same department anyway. On the other hand, an import into a class that leads to an automatic payment can be very expensive. This can be mitigated by assigning different thresholds to such classes, which SCC allows. The confusion matrix allows you to find out where these need to be applied. But the matrix can also be exported, and you can apply your own cost matrix to the results to determine which improvements make sense. We are currently working with a real client to create a case study that shows these numbers in a real-world example at an insurance company. When available, this study will be published here. Stay tuned!
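Applying your own cost matrix to an exported confusion matrix is a simple element-wise multiplication. The following sketch assumes the export is available as a list of rows (actual class) by columns (predicted class); the class names, counts and costs are invented for illustration and do not reflect the SCC export format.

```python
def migration_costs(confusion, cost, labels):
    """Return (actual, predicted, total cost) for every off-diagonal pair, most expensive first."""
    pairs = []
    for i, actual in enumerate(labels):
        for j, predicted in enumerate(labels):
            if i != j:
                pairs.append((actual, predicted, confusion[i][j] * cost[i][j]))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Illustrative data: an import into "payment_order" triggers a payment and is expensive.
cost_labels = ["invoice", "reminder", "payment_order"]
cost_confusion = [
    [90, 8, 2],
    [5, 40, 1],
    [0, 3, 20],
]
cost_per_error = [
    [0, 1, 50],
    [1, 0, 50],
    [5, 5, 0],
]
ranked = migration_costs(cost_confusion, cost_per_error, cost_labels)
total_cost = sum(c for _, _, c in ranked)
```

Here only two invoices migrate into "payment_order", yet that pair causes the highest cost (100 of 178 in total), so it would be the first candidate for a stricter threshold.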

Vinna 3.0 Released

Alexander — Tue, 24 Aug 2021 10:17:26 +0000

We are proud to announce the release of Vinna 3.0, our open 4th generation Document Processing Platform. After 18 months of concentrated and intensive development time we are very happy that we can now provide even more value to our customers. We invested a lot to take our customers’ feedback back to our engineers and create a totally new and modern UI, with an improved backend to support enterprise performance, scalability and security requirements. Process Editor and Process Monitor are completely redesigned and both are now available in English as well as German. To avoid any pain for our many existing customers, special effort has been spent on compatibility with Vinna 2.4 so all projects can be smoothly upgraded. You can manage a 2.4 runtime from the 3.0 design time to achieve a step-by-step upgrade without disrupting production, and the transfer of old process versions into 3.0 has also been very thoroughly tested.

New Process Editor UI

The new design makes creating processes with no coding – no scripting – no configuration file editing as easy as it should be. Vinna 3.0 comes with the new BPMN process editor, with improved speed and usability. Now in Angular 10, all functions are componentized and can be integrated separately.

Plenty of new features improve the design and runtime management of processes. Cooperate better with your team members, as you write comments directly to activity instances. Work together designing the process. All activities can be configured through the UI – either through standard dialog or individual extended dialogs of activities, which can even bring up their own web UI. When a process is locked, you can now immediately see by whom. Besides many graphical changes, e.g. in the view of processes, document types and variables, it is now also possible to switch all views to lists and search in all trees and lists. You can see all environments where a process(-version) has been published to and we allow deletion only when no published version exists.

Vinna 3.0 Process Designer

Process Version Management

If you have ever been in charge of managing a production system, you know how important staging and versioning are. “Never touch a running system” is a common rule, but in the end it leads to legacy problems as nothing can be updated any more. Key to any enterprise-critical production system is version management that allows full control over what is changed, of course with thorough testing in staging steps. Therefore, version management and staging have been a central part of the Vinna architecture from the start and have been further improved in version 3.0. You can now create major and minor versions (1.0, 1.1, 2.0, …) of processes. The latest version you edit is always marked as a draft version, so you can’t break anything! Deployment always happens for a specific selected version. So it is easy to work on major changes to a process and already test them, while at the same time creating hot fixes (patches) for existing production processes if necessary. And you can even change variables in each of your runtime environments separately for each version.

Vinna 3.0 Process Version Management

Environment Management

An environment is the runtime system to which a process is published. Many environments (on premise, private cloud, public cloud) can be managed from the same Designer. The environment executes the process by hosting and running the activities in as many Activity Servers as needed. In Vinna 3.0 we now have “transient” Activity Servers that auto-start with a VM or in a Docker container, do their work and shut down again when not needed. Together with the separation of the Activity Server configuration from the instance, you can easily assign arbitrary resources to a project to scale up dynamically in peak hours, or reduce hardware cost by using just as many servers as you need. An overview of all assigned activities and Activity Servers across an environment fulfills a long-standing request.

Process Monitor

The 3.0 runtime backend is fully compatible and introduces a lot of invisible changes related to scaling, performance and security. Process Monitor is the GUI to monitor the runtime and also has been completely redesigned. It comes with a lot of improvements and usability enhancements. Many visual usability enhancements were made with the new controls like grouping, filtering and customization of the UI for business operators. There is now a new tab for directly previewing documents in a work item with all their data. Licenses can now be reviewed and managed either in runtime or in design time with a common license view including status and report for click rates.

Vinna 3.0 Process Monitor with grouped work item list.

Vinna is an open and process-oriented platform that allows users to define a process in exactly the way it is optimally operated in a company. The design allows full flexibility in the data model with a hierarchical document model supporting batches, folders, documents and pages. The documents are processed as work items in the flow and passed through activities. The activities are either standard tasks like OCR or Classification, or custom tasks as integrations into the platform. Any number of activities can be defined in the process as micro services, including arbitrary routing decisions based on intermediate results. The architecture of Vinna is service oriented (SOA) and the runtime is easily deployed either in the cloud (Microsoft Azure, AWS or private cloud), on premise or in mixed environments where the data storage is kept in house and processing happens outside.

All communication between services and databases is transaction based, securely encrypted and uses standard REST protocols over HTTP and HTTPS. Three powerful HTML-based graphical user interfaces are provided for defining, managing and monitoring processes. Vinna is available for small projects but also incorporates all enterprise features needed for large production systems. The biggest Vinna customer now processes 100M documents p.a. in one system, which is 400,000 documents per day.

White papers and data sheets are available. If you are interested in further details, please contact us through info(at)skilja.com to obtain your copy.

Process as a Service

skiljaadmin — Wed, 05 May 2021 13:57:54 +0000

Imagine that you have created a powerful process for superb document automation using all kinds of advanced recognition, image processing and AI technologies available. With these technologies it is possible today to automate almost any document-driven process that involves repetitive cognitive tasks like classification, indexing and decision making. VINNA by Skilja is a powerful platform that enables and orchestrates the LAERA components by Skilja to perform all these miracles. In addition, VINNA plugs in many other powerful tools from other technology companies, such as barcode recognition, office format conversion, e-mail normalization, PDF/A generation etc.

Now as this process is built and works well, the question arises how to integrate the new capabilities in your line-of-business applications and existing processes. File import and export is insecure and outdated. Full integration requires too much effort from IT that might not be available.

Fortunately there is a solution 🙂 : By plugging in an Event Driven Activity (EDA) you can enable ANY process to be accessible through a standard web service protocol. Simply by adding the EDA to an existing process you make it available to a RESTful service call. The EDA can be the only start point of a process but it can also be added in addition to existing starters like file importers, message queues or IMAP collectors that pump documents into the same process.

Typically you add one EDA Starter and one or several EDA Reporters or Listeners. These are accessed through a simple web protocol by the EDA Consumer. The Consumer can either be a web page (as in the example below) or a Windows application like an RPA client that automates the cognitive task scheduling. Both are provided as sample source code with your installation. At any stage of the process the Consumer is optionally updated via an event on the progress of the work item in the process. When processing is finished, the results are retrieved either through an event or from a queue that is queried via REST.

VINNA Process as a Service schematically

The process can be deployed anywhere; there is zero setup. Simply use the URL of the service and of course the credentials for the secure and encrypted communication with the authentication service. The process can sit anywhere in the cloud and be used by any client worldwide that has access.

Through EDA VINNA is used completely in slave mode and the consumer is shielded from any complexity of the process.

The process can contain any number of steps and routes, including manual correction, approvals or even sending tasks to the crowd. The Consumer will see none of this; it will simply get notified of the results through a standard API irrespective of the process. So when the process is changed and enhanced, the Consumer can stay as it is. This is the true power of process as a service: “technology under the hood”, especially when combined with online learning that will continuously improve the results.

Vinna Process as a Service for RPA with Classification Example

A nice example is shown in the graphic above. The Consumer (in this case any RPA client) chooses to use a classification service by selecting the process (through EDA) and the classification project that should be used. In this case both the classification activity and the classification Web designer access the same classification model in the database. Therefore an admin user can even modify the classification model and the taxonomy or create a new one, because the process itself and the call to use it stay unchanged. This process also contains a manual correction step (Batch Review) that is conditionally used in case of uncertain results. The system can even send a link for a correction task as a web page so the user can make any decisions or corrections of submitted tasks herself without having to install anything. The corrected results in turn will then be aggregated and used for online learning. If several hundred users are using this process, it will quickly optimize the results automatically without any further effort. And at all times the actual processing can happen anywhere in the world on any cloud server.

This is the power of process as a service.

Reading Medical Reports

skiljaadmin — Fri, 27 Dec 2019 14:08:21 +0000

Medical reports are complex documents written by doctors who use their specific language and style to express not only facts but also hypotheses and suggestions. They are intended to be read by other doctors or experts who have a deep knowledge of the subject at hand and can make judgements based on what they learn. And in the end the information contained therein is of vital importance for many decisions that are to be taken: medication, change of lifestyle, possible surgery, but also cost of insurance.

So reading medical reports using Artificial Intelligence and machine learning is a challenging task. The use case described here utilizes Laera Information Extraction to assist experts at a life insurance company in calculating the health risk for individuals. In the end the decision needs to be taken by doctors, but the system can greatly assist them in sifting through the amount of text provided, because medical reports attached to a life insurance application can easily run to hundreds of pages.

Laera Information Extraction uses advanced AI methods to assess the risks contained in these reports and points them out to the experts. In a first step the diagnoses are extracted based on the common ICD-10 code. But a simple word search is not enough because each diagnosis needs to be put into its context. Here the option of Laera to assign multiple roles to an entity becomes very useful. Once a critical diagnosis is found, Laera makes an assessment based on several categories:

Is the polarity negative or positive? In most cases symptoms are excluded in the reports and therefore the diagnosis is negative. These are of no interest (of course the patient is happy about that). Only the positively confirmed ones are relevant for the risk assessment.
Is the diagnosis for the present or the past? Many reports contain a lot of history of what happened in the past. While the history might be interesting, the main focus is on the current situation.
Is the diagnosis for the person herself or maybe for the family? Family history (Father had heart attack) might be important but needs to be assessed differently.
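
One way to picture an entity that carries several roles at once is a small record with one field per category. This is a sketch under stated assumptions, not Laera's data model: the `Diagnosis` class, the enum names and the sample entries are hypothetical, and how the roles are assigned in the first place is not shown here.

```python
from dataclasses import dataclass
from enum import Enum


class Polarity(Enum):
    ASSERTED = "asserted"   # positively confirmed
    EXCLUDED = "excluded"   # ruled out in the report


class Temporality(Enum):
    PRESENT = "present"
    PAST = "past"


class Subject(Enum):
    SELF = "self"
    FAMILY = "family"


@dataclass(frozen=True)
class Diagnosis:
    """Hypothetical extracted entity with three roles attached."""
    icd10: str
    text: str
    polarity: Polarity
    temporality: Temporality
    subject: Subject


def is_primary_focus(d: Diagnosis) -> bool:
    """Confirmed, current and about the applicant herself."""
    return (
        d.polarity is Polarity.ASSERTED
        and d.temporality is Temporality.PRESENT
        and d.subject is Subject.SELF
    )


# Invented sample entries, for illustration only.
found = [
    Diagnosis("I21", "father had heart attack",
              Polarity.ASSERTED, Temporality.PAST, Subject.FAMILY),
    Diagnosis("E11", "type 2 diabetes confirmed",
              Polarity.ASSERTED, Temporality.PRESENT, Subject.SELF),
    Diagnosis("J45", "no signs of asthma",
              Polarity.EXCLUDED, Temporality.PRESENT, Subject.SELF),
]
primary = [d.icd10 for d in found if is_primary_focus(d)]
```

Past and family entries are not discarded in this sketch; they simply fall outside the primary filter and could be presented separately, since they are assessed differently.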

Finding Diagnoses, polarity and roles

Laera intelligent extraction performs all these tasks and analyzes all pages in milliseconds using semantic and structural methods:

Find all diagnoses and symptoms
Find the polarity (is the diagnosis excluded or asserted)
Determine the context (role e.g. self or family)
Classify the paragraph by relevance
Present and auto-summarize results in ICD-10 terms
Highlight relevant areas in the document for quick visual confirmation

Of course, just to make sure this does not get lost, the roles and assignments are not defined by rules but trained using machine learning from a few hundred labeled examples.

Summary of symptoms with polarity in ICD-10 tree

The customer using this system could reduce the time spent assessing an application by more than 50%, as the experts get a prepared data set that allows them to quickly jump to the relevant sections and make the decision.

It is also important to note that this is not a hard-coded special solution, but an example of the application of the Laera Information Extraction product. Any other industry can use this approach to solve its specific requirements. Examples range from contract management to court documents, but of course standard default extraction tasks can also be easily solved with Laera Information Extraction.

If you are interested in learning more about this use case or the application of AI extraction in your specific domain, please let us know via email at info(at)skilja.com and we will be happy to provide more information.

The Magic of Online-Learning

skiljaweb3 — Sun, 03 Feb 2019 08:26:58 +0000

Wouldn’t it be nice if your AI-enabled document processing system continuously took the input from user interactions and used this information to improve the quality of recognition over time? And nobody would have to take care of this, even in the case of hundreds of document classes with dozens of index fields each. In the best case the system would be easy to set up, run completely unattended in the background and work like a charm.

This is what Skilja with its Laera Classification and Extraction software suites provides. We have completely implemented this new paradigm which is available either as SDKs or as integrated modules to our Vinna Document Processing Platform. But of course what looks easy for the user requires significant infrastructure and automated checks and balances to make this a reliable and stable part of your processing tasks.

Machine Online-Learning of document classification and recognition uses supervised and unsupervised continuous training of incoming data streams. Supervised learning will take the corrections the users have made, analyze them and apply them as new patterns as appropriate. Unsupervised learning will use the results of successful and correct classification and extraction to generate additional knowledge (expanding the space) and statistics of usage of existing knowledge. Both combined are then used to continuously improve the system. The infrastructure is set up quickly and consists of services that do the work in the background: collect statistics, collect samples, analyze the validity of the new data and publish them to the production runtime system if the AI has determined them to be valid additions.

As we know that system administrators might be wary of having their setup changed automatically (at least until they have seen it really works), there are several intermediate levels of AI automation that they can choose. The most important are:

Have all changes and each new document manually reviewed, benchmarked and checked before explicitly publishing it. This is the box on the left.
Have automatically created improvements reviewed and explicitly published
View any conflicts and resolve them manually (or at least check them)
Restrict the users that can contribute to the training to a certain group. Only corrections from this group will be taken into account while the input from less experienced users will be discarded.

But in the end learning can run completely unattended. As in school (think exams) we need to check the validity of the new knowledge before we apply it. Therefore Laera algorithms will always analyze for conflicts that are created and try to resolve them. In addition, each new revision of the training pattern will be quality checked fully automatically in the background and only be accepted if the recognition results of the new model exceed those of the existing one. This is an assurance for the production system: changes in quality will only ever go in one direction, namely better!
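
The acceptance rule described here (publish a new revision only if it beats the existing one) can be sketched as a small gate. This is an illustration under assumptions, not Laera's implementation: the source does not say which metric or benchmark set is used, so plain accuracy on a fixed labeled set stands in for it, and the two toy models are invented.

```python
from typing import Callable, Sequence, Tuple

Model = Callable[[str], str]              # document text -> predicted class
Benchmark = Sequence[Tuple[str, str]]     # (document text, expected class)


def accuracy(model: Model, benchmark: Benchmark) -> float:
    """Share of benchmark documents the model classifies correctly."""
    correct = sum(1 for text, label in benchmark if model(text) == label)
    return correct / len(benchmark)


def accept_candidate(current: Model, candidate: Model,
                     benchmark: Benchmark) -> bool:
    """Accept the candidate only if it strictly beats the current model."""
    return accuracy(candidate, benchmark) > accuracy(current, benchmark)


# Invented benchmark and toy models, for illustration only.
benchmark = [
    ("invoice no 4711", "invoice"),
    ("dear sir or madam", "letter"),
    ("total amount due", "invoice"),
]


def current_model(text: str) -> str:
    return "invoice" if "invoice" in text else "letter"


def candidate_model(text: str) -> str:
    return "invoice" if ("invoice" in text or "amount" in text) else "letter"


publish = accept_candidate(current_model, candidate_model, benchmark)
```

Using a strict comparison means a revision that merely ties with the existing model is not published, which keeps the production model stable when nothing is gained.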

Again, this is not a black box: Laera provides precise insight into what is happening and lets you influence or even revert the suggested improvements at any stage. Laera Monitor is the tool for this, a web application that shows the continuously measured quality numbers of your system.

The example shown here is a typical curve for the F1 score (the harmonic mean of precision and recall, used as an overall quality measurement). Starting with a setup of a few hundred trained documents, the quality quickly deteriorates as new and unknown samples arrive in production, especially when the real volumes start to be processed. It is interesting to see that the precision stays high, close to 95%, which is very satisfying, but recall (recognition rate) goes down as the system simply does not “know” the new documents. But then online learning kicks in and uses the new samples and corrections made to quickly improve the quality to 95% after a few thousand new training documents have been processed.
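
To see why the F1 score drops even while precision stays near 95%, it helps to compute it directly. The snippet below is a small worked example; the recall value of 0.60 is an invented illustration, not a number read from the curve.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision is held at 95% in both cases; only recall changes.
# The recall of 0.60 is an assumed value for illustration.
before_learning = f1_score(precision=0.95, recall=0.60)
after_learning = f1_score(precision=0.95, recall=0.95)
```

With these assumed inputs the score is roughly 0.74 before and 0.95 after. Because the harmonic mean is pulled toward the lower of the two values, a fall in recall alone is enough to pull the whole curve down.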

Online Learning will make classification and extraction much easier in the future. After an initial setup, AI will simply learn in the background what needs to be known to arrive at the best possible automation rate within a few weeks. This makes a whole new area of processes (for example with smaller document volumes) available and will greatly improve quality for existing automation processes.

Please let us know if you have additional questions, need more insight or have a direct interest. Contact us at info (at) skilja.com.

Vinna 2.0 Released

skiljaweb3 — Fri, 06 Jul 2018 12:58:05 +0000

We are happy to announce the release of our new version Vinna 2.0, the 4th Generation Document Processing Platform.
Since the last major release, a year ago, we have worked hard – and a big thanks to the team – to focus on enterprise features that make Vinna’s capabilities unique in the market.

The main emphasis was put on easy and secure deployment into runtime environments in a customer infrastructure. Vinna allows designing and testing of a process completely outside of production and then using a built-in staging system to move it securely to production. This step is called publishing. Vinna 2.0 allows the administrator to also configure the actual runtime environment completely from the central Process Editor by distribution of activities to activity servers, assignment of reservations and definition of priorities. A process is thus automatically deployed to a target environment and to the activity servers, including all configurations but also all binaries. This is shown schematically in the graph below. At no time does the runtime server need to be accessed except through a browser and the HTTPS protocol. In addition, processes are versioned in design and in runtime to avoid any interference of new settings with existing production work items.

Publishing a Process and Activities in Vinna 2.0

Support for Oracle, Azure SQL and Azure Blob has been added on top of the existing MS-SQL and NoSQL databases. Variables allow you to predefine connection strings, import paths and any other parameter specifically for each target environment. One example: in a test system you might define a user with admin rights on a lookup database for testing purposes, but this user will automatically be replaced with a more restricted user on the production system.
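
The idea of per-environment variables can be illustrated with a few lines of Python. This is a sketch of the concept only and does not reflect how Vinna stores or resolves variables: the variable names, paths, user names and the connection string format are all hypothetical.

```python
from string import Template

# Hypothetical variable sets, one per target environment.
VARIABLES = {
    "test": {
        "lookup_user": "lookup_admin",
        "import_path": "/data/test/import",
    },
    "production": {
        "lookup_user": "lookup_readonly",
        "import_path": "/data/prod/import",
    },
}

# The process definition only refers to the variable name.
CONNECTION_TEMPLATE = Template("Server=lookup-db;User Id=$lookup_user;")


def resolve(template: Template, environment: str) -> str:
    """Fill in the variables defined for the given environment."""
    return template.substitute(VARIABLES[environment])


test_connection = resolve(CONNECTION_TEMPLATE, "test")
production_connection = resolve(CONNECTION_TEMPLATE, "production")
```

The process definition stays identical across environments; only the variable set chosen at publishing time differs, which is how the admin user of the test system gives way to a restricted one in production.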

Vinna 2.0 Runtime Architecture

Vinna is an open and process-oriented platform that allows users to define a process in exactly the way it is optimally operated in a company. The design allows full flexibility in the data model with a hierarchical document model supporting batches, folders, documents and pages. The documents are processed as work items in the flow and passed through activities. The activities are either standard tasks like OCR or Classification, or custom tasks as integrations into the platform. Any number of activities can be defined in the process as micro services, including arbitrary routing decisions based on intermediate results. The architecture of Vinna is service oriented (SOA) and the runtime is easily deployed either in the cloud (Microsoft Azure or private cloud), on premises or in mixed environments where the data storage is kept in house and processing happens outside.
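
The hierarchical document model (batches, folders, documents, pages) can be sketched as nested records. This is an illustrative assumption about the shape of such a hierarchy, not Vinna's actual classes; all names and fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    number: int


@dataclass
class Document:
    name: str
    pages: List[Page] = field(default_factory=list)


@dataclass
class Folder:
    name: str
    documents: List[Document] = field(default_factory=list)


@dataclass
class Batch:
    name: str
    folders: List[Folder] = field(default_factory=list)

    def page_count(self) -> int:
        """Total number of pages across all folders and documents."""
        return sum(len(d.pages) for f in self.folders for d in f.documents)


# Invented example: one batch, one folder, two documents.
batch = Batch("scan-run", [
    Folder("application", [
        Document("form", [Page(1), Page(2)]),
        Document("attachment", [Page(1)]),
    ]),
])
total_pages = batch.page_count()
```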

A whitepaper and data sheets are available. If you are interested in further details, please contact us through info(at)skilja.com to obtain your copy.