<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Essentials | Skilja</title>
	<atom:link href="https://skilja.com/category/essentials/feed/" rel="self" type="application/rss+xml" />
	<link>https://skilja.com</link>
	<description>Document Understanding – Deep Learning</description>
	<lastBuildDate>Fri, 16 Jan 2026 16:11:11 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://skilja.com/wp-content/uploads/2021/06/cropped-skilja_logo_transparent_02-32x32.png</url>
	<title>Essentials | Skilja</title>
	<link>https://skilja.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>The End of OCR: How Word-Level Understanding Changes Everything</title>
		<link>https://skilja.com/the-end-of-ocr-how-word-level-understanding-changes-everything/</link>
		
		<dc:creator><![CDATA[Alexander]]></dc:creator>
		<pubDate>Thu, 30 Oct 2025 16:10:19 +0000</pubDate>
				<category><![CDATA[Cognition]]></category>
		<category><![CDATA[Essentials]]></category>
		<category><![CDATA[Technology]]></category>
		<guid isPermaLink="false">https://skilja.com/?p=2516</guid>

					<description><![CDATA[Modern systems don’t read characters at all—not individually. They read words, phrases, and even meaning. OCR has evolved into something fundamentally different, and this shift delivers a new level of accuracy, fluidity, and naturalness that previous generations could not reach.
This is what Skilja has done with Lesa, our deep-learning, transformer-based system designed to read like humans do: holistically, contextually, and intelligently.]]></description>
										<content:encoded><![CDATA[<p style="font-weight: 400;">For more than half a century, OCR—Optical Character Recognition—has meant one thing: machines deciphering single characters. From early template-matching in the 1960s to the statistical engines of the 2000s, OCR has always approached reading as a mechanical process. It segmented text into character-shaped fragments and tried to analyze what each one meant, letter by letter. But that era is now ending. Modern systems don’t read characters at all—not individually. They read <em>words</em>, <em>phrases</em>, and even <em>meaning</em>. OCR has evolved into something fundamentally different, and this shift delivers a new level of accuracy, fluidity, and naturalness that previous generations could not reach.</p>
<p style="font-weight: 400;">This is what Skilja has done with <strong>Lesa</strong>, our deep-learning, transformer-based system designed to read like humans do: holistically, contextually, and intelligently.</p>
<h4><strong>A Brief Look Back: From Characters to Context</strong></h4>
<p style="font-weight: 400;">Traditional OCR started as an analog process (hence the O for Optical) and went through several stages:</p>
<ul>
<li><strong>1950s–1980s:</strong> Rigid template matching—effective only with perfect typewritten pages.</li>
<li><strong>1990s:</strong> Feature extraction and early machine learning—better, but still brittle.</li>
<li><strong>2000s–2010s:</strong> Statistical modeling and improved analysis workflows—good enough for books, printed forms, and constrained hand print, but always character-bound.</li>
</ul>
<p style="font-weight: 400;">Even at its best, classical OCR remained a guessing game. It mistook <em>1</em> for <em>l</em>, turned smudges into glyphs, and struggled with anything outside its narrow expectations, especially handwriting.</p>
<p style="font-weight: 400;">The problem was not a lack of better algorithms; it was the focus on characters itself. Humans read words, not characters, as anyone who has watched a child learn to read will confirm.</p>
<p><img fetchpriority="high" decoding="async" class="alignnone size-large wp-image-2523" src="https://skilja.com/wp-content/uploads/2026/01/Lesa-Full-Text-Recognition-1-1024x573.png" alt="" width="1024" height="573" srcset="https://skilja.com/wp-content/uploads/2026/01/Lesa-Full-Text-Recognition-1-980x548.png 980w, https://skilja.com/wp-content/uploads/2026/01/Lesa-Full-Text-Recognition-1-480x268.png 480w" sizes="(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) and (max-width: 980px) 980px, (min-width: 981px) 1024px, 100vw" /></p>
<h4><strong>The Deep Learning Shift: From Decoding Shapes to Understanding Language</strong></h4>
<p style="font-weight: 400;">Transformers changed everything. Instead of interpreting characters, transformer-based models interpret <strong>sequences</strong>, <strong>context</strong>, and <strong>linguistic probability</strong>. They don’t see text as isolated shapes but as parts of sentences, paragraphs, and concepts.</p>
<p style="font-weight: 400;">This allows Lesa to:</p>
<ul>
<li>recognize <strong>entire words</strong>, not just letters,</li>
<li>use surrounding text as context,</li>
<li>maintain coherence across full pages,</li>
<li>and adapt to different visual styles.</li>
</ul>
<p style="font-weight: 400;">Reading becomes a <strong>language-understanding task</strong>, not a pixel-decoding task.</p>
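<p style="font-weight: 400;">As a deliberately simplified illustration of why word-level context resolves character ambiguity – a toy sketch, not Lesa’s actual architecture – consider rescoring the character hypotheses of a classical engine against word probabilities (the word list and scores below are invented for the example):</p>

```python
from itertools import product

# Toy word-frequency table standing in for the language knowledge a
# transformer encodes; the words and scores are invented for this example.
WORD_FREQ = {"hello": 0.9, "flag": 0.8}

def word_level_decode(char_hypotheses):
    """Return the candidate word the 'language model' scores highest."""
    candidates = ("".join(chars) for chars in product(*char_hypotheses))
    return max(candidates, key=lambda w: WORD_FREQ.get(w, 0.0))

# Per-position hypotheses as a character-level engine might emit them
# (ambiguous glyphs: 'l' vs '1', 'o' vs '0'):
ambiguous = [["h"], ["e"], ["l", "1"], ["l", "1"], ["o", "0"]]
print(word_level_decode(ambiguous))  # "hello" – a character-only engine might emit "he110"
```

<p style="font-weight: 400;">A real transformer scores whole sequences in context rather than looking words up in a table, but the principle is the same: the surrounding language, not the isolated glyph, decides the reading.</p>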
<h4><strong>Lesa: Built for Word-Level Intelligence</strong></h4>
<p style="font-weight: 400;">Lesa is designed from the ground up to treat a document as a linguistic landscape. Using a transformer architecture trained on diverse text and real-world images, Lesa doesn’t ask, “What is this character?” It asks, “What does this <em>word</em> say—and how does it fit into the sentence?” This matters since now we have:</p>
<ul>
<li><strong>Far fewer errors</strong>: No more character-by-character error cascades.</li>
<li><strong>Natural output</strong>: Clean spacing, correct punctuation, coherent text.</li>
<li><strong>Font and layout robustness</strong>: Works across stylized text, messy receipts, tables, multi-column pages.</li>
</ul>
<p style="font-weight: 400;">One of the most transformative outcomes of word-level understanding is that <strong>constrained handwriting</strong>—the kind found in forms, notes, medical records, delivery slips, and corporate paperwork—now works <em>much better</em>. Messy capital letters, box-filled handwriting, and half-printed forms suddenly become highly readable. Handwriting recognition is finally within reach and can be applied routinely.</p>
<p><img decoding="async" class="alignnone size-large wp-image-2521" src="https://skilja.com/wp-content/uploads/2026/01/Lesa-Handprint-Recognition-1024x251.png" alt="" width="1024" height="251" srcset="https://skilja.com/wp-content/uploads/2026/01/Lesa-Handprint-Recognition-980x240.png 980w, https://skilja.com/wp-content/uploads/2026/01/Lesa-Handprint-Recognition-480x118.png 480w" sizes="(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) and (max-width: 980px) 980px, (min-width: 981px) 1024px, 100vw" /></p>
<p style="font-weight: 400;">Calling this the “end of OCR” isn’t hype. It’s a technical recognition of what has changed:</p>
<p style="font-weight: 400;"><strong>Traditional OCR = character recognition</strong><br />
<strong>Modern OCR = language understanding</strong></p>
<p style="font-weight: 400;">Lesa is part of a new generation of systems that read documents the way humans do—by interpreting words, not decoding symbols.</p>
<p style="font-weight: 400;">OCR as we knew it is over. Something better has replaced it. <strong>Lesa</strong> represents this new era.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Auto Classification and Bias</title>
		<link>https://skilja.com/auto-classification-and-bias/</link>
		
		<dc:creator><![CDATA[skiljaadmin]]></dc:creator>
		<pubDate>Tue, 22 Jul 2025 12:01:16 +0000</pubDate>
				<category><![CDATA[Classification]]></category>
		<category><![CDATA[Cognition]]></category>
		<category><![CDATA[Essentials]]></category>
		<guid isPermaLink="false">https://skilja.com/?p=229</guid>

					<description><![CDATA[Personal bias and individual opinions are a big issue in standardized business processing if they influence the outcome of a process and the decisions made. Nobody wants to be subject to random changes in the outcome of a personal request – and yet it happens, because humans are biased in how they see facts, based on their education, cultural background, and even the mood they happen to be in at a certain time of the week. So in addition to different persons making different decisions, you can expect that the same person makes different decisions during the week: you simply look at a task differently on Monday morning than on Friday evening. The reason is so-called priming, which happens to all of us day by day through our experience, knowledge, physical condition, context, and a lot of other small factors.]]></description>
										<content:encoded><![CDATA[
<p>Personal bias and individual opinions are a big issue in standardized business processing if they influence the outcome of a process and the decisions made. Nobody wants to be subject to random changes in the outcome of a personal request – and yet it happens, because humans are biased in how they see facts, based on their education, cultural background, and even the mood they happen to be in at a certain time of the week. So in addition to different persons making different decisions, you can expect that the same person makes different decisions during the week: you simply look at a task differently on Monday morning than on Friday evening. The reason is so-called priming, which happens to all of us day by day through our experience, knowledge, physical condition, context, and a lot of other small factors.</p>



<p>In a recent article on the <a href="http://www.skilja.de/2015/the-meaning-of-words/">meaning of words</a> we have shown how the sound of words influences our perception. There are many more linguistic associations that influence the way we think and behave, and these introduce bias. If, for example, I tell you that I am driving north across hilly terrain, would you expect the trip to be mostly uphill or downhill? In fact, most people associate movement northward with uphill and southward with downhill. An interesting study by the psychologists <a href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=963159">Leif D. Nelson from UC San Diego and Joseph Simmons from Yale</a> shows that these associations can actually be measured and produce some strange biases: people think it will take longer to travel north than south, that it will cost more to ship to a northern than to a southern location, and that a moving company will charge more for a northward move than for a southward one. A <a href="http://spp.sagepub.com/content/2/5/547">similar study</a> concluded that people assume property is more valuable when it sits in the northern part of town. Of course these opinions stem from the decision of the ancient Greeks to plot the map of the world with north above south. But it also shows clearly how much we are biased by our language – and north/south is only one of many linguistic associations we are exposed to.</p>



<figure class="wp-block-image"><a href="http://www.skilja.de/wp-content/uploads/2015/06/compass-152121_1280.png"><img decoding="async" class="wp-image-946" src="http://www.skilja.de/wp-content/uploads/2015/06/compass-152121_1280.png" alt="North-South Compass" /></a></figure>



<p>Ancient mapmakers introduced north and south unwittingly, but lawyers have a clear intention when they describe car accidents. While the defense might call a car accident “contact”, the plaintiff might say one car “smashed” into the other. Elizabeth Loftus and John Palmer showed in a classic experiment that these labels really matter. They had a group of students watch the same series of traffic accidents and then asked them to estimate the speed of the cars when the accident occurred. When the scene was described as the cars having “contacted” one another, the students’ average speed estimate was thirty-two miles an hour, whereas it was forty miles an hour when the cars were said to have “smashed” into one another. In another experiment, 14% of participants incorrectly remembered seeing shattered glass when told that the cars “hit” one another, whereas 32% of participants made the same error when told the cars “smashed” into one another. Even a single word can change how people remember an event they witnessed only minutes earlier – making it very clear how priming can bias our decisions.</p>



<p>This brings us back to auto-classification. A classifier like the Skilja Content Classifier is trained with representative samples collected by a group of people. Applied to a set of documents, it will then make the same decision over and over again. Through machine learning it represents the average opinion about the content of a document and will repeat it without tiring – the same on Monday morning and on Friday evening, at a speed of several hundred thousand pages per hour. It will make errors – based on statistics – but not more than a human, and its errors are reproducible and can be corrected if necessary. If you have ever thought about compliance, this is a good example, because compliance does not say that you cannot make errors. It says that you need a reproducible, documented procedure for how you store and treat your documents. Auto-classification can help to achieve this goal. It is a great tool for boosting productivity, but it is even more helpful for avoiding bias, irreproducible results, and non-compliance.</p>
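<p>The reproducibility argument can be made concrete with a minimal sketch. The bag-of-words centroid classifier below is an invented toy, not the Skilja Content Classifier; it only illustrates that a model trained once on samples collected by a group of people returns the identical decision on every run:</p>

```python
from collections import Counter

def train(samples):
    """Build one word-count centroid per class from labeled samples."""
    centroids = {}
    for text, label in samples:
        centroids.setdefault(label, Counter()).update(text.split())
    return centroids

def classify(centroids, text):
    """Score = word overlap with each centroid; ties broken alphabetically
    so the decision is fully deterministic and reproducible."""
    words = set(text.split())
    return max(sorted(centroids), key=lambda c: sum(centroids[c][w] for w in words))

model = train([("invoice total amount due", "invoice"),
               ("claim damage accident report", "claim"),
               ("invoice payment reference", "invoice"),
               ("accident claim injury", "claim")])

# Monday morning or Friday evening: the answer never drifts.
decisions = {classify(model, "payment due on invoice") for _ in range(1000)}
print(decisions)  # {'invoice'}
```

<p>A production classifier uses far richer features and statistics, but the property shown here is the point: unlike a primed human reviewer, the trained model has exactly one opinion and repeats it without tiring.</p>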
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How Meanings of Words Change</title>
		<link>https://skilja.com/how-meanings-of-words-change/</link>
		
		<dc:creator><![CDATA[skiljaadmin]]></dc:creator>
		<pubDate>Fri, 02 Dec 2022 11:57:32 +0000</pubDate>
				<category><![CDATA[Cognition]]></category>
		<category><![CDATA[Essentials]]></category>
		<guid isPermaLink="false">https://skilja.com/?p=109</guid>

					<description><![CDATA[We all know that our language is fluid and words can change their meaning over time. Words get extinct and new words are created but more often existing words are adapted to new circumstances. It is interesting to see how this happens in the course of years but sometimes words change their meaning overnight. In [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>We all know that our language is fluid and words can change their meaning over time. Words become extinct and new words are created, but more often existing words are adapted to new circumstances. It is interesting to see how this happens over the course of years – but sometimes words change their meaning overnight.</p>



<p>In a study on “Statistically Significant Detection of Linguistic Change”, published in 2014 (available online from&nbsp;<a href="http://arxiv.org/abs/1411.3315" target="_blank" rel="noreferrer noopener">arxiv.org</a>), the researchers Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, and Steven Skiena used data mining to find out how the way we use words reveals the linguistic earthquakes that constantly reshape our language. Their findings are very interesting for anybody who works professionally with text analytics, as they reveal a lot about how semantics in our language work. Kulkarni et al. tracked these linguistic changes by mining corpora such as Google Books, movie reviews from Amazon, and of course Twitter.</p>



<p>In pre-internet times, the usage and meaning of words changed relatively slowly. This can be seen in the metamorphosis of the word “gay” from its social meaning in the fifties to the purely sexual-orientation meaning of our time, nicely displayed in the word-cloud view below:</p>



<figure class="wp-block-image"><a href="http://www.skilja.de/wp-content/uploads/2015/12/Gay-Meaning.png"><img decoding="async" src="http://www.skilja.de/wp-content/uploads/2015/12/Gay-Meaning.png" alt="Showing the meaning of the word Gay over the century" class="wp-image-1034"/></a></figure>



<p>A faster change occurred in the 1970s, when the word “mouse” gained the new meaning of “computer input device”; later, within only a few years, “windows” became known internationally as the name of the Microsoft operating system.</p>



<p>Today the meaning of a word can change almost instantly. Before October 2012, the word “sandy” was an adjective meaning “covered in or consisting mostly of sand”. Then Hurricane “Sandy” approached. Almost overnight, this word gained an additional meaning as a proper noun for one of the costliest storms in US history.</p>



<p>Now this might not sound like a big deal – but just imagine the insurance industry and the thousands of e-mails they suddenly received referring to damages caused by Sandy! A static or rule-based classification system might easily miss the point on these.</p>



<p>So this is a big challenge for automatic classification systems that have been trained with a machine learning algorithm on a specific set of documents containing words with a specific meaning. When these meanings suddenly change, or the usage of words suddenly widens, the classifier will make wrong decisions. The only way to cope with this problem in a living, productive system is continuous learning. The system must learn from user corrections – supervised learning – but also from good classification results that contain some new aspects and features. The latter is called unsupervised learning, and it is so important because the number of documents that can be used for training is much higher than the number of manual corrections. Through unsupervised learning – or better, enhancement, which by the way also includes forgetting – the classification system will be able to cope with changed meanings over time. Abrupt changes like the one mentioned above will lead to a drop in the classification rate, from which the classifier will recover within a few days – like humans, who also need a short time to adapt.</p>
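<p>This continuous-learning loop with forgetting can be sketched in a few lines. The class below is an illustrative toy with invented names, not a specific Skilja component: every update decays the old word evidence, so a sudden shift in meaning – like “sandy” in October 2012 – is absorbed after a number of updates:</p>

```python
from collections import defaultdict

class DriftingClassifier:
    """Toy word-count classifier whose evidence decays over time."""
    def __init__(self, decay=0.99):
        self.counts = defaultdict(lambda: defaultdict(float))
        self.decay = decay

    def learn(self, text, label):
        # Supervised update (a user correction) or unsupervised update
        # (a confident automatic decision fed back as training signal).
        for cls in self.counts:            # forgetting: old evidence fades
            for w in self.counts[cls]:
                self.counts[cls][w] *= self.decay
        for w in text.split():
            self.counts[label][w] += 1.0

    def classify(self, text):
        return max(sorted(self.counts),
                   key=lambda c: sum(self.counts[c][w] for w in text.split()))

clf = DriftingClassifier()
clf.learn("sandy beach vacation sand", "travel")
# After October 2012 the word "sandy" floods in with a new meaning:
for _ in range(20):
    clf.learn("hurricane sandy storm damage claim", "insurance")
print(clf.classify("sandy damage report"))  # "insurance" – the new usage won
```

<p>The decay factor plays the role of forgetting: evidence for the old meaning is never deleted outright, it simply loses weight as fresh documents arrive – which is why the drop in classification rate after an abrupt change heals within days rather than requiring a full retraining.</p>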
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>IDP: Solving bot illiteracy in the digital workforce &#8211; Part 2</title>
		<link>https://skilja.com/idp-solving-bot-illiteracy-in-the-digital-workforce-part-2/</link>
		
		<dc:creator><![CDATA[Guest]]></dc:creator>
		<pubDate>Wed, 21 Oct 2020 10:12:00 +0000</pubDate>
				<category><![CDATA[Essentials]]></category>
		<category><![CDATA[Guest Post]]></category>
		<category><![CDATA[Market]]></category>
		<guid isPermaLink="false">https://skilja.com/?p=730</guid>

					<description><![CDATA[Editor’s note: This is a guest post from Jupp Stöpetie In this post we examine the role of Intelligent Document Processing (IDP) relative to Robotic Process Automation (RPA) and how these technologies drive Digital Transformation when combined. Part one was looking at what is driving and enabling Digital Transformation. This part two is dedicated to RPA and [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><span class="has-inline-color" style="color: #7cb955;"><strong>Editor’s note</strong>: </span>This is a guest post from Jupp Stöpetie</p>



<p><em>In this post we examine the role of Intelligent Document Processing (IDP) relative to Robotic Process Automation (RPA) and how these technologies drive Digital Transformation when combined. Part one looked at what is driving and enabling Digital Transformation. This second part is dedicated to RPA and why IDP is essential.</em></p>



<h2 class="wp-block-heading"><strong>Robotic Process Automation </strong></h2>



<p>In the past three or four years, RPA has become the tool of choice for automating repetitive tasks in many companies. RPA vendors claim that their systems are easy to set up and maintain without the need for coding, and that RPA systems can be operated by business managers with some lightweight training. Basically, every manager can create bots that replace human workers. This has led to widespread adoption of RPA systems in businesses: there is no need anymore for complicated, lengthy, and costly automation projects managed by IT departments. RPA puts automation and the use of AI in the hands of business managers, and today the workforce in companies is increasingly a combination of humans and robots. But of course things are more complicated when you look a bit deeper than the RPA marketing collateral. What happens, for example, when we need to retrieve information from a document? That is no problem for human workers. But bots have no humanlike reading skills. They can only “read” data from structured sources like files and databases, because it is easy to instruct bots with the exact location where to find the data.</p>



<h2 class="wp-block-heading"><strong>Why bots can&#8217;t read documents</strong></h2>



<p>A document can be seen as a container for content. The content is made up of static data plus explicit and hidden information describing the relationships between the data, which gives the document its meaning. A document also has metadata: the properties of the container itself. Content is represented in documents in a way that lets humans process it – and this has an important implication: most documents were never designed to be read by bots. When data must be extracted, it doesn’t matter to human workers whether a document has a fixed structure as in a form, a semi-structure as in an invoice, or no structure as in a contract. With some proper instructions, human workers will be able to find the data they are looking for; depending on how much structure there is, processing time may of course vary significantly. Bots, however, have no cognitive reading skills, and adding OCR and data capture technology to an RPA solution is often not enough to make bots really skilled at processing documents.</p>
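<p>The difference can be made tangible with a small sketch (the JSON payload, letter text, and pattern below are invented for the example): a bot instruction works when the data has an exact address, while the same fact in free text needs either human reading skills or a brittle hand-written pattern:</p>

```python
import json
import re

# A bot-style instruction works when the data has an exact address:
structured = json.loads('{"invoice": {"total": "1,250.00", "currency": "EUR"}}')
print(structured["invoice"]["total"])  # exact location -> trivial for a bot

# The same fact inside free text has no fixed location; a human spots it
# instantly, but a plain bot must fall back on a hand-written pattern
# that breaks as soon as the wording changes:
letter = ("Dear Sir or Madam, further to our call, please settle the "
          "outstanding balance of EUR 1,250.00 at your convenience.")
match = re.search(r"EUR\s+([\d,.]+)", letter)
print(match.group(1) if match else "not found")
```

<p>Change “EUR 1,250.00” to “1.250,00 euros” and the pattern fails silently – exactly the gap between locating data and reading a document that IDP is meant to close.</p>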



<figure class="wp-block-image size-large"><img decoding="async" class="wp-image-711" src="https://skilja.com/wp-content/uploads/arlington-research-kN_kViDchA0-unsplash-1-1024x683.jpg" alt="" />
<figcaption>Photo by Arlington Research on <a href="https://unsplash.com/s/photos/business?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText" target="_blank" rel="noreferrer noopener">Unsplash</a></figcaption>
</figure>



<h2 class="wp-block-heading"><strong>Why OCR and Data Capture often do not offer the right reading skills for bots</strong></h2>



<p>The short answer: these technologies fall short because they were not developed for RPA. OCR was not designed to understand content; it is a technology for converting pixels into characters. Most OCR packages can also convert document images (scans) into text files while recreating the original layout. Data capture systems use OCR along with many other AI technologies and were designed to extract data from large volumes of documents with the highest possible accuracy. Neither OCR nor data capture was designed for RPA users to teach their bots how to read.</p>



<p>The users who set up data capture installations are usually engineers who know exactly how to use all the levers and parameters of these systems. They will often create scripts or even self-coded additions in order to achieve the highest possible accuracy. The initial setup effort is high, but that makes a lot of sense: the ROI of data capture installations doesn’t have to be fast, because these installations are almost always set up to run for many years.</p>



<p>Batch-oriented data capture systems, although very powerful when set up correctly, are logically not the first choice of RPA users who need to add document processing capabilities to their bots. These users are looking for simple, easy, fast, and flexible functionality. The volumes they need to process are small. They also have a greater need for systems that learn while doing, because not the whole spectrum of variability in the documents to be processed will be available at the start of the project. They often need a higher level of intelligence as well, because they want to automate tasks formerly performed by humans, and when they design new processes, RPA users want to create intelligent bots that behave just like humans. But what RPA users cannot handle – and what would break the RPA paradigm – is if adding reading skills to bots requires a lot of investment and special technical skills such as solution design, coding, and production testing.</p>



<p>Note that in cases where RPA systems are used to process large volumes of documents, it makes sense to use data capture systems to extract the data, because in these cases efficiency – that is, accuracy – will most likely play a significant role. Processing large volumes of documents is data capture’s sweet spot.</p>



<p>What RPA users really need when they have to process documents is something that may look a bit like OCR and data capture <strong>but is much smarter, because it operates a much broader set of AI technologies</strong>, while at the same time being easier to use.</p>



<h2 class="wp-block-heading">IDP: a new product category for a new market</h2>



<p>The massive spread of RPA installations in companies over the past three to four years has led to increasingly high demand for such easy, simple, flexible, yet powerful intelligent data capture solutions. This fast-growing demand has spawned a new generation of companies who have gone down a different path, using different technologies than what the incumbents in the data capture market have been doing for more than 20 years. These new systems are all based on the idea that deep neural networks and other forms of machine learning are better and much easier ways to fulfill the needs of RPA users: all you need is a lot of samples, train your neural networks, and off you go.</p>

<p>What is unclear at this stage, however, is whether ML can actually deliver the accuracy that is needed when bots are to execute mission-critical tasks with humanlike reading skills. When deep neural networks make mistakes, you cannot go in and correct for them – these systems are black boxes. It is noteworthy that all incumbents are updating their existing offerings by adding ML technologies, especially with the goal of becoming better at processing unstructured documents. And it seems not such a ridiculous assumption that these companies, with their many years of experience developing document processing systems, have an advantage over competition that is fully ML-focused. It looks quite plausible that the incumbents are better set up to marry old and new AI technologies, based on their solid understanding of how to build robust document processing systems.</p>



<p>All these efforts by old and new companies to develop document processing skills for RPA bots have led to a new category of products that cater to the digital transformation market rather than the traditional capture market. The emergence of these new products was the reason for the Everest Group, a leading management consulting and research firm, to come up with a new product category: <strong>Intelligent Document Processing, or IDP</strong>.</p>



<p><a href="https://www.everestgrp.com/2019-12-understanding-enterprise-grade-idp-solutions-market-insights-52033.html">Everest Group defines IDP</a> as any software product or solution that captures data from documents (e.g., email, text, pdf, and scanned documents), categorizes, and extracts relevant data for further processing using AI technologies such as computer vision, OCR, Natural Language Processing (NLP), and machine/deep learning. These solutions are typically non-invasive and can be integrated with internal applications, systems, and other automation platforms.</p>



<figure class="wp-block-image size-large"><img decoding="async" class="wp-image-721" src="https://skilja.com/wp-content/uploads/markus-spiske-3Tf1J8q9bBA-unsplash-1024x683.jpg" alt="" />
<figcaption>Photo by Markus Spiske on <a href="https://unsplash.com/s/photos/business?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText" target="_blank" rel="noreferrer noopener">Unsplash</a></figcaption>
</figure>



<h2 class="wp-block-heading"><strong>About OCR and Online-learning</strong></h2>



<p>In many blogs about IDP, authors set up a contrast between OCR as the old-fashioned, unintelligent way of data extraction and modern AI-based extraction methods as state of the art and intelligent. First of all, as I pointed out, OCR is not data extraction; OCR is one of many AI technologies used in data capture systems. And there are different ways to build intelligence into systems that help find and interpret data in documents. Machine learning may perform better on unstructured documents, but when dealing with forms and semi-structured documents, systems that combine templates, classifiers, and other AI technologies – including machine learning – will almost always outperform systems based solely on machine learning. Note that such comprehensive systems, operating ML in a smart way, will learn both from users correcting mistakes and from the results of successful classification and extraction, generating additional knowledge (expanding the space) as well as statistics on the usage of existing knowledge. See <a href="https://skilja.com/the-magic-of-online-learning/" target="_blank" rel="noreferrer noopener">The Magic of Online-Learning</a>.</p>



<h2 class="wp-block-heading"><strong>What is the ideal IDP solution?</strong></h2>



<p>The challenge of intelligent document processing is that there seems to be no single ideal approach. Depending on the type of documents, the volumes that have to be processed, and the importance of accuracy, either ML, more traditional AI approaches, or a blend thereof will be the best choice. When document volumes are big, accuracy is especially important: when a shared service centre processes 50 million documents a year, improving accuracy from, say, 95% to 96% is significant – it reduces the number of documents that have to be corrected by 500,000. Another case where accuracy is critical is straight-through processing.</p>
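<p>The arithmetic behind that figure is worth spelling out, because the saving scales linearly with volume:</p>

```python
# One percentage point of accuracy at shared-service-centre volume:
volume = 50_000_000                 # documents per year
before, after = 0.95, 0.96          # accuracy before and after the improvement
fewer_corrections = round(volume * (after - before))
print(fewer_corrections)  # 500000 documents that no longer need manual correction
```
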



<p>It seems that the best option customers have is to adopt an IDP platform that enables them to operate different solutions, or combinations of such solutions, while shielding users from their complexity. Vinna is such a platform: it even allows adding crowdsourcing (human-in-the-loop verification) without users ever being aware of the complexity of what goes on under the hood.</p>



<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;</p>



<div class="wp-block-image">
<figure class="alignleft size-large is-resized"><img loading="lazy" decoding="async" class="wp-image-654" src="https://skilja.com/wp-content/uploads/0-3.jpeg" alt="Jupp Stöpetie" width="170" height="170" /></figure>
</div>



<p class="has-text-align-left">Jupp Stöpetie, as CEO of ABBYY Europe, established ABBYY&#8217;s presence in the Western European markets over more than 25 years, growing the brand and market presence to a leadership position. His experience includes founding and growing companies and managing all levels of business operations, sales, and marketing. Jupp left ABBYY in spring 2020 and now works as an independent consultant based in Munich, Germany.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>IDP: Solving bot illiteracy in the digital workforce &#8211; Part 1</title>
		<link>https://skilja.com/idp-solving-bot-illiteracy-in-the-digital-workforce-part-1/</link>
		
		<dc:creator><![CDATA[Guest]]></dc:creator>
		<pubDate>Wed, 15 Jul 2020 10:11:47 +0000</pubDate>
				<category><![CDATA[Essentials]]></category>
		<category><![CDATA[Guest Post]]></category>
		<category><![CDATA[Market]]></category>
		<category><![CDATA[Process]]></category>
		<guid isPermaLink="false">https://skilja.com/?p=649</guid>

					<description><![CDATA[Editor’s note:&#160;This is a guest post from Jupp Stöpetie In this post we examine the role of Intelligent Document Processing (IDP) relative to Robotic Process Automation (RPA) and how these technologies drive Digital Transformation when combined. In part one we look at what is driving and enabling&#160;Digital Transformation. Part two then is dedicated to RPA [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><span style="color:#7cb955" class="has-inline-color"><strong>Editor’s note</strong>:&nbsp;</span>This is a guest post from Jupp Stöpetie</p>



<p><em>In this post we examine the role of Intelligent Document Processing (IDP) relative to Robotic Process Automation (RPA) and how these technologies drive Digital Transformation when combined. In part one we look at what is driving and enabling&nbsp;Digital Transformation. Part two is dedicated to RPA and why IDP is essential.</em></p>



<h2 class="wp-block-heading">Digital Transformation</h2>



<p>Companies all over the world are redesigning and digitizing their businesses at an ever faster pace and they have many reasons for doing so:</p>



<ul class="wp-block-list"><li>They want to serve their customers faster and better.</li><li>They understand that automation and AI technologies improve innovation, agility, scalability and cost-efficiency.</li><li>They appreciate that, in contrast to labor-intensive manual processes, digital processes<ul><li>take less time to design</li><li>need less capital investment</li><li>are faster to deploy</li><li>and are (much) less costly to run.</li></ul></li></ul>



<p>The above is commonly referred to as Digital Transformation. Citing Salesforce’s definition: “Digital Transformation (DX) is the process of using digital technologies to create new — or modify existing — business processes, culture, and customer experiences to meet changing business and market requirements.“&nbsp;</p>



<p>The main drivers of Digital Transformation are ongoing globalisation and the Fourth Industrial Revolution, which have led to dramatically increased levels of competition. DX greatly improves the agility of businesses, so they can adapt to changes in the market much faster than ever before. Changing or even completely redesigning digital processes is far easier and cheaper, which significantly increases a business&#8217;s competitiveness. Note that even terminating a digitised process costs much less than winding down one built on heavy capital investment and labor. Basically, businesses have no choice: they must become digital, and those slow to change find themselves at an increasing disadvantage. New businesses today start with a digital concept in mind and avoid manual processes wherever that is possible and makes sense.&nbsp;</p>



<figure class="wp-block-image size-large"><img decoding="async" src="https://skilja.com/wp-content/uploads/mike-kononov-lFv0V3_2H6s-unsplash-1024x576.jpg" alt="" class="wp-image-707"/><figcaption><span class="has-inline-color has-black-color">Photo by&nbsp;Mike Kononov&nbsp;on&nbsp;</span><a rel="noreferrer noopener" href="https://unsplash.com/s/photos/business?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText" target="_blank"><span class="has-inline-color has-black-color">Unsplash</span></a></figcaption></figure>



<h2 class="wp-block-heading"><strong>Market Data on Digital Transformation</strong></h2>



<p>The sheer scale of investment shows how different things are from, say, ten years ago. Worldwide spending on the technologies and services that enable the digital transformation (DX) of business practices, products, and organizations is forecast to reach $2.3 trillion in 2023, according to a new update to the International Data Corporation (<a href="http://www.idc.com/">IDC</a>) <a href="http://www.idc.com/getdoc.jsp?containerId=IDC_P32575" target="_blank" rel="noreferrer noopener">Worldwide Semiannual Digital Transformation Spending Guide</a>. DX spending is expected to expand steadily throughout the 2019-2023 forecast period, achieving a five-year compound annual growth rate of 17.1%. &#8220;We are approaching an important milestone in DX investment with our forecast showing the DX share of total worldwide technology investment hitting 53% in 2023,&#8221; said <a href="https://www.idc.com/getdoc.jsp?containerId=PRF005039" target="_blank" rel="noreferrer noopener">Craig Simpson</a>, research manager with IDC&#8217;s <a href="https://www.idc.com/promo/customerinsights" target="_blank" rel="noreferrer noopener">Customer Insights and Analysis Group</a>.</p>



<p>In their July report, Grand View Research valued the global Robotic Process Automation market, a segment of the overall DX market, at USD 1.40 billion in 2019 and projected it to grow at a compound annual growth rate (CAGR) of 40.6% from 2020 to 2027.</p>
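<p>For the curious, the implied end-of-period market size can be compounded from these figures (an illustrative Python sketch; the base value and CAGR are from the report cited above, while the choice of seven compounding years, 2020 through 2027, is my assumption):</p>

```python
# Compounding the reported RPA figures; the number of compounding
# periods (2020 through 2027 = 7 years) is an assumption.
base_usd_bn = 1.40   # RPA market size in 2019, USD billions (from the text)
cagr = 0.406         # compound annual growth rate (from the text)
years = 7            # assumed compounding periods

projected = base_usd_bn * (1 + cagr) ** years
print(f"Implied 2027 market size: ~${projected:.1f} billion")
```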



<h2 class="wp-block-heading"><strong>Why is Digital Transformation taking place now</strong></h2>



<p>In the last decade we have seen a tsunami of digital transformation projects, driven by companies&#8217; accelerating desire to increase competitiveness and innovation. Of course, one could argue that businesses always had that desire, which raises the question: why is all this happening now? Haven&#8217;t companies been automating for decades already? Yes, but not at the current pace.&nbsp;</p>



<p><em>Note: at this stage it is unclear how the Covid-19 pandemic will influence market dynamics. It seems however unlikely that the need for companies to digitize their businesses will slow down. On the contrary, it is much more likely that the opposite will happen.</em></p>



<p>What has made things totally different from, say, ten years ago, and has really enabled Digital Transformation to take place, is the enormous progress in both computer science and the computer industry. Compared with ten years ago we see:</p>



<ul class="wp-block-list"><li>an enormous increase in computational power</li><li>the rise of super-powerful algorithms &#8211; algorithms that need incredible amounts of computational power, which is now available</li><li>vastly improved connectivity, in both speed and access points, improving at an accelerating pace (5G)</li><li>smartphones: in 2009, 170 million smartphones were sold; in 2020, an estimated 1.5 billion units will be sold, putting a smartphone in the hands of an estimated 3.5 billion people</li><li>huge and &#8211; compared to the rest of the economy &#8211; disproportionate amounts of money invested in tech companies. Successful tech companies have rewarded their investors&nbsp;with multiples that dwarf any other industry.</li><li>and last but not least, a relentlessly growing appetite of consumers and businesses for more, better and faster service, anywhere, at any time.</li></ul>



<h2 class="wp-block-heading"><strong>Some examples of how robotic process automation and document processing drive Digital Transformation in companies</strong></h2>



<p>A large international pharmaceutical company wanted to capture all details from their purchase orders and perform lookups and validation against their ERP. The tasks were automated using an RPA system, and an intelligent document processing system was needed to read the data from the POs.</p>



<p>A large financial services company wanted to use RPA to automate their KYC process, which involves capturing and verifying driver licences, passports and so on. Early in the process they found that they also needed an intelligent document processing system.</p>



<p>A global logistics company wanted to automate their invoice processing (millions of documents). Their RPA system was found to be up to the challenge, but the company initially backed off because of the complexity of extracting data from millions of semi-structured invoices with widely varying layouts. Only after a proof of concept showed that there was an intelligent document processing system on the market that was up to the challenge did the company proceed with the project.</p>



<p>In part 2 of this post we will discuss how the combination of RPA with IDP works and what challenges need to be considered.</p>



<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;</p>



<div class="wp-block-image"><figure class="alignleft size-large is-resized"><img loading="lazy" decoding="async" src="https://skilja.com/wp-content/uploads/0-3.jpeg" alt="Jupp Stöpetie" class="wp-image-654" width="170" height="170"/></figure></div>



<p class="has-text-align-left">As CEO of ABBYY Europe, Jupp Stöpetie established ABBYY&#8217;s presence in the Western European markets over more than 25 years, growing the brand and its market presence to a leadership position. His experience includes founding and growing companies and managing all levels of business operations, sales, and marketing. Jupp left ABBYY in spring 2020 and now works as an independent consultant based in Munich, Germany.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>The Meaning of Words</title>
		<link>https://skilja.com/the-meaning-of-words/</link>
		
		<dc:creator><![CDATA[skiljaadmin]]></dc:creator>
		<pubDate>Wed, 18 Mar 2015 13:07:58 +0000</pubDate>
				<category><![CDATA[Essentials]]></category>
		<guid isPermaLink="false">https://skilja.com/?p=233</guid>

					<description><![CDATA[A famous poem by German poet Christian Morgenstern starts with the line that probably everybody has heard once:&#160;“Die Möwen sehen alle aus, als ob sie Emma hießen”&#160;which is in the translation by Karl F. Ross:&#160;“The seagulls by their looks suggest that Emma is their name”. And in fact, if you think about it, isn’t the [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>A famous poem by German poet Christian Morgenstern starts with the line that probably everybody has heard once:<em>&nbsp;“Die Möwen sehen alle aus, als ob sie Emma hießen”</em>&nbsp;which is in the translation by Karl F. Ross:<em>&nbsp;“The seagulls by their looks suggest that Emma is their name”</em>.</p>



<figure class="wp-block-image"><a href="http://www.skilja.de/wp-content/uploads/2015/03/seagull-638200_1280.jpg"><img decoding="async" src="http://www.skilja.de/wp-content/uploads/2015/03/seagull-638200_1280.jpg" alt="" class="wp-image-874"/></a></figure>



<p>And in fact, if you think about it, isn’t the name Emma associated with a certain impression of a person, of a situation, of a sentiment? The same is true for many other names and expressions, because our mind implicitly associates a linguistic meaning with words, the sound denoting a concept that goes beyond the word&#8217;s pure description.</p>



<p>Another good example comes from the&nbsp;<a href="http://newyorker.com/tech/elements/the-power-of-names" target="_blank" rel="noreferrer noopener">New Yorker</a>&nbsp;article that inspired this post, which notes in the May 2013 issue:&nbsp;<em>“Dawdle” and “meander” sound as unhurried as the walking speeds they describe, and “awkward” and “gawky” sound as ungainly as the bodies they represent</em>.</p>



<p>This observation led the German Gestalt psychologist Wolfgang Köhler in 1929 to design a psychological experiment measuring the non-arbitrary mapping between speech sounds and the visual shape of objects. In this famous experiment, now known as the Bouba/Kiki effect (see Wikipedia&nbsp;<a href="https://en.wikipedia.org/wiki/Bouba/kiki_effect" target="_blank" rel="noreferrer noopener">here</a>), Köhler showed forms similar to those below and asked participants which shape was called “takete” and which was called “maluma”.</p>



<figure class="wp-block-image"><a href="http://www.skilja.de/wp-content/uploads/2015/03/Maluma-Takete.png"><img decoding="async" src="http://www.skilja.de/wp-content/uploads/2015/03/Maluma-Takete.png" alt="" class="wp-image-875"/></a></figure>



<p>I am sure that if you think about this for a moment you will go with the vast majority of respondents, who associated the round, soft shape on the left with the word “maluma”, while “takete” apparently represents something jagged and sharp. In follow-up experiments using the words “kiki” and “bouba”, 95% of participants selected the curvy shape as “bouba” and the jagged one as “kiki”, in groups as diverse as American college undergraduates and Tamil speakers in India, implying that there are deeper concepts behind the words we use. And if you read in a foreign language (as I do), hasn’t it happened to you too that you didn’t know a word but somehow understood what it had to mean from its sound?</p>



<p>You can also see this effect very clearly in the naming of products. Consider Clorox (a producer of household bleach) and Chanel (high-end perfume); switch the names and products and you get the idea. The words carry a subconscious meaning related to their sound. Sound elements in speech are called phonemes, which are related to morphemes in grammar.</p>



<p>What has this to do with classification and document understanding?</p>



<p>Well, a good classifier will make use of these ideas to better train concepts and understand language. It will not use a bag-of-words approach alone, but break language down into smaller parts that allow it to associate meanings with the phonemes. The Skilja Classifier follows this approach when extracting features from language.</p>
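<p>To make the idea concrete, here is a minimal, purely illustrative Python sketch of sub-word feature extraction (this is not the actual Skilja Classifier implementation): character n-grams give an unseen or invented word measurable overlap with known words, where a pure bag-of-words model would see no relation at all.</p>

```python
from collections import Counter

def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> Counter:
    """Counts of character n-grams; word boundaries are marked with '_'."""
    padded = f"_{text.lower().strip()}_".replace(" ", "_")
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams

# An invented quasi-word still shares sub-word features with a known word,
# which a pure bag-of-words model would treat as entirely unrelated tokens.
known, invented = char_ngrams("sparkle"), char_ngrams("sparkly")
overlap = sum((known & invented).values())
print(f"Shared n-gram mass between 'sparkle' and 'sparkly': {overlap}")
```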



<p>Especially for short texts that are not very explicit, and for sentiment analysis, this method is far superior to using whole words. Take social media, where words often gradually change their meaning over time or new words are invented; even invented words follow the same sound patterns as existing ones. We will use this approach for sentiment analysis based on the pronunciation of quasi-words in tweets. It will be interesting to see whether the observed link between a word&#8217;s meaning and its sound can be exploited statistically to better understand sentiment.</p>



<p>Already today we see a significant difference in the quality of document classification when we compare pure word-based classification with the richer statistics gained from morphemes. The effect varies from language to language: English is very sensitive, while Mandarin, for example, shows no effect at all, since the symbols used for writing do not correspond to actual sounds.</p>



<p>We will follow up on this interesting topic in a number of future posts, sharing ideas and results. For now, let me just give you the final four lines of the poem by Morgenstern quoted above:</p>



<blockquote class="wp-block-quote has-text-align-center is-layout-flow wp-block-quote-is-layout-flow"><p>O human, you will never fly<br>the way the seagulls do;<br>but if your name is Emma, why,<br>be glad they look like you.</p></blockquote>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Visual Classifiers From Random Images</title>
		<link>https://skilja.com/visual-classifiers-from-random-images/</link>
		
		<dc:creator><![CDATA[Alexander]]></dc:creator>
		<pubDate>Mon, 03 Nov 2014 13:21:47 +0000</pubDate>
				<category><![CDATA[Cognition]]></category>
		<category><![CDATA[Essentials]]></category>
		<guid isPermaLink="false">https://skilja.com/?p=247</guid>

					<description><![CDATA[Now this is an interesting experiment that leads us very close to the touch point between machine classification and human imaginations. As described in previous posts, auto-classification algorithms are using features that are extracted from the objects to be classified (images or text) and are represented in a feature space. Classification can be described as [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>Now this is an interesting experiment that leads us very close to the touch point between machine classification and human imaginations.</p>



<p>As described in previous posts, auto-classification algorithms use features that are extracted from the objects to be classified (images or text) and represented in a feature space. Classification can be described as finding the correct separating plane (in many dimensions) between the features of different objects. A group of researchers from MIT (<a href="http://arxiv.org/find/cs/1/au:+Vondrick_C/0/1/0/all/0/1">Carl Vondrick</a>,&nbsp;<a href="http://arxiv.org/find/cs/1/au:+Pirsiavash_H/0/1/0/all/0/1">Hamed Pirsiavash</a>,&nbsp;<a href="http://arxiv.org/find/cs/1/au:+Oliva_A/0/1/0/all/0/1">Aude Oliva</a>,&nbsp;<a href="http://arxiv.org/find/cs/1/au:+Torralba_A/0/1/0/all/0/1">Antonio Torralba</a>) has used an interesting approach to get a glimpse of what this feature space might actually look like in our minds. They generated random white noise in the feature space and inverted this noise into actual images. These images were then presented to humans, who were asked whether they resembled certain well-known objects. The results are quite fascinating:</p>



<div class="wp-block-image"><figure class="aligncenter"><a href="http://www.skilja.de/wp-content/uploads/2014/11/Acquiring-Visual-Classifiers-from-Human-Imagination-1.png"><img decoding="async" src="http://www.skilja.de/wp-content/uploads/2014/11/Acquiring-Visual-Classifiers-from-Human-Imagination-1.png" alt="" class="wp-image-700"/></a></figure></div>



<p>All the image patches on the left are just noise. Many thousands of them were shown to online workers, who were asked to find ones that look like cars. See the full scientific paper&nbsp;<a href="http://arxiv.org/pdf/1410.4627v1.pdf" target="_blank" rel="noreferrer noopener">here</a>.</p>



<p>Most of the time these random images appear to people as just that: random.&nbsp;But every now and then somebody feels that an image reminds them of a car, so that image is set aside. Repeat this, and after assessing 100,000 images you end up with a set of essentially random pictures that remind people of cars. Averaging these yields something interesting: the resulting image does indeed look like a blurry car, not a specific kind of car but a very&nbsp;general template&nbsp;of one.</p>



<p>Mathematically speaking this noise-driven method estimates the decision boundary that the human visual system uses for recognition.</p>
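<p>This principle, known in psychophysics as reverse correlation, can be simulated in a few lines of Python. The &#8220;internal template&#8221; below is a hypothetical stand-in for the observer&#8217;s mental prototype, not the paper&#8217;s data: keeping only the noise that correlates with the template and averaging what was kept recovers a blurry copy of that template.</p>

```python
import random

random.seed(0)
DIM = 64  # tiny stand-in for the feature space

# Hypothetical internal template playing the role of the observer's
# mental car / fire-hydrant prototype (pure illustration).
template = [random.gauss(0, 1) for _ in range(DIM)]

def looks_like_template(patch, threshold=2.0):
    """Simulated observer: accept noise whose projection on the template is high."""
    score = sum(p * t for p, t in zip(patch, template))
    return score > threshold * DIM ** 0.5

accepted = []
for _ in range(20_000):
    patch = [random.gauss(0, 1) for _ in range(DIM)]
    if looks_like_template(patch):
        accepted.append(patch)

# Averaging the accepted noise recovers a blurry copy of the template.
mean_patch = [sum(col) / len(accepted) for col in zip(*accepted)]
similarity = sum(m * t for m, t in zip(mean_patch, template)) / (
    sum(m * m for m in mean_patch) ** 0.5 * sum(t * t for t in template) ** 0.5
)
print(f"Cosine similarity of averaged noise to template: {similarity:.2f}")
```

The interesting part is that the observer never reveals the template directly; it emerges purely from which noise gets selected, just as the blurry car emerged from the workers&#8217; choices.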



<div class="wp-block-image"><figure class="alignleft"><a href="http://www.skilja.de/wp-content/uploads/2014/11/Acquiring-Visual-Classifiers-from-Human-Imagination-2.png"><img decoding="async" src="http://www.skilja.de/wp-content/uploads/2014/11/Acquiring-Visual-Classifiers-from-Human-Imagination-2.png" alt="" class="wp-image-703"/></a></figure></div>



<p>My favorite example of the ones that have been tested is the fire hydrant that emerges from random white noise images.</p>



<p>Since the random noise was actually generated in the feature space and not in the images themselves, the researchers can deduce which features lead to the recognition of objects, and hence gain an understanding of how human object recognition actually works. Humans have remarkable capabilities for recognizing objects they have never seen, touched or smelled before. This understanding of feature selection in human minds will help us in the future to derive new classifiers that more closely resemble the way we all do object recognition, including the human bias. This is one of the most interesting results of the study: the object that emerges depends on the cultural background of the people selecting the random images. When online workers from India were asked to find a sports ball, a red circular object emerged, because the most popular sport in India is cricket, which is played with a red ball. Ask US workers the same question and an orange ball appears &#8211; think of football or basketball.</p>



<p>We encounter the same human bias in our daily work classifying documents in big enterprises. Every person classifies a document set a little differently from their co-workers, leading to many inconsistencies. These can be overcome with auto-classification, which always makes predictable and consistent decisions.</p>



<p>The research described here provides a very interesting insight into the nature of the human mind. It will also allow us to work on refined classification methods that more closely resemble the way humans make decisions.</p>



]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Practical Semantics</title>
		<link>https://skilja.com/practical-semantics/</link>
		
		<dc:creator><![CDATA[Alexander]]></dc:creator>
		<pubDate>Fri, 17 Jan 2014 13:30:00 +0000</pubDate>
				<category><![CDATA[Cognition]]></category>
		<category><![CDATA[Essentials]]></category>
		<guid isPermaLink="false">https://skilja.com/?p=258</guid>

					<description><![CDATA[As you have seen in previous posts I was pointing out the difficulty for software to really understand English language mainly lies in the ambiguity of words and their related meanings. The technology to resolve this difficulty is called semantic analysis. I recently found an interesting and funny text summarizing these diffculties. If you have [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>As I have pointed out in previous posts, the difficulty for software in really understanding the English language lies mainly in the ambiguity of words and their related meanings. The technology to resolve this difficulty is called semantic analysis. I recently found an interesting and funny text summarizing these difficulties. If you have read this blog up to now, you will easily recognize which of the words below are homonyms and which are polysemic, which can be resolved by syntax (e.g. the same word as a noun and as a verb) and which need semantics. I was not able to find the original source, but many thanks to those who compiled it. Enjoy reading (I have highlighted my favorite in blue further down):</p>



<p><em>1) The bandage was&nbsp;<strong>wound</strong>&nbsp;around the&nbsp;<strong>wound</strong>.</em><br><em>2) The farm was used to&nbsp;<strong>produce produce</strong>.</em><br><em>3) The dump was so full that it had to&nbsp;<strong>refuse</strong>&nbsp;more&nbsp;<strong>refuse</strong>.</em><br><em>4) We must&nbsp;<strong>polish</strong>&nbsp;the&nbsp;<strong>Polish</strong>&nbsp;furniture.</em><br><em>5) He could&nbsp;<strong>lead</strong>&nbsp;if he would get the&nbsp;<strong>lead</strong>&nbsp;out.</em><br><em>6) The soldier decided to&nbsp;<strong>desert</strong>&nbsp;his&nbsp;<strong>dessert</strong>&nbsp;in the&nbsp;<strong>desert</strong>.</em><br><em>7) Since there is no time like the&nbsp;<strong>present</strong>, he thought it was time to&nbsp;<strong>present</strong>&nbsp;the&nbsp;<strong>present</strong>.</em><br><em>8 ) A&nbsp;<strong>bass</strong>&nbsp;was painted on the head of the&nbsp;<strong>bass</strong>&nbsp;drum.</em><br><em>9) When shot at, the&nbsp;<strong>dove</strong>&nbsp;<strong>dove</strong>&nbsp;into the bushes.</em><br><em>10) I did not&nbsp;<strong>object</strong>&nbsp;to the&nbsp;<strong>object</strong>.</em><br><em>11) The insurance was&nbsp;<strong>invalid</strong>&nbsp;for the&nbsp;<strong>invalid</strong>.</em><br><em>12) There was a&nbsp;<strong>row</strong>&nbsp;among the oarsmen about how to&nbsp;<strong>row</strong>.</em><br><em>13) They were too&nbsp;<strong>close</strong>&nbsp;to the door to&nbsp;<strong>close</strong>&nbsp;it.</em><br><em>14) The buck&nbsp;<strong>does</strong>&nbsp;funny things when the&nbsp;<strong>does</strong>&nbsp;are present.</em><br><em>15) A seamstress and a&nbsp;<strong>sewer</strong>&nbsp;fell down into a&nbsp;<strong>sewer</strong>&nbsp;line.</em><br><em>16) To help with planting, the farmer taught his&nbsp;<strong>sow</strong>&nbsp;to&nbsp;<strong>sow</strong>.</em><br><em>17) The&nbsp;<strong>wind</strong>&nbsp;was too strong to&nbsp;<strong>wind</strong>&nbsp;the sail.</em><br><em>18) Upon seeing the&nbsp;<strong>tear</strong>&nbsp;in the painting I shed a&nbsp;<strong>tear</strong>.</em><br><em>19) I had to&nbsp;<strong>subject</strong>&nbsp;the&nbsp;<strong>subject</strong>&nbsp;to a series of tests.</em><br><em>20) How can I&nbsp;<strong>intimate</strong>&nbsp;this to my most&nbsp;<strong>intimate</strong>&nbsp;friend?</em></p>



<p><em>Let’s face it – English is a crazy language. There is no egg in eggplant, nor ham in hamburger; neither apple nor pine in pineapple. English muffins weren’t invented in England or French fries in France . Sweetmeats are candies while sweetbreads, which aren’t sweet, are meat. We take English for granted. But if we explore its paradoxes, we find that quicksand can work slowly, boxing rings are square and a guinea pig is neither from Guinea nor is it a pig.</em></p>



<p><em>And why is it that&nbsp;<font color="blue"> writers write but fingers don’t fing</font>, grocers don’t groce and hammers don’t ham? If the plural of tooth is teeth, why isn’t the plural of booth, beeth? One goose, 2 geese. So one moose, 2 meese? One index, 2 indices? Doesn’t it seem crazy that you can make amends but not one amend? If you have a bunch of odds and ends and get rid of all but one of them, what do you call it?</em></p>



<p><em>If teachers taught, why didn’t preachers praught? If a vegetarian eats vegetables, what does a humanitarian eat? Sometimes I think all the English speakers should be committed to an asylum for the verbally insane. In what language do people recite at a play and play at a recital? Ship by truck and send cargo by ship? Have noses that run and feet that smell?</em><br><em>How can a slim chance and a fat chance be the same, while a wise man and a wise guy are opposites? You have to marvel at the unique lunacy of a language in which your house can burn up as it burns down, in which you fill in a form by filling it out and in which, an alarm goes off by going on.</em></p>



<p><em>English was invented by people, not computers, and it reflects the creativity of the human race, which, of course, is not a race at all. That is why, when the stars are out, they are visible, but when the lights are out, they are invisible.</em></p>



<p><em>PS. – Why doesn’t ‘Buick’ rhyme with ‘quick’?</em></p>



<p>Taken from:&nbsp;<a href="https://plus.google.com/+DavidDStanton/posts/dFxLRjeXmvb">https://plus.google.com/+DavidDStanton/posts/dFxLRjeXmvb</a></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Understanding the Weather</title>
		<link>https://skilja.com/understanding-the-weather/</link>
		
		<dc:creator><![CDATA[Alexander]]></dc:creator>
		<pubDate>Thu, 11 Apr 2013 14:00:47 +0000</pubDate>
				<category><![CDATA[Essentials]]></category>
		<guid isPermaLink="false">https://skilja.com/?p=272</guid>

					<description><![CDATA[If you live in Europe you will agree with me that the weather this spring has been the worst since a long time. In fact it was freezing and snowing all through March right into April. And it has not been much better on the East coast in the U.S. We experienced some really strange [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>If you live in Europe you will agree with me that the weather this spring has been the worst in a long time. In fact it was freezing and snowing all through March and right into April, and it has not been much better on the East Coast of the U.S. We experienced some really strange weather here in Germany, with cold air hovering over the country creating a permanent hazy mist, and a lot of snow everywhere, even over the Easter holiday, which is normally green. I have never experienced such conditions before.</p>



<p>But not only was the weather bad; the weather forecasts were also very unreliable and often downright wrong. For a long time they predicted an imminent arrival of spring &#8211; and then it snowed the next day; then they predicted fog and snow when we actually had some sunny, though cold, days. Why did this happen?</p>



<p>The answer is: because pattern matching failed. As many do not know, weather forecasts rely heavily on historical data. While meteorologists will claim that they run complicated simulations on huge supercomputers to predict the weather, in fact most predictions are based on experience. A huge collection of data is searched for similar meteorological situations of the past, and the historically observed development is extrapolated into the future &#8211; more or less the way our grandmothers or any farmer would do it. Don&#8217;t get me wrong, this approach still requires enormous computing power and elaborate algorithms, but it fails if the situation is very unusual. This is what happened this year: pattern matching failed because there was no existing pattern to match against. What remained was the pure “model”. But weather is a highly non-linear phenomenon that cannot be predicted by a model, however complex it is. This has been brilliantly shown by Nobel Prize winner&nbsp;<a href="http://en.wikipedia.org/wiki/Ilya_Prigogine" target="_blank" rel="noreferrer noopener">Ilya Prigogine</a>&nbsp;in his book “The End of Certainty”.</p>



<p>We have a similar situation in document understanding. As we know, statistical systems that have been trained by a user (so-called supervised learning) work well as long as the documents to process somehow match what has been trained. But beware of unexpected or unknown input: the statistical classifier will in the best case raise a flag, and in the worst case produce a false positive. Pattern matching needs patterns to match against.</p>
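<p>A toy sketch of that behaviour in Python (the training texts and threshold are invented for illustration, not taken from any real system): a similarity-based classifier that raises a flag when nothing trained is similar, instead of forcing a false-positive match.</p>

```python
from collections import Counter
from math import sqrt

def bow(text: str) -> Counter:
    """Naive bag-of-words features."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy "trained" classes (invented examples standing in for real training).
training = {
    "invoice": bow("invoice number amount due payment total tax"),
    "complaint": bow("unhappy service complaint refund disappointed"),
}

def classify(text: str, reject_below: float = 0.2):
    scores = {label: cosine(bow(text), proto) for label, proto in training.items()}
    label, score = max(scores.items(), key=lambda kv: kv[1])
    # Raise a flag instead of forcing a match when no trained pattern is close.
    return (label, score) if score >= reject_below else ("UNKNOWN", score)

print(classify("please refund me, I am unhappy with the service"))
print(classify("quarterly meteorological observations for March"))  # → flagged UNKNOWN
```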



<p>Fortunately, human language is not as complex as the weather; it is fairly linear. It is therefore possible to describe language with a model that can actually describe and understand text: the&nbsp;<strong>Semantic Model</strong>. With syntactic and semantic analysis, software can go beyond pure matching and really understand text the way a human does, so even unknown input can be reliably analyzed and correctly processed. Semantic analysis is the key to understanding text in the future.</p>



<figure class="wp-block-image"><a href="http://www.skilja.de/wp-content/uploads/2013/04/1-DSC08124.jpg"><img decoding="async" src="http://www.skilja.de/wp-content/uploads/2013/04/1-DSC08124-1024x768.jpg" alt="" class="wp-image-564"/></a><figcaption>Springtime in Freiburg – in the upper Rhine valley</figcaption></figure>



<p>P.S. Finally – as of today – we are back to normal with stable Atlantic west-wind conditions. Forecasts are very precise again and spring has arrived at last.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Why Semantics is Important for Classification</title>
		<link>https://skilja.com/why-semantics-is-important-for-classification/</link>
		
		<dc:creator><![CDATA[Alexander]]></dc:creator>
		<pubDate>Wed, 13 Mar 2013 14:04:00 +0000</pubDate>
				<category><![CDATA[Classification]]></category>
		<category><![CDATA[Cognition]]></category>
		<category><![CDATA[Essentials]]></category>
		<guid isPermaLink="false">https://skilja.com/?p=277</guid>

					<description><![CDATA[Many automatic classification systems out there today use a pure bag of words approach for finding relevant features that determine the meaning of a document. Few are using correlation and collocation – to account for the fact that words have a different meaning based on their context. None of them is using full semantic analysis [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>Many automatic classification systems out there today use a pure bag of words approach for finding the relevant features that determine the meaning of a document. Few use correlation and collocation to account for the fact that words have different meanings depending on their context. None of them uses a full semantic analysis of the meaning of words. Yet this is exactly what is needed to classify a document accurately.</p>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow">
<div class="wp-block-image"><figure class="alignright is-resized"><a href="http://www.skilja.de/wp-content/uploads/2013/03/1-DSC00461.jpg"><img loading="lazy" decoding="async" src="http://www.skilja.de/wp-content/uploads/2013/03/1-DSC00461-187x300.jpg" alt="" class="wp-image-523" width="209" height="328"/></a><figcaption>Ambiguous Text on Viking Ship</figcaption></figure></div>
</div></div>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow">
<p>The main reason is that language – especially English – is highly ambiguous. English nouns have on average 5–8 close synonyms, and there are words – “strike”, for example – with more than 30 common meanings (striking a baseball, the strike price of a stock, going on strike as an employee, and so on). If you use a simple bag of words as features, the software will never be able to make a clear distinction between an important fact (strike = work stoppage) and irrelevant information (baseball). Hence the classification result is equally ambiguous and not very precise.</p>
</div></div>
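<p>The problem is easy to see in code. In a bag-of-words representation, word order and context are discarded, so the two senses of “strike” below produce the very same feature (example sentences are made up for illustration):</p>

```python
from collections import Counter

def bag_of_words(text):
    # Every word becomes an isolated feature; order and context are lost.
    return Counter(text.lower().split())

labor = bag_of_words("the union voted to strike over pay")
sport = bag_of_words("the batter swung at the third strike")

# Both documents contribute the identical feature "strike", with no way
# to tell that one means a work stoppage and the other a pitch.
print(labor["strike"], sport["strike"])  # prints "1 1"
```

<p>Any classifier fed these features must treat the two documents as partly similar, even though their meanings have nothing in common.</p>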



<p>This can be solved by a full semantic analysis. A good definition of semantics is given by Wikipedia: “Linguistic semantics is the study of meaning that is used for understanding human expression through language”. That sounds exactly like what we want to achieve in document classification and understanding. Linguistic semantics is actually able to resolve the ambiguity of expressions and assign a unique meaning to each word. This is achieved by analyzing the relations the word has within a text.</p>



<p>The word “plant” is a good example. Used in the sense of a factory, there are things you can do with it: you can enter the plant, build it, or close it. But you cannot pick it or eat it. That is the other plant, the “living organism lacking the power of locomotion”. By analyzing these relations, a semantic analyzer can determine the meaning of a word exactly, in a rule-based way, much as a human brain does. So we can reliably distinguish the two kinds of plants. Not to mention Apple: just searching for Apple in a text without semantics, you will also find all the rotten apples, apple pies, and of course the Big Apple in your results. Only semantics can identify Apple as a company through its relations in the text.</p>
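<p>A toy version of this rule-based disambiguation can be written in a few lines. The cue lists below are invented for illustration; a real semantic analyzer would draw such relations from a lexicon:</p>

```python
# Hypothetical context cues for the two senses of "plant".
SENSE_CUES = {
    "factory": {"build", "close", "enter", "operate", "workers"},
    "organism": {"grow", "water", "pick", "eat", "leaves"},
}

def disambiguate(sentence, target="plant"):
    # Pick the sense whose cue words co-occur most with the target word.
    words = set(sentence.lower().replace(".", "").split())
    if target not in words:
        return None
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

<p>“The workers enter the plant” resolves to the factory sense, while “you can pick and eat this plant” resolves to the organism, purely from the relations in the sentence.</p>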



<p>Using semantic analysis as a feature generator greatly improves the precision of the classification algorithm and at the same time allows it to distinguish between important and irrelevant features in a text – just as you do when reading. It is obvious that future intelligent algorithms will need to use this technology.</p>



<p>In an upcoming post we will explain how semantic analysis can be used to generalize concepts to find topics which are another important aspect for good classification.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
