Skilja Blog

Auto Classification and Bias

by | Jul 22, 2015 | Recognition, Basics, Classification

Personal bias and individual opinions are a big issue in standardized business processing when they influence the outcome of a process and the decisions made. Nobody wants the outcome of a personal request to depend on chance, and yet it happens, because humans have a bias in how they see facts, based on their education, cultural background and even the mood they happen to be in at a certain time of the week. So in addition to different people making different decisions, you can even expect the same person to make different decisions during the week. You simply look at a task differently on Monday morning than on Friday evening. The reason is so-called priming, which happens to all of us day by day through our experience, knowledge, physical condition, context and a lot of other small factors.

In a recent article on the meaning of words we showed how the sound of words influences our perception. There are many more linguistic associations that influence the way we think and behave, and that introduce bias. If, for example, I tell you that I am driving north across hilly terrain, would you expect the trip to be mostly uphill or downhill? In fact, most people associate movement north with uphill and movement south with downhill. An interesting study by psychologists Leif D. Nelson from UC San Diego and Joseph Simmons from Yale shows that these associations can actually be measured and produce some strange biases: people think it will take longer to travel north than south, that it will cost more to ship to a northern than to a southern location, and that a moving company will charge more for a northward move than for a southward one. A similar study concluded that people assume property is more valuable when it sits in the northern part of town. Of course these opinions stem from the decision by the ancient Greeks to draw the map of the world with north above south. But it also shows us clearly how much we are biased by our language, and of course north/south is only one of many linguistic associations we are exposed to.

North-South Compass

Ancient mapmakers introduced the north/south bias unwittingly, but lawyers act with intention when they describe car accidents. While the defense might say that one car “contacted” the other, the plaintiff might say one car “smashed” the other. Elizabeth Loftus and John Palmer showed in a classic experiment that these labels really matter. They had a group of students watch the same series of traffic accidents and then asked them to estimate the speed of the cars when the accident occurred. When the question said that the cars “contacted” one another, the students' average speed estimate was thirty-two miles an hour, whereas it was forty miles an hour when the question said that the cars “smashed” one another. In another experiment, 14% of participants incorrectly remembered seeing shattered glass when told that the cars “hit” one another, whereas 32% of participants made the same error when told the cars “smashed” into one another. This shows that even a single word can change how people remember an event they witnessed themselves, which makes it very clear how priming can bias our decisions.

This brings us back to auto-classification. A classifier like the Skilja Content Classifier is trained with representative samples that are collected by a group of people. When applied to a set of documents, it will then make the same decision over and over again. Through machine learning it represents the average opinion about the content of a document and will repeat it without tiring, the same on Monday morning and on Friday evening, at a speed of several hundred thousand pages per hour. It will make errors, as any statistical method does, but no more than a human. And the errors are reproducible and can be corrected if necessary. If you have ever thought about compliance, this is a good example, because compliance does not say that you cannot make errors. It says that you need a reproducible, documented procedure for how you store and treat your documents. Auto-classification can help to achieve this goal. It is a great tool for boosting productivity. But it is even more helpful for avoiding bias, irreproducible results and non-compliance.
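To make the reproducibility point concrete, here is a minimal sketch of a trained classifier that returns the same decision every time it sees the same document. It is a toy word-count model written for illustration only, not the Skilja Content Classifier or its API; the function names `train` and `classify`, the labels and the sample texts are all assumptions.

```python
from collections import Counter


def train(samples):
    """Build one word-count profile per label from (text, label) pairs."""
    profiles = {}
    for text, label in samples:
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles


def classify(profiles, text):
    """Return the label whose profile shares the most word occurrences with the text."""
    words = text.lower().split()
    # Labels are sorted so that ties are always broken the same way.
    return max(
        sorted(profiles),
        key=lambda label: sum(profiles[label][word] for word in words),
    )


# Hypothetical training samples, standing in for documents labeled by a group of people.
samples = [
    ("invoice total amount due payment", "invoice"),
    ("invoice number payment terms", "invoice"),
    ("claim damage accident report", "claim"),
    ("accident claim vehicle damage", "claim"),
]
profiles = train(samples)

# Classify the same document 1000 times and collect the distinct decisions.
document = "payment due for invoice 4711"
decisions = {classify(profiles, document) for _ in range(1000)}
print(decisions)
```

Because the model is a fixed function of its training samples, the set of decisions contains exactly one label no matter how often or when the document is classified. If that label turns out to be wrong, the error shows up identically on every run, so it can be traced and corrected by adjusting the training samples.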