Visiting Text Analytics Summit

Text analytics is part of the discipline that we call Document Understanding on this site. Originally focused on text mining, text analytics is moving more and more toward real-time analysis of documents to create actionable insight. This year’s conference in Boston was the 8th of its kind and saw a significant shift of activity toward analyzing social media. This is a big trend in the U.S., where it is seen as crucial for understanding customer sentiment, maximizing social media productivity and optimizing market research. I have my doubts whether the results gained from social media analytics are really as valuable and reliable as presented at the conference. Just think about who writes public comments (not restricted to friends and circles) on the web, and whether this is the group that a company should take as a guideline for its strategy. Nevertheless, right now it is a big hype, maybe because it represents the only way for big enterprises to tackle social media at all.

But social media was not the only field that was presented. In one example, NASA showed aviation safety to be a very valid field for text analytics; other applications included improving customer service by analyzing reports and, of course, e-discovery for litigation support.

In the opening presentation, Seth Grimes, President of Alta Plana, made some good remarks on the state of the industry in his talk “Text and Beyond”. In his view, a megatrend is real-time operation on Big Data (described by the three V’s: volume, velocity, variety). The goal must not be to reduce information but to do more with more (more actions with more data). The problem is not information overload but filter failure, which I find a refreshing new look at the situation.

An important part of text analytics will be knowledge enrichment: semantics enables a join across types, sources and structures, using meaningful identifiers to create an ensemble that is greater than the sum of its parts. Semantics interrelates information to represent knowledge. And text analytics generates semantics (= meaning) to bridge search, BI and applications, enabling next-generation information systems.

Kurt Williams from Mindshare Technologies gave good insight into “Next Generation Customer Surveys with Text Analytics” using IBM tools. He uses text analytics on open-ended comments from customers to explain the “why behind the what” and to fill in the gaps in survey design. Text analytics is a disruptive technology for customer feedback programs. As an example he mentioned restaurant feedback that is solicited when you exit the restaurant. It is a proven fact that solicited feedback yields a much higher rate of positive comments than feedback you simply wait for. He made a good distinction between monitoring and discovery, both of which can be achieved with text analytics:

  • Monitor the known unknowns = TQM for text
  • Discover the unknown unknowns = reveal things that are hidden
  • Monitoring uses rules and discovery uses correlations
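To make the distinction a little more concrete, here is a minimal sketch in Python. It is not the method or the IBM tooling shown in the talk: the comments, the rule names and the stopword list are all invented for illustration, and plain co-occurrence counting stands in as a crude proxy for "correlations". Monitoring applies hand-written rules for issues we already know we want to track; discovery looks for word pairs that keep showing up together without anyone having asked for them.

```python
import re
from collections import Counter
from itertools import combinations

# Invented toy comments, for illustration only.
COMMENTS = [
    "The wait was long and the food was cold",
    "Friendly staff but a long wait at the door",
    "Food arrived cold after a long wait",
    "Great dessert and friendly staff",
]

# Monitoring: hypothetical hand-written rules for issues we already track.
RULES = {
    "wait_time": re.compile(r"\b(wait|queue|slow)\b", re.IGNORECASE),
    "food_temperature": re.compile(r"\b(cold|lukewarm)\b", re.IGNORECASE),
}

# A tiny, assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "and", "was", "but", "a", "at", "after"}


def monitor(comments):
    """Count how many comments trigger each known rule."""
    counts = Counter()
    for text in comments:
        for label, pattern in RULES.items():
            if pattern.search(text):
                counts[label] += 1
    return counts


def discover(comments, min_count=2):
    """Return word pairs that co-occur in at least min_count comments."""
    pair_counts = Counter()
    for text in comments:
        words = sorted(set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS)
        pair_counts.update(combinations(words, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


if __name__ == "__main__":
    print(monitor(COMMENTS))
    print(discover(COMMENTS))
```

The rules only ever report what they were written to find, while the pair counts can surface a theme such as friendly staff that nobody wrote a rule for.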

A customer’s view on survey analytics was given by David Williams, Manager of Marketing Analytics at Walt Disney. They analyze customer feedback and forum questions to find regularities and new trends. As an example, Walt Disney analyzes comments on TripAdvisor (e.g. 1,071 reviews for one hotel) and correlates them with the rating (stars) to find issues and trends in their hotels. Walt Disney also has forums like the “Disney Mom’s panel”, which is used as a source of information but also as a marketing instrument through real-time analysis. By detecting patterns in questions and relating them to the subsequent rate of booking, it is possible to identify individuals who require a promotion to convert or who need further marketing. Something to keep in mind in the future when talking to big organizations!
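As a rough illustration of relating review text to star ratings, the following Python sketch compares the average rating of reviews that mention a term with the average rating of those that do not. This is an assumption about how such an analysis could look, not a description of Disney's actual system; the reviews, ratings and the function name rating_gap are invented.

```python
import re
from statistics import mean

# Invented (text, stars) pairs, for illustration only.
REVIEWS = [
    ("Clean room and great pool", 5),
    ("Great pool, friendly staff", 4),
    ("Noisy room and slow check-in", 2),
    ("Slow check-in, noisy hallway", 1),
    ("Friendly staff, clean room", 5),
]


def rating_gap(reviews, term):
    """Mean rating of reviews mentioning term minus mean rating of the rest.

    Returns None when the term appears in all reviews or in none,
    because then there is nothing to compare.
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    with_term = [stars for text, stars in reviews if pattern.search(text)]
    without = [stars for text, stars in reviews if not pattern.search(text)]
    if not with_term or not without:
        return None
    return mean(with_term) - mean(without)


if __name__ == "__main__":
    for term in ("noisy", "slow", "great", "clean"):
        print(term, rating_gap(REVIEWS, term))
```

A strongly negative gap flags a term worth investigating as a possible issue; with only a handful of reviews the numbers mean little, so a real analysis would need many reviews per term and some significance testing.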

The best and most unusual application of text analytics was presented by Ashok Srivastava, Principal Scientist for Data Mining at NASA. They discover precursors to aviation incidents from data such as flight reports, radar data and weather information. There are 6 million flights per year in the U.S., each of which has associated free-text reports. An impressive visualization of the sheer number of daily flights can be seen in “A Day in the Life of Air Traffic Over the United States”. The goal of mining is to reduce the accident rate by identifying and responding to precursor events before the accident occurs. Using analytics, they can detect when the system moves from a safe state to a compromised state and on to an anomalous state. There are over 100k reports that can be used to answer WHY something happened. Text mining actually discovered airports, and areas at airports, that need further investigation. As an example, DFW (Dallas Fort Worth) has problems on some runways because its layout is very complex, and these problem areas could be detected using text analytics. This cannot be done manually because there are many thousands of reports. A full discussion of this fascinating application, which benefits us all, can be found on YouTube.

An application of text analytics in a totally different field, but also dealing with risk, was presented by Mattias Tyrberg, CEO and founder of Saplo from Sweden. They are using predictive text analytics to assess the credit scores of companies based on publicly available reports and news articles. The system has been trained with thousands of articles from the past 10 years for some 200 companies and was able to predict credit ratings for these. They do not use phrases but a refined model to represent text, which learns statistically and is trained with past data.

It is interesting to see in how many fields text analytics, and hence document understanding, can be applied. We will see many more of these applications in the near future, as unstructured data is now becoming available as never before and the technologies are evolving very rapidly.

Feel free to comment and share your opinions using the comment function, or send me an e-mail to discuss further. I am sure there is a lot more that can be said and done. Please share this article with friends and colleagues if you find it worth reading.