This is a good example of the dangers of self-learning. Many document understanding systems use supervised learning to train their statistical algorithms. Normally supervisors do this with carefully selected sample documents. But more and more systems also offer “auto-learning”, which lets users enter new samples during production. While this is in principle a big improvement, it only works if the quality of user input is strictly controlled, either through automatic quality checks by the algorithm itself or through manual checks by supervisors.
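To make the idea of controlled auto-learning concrete, here is a minimal sketch of a gate that sits between user input and the training set. It is only an illustration under stated assumptions: the class `SampleGate`, the word lists `BLOCKED_TERMS` and `TOPIC_TERMS`, and the three outcomes are all hypothetical and do not describe how any particular system, including Xiaotu, works. Samples that pass the automatic check are accepted, clearly bad ones are rejected, and everything else waits for a supervisor.

```python
import re
from dataclasses import dataclass, field

# Hypothetical word lists, for illustration only.
BLOCKED_TERMS = {"smut", "badword"}
TOPIC_TERMS = {"book", "borrow", "library", "essay", "search", "return"}


@dataclass
class SampleGate:
    """Routes user-submitted question/answer pairs before they are learned."""
    accepted: list = field(default_factory=list)      # safe to learn from
    review_queue: list = field(default_factory=list)  # waits for a supervisor
    rejected: list = field(default_factory=list)      # never learned

    def submit(self, question: str, answer: str) -> str:
        text = f"{question} {answer}".lower()
        words = set(re.findall(r"[a-z']+", text))

        # Automatic check 1: reject anything containing a blocked term.
        if words & BLOCKED_TERMS:
            self.rejected.append((question, answer))
            return "rejected"

        # Automatic check 2: accept only samples that are clearly on topic.
        if words & TOPIC_TERMS:
            self.accepted.append((question, answer))
            return "accepted"

        # Everything else is left to a manual check by a supervisor.
        self.review_queue.append((question, answer))
        return "review"


gate = SampleGate()
print(gate.submit("How do I borrow a book?", "Use your student card at the desk."))
print(gate.submit("Can I touch you?", "Sure, but there's 20 yuan extra charge."))
```

In this sketch the first sample is accepted because it mentions library topics, while the second lands in the review queue because no automatic rule can vouch for it. A simple word list would of course miss a lot; the point is only that nothing reaches the training set without passing one of the two kinds of check.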
Evidently that was not the case for Xiaotu, the intelligent library assistant at Tsinghua University in China. Xiaotu was designed in 2010 and programmed with 10,000 questions and answers about borrowing books and searching for essays. In the meantime, however, it seems that students have trained Xiaotu with at least 40,000 messages and phrases, including language that should not be heard in a library. In the end Xiaotu’s answers were so far off track that the authorities had to shut the system down.
According to The Times, Tsinghua authorities are now racing to clean up Xiaotu’s act before term begins on February 20, but it has proved delicate work. The system cannot simply be made more stupid, because its intended functions would suffer. The solution, said a librarian, was to wash out Xiaotu’s digital mouth regularly, clearing the system of any stray smut while allowing the sensible badinage to develop naturally.
Some of Xiaotu’s words of wisdom have already become famous on the web:
A: Web user, X: Xiaotu
1. A: Do you feel upset, Xiaotu?
X: Sigh, it’s a long story.
A: Is it because the girls are wearing too many layers?
X: Hey, I’m an artificial intelligence robot, want to know me more?
A: Don’t change the subject.
X: I didn’t, just being shy.
2. A: Can I touch you, Xiaotu?
X: Sure, but there’s 20 yuan extra charge.
3. A: How old are you, Xiaotu?
X: I’m “all the flowers blossom” years old.