By Natasha Singer

In a YouTube clip from one of Steve Jobs's last interviews, he appears to enjoy reminiscing about how he first hit upon the idea for the keyboardless tablet that eventually became the iPad.
“I had this idea of being able to get rid of the keyboard, type on a multitouch glass display and I asked our folks, could we come up with a multitouch display that I could type on, I could rest my hands on and actually type on,” Mr. Jobs says, smiling slightly as he recounts his enthusiasm at seeing the first prototype. “It was amazing.”
But in a billboard superimposed over the nearly two-minute video clip, an emotion analytics company called Beyond Verbal has added its own algorithmic evaluation of Mr. Jobs’s underlying feelings. It is an emotion detection system meant to parse not the meanings of people’s words, but the intonations of their voices.
“Conflict between urges and self-control. Loneliness, fatigue, emotional frustration,” the ticker above Mr. Jobs’s head reports as he speaks. Moments later, it suggests a further diagnosis: “Insistence, stubbornness. Possibly childish egoism.” And then it concludes: “Sadness mixed with happiness. Possibly nostalgia.”
Humans generally have inklings when their interlocutors, out of solicitousness or sarcasm, utter phrases aloud that contradict their inner feelings: Thanks a bunch. You’ve been very helpful. Wish I were there. Let’s have lunch.
But now, new techniques in computational voice analysis are promising to help machines identify when smiley-sounding phrases like Mr. Jobs’s belie frustration and grief within. Although the software is still in its early phases, developers like Beyond Verbal, a start-up in Tel Aviv, are offering the nascent technology as a deeper approach for call centers and other customer services that seek to read and respond to consumers’ emotions in real time. The company says its software can detect 400 variations of different moods.
“It’s not what you say. It’s how you say it,” says Dan Emodi, vice president for marketing at Beyond Verbal. “Listening to these patterns, we can allow machines for the first time to understand the emotional side of our communications.”
This more invasive audio mining also has the potential to unnerve some consumers, who might squirm at the idea of an unknown operator getting an instant entrée into their psyche.
Industry analysts say companies that adopt emotion detection should be transparent with consumers, alerting them to the uses and analysis of their data beyond the standard disclosure to which we’ve become inured: “This call may be recorded for quality assurance purposes.”
“It’s a potential privacy issue, capturing a consumer, mining that conversation,” says Donna Fluss, the president of DMG Consulting, a market research firm focused on the call center industry. “What are they doing with that information?”
Another question is whether emotion detection is any more valid than novelties like handwriting analysis. After all, only Steve Jobs could say how he really felt during that interview.
“It seems to me that the biggest risk of this technology is not that it violates people’s privacy, but that companies might believe in it and use it to make judgments about customers or potential employees,” says George Loewenstein, a professor of economics and psychology at Carnegie Mellon University. “That could end up being used to make arbitrary and potentially discriminatory decisions.”
FOR more than a decade, call centers have generally recorded every service request, complaint, diatribe, account closure and nuisance call from consumers. In the early days of these recordings, companies archived the calls and reviewed a handful of them after the fact, examining the conversation patterns and giving agents feedback on their performance.
But as software and server power have improved, call centers are using a more advanced approach called “word spotting” to examine each call. In fact, the business of analyzing words and their sentiments, called speech analytics, is a $214 million market, according to estimates from DMG Consulting, and used in finance, insurance, health, travel, retailing and telecommunications.
Call centers, for instance, can program their speech engines to search for specific words or phrases — like “This is the third time I have called in!” or “I’ve been a loyal customer for 10 years!” — which tend to be emotionally charged, indicating mounting consumer dissatisfaction.
“We record and mine every single call, every single word and phrase,” says Daniel Ziv, vice president for voice-of-the-customer analytics at Verint, a leading speech analytics company. “One of my favorites is ‘You people!’ or ‘This is ridiculous!’ It’s unlikely that people are using ‘ridiculous’ in a positive, playful way.”
Another call center analytics company, called CallMiner, classifies consumers’ spoken words into categories like “dissatisfaction” or “escalation.” Speech analytics engines can also be used to search consumer calls for unexpected events or trends, like a sudden problem with product delivery or using gift cards.
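The word-spotting approach described above can be illustrated with a minimal sketch. The phrase lists and category names here are purely hypothetical examples drawn from the quotes in this article, not the actual taxonomies used by Verint or CallMiner, and real speech analytics engines run far larger, tuned lexicons against live transcripts:

```python
# Illustrative word-spotting sketch: scan a call transcript for emotionally
# charged trigger phrases and bucket them into categories. The phrases and
# categories below are hypothetical, chosen to echo this article's examples.
CATEGORIES = {
    "dissatisfaction": [
        "this is ridiculous",
        "you people",
        "third time i have called",
    ],
    "escalation": [
        "speak to a supervisor",
        "cancel my account",
    ],
}

def spot_phrases(transcript: str) -> dict:
    """Return each category alongside the trigger phrases found in the transcript."""
    text = transcript.lower()
    hits = {}
    for category, phrases in CATEGORIES.items():
        found = [p for p in phrases if p in text]
        if found:
            hits[category] = found
    return hits

print(spot_phrases("You people! This is ridiculous. Let me speak to a supervisor."))
```

A production system would work on automatic speech-recognition output and would typically weight or count matches over time rather than doing simple substring lookups.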
“If you can identify the problem, get to the root cause and fix it, you can save millions of dollars,” says Ryan Hollenbeck, Verint’s senior vice president for marketing.
Beyond Verbal is proposing a different tactic with algorithms that ignore emotional trigger words like “ridiculous” in favor of voice qualities like tone and frequency.
Company executives say their technique is based on the work of Israeli researchers in the 1990s who studied how babies understand and respond to the moods of adult speech before understanding actual language. The researchers developed their mood-detection algorithms by analyzing the emotions of 70,000 people in 30 languages. Company executives say the software can detect not only callers’ primary and secondary moods, but also their attitudes and underlying personalities.
“It helps agents decide how to respond. If there’s a customer-is-always-right type, you want to give them proper appreciation and respect,” Mr. Emodi says. “If the caller is seeking friendship, the agent should speak in a friendly, direct way.”
He and other company executives envision a variety of commercial uses for emotion detection. Consumers might use it to analyze and modulate their own voices, as could public speakers. People who wish to test out the accuracy of the emotion meter for themselves can check out the mood recognition app on the company’s site.
Executives say a few companies are working on call-center applications for the software, and they expect the first of those apps to be ready for use around the end of this year. The idea is not just to identify and mollify dissatisfied callers, but also to help agents distinguish frustrated callers who want to solve a problem, and are worth spending more time on, from angry callers who merely want to vent.
Yuval Mor, the chief executive of Beyond Verbal, says the program can also pinpoint and influence how consumers make decisions. He calls it deciphering “the human emotional genome.”
“If this person is an innovator, you want to offer the latest and greatest product,” Mr. Mor says. “If this person is a more conservative person, you don’t want to offer the latest and greatest, but something tried and true.”
But people’s voices change over time and across situations, says Professor Loewenstein of Carnegie Mellon. So categorizing a consumer based on a single phone call could be commercially irrelevant over the long term.
“They are just reading your voice at one moment in time. You are not going to read someone’s personality from their voice,” Professor Loewenstein says. “In my view, we are very far from that being a reality.”
Even without a mood detection algorithm, you can classify that emotion: skepticism.