AI and the Evolution of the Modern Call Center
Use of the telephone as a marketing tool dates back to the early 1900s, when sales teams scoured phone books to develop and sell lists of prospects. This evolved into a rudimentary version of the call center in the 1950s, when stay-at-home mothers called acquaintances to sell baked goods for extra income. The introduction of toll-free 800 numbers in the 1960s seeded the development of the modern call center. Suddenly, companies had an easy way to help customers make purchases and resolve concerns remotely.
The success of this model has pushed contemporary call volumes to 12.6 billion calls annually in the United States alone, according to the customer experience publication CMSWire. To keep up with ever-growing demand, American companies began outsourcing call centers to India and the Philippines. This brought new sources of employment to those countries, while giving companies the capacity to handle nearly any type of customer concern.
However, a problem soon emerged.
The Language Barrier
Despite concerted efforts to help offshore call center representatives sound more “Western,” American customers often insisted on speaking with someone “from America” when they called. This led to some tense encounters for call center reps, who typically cannot route calls back to the United States or are discouraged from doing so to avoid the added cost. In more than a few instances, call center staff were not afforded the respect they deserved, simply because their speech patterns differed from those of the callers.
Enter AI-Based Accent Smoothing
The latest development in call center technology is termed “accent smoothing.” It relies on AI-based platforms to soften the accents that callers hear. According to Tai-Yin Chiu, a leading research scientist developing these algorithms, “The aim of these platforms is to improve intelligibility and reduce customer frustration.”
An Entirely New Call Center Paradigm
Chiu is the senior research scientist at Tomato.ai, whose platform is designed to smooth accents rather than eliminate them. Tomato.ai also aims to give callers an experience that is virtually indistinguishable from a normal conversation. To accomplish this, Chiu, who holds a PhD in Electrical and Computer Engineering from the University of Texas at Austin, led three key areas of the company’s research:
- Accent Softening – The speech pattern of a heavily accented speaker is smoothed, but not eliminated.
- Real-Time Speech Generation – Less accented speech is generated in a split second.
- On-the-Fly Voice Conversion – Any voice can be converted without retraining the algorithm.
Accent Softening
Chiu’s background in what’s known as Style Transfer, a technique primarily applied to image generation, led to the development of the company’s most successful accent softening models. Style Transfer is normally used to combine the visual style of an artwork (such as Van Gogh’s brushstrokes) with the content of a photograph, creating an image that appears to have been painted in that artist’s style.
Chiu developed a method for applying Style Transfer to accent softening. He says, “Style transfer and accent softening are related concepts, in that both involve transforming one set of characteristics into another, while retaining the underlying structure or content.”
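The idea Chiu describes, changing style while keeping content, can be illustrated with the classic image formulation of style transfer (Gatys et al.), in which “content” is matched feature by feature and “style” is matched only through Gram-matrix statistics that ignore where features occur. The Python below is a simplified, hypothetical sketch of that general objective, not Tomato.ai’s accent softening model: the toy feature matrices, the function names and the weights `alpha` and `beta` are illustrative assumptions, and a real system would compute features with a trained neural network over many layers.

```python
from typing import List

# Toy feature map (hypothetical): one row per feature channel, one column per
# position. In images a position is a pixel location; in a speech model it
# might, as an assumption, be a time frame.
Matrix = List[List[float]]


def gram(features: Matrix) -> Matrix:
    """Gram matrix: how strongly each pair of channels co-occurs, averaged over positions.

    Summing over positions discards *where* features appear, so the result
    describes texture-like "style" rather than "content".
    """
    n_positions = len(features[0])
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / n_positions for row_j in features]
        for row_i in features
    ]


def mse(x: Matrix, y: Matrix) -> float:
    """Mean squared difference between two equally sized matrices."""
    diffs = [(a - b) ** 2 for row_x, row_y in zip(x, y) for a, b in zip(row_x, row_y)]
    return sum(diffs) / len(diffs)


def style_transfer_loss(output: Matrix, content: Matrix, style: Matrix,
                        alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted sum of a content term (match features position by position)
    and a style term (match only Gram-matrix statistics)."""
    content_loss = mse(output, content)
    style_loss = mse(gram(output), gram(style))
    return alpha * content_loss + beta * style_loss


if __name__ == "__main__":
    style = [[1.0, 0.0], [0.0, 1.0]]
    content = [[0.0, 1.0], [1.0, 0.0]]  # same channel statistics, positions swapped
    print(gram(style) == gram(content))                  # True
    print(style_transfer_loss(content, content, style))  # 0.0
```

Because the Gram matrix sums over positions, swapping the columns of `style` leaves its statistics unchanged, so an output that copies `content` exactly already scores zero in this toy case. Raising `beta` relative to `alpha` pushes an optimized output further toward the reference style; how an accent model would balance these terms is not described in the source.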
Real-Time Speech Generation
While accent softening can go a long way toward making it easier for American callers to converse with offshore call center representatives, real-time speech generation keeps the conversation flowing without sounding computer-generated. Without Chiu’s innovations in this area, the result could come across as an aural version of the “uncanny valley”: voices that sound real but have an unnatural quality people find off-putting.
To address this, Chiu’s work involved shrinking various components of the smoothing algorithm so they could run concurrently. Says Chiu, “While most speech generation solutions take multiple seconds or more to produce an output, our platform responds in a split-second.”
On-the-Fly Voice Conversion
This aspect of the Tomato.ai software makes it adaptable to any voice without retraining the algorithm. Chiu also used this voice conversion technology to accelerate progress on the accent softening models by generating more training data for them. He recalls, “Through a variety of methods, I synthesized a significant amount of voice data for our research team to enable our platform to train new accent softening models automatically.”
The Result “Speaks” For Itself
Chiu, like the rest of the team at Tomato.ai, believes accents are beautiful and should be celebrated. At the same time, clearer conversations bring real benefits for commerce. Tomato.ai’s algorithm therefore aims to preserve some elements of the speaker’s original accent while making the speaker easier for listeners to understand.
Chiu’s work in this area is key to the success and ongoing growth of the company. His innovative approach has enabled Tomato.ai to offer what is arguably the best AI voice filter on the market. In turn, he and Tomato.ai continue to make life easier for call center representatives all over the world.