A virtual host named “i+” has been put into use for the upcoming Beijing 2022 Olympic and Paralympic Winter Games. Powered by intelligent speech recognition technology, it can translate Mandarin Chinese into English, French, Japanese, and other languages in real time, spreading knowledge about Beijing 2022 to the world more quickly.
Intelligent speech recognition technology has been increasingly integrated into terminal applications that serve people’s everyday lives. Besides AI hosts, its application scenarios also include smart fitness mirrors, smart wearables that can assist couriers in collecting and delivering parcels, and intelligent mice that can type automatically when you talk.
The intelligent voice industry, as an important part of the software industry, has entered a new stage characterized by high-speed development, said Wang Jianwei, deputy director of the information technology department at China’s Ministry of Industry and Information Technology (MIIT), at the China Intelligent Voice Industry Development Summit Forum held in December 2021.
China’s intelligent voice industry has grown vibrantly in recent years, with breakthroughs made in core technologies, Wang said, adding that the accuracy rate of speech recognition has reached 98 percent.
The size of China’s intelligent voice market hit 21.7 billion yuan ($3.4 billion) in 2020, an increase of 31 percent from the previous year, and is projected to rise 44 percent year on year to 28.5 billion yuan in 2021, effectively driving industrial digitalization, according to a white paper on the development of China’s intelligent voice industry (2020-2021) released on Dec. 18, 2021.
In a world where all things are connected, more smart devices need to be controlled remotely, which creates opportunities for the industry, said Liu Qingfeng, chairman of the council of the Speech Industry Alliance of China (SIAC) and chairman of iFlytek, a leading Chinese AI firm.
The number of smart devices is increasing rapidly, driven by rising demand for speech interaction, Liu noted, adding that the volume of interaction services handled by the company’s voice assistants registered a year-on-year increase of 84 percent in 2021.
Intelligent voice technology faces three major challenges: multilingual intercommunication, human-machine interaction in complex scenarios, and the multi-modal virtual world, Liu pointed out.
Multilingual intercommunication covers not only foreign languages but also dialects in China; effective interaction in complex scenarios calls for accurate speech recognition when several people are talking at the same time; and multi-modal interaction means adding timbre, tone, expression, mouth shape and other factors to voice to make speech recognition more intelligent, Liu explained.
The recognition rate of iFlytek products in complex scenarios is estimated to have increased from 69 percent to 80 percent in 2021, according to Liu.
The key innovation drivers of the future development of the intelligent voice industry include unsupervised learning, multi-modal fusion and innovative cross-disciplinary research in brain science, the white paper said.
Algorithms for unsupervised learning and for low-resource models still need major breakthroughs, and in terms of AI chips, the foundation of computing power, China needs to move quickly to catch up with the world’s frontrunners.
According to Wang, the MIIT plans to push ahead with the high-quality development of the intelligent voice industry in three aspects.
The ministry will call on regional departments to speed up the formulation of industrial policies that promote the integrated development of intelligent voice technology and the real economy, he noted.
Meanwhile, it intends to encourage leading enterprises and scientific research institutions to join hands for further technological breakthroughs to continuously improve technologies related to speech recognition, synthesis, interaction and speech chips, and build national public service platforms for intelligent voice testing to support the development of the industry, the official said.
The SIAC has already attracted over 70 enterprises with core technologies along the industry chain. At least 70 more will join the alliance in the future, with more research institutes and universities also expected to join, according to Wang.
The MIIT will also expand the application scenarios of voice technology so that it can be integrated into intelligent manufacturing, smart home, smart health, education, elderly care and other fields.