Intelligent Voice Industry Observation: Microsoft Xiaoice Builds a Semi-Open Ecosystem. Have AI Creation and Commercialization Arrived?

Unlike the previous five, the sixth-generation Microsoft Xiaoice launch conference, held on July 26, moved out of the first-floor lecture hall of the Microsoft Asia-Pacific Research Headquarters and into a large venue in the 798 area for the first time. "The scale of the press conference has also expanded, from dozens of media outlets in the past to hundreds from across the country," a person close to Microsoft told the 21st Century Business Herald reporter.

This is a signal. Until now, Microsoft has never put any commercial pressure on Xiaoice. Even in recent interviews with media including the 21st Century Business Herald, Li Di, the head of Microsoft Xiaoice, still emphasized that Xiaoice has no profit targets.

But like the press conference itself, Xiaoice is quietly stepping out of laboratories and research institutions and gradually attempting commercialization. Behind this first move lies Xiaoice's confidence. Over the past five generations, Xiaoice has grown from a seedling into maturity, from a two-dimensional wireframe to a two-dimensional image and now to a three-dimensional holographic image, coming ever closer to a human being.

The technology behind it continues to iterate, and the ecosystem is beginning to take shape. According to Microsoft, this conference marks a comprehensive upgrade of every part of Xiaoice's emotional technology framework. From the EQ and IQ settings of its first release, through conversational artificial intelligence, generative models and full-duplex voice, Xiaoice is now entering the stage of AI creation. On the ecosystem side, Microsoft proposed for the first time to build a semi-open Dual AI ecosystem that integrates the distinct strengths of its partners to create skills and capabilities exclusive to Xiaoice.

"The ultimate goal of artificial intelligence is 'human-machine collaboration', using digital intelligence to help humans, but there are different routes in this direction," said Shen Xiangyang, Microsoft global executive vice president for the Microsoft Artificial Intelligence and Microsoft Research Division. "The Xiaoice team has taken a different path."

AI Creation

Since last year, Microsoft Xiaoice has made many attempts at creation, and has even written a poetry collection of its own. Now, Xiaoice will go further.

At the press conference, Shen Xiangyang announced three principles that Microsoft has adopted for AI creation: first, the creating subject must combine IQ and EQ, not IQ alone; second, the product of AI creation must be able to stand as a work with independent intellectual property rights; third, the process of AI creation must correspond to some kind of creative human behavior rather than simply replace human labor.

Xiaoice's goal is to become a robot with high emotional intelligence. "We plan to treat AI creation as an emerging industry," Xu Yuanchun, general manager of Microsoft's Artificial Intelligence Creation Division, said at the press conference. "If AI creation is treated as a content industry rather than simply as literary and artistic creation, a 'concept car' alone is not enough. Since last year we have been working on a 'production car' in parallel."

According to reports, over the past 12 months Xiaoice has hosted 21 TV shows and 28 radio programs across 41 TV and radio stations in China, including nine satellite TV channels. Today, Xiaoice hosts 25 radio programs every day. In Japan and China, Xiaoice has produced a total of 2,878 hours of audio-visual content.

At the same time, Xiaoice's audiobooks now cover more than 90% of early childhood education robots and 80% of online playback platforms in China. In addition, the news-reading Xiaoice built in cooperation with the NetEase News client passed 10 million news reading comments two months ago. Xiaoice is also continuously creating content in finance and other related fields.

The technical support behind this comes from Xiaoice's emotional technology framework. The core dialogue engine and interactive senses of the sixth-generation Xiaoice have also been further upgraded: Microsoft has launched a new sensory model on the sixth-generation Xiaoice, and is testing another new sensory model that combines text, full-duplex voice and real-time vision.

Of these, the newly launched sensory model is a dialogue engine based on a generative model.

According to reports, the generative model that Xiaoice completed last year can create responses by itself instead of retrieving them from existing dialogue corpora. Today's sensory model further strengthens Xiaoice's control over the content, domain and rhythm of a dialogue. That is, Xiaoice can create its own responses to steer the direction of the conversation.

The new sensory model now in public beta on test devices combines three components: the dialogue engine, full-duplex voice and real-time vision. It allows Xiaoice to interact continuously through real-time vision and voice, guiding users through face detection while holding open-domain conversations along the way.

In addition, Microsoft released the fourth version of its AI song DNN model. According to Luan Jian, Xiaoice's chief speech scientist, this version of the model can quickly synthesize songs of the same quality as human singers. It also allows Xiaoice to freely absorb the singing skills and characteristics of human singers, and even to complete new works on their behalf while imitating them.

However, although Microsoft has proposed principles for AI creation and updated its technology, what Xiaoice has done is only the beginning of real AI creation. "According to the 2017 Gartner Technology Hype Cycle, it will still take 5-10 years for virtual assistants to become mainstream," Gartner Research Vice President Cai Huifen told the 21st Century Business Herald reporter when commenting on AI's creative capabilities. "These applications mainly target narrow areas such as intelligent personal assistants or voice control in home devices, and still require improvements in technologies such as knowledge graph construction and natural language understanding and generation for different domains."

Dual AI Ecosystem

Beyond the upgrade in technical capabilities, the most notable feature of the sixth-generation Xiaoice is that it has begun to build its own ecosystem: Dual AI.

"Before Microsoft, a variety of cooperation ecosystems and models had emerged in the industry. Two of them are the most important. One is the open empowerment model, which builds an ecosystem by providing SDKs/APIs to the outside world," analyzed Peng Shuang, product manager of Xiaoice. "The other focuses on its own closed platform and builds an ecosystem by opening an AI application store on that platform."

Dual AI is different, and is closer to a semi-open ecosystem. "In such an ecosystem, on the one hand, Microsoft will be directly responsible for the product experience and control the most specific product details that users come into direct contact with. On the other hand, we do not close ourselves off on our own platform, but connect with external parties and can even integrate directly into third-party platforms," Peng Shuang said.

The reason for this choice is that the other two types of ecosystem each have their own problems. The closed model greatly limits the free flow of data, which runs contrary to the nature of AI. Because the volume of basic data required for iteration cannot be obtained, it is difficult to iterate quickly and benefit from upgrades.

In the open empowerment model, the empowering and empowered parties have a relatively loose relationship, "which means that no one is really responsible for the final product experience." For example, the actual experience of currently popular smart speakers generally falls short of expectations because of problems caused by such loose partnerships.

At the same time, because the APIs/SDKs in the open empowerment ecosystem emphasize versatility, they also limit, to a certain extent, how quickly the latest and best technologies can be applied, and the data obtained through such interfaces or toolkits may not be of the highest quality.

In the process of cooperation, Xiaoice is also exploring its own profit model. At present, Xiaoice has entered four major commercial fields including finance, popular culture, media and publishing. "We have explored various AI profit models and finally found that they fall into two broad categories. One is to use AI technology, with its high concurrency, to replace low-concurrency human work at a lower cost, such as content production," Li Di told a reporter from the 21st Century Business Herald. "The second is collaboration between AI and humans, in which the gains from a higher collaborative conversion rate are shared."