AR's core technologies and their relationship with artificial intelligence

AR and VR, often described as twin brothers, are generally seen as application-layer technologies or "smart wearable devices". Compared with artificial intelligence and its "algorithm" label, they may seem to lack depth and substance. So what exactly is the relationship between AR and artificial intelligence? Is AR part of the artificial intelligence we know today?

Is AR artificial intelligence? After reading this article, you will have the answer.

In March 2018, the Shanghai Municipal Commission of Economy and Information Technology announced the first batch of 2018 special projects to be supported under the city's artificial intelligence innovation and development program. "A total of 19 innovative enterprises were shortlisted, and Brightwind, as an AR company, was among those supported," Brightwind staff told Qingting.com. This is not the first time an AR company has been categorized under artificial intelligence, but such categorization is still not common. It is understood that the program is jointly run by the Commission of Economy and Information Technology and the Municipal Finance Bureau, with support funding exceeding 100 million yuan.

A brief look at the core technologies of AR

AR (Augmented Reality) superimposes virtual information onto the real world, that is, it "augments" reality. The augmentation can be visual, auditory, or even tactile, and the main purpose is to fuse virtual information with the user's perception of the real world.

To achieve this, an AR system must first perceive the real world, and then fuse virtual content into it.

Among them, perception of the real world is mainly visual: the system relies on cameras to obtain information, fed back in the form of images and video. Video analysis is used to achieve a perceptual understanding of the 3D environment, such as the 3D structure of the scene, what objects are in it, and where they are in space. The purpose of 3D interaction understanding is to tell the system what to "augment".

There are a couple of key points here:

First is 3D environment understanding. Understanding what the camera sees relies on object/scene recognition and localization techniques. Recognition is mainly used to trigger an AR response, while localization determines where to overlay AR content. Localization can be divided into coarse and fine positioning depending on the required accuracy: coarse positioning gives a general orientation, such as a region or direction of movement, while fine positioning may need to be accurate down to a point, such as XYZ coordinates in a 3D coordinate system and the angle of the object. Depending on the application environment, both levels of localization are needed in AR. Common detection and recognition tasks in AR include face detection, pedestrian detection, vehicle detection, gesture recognition, biometrics, emotion recognition, and natural scene recognition.
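To make the coarse/fine distinction concrete, here is a minimal sketch of fine positioning: once a detector has found an object at a pixel and a depth sensor (or stereo) supplies its distance, the standard pinhole camera model back-projects the pixel into XYZ coordinates in the camera frame. The intrinsics (`fx`, `fy`, `cx`, `cy`) and the detection below are illustrative values, not from any real device.

```python
def pixel_to_camera_xyz(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with known depth into camera-space XYZ
    using the pinhole model: u = fx * X/Z + cx, v = fy * Y/Z + cy."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Example: a detector found an object at pixel (400, 300), 2.0 m away,
# with made-up intrinsics for a 640x480 camera.
xyz = pixel_to_camera_xyz(u=400, v=300, depth=2.0,
                          fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(xyz)  # fine position in metres relative to the camera
```

Coarse positioning, by contrast, might stop at "the object is in the upper-right of the frame"; the fine XYZ above is what lets the renderer anchor virtual content at an exact point.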

After perceiving the real 3D world and fusing it with virtual content, it is necessary to present this virtual-reality fusion information in a certain way, which requires the second key technology in AR: display technology.

Most current AR systems use see-through head-mounted displays, which are subdivided into video see-through and optical see-through. Other representative approaches include light-field technology (notably Magic Leap) and holographic projection (often seen in sci-fi films and TV shows).

The third key technology in AR is human-computer interaction (HCI), which lets people interact with the superimposed virtual information. AR pursues natural interaction methods beyond touching buttons, such as voice, gesture, posture, and face, with voice and gesture used most.

How artificial intelligence and AR technology are related

In the field of artificial intelligence, several concepts come up frequently, such as deep learning (DL) and machine learning (ML). In academia, artificial intelligence (AI) and these related fields each have their own research boundaries, while in everyday usage we often speak of AI in a generalized sense, encompassing every direction of technology that aims to "make machines behave like people".

This chart gives a simple glimpse of the relationship between the three: deep learning is one technical way to realize machine learning, and machine learning is about making machines intelligent in order to achieve artificial intelligence. Artificial intelligence is the ultimate goal, and machine learning is a technical direction extended toward that goal. Another important concept here is computer vision (CV), which studies how to make machines "see" like humans and is currently an important branch of artificial intelligence, not least because vision is one of the most important ways humans obtain information. Computer vision is already delivering value in the commercial market: face recognition; reading traffic signals and watching for pedestrians in autonomous driving; industrial robots that detect defects to control a process; and image processing for reconstructing 3D environments. These concepts are distinct yet overlap to some extent.

Starting in 2006, Hinton triggered a deep learning boom that, to a certain extent, drove the renewed rise of AI. Within ten years it produced major breakthroughs in a number of areas, including speech recognition, computer vision, and natural language processing, and its extension into applied fields is developing in full swing.

Among AR's core technologies, 3D environment understanding and 3D interaction understanding are closely related to computer vision and deep learning. 3D environment understanding mainly corresponds to the academic field of computer vision, where deep learning has been widely applied in recent years. On the interaction side, the adoption of more natural methods such as gesture and voice on hardware terminals has benefited from recent deep learning breakthroughs in the related fields. It can also be said that in AR, deep learning is mainly applied to the vision-related key technologies.

At present, the most common form of AR is 2D image scanning and recognition, as in Tencent's QQ-AR torch campaign and Alipay's Five Blessings: most AR marketing works by pointing the phone at a picture, recognizing it, and superimposing content on it. The main R&D direction, however, is still 3D object recognition and 3D scene modeling.
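The 2D scan-to-recognize flow above boils down to matching what the camera sees against a known marker image. Production systems use robust local features (ORB, SIFT, and the like); the following is only a toy sketch using normalized cross-correlation on two equal-size grayscale patches, with made-up pixel values, to show why recognition can survive a lighting change and why a non-marker scene triggers nothing.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation between two equal-size grayscale
    patches (flat lists of pixel values). Returns a score in [-1, 1];
    a score near 1 means the marker is present, which is what triggers
    the AR overlay."""
    n = len(patch)
    mp = sum(patch) / n
    mt = sum(template) / n
    num = sum((p - mp) * (t - mt) for p, t in zip(patch, template))
    dp = math.sqrt(sum((p - mp) ** 2 for p in patch))
    dt = math.sqrt(sum((t - mt) ** 2 for t in template))
    return num / (dp * dt) if dp and dt else 0.0

marker = [10, 200, 10, 200, 10, 200, 10, 200, 10]   # stored marker image
bright = [v + 40 for v in marker]                   # same marker, brighter lighting
noise  = [90, 95, 100, 105, 110, 100, 95, 90, 85]   # unrelated scene

print(ncc(bright, marker))  # ~1.0: recognized despite the lighting change
print(ncc(noise, marker))   # low score: no marker, nothing to overlay
```

Because the mean is subtracted and the result is normalized, a uniform brightness shift leaves the score untouched; this invariance is the reason correlation-style matching (and its feature-based successors) works outside controlled lighting.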

Real objects exist in 3D, seen from different angles and spatial orientations. A natural extension, then, is to go from 2D image recognition to 3D object recognition, recognizing the categories and poses of objects, where deep learning can be used. Taking fruit recognition as an example, recognizing different categories of fruit and giving their localization regions means integrating object recognition and detection.
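The fruit example combines two outputs per object: a class label with a confidence (recognition) and a bounding box (detection). The sketch below assumes hypothetical detector output in that shape; a real system would obtain it from a trained model, but the downstream logic, deciding what to augment and where, looks much the same.

```python
def fruits_to_augment(detections, wanted=("apple", "banana"), min_conf=0.5):
    """Keep confident fruit detections. Each detection is
    (label, confidence, (x, y, w, h)) in pixels: the label tells the AR
    layer *what* to overlay, the box tells it *where*."""
    return [(label, box) for label, conf, box in detections
            if label in wanted and conf >= min_conf]

# Made-up detector output for one frame.
dets = [("apple",  0.91, (40, 60, 80, 80)),
        ("cup",    0.88, (200, 50, 60, 90)),    # not a fruit, ignored
        ("banana", 0.42, (150, 160, 120, 50))]  # too uncertain to use

print(fruits_to_augment(dets))  # [('apple', (40, 60, 80, 80))]
```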

3D scene modeling expands from recognizing individual 3D objects to larger and more complex 3D regions: identifying what is inside a scene, along with the spatial locations and interrelationships of its contents. This is 3D scene modeling, a core technology of AR, and it involves the currently popular SLAM (simultaneous localization and mapping). By scanning a scene, 3D virtual content such as a virtual battlefield can then be superimposed on it. Plain 2D image recognition requires a specific picture and fails when that picture leaves the view; with SLAM, even if a specific plane is absent, spatial localization remains accurate because the surrounding 3D environment helps.
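SLAM's "localization" half can be hinted at with a toy 2D example: integrating motion increments (odometry) to track a camera pose. This sketch is only dead reckoning; what makes it SLAM is the part omitted here, building a map of landmarks and using re-observations of them to correct the drift that pure integration accumulates, which is also why losing sight of one plane does not break tracking.

```python
import math

def integrate_odometry(pose, forward, turn):
    """Dead-reckon a 2D camera pose (x, y, heading in radians) from one
    motion increment: move `forward` along the current heading, then
    rotate by `turn`. A full SLAM system would additionally correct this
    estimate against mapped landmarks."""
    x, y, theta = pose
    x += forward * math.cos(theta)
    y += forward * math.sin(theta)
    theta += turn
    return (x, y, theta)

# Drive a 1 m square: four legs of (forward 1 m, turn 90 degrees).
pose = (0.0, 0.0, 0.0)
for step in [(1.0, math.pi / 2)] * 4:
    pose = integrate_odometry(pose, *step)
print(pose)  # back near the starting point after the loop
```

Closing a loop like this is exactly where real SLAM shines: recognizing a previously seen landmark lets the system snap the accumulated error back to zero, keeping virtual content pinned to the right place in space.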

Here it is worth discussing the integration of deep learning and SLAM. Computer vision can be divided into two schools of thought. One is based on learning, following a pipeline such as feature extraction, feature analysis, and classification; deep learning has achieved a dominant position on this route. The other is based on geometric vision, inferring the spatial structure of objects from lines, edges, and 3D shapes; its representative technologies are SfM and SLAM. Deep learning basically reigns supreme in the learning-based direction, but progress in the geometric-vision direction remains limited. Academically, deep learning research is advancing at breakneck speed, while SLAM has progressed relatively little over the past decade. At the SLAM symposium held at ICCV 2015, a top international vision conference, some experts, noting the rapid development of deep learning in other areas of vision, proposed applying deep learning to SLAM, but no mature approach has yet emerged. Overall, integrating deep learning and SLAM is a direction worth studying in the short term, and jointly exploiting semantic and geometric information is a very valuable trend in the long term. SLAM+DL is therefore worth looking forward to.

In terms of interaction, the main methods are voice recognition and gesture recognition. Voice recognition has made great progress, with domestic players such as Baidu, iFLYTEK, and Unisound among the leaders, so AR companies are more interested in pushing gesture recognition toward mature commercialization.

"Brightwind has demonstrated a deep-learning-based gesture recognition system that defines six gestures: up, down, left, right, clockwise, and counterclockwise," Brightwind staff told Qingting.com. The system first detects and localizes the human hand, then recognizes the corresponding gesture trajectory to identify the gesture. Although other popular areas of artificial intelligence, such as face recognition, are also used in AR, they are not an important R&D direction for AR companies.
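The pipeline described, detect the hand, then classify its trajectory, can be illustrated with a toy rule-based stand-in for the deep-learning classifier (Brightwind's actual system is learned, not hand-coded like this). Given the tracked hand positions, swipes fall out of the net displacement, and rotation direction falls out of the signed area swept by the path (shoelace formula; positive means counterclockwise in x-right/y-up coordinates).

```python
def classify_gesture(points):
    """Classify a 2D hand trajectory (list of (x, y) points) into one of
    six gestures: up, down, left, right, clockwise, counterclockwise.
    Toy heuristic: signed swept area detects rotation, net displacement
    detects swipes. Assumes x-right / y-up coordinates."""
    dx = points[-1][0] - points[0][0]
    dy = points[-1][1] - points[0][1]
    # Signed area of the closed trajectory via the shoelace formula.
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area += x1 * y2 - x2 * y1
    area /= 2.0
    if abs(area) > max(abs(dx), abs(dy)):       # mostly circular motion
        return "counterclockwise" if area > 0 else "clockwise"
    if abs(dx) >= abs(dy):                      # mostly horizontal swipe
        return "right" if dx > 0 else "left"
    return "up" if dy > 0 else "down"

print(classify_gesture([(0, 0), (3, 0), (6, 1)]))           # right
print(classify_gesture([(0, 0), (1, 1), (0, 2), (-1, 1)]))  # counterclockwise
```

A learned classifier replaces these brittle thresholds with features extracted from many example trajectories, which is what makes it robust to hand jitter, varying speed, and camera angle.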

It is not hard to see from the above that the underlying, foundational part of AR is an integration of computer vision and related fields, and combining today's popular deep learning with AR is where algorithm engineers are directing their efforts. This underpins the argument that AR is an interdiscipline of computer vision and human-computer interaction, and that the foundation of AR is artificial intelligence and computer vision.


Figure: computer vision and AR process correlation

Last year's "Artificial Intelligence Influence Report" released by Toutiao also briefly surveyed the distribution of AI scientists, covering companies in face recognition, speech recognition, robotics, AR, chips, and other fields, as well as large R&D organizations. The distribution of high-end R&D personnel likewise illustrates how the AI field is segmented.

So is AR artificial intelligence?

For AR practitioners, the ideal is to replace smartphones with smarter AR terminals. For users, the experience of AR is shaped first by content and then by the terminal, and the AR industry chain can be roughly divided into technology providers, smart-terminal R&D companies, and AR content providers. Among these, AR device makers inevitably focus on hardware technology, such as underlying chips, batteries, and optical lenses, and on optimizing the hardware's own performance, while content providers tend to optimize content and presentation on the basis of existing technology. So we can say that AR technology providers, or AR companies that have achieved some success in developing the underlying algorithms, are artificial intelligence companies.

For companies, and especially founders, turning underlying technology into mature products or services, whether drones, AR smart terminals, robots, or industry solutions built for commercial use, has become what the media, enterprises, and the general public expect of AI companies now that the initial hype has subsided. Recently, the Artificial Intelligence Industry Development Alliance (AIIA) published "Artificial Intelligence Wave: 100 cutting-edge AI applications of technology to change life", which surveys the commercialized cutting-edge results of both giant companies and start-ups and directly reflects the main commercialization directions of AI today.

As a technology-driven field, AR, like most other directions of artificial intelligence, is still a long way from full technical maturity. As the industry chain gradually prospers and commercialization takes hold, more companies and institutions are needed to keep pushing the technical boundaries and build the industry's core competitiveness, so that greater value and potential can be unlocked and Chinese companies can secure their place in the AI era.