A Detailed Explanation of What You Don't Know About "Big Data"

Since the start of 2012, the term "big data" has been mentioned more and more often. It is used to describe and define the massive amount of data generated in the era of information explosion, and to name the technological developments and innovations related to it. It has appeared on the cover of a Wall Street Journal column, made the news on the official White House website, come up in a number of Internet-themed lectures and salons in China, and has even been written into investment recommendation reports by securities firms with a keen nose for trends.

I. Background of the emergence of big data

Data is expanding rapidly, and it will shape the future development of enterprises. Although enterprises may not yet be aware of the problems that explosive data growth will bring, over time they will become more and more aware of how important data is to them. The era of big data poses new challenges to humanity's ability to master data, and it also provides unprecedented room and potential for gaining deeper and more comprehensive insights.

McKinsey, a world-renowned consulting firm, was the first to announce the arrival of the big data era: "Data, which has permeated every industry and business function today, has become an important production factor. People's mining and use of massive amounts of data heralds a new wave of productivity growth and consumer surplus." Big data has existed for some time in fields such as physics, biology, and environmental ecology, and in industries such as the military, finance, and communications, but it has attracted attention in recent years because of the growth of the Internet and the information industry.

Big data in the Internet industry refers to the phenomenon in which Internet companies generate and accumulate data on users' online behavior in their daily operations. The scale of this data is so large that it cannot be measured in gigabytes (G) or terabytes (T). The starting unit of measurement for big data is at least the petabyte (P, about 1,000 T), the exabyte (E, about 1 million T), or the zettabyte (Z, about 1 billion T).
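To make the units above concrete, here is a minimal sketch of the conversion. It assumes the decimal convention stated in the text (1 P = 1,000 T, and so on); the names `TB_PER_UNIT`, `to_terabytes`, and `human_readable` are invented for this illustration.

```python
# Decimal storage units as stated in the text:
# 1 PB = 1,000 TB, 1 EB = 1 million TB, 1 ZB = 1 billion TB.
TB_PER_UNIT = {"TB": 1, "PB": 10**3, "EB": 10**6, "ZB": 10**9}


def to_terabytes(value: float, unit: str) -> float:
    """Convert a size given in TB, PB, EB, or ZB to terabytes."""
    return value * TB_PER_UNIT[unit]


def human_readable(terabytes: float) -> str:
    """Express a size in the largest unit that keeps the number at or above 1."""
    for unit in ("ZB", "EB", "PB", "TB"):
        if terabytes >= TB_PER_UNIT[unit]:
            return f"{terabytes / TB_PER_UNIT[unit]:g} {unit}"
    return f"{terabytes:g} TB"


if __name__ == "__main__":
    print(to_terabytes(1, "EB"))      # 1000000
    print(human_readable(2_500_000))  # 2.5 EB
```

The lookup table keeps the convention in one place, so switching to the binary convention (1 PB = 1024 TB) would only mean changing the table values.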

II. What is big data?

The field of information technology already had concepts such as "massive data" and "large-scale data", but these focus only on the scale of the data itself and fail to reflect the processing and application needs that arise in the context of the data explosion. The new concept of "big data" refers not only to the huge scale of the data objects but also to the processing and application of those objects. It is the unity of data objects, technology, and applications.

1. Big data, also called giant data, refers to information whose scale is so large that current mainstream software tools cannot capture, manage, process, and organize it within a reasonable time into information that helps business decision-making. A big data object may be an actual, finite data collection, such as the database of a government department or an enterprise, or a virtual, unbounded data collection, such as all the information on microblogs and social networks.

Big data is a massive, high-growth, and diverse information asset that requires new processing models in order to deliver stronger decision-making, insight discovery, and process optimization capabilities. In terms of data categories, "big data" refers to information that cannot be processed or analyzed using traditional processes or tools. It describes data sets that exceed the normal range and size of processing, forcing users to adopt non-traditional processing methods.

John Rauser, a big data scientist at Amazon Web Services (AWS), offered a simple definition: big data is any volume of data that exceeds the processing power of a single computer. The R&D group defines big data this way: "Big data is the most heavily publicized and most fashionable technology, and when that happens, the definition becomes confusing." Kelly says: "Big data may not contain all the information, but I think most of it is correct. Part of the perception of big data is that it is so big that analyzing it requires multiple workloads, which is the AWS definition."

2. Big data technology refers to the ability to quickly obtain valuable information from many types of big data. It covers data collection, storage, management, analysis and mining, visualization, and other technologies, together with their integration. Technologies applicable to big data include massively parallel processing (MPP) databases, data mining grids, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.

3. Big data application refers to applying big data technology, in an integrated way, to a specific collection of big data in order to obtain valuable information. Across different fields and different businesses, and even among enterprises in the same field running the same business, business needs, data collections, and analysis and mining objectives differ, so the big data technologies and information systems used may also differ considerably. The value of big data can be fully realized only by developing the trinity of "object, technology, application" in step.

When your technology reaches its limit, that is also the limit of your data. What matters about big data is not how it is defined but how it is used. The biggest challenge is working out which technologies make better use of the data and how big data should be applied. Compared with traditional databases, and given the rise of open source big data analytics tools such as Hadoop, the question is where the value of these unstructured data services lies.

III. Types of big data and value mining methods

1. The types of big data can be broadly divided into three categories:

1) Traditional enterprise data: including consumer data from CRM systems, traditional ERP data, inventory data, and accounting data.

2) Machine-generated and sensor data: including call detail records (CDRs), smart meters, industrial equipment sensors, equipment logs (often called digital exhaust), transaction data, and so on.

3) Social data: including user behavior records, feedback data, and so on, for example from social media platforms such as Twitter and Facebook.

2. There are four ways to mine business value from big data:

1) Segment customers into groups, and then customize special services for each group.

2) Simulate the real environment to discover new needs and improve the return on investment.

3) Strengthen links between departments to improve the efficiency of the entire management chain and industry chain.

4) Reduce service costs and discover hidden clues for product and service innovation.
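The first method in the list, customer segmentation, can be sketched in a few lines. This is only an illustration under stated assumptions: the two features (orders per month and average order value), the cut-off values, and the segment names are all hypothetical and do not come from the article. A real system would more likely learn its segments from the data than hard-code thresholds.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    orders_per_month: float
    avg_order_value: float


def segment(c: Customer, freq_cut: float = 4.0, value_cut: float = 50.0) -> str:
    """Assign a customer to one of four rule-based segments (hypothetical cut-offs)."""
    frequent = c.orders_per_month >= freq_cut
    big_spender = c.avg_order_value >= value_cut
    if frequent and big_spender:
        return "vip"
    if frequent:
        return "regular"
    if big_spender:
        return "occasional big spender"
    return "casual"


def group_by_segment(customers):
    """Return a dict mapping each segment name to the customer names in it."""
    groups = defaultdict(list)
    for c in customers:
        groups[segment(c)].append(c.name)
    return dict(groups)


if __name__ == "__main__":
    customers = [
        Customer("Ann", 6, 80),
        Customer("Bob", 5, 20),
        Customer("Cy", 1, 120),
        Customer("Di", 0.5, 10),
    ]
    for name, members in group_by_segment(customers).items():
        print(name, members)
```

Once the groups exist, each one can be given its own service or promotion, which is the "customize special services for each group" step.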

IV. Characteristics of big data

The industry usually summarizes the characteristics of big data with four Vs (Volume, Variety, Value, Velocity). Specifically, big data has four basic characteristics:

1. Huge data volume

Volume: the data sets are large, generally around 10 TB in size, but in practice many business users combine multiple data sets, which already adds up to petabytes of data. According to Baidu, its new home page navigation needs to serve more than 1.5 PB of data every day (1 PB = 1024 TB); printed out, this data would fill more than 500 billion sheets of A4 paper. By some accounts, all the printed material humanity has produced so far amounts to only 200 PB.
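The figures in this paragraph can be sanity-checked with a little arithmetic. The sketch below uses only the numbers quoted in the text (1.5 PB per day, 500 billion A4 sheets, 200 PB of printed material) and assumes the binary convention 1 PB = 1024 TB all the way down to bytes; the function names are invented for this illustration.

```python
# Binary convention, consistent with "1 PB = 1024 TB" in the text.
BYTES_PER_PB = 1024 ** 5


def bytes_per_sheet(petabytes: float, sheets: float) -> float:
    """How many bytes each printed sheet would have to hold."""
    return petabytes * BYTES_PER_PB / sheets


def days_to_match(total_pb: float, daily_pb: float) -> float:
    """How many days of daily traffic add up to a given total volume."""
    return total_pb / daily_pb


if __name__ == "__main__":
    # 1.5 PB spread over 500 billion sheets: roughly 3.4 kB per sheet,
    # about what a page of plain text holds.
    print(round(bytes_per_sheet(1.5, 500e9)))
    # 200 PB of printed material versus 1.5 PB served per day: about 133 days.
    print(round(days_to_match(200, 1.5)))
```

Under these assumptions the two claims are at least internally plausible: a few kilobytes per page is a reasonable size for printed text, and the daily volume would equal the cited total for all printed material in a little over four months.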

2. Many categories and types of data

Variety: data comes from many different sources, and data types and formats are becoming increasingly rich. They have broken out of the previously limited scope of structured data to include semi-structured and unstructured data. Today data takes the form not only of text but also of pictures, video, audio, geographic location information, and more, and personalized data accounts for the vast majority.

3. Fast processing speed

Velocity: even when the amount of data is very large, it must still be processed in real time. Data processing follows the "one-second rule": high-value information must be obtained quickly from all types of data.

4. High authenticity and low value density

Veracity: as interest grows in new data sources such as social data, enterprise content, and transaction and application data, the limitations of traditional data sources are being broken, and enterprises increasingly need effective information capabilities to ensure the authenticity and security of their data. Value density, by contrast, is low. Take video as an example: during uninterrupted monitoring, only a second or two out of an hour of footage may be useful.

V. The role of big data

1. Processing and analyzing big data is becoming the node where new-generation information technologies converge in application

Mobile Internet, the Internet of Things, social networking, the digital home, e-commerce, and so on are application forms of new-generation information technology, and these applications continuously generate big data. Cloud computing provides a storage and computing platform for this massive and diverse data. By managing, processing, analyzing, and optimizing data from different sources and feeding the results back into these applications, huge economic and social value will be created.

Big data has the energy to catalyze social change. But unleashing that energy requires rigorous data governance, insightful data analysis, and an environment that inspires managerial innovation (Ramayya Krishnan, Dean of the Heinz College at Carnegie Mellon University).

2. Big data is a new engine for sustained and rapid growth of the information industry

New technologies, products, services, and business models for the big data market will continue to emerge. In hardware and integrated equipment, big data will have an important impact on the chip and storage industries and will also give rise to markets for integrated data storage and processing servers, in-memory computing, and more. In software and services, big data will drive the development of fast data processing and analysis, data mining technologies, and software products.

3. The use of big data will become a key factor in improving core competitiveness

Decision-making in all industries is shifting from "business-driven" to "data-driven". Big data analysis can enable retailers to grasp market dynamics in real time and respond quickly; it can provide decision support for businesses developing more accurate and effective marketing strategies; and it can help businesses provide consumers with more timely and personalized services. In the medical field, it can improve diagnostic accuracy and drug effectiveness. In public services, big data has also begun to play an important role in promoting economic development and maintaining social stability.

4. Scientific research methods and means will change significantly in the big data era

For example, the sampling survey is a basic research method in social science. In the big data era, the massive behavioral data that research subjects generate on the Internet can be monitored and tracked in real time, then mined and analyzed to reveal regularities and to propose research conclusions and countermeasures.

VI. The commercial value of big data

1. Customer group segmentation

Big data can be used to segment customers into groups and then take tailored action for each group. Targeting specific customer segments for marketing and service has always been a goal for businesses. The massive amounts of data stored in the cloud, together with big data analytics, make it possible to segment consumers cost-effectively, in real time, and at a very fine grain.

2. Simulation

By using big data to simulate real-world scenarios, we can uncover new demand and increase the return on investment. More and more products are now equipped with sensors, and the proliferation of automobiles and smartphones has led to an explosion in the amount of data that can be collected. Social networks such as blogs, Twitter, Facebook, and Weibo are also generating massive amounts of data.

Cloud computing and "big data" analytics make it possible for merchants to store and analyze this data in real time, along with data on transactional behavior, in a cost-effective manner. Transaction processes, product usage, and human behavior can all be digitized. "Big Data" technologies can integrate this data for data mining, and in some cases, model simulations can be used to determine which programs have the highest return on investment given different variables (e.g., different promotional programs in different regions).
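As a rough illustration of the kind of model simulation described above, the sketch below compares promotional programs by simulated return on investment. Everything in it is hypothetical: the program names, costs, conversion rates, margins, and region sizes are invented, and the model (each person buys independently with a fixed probability) is a deliberately simple stand-in for whatever a real analysis would fit from transaction data.

```python
import random


def simulate_roi(cost, conversion_rate, margin_per_sale, audience, trials=100, seed=0):
    """Average ROI of one promotional program over repeated random trials."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    total_roi = 0.0
    for _ in range(trials):
        # Each person in the audience buys independently with the given probability.
        sales = sum(1 for _ in range(audience) if rng.random() < conversion_rate)
        total_roi += (sales * margin_per_sale - cost) / cost
    return total_roi / trials


def best_program(programs, audience):
    """Simulate every program for one region; return (best name, all scores)."""
    scores = {
        name: simulate_roi(audience=audience, **params)
        for name, params in programs.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical programs and region sizes, for illustration only.
    programs = {
        "coupon": {"cost": 1000, "conversion_rate": 0.05, "margin_per_sale": 30},
        "billboard": {"cost": 3000, "conversion_rate": 0.06, "margin_per_sale": 30},
    }
    regions = {"north": 2000, "south": 500}
    for region, audience in regions.items():
        name, scores = best_program(programs, audience)
        print(region, name, {k: round(v, 2) for k, v in scores.items()})
```

Running the comparison once per region, with that region's own audience size and rates, is how different variables can lead to different recommended programs.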

3. Improve the return on investment

The more widely big data results are shared among the relevant departments, the higher the return on investment across the entire management chain and industry chain. Departments with strong big data capabilities can share their results with departments with weaker capabilities through cloud computing, the Internet, and internal search engines, helping them use big data to create business value.

4. Data storage space rental

Enterprises and individuals both need to store massive amounts of information, and only when data is stored properly can its potential value be explored further. This business model can be subdivided into two categories: personal file storage and storage for enterprise users. Through easy-to-use APIs, users can conveniently place all kinds of data objects in the cloud and are then billed by usage, like water and electricity. Several companies have already launched such services, including Amazon, NetEase, and Nokia. Operators have launched them too, such as China Mobile's "Colorful Cloud" service.

5. Customer relationship management

The purpose of customer management applications is to analyze customers in depth and from different angles, based on their attributes (both natural and behavioral), in order to understand them and thereby win new customers, improve customer loyalty, reduce customer churn, and increase customer spending. For small and medium-sized businesses, a dedicated CRM system is clearly too large and expensive. Many of them use FMS as a basic CRM. For example, they add existing customers to an FMS group, publish new product previews and special sale notices in the group's circle of friends, and handle pre-sale and after-sale service there.

6. Personalized and accurate recommendations

Within operators, recommending services or applications according to user preferences is already common, for example software recommendations in app stores and video program recommendations on IPTV. With correlation algorithms, text summarization, sentiment analysis, and other intelligent analysis algorithms, this can be extended into a commercial service that uses data mining to help customers carry out precision marketing. Future revenue can then come from a share of the added value created for the customer.

Take the everyday "junk SMS" as an example. The message is not inherently junk; it is regarded as junk because the person who receives it does not need it. After analyzing user behavior data, we can send the right information to the people who need it, so that "spam" becomes valuable information. At McDonald's in Japan, users download coupons to their cell phones and pay at the restaurant using the mobile wallet of the operator DoCoMo. The operator and McDonald's collect the relevant consumption information, such as which burgers a user often buys, which stores they visit, and how often they buy, and then push precisely targeted coupons to that user.
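The coupon example can be sketched as a small piece of code that summarizes one user's purchase history and derives an offer from it. This is a hypothetical illustration, not a description of how DoCoMo or McDonald's actually do it: the record format (item, store) and the function name `build_offer` are assumptions made for the example.

```python
from collections import Counter


def build_offer(visits):
    """Summarize one user's visits and pick what to put on a coupon.

    `visits` is a list of (item, store) tuples, one per purchase.
    Returns None when there is no history to learn from.
    """
    if not visits:
        return None
    items = Counter(item for item, _ in visits)
    stores = Counter(store for _, store in visits)
    return {
        "item": items.most_common(1)[0][0],    # the burger bought most often
        "store": stores.most_common(1)[0][0],  # the store visited most often
        "visits": len(visits),                 # how often the user buys
    }


if __name__ == "__main__":
    history = [
        ("cheeseburger", "Shibuya"),
        ("teriyaki burger", "Shibuya"),
        ("cheeseburger", "Shinjuku"),
        ("cheeseburger", "Shibuya"),
    ]
    print(build_offer(history))
```

Returning `None` for an empty history matters in practice: sending a generic coupon to a user you know nothing about is exactly the untargeted message the paragraph calls junk.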

7. Data search

Data search is not a new application, but with the advent of the big data era, the demand for real-time, full-scope search is growing ever stronger. We need to be able to search data from social networks, user behavior, and other sources. The commercial value lies in linking real-time data processing and analysis with advertising, that is, real-time advertising services and in-app mobile advertising in social services.

Operators have information about users' online behavior, which makes the acquired data "more comprehensive" and more valuable. Typical applications include China Mobile's Pangu Search.

VII. The important impact of big data on the economy and society

1. It can help realize huge economic benefits

For example, it can contribute to net profit growth in China's retail industry and reduce product development and assembly costs in manufacturing. Big data is expected to drive, directly and indirectly, US$120 billion in global information technology spending in 2013.

2. It can raise the level of social management

In public services, big data can effectively advance the relevant work and raise the level of decision-making, service efficiency, and social management of the departments concerned, generating huge social value. Several European cities have analyzed real-time traffic flow data to guide motorists in choosing the best routes, thereby improving urban traffic conditions.

3. Without high-performance analytical tools, the value of big data cannot be released

When applying big data, one must stay clear-headed: the results of analysis should not be trusted blindly, but neither should their important role be dismissed because they are not fully accurate.

1) For various reasons, the data objects being analyzed and processed will inevitably include erroneous and useless data. In addition, data analysis, artificial intelligence, and the other technologies at the core of big data technology are not yet fully mature. The results a computer produces when analyzing and processing big data therefore cannot be expected to be completely accurate. For example, by analyzing the search queries of hundreds of millions of users, Google can predict flu outbreaks faster than professional organizations, but because of interference from useless information on microblogs, these predictions have also been inaccurate many times.

2) It must be clearly understood that the role and value of big data lie chiefly in guiding and inspiring the innovative thinking of those who apply it, and in assisting decision-making. Simply put, when dealing with a problem, people can usually think of one method, whereas big data can suggest ten for reference. Even if only three of them are feasible, the range of solution ideas has still tripled.

So recognizing and applying the role of big data objectively, neither exaggerating nor understating it, is the premise for understanding and using big data accurately.

VIII. Summary

Whether or not the core value of big data is prediction, decision-making models built on big data have already brought profit and reputation to many businesses.

1. Viewed along the big data value chain, there are three models:

1) Holding big data but not making good use of it. Typical examples are financial institutions, the telecommunications industry, and government agencies.

2) Having no data but knowing how to help those who have data make use of it. Typical examples are IT consulting and services companies such as Accenture, IBM, and Oracle.

3) Having both data and a big data mindset. Typical examples are Google, Amazon, and Mastercard.

2. In the future, the two most valuable things in the big data field will be:

1) People with a big data mindset, who can turn the potential value of big data into actual benefits.

2) Business areas that big data has not yet touched. These are the untapped oil wells and gold mines, the so-called blue oceans.

Big data is a typical field in which information technology is closely integrated with specialized technology, and the information technology industry with every other industry. It has strong application demand and broad application prospects. To seize the new opportunities this emerging field brings, it is necessary to keep tracking and studying big data, keep improving our knowledge and understanding of it, keep technological innovation and application innovation advancing together, accelerate the development and use of big data in every field of the economy and society, and bring the data application needs and capabilities of the country, its industries, and its enterprises to a new stage.