How can we transcend the limitations of big data thinking?

This article is reproduced from the Human-Computer and Cognition Laboratory. Authors: Diao Shengfu and Yao Zhiying, Foshan University of Science and Technology.

Abstract: While attaching great importance to big data thinking, we must remain rational and take its limitations seriously: the misunderstanding of the full-data model, the anxiety of quantitative thinking, and the excessive worship of correlation. These limitations can be transcended by realizing the complementarity of big data thinking: grasping the whole while taking the parts into account, integrating the quantitative with the qualitative, and emphasizing correlation on the basis of causation.

With the rapid development of the new generation of information technology, especially the widespread adoption of the mobile Internet, big data, cloud computing and smart wearable technologies, data is growing explosively and human society has entered an era characterized by big data. "The arrival of a digital era in which 'everything is recorded and everything is analyzed' is irresistible." [1] In the big data environment, data has become a "new energy" that drives economic and social development and creates greater economic and social benefits. In the field of scientific research, Turing Award winner Jim Gray proposed the "fourth paradigm" of scientific research, a paradigm based on data-intensive computing. In this context, "quantify everything" and "let the data speak" have become slogans of the times. People pay more attention to the holistic thinking of "all data rather than samples", pursue the quantitative thinking of "quantification rather than qualitative judgment", and emphasize "correlation rather than causation". This undoubtedly has a huge impact on traditional thinking, which grasps the interrelationships between things through the pursuit of regularity, causality and sampling methods. However, everything is a unity of opposites. In the current craze of big data thinking, we need to remain rational, view the changes in thinking it brings about dialectically, take its limitations seriously, and explore ways in which different modes of thinking can complement each other, so as to better adapt, at the level of thinking, to survival and development in the big data era.

1 Limitations of big data thinking

1. Misunderstanding of the full-data model

With the popularization of various sensors and smart devices, real-time monitoring of things and the collection and transmission of data can be achieved. The data obtained about things is no longer just sample data but all the data; this mode is called the "full-data model". On the basis of the full-data model, the characteristics and attributes of things can be analyzed and grasped more comprehensively, which is also conducive to more objective and scientific decision-making. But regarding the full-data model, some scholars have also pointed out that "N = all" is often an assumption about the data, not a reality. Therefore, while pursuing full data, some necessary deliberation is required.

First of all, we are gradually falling into a contradiction between the explosive growth of data and the lag of technology. In a big data environment, data changes rapidly and does not remain static. According to IBM's estimates, about 2.5×10^18 bytes of new data are generated every day. If each byte were compared to one cubic meter of water, this daily volume would already exceed the roughly 1.42×10^18 cubic meters of water stored on the entire Earth; the growth of data is astonishing. Even though the level of data technology is improving rapidly, it still lags behind the growth rate of data. "Even if we do collect all the data and analyze it with technology, we can only grasp the relationships between points, or grasp local correlations. But this does not mean that we can obtain universal laws and trends in the development of things." This shows that the relative lag of technology hinders the realization of the full-data model.
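To make the scale comparison concrete, here is a minimal back-of-the-envelope sketch using the approximate figures cited above (the figures are assumptions taken from the text, not independent data):

```python
# Back-of-the-envelope check of the scale comparison in the text.
# Both figures are approximate values cited above, not measured data.
daily_new_data_bytes = 2.5e18        # IBM's estimate of new data generated per day
earth_water_cubic_meters = 1.42e18   # rough total volume of water stored on Earth

# If every byte corresponded to one cubic meter of water, a single day of
# new data would already exceed all the water on Earth.
ratio = daily_new_data_bytes / earth_water_cubic_meters
print(f"One day of new data is about {ratio:.2f} times Earth's total water volume")
```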

Secondly, the objective existence of "data islands" places certain restrictions on the realization of the full-data model. To realize the full-data model, an important prerequisite is data openness and sharing. As the value contained in data has become familiar to enterprises and governments, data openness and sharing have achieved certain results. However, so far the circulation channels of data resources have not been fully opened, and the problem of "data islands" still exists to a certain extent. The main manifestations are as follows. First, the cross-industry flow of data has not yet been truly realized. After enterprises and governments recognized the potential value of data, they quickly enabled data resources to flow between or within their own departments in order to facilitate the organization's development. However, driven by the interests of the various data owners, data between and within departments has not achieved genuine mutual flow, and this has become another important problem of "data islands" that urgently needs to be solved. Second, the rise of the data trading market has to a certain extent intensified the formation of "data islands". Emerging companies whose profit model is selling data will inevitably, driven by their own interests, increase the confidentiality of the data they collect. This mentality and behavior make the problem of "data islands" even more prominent.

Third, the lag of technology relative to the rapid pace of data generation makes the problem of "data islands" more prominent. Because the development of technology cannot keep up with the growth rate of data, data processing and updating are slow, and the pile-up of new and old data "binds" people's vision, resulting in "data islands" at a new level. Therefore, the so-called full-data model may become merely an ideal state that we long for, a new "utopia" constructed by the development of data technology, a projection of the information society: the shadow on the wall of Plato's cave.

Finally, the key value of big data lies not in being "big" or "complete", but in being "useful". Pursuing the full-data model creates the illusion that as long as all the data can be obtained, more data value can be mined. At present, most of the data whose value can be mined is structured data that computers can recognize. However, in the entire data world, most of the valuable data is unstructured data that has not yet been made machine-recognizable. In 2014, unstructured data accounted for more than 80% of the total amount of newly generated data, and in 2015 this proportion exceeded 85%. At the same time, unstructured data is growing at more than twice the rate of structured data. As a result, some unstructured data that cannot be recognized ends up as "data garbage" and is eventually discarded. In this way, realizing the so-called full-data model becomes even more difficult.

2. The anxiety of quantitative thinking

In the era of big data, all phenomena and behavioral changes in nature and human society are being digitized, making it possible to "quantify everything". While digitizing things, we need to pay attention to several problems with quantitative thinking.

(1) Defects of ontology and method

In today's big data era, all of people's activities leave data traces. The entire world has gradually evolved into a data-based world, and the data worldview continues to come to the fore. Under the guidance of the data worldview, "quantifying everything" has become the methodology of the big data era. Philosophers have also begun to reflect on the relationship between data and the world, and have even put forward the assertion that "the origin of the world is data". But has data really become the ontology of the world? We believe that such a notion arises mainly from a misunderstanding of the nature of data, and this issue needs to be considered carefully.

First of all, the data sources of big data are mainly people's conscious or unconscious behaviors in social life. In other words, big data is a quantitative reflection of the objectively existing activities of perceptual objects in people's social life, and "quantifying everything" is an ideal method of understanding things proposed in the era of big data. Therefore, in essence, the source of data is still the objective material world. Without the material world, data becomes "water without a source and a tree without roots".

Secondly, the main purpose of "quantifying everything" is to collect, transmit, store and analyze the data generated by people's past activities, so as to intervene in and guide people's behavior. Its main function is to improve the objectivity and scientific character of predictions and to make better use of people's subjective initiative and creativity. However, this ideal method of "quantifying everything" only recognizes that "data is the static data of human social life", while ignoring the objective fact that "human social life is dynamic data". It treats the whole of human social life as a lifeless, static data set, ignoring the fact that many phenomena in nature and human society are rapidly changing and complex.

(2) Personal behavior is “selected”

Quantitative prediction makes personal behavior "selected". Quantitative analysis of people's behaviors, attitudes, personalities and so on, based on big data technology, can predict and help people find supposedly suitable partners for love and marriage. But we may also ask: is the partner the system finds for an individual really the most suitable one? If we follow the quantitative analysis of data in making this choice, should personal intuition and feelings be abandoned? Do we give up our right to choose and follow the system, so that we are "chosen"? From another perspective, this is a question about the relationship between sensibility and rationality. Perceptual factors such as feeling and inspiration are all a person has at the beginning of life; they are the most instinctive intuitions about nature and society as a whole. Rationality is acquired through gradual development on the basis of sensibility. The reason people pay more attention to rationality is mainly that rationality, with its clear and rigorous logic, is easy to grasp, while sensibility, with its uncertainty, is easy to ignore. But precisely for this reason rationality is limited, whereas sensibility, because of its uncertainty, can break through limitations and extend without bound, and can respond most instinctively and intuitively to an ever-changing and developing world.

We have doubts about finding a supposedly suitable love or marriage partner through big data analysis because, just as the human brain cannot be replaced by computers, sensibility cannot be replaced by rationality.

What big data analysis predicts may be a good choice, but it is not necessarily the appropriate or best choice, and this kind of prediction has in fact already affected the individual's freedom of choice.

(3) The intensification of data dictatorship

Quantitative forecasting intensifies the "data dictatorship". The core of data-based thinking is quantification, or "letting the data speak". Successful predictions made through quantitative analysis further increase people's dependence on data assets; Walmart's much-cited "beer and diapers" success story is taken as proof of this. Nowadays, both enterprises and governments pay more attention to the role of data, especially in decision-making, and it seems that the absence of data greatly reduces persuasiveness. Yet if a government bases every decision solely on data, the consequences can be the opposite of what is intended. For example, suppose this year's GDP growth is 6% and last year's was 6.3%, a drop of 0.3 percentage points. Can we conclude that this year's economy is necessarily worse than last year's? Obviously, making such a judgment on the basis of this single figure is not objective. The Internet philosopher Yevgeny Morozov has issued a scathing critique of the ideology behind many "big data" applications, warning of an impending "tyranny of data". "Words themselves carry no fixed intent; meaning comes from the situation." Data analysis and prediction need to be connected with the corresponding context, otherwise "ambiguity" will occur.

(4) Privacy intrusion and moral questioning

"Quantifying everything" further exposes personal privacy to prying eyes, and quantitative predictions are sometimes contrary to ethics. First, personal privacy is laid bare. Various smart devices such as wearables and smart chips can monitor all of a person's behavior in real time; we are exposed to the surveillance of a "third eye" and become "transparent people". For example, various medical sensors can monitor an individual's physiological changes in real time. Second, leaks of data privacy deepen social discrimination. With the digitization of personal behavior, privacy leaks easily occur under the inducement of data-related interests, which in turn deepens social discrimination. For example, if a hospital leaks personal medical data and the data show that someone is HIV-positive, people will view that person through tinted glasses, causing psychological distress, difficulties in daily life, difficulties in finding employment and so on; besides the violation of the individual's rights, the degree of social discrimination is further deepened. Finally, big data predictions sometimes violate ethical norms. In a well-known case, the retailer Target, by analyzing an individual's browsing and purchases of pregnancy-related products, predicted that a teenage girl was pregnant and sent her coupons for maternity products before her father knew anything about it; when the father found out, he berated the store manager. This incident raises two questions worth pondering. First, how did the company learn that the girl was pregnant, and how was her privacy leaked? In other words, our privacy is being pried into, and accessing it without the individual's knowledge or consent not only frightens the individual but is also against the law. Second, the father, the girl's closest relative, had not yet learned of the matter, yet the company learned of it first and pushed coupons to her. Is this not disrespectful to others, and contrary to ethics? The related ethical issues are worth reflecting on.

3. Excessive worship of correlation

The core of big data thinking is correlational thinking, but correlational thinking also gives rise to the problem of excessive worship in practice. There are several main reasons why people over-revere correlational thinking. First, the existence of massive data makes it impossible for people to dig truly valuable things directly out of vast amounts of messy data; people can therefore only use statistical correlation analysis to obtain the correlations between things, and then further dig out the real "knowledge" behind them. Second, in a highly complex and uncertain era, it has become more difficult for people to discover the causality between things. Complexity science tells us that the world is complex and universally connected, requiring us to view the world with complexity thinking and to grasp and study human society as a whole. Correlational thinking grasps the correlations between things from a macro perspective, which further intensifies people's admiration for it. Finally, in a rapidly changing environment, correlation analysis better suits the logic of business operations: focus only on the pattern and do not seek the reason.

For practical business activities, the goal is to obtain the maximum profit at the lowest cost in the shortest time, which further intensifies enterprises' excessive worship of correlational thinking. "The essence of big data is statistical correlation. From a phenomenological point of view, it is consistent with the statistical laws in classical science. This is where they are the same, or easily confused." [2] However, two issues must be attended to when using correlation analysis. First, the key to correlation analysis is finding the "correlates". As the volume of data grows, the breadth and depth of data also expand, and there is more and more meaningless redundant and junk data, which brings more data noise in which truly valuable data is submerged. How to find "correlates" amid so much data noise is an important issue that big data analysis needs to solve. Second, the objective existence of pseudo-correlations and spurious correlations is a difficulty for big data analysis. Statistically, there are many types of correlation, including positive and negative correlations, strong and weak correlations, as well as spurious correlations and the like. Spurious correlations and other misleading relationships will lead to errors in analysis results and serious consequences; the several incorrect flu predictions made by Google Flu Trends confirm this. How to identify spurious correlations and other such relationships is a difficulty that big data analysis needs to overcome. Looking for the causal relationships between things is a long-standing mindset and habit of human beings, and it is also a necessary way to grasp the inner essence of things. The famous philosopher of science Reichenbach held that there is "no correlation without causation". To prevent blind worship of correlational thinking, we must focus on using complementary thinking to transcend the limitations of big data thinking.
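To make the notion of spurious correlation concrete, consider a minimal illustrative sketch (my own example, not from the article): two series generated completely independently of one another can still exhibit a sizeable Pearson correlation, which is one reason purely correlational findings call for causal scrutiny.

```python
# Illustrative sketch: two independently generated random walks often show a
# non-trivial Pearson correlation even though neither has any influence on the other.
import numpy as np

rng = np.random.default_rng(seed=42)
n = 1000

x = np.cumsum(rng.normal(size=n))  # random walk A, generated independently
y = np.cumsum(rng.normal(size=n))  # random walk B, generated independently

r = np.corrcoef(x, y)[0, 1]
print(f"Pearson correlation between two unrelated random walks: r = {r:.2f}")
```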

2 Achieving the transcendence of big data thinking through complementarity

1. Grasping the whole while taking the parts into account

As philosophical categories marking the divisibility and unity of objective things, the whole and the part have important epistemological significance. From a methodological point of view, the full-data model focuses on grasping things with a holistic approach rather than a reductive approach. Therefore, to overcome the limitations of the full-data model, we must focus on the whole and grasp things systematically, while also taking the parts into account to deepen understanding, thereby achieving the unification of the holistic method and the reduction method.

First, focus on the whole and grasp it systematically. Classical systems theory holds that a thing should be regarded as an organic whole and that attention should be paid to grasping its overall characteristics and functions. In addition, complexity science holds that the world is complex and changeable, requiring us to have a global vision and to grasp complex objects as wholes. In the era of big data, what we should do is treat all the data as a whole, use machines and modeling to find correlations between the data, identify the "correlates", grasp the overall attributes of things reflected behind the data, and further analyze the structure of and connections among the various elements within things, deeply exploring the causality between elements so as to achieve a concrete and comprehensive understanding of things.

Second, take the parts into account and deepen understanding. Traditional reductionism holds that things are divided into different parts, and that understanding of the whole is achieved through the understanding and integration of each part. Although traditional reductionism has the defect of ignoring the interconnections and interactions among the parts of things, this does not mean that reductionism is useless, nor does its reduction method eliminate people's overall understanding of things. In terms of research strategy, the idea of reductionism is mainly reflected in a layer-by-layer analytical strategy. Therefore, in the era of complexity, the key to using the reduction method well lies in recognizing the hierarchy of things during reduction.

In the era of big data, because the data are huge and the structures complex, it is difficult to find causal relationships directly in the data. We therefore take all the data as a whole and grasp its correlations; but what is the overall essence of the things reflected by the data? We need to further analyze the causal logic among their internal elements, which is in essence the use of the reduction method. In this sense, the inquiry into causal logic is a concrete embodiment of the reduction method, although this reduction method differs from the traditional one. Therefore, "the complex relationship between reduction methods and holistic methods should, in the final analysis, be 'complementary'." The development of modern science also shows that "it will not work without reductionism, but it will not work with reductionism alone; it will not work without holism, but it will not work with holism alone... The scientific attitude is to combine reductionism with holism." Only by fully understanding the dialectical relationship between the whole and the part, and the complex relationship between the holistic method and the reduction method, can we make good use of this tool to understand and transform the world.

2. Integrating the quantitative with the qualitative

The purpose of quantitative research is to answer questions about the quantitative attributes of things and their movements, while the purpose of qualitative research is to study the specific characteristics or behaviors of objects in depth and to further explore the causes of their occurrence. In terms of content, qualitative research and quantitative research should be unified and complementary: qualitative research lays the groundwork for quantitative research and is its basis, while quantitative research is the concrete embodiment of qualitative research, making it more scientific and precise and thus yielding broader and deeper conclusions. The two analyze problems from different angles and each has its own advantages; it is precisely because of this that we can achieve a more comprehensive understanding of things. Therefore, the two should be combined in scientific research so that each draws on the other's strengths and the effect is maximized.

First, the overall grasp of quantity lays the foundation for qualitative research. In the big data environment, "quantifying everything" plays an important role for three main reasons. First, massive data makes it possible to "quantify everything". Based on the application of various smart devices, both people's physical world and their virtual world can be quantified. Through the digital analysis of perceptual objects, correlations between data can be found according to the degree of correlation indicated by a quantitative correlation coefficient, and the connections between the things reflected in the data can thereby be determined quantitatively. Second, "quantifying everything" helps us grasp things from a quantitative perspective. Through quantitative analysis we can gain an overall understanding of the quantitative aspect of things, and this overall understanding is not the abstract, universal understanding of things in the sense of qualitative research, but a concrete and detailed understanding of specific things that allows us to construct a new overall picture. Third, big data itself is essentially a collection of quantitative relationships, which has practical guiding significance. Albert-László Barabási pointed out: "93% of human behavior is predictable, but in the past we did not have the relevant data, nor did we have suitable methods to explore human behavior." Quantitative research is therefore of great value for grasping the correlational tendencies among things.

Second, qualitative causal research creates new connections and meets new needs. Although quantitative analysis of big data allows us to grasp the correlations of things as a whole, it cannot clarify the causality between them. Causality is the connection between the process of interaction among elements and its effects. Therefore, on the basis of grasping correlated things in the quantitative dimension, it is necessary to study in depth the structure and combination of the internal components of things, explore the causality among the internal elements, change the interactions among the elements, and, according to the needs of human development, create results that meet people's needs. In turn, the new causal relationships derived from the causal logic among internal elements can be further investigated or tested in quantitative research.

In this way, quantitative research provides qualitative research with the quantified overall attributes and general structure of perceptual objects; on this basis, qualitative research deeply explores the interactions among elements and obtains representative conclusions, which are then fed back into quantitative research on the full data for empirical testing, thus achieving the complementarity of the quantitative and the qualitative.
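As a small complementary sketch (again illustrative, with made-up variable names such as hours_online and purchases), a quantitative correlation coefficient expresses how strongly two quantities co-vary, which is the quantitative grasp described above; explaining why they co-vary remains a task for qualitative, causal analysis.

```python
# Illustrative sketch: the correlation coefficient quantifies the strength of an
# association, but says nothing about the mechanism behind it.
import numpy as np

rng = np.random.default_rng(seed=7)
hours_online = rng.uniform(0, 8, size=500)                   # hypothetical behavioral trace
purchases = 0.6 * hours_online + rng.normal(0, 1, size=500)  # co-varies with it, plus noise

r = np.corrcoef(hours_online, purchases)[0, 1]
print(f"Correlation coefficient r = {r:.2f} (degree of association only)")
```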

3. Emphasizing correlation on the basis of causation

In the context of the big data era, Mayer-Schönberger proposed that "it is enough for us to know what, and there is no need to know why". Since then, people have paid more attention to correlation than to causation. However, while human society actively attends to correlation, it is also necessary to reflect on and re-evaluate the importance and impact of causality. We cannot help but raise three questions. First, does causality exist in the world as an ontological matter? Second, what is the relationship between correlation and causation? Third, how can the two complement each other in scientific research? Regarding the ontological question, we believe that causality exists objectively. Causal thinking is a habit of thought formed by humans over a long period, and it is also the logical premise for our understanding of the nature of the world. Since modern times, the research results of the natural sciences and of the humanities and social sciences have been based on rigorous, mathematically logical reasoning about causality, and the central task of the natural sciences is to reveal the causal relationships between things. Regarding the relationship between causation and correlation, some scholars believe that it reflects the relationship between science and technology in the big data era: science is the study of causal relationships, that is, of the law of cause and effect, while technology comprises the methods and techniques for solving problems. The two differ in focus but are not antagonistic; just as technology solves "how" and science answers "why", correlation can guide us on "how to do it" in practice, while causality answers "why" we do it.

Even though the era of big data places more emphasis on correlation, it is still inseparable from the pursuit of causality; this is determined by the nature of thinking. Focusing on correlation analysis does not negate causal analysis, nor does it mean that causality is unimportant. On the contrary, it is conducive to deeper analysis of causality, because the two are not mutually exclusive but coexist, and their advantages can complement each other in scientific research. First, correlation lays the foundation for research into causality. In the era of big data, through correlation analysis based on massive data we can quickly, conveniently and accurately find the correlates of something, and then explore the causal relationships among those correlates so as to grasp the essence of the thing. As Mayer-Schönberger put it: "By finding things that may be related, we can conduct further causal analysis on this basis. If there is a causal relationship, we can go on to find out the reasons." And the process of looking for characteristic correlates in fact already includes an analysis of cause and effect.

Secondly, causality is the inner determination and goal of correlation. In the field of scientific research, what we pursue is not only knowing the "what" of correlation but, more importantly, clarifying the "why" of causality between things, so that the scientific theories we establish can withstand the test of practice. In this sense, causality is the inner, essential determination of correlation in the era of big data; it is also the goal pursued behind correlation and plays a decisive role. What we need to do is take causal thinking as the research foundation and correlational thinking as the research orientation, letting the two complement each other to explore the value contained in big data and achieve the transcendence of big data thinking.
