Who will protect our privacy in the era of big data?

"Birdie Cloud" is the cloud computing brand of Shenzhen Qianhai Birdie Cloud Computing Co., Ltd., a leading enterprise-level cloud computing service provider. Its team has many years of industry experience and focuses on cloud computing research and development. Building on intelligent cloud servers, it offers a full range of cloud computing solutions to developers, government and enterprise users, financial institutions, and others, and provides reliable enterprise-grade public cloud services.

The frequent data breaches of recent years offer many lessons, and one of them is that it is never too late to start protecting data. Fortunately, organizations are paying more attention to data privacy, and big data is one of their top areas of concern.

Just yesterday, five former Microsoft Corp. employees told Reuters in interviews that Microsoft's database of reported vulnerabilities had been broken into in 2013, but that the incident was not made public at the time.

The former employees said it took Microsoft more than a month to fix all the security holes listed in the breached database, and that the leaked vulnerability information was therefore unlikely to have had much impact on users of Windows products. Microsoft also hired a third-party firm to investigate whether attackers were using the leaked vulnerability information to launch attacks, but the investigation found no attacks linked to the vulnerabilities in question.

Mary Shacklett is president of Transworld Data, a technology research and market development company. Drawing on her experience as an industry insider, she offers advice to help enterprise leaders adopt sound data privacy practices for big data.

A core practice is to anonymize personal data. One way to achieve anonymization is to encrypt personally identifiable data elements. Another is to identify data from individuals with similar values and then average those values into a composite value that is incorporated into a larger data analysis. Other methods include data redaction and masking.
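The three families of techniques above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production anonymization library. Note one substitution: instead of reversible encryption (which Python's standard library does not provide), the sketch pseudonymizes identifiers with keyed hashing (HMAC-SHA256), a one-way technique. "Averaging similar values" is shown as microaggregation: values are sorted, split into groups of at least `k`, and each value is replaced by its group mean. The names `pseudonymize`, `microaggregate`, `mask`, the group size `k`, and the placeholder `SECRET_KEY` are all hypothetical.

```python
import hashlib
import hmac
from statistics import mean
from typing import List, Sequence

# Hypothetical placeholder key for illustration only; a real system would load
# a managed secret from a key store instead of hard-coding it.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed HMAC-SHA256 digest (one-way, not encryption)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def microaggregate(values: Sequence[float], k: int) -> List[float]:
    """Sort values, split them into groups of at least k, and replace each value
    with the mean of its group, so no single individual's value is published."""
    if k < 1:
        raise ValueError("k must be >= 1")
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    result: List[float] = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        if n - end < k:  # fold a too-small remainder into the last group
            end = n
        group = order[start:end]
        avg = mean(values[i] for i in group)
        for i in group:
            result[i] = avg
        start = end
    return result


def mask(value: str, visible: int = 4, char: str = "*") -> str:
    """Keep only the last `visible` characters and mask the rest."""
    if visible <= 0:
        return char * len(value)
    return char * max(len(value) - visible, 0) + value[-visible:]


if __name__ == "__main__":
    print(pseudonymize("alice@example.com")[:16])
    print(microaggregate([23, 25, 41, 44, 60], k=2))
    print(mask("4111111111111111"))
```

Keyed hashing keeps the same input mapping to the same token, so records can still be joined across tables without exposing the raw identifier; whoever holds the key, however, can test guesses against the tokens, so the key needs the same protection an encryption key would.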

Collecting digitized information generated by governments, businesses, and individuals creates tremendous opportunities for knowledge- and information-based decision-making. Driven by mutual benefit, parties increasingly need to exchange and publish data. However, data in its original form often contains sensitive personal information, and publishing it can violate individual privacy. Privacy-preserving publishing of set-valued data (records in which each individual is associated with a set of items, such as purchases or diagnoses) is an important and challenging problem. While most existing techniques rely on generalization and global suppression, one proposed method anonymizes set-valued data through partial (local) suppression. It guarantees that, no matter how much background knowledge an attacker possesses, strong association rules involving sensitive items no longer appear in the anonymized data. The approach not only significantly reduces information loss but also lets users choose between preserving the original data distribution and preserving useful association rules for mining, depending on the needs of downstream applications. Preliminary evaluations show that it outperforms the classical approach by more than 100 times in preserving the original data distribution, retains more useful association rules while introducing only a few spurious ones, and reduces information loss by about 30% on average.
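To make partial suppression more concrete, here is a minimal Python sketch under simplifying assumptions. It is not the published algorithm: it only considers rules whose antecedent is a single non-sensitive item ({a} -> s), uses a hypothetical confidence threshold `max_conf`, and removes the sensitive item from just enough transactions to bring each such rule's confidence down to that threshold. Global suppression would delete the sensitive item everywhere; here the other items and the remaining occurrences stay intact. The function names and sample items are illustrative.

```python
from typing import Iterable, List, Set

Transaction = Set[str]


def rule_confidence(data: List[Transaction], antecedent: str, sensitive: str) -> float:
    """Confidence of the rule {antecedent} -> sensitive, i.e. P(sensitive | antecedent)."""
    support_a = sum(1 for t in data if antecedent in t)
    if support_a == 0:
        return 0.0
    support_both = sum(1 for t in data if antecedent in t and sensitive in t)
    return support_both / support_a


def partial_suppress(
    data: List[Transaction], sensitive_items: Iterable[str], max_conf: float
) -> List[Transaction]:
    """Return an anonymized copy in which every rule {a} -> s, with a non-sensitive
    and s sensitive, has confidence <= max_conf.

    Only the sensitive item s is removed, and only from as many transactions as
    needed (partial suppression), rather than deleting s from the whole dataset.
    """
    if not 0.0 <= max_conf <= 1.0:
        raise ValueError("max_conf must be between 0 and 1")
    sensitive = set(sensitive_items)
    out = [set(t) for t in data]  # copy so the caller's data is not modified
    public_items = sorted({i for t in out for i in t} - sensitive)
    for s in sorted(sensitive):
        for a in public_items:
            # The support of a never changes, because only sensitive items are removed.
            support_a = sum(1 for t in out if a in t)
            holders = [t for t in out if a in t and s in t]
            while support_a and len(holders) / support_a > max_conf:
                holders.pop().discard(s)  # suppress s in one more transaction
    return out


if __name__ == "__main__":
    records = [
        {"bread", "milk", "flu"},
        {"bread", "flu"},
        {"bread", "flu"},
        {"milk"},
    ]
    anonymized = partial_suppress(records, {"flu"}, max_conf=0.5)
    for item in ("bread", "milk"):
        print(item, "->", "flu", rule_confidence(anonymized, item, "flu"))
```

Because only sensitive items are removed, the support of every non-sensitive antecedent stays fixed, so a later suppression can only lower, never raise, the confidence of a rule that was already processed. Defending against attackers with arbitrary background knowledge, as the described method claims to, would also require checking multi-item antecedents, which this sketch omits.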

These are only a few of the many ways to protect data privacy. Others include identifying the departments within a company that work with big data and regularly reviewing their data privacy practices. Finally, data privacy measures should be developed and implemented in line with the organization's business needs and growth.