Awareness of business and content security

I would like to share some of what I know about enterprise security. Enterprise security is a very broad concept. Its ultimate goal is to keep the business developing normally. The overall enterprise security system is made up of different modules, and if any one of them is done poorly, the business suffers. The damage may be to the company's revenue and profit, to its reputation, or even to its survival.

I am in frequent contact with several departments on the client (Party A) side: the security department, the operations department, the review department, the development department, and so on. Each has different concerns. The security department is mainly responsible for network security. The operations department is responsible for making sure marketing strategies are effective. The review department is responsible for content quality and content violations. The development department takes part in the unified development and construction of the security platform. How much each department's work matters depends on the company's business, but whichever department has a problem, the company is affected.

To give an intuitive example, a game company may suffer DDoS attacks that disrupt the stable operation of the business, data leaks that damage the company's reputation, or content violations that get the entire game taken down for rectification. The most common problem is cheating (game "plug-ins"), whose direct consequence is the loss of users and revenue.

For example, pornographic, gambling, and drug-related information may appear. In June 2019, the Cyberspace Administration of China carried out a strict investigation of voice apps and removed a large number of applications from the app stores. The mainstream industry solution is to route business text, images, video, and audio through a machine review platform. Today this is mainly either a third-party SaaS detection platform or a detection platform the company builds itself. Machine review is mainly used to improve efficiency and shorten review time, and it is combined with manual review to guarantee results and reduce the miss rate and the false-positive rate.

Game apps are hit especially hard by cracking. If you are interested, search Taobao for the keyword "game cracking"; you will find many stores and many games to choose from. Besides removing the game's normal charges, a cracked game also adds overpowered features, such as doubled attack power, to attract players. Some stores charge by membership at 150 yuan per month, which already exceeds the per-user revenue of many original games. This is deadly for the original game. Taking mobile games as an example, cracking can be countered with app hardening to prevent reverse engineering. Cheating can be countered with anti-cheat technology that checks for emulators, multi-instance tools, cloud-hosted real devices, simulated clicks, and so on, combined with operational measures to strengthen the deterrent effect against cheats.

At the end of 2018, Starbucks ran a promotion that gave coffee coupons to newly registered users. User verification at the time was fairly simple, and a coupon could be claimed by filling in very little information. Within a day and a half of launch, almost 4 million coupons had been grabbed by the "wool party" (people who systematically exploit promotions). Valued at the price of a medium cup, the loss was put at about 10 million yuan. In wool-party circles, making hundreds of thousands within minutes is still possible. Protection against the wool party relies on a threat intelligence database, such as blacklists of mobile phone numbers, IP addresses, and email addresses, followed by data analysis and behavior analysis on the user information collected during the campaign. In this black and gray industry, the profit motive is very strong and the confrontation is fierce.
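As a rough illustration of the blacklist-plus-behavior idea, here is a minimal Python sketch. Everything in it is assumed for the example: the blacklist entries are placeholder values, the function names are hypothetical, and the limit of three claims per IP is an arbitrary threshold, not a figure from any real campaign.

```python
from collections import Counter

# Placeholder threat-intelligence data (assumed values, for illustration only).
BLOCKED_PHONES = {"+8613000000000"}
BLOCKED_IPS = {"203.0.113.7"}
BLOCKED_EMAIL_DOMAINS = {"tempmail.example"}


def is_blacklisted(phone: str, ip: str, email: str) -> bool:
    """Return True if any identifier appears in the blacklists."""
    domain = email.rsplit("@", 1)[-1].lower()
    return (
        phone in BLOCKED_PHONES
        or ip in BLOCKED_IPS
        or domain in BLOCKED_EMAIL_DOMAINS
    )


def ips_over_limit(claims, max_per_ip: int = 3):
    """Simple behavior check: IPs that claimed more coupons than allowed.

    `claims` is an iterable of (user_id, ip) pairs collected during the campaign.
    """
    counts = Counter(ip for _, ip in claims)
    return {ip for ip, n in counts.items() if n > max_per_ip}
```

A real system would combine many more signals; the point is only that a cheap lookup runs first and the aggregate behavior check runs over the data collected during the campaign.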

What is interesting about data leaks is that more than 60% of them are caused by insiders. The recent leak of 160,000 resumes from a recruitment website is a typical case of collusion between insiders and outsiders. A resume worth 50 yuan was sold illegally to a vendor and then resold on Taobao for 1-2 yuan per copy. Data leak prevention therefore cannot be solved by data leak prevention products alone. It also requires better policies, careful separation of permissions, stronger auditing, security awareness training for internal staff, and greater legal awareness.

DDoS is also the oldest yet still the most effective form of network attack. Thanks to the development of network communications and Internet technology, DDoS attacks have grown more and more intense. For example, many of today's IoT devices can be used to launch DDoS attacks. It is hard for victims to eliminate the attack source, so they can only defend passively. In China, attacks of dozens of GB are now very common. They are usually mixed attacks that combine volumetric traffic with CC (application-layer request flood) attacks, which are hard to handle with on-premises protection equipment. Most are handled by cloud-based traffic scrubbing.

We can see many domestic security vendors starting to shift from hardware to cloud services, which reflects the trend toward cloud security services.

In this talk, the focus is on how to solve the content security problems enterprises face amid the explosive growth of UGC and increasingly strict national supervision.

First, the current state of content governance, seen from three perspectives. The first is the regulatory perspective, which has several characteristics: there are many regulatory authorities, many regulatory requirements, and many special rectification campaigns.

The regulatory authorities include the Cyberspace Administration of China; the former State Administration of Radio, Film and Television, which has since been split into the State Administration of Radio and Television, the State Press and Publication Administration, and the National Film Administration; the Ministry of Culture; the Ministry of Public Security; and the Ministry of Industry and Information Technology.

Each regulator has its own focus, but there are also areas of overlap. For example, the Press and Publication Administration mainly supervises the content of news publications, and the State Administration of Radio and Television reviews radio and television content, such as online dramas and TV series.

As a regulated entity, an enterprise is supervised by multiple bodies, such as the public security authorities and the cyberspace administration of the place where it is registered. Supervision is generally carried out through user reports and special inspections. User reporting is a particularly important channel. For example, the Cyberspace Administration of China runs an Illegal and Harmful Information Reporting Center, which accepted 11.7 million reports in June of this year alone. Regulators not only set up their own reporting platforms but also require major content platforms to build reporting channels, which is why major video websites, for example, have report and feedback entry points.

In our own work and life, we can also report any harmful websites or content we come across to the Cyberspace Administration of China.

The second characteristic of supervision is the large number of regulatory requirements. Anyone interested can look them up on the regulators' official websites, where they are now set out in great detail.

I would like to emphasize the question of who is held responsible. One responsible party is the user, and the other is the platform.

Take a scenario as an example. A user posts pornographic advertising on a content platform. The user's behavior is illegal, and it is also illegal for the platform to publish the content. Objectively both should be punished, but in practice holding users accountable is very costly, so in content violation cases what we mostly see is action taken against the platform.

In addition, the Cybersecurity Law formally took effect on June 1, 2017, giving regulators another legal basis. Take another scenario as an example:

A malicious user compromises a website through a network attack and tampers with it to publish pornographic content. The operating platform has then not only violated content publishing requirements but also violated the Cybersecurity Law. If the operator failed to implement information system protection, penalties will be imposed under the Cybersecurity Law.

The third characteristic of supervision is the large number of governance campaigns.

Taking the Cyberspace Administration of China's inspections as an example, from December 2018 to June 2019 it launched four content governance campaigns in succession.

In December 2018, a special inspection of apps was carried out, mainly targeting apps involving pornography and drugs, illegal games, harmful learning apps, and so on, and 330,000 apps were removed from the app stores.

In January 2019, a special rectification of educational apps found more than 20 apps, such as "Work Dog" and "Pocket Teacher", illegally distributing pornographic content, and they were removed from the app stores.

From January to June 2019, a six-month "network-wide rectification action" was carried out.

In June 2019, a special rectification of voice apps was carried out.

This shows how determined and forceful the state is in building a clean cyberspace environment.

Even under such strong supervision, illegal content keeps emerging.

Illegal content has three characteristics: it covers many scenarios, it has many data variants, and it is strongly adversarial.

(1) In terms of scenarios covered, it has become pervasive: news content, user comments, user avatars, nicknames, and the bullet comments (danmaku) shown over online dramas. No scenario where content is published escapes illegal content.

(2) In various scenarios, there are many types and variants of illegal data.

It has evolved from plain sensitive words in text to font-based tricks, obfuscation with special symbols, and illegal content embedded in pictures. In the past year or two, voice has also gained a new content type, ASMR, which is mixed with a lot of pornographic content.
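To make the special-symbol variant concrete, here is a minimal Python sketch of one common countermeasure: normalizing text before matching it against a sensitive-word list. The word list and function names are hypothetical, and this is only one possible approach, not a description of any particular detection platform.

```python
import unicodedata
from typing import List

# Hypothetical sensitive-word list, for illustration only.
BLOCKED_TERMS = {"casino", "freecoins"}


def normalize(text: str) -> str:
    """Fold look-alike character forms and drop inserted separators."""
    # NFKC maps full-width and other compatibility characters to plain forms.
    folded = unicodedata.normalize("NFKC", text).casefold()
    # Keep only letters and digits so symbols or spaces cannot split a term.
    return "".join(ch for ch in folded if ch.isalnum())


def find_blocked_terms(text: str) -> List[str]:
    """Return the blocked terms found in the normalized text, sorted."""
    cleaned = normalize(text)
    return sorted(term for term in BLOCKED_TERMS if term in cleaned)
```

For instance, `find_blocked_terms("ｃ*ａ.s-i_n o")` finds `casino` even though the raw string never contains it. The trade-off is that removing all separators can also create matches across word boundaries, so in practice a step like this is usually paired with manual review or a model.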

(3) Strongly adversarial means that publishing illegal content is organized to a degree and deliberately evasive: content forms and accounts are changed to defeat detection and operational strategies. The need for a defense-in-depth system is explained in detail later.

So against the background of strong national supervision, ensuring content security is actually a fairly difficult problem.

Managers ultimately look at two kinds of indicators: detection effectiveness and business impact. Detection effectiveness is generally judged by accuracy rate and recall rate. Business impact mainly depends on detection latency, which should affect the user experience as little as possible. For example, in IM chat detection, if checking one text message takes more than 1 second, the user experience is seriously affected.
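The two effectiveness indicators can be computed from a manually labeled sample. The sketch below assumes that "accuracy rate" here means precision (the share of flagged items that are truly violations), which is the usual pairing with recall; the function name and input format are made up for the example.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for a binary "is violation" decision.

    `predicted` holds the machine's decisions and `actual` holds the
    human-verified labels, as parallel sequences of booleans.
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    # Guard against division by zero when nothing was flagged or nothing is bad.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```

Low precision shows up as wrongly blocked users (misjudgments), and low recall shows up as violations that slip through (misses), so the two numbers map directly onto the two kinds of error that manual review is meant to reduce.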

To achieve these goals, building an in-house detection system from scratch (from 0 to 1) involves many difficulties.

The first is cost, and the two biggest costs are labor and equipment. On the labor side, hiring in the Internet industry is still very expensive. A seasoned algorithm expert generally earns around 500,000 yuan a year. Moreover, the whole system needs not only algorithm engineers but also operations and review staff, so staffing alone runs into the millions. On the equipment side, the GPU nodes needed for image processing are relatively expensive. For example, the NVIDIA P40, a graphics card launched in 2016, now costs about 50,000 yuan, and one P40 can handle roughly 30 QPS of concurrent image detection. GPU nodes are also needed for model training, which is another relatively high expense.
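The equipment figures above lend themselves to a quick back-of-the-envelope estimate. The sketch below uses the two numbers from the text (about 30 QPS and about 50,000 yuan per P40); the peak traffic values and the headroom factor are assumptions chosen only to show the arithmetic, and training nodes are not included.

```python
import math

QPS_PER_P40 = 30              # from the text: roughly 30 QPS per card
PRICE_PER_P40_YUAN = 50_000   # from the text: roughly 50,000 yuan per card


def gpu_budget(peak_qps: float, headroom: float = 1.0):
    """Return (cards needed, purchase cost in yuan) for inference only.

    `headroom` is an assumed safety multiplier on peak traffic.
    """
    cards = math.ceil(peak_qps * headroom / QPS_PER_P40)
    return cards, cards * PRICE_PER_P40_YUAN
```

Under these assumptions, a hypothetical peak of 1,000 images per second needs 34 cards, or 1.7 million yuan in cards alone.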

Beyond cost, there are also barriers in data accumulation and review experience. Taking image training as an example, a detection model needs tens of thousands or even hundreds of thousands of samples. Accumulating that much sample data is impossible without time and the right channels.

In addition, reviewer experience and the review process and policies are also important guarantees of effectiveness. Reviewers' experience determines the subjective quality and efficiency of review, while sound processes and policies are the objective guarantee. Reviewer experience depends on continuous learning and training, and processes and policies take time to formulate and improve. All of this takes time.

Next, let me introduce how the detection team and the technical system are built.

First, team building. I will take our company's team as an example.

The overall team is divided into several smaller teams: the algorithm team, the system development team, the operations team, and the manual review team.

The core technology is implemented by the algorithm team, which is further divided into groups, such as the text machine learning group and the image machine learning group.

The system development team is responsible for building the business platform.

The operations team works directly with the business departments to pin down detection standards and requirements, and adjusts detection strategies in real time to optimize results.

The review team has the most people and currently provides round-the-clock review by working in shifts.

When formulating detection standards, two principles should be considered: comprehensiveness and implementability.

From the comprehensiveness perspective, the needs of two parties must be considered: the state and the operating platform. For the state, pornography, violence and terrorism, and contraband are all prohibited content, and there are laws and regulations banning them. These standards are basically the baseline that every content platform must meet.

For operating platforms, content such as abuse, trolling, and competitors' advertising is unwanted.

Timeliness matters here. The time from when a requirement is raised to when the standard is implemented should be as short as possible, to minimize the window in which detection is missing.

From the implementability perspective, it must be possible to collect data and train a model. The standard written for people can be descriptive, but data collection and labeling must be detailed.

For example, under the pornography category, for the detection requirement "sexual behavior", the written requirement itself describes the scope and concept of sexual behavior. Data labeling needs more detail. For pictures showing exposed buttocks, for example, the guideline has to spell out how to classify them based on factors such as the shooting angle, whether private parts are exposed, and whether the subject is a child. Photos end up labeled as pornographic, vulgar, sexy, or normal.

After the standards are formulated, different standards are applied according to each scenario's detection needs. A sexy picture in news content is not a problem, but it is not acceptable in a children's education IM app.
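One way to picture scenario-specific standards is a small policy table that maps each scenario to the most severe label it tolerates. The sketch below reuses the four labels from the labeling example (pornographic, vulgar, sexy, normal); the scenario names, the severity ordering, and the tolerance values are assumptions made for illustration.

```python
# Labels ordered from least to most severe (ordering assumed for this sketch).
SEVERITY = {"normal": 0, "sexy": 1, "vulgar": 2, "pornographic": 3}

# Most severe label each scenario still allows (hypothetical policy values).
MAX_ALLOWED = {
    "news": "sexy",
    "kids_education_im": "normal",
}


def decide(scene: str, label: str) -> str:
    """Return "pass" or "block" for a labeled item in a given scenario."""
    # Scenarios without an explicit policy fall back to the strictest setting.
    limit = MAX_ALLOWED.get(scene, "normal")
    return "pass" if SEVERITY[label] <= SEVERITY[limit] else "block"
```

The same model output ("sexy") therefore passes in `news` and is blocked in `kids_education_im`, without retraining anything: only the policy table differs per scenario.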

There are three key platforms:

The detection platform (the core of the service) comes preloaded with various trained models.

The manual review platform (which supplements machine results and capabilities and improves efficiency) provides functions such as data sampling and quick review operations.

The model training platform (the guarantee of effectiveness) consists mainly of GPU clusters.

The business system connects to the detection platform, and detection results for text and images are returned in real time. Data that needs manual review is passed from the detection platform to the review platform, and the review platform returns the final result to the business system.

The model training platform mainly trains and tunes models on bad cases collected from various channels, and finally delivers the trained models to the detection platform.

In this way, the platforms form a closed loop, enabling fast business onboarding and continuous improvement of results.
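The closed loop between the three platforms can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the `score` callable stands in for a trained model, the two thresholds are arbitrary, and the class and method names are hypothetical rather than taken from any real platform.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Pipeline:
    score: Callable[[str], float]   # stand-in for a trained detection model
    block_at: float = 0.9           # assumed threshold: confident violation
    review_at: float = 0.5          # assumed threshold: uncertain, ask a human
    review_queue: List[str] = field(default_factory=list)
    bad_cases: List[Tuple[str, str]] = field(default_factory=list)

    def detect(self, text: str) -> str:
        """Detection platform: answer in real time or hand off to review."""
        value = self.score(text)
        if value >= self.block_at:
            return "block"
        if value >= self.review_at:
            self.review_queue.append(text)
            return "review"
        return "pass"

    def record_review(self, text: str, machine: str, human: str) -> str:
        """Review platform: the human verdict is final and goes to the business.

        Items the machine was unsure about or got wrong are kept as bad
        cases, which is what the training platform would learn from.
        """
        if machine != human:
            self.bad_cases.append((text, human))
        return human
```

The design choice worth noting is that the review step does double duty: it produces the final answer for the business system and, as a by-product, the labeled bad cases that feed the next round of training.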

The three parts above (team, standards, and platforms) form a fairly complete detection system that can handle routine content detection needs.

In practice, however, content governance is not only about processing content. It also needs a defense-in-depth detection system.

The facts show that most illegal content is posted by abnormal users. Content governance is a direct contest between enterprises and the black and gray industry. Relying on content detection alone is too one-dimensional and may leave the defenders exhausted.

Why is content governance a direct contest between enterprises and the black and gray industry? Let us first look at how the black and gray industry does business:

In terms of roles, there are order issuers, business subcontractors, and content platforms. There are several kinds of issuers. Pornography, gambling, and drug websites, for example, need to publish information about their sites to attract traffic. Others publish illegal content on a competitor's platform out of malicious competition. The issuer finds a business subcontractor to publish the illegal content. Subcontracting involves many roles, including people who specialize in writing automation tools, people who resell accounts, and platforms that carry out the publishing, such as various group-control platforms. In the end, the issuer's content is flooded across the major platforms.

Today the black and gray industry is very mature, with a clear division of labor at each link. As shown in the slides, there are dedicated SIM card merchants, account merchants, coding platforms, various cloud-control platforms, and so on.

As we all know, SIM cards now require real-name registration. So how do card merchants obtain cards in bulk? One method is to register a company and apply for IoT cards in bulk in the company's name. These IoT cards have no voice capability but can send and receive text messages, so they can be used to register and log in to accounts. So when you call back a registered mobile number and hear the prompt "The number you dialed does not have voice service enabled", it is most likely an IoT card.

The profit motive here is very strong. For example, a new account is worth a few yuan, but if it is nurtured well, for instance by posting normal content from time to time, it can eventually be worth tens of yuan or even a hundred.

On the major content platforms, the confrontation is currently especially fierce. Take Weibo as an example. In the past, pornographic accounts would directly post pornographic remarks, such as pornographic website addresses or contact details, under trending topics. That type is easy to detect and ban. Now the accounts use avatars that are suggestive but not pornographic, and most of what they post is normal comments, but their personal profile pages are full of pornographic information. This is how they make themselves harder to counter.

In such an adversarial context, content detection alone is too one-dimensional, and defense in depth is the key.

Content governance is not only about detecting published content. Remediation also needs to start at the source.

It is necessary to build a comprehensive defense system that covers account registration, account login, user behavior, and finally published content, so that detection is all-round and results are better. In other words, detection extends from content to user behavior. Only with user profiling capability can we fight black and gray industry attacks more effectively.

The registration phase faces batch registration and fake registration, which can be addressed with verification codes (CAPTCHAs), phone number authentication, and real-person authentication. The login phase faces batch login and brute-force cracking, which can be addressed with verification codes and anti-cheating techniques. After that, publishing behavior and content are checked, for example by handling an account that publishes a large amount of similar content in a short period.
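The last check, an account publishing a lot of similar content in a short period, can be sketched with a per-account sliding window. The sketch below uses Python's standard `difflib.SequenceMatcher` for similarity; the class name, the 60-second window, the similarity ratio of 0.8, and the limit of 3 similar posts are all assumed values for illustration.

```python
from collections import defaultdict, deque
from difflib import SequenceMatcher


class SimilarPostLimiter:
    """Flag accounts that post many near-identical texts within a time window."""

    def __init__(self, window_seconds: float = 60.0, max_similar: int = 3,
                 min_ratio: float = 0.8):
        self.window_seconds = window_seconds
        self.max_similar = max_similar
        self.min_ratio = min_ratio
        # account id -> deque of (timestamp, text), oldest first
        self._history = defaultdict(deque)

    def allow(self, account: str, text: str, now: float) -> bool:
        """Record the post and return False once too many similar posts exist."""
        history = self._history[account]
        # Drop posts that have fallen out of the time window.
        while history and now - history[0][0] > self.window_seconds:
            history.popleft()
        similar = sum(
            1 for _, old in history
            if SequenceMatcher(None, old, text).ratio() >= self.min_ratio
        )
        history.append((now, text))
        return similar < self.max_similar
```

Comparing against every post in the window is quadratic in the worst case, which is acceptable for a sketch; a production system would more likely hash or fingerprint the text first.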

Let us briefly explain two of the technical means mentioned here: verification codes and anti-cheating.

Start with verification codes, which are mainly used to tell humans from machines. The purpose is to raise the attacker's cost. Early verification codes, such as character-based ones, are very easy to crack, mainly with OCR, which easily recognizes the characters in the image. Most verification codes in use today are "smart" verification codes that decide by analyzing user behavior and device information. The current mainstream forms, such as slide-the-puzzle and click-the-text verification codes, are more resistant to attack.

Anti-cheating uses techniques such as IP profiling, which checks the user's IP geolocation, whether it is a proxy IP, and so on; device environment detection, which checks whether the device is an emulator and whether it is rooted or jailbroken; and user behavior analysis, in which rules across these dimensions define a baseline of normal behavior. These checks are generally placed at the entry points of events such as registration, login, and key business operations like posting.
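A rule-based baseline of this kind can be sketched as a small weighted rule table evaluated at each event entry point. In the Python sketch below, the signal names, the weights, and the two decision thresholds are all hypothetical; how the signals themselves are collected (IP lookup, device fingerprinting) is outside the sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EventContext:
    event: str                          # "register", "login", or "post"
    ip_is_proxy: bool = False           # IP profiling signal
    ip_region_changed: bool = False     # IP geolocation differs from usual
    is_emulator: bool = False           # device environment signal
    is_rooted_or_jailbroken: bool = False
    actions_last_minute: int = 0        # behavior signal


# (rule name, weight, predicate); weights are assumed values for illustration.
RULES = [
    ("proxy_ip", 30, lambda c: c.ip_is_proxy),
    ("ip_region_changed", 20, lambda c: c.ip_region_changed),
    ("emulator", 40, lambda c: c.is_emulator),
    ("root_or_jailbreak", 20, lambda c: c.is_rooted_or_jailbroken),
    ("burst_activity", 30, lambda c: c.actions_last_minute > 20),
]


def assess(ctx: EventContext) -> Tuple[str, List[str]]:
    """Return a decision and the names of the rules that fired."""
    hits = [(name, weight) for name, weight, matches in RULES if matches(ctx)]
    score = sum(weight for _, weight in hits)
    names = [name for name, _ in hits]
    if score >= 60:
        return "reject", names
    if score >= 30:
        return "challenge", names   # e.g. show a verification code
    return "allow", names
```

Returning the names of the rules that fired, not just the decision, makes it easier for the operations team to tune weights and explain a decision afterwards.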

The above covers typical enterprise security issues, with a focus on building content security.

By Kaka Orange Juice, a content and business security practitioner