The Little Lotus Just Shows Its Tip (56): When a Product's User Count Goes Up

There has always been an iron law in internet technology: quality is never a problem while the user count is small, but once the user count goes up, quality becomes a big problem.

For example, if your product has 10,000 users and 1% of them run into problems, that is only 100 users, and a few customer service staff can handle their questions. But if the user count reaches 100 million, 1% is 1 million users. If quality problems are not solved systematically, they become a huge black hole that can bankrupt the company.
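The arithmetic behind this is simple enough to sketch. The snippet below is only an illustration: the function name `affected_users` and the sample user counts are assumptions chosen for the example, not figures from the customer's product.

```python
def affected_users(total_users: int, failure_percent: int) -> int:
    """Number of users hit by a problem that affects failure_percent% of all users."""
    # Integer arithmetic keeps the result exact for whole-number percentages.
    return total_users * failure_percent // 100


# The same 1% failure rate at three different scales.
for total in (10_000, 1_000_000, 100_000_000):
    print(f"{total:>11,} users -> {affected_users(total, 1):>9,} affected at 1%")
```

The failure rate never changes in this sketch. Only the user count does, which is why a problem that a handful of support staff can absorb at small scale needs a systematic fix at large scale.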

I wrote this article because I had an in-depth exchange with a customer some time ago and learned a little-known story from their early startup days.

A few years ago, they launched an app in two weeks and then, by chance, got a lot of traffic. Within a few months it became a phenomenon, and the user count soared by several orders of magnitude.

As you can imagine, during this period the app's technical team lived a life worse than death. ...

At first, the team was very proud of what they had achieved: a product launched in two weeks that became a hit within a few months. Isn't this the legend that everyone in the internet industry longs for but few ever experience? Just think of the extremely complicated technology at the big tech companies, where the preliminary technical research and selection alone would probably take more than two weeks. The team scoffed at them!

However, as the user count soared to 100,000, problems began to go off like time bombs: app crashes, black screens, domain name hijacking, cache failures, rapidly rising SQL query times, and an avalanche in system performance. When the user count grew from 100,000 to more than one million, they scaled the machines up tenfold and barely survived. The cost went up, but there was still enough money, and the boss gritted his teeth. But if growing from 1 million to 10 million users meant scaling the machines tenfold again, the boss would probably chop them to pieces before the company even went bankrupt.

For a while, the technical team was caught in an infinite loop: find a problem, patch it, watch new problems appear, patch those, and watch still more problems appear.

One engineer even set a record of going two days and two nights without sleep. The boss was worried that he would die suddenly at his computer and ordered him to go home and rest. As he left, he muttered with a wry smile: "This makes no sense. What can I do?" ...

After struggling for a while, the boss realized that the brothers who had started the business with him all came from small companies and had never seen a product with so many users. If things went on like this, everyone would probably go under together, so he turned to a friend who was a technical manager at a big tech company for help.

Just a few months earlier, he had boasted to this friend that what his team could do in two weeks would take their big company several months, which was just like a lazy woman's foot-binding cloth: smelly and long. Alas, when it is a matter of life and death, face counts for nothing.

Friends are friends because they trade boasts when nothing is going on and help each other out when trouble comes. After exchanging a few jokes, the two agreed to organize an exchange between their technical teams. One side sent several experts in mobile and server-side engineering. The other brought its whole technical team: development, testing, and even operations all came along. It was a rare opportunity.

They met at a teahouse surrounded by green hills and clear water, which had a large meeting room that companies could book for events. The screen, tables, chairs, and tea were all ready, and more than 20 engineers talked for a whole day.

The discussion ranged from the mobile side (optimizing first-screen startup speed, reducing the crash rate, decoupling page links, managing image memory, HTTPDNS direct connection, multi-bundle packaging to solve the oversized-package problem, and choosing a dynamic page container) to the back end (splitting databases and tables, designing cache keys to prevent hot spots and ease horizontal scaling, combining layer-4 and layer-7 load balancing, separating dynamic and static content, gateway performance and security standards, account system design, and implementing a domain model and a layered architecture).
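Two of these topics, splitting databases and tables and designing cache keys to avoid hot spots, can be illustrated with a minimal sketch. The article does not record what the teams actually decided, so everything here is an assumption for illustration: the function names, the `user_db_N.user_M` naming scheme, the shard counts, and the `#r` replica suffix are all hypothetical.

```python
import hashlib
import random
from typing import List


def stable_hash(value: str) -> int:
    """Hash that stays the same across processes (unlike Python's built-in hash())."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def table_for_user(user_id: str, num_dbs: int = 4, tables_per_db: int = 8) -> str:
    """Route a user to one physical table by hashing the user ID."""
    slot = stable_hash(user_id) % (num_dbs * tables_per_db)
    db_index, table_index = divmod(slot, tables_per_db)
    return f"user_db_{db_index}.user_{table_index}"


def replica_keys(base_key: str, replicas: int = 8) -> List[str]:
    """All copies of a hot cache key; a writer would update every one of them."""
    return [f"{base_key}#r{i}" for i in range(replicas)]


def read_key(base_key: str, replicas: int = 8) -> str:
    """Pick one copy at random so reads of a hot key spread across cache nodes."""
    return f"{base_key}#r{random.randrange(replicas)}"


print(table_for_user("user-10086"))
print(read_key("hot:item:42"))
```

The trade-off in this sketch is that writes to a hot key become more expensive, since every replica has to be updated, in exchange for reads that no longer land on a single cache node. A plain modulo also means that changing the shard count moves most users to a different table, which is one reason real systems often plan the shard count generously up front or use consistent hashing.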

Slowly, one side grew more and more tired from talking so much, while the other grew more and more excited, because they had finally poked through the window paper and seen the sunlight. The technical problems would still have to be solved slowly back at their own company, but at last there was hope. As long as they were on the right path, the rest could be left to time.

While the technical teams were deep in discussion, the boss, shrewd as ever, poached one of the experts to be his CTO.

Although the original team was nominally divided into front-end, mobile, server, testing, and other roles, in practice everyone fought on their own and was chased around by problems. Now a new leader had been brought in from outside, yet everyone accepted him, because the new CTO had performed very well during the exchange and could pinpoint the cause of, and offer a solution to, most of the team's problems.

This was also a rare opportunity for the new CTO himself.

Big tech companies are full of talented people. They are experienced, but as they get older they find it harder and harder to keep up with front-line work. In addition, the internet industry has hit a bottleneck in recent years, making it hard to launch new businesses and deliver results, so getting promoted at a big company has become especially difficult.

The new employer's product had already been proven by the market, and the CTO had exactly the experience the team lacked. For both sides, it was like being handed a pillow just as you doze off.

After the new CTO took up the post, he reviewed the entire technology stack and concluded that the old system's design did not match the product's order of magnitude and that retrofitting it would cost too much, so it was better to rebuild it.

To get things done, you first need a team: he asked the boss for new headcount (HC) and hired team leads (TLs) for mobile, front end, and testing, all hands-on engineers with big-company experience. Each TL then brought in bright fresh graduates or young engineers two or three years out of school, full of enthusiasm but short on experience.

The first step was to spend money to buy time: they bought solutions from mainstream cloud vendors, sized for hundreds of millions of users, as the system foundation, and their own engineers moved the business logic from the old system onto that foundation according to the new architecture design.

Then came replacing the aircraft engine in mid-flight: once the new system went online, they shifted part of the traffic to it and fixed problems as they went. It took half a year to finally switch all of the core system's traffic over. Architecture design, development conventions, quality control, the release process, and online incident handling all began to take proper shape, and the team was finally forged.
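Shifting part of the traffic to a new system is often done by percentage. The article does not say how this team split its traffic, so the following is only a minimal sketch under one assumption: users are assigned to 100 stable buckets by hashing their ID. The names `bucket_of` and `route` are hypothetical.

```python
import hashlib


def bucket_of(user_id: str) -> int:
    """Map a user to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def route(user_id: str, new_system_percent: int) -> str:
    """Send the first new_system_percent buckets to the new system, the rest to the old."""
    return "new" if bucket_of(user_id) < new_system_percent else "old"


# Raise the percentage step by step as confidence in the new system grows.
users = [f"user-{i}" for i in range(10_000)]
for percent in (1, 10, 50, 100):
    on_new = sum(route(u, percent) == "new" for u in users)
    print(f"{percent:>3}% target -> {on_new} of {len(users)} users on the new system")
```

Because the bucket depends only on the user ID, a user who has been moved to the new system stays there as the percentage rises, and lowering the percentage again sends the most recently moved users back to the old system if a problem appears.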

The next step was to cut costs and improve efficiency: the old machines were taken offline, and the paid components in the new system were pulled out one by one for review. Starting from the edge modules, the team took the route of in-house development based on open source projects and slowly replaced the paid components.

Finally, they built core technical competitiveness: the company's product relates to user safety, and because of its official background it had accumulated a lot of user data. Building on this data, the company recruited excellent engineers and algorithm specialists to make technical breakthroughs and developed its own distinctive advantage in security risk control. It won more official projects, earning money and gaining more data at the same time. Fed by that data, its risk control system became more stable, and the company exported its technology to more partners and even began selling risk control as a SaaS service.