How to improve data storage and disaster tolerance for a regional health information platform

The regional health information platform stores residents' health record data. The platform should serve individuals and medical institutions continuously, 7×24, in real time, and should adopt a level-6 disaster recovery scheme that replicates data in real time, so that the remote copy is backed up in real time with zero data loss. The processing systems of the disaster recovery center have the same processing capacity as the production data processing system and are fully compatible with it, so switchover is seamless and happens in real time, supported by the real-time monitoring and automatic switchover of the remote cluster system. End users of the business systems can access the active and standby centers at the same time over the network. The disaster recovery center provides 7×24 technical support for application services and operates under a complete and strict operation and management mechanism. Given the actual application and development requirements of the regional health information platform data center, disaster recovery backup should have the following characteristics.

1. High performance

The design should take full account of the disaster recovery system's processing capacity, so that the overall system remains at a leading level in China and has enough headroom to adapt to future trends in disaster recovery technology.

2. High reliability

The disaster recovery system protects key medical business data against disasters, so its own stability and reliability must be fully considered at the design stage to ensure that key data is transmitted to it continuously and stably. If a problem occurs, normal operation of the business systems can be restored from the disaster recovery data.

3. Standardization

The disaster recovery system should comply with relevant domestic and international standards to ensure interoperability between different brands of disaster recovery solutions and openness of the system.

4. Scalability

The design of the disaster recovery system should meet current demand and also take full account of business growth. It should also be easy to upgrade and to integrate newer technology, so that the current investment is protected.

5. Maintainability

The design of the whole disaster recovery system should make it easy to manage, maintain and operate.

6. Security

Because the disaster recovery system holds a copy of residents' health records, the design should address data security during transmission, storage and access.

7. Off-site disaster tolerance

This scheme provides disaster tolerance across two data centers: when a disaster strikes either data center, the other automatically takes over the business, with RPO = 0 and RTO < 15 minutes.
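The RPO and RTO targets stated here can be checked against timestamps recorded during a failover drill. The following is a minimal Python sketch, not part of the original scheme: the `FailoverDrill` record, its field names and the `measure` and `meets_targets` helpers are hypothetical, and only the two targets (RPO = 0, RTO < 15 minutes) come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FailoverDrill:
    """Timestamps recorded during one failover drill (hypothetical record)."""
    last_acked_write: datetime       # newest write acknowledged to a client
    last_replicated_write: datetime  # newest write present at the surviving site
    failure_time: datetime           # moment the failed site stopped serving
    service_restored: datetime       # moment the surviving site resumed service


# Targets taken from the text: RPO = 0, RTO < 15 minutes.
RPO_TARGET = timedelta(0)
RTO_TARGET = timedelta(minutes=15)


def measure(drill: FailoverDrill) -> tuple:
    """Return (achieved RPO, achieved RTO) for one drill."""
    # RPO: acknowledged writes that never reached the surviving site.
    rpo = max(drill.last_acked_write - drill.last_replicated_write, timedelta(0))
    # RTO: elapsed time between the failure and restored service.
    rto = drill.service_restored - drill.failure_time
    return rpo, rto


def meets_targets(drill: FailoverDrill) -> bool:
    """True when the drill satisfies both stated targets."""
    rpo, rto = measure(drill)
    return rpo <= RPO_TARGET and rto < RTO_TARGET
```

With real-time mirroring, the last acknowledged write and the last replicated write should coincide, so the measured RPO is zero; any positive value indicates that acknowledged data was lost.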

8. Easy to expand

The disaster recovery system should be easy to expand to meet customers' growing data disaster recovery needs, while protecting their existing investment and adapting flexibly to future business development and disaster recovery system upgrades.

9. Quick recovery

According to the characteristics of medical services, Huawei provided a dual-active disaster recovery scheme for the FusionCloud platform, based on Huawei VIS cluster technology and mirroring technology, which solved the data storage and disaster recovery problems of regional medical information platforms. The scheme requires the main medical center and the disaster recovery center to be within 100 km of each other to ensure the reliability and stability of the system. It integrates the virtual machine HA function of the FusionCloud platform with the virtualization function, mirroring function and multi-node cluster technology of Huawei VIS6000. The virtualization function of VIS6000 integrates the storage pools of the production center and the disaster recovery center, and its mirroring technology keeps the data of the two centers synchronized in real time. The multi-node cluster technology of VIS6000 provides high availability for the VIS6000 nodes in both centers. When a disaster occurs in either data center, the virtual machines and related business systems automatically switch to the other data center, which fully meets customers' need for business continuity on the cloud platform.
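One common way for mirroring to achieve zero data loss is to acknowledge a write only after every reachable mirror holds it. The Python sketch below is a toy model of that idea under this assumption. It does not use or describe the actual VIS6000 interface, and the `Site` and `MirroredVolume` classes and all of their methods are hypothetical names introduced for illustration.

```python
class Site:
    """In-memory stand-in for one data center's storage pool (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks = {}      # block address -> data
        self.online = True

    def write(self, lba: int, data: bytes) -> None:
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.blocks[lba] = data


class MirroredVolume:
    """Acknowledge a write only after every reachable mirror holds it."""

    def __init__(self, production: Site, dr: Site) -> None:
        self.mirrors = [production, dr]
        # Blocks each mirror missed while it was unreachable.
        self.pending_resync = {production.name: set(), dr.name: set()}

    def write(self, lba: int, data: bytes) -> bool:
        written = 0
        for site in self.mirrors:
            try:
                site.write(lba, data)
                written += 1
            except ConnectionError:
                # Remember the block so the mirror can be resynchronized later.
                self.pending_resync[site.name].add(lba)
        if written == 0:
            raise IOError("no mirror accepted the write")
        return True  # acknowledged: every reachable copy is up to date

    def read(self, lba: int) -> bytes:
        # Serve the read from any online mirror that is not stale for this block.
        for site in self.mirrors:
            if site.online and lba not in self.pending_resync[site.name]:
                return site.blocks[lba]
        raise IOError("no consistent mirror available")

    def resync(self, recovered: Site) -> None:
        """Copy the blocks a recovered mirror missed from the other mirror."""
        source = next(s for s in self.mirrors if s is not recovered)
        for lba in sorted(self.pending_resync[recovered.name]):
            recovered.write(lba, source.blocks[lba])
        self.pending_resync[recovered.name].clear()
```

In this model, taking one `Site` offline does not interrupt reads or writes, because the surviving mirror already holds every acknowledged block. Once the failed site returns, `resync` brings it back in step before it serves those blocks again.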

The storage arrays of the main center and the disaster recovery center of the regional health information platform use the Huawei OceanStor N8000 cluster storage system for long-term, safe storage of large volumes of health record data, with Huawei VIS6000 providing virtualization integration for unified management of storage resources at both sites. Huawei RH5885V2, RH2488V2 and E9000 server groups serve as the computing resources of both centers. Under the unified management of the cloud platform, resources are allocated dynamically according to business requirements to meet the platform's demand for computing power. ManageONE is deployed in the data center of the regional health information platform to manage and monitor all data center resources in a unified way. See Figure 1 for the network topology of the dual-active disaster recovery scheme of the FusionCloud cloud platform.

The dual-active disaster recovery scheme of the FusionCloud cloud platform covers the following four disaster recovery scenarios.

1. Storage failure

If one or more storage devices in the production center fail, the virtual machines and application systems deployed on them switch seamlessly to the corresponding mirror storage in the disaster recovery center, and the virtual machine operating systems and application systems are not interrupted.

2. Virtualization equipment failure

If the VIS virtualization equipment in the production center fails, all virtual machines and application systems in the production center switch seamlessly to the VIS virtualization equipment in the disaster recovery center, and the virtual machine operating systems and application systems in the production center are not interrupted.

3. Server failure

When a management node server fails, that is, when any main management node fails, its standby node deployed in the disaster recovery center immediately takes over the business of the failed node without affecting the normal operation of the cloud platform. When a compute node server fails, all virtual machines on the failed node are automatically rebuilt and restored.

4. Total disaster in the production center

If a large-scale natural disaster (such as an earthquake or tsunami) or a man-made disaster (such as a fire) makes the entire production center unavailable, the cloud platform keeps running normally in the disaster recovery center through automatic switchover of the storage mirror, automatic failover of the VIS cluster and automatic failover between the main and standby management nodes of the cloud platform. At the same time, the virtual machine HA function automatically rebuilds and restores the failed production-center virtual machines in the disaster recovery center, where the related business systems are restored and continue to provide services externally.
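The four scenarios can be summarized as a lookup from failure type to the recovery actions the text describes. The Python sketch below is only an illustrative runbook table. The `Failure` enum, the `RECOVERY_ACTIONS` mapping and the `plan_recovery` helper are hypothetical and do not correspond to any FusionCloud or VIS6000 interface, and the server-failure scenario is split into management and compute nodes because the text treats them differently.

```python
from enum import Enum, auto


class Failure(Enum):
    """Failure types covered by the four scenarios (hypothetical names)."""
    STORAGE = auto()
    VIS_DEVICE = auto()
    MANAGEMENT_NODE = auto()
    COMPUTE_NODE = auto()
    SITE_DISASTER = auto()


# Recovery actions paraphrased from the scenario descriptions.
STORAGE_SWITCH = "switch I/O to the mirror storage in the disaster recovery center"
VIS_FAILOVER = "fail over to the VIS equipment in the disaster recovery center"
MGMT_TAKEOVER = "standby management node takes over from the failed main node"
VM_REBUILD = "rebuild and restore the affected virtual machines through VM HA"

RECOVERY_ACTIONS = {
    Failure.STORAGE: [STORAGE_SWITCH],
    Failure.VIS_DEVICE: [VIS_FAILOVER],
    Failure.MANAGEMENT_NODE: [MGMT_TAKEOVER],
    Failure.COMPUTE_NODE: [VM_REBUILD],
    # A total site disaster combines all of the mechanisms above.
    Failure.SITE_DISASTER: [STORAGE_SWITCH, VIS_FAILOVER, MGMT_TAKEOVER, VM_REBUILD],
}


def plan_recovery(failure: Failure) -> list:
    """Return the ordered recovery actions for a detected failure."""
    return list(RECOVERY_ACTIONS[failure])


if __name__ == "__main__":
    for step in plan_recovery(Failure.SITE_DISASTER):
        print("-", step)
```

Writing the table out this way shows that the total-disaster scenario introduces no new mechanism: it is the combination of the storage, VIS, management node and virtual machine recoveries used in the first three scenarios.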

Advantages of Huawei's dual-active disaster recovery scheme for regional health data centers

1. Dual-active disaster recovery mode: The medical business system is deployed in the production center and the disaster recovery center at the same time, which greatly improves the utilization rate of resources and the working efficiency and performance of the system, so that customers can get the maximum value from the investment in disaster recovery system.

2. Automatic disaster recovery: effectively reduces customers' management costs.

3. Flexible online expansion: The scheme supports flexible online expansion, which fully protects customers' existing investment.

4. Zero data loss.

5. An RTO of zero to minutes for storage array failures.

6. If a single computing node of the cloud platform fails, its virtual machines and applications automatically switch to the other site, with an RTO measured in minutes.

7. Remote hot migration of virtual machines: The scheme supports seamless hot migration of virtual machines between the production center and the disaster recovery center. Business systems keep running during the migration, workloads can be deployed flexibly between the two data centers, and the utilization rate of system resources improves.