Keepalived adopts a modular design, with different modules implementing different functions.
It has three main modules: core, check and vrrp.
Core: the core of Keepalived; it starts and maintains the main process, loads and parses the global configuration file, and so on.
Check: responsible for health checking, including the various health-check methods and the parsing of their configuration (LVS configuration parsing included); the health of the IPVS backend servers can also be checked with scripts (see the sketch after this list).
VRRP: the vrrpd child process, which implements the VRRP protocol.
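As an illustration of script-based checking, the block below is only a minimal sketch: the script path /usr/local/bin/check_web.sh, the interval and the weight are hypothetical values, not part of the setup described later. A vrrp_script block runs the command periodically and lowers the node's priority while the check fails:

vrrp_script chk_web {
    # hypothetical check command; exit code 0 means healthy
    script "/usr/local/bin/check_web.sh"
    # run the check every 2 seconds
    interval 2
    # subtract 20 from the node's priority while the check is failing
    weight -20
}

A vrrp_instance then references it with a track_script { chk_web } block, so a failed check can trigger a failover even though the node itself is still up.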
Keepalived configuration file:
The Keepalived configuration file is keepalived.conf.
It has three main configuration areas: global configuration, VRRPD configuration and LVS configuration.
The global configuration contains two sub-sections: global definitions and static IP address/route configuration; a minimal sketch of this area follows.
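For reference, a minimal sketch of what these two sub-sections look like; the static route below is purely illustrative and is not used in the setup that follows:

global_defs {
    router_id LVS_DEVEL
}
static_routes {
    # illustrative only: route an extra subnet via the LAN gateway
    192.168.2.0/24 via 192.168.1.1 dev eno16777736
}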
How the Keepalived VRRP service works:
The two nodes of a Keepalived high-availability pair communicate with each other via VRRP, and an election determines which node is the master and which is the backup. While the master is working it holds all the resources and the backup simply waits; when the master goes down, the backup takes over the master's resources and provides the service in its place.
Between the two nodes, only the server currently acting as master keeps sending VRRP advertisement (multicast) packets to tell the backup that it is still alive, and as long as these arrive the backup does not try to take over. When the master becomes unavailable, that is, when the backup can no longer hear the master's advertisements, the backup starts the related services and takes over the resources as quickly as possible so that the business keeps running.
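These advertisements can be observed directly with tcpdump. Assuming the interface names used in the environment below, a capture filter on IP protocol 112 (VRRP) shows the master multicasting to 224.0.0.18 roughly once per second:

# tcpdump -nn -i eno16777736 'ip proto 112'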
Causes of split-brain:
The heartbeat link between the nodes of a high-availability pair fails, so the two sides can no longer communicate normally. Common causes include:
The heartbeat cable fails (broken or aged).
The network card or its driver fails, or there are IP configuration or address-conflict problems (when the NICs are directly connected).
The devices carrying the heartbeat link (network cards, switches) fail.
The arbitration node fails (when an arbitration scheme is used).
The iptables firewall is enabled on the high-availability servers and blocks the heartbeat messages.
The heartbeat NIC address or other settings on the high-availability servers are misconfigured, so heartbeats cannot be delivered.
Other causes such as improper service configuration: mismatched heartbeat modes, heartbeat broadcast conflicts, software bugs, and so on.
How to prevent and handle split-brain:
① Use both a serial cable and an Ethernet cable as heartbeat links at the same time, so that if one link breaks the other can still carry the heartbeat messages.
② When split-brain is detected, forcibly shut down one of the heartbeat nodes (this requires dedicated hardware support, such as STONITH/fencing devices). In effect, when the backup node stops receiving heartbeats, it sends a shutdown command over a separate line to power off the master node.
③ Set up monitoring and alerting for split-brain (for example email, SMS, or an on-call rotation) so that a person can intervene and arbitrate as soon as the problem occurs, reducing the loss. The administrator can reply to the server from a mobile phone with a predefined number or short string, and the server then handles the corresponding fault automatically according to that instruction, which shortens the time needed to resolve it. A minimal monitoring sketch follows this list.
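As a starting point for item ③, the script below is a minimal sketch, not a production monitor: the VIP 192.168.1.111 and peer address 192.168.1.109 are taken from this article's environment, the mail recipient is a placeholder, and the test (VIP present locally while the peer still answers ping) is only a rough split-brain symptom.

#!/bin/bash
# Run periodically on the backup node (e.g. from cron).
# If the VIP is configured locally while the master is still reachable,
# both nodes may be holding the VIP at the same time: warn an operator.
VIP=192.168.1.111
PEER=192.168.1.109

if ip addr show | grep -q "inet $VIP/" && ping -c 2 -W 1 "$PEER" >/dev/null 2>&1; then
    echo "possible split-brain: $VIP is active locally while $PEER is still alive" \
        | mail -s "keepalived split-brain warning" admin@example.com
fi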
I. Experimental environment
Operating system: CentOS 7.2 Minimal
###################
Server A:
eno16777736  192.168.1.104
eno33554984  192.168.1.105
##########################
Server B:
eno16777736  192.168.1.109
eno33554984  192.168.1.106
###########################
VIP01: 192.168.1.111
VIP02: 192.168.1.112
II. Set up the firewall
/usr/bin/firewall-cmd --direct --permanent --add-rule ipv4 filter INPUT 0 --in-interface eth0 --destination ${MULTICAST_ADDR} --protocol vrrp -j ACCEPT
/usr/bin/firewall-cmd --reload
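Applied to the two interfaces actually used in this environment (VRRP advertisements are sent to the multicast address 224.0.0.18), the rules would look roughly as follows; run them on both servers and check the result with --get-all-rules:

# firewall-cmd --direct --permanent --add-rule ipv4 filter INPUT 0 --in-interface eno16777736 --destination 224.0.0.18 --protocol vrrp -j ACCEPT
# firewall-cmd --direct --permanent --add-rule ipv4 filter INPUT 0 --in-interface eno33554984 --destination 224.0.0.18 --protocol vrrp -j ACCEPT
# firewall-cmd --reload
# firewall-cmd --direct --get-all-rules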
III. Software installation
On Server A and Server B:
# rpm -ivh --force libnl3-3.2.28-4.el7.x86_64.rpm
# rpm -ivh --force lm_sensors-libs-3.4.0-4.20160601gitf9185e5.el7.x86_64.rpm
# rpm -ivh --force net-snmp-agent-libs-5.7.2-32.el7.x86_64.rpm
# rpm -ivh --force net-snmp-libs-5.7.2-32.el7.x86_64.rpm
# rpm -ivh --force ipset-libs-6.38-3.el7_6.x86_64.rpm
# rpm -ivh --force keepalived-1.3.5-6.el7.x86_64.rpm
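If the hosts can reach a configured yum repository, the same package and its library dependencies can be installed in one step instead of forcing individual RPMs:

# yum install -y keepalived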
IV. Configure Keepalived
If no VRRP synchronization group is used and the Keepalived host sits on two network segments, each with its own VRRP instance, then a failure on the external segment alone goes unnoticed by the other instance: vrrpd still believes the node is healthy, the master and backup roles end up split across the two machines, normal service becomes impossible and the high-availability cluster loses its purpose. To avoid this, a synchronization group puts the two instances into the same group so that they fail over together.
Server A
# vim /etc/keepalived/keepalived.conf
######################################
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_sync_group VG1 {
    group {
        VI_1
        VI_2
    }
}
vrrp_instance VI_1 {
    state BACKUP
    interface eno16777736
    virtual_router_id 51
    priority 100
    nopreempt
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_interface {
        eno16777736
        eno33554984
    }
    virtual_ipaddress {
        192.168.1.111
    }
}
vrrp_instance VI_2 {
    state BACKUP
    interface eno33554984
    virtual_router_id 52
    priority 100
    nopreempt
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 2222
    }
    track_interface {
        eno16777736
        eno33554984
    }
    virtual_ipaddress {
        192.168.1.112
    }
}
Server B
# vim /etc/keepalived/keepalived.conf
######################################
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_sync_group VG1 {
    group {
        VI_1
        VI_2
    }
}
vrrp_instance VI_1 {
    state BACKUP
    interface eno16777736
    virtual_router_id 51
    priority 90
    nopreempt
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_interface {
        eno16777736
        eno33554984
    }
    virtual_ipaddress {
        192.168.1.111
    }
}
vrrp_instance VI_2 {
    state BACKUP
    interface eno33554984
    virtual_router_id 52
    priority 90
    nopreempt
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 2222
    }
    track_interface {
        eno16777736
        eno33554984
    }
    virtual_ipaddress {
        192.168.1.112
    }
}
V. Test
On Server A and Server B:
# systemctl start keepalived
On Server A:
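A reasonable way to verify the result (a sketch of the expected checks, not captured output): both VIPs should be bound on the interfaces of the current master.

# ip addr show eno16777736
# ip addr show eno33554984

192.168.1.111 should appear under eno16777736 and 192.168.1.112 under eno33554984. Then stop keepalived on Server A and repeat the same commands on Server B; both VIPs should have moved there together, because the two instances belong to the same synchronization group.

# systemctl stop keepalived

To have the service come up automatically after a reboot, enable it on both nodes:

# systemctl enable keepalived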