
RHEL 7 – Configuring Pacemaker/Corosync – Redhat Cluster – Part 4


In this article, we will see how to configure a two-node Redhat cluster using pacemaker & corosync on RHEL 7.2. Once you have installed the necessary packages, you need to enable the cluster services at system start-up, and you must start the necessary cluster services before kicking off the cluster configuration. The “hacluster” user is created automatically during the package installation with a disabled password. pcsd uses this user to sync the cluster configuration and to start and stop the cluster on the cluster nodes.
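Before proceeding, you can quickly confirm that the cluster packages are in place on both nodes. A minimal check (the exact package set depends on what was installed in the previous part of this series):

# rpm -q pcs pacemaker corosync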

 

Environment:

 

Hardware configuration: 

  1. CPU – 2
  2. Memory – 4GB
  3. NFS – For shared storage

 

[Image: Redhat Cluster 7 – RHEL 7 – PCS]

 

Enable & Start the Services on both the Nodes:

 

1. Login to both the cluster nodes as the root user.

2. Enable the pcsd daemon on both the nodes so that it starts automatically across reboots. pcsd is the pacemaker configuration daemon (not a cluster service).

[root@UA-HA ~]# systemctl start pcsd.service
[root@UA-HA ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@UA-HA ~]# systemctl status pcsd.service
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2015-12-27 23:22:08 EST; 14s ago
 Main PID: 18411 (pcsd)
   CGroup: /system.slice/pcsd.service
           ├─18411 /bin/sh /usr/lib/pcsd/pcsd start
           ├─18415 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
           └─18416 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb

Dec 27 23:22:07 UA-HA systemd[1]: Starting PCS GUI and remote configuration interface...
Dec 27 23:22:08 UA-HA systemd[1]: Started PCS GUI and remote configuration interface.
[root@UA-HA ~]#

 

3. Set a new password for the cluster user “hacluster” on both the nodes.

[root@UA-HA ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@UA-HA ~]#
[root@UA-HA2 ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@UA-HA2 ~]#
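If you are scripting the cluster setup, the password can also be set non-interactively using the --stdin option of passwd. A minimal sketch (the password below is only a placeholder; use the same password on both nodes, and note that the command will be recorded in the shell history):

# echo 'MyClusterPass123' | passwd --stdin hacluster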


Configure corosync & Create a new cluster:

 

1. Login to any one of the cluster nodes and authenticate the “hacluster” user.

[root@UA-HA ~]# pcs cluster auth UA-HA UA-HA2
Username: hacluster
Password:
UA-HA: Authorized
UA-HA2: Authorized
[root@UA-HA ~]#

 

2. Create a new cluster using the pcs command.

[root@UA-HA ~]# pcs cluster setup --name UABLR UA-HA UA-HA2
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
UA-HA: Succeeded
UA-HA2: Succeeded
Synchronizing pcsd certificates on nodes UA-HA, UA-HA2...
UA-HA: Success
UA-HA2: Success

Restaring pcsd on the nodes in order to reload the certificates...
UA-HA: Success
UA-HA2: Success
[root@UA-HA ~]#

 

3. Check the cluster status.

[root@UA-HA ~]# pcs status
Error: cluster is not currently running on this node
[root@UA-HA ~]#

You see this error because the cluster services have not been started yet.

 

4. Start the cluster using the pcs command. The “--all” option will start the cluster on all the configured nodes.

[root@UA-HA ~]# pcs cluster start --all
UA-HA2: Starting Cluster...
UA-HA: Starting Cluster...
[root@UA-HA ~]#
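If you want to start the cluster services only on the local node, just omit the “--all” option:

# pcs cluster start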

 

In the back-end, the “pcs cluster start” command triggers the following commands on each cluster node.

# systemctl start corosync.service
# systemctl start pacemaker.service
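Similarly, “pcs cluster stop --all” will stop pacemaker first and then corosync on each node:

# systemctl stop pacemaker.service
# systemctl stop corosync.service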

 

5. Check the cluster services status.

[root@UA-HA ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2015-12-27 23:34:31 EST; 11s ago
  Process: 18994 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
 Main PID: 19001 (corosync)
   CGroup: /system.slice/corosync.service
           └─19001 corosync

Dec 27 23:34:31 UA-HA corosync[19001]:  [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Dec 27 23:34:31 UA-HA corosync[19001]:  [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Dec 27 23:34:31 UA-HA corosync[19001]:  [QUORUM] Members[1]: 1
Dec 27 23:34:31 UA-HA corosync[19001]:  [MAIN  ] Completed service synchronization, ready to provide service.
Dec 27 23:34:31 UA-HA corosync[19001]:  [TOTEM ] A new membership (192.168.203.131:1464) was formed. Members joined: 2
Dec 27 23:34:31 UA-HA corosync[19001]:  [QUORUM] This node is within the primary component and will provide service.
Dec 27 23:34:31 UA-HA corosync[19001]:  [QUORUM] Members[2]: 2 1
Dec 27 23:34:31 UA-HA corosync[19001]:  [MAIN  ] Completed service synchronization, ready to provide service.
Dec 27 23:34:31 UA-HA systemd[1]: Started Corosync Cluster Engine.
Dec 27 23:34:31 UA-HA corosync[18994]: Starting Corosync Cluster Engine (corosync): [  OK  ]
[root@UA-HA ~]# systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2015-12-27 23:34:32 EST; 15s ago
 Main PID: 19016 (pacemakerd)
   CGroup: /system.slice/pacemaker.service
           ├─19016 /usr/sbin/pacemakerd -f
           ├─19017 /usr/libexec/pacemaker/cib
           ├─19018 /usr/libexec/pacemaker/stonithd
           ├─19019 /usr/libexec/pacemaker/lrmd
           ├─19020 /usr/libexec/pacemaker/attrd
           ├─19021 /usr/libexec/pacemaker/pengine
           └─19022 /usr/libexec/pacemaker/crmd

Dec 27 23:34:33 UA-HA crmd[19022]:   notice: pcmk_quorum_notification: Node UA-HA2[2] - state is now member (was (null))
Dec 27 23:34:33 UA-HA crmd[19022]:   notice: pcmk_quorum_notification: Node UA-HA[1] - state is now member (was (null))
Dec 27 23:34:33 UA-HA stonith-ng[19018]:   notice: Watching for stonith topology changes
Dec 27 23:34:33 UA-HA crmd[19022]:   notice: Notifications disabled
Dec 27 23:34:33 UA-HA crmd[19022]:   notice: The local CRM is operational
Dec 27 23:34:33 UA-HA crmd[19022]:   notice: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Dec 27 23:34:33 UA-HA attrd[19020]:  warning: Node names with capitals are discouraged, consider changing 'UA-HA2' to something else
Dec 27 23:34:33 UA-HA attrd[19020]:   notice: crm_update_peer_proc: Node UA-HA2[2] - state is now member (was (null))
Dec 27 23:34:33 UA-HA stonith-ng[19018]:  warning: Node names with capitals are discouraged, consider changing 'UA-HA2' to something else
Dec 27 23:34:34 UA-HA stonith-ng[19018]:   notice: crm_update_peer_proc: Node UA-HA2[2] - state is now member (was (null))
[root@UA-HA ~]#

 

Verify Corosync configuration:

 

1. Check the corosync communication status.

[root@UA-HA ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.203.134
        status  = ring 0 active with no faults
[root@UA-HA ~]#

 

In my setup, the first RING uses the interface “br0”.

[root@UA-HA ~]# ifconfig br0
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.203.134  netmask 255.255.255.0  broadcast 192.168.203.255
        inet6 fe80::84ef:2eff:fee9:260a  prefixlen 64  scopeid 0x20
        ether 00:0c:29:2d:3f:ce  txqueuelen 0  (Ethernet)
        RX packets 15797  bytes 1877460 (1.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7018  bytes 847881 (828.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@UA-HA ~]#

We can have multiple RINGs to provide redundancy for the cluster communication. (These are similar to the LLT links in VCS.)
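For reference, a second ring can be requested at cluster creation time by passing an alternate address for each node to pcs. This is only a sketch; the “-hb” hostnames are hypothetical names that would resolve on a second (heartbeat) network:

# pcs cluster setup --name UABLR UA-HA,UA-HA-hb UA-HA2,UA-HA2-hb

This populates ring0_addr and ring1_addr entries in the nodelist section of /etc/corosync/corosync.conf, and “corosync-cfgtool -s” would then print the status of both rings.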

 

2. Check the membership and quorum APIs.

[root@UA-HA ~]# corosync-cmapctl  | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.203.134)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.203.131)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
[root@UA-HA ~]#
[root@UA-HA ~]# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         2          1 UA-HA2
         1          1 UA-HA (local)
[root@UA-HA ~]#
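The same quorum information is also available from corosync-quorumtool, which shows the vote counts and whether the partition is quorate:

# corosync-quorumtool -s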

 

 

Verify Pacemaker Configuration:

 

1. Check the running pacemaker processes.

[root@UA-HA ~]# ps axf |grep pacemaker
19324 pts/0    S+     0:00  |       \_ grep --color=auto pacemaker
19016 ?        Ss     0:00 /usr/sbin/pacemakerd -f
19017 ?        Ss     0:00  \_ /usr/libexec/pacemaker/cib
19018 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd
19019 ?        Ss     0:00  \_ /usr/libexec/pacemaker/lrmd
19020 ?        Ss     0:00  \_ /usr/libexec/pacemaker/attrd
19021 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pengine
19022 ?        Ss     0:00  \_ /usr/libexec/pacemaker/crmd

 

2. Check the cluster status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Sun Dec 27 23:44:44 2015          Last change: Sun Dec 27 23:34:55 2015 by hacluster via crmd on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 0 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:


PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

3. You can see that corosync & pacemaker are active now but disabled across the system reboot. If you would like the cluster to start automatically after a reboot, you can enable these services using the systemctl command.

[root@UA-HA2 ~]# systemctl enable corosync
Created symlink from /etc/systemd/system/multi-user.target.wants/corosync.service to /usr/lib/systemd/system/corosync.service.
[root@UA-HA2 ~]# systemctl enable pacemaker
Created symlink from /etc/systemd/system/multi-user.target.wants/pacemaker.service to /usr/lib/systemd/system/pacemaker.service.
[root@UA-HA2 ~]# pcs status
Cluster name: UABLR
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Sun Dec 27 23:51:30 2015          Last change: Sun Dec 27 23:34:55 2015 by hacluster via crmd on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 0 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:


PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA2 ~]#
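Alternatively, pcs can enable both services on all the cluster nodes with a single command:

# pcs cluster enable --all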

 

4. When the cluster starts, it automatically records the number and details of the nodes in the cluster, as well as the stack being used and the version of Pacemaker. To view the cluster configuration (Cluster Information Base – CIB) in XML format, use the following command.

[root@UA-HA2 ~]# pcs cluster cib
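Since the output is plain XML, it can also be redirected to a file to keep a backup of the current configuration (the file name below is just an example):

# pcs cluster cib > /root/cib-backup.xml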

 

5. Verify the cluster information base using the following command.

[root@UA-HA ~]# crm_verify -L -V
   error: unpack_resources:     Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:     Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:     NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
[root@UA-HA ~]#

By default, pacemaker enables STONITH (Shoot The Other Node In The Head) / fencing in order to protect the data. Fencing is mandatory when you use shared storage, to avoid data corruption.

For the time being, we will disable STONITH and configure it later.

 

6. Disable STONITH (fencing).

[root@UA-HA ~]# pcs property set stonith-enabled=false
[root@UA-HA ~]#
[root@UA-HA ~]# pcs property show stonith-enabled
Cluster Properties:
 stonith-enabled: false
[root@UA-HA ~]#

 

7. Verify the cluster configuration again. The errors should now have disappeared.

[root@UA-HA ~]# crm_verify -L -V
[root@UA-HA ~]#

 

We have successfully configured a two-node Redhat cluster on RHEL 7.2 with the new components pacemaker and corosync. Hope this article is informative to you.

Share it! Comment it!! Be Sociable!!!

 
