
RHEL 7 – Pacemaker – Define the Resource Behaviour – Part 9

Pacemaker resource agents

In a Pacemaker/Corosync cluster, there are many aspects and key elements that we need to understand before playing with cluster operations; otherwise, we might cause unnecessary outage/downtime for the services. The most important elements are the preferred resource location, ordering (defining dependencies), resource fail counts, resource-stickiness, colocation, clone, master/slave, and promote/demote. Let's go through this article and understand how these elements contribute to cluster operation.

 

Cluster Status:

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Sat Oct 17 19:44:40 2015          Last change: Sun Jan 10 14:18:25 2016 by root via crm_resource on UA-HA2
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

Preferred Resource Location:

Pacemaker/Corosync allows a resource to have a preferred location. You can define the preferred location using the "pcs constraint" command. Here we are setting the "UAKVM2" resource's preferred node to UA-HA with a score of 50. The score indicates how strongly we would like the resource to run on that node.

 

[root@UA-HA ~]# pcs constraint location UAKVM2 prefers UA-HA=50
[root@UA-HA ~]# pcs constraint
Location Constraints:
  Resource: UAKVM2
    Enabled on: UA-HA (score:50)
Ordering Constraints:
Colocation Constraints:
[root@UA-HA ~]# pcs resource
 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA
[root@UA-HA ~]#

 

If you do not specify a score, the default is INFINITY, which means the resource will always prefer to run on UA-HA whenever that node is available.
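For example, the same constraint created without an explicit score falls back to the INFINITY default (illustrative command, using the resource and node names from this article):

[root@UA-HA ~]# pcs constraint location UAKVM2 prefers UA-HA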

 

From the pcs man page:

 location <resource id> prefers <node[=score]>...
        Create a location constraint on a resource to prefer the specified
        node and score (default score: INFINITY)

    location <resource id> avoids <node[=score]>...
        Create a location constraint on a resource to avoid the specified
        node and score (default score: INFINITY)


 

Using a location constraint, you can also prevent a particular resource from running on a specific node.

[root@UA-HA ~]# pcs constraint location UAKVM2 avoids UA-HA2=50
[root@UA-HA ~]# pcs constraint
Location Constraints:
  Resource: UAKVM2
    Enabled on: UA-HA (score:50)
    Disabled on: UA-HA2 (score:-50)
Ordering Constraints:
Colocation Constraints:
[root@UA-HA ~]#

 

At any time, you can remove a location constraint using the constraint ID.

To get the constraint IDs, use the "--full" option.

[root@UA-HA ~]# pcs constraint --full
Location Constraints:
  Resource: UAKVM2
    Enabled on: UA-HA (score:50) (id:location-UAKVM2-UA-HA-50)
    Disabled on: UA-HA2 (score:-50) (id:location-UAKVM2-UA-HA2--50)
Ordering Constraints:
Colocation Constraints:
[root@UA-HA ~]#

 

Remove the constraints which we have created.

[root@UA-HA ~]# pcs constraint location remove location-UAKVM2-UA-HA-50
[root@UA-HA ~]# pcs constraint location remove location-UAKVM2-UA-HA2--50
[root@UA-HA ~]# pcs constraint
Location Constraints:
Ordering Constraints:
Colocation Constraints:
[root@UA-HA ~]#

 

When defining constraints, you also need to deal with scores. Scores of all kinds are integral to how the cluster works. Practically everything from migrating a resource to deciding which resource to stop in a degraded cluster is achieved by manipulating scores in some way. Scores are calculated on a per-resource basis and any node with a negative score for a resource cannot run that resource. After calculating the scores for a resource, the cluster then chooses the node with the highest score. INFINITY is currently defined as 1,000,000. Additions or subtractions with it stick to the following three basic rules:

Any value + INFINITY = INFINITY

Any value – INFINITY = -INFINITY

INFINITY – INFINITY = -INFINITY

When defining resource constraints, you specify a score for each constraint. The score indicates the value you are assigning to this resource constraint. Constraints with higher scores are applied before those with lower scores. By creating additional location constraints with different scores for a given resource, you can specify an order for the nodes that a resource will fail over to.
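For example, the following two constraints (an illustrative sketch using the WEBRG1 group from this article; the scores 200 and 100 are arbitrary) make the group prefer UA-HA first and, if UA-HA is unavailable, fail over to UA-HA2:

[root@UA-HA ~]# pcs constraint location WEBRG1 prefers UA-HA=200
[root@UA-HA ~]# pcs constraint location WEBRG1 prefers UA-HA2=100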

 

Resource Ordering (Defining Resource Dependencies):

You need to define the resource ordering if you are not using resource groups. In most cases, resources need to start in a particular sequence. For example, a file-system resource cannot be started before its volume group resource. Similarly, the IP resource should be online before the Apache resource starts.

Let's assume that we do not have a resource group and the following resources are configured in the cluster.

[root@UA-HA ~]# pcs resource
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2

 

At this point, no constraints have been configured.

[root@UA-HA ~]# pcs constraint
Location Constraints:
Ordering Constraints:
Colocation Constraints:
[root@UA-HA ~]#

 

Plan for the Resource order:

  1.   Volume Group (LVM)  – vgres
  2.   Filesystem – webvolfs   (To store the website data)
  3.   IP address – Cluster IP  (To access the website )
  4.   Apache –  webres    (To provide the web services)

 

To achieve the above resource order, use the following set of commands.

[root@UA-HA ~]# pcs constraint order vgres then webvolfs
Adding vgres webvolfs (kind: Mandatory) (Options: first-action=start then-action=start)
[root@UA-HA ~]# pcs constraint order webvolfs then ClusterIP
Adding webvolfs ClusterIP (kind: Mandatory) (Options: first-action=start then-action=start)
[root@UA-HA ~]# pcs constraint order ClusterIP then webres
Adding ClusterIP webres (kind: Mandatory) (Options: first-action=start then-action=start)
[root@UA-HA ~]# pcs constraint
Location Constraints:
Ordering Constraints:
  start vgres then start webvolfs (kind:Mandatory)
  start webvolfs then start ClusterIP (kind:Mandatory)
  start ClusterIP then start webres (kind:Mandatory)
Colocation Constraints:
[root@UA-HA ~]#

We have successfully configured the resource dependencies.
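As a side note, pcs can also express the same start order in a single command using an ordered resource set; a minimal sketch with the same resources (verify the exact syntax with "man pcs" on your release):

[root@UA-HA ~]# pcs constraint order set vgres webvolfs ClusterIP webres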

 

To remove the resource dependencies, use the following set of commands.
1. List the constraints with their IDs.

[root@UA-HA ~]# pcs constraint --full
Location Constraints:
Ordering Constraints:
  start vgres then start webvolfs (kind:Mandatory) (id:order-vgres-webvolfs-mandatory)
  start webvolfs then start ClusterIP (kind:Mandatory) (id:order-webvolfs-ClusterIP-mandatory)
  start ClusterIP then start webres (kind:Mandatory) (id:order-ClusterIP-webres-mandatory)
Colocation Constraints:
[root@UA-HA ~]#

2. Remove the order constraint using following command.

[root@UA-HA ~]# pcs constraint order remove vgres order-vgres-webvolfs-mandatory
[root@UA-HA ~]# pcs constraint order remove webvolfs order-webvolfs-ClusterIP-mandatory
[root@UA-HA ~]# pcs constraint order remove ClusterIP order-ClusterIP-webres-mandatory
[root@UA-HA ~]#  pcs constraint --full
Location Constraints:
Ordering Constraints:
Colocation Constraints:
[root@UA-HA ~]#

Note: You need to configure resource order constraints only when you are not using a resource group. A resource group handles the resource ordering for you and reduces the manual work.
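For reference, this is how a group like WEBRG1 would be created if the resources were not already grouped; members of a group start in the order listed and stop in the reverse order:

[root@UA-HA ~]# pcs resource group add WEBRG1 vgres webvolfs ClusterIP webres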

 

Resource fail counts & Migration Threshold:

The migration threshold defines how many times a resource is allowed to fail on the current node before it is moved away. For example, if you define migration-threshold=2 for a resource, it will automatically migrate to a new node after 2 failures.

To set the migration threshold, use the following command.

 

[root@UA-HA ~]# pcs resource update  UAKVM2_res meta migration-threshold="4"
[root@UA-HA ~]# pcs resource show UAKVM2_res
 Resource: UAKVM2_res (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: hypervisor=qemu:///system config=/kvmpool/qemu_config/UAKVM2.xml migration_transport=ssh
  Meta Attrs: allow-migrate=true priority=100 migration-threshold=4
  Operations: start interval=0s timeout=120s (UAKVM2_res-start-interval-0s)
              stop interval=0s timeout=120s (UAKVM2_res-stop-interval-0s)
              monitor interval=10 timeout=30 (UAKVM2_res-monitor-interval-10)
              migrate_from interval=0 timeout=120s (UAKVM2_res-migrate_from-interval-0)
              migrate_to interval=0 timeout=120 (UAKVM2_res-migrate_to-interval-0)
[root@UA-HA ~]#

The resource fail count comes into play when it reaches the configured migration-threshold value. With the setting above, if the resource fails on the running node, the cluster will try to recover it on the same node until it has failed 4 times. After that, the resource is moved to the next available node, based on the configured constraints.
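Optionally, the fail count can be made to expire on its own by setting the failure-timeout meta attribute on the resource (a sketch; the 60s value is arbitrary):

[root@UA-HA ~]# pcs resource update UAKVM2_res meta failure-timeout=60s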

 

To see the fail counts, use one of the following commands.

[root@UA-HA ~]# pcs resource failcount show UAKVM2_res
Failcounts for UAKVM2_res
 UA-HA: 1
[root@UA-HA ~]# crm_failcount -r UAKVM2_res
scope=status  name=fail-count-UAKVM2_res value=1
[root@UA-HA ~]# 

 

If you would like to reset the fail counts manually, use one of the following commands.

[root@UA-HA ~]# pcs resource cleanup UAKVM2_res
Waiting for 2 replies from the CRMd.. OK
Cleaning up UAKVM2_res on UA-HA, removing fail-count-UAKVM2_res
Cleaning up UAKVM2_res on UA-HA2, removing fail-count-UAKVM2_res

[root@UA-HA ~]#

OR

[root@UA-HA ~]# pcs resource failcount reset UAKVM2_res UA

 

Check the resource fail-count again.

[root@UA-HA ~]# crm_failcount -r UAKVM2_res
scope=status  name=fail-count-UAKVM2_res value=0
[root@UA-HA ~]#
[root@UA-HA ~]# pcs resource failcount show UAKVM2_res
No failcounts for UAKVM2_res
[root@UA-HA ~]#

 

Resource-Stickiness:

In some circumstances, it is highly desirable to prevent healthy resources from being moved around the cluster. Moving resources almost always requires a period of downtime. For complex services like Oracle databases, this period can be quite long. To address this, Pacemaker has the concept of resource stickiness which controls how much a service prefers to stay running where it is.

1. Check the Resource status:

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA

 

2. Let’s stop the cluster services on UA-HA.

[root@UA-HA ~]# pcs cluster stop
Stopping Cluster (pacemaker)... Stopping Cluster (corosync)...
[root@UA-HA ~]#

 

3. UAKVM2 resource group should be moved to UA-HA2 automatically.

[root@UA-HA ~]# ssh UA-HA2 pcs status
Cluster name: UABLR
Last updated: Mon Jan 11 05:30:25 2016          Last change: Mon Jan 11 05:29:44 2016 by root via crm_attribute on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA2 ]
OFFLINE: [ UA-HA ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2

 

4. Start the cluster service on UA-HA and see what happens to UAKVM2.

[root@UA-HA ~]# pcs cluster start
Starting Cluster...
[root@UA-HA ~]# ssh UA-HA2 pcs status
Cluster name: UABLR
Last updated: Mon Jan 11 05:30:39 2016          Last change: Mon Jan 11 05:29:44 2016 by root via crm_attribute on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA

UAKVM2 moved back to the UA-HA node, but this failback causes another short outage. If you configure resource-stickiness, you can prevent the resource from moving back automatically.

[root@UA-HA ~]# pcs resource defaults resource-stickiness=100
[root@UA-HA ~]# pcs resource defaults
resource-stickiness: 100
[root@UA-HA ~]#
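The command above sets a cluster-wide default. Stickiness can also be set on an individual resource through its meta attributes, for example (the value 200 is arbitrary):

[root@UA-HA ~]# pcs resource update UAKVM2_res meta resource-stickiness=200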

Repeat step 2 to step 4 and see the difference. This time, UAKVM2 should remain running on UA-HA2.

 

Colocation:

When the location of one resource depends on the location of another, we call this colocation. Like resource ordering, colocation constraints are required when you are not using a resource group.

Assume that you have configured a volume group resource and a file-system resource, and you have also configured a resource order that defines which one starts first. The cluster might still try to start the volume group resource on one node and the filesystem resource on another node. In such cases, we need to tell the cluster to run the filesystem resource on the node where the volume group resource is running.

Let's see how we can configure colocation between vgres (LVM VG) and webvolfs (filesystem).

[root@UA-HA ~]# pcs constraint colocation add vgres with webvolfs INFINITY
[root@UA-HA ~]# pcs constraint
Location Constraints:
Ordering Constraints:
Colocation Constraints:
  vgres with webvolfs (score:INFINITY)
[root@UA-HA ~]#

We have successfully configured a colocation constraint between vgres and webvolfs. Note the direction: with "pcs constraint colocation add <source> with <target>", the source resource is placed on the node where the target resource runs, so in the constraint above vgres follows webvolfs.
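If the intent is the opposite, i.e. to run the filesystem where the volume group is running as described earlier, create the constraint with the resources reversed (after removing the one above):

[root@UA-HA ~]# pcs constraint colocation add webvolfs with vgres INFINITY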

We will cover clone, master/slave, and promote/demote resources in upcoming articles.

Hope this article is informative to you. Share it! Comment it!! Be Sociable!!!
