Linux – Performance Issues Troubleshooting

Cloud_Devops

11 years ago

Troubleshooting performance issues on any operating system takes patience, and it is worth working through the basics yourself before contacting the operating system vendor. Here we are going to see how to troubleshoot performance issues on Red Hat Linux. Performance problems are mostly caused by a shortage of system resources. If an application is not configured to match the resources available on the system, it will sooner or later run into these kinds of issues. As a system admin, you need to figure out which resource is short and which processes are putting the system in that situation.

Key system resources in Red Hat Linux:
1. CPU
2. Memory
3. Swap
4. Filesystem (Disk or LUN)
5. Network

1. CPU:

CPU utilization can be monitored using various built-in Linux tools; top, vmstat and “sar -u” are a few of them.

To get the current CPU utilization details,

VMSTAT

[root@Global-RH ~]# vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 402432  45308 449980    0    0    92    54   74  112  1  2 94  3  0
 0  0      0 402424  45308 450008    0    0     0     0   54   82  1  0 99  0  0
 0  0      0 402424  45308 450008    0    0     0     0   33   52  0  0 100  0  0
 0  0      0 402432  45316 450004    0    0     0    72   59  109  0  1 99  0  0
 0  0      0 402432  45316 450008    0    0     0     0   37   59  0  0 100  0  0

SAR

[root@Global-RH ~]# sar -u 5 5
Linux 2.6.32-279.el6.x86_64 (Global-RH)         11/25/2013      _x86_64_        (1 CPU)

10:34:11 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
10:34:16 AM     all      0.00      0.00      0.80      0.00      0.00     99.20
10:34:21 AM     all      0.40      0.00      2.00      0.00      0.00     97.60
10:34:26 AM     all      0.00      0.00      1.41      0.00      0.00     98.59
10:34:31 AM     all      0.00      0.00      0.40      0.00      0.00     99.60
10:34:36 AM     all      0.00      0.00      0.80      1.00      0.00     98.20
Average:        all      0.08      0.00      1.08      0.20      0.00     98.63
[root@Global-RH ~]#

TOP

top - 10:36:09 up  1:16,  4 users,  load average: 0.00, 0.01, 0.04
Tasks: 156 total,   1 running, 155 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.3%us,  1.5%sy,  0.0%ni, 93.9%id,  3.3%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   1250268k total,   841816k used,   408452k free,    45384k buffers
Swap:  2523128k total,        0k used,  2523128k free,   450152k cached

High CPU consuming process:

[root@Global-RH ~]# top
top - 10:42:11 up  1:22,  4 users,  load average: 0.76, 0.22, 0.08
Tasks: 156 total,   2 running, 154 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1250268k total,   889884k used,   360384k free,    80212k buffers
Swap:  2523128k total,        0k used,  2523128k free,   450164k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6010 root      20   0  109m 1244  888 R 98.1  0.1   0:32.21 find
 1094 root      20   0     0    0    0 S  1.3  0.0   0:02.40 flush-253:0
 6031 root      20   0 15028 1296  964 R  0.7  0.1   0:00.03 top
 2680 root      20   0 40336  616  364 S  0.3  0.0   0:01.20 udisks-daemon
    1 root      20   0 19348 1564 1252 S  0.0  0.1   0:02.07 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.01 kthreadd

Using the ps command,

[root@Global-RH ~]# ps -eo pcpu,args | sort -k 1 -r | head -8
%CPU COMMAND
90.5 find / -name temp_myname*
 0.2 /usr/sbin/vmtoolsd
 0.2 /usr/lib/vmware-tools/sbin64/vmtoolsd -n vmusr --blockFd 3
 0.1 /usr/sbin/restorecond -u
 0.0 [watchdog/0]
 0.0 [vmmemctl]
 0.0 /usr/sbin/wpa_supplicant -c /etc/wpa_supplicant/wpa_supplicant.conf -B -u -f /var/log/wpa_supplicant.log -P /var/run/wpa_supplicant.pid
[root@Global-RH ~]#
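
The pipeline above sorts the %CPU column as text, so the ordering depends on column alignment and locale. The following is a minimal sketch of a numeric variant, not a replacement for the command shown. The function name top_cpu is hypothetical, and it assumes the input is ps output whose first column is %CPU and whose first line is the header.

```shell
# top_cpu: hypothetical helper, not part of procps.
# Reads ps output (first column %CPU, first line the header) on stdin and
# prints the header followed by the N busiest processes (default 5),
# sorted numerically on the %CPU column.
top_cpu() {
    n="${1:-5}"
    {
        # The shell's read consumes only the header line; sort gets the rest.
        IFS= read -r header
        printf '%s\n' "$header"
        LC_ALL=C sort -k1,1 -nr | head -n "$n"
    }
}

# Usage on a live system:
#   ps -eo pcpu,pid,args | top_cpu 8
```

Keeping the header out of the sort avoids relying on how the “%CPU” string happens to collate against the numbers.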

2. Memory:

A memory bottleneck can be identified using the vmstat and sar commands. Be careful when judging free memory, because Red Hat Linux uses otherwise free physical memory as cache. The cached memory is released when applications need it.

To get the memory information,

Using /proc/meminfo,

[root@Global-RH ~]# cat /proc/meminfo
MemTotal:        1250268 kB
MemFree:          408724 kB
Buffers:           45416 kB
Cached:           450156 kB
SwapCached:            0 kB
Active:           319124 kB
Inactive:         333028 kB
Active(anon):     156764 kB
Inactive(anon):     3332 kB
Active(file):     162360 kB
Inactive(file):   329696 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       2523128 kB
SwapFree:        2523128 kB
Dirty:                28 kB
Writeback:             0 kB
AnonPages:        156596 kB
Mapped:            71844 kB
Shmem:              3520 kB
Slab:             124828 kB
SReclaimable:      63844 kB
SUnreclaim:        60984 kB
KernelStack:        2040 kB
PageTables:        27704 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     3148260 kB
Committed_AS:     712176 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      280420 kB
VmallocChunk:   34359440948 kB
HardwareCorrupted:     0 kB
AnonHugePages:     28672 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        8192 kB
DirectMap2M:     1282048 kB
[root@Global-RH ~]#

Using the vmstat command,

[root@Global-RH ~]# vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0      0 326304  80864 450164    0    0    85    52  119  126  1  6 89  3  0
 5  0      0 326272  80864 450164    0    0     0     0 1021  587  0 100  0  0  0
 5  0      0 326140  80864 450164    0    0     0     0 1017  700  0 100  0  0  0
 6  0      0 326272  80864 450164    0    0     0     0 1018  671  0 100  0  0  0
 5  0      0 326272  80864 450164    0    0     0     0 1019  658  0 100  0  0  0

Using the sar command,

[root@Global-RH ~]# sar -r 5 5
Linux 2.6.32-279.el6.x86_64 (Global-RH)         11/25/2013      _x86_64_        (1 CPU)

10:48:22 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
10:48:27 AM    326404    923864     73.89     80872    450164    728784     19.31
10:48:32 AM    326388    923880     73.89     80872    450164    728784     19.31
10:48:37 AM    326404    923864     73.89     80872    450164    728784     19.31
10:48:42 AM    327140    923128     73.83     80872    450164    727352     19.28
10:48:47 AM    327388    922880     73.81     80872    450164    727352     19.28
Average:       326745    923523     73.87     80872    450164    728211     19.30

Using top command,

[root@Global-RH ~]# top
top - 10:49:04 up  1:29,  4 users,  load average: 5.69, 2.87, 1.18
Tasks: 162 total,   7 running, 155 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1250268k total,   922616k used,   327652k free,    80880k buffers
Swap:  2523128k total,        0k used,  2523128k free,   450164k cached

Using Free command,

[root@Global-RH ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          1220        897        323          0         79        439
-/+ buffers/cache:        379        841
Swap:         2463          0       2463
[root@Global-RH ~]#

As per the above command outputs, the system has 1220MB of physical memory, of which 897MB is used and 323MB is free. Of that used memory (897MB), 439MB is being used as cache by the system and another 79MB as buffers. This memory will be released when applications demand it, which is why the “-/+ buffers/cache” line reports only 379MB as really used.

If the system is running out of free memory and the cached memory is also very low, then the system has a real memory bottleneck.
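
The arithmetic behind the “-/+ buffers/cache” line can be reproduced from /proc/meminfo. The following is a minimal sketch; the function name free_plus_cache_kb is hypothetical, and it assumes the meminfo layout shown earlier (a label, a value, then “kB”).

```shell
# free_plus_cache_kb: hypothetical helper.
# Reads /proc/meminfo-style text on stdin and prints
# MemFree + Buffers + Cached in kB, the same sum that free reports in the
# "free" column of its "-/+ buffers/cache" line.
free_plus_cache_kb() {
    awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
         END { print sum + 0 }'
}

# Usage on a live system:
#   free_plus_cache_kb < /proc/meminfo
```

If this figure stays low while applications keep asking for memory, the cache has little left to give back and the system is likely to start swapping.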

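High memory consuming process,
Using top (the -M option prints the memory summary in human-readable units; press Shift+M inside top to sort processes by memory usage),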

[root@Global-RH ~]# top -M
top - 10:56:52 up  1:37,  4 users,  load average: 0.01, 0.97, 1.01
Tasks: 155 total,   1 running, 154 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  1220.965M total,  897.832M used,  323.133M free,   79.102M buffers
Swap: 2463.992M total,    0.000k used, 2463.992M free,  439.621M cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    7 root      20   0     0    0    0 S  0.3  0.0   0:04.16 events/0
 2706 root      20   0  426m  26m  18m S  0.3  2.1   0:15.30 vmtoolsd
 6426 root      20   0 15028 1284  964 R  0.3  0.1   0:00.08 top
    1 root      20   0 19348 1564 1252 S  0.0  0.1   0:02.08 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.02 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0

Using the ps command,

[root@Global-RH ~]# ps -e -orss=,args= | sort -b -k1,1n | pr -TW$COLUMNS |tail -10
13088 gnome-volume-control-applet
13268 /usr/bin/gnome-terminal -x /bin/sh -c cd '/root/Desktop' && exec $SHELL
13656 /usr/libexec/clock-applet --oaf-activate-iid=OAFIID:GNOME_ClockApplet_Factory --oaf-ior-fd=28
14444 nm-applet --sm-disable
17796 /usr/bin/gnote --panel-applet --oaf-activate-iid=OAFIID:GnoteApplet_Factory --oaf-ior-fd=22
19260 python /usr/share/system-config-printer/applet.py
20016 /usr/sbin/restorecond -u
20628 nautilus
25744 /usr/bin/Xorg :0 -nr -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-0wmzZs/database -nolisten tcp vt1
26824 /usr/lib/vmware-tools/sbin64/vmtoolsd -n vmusr --blockFd 3
[root@Global-RH ~]#

Using the ps command with the %MEM column,

[root@Global-RH ~]# /bin/ps ax -orss,%mem,cmd --sort=rss|tac|head -10
26824  2.1 /usr/lib/vmware-tools/sbin64/vmtoolsd -n vmusr --blockFd 3
25744  2.0 /usr/bin/Xorg :0 -nr -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-0wmzZs/database -nolisten tcp vt1
20628  1.6 nautilus
20016  1.6 /usr/sbin/restorecond -u
19260  1.5 python /usr/share/system-config-printer/applet.py
17796  1.4 /usr/bin/gnote --panel-applet --oaf-activate-iid=OAFIID:GnoteApplet_Factory --oaf-ior-fd=22
14444  1.1 nm-applet --sm-disable
13656  1.0 /usr/libexec/clock-applet --oaf-activate-iid=OAFIID:GNOME_ClockApplet_Factory --oaf-ior-fd=28
13268  1.0 /usr/bin/gnome-terminal -x /bin/sh -c cd '/root/Desktop' && exec $SHELL
13088  1.0 gnome-volume-control-applet
[root@Global-RH ~]#

3. Swap:

When physical memory is completely used, the system starts using swap space. If the system runs out of swap space as well, you will see fork errors in the /var/log/messages file. Oversizing swap does not help: on a system with 2GB of physical memory, configuring 8GB of swap is a complete waste. Once the system starts swapping more processes out to disk, performance degrades.

There are only a few commands that list swap information.
1. /proc/swaps

[root@Global-RH ~]# cat /proc/swaps
Filename                                Type            Size    Used    Priority
/dev/dm-1                               partition       2523128 0       -1
[root@Global-RH ~]#

2. free command

[root@Global-RH ~]# free -t
             total       used       free     shared    buffers     cached
Mem:       1250268     920380     329888          0      81248     450584
-/+ buffers/cache:     388548     861720
Swap:      2523128          0    2523128
Total:     3773396     920380    2853016
[root@Global-RH ~]#

3. top

[root@Global-RH ~]# top
top - 11:12:37 up  1:53,  4 users,  load average: 0.00, 0.03, 0.34
Tasks: 156 total,   1 running, 155 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1250268k total,   920388k used,   329880k free,    81272k buffers
Swap:  2523128k total,        0k used,  2523128k free,   450584k cached

How to identify a memory/swap bottleneck?
Unfortunately, Linux vmstat does not report a scan rate (sr) column to identify heavy paging the way Unix does. Instead, look at the swap in (si) and swap out (so) rates in the vmstat output: sustained non-zero values mean the system is actively swapping.
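
Watching si and so by eye gets tedious over a long run. The following is a minimal sketch that picks out only the samples with swap activity; the function name swap_activity is hypothetical, and it assumes the vmstat layout shown earlier, with a header row that starts with “r” and contains “si” and “so”.

```shell
# swap_activity: hypothetical helper.
# Reads vmstat output on stdin, finds the si and so columns from the
# header row, and prints every sample with non-zero swap-in or swap-out.
swap_activity() {
    awk '
        $1 == "r" {                       # column header row
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        si && so && $1 ~ /^[0-9]+$/ && ($si > 0 || $so > 0) {
            print "si=" $si " so=" $so
        }
    '
}

# Usage on a live system:
#   vmstat 1 10 | swap_activity
```

Keep in mind that the first line vmstat prints is an average since boot, so a single hit on that line alone says little about the current state.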

4. Filesystem (Disk/LUN I/O Bottleneck)

Sometimes the system has enough free CPU and memory but you still see performance issues. In these cases, look at the “%iowait” field in “iostat -x”; “mpstat” shows iowait as well. In the top command, look at the “%wa” field. If the value is more than 10, the CPUs are spending a significant share of their time waiting for disk I/O to complete. Most of the time it is poor SAN performance that drives the iowait value up.

Using mpstat,

[root@Global-RH ~]# mpstat 1 5
Linux 2.6.32-279.el6.x86_64 (Global-RH)         11/25/2013      _x86_64_        (1 CPU)

11:38:50 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
11:38:51 AM  all    0.00    0.00  100.00   46.67    0.00    0.00    0.00    0.00    0.00
11:38:52 AM  all    0.00    0.00  100.00   89.67    0.00    0.00    0.00    0.00    0.00
11:38:53 AM  all    0.00    0.00  100.00   94.67    0.00    0.00    0.00    0.00    0.00
11:38:54 AM  all    0.00    0.00   83.33   46.67    0.00    0.00    0.00    0.00    0.00
11:38:55 AM  all    4.35    0.00   95.65   63.67    0.00    0.00    0.00    0.00    0.00
Average:     all    1.72    0.00   94.83   99.67    0.00    0.00    0.00    0.00    0.00
[root@Global-RH ~]#
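
The 10% rule of thumb can be applied to mpstat output automatically. The following is a minimal sketch; the function name high_iowait is hypothetical, and the threshold argument defaults to the value 10 used in this article. It locates the %iowait column from the header row, so it should also cope with a 24-hour time format that has no AM/PM field.

```shell
# high_iowait: hypothetical helper.
# Reads mpstat output on stdin and prints the samples whose %iowait
# exceeds a threshold (default 10). The "Average:" line is skipped
# because its columns are shifted relative to the sample rows.
high_iowait() {
    awk -v limit="${1:-10}" '
        /%iowait/ {                       # header row: locate the column
            for (i = 1; i <= NF; i++) if ($i == "%iowait") col = i
            next
        }
        col && $1 != "Average:" && NF >= col && $col + 0 > limit + 0 {
            print $1, "iowait=" $col
        }
    '
}

# Usage on a live system:
#   mpstat 1 5 | high_iowait 10
```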

Using iostat,

[root@Global-RH ~]# iostat -x
Linux 2.6.32-279.el6.x86_64 (Global-RH)         11/25/2013      _x86_64_        (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.99    0.01    6.29    2.26    0.00   90.45

Device: rrqm/s   wrqm/s   r/s     w/s     rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
scd0     0.23     0.00    0.06    0.00     1.14     0.00    19.95     0.00    4.85   3.64   0.02
sda      0.78     8.01    3.09    2.65   116.94    85.50    35.33     0.52   90.15   5.61   93.21
dm-0     0.00     0.00    3.70   10.69   115.76    85.50    13.99     2.29  159.42   2.23   97.21
dm-1     0.00     0.00    0.04    0.00     0.32     0.00     8.00     0.00    4.09   2.38   0.01
[root@Global-RH ~]#

A svctm (service time, in milliseconds) of less than 10 is an acceptable value for a SAN environment. If svctm is below 10 and the utilization (%util) is still above 60%, the storage is responding quickly but is being kept busy, so you need to tune the application or spread the database across multiple filesystems to increase the write rate.
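
The 60% utilization check can be scripted against iostat output. The following is a minimal sketch; the function name busy_disks is hypothetical, and the threshold defaults to the 60 used above. It looks the columns up by name in the “Device” header row and prints svctm only when the installed iostat provides that column.

```shell
# busy_disks: hypothetical helper.
# Reads "iostat -x" output on stdin and prints devices whose %util
# exceeds a threshold (default 60), with svctm when that column exists.
busy_disks() {
    awk -v limit="${1:-60}" '
        $1 == "avg-cpu:" { util = 0; next }   # CPU summary: not a device table
        $1 ~ /^Device/ {                      # header row: locate the columns
            util = 0; svc = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "%util") util = i
                if ($i == "svctm") svc = i
            }
            next
        }
        NF == 0 { next }
        util && NF >= util && $util + 0 > limit + 0 {
            printf "%s util=%s", $1, $util
            if (svc) printf " svctm=%s", $svc
            printf "\n"
        }
    '
}

# Usage on a live system:
#   iostat -x 5 3 | busy_disks 60
```

Fed the iostat output shown above, this would list sda and dm-0, the two devices above 90% utilization.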

How to identify which process is causing high I/O wait on the system?
Use the loop below to find processes in uninterruptible sleep (state D), which are usually the ones waiting on I/O.

[root@Global-RH ~]#  for x in `seq 1 1 10`; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done
D  7161 cp -i -r /usr /usr_old
----
D   391 [jbd2/dm-0-8]
----
^C
---
[root@Global-RH ~]#

Now we need to get the details of PIDs 7161 and 391. For PID 7161, /proc/7161/io shows its I/O counters and lsof shows the files it has open.

[root@Global-RH ~]# cat /proc/7161/io
rchar: 145039848
wchar: 145032790
syscr: 14556
syscw: 8777
read_bytes: 87707648
write_bytes: 158609408
cancelled_write_bytes: 0
[root@Global-RH ~]# lsof -p 7161
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
cp      7161 root  cwd    DIR  253,0     4096 130821 /root
cp      7161 root  rtd    DIR  253,0     4096      2 /
cp      7161 root  txt    REG  253,0   122736    900 /bin/cp
cp      7161 root  mem    REG  253,0   156872 178706 /lib64/ld-2.12.so
cp      7161 root  mem    REG  253,0    22536 178723 /lib64/libdl-2.12.so
cp      7161 root  mem    REG  253,0  1918016 178707 /lib64/libc-2.12.so
cp      7161 root  mem    REG  253,0   145720 178710 /lib64/libpthread-2.12.so
cp      7161 root  mem    REG  253,0    47064 178711 /lib64/librt-2.12.so
cp      7161 root  mem    REG  253,0   124624 178741 /lib64/libselinux.so.1
cp      7161 root  mem    REG  253,0    33816 178838 /lib64/libacl.so.1.1.0
cp      7161 root  mem    REG  253,0    21152 150217 /lib64/libattr.so.1.1.0
cp      7161 root  mem    REG  253,0 99158576 134657 /usr/lib/locale/locale-archive
cp      7161 root    0u   CHR  136,2      0t0      5 /dev/pts/2
cp      7161 root    1u   CHR  136,2      0t0      5 /dev/pts/2
cp      7161 root    2u   CHR  136,2      0t0      5 /dev/pts/2
cp      7161 root    3r   REG  253,0 60073002 169255 /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rt.jar
cp      7161 root    4w   REG  253,0 29884416 407742 /usr_old/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/rt.jar
[root@Global-RH ~]#

In this way you can pinpoint the process behind the iowait: here, cp is reading rt.jar under /usr and writing it to /usr_old.
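
A single read of /proc/<pid>/io only gives totals since the process started. To see how much I/O a process is doing right now, two snapshots can be compared. The following is a minimal sketch; the function name io_delta and the /tmp file names are hypothetical.

```shell
# io_delta: hypothetical helper.
# Takes two snapshot files copied from /proc/<pid>/io and prints how many
# bytes the process read from and wrote to storage between them.
io_delta() {
    awk -F': *' '
        NR == FNR { first[$1] = $2; next }    # first file: remember values
        $1 == "read_bytes" || $1 == "write_bytes" {
            print $1, $2 - first[$1]
        }
    ' "$1" "$2"
}

# Usage on a live system (7161 is the PID from the example above):
#   cat /proc/7161/io > /tmp/io.1; sleep 5; cat /proc/7161/io > /tmp/io.2
#   io_delta /tmp/io.1 /tmp/io.2
```

Dividing the printed values by the sleep interval gives an approximate bytes-per-second rate for that process.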

5. Network Bottleneck

Network overload is only rarely the cause of system performance problems.

Look at the interfaces for any errors (RX-ERR, TX-ERR) using the netstat command.

[root@Global-RH ~]# netstat -i
Kernel Interface table
Iface   MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0    1500   0    10697      0      0      0     6930      0      0      0 BMRU
lo     16436   0       20      0      0      0       20      0      0      0 LRU
virbr0  1500   0        0      0      0      0       26      0      0      0 BMRU
[root@Global-RH ~]#
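
On a host with many interfaces, the error and drop counters can be filtered instead of read by eye. The following is a minimal sketch; the function name nic_errors is hypothetical, and it assumes the “netstat -i” header row starts with “Iface” and uses the RX-ERR, RX-DRP, TX-ERR and TX-DRP column names shown above.

```shell
# nic_errors: hypothetical helper.
# Reads "netstat -i" output on stdin and prints each interface that has
# a non-zero error or drop counter, naming the counters that are non-zero.
nic_errors() {
    awk '
        $1 == "Iface" {                   # header row: map names to columns
            for (i = 1; i <= NF; i++) col[$i] = i
            seen = 1
            next
        }
        seen {
            n = split("RX-ERR RX-DRP TX-ERR TX-DRP", want, " ")
            msg = ""
            for (k = 1; k <= n; k++) {
                c = col[want[k]]
                if (c && $c + 0 > 0) msg = msg " " want[k] "=" $c
            }
            if (msg != "") print $1 msg
        }
    '
}

# Usage on a live system:
#   netstat -i | nic_errors
```

No output means every interface reported zero errors and drops, as in the listing above.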

dstat is a very useful tool for monitoring all the system resources. If it is not already installed, install it using “yum install dstat”.

[root@Global-RH ~]# dstat
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  1   6  89   4   0   0| 126k  107k|   0     0 |   0     0 | 114   123
  0   0 100   0   0   0|   0     0 |  60B  826B|   0     0 |  62   113
  0   0 100   0   0   0|   0     0 | 166B  506B|   0     0 |  60   101
  0   1  99   0   0   0|   0     0 |  60B  346B|   0     0 |  57    97
  0   0 100   0   0   0|   0    12k|  60B  346B|   0     0 |  53    98
  0   0 100   0   0   0|   0     0 |  60B  346B|   0     0 |  51    95
  0   0 100   0   0   0|   0     0 | 152B  346B|   0     0 |  50    99
  0   1  99   0   0   0|   0     0 | 152B  346B|   0     0 |  61   100 ^C
[root@Global-RH ~]#

You can see the network traffic details in the “net/total” fields.

Here is the complete list of options for the dstat tool.

[root@Global-RH ~]# dstat -h
Usage: dstat [-afv] [options..] [delay [count]]
Versatile tool for generating system resource statistics

Dstat options:
  -c, --cpu              enable cpu stats
     -C 0,3,total           include cpu0, cpu3 and total
  -d, --disk             enable disk stats
     -D total,hda           include hda and total
  -g, --page             enable page stats
  -i, --int              enable interrupt stats
     -I 5,eth2              include int5 and interrupt used by eth2
  -l, --load             enable load stats
  -m, --mem              enable memory stats
  -n, --net              enable network stats
     -N eth1,total          include eth1 and total
  -p, --proc             enable process stats
  -r, --io               enable io stats (I/O requests completed)
  -s, --swap             enable swap stats
     -S swap1,total         include swap1 and total
  -t, --time             enable time/date output
  -T, --epoch            enable time counter (seconds since epoch)
  -y, --sys              enable system stats

  --aio                  enable aio stats
  --fs, --filesystem     enable fs stats
  --ipc                  enable ipc stats
  --lock                 enable lock stats
  --raw                  enable raw stats
  --socket               enable socket stats
  --tcp                  enable tcp stats
  --udp                  enable udp stats
  --unix                 enable unix stats
  --vm                   enable vm stats

  --plugin-name          enable plugins by plugin name (see manual)
  --list                 list all available plugins

  -a, --all              equals -cdngy (default)
  -f, --full             automatically expand -C, -D, -I, -N and -S lists
  -v, --vmstat           equals -pmgdsc -D total

  --bw, --blackonwhite   change colors for white background terminal
  --float                force float values on screen
  --integer              force integer values on screen
  --nocolor              disable colors (implies --noupdate)
  --noheaders            disable repetitive headers
  --noupdate             disable intermediate updates
  --output file          write CSV output to file

delay is the delay in seconds between each update (default: 1)
count is the number of updates to display before exiting (default: unlimited)

[root@Global-RH ~]#

Hope this gave you some visibility into troubleshooting performance issues on Red Hat Linux.

Thank you for visiting UnixArena.