Solaris memory management – Performance issue

Cloud_Devops

12 years ago

Memory bottlenecks show up as two different activities on the system: paging and swapping. Paging refers to pages of memory being reclaimed by the page daemon when the system starts to run low on free memory. Swapping is more extreme and refers to entire processes being swapped out.

To determine whether the system is only paging or also swapping, examine two columns in the vmstat output. The first is the sr (scan rate) column. If the value in this column is greater than zero, the page scanner is scanning memory pages to put them back on the free list for reuse. The page scanner runs when free memory falls below the value of a system parameter known as lotsfree (default: 1/64th of physical memory), or below cachefree if priority_paging is enabled (default: twice lotsfree, that is, 1/32nd of physical memory).
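To make the threshold arithmetic concrete, here is a minimal Python sketch. It computes the default lotsfree value (1/64th of physical memory, as stated above) and checks whether a given amount of free memory would put the page scanner to work. The function names and the 256 GB system are hypothetical. The threshold is passed in explicitly, so you can substitute your own cachefree value when priority_paging is enabled.

```python
GIB = 1024 ** 3


def default_lotsfree(physmem_bytes: int) -> int:
    """Default lotsfree as described in the text: 1/64th of physical memory."""
    return physmem_bytes // 64


def scanner_would_run(free_bytes: int, threshold_bytes: int) -> bool:
    """The page scanner starts once free memory drops below the threshold
    (lotsfree, or cachefree when priority_paging is enabled)."""
    return free_bytes < threshold_bytes


if __name__ == "__main__":
    physmem = 256 * GIB                   # hypothetical 256 GB system
    lotsfree = default_lotsfree(physmem)  # 4 GB
    print(lotsfree // GIB, "GB")                 # 4 GB
    print(scanner_would_run(3 * GIB, lotsfree))  # True: below lotsfree
    print(scanner_would_run(8 * GIB, lotsfree))  # False: plenty of free memory
```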

High scan rates are not necessarily a concern on a system that uses the file system heavily, and they can be normal in many circumstances. If priority_paging is enabled, the page scanner steals pages more selectively, so that file system I/O does not cause application pages to be paged out unnecessarily. As a side effect, priority paging tends to push the sr value higher, and this is expected. Solaris 8 introduced the cyclic page cache. With the cyclic page cache, the page scanner is not used to reclaim pages during file system I/O. On Solaris 8 and later, a non-zero sr value therefore indicates that the system is running low on memory.
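The rules in the previous paragraph can be summarised as a small decision function. This is only a sketch. The function name, its parameters and the returned wording are hypothetical, and `solaris_release` is assumed to be the major release number (for example 7 or 8).

```python
def interpret_scan_rate(sr: int, solaris_release: int, priority_paging: bool) -> str:
    """Rough reading of the vmstat sr column following the rules in the text."""
    if sr == 0:
        return "scanner idle"
    if solaris_release >= 8:
        # Cyclic page cache: file system I/O no longer drives the scanner,
        # so any scanning points to a shortage of memory.
        return "low on memory"
    if priority_paging:
        # Priority paging is expected to raise sr on file-system-heavy systems.
        return "possibly normal (priority_paging enabled)"
    return "possibly normal if the file system is heavily used"


if __name__ == "__main__":
    print(interpret_scan_rate(0, 10, False))
    print(interpret_scan_rate(200, 10, False))
    print(interpret_scan_rate(200, 7, True))
```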

To see whether the system is swapping, check the w column. It is the third column of the output and counts entire processes that are swapped out. You can identify these processes by running '/usr/bin/ps -e -o pid,rss,args' and looking for an RSS of 0 (the sched, pageout and fsflush processes always have an RSS of 0).
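If you save the output of '/usr/bin/ps -e -o pid,rss,args' to a file, a short Python sketch like the one below can filter it for you. It lists processes with an RSS of 0 and skips the sched, pageout and fsflush kernel processes. The sample output and the function name are hypothetical.

```python
import os

# Kernel processes that always show an RSS of 0, per the text above.
EXPECTED_ZERO_RSS = {"sched", "pageout", "fsflush"}


def swapped_out(ps_output: str) -> list[tuple[int, str]]:
    """Return (pid, args) for processes with RSS 0, ignoring the
    kernel processes that are always 0."""
    found = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)  # pid, rss, args (args may contain spaces)
        if len(parts) < 3:
            continue
        pid, rss, args = int(parts[0]), int(parts[1]), parts[2]
        name = os.path.basename(args.split()[0])
        if rss == 0 and name not in EXPECTED_ZERO_RSS:
            found.append((pid, args))
    return found


# Hypothetical ps output, for illustration only.
sample = """\
  PID  RSS COMMAND
    0    0 sched
    2    0 pageout
    3    0 fsflush
  812    0 /usr/lib/inet/inetd start
 1204 5120 /usr/sbin/nscd
"""

if __name__ == "__main__":
    print(swapped_out(sample))  # [(812, '/usr/lib/inet/inetd start')]
```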

If the w column shows anything other than 0, the system is either low on memory right now or has been in the past. If your system runs low on memory and processes are swapped out, it may take a long time for them to get back into memory.

This is especially true for daemons that run infrequently, because they must receive an event before they try to run again. This is not necessarily bad, as long as they have the memory they need when they do have to run.

If, over time, you see swapping, you should probably consider adding memory to the system or devising a strategy to lower overall memory usage on the system.

Credit to https://support.oracle.com

1. Check for degraded memory on the system using the commands below.

# fmadm faulty

XSCF> showstatus
    CMU#1 Status:Normal;
*       MEM#11B Status:Degraded;

If you see a degraded part, raise a support case with Oracle to replace the unit.
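On a system with many components, the showstatus output can be long. Assuming you have saved the XSCF showstatus text to a file, this Python sketch pulls out every component whose status is not Normal. The regular expression is based only on the two lines shown above, so treat it as an assumption about the format. The function name is hypothetical.

```python
import re

# Matches lines such as "CMU#1 Status:Normal;" or "*  MEM#11B Status:Degraded;"
STATUS_RE = re.compile(r"^\*?\s*(\S+)\s+Status:([^;]+);")


def abnormal_components(showstatus_output: str) -> list[tuple[str, str]]:
    """Return (component, status) for every entry whose status is not Normal."""
    found = []
    for line in showstatus_output.splitlines():
        match = STATUS_RE.match(line.strip())
        if match and match.group(2) != "Normal":
            found.append((match.group(1), match.group(2)))
    return found


# The showstatus lines from the example above.
sample = """\
    CMU#1 Status:Normal;
*       MEM#11B Status:Degraded;
"""

if __name__ == "__main__":
    print(abnormal_components(sample))  # [('MEM#11B', 'Degraded')]
```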

2. Run the command below to check the scan rate (sr) and swap activity (w). Ignore the first line of output, which reports averages since the system booted.

# vmstat 5 5
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr m1 m1 m1 m2   in   sy   cs us sy id
 5 0 0 91618488 83691688 497 1634 263 145 143 0 0 4 4 4 81 35121 55605 42293 6 2 93
 2 0 0 60244800 52376544 448 3410 5 17 15 0 0 0 0 0  0 55068 72720 62179 15 3 81
 9 0 0 60082296 52388200 519 2080 2 39 39 0 0 0 0 0  0 54136 82583 60776 19 3 78
 2 0 0 60236480 52381312 934 3421 8 82 82 0 0 0 0 0  0 57312 81429 64225 20 4 76
 2 0 0 60245224 52382864 624 2833 0 68 68 0 0 0 0 0  0 53064 84185 58746 18 3 79
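To make the two checks from the start of this article concrete, this Python sketch reads captured vmstat output and reports the highest w and sr values. It skips the first sample, which holds the since-boot averages. It locates the columns by name from the "r b w" header line, so the disk columns (m1, m2 above) can differ between systems. The function name is hypothetical, and it needs at least two samples (for example 'vmstat 5 5').

```python
def vmstat_memory_check(output: str) -> dict:
    """Summarise the w (swapped out) and sr (scan rate) columns of
    Solaris vmstat output, ignoring the first (since-boot) sample."""
    lines = [line.split() for line in output.splitlines() if line.strip()]
    # The column-name line is the one starting with "r b w".
    hdr = next(i for i, f in enumerate(lines) if f[:3] == ["r", "b", "w"])
    w_col, sr_col = lines[hdr].index("w"), lines[hdr].index("sr")
    samples = lines[hdr + 1:][1:]  # drop the since-boot line
    max_w = max(int(f[w_col]) for f in samples)
    max_sr = max(int(f[sr_col]) for f in samples)
    return {"max_w": max_w, "max_sr": max_sr,
            "swapping": max_w > 0, "scanning": max_sr > 0}


# The vmstat output shown above.
sample = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr m1 m1 m1 m2   in   sy   cs us sy id
 5 0 0 91618488 83691688 497 1634 263 145 143 0 0 4 4 4 81 35121 55605 42293 6 2 93
 2 0 0 60244800 52376544 448 3410 5 17 15 0 0 0 0 0  0 55068 72720 62179 15 3 81
 9 0 0 60082296 52388200 519 2080 2 39 39 0 0 0 0 0  0 54136 82583 60776 19 3 78
 2 0 0 60236480 52381312 934 3421 8 82 82 0 0 0 0 0  0 57312 81429 64225 20 4 76
 2 0 0 60245224 52382864 624 2833 0 68 68 0 0 0 0 0  0 53064 84185 58746 18 3 79
"""

if __name__ == "__main__":
    print(vmstat_memory_check(sample))
    # {'max_w': 0, 'max_sr': 0, 'swapping': False, 'scanning': False}
```

For this sample, both sr and w stay at 0, so the system was neither scanning nor swapping during the interval.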

3. The command below helps you identify which local zones are using the most memory.

[root@Arena~]# prstat -s size -Z
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 21092 dbusr     17G  999M sleep    2    0   0:00:15 0.0% disp+work/1
 13224 dbusr     17G 1082M sleep   24    0   0:00:51 0.0% disp+work/1
 13162 dbusr     17G 1039M sleep   53    0   0:03:57 0.0% disp+work/1
 13199 dbusr     17G 1038M sleep   54    0   0:04:14 0.0% disp+work/1
 13174 dbusr     17G  957M sleep   54    0   0:02:10 0.0% disp+work/1
 13155 dbusr     17G 1020M sleep   38    0   0:03:16 0.0% disp+work/1
 13172 dbusr     17G 1034M sleep   49    0   0:04:10 0.0% disp+work/1
 13169 dbusr     17G  996M sleep   51    0   0:02:17 0.0% disp+work/1
 13888 dbusr     17G  986M sleep   59    0   0:00:16 0.0% disp+work/1
 10877 dbusr     17G 1027M sleep   47    0   0:00:12 0.0% disp+work/1
ZONEID   NPROC  SWAP    RSS   MEMORY   TIME        CPU    ZONE
    21    644  88G      44G     17%   357:55:56    15%   arenaz1
     1    101  186G     60G     24%   76:14:21     0.6%  arenaz2
     2    75   25G      16G     6.3%  95:16:41     0.9%  arenaz3
    11    70   15G      13G     5.1%  20:18:25     1.4%  arenaz4
     3    47   14G      7809M   3.0%  69:54:52     0.1%  arenaz5

As per the above output, arenaz2 is consuming the largest share of physical memory (24%).
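When there are many zones, you can rank them instead of reading the table by eye. This Python sketch parses the zone summary section of 'prstat -s size -Z' and sorts the zones by the MEMORY column. It assumes the eight-column layout shown above and skips any other line. The function name is hypothetical.

```python
def zones_by_memory(prstat_output: str) -> list[tuple[str, float]]:
    """Return (zone, MEMORY %) pairs from the zone summary of
    `prstat -s size -Z`, highest first."""
    lines = prstat_output.splitlines()
    start = next(i for i, l in enumerate(lines) if l.split()[:1] == ["ZONEID"])
    zones = []
    for line in lines[start + 1:]:
        fields = line.split()
        if len(fields) != 8:  # skip anything that is not a zone row
            continue
        zones.append((fields[7], float(fields[4].rstrip("%"))))
    return sorted(zones, key=lambda z: z[1], reverse=True)


# The zone summary shown above.
sample = """\
ZONEID   NPROC  SWAP    RSS   MEMORY   TIME        CPU    ZONE
    21    644  88G      44G     17%   357:55:56    15%   arenaz1
     1    101  186G     60G     24%   76:14:21     0.6%  arenaz2
     2    75   25G      16G     6.3%  95:16:41     0.9%  arenaz3
    11    70   15G      13G     5.1%  20:18:25     1.4%  arenaz4
     3    47   14G      7809M   3.0%  69:54:52     0.1%  arenaz5
"""

if __name__ == "__main__":
    print(zones_by_memory(sample)[0])  # ('arenaz2', 24.0)
```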

4. Once you have identified the problem local zone, log in to that zone and run the command below to determine which users' processes are holding the most memory.

 rootArenaz1 ~$ prstat -t -s size -c 1 1
 NPROC USERNAME  SWAP     RSS     MEMORY      TIME       CPU
   854    dbusr1   18G     16G     6.1%     246:40:45    0.2%
   310    mdbusr   132G    34G     13%      679:57:37    1.1%
   103    mdbusr2  20G     9613M   3.7%     53:41:06     0.1%
    61    mdbusr3  2114M   2209M   0.8%     7:20:57      0.0%
    46    adm2     11G     7839M   3.0%     59:11:41     0.2%
    63    mntg2    1470M   1368M   0.5%     12:36:33     0.0%
     3    dbusr2   2331M   2588M   1.0%     0:08:50      0.0%
    23    mdadm    1825M   1576M   0.6%     38:56:13     0.1%

As per the above output, user mdbusr is consuming the most memory (34G RSS, 13% of physical memory).
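The RSS column mixes units (G and M), so sorting it as text gives the wrong order. This Python sketch converts the sizes to megabytes and ranks users from 'prstat -t' output. It assumes prstat sizes end in K, M, G or T, and the helper names are hypothetical.

```python
UNITS = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def to_mb(size: str) -> float:
    """Convert a prstat size such as '9613M' or '34G' to megabytes."""
    return float(size[:-1]) * UNITS[size[-1].upper()]


def users_by_rss(prstat_output: str) -> list[tuple[str, float]]:
    """Return (user, RSS in MB) from `prstat -t` output, largest first."""
    rows = []
    for line in prstat_output.splitlines():
        fields = line.split()
        # User rows have 7 columns and start with a numeric NPROC value.
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        rows.append((fields[1], to_mb(fields[3])))
    return sorted(rows, key=lambda r: r[1], reverse=True)


# The prstat -t output shown above.
sample = """\
 NPROC USERNAME  SWAP     RSS     MEMORY      TIME       CPU
   854    dbusr1   18G     16G     6.1%     246:40:45    0.2%
   310    mdbusr   132G    34G     13%      679:57:37    1.1%
   103    mdbusr2  20G     9613M   3.7%     53:41:06     0.1%
    61    mdbusr3  2114M   2209M   0.8%     7:20:57      0.0%
    46    adm2     11G     7839M   3.0%     59:11:41     0.2%
    63    mntg2    1470M   1368M   0.5%     12:36:33     0.0%
     3    dbusr2   2331M   2588M   1.0%     0:08:50      0.0%
    23    mdadm    1825M   1576M   0.6%     38:56:13     0.1%
"""

if __name__ == "__main__":
    print(users_by_rss(sample)[:2])  # [('mdbusr', 34816.0), ('dbusr1', 16384.0)]
```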

5. On database servers, shared memory segments can sometimes hold most of the physical memory. See the link below for more information.

https://www.unixarena.com/2012/09/solaris-memory-leaks-due-shared-memory.html

6. If you need to monitor memory throughout the day, it is better to enable SAR (System Activity Reporter) on the system. See the link below for more information about SAR.

https://www.unixarena.com/2012/07/how-to-enable-sarsystem-activity.html
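Once SAR is collecting data, 'sar -r' reports free memory (freemem, in pages) and free swap (freeswap) for each interval. This Python sketch finds the interval with the least free memory. It assumes data rows of the form "HH:MM:SS freemem freeswap" and a page size of 8 KB, which is typical on SPARC (check yours with the pagesize command). The sample data and the function name are hypothetical.

```python
import re

TIME_RE = re.compile(r"^\d{2}:\d{2}:\d{2}$")


def lowest_freemem(sar_output: str, page_size: int = 8192):
    """Return (timestamp, free bytes) for the interval with the least freemem
    in `sar -r` output. page_size is an assumption (8 KB is typical on SPARC)."""
    lowest = None
    for line in sar_output.splitlines():
        fields = line.split()
        # Data rows: HH:MM:SS freemem freeswap (skips the header and Average rows)
        if len(fields) != 3 or not TIME_RE.match(fields[0]) or not fields[1].isdigit():
            continue
        free_bytes = int(fields[1]) * page_size
        if lowest is None or free_bytes < lowest[1]:
            lowest = (fields[0], free_bytes)
    return lowest


# Hypothetical sar -r output, for illustration only.
sample = """\
00:00:00 freemem freeswap
08:00:00  6553600 180000000
12:00:00  1310720 150000000
16:00:00  3276800 170000000

Average   3713707 166666667
"""

if __name__ == "__main__":
    when, free = lowest_freemem(sample)
    print(when, free / 1024 ** 3, "GB")  # 12:00:00 10.0 GB
```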

Thank you for reading this article.

Please leave a comment if you have any questions, and I will get back to you as soon as possible.