As you may know, Solaris zones are not completely isolated from the global zone. All local zones run as instances under the global zone and depend on the global zone's kernel. For example, you can see every local-zone process from the global zone using the "ps -efZ" command, which shows that zones are not fully isolated. Sometimes this mechanism makes the Unix admin's job harder. Recently I have seen local zones that refuse to halt and instead remain stuck permanently in the transient "shutting_down" state.
Here is my local zone's status after issuing a reboot command inside it. The "shutting_down" state indicates that the zone is being halted.
bash-3.00# zoneadm list -cv
  ID NAME    STATUS         PATH                 BRAND   IP
   0 global  running        /                    native  shared
   3 sol1    shutting_down  /export/zone/sollz1  native  shared
Sometimes the zone may go to the "down" state as well.
bash-3.00# zoneadm list -cv
  ID NAME    STATUS         PATH                 BRAND   IP
   0 global  running        /                    native  shared
   2 sol1    down           /export/zone/sollz1  native  shared
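Both stuck states can be spotted programmatically. Below is a minimal sketch (not a standard tool) that filters `zoneadm list -cv` output for zones in either transient state; the awk column positions assume the exact output format shown above.

```shell
#!/bin/sh
# Print the name of every zone whose STATUS column is "shutting_down"
# or "down". NR > 1 skips the header line; $3 is the STATUS column and
# $2 the zone name, per the `zoneadm list -cv` layout shown above.
zoneadm list -cv |
    awk 'NR > 1 && ($3 == "shutting_down" || $3 == "down") { print $2 }'
```

Running it periodically (e.g. from cron) gives early warning that a zone has wedged during halt.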
In the end, we had to reboot the global zone to fix the issue. I tried the following to bring the local zone down, but none of it worked for me.
1. Find the local zone's zoneadmd processes and try to kill them.
bash-3.00# ps -ef |grep zoneadmd |grep sol1
    root  4763  4762   0 14:41:15 ?        0:00 zoneadmd -z sol1
    root  4783 29263   0 14:42:20 pts/4    0:00 grep zoneadmd
    root  4762  4761   0 14:41:15 pts/4    0:00 zoneadmd -z sol1
bash-3.00# kill -9 4763
bash-3.00# kill -9 4762
bash-3.00# zoneadm -z sol1 halt
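That manual kill can be scripted with one refinement: selecting PIDs by matching the trailing `zoneadmd -z <zone>` fields of `ps -ef` output, so the grep/awk pipeline itself is never picked up. This is a sketch of the same last-resort attempt, not a recommended procedure; as the transcript shows, it may still fail to halt the zone.

```shell
#!/bin/sh
# Last resort: kill the zoneadmd daemons for one stuck zone, then
# retry the halt. ZONE=sol1 mirrors the example above.
ZONE=sol1

# Match only real "zoneadmd -z <zone>" processes: in `ps -ef` output
# those are the last three fields of the line, which a "grep zoneadmd"
# process never has, so the pipeline cannot select itself.
pids=$(ps -ef | awk -v z="$ZONE" \
    '$(NF-2) == "zoneadmd" && $(NF-1) == "-z" && $NF == z { print $2 }')

for pid in $pids; do
    kill -9 "$pid"
done
zoneadm -z "$ZONE" halt
```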
We raised an Oracle support case for this, and they confirmed it is a known issue.
To find the root cause, they requested that we collect the following information:
bash-3.00# zoneadm list -cv |grep -i down
  ID NAME    STATUS         PATH                 BRAND   IP
   0 global  running        /                    native  shared
   3 sol1    shutting_down  /export/zone/sollz1  native  shared
bash-3.00# pgrep -fl sol1
  320 zpool-sol1pool
 4761 zoneadm -z sol1 boot
13675 zlogin sol1 halt
bash-3.00# pstack 13675
13675:  zlogin sol1 halt
 fee2b075 read     (0, 80442c0, 400)
 08052de8 ???????? (4, 5)
 08053017 ???????? (4, 5, 6, 8, 1)
 08053f84 ???????? (80476db, 8046d50, 1, 8166d60, 8176f50)
 080549cf main     (3, 80475a8, 80475b8) + 740
 080520da ???????? (3, 80476d4, 80476db, 80476e0, 0, 80476e5)
bash-3.00# ptree 13675
5021  /usr/bin/gnome-terminal
  5024  sh
    5025  bash
      13675 zlogin sol1 halt
        13676
bash-3.00# pfiles 13675
13675:  zlogin sol1 halt
  Current rlimit: 256 file descriptors
   0: S_IFCHR mode:0620 dev:296,0 ino:12582922 uid:0 gid:7 rdev:24,3
      O_RDWR
      /devices/pseudo/pts@0:3
   1: S_IFCHR mode:0620 dev:296,0 ino:12582922 uid:0 gid:7 rdev:24,3
      O_RDWR
      /devices/pseudo/pts@0:3
   2: S_IFCHR mode:0620 dev:296,0 ino:12582922 uid:0 gid:7 rdev:24,3
      O_RDWR
      /devices/pseudo/pts@0:3
   4: S_IFIFO mode:0000 dev:294,0 ino:12236 uid:0 gid:0 size:0
      O_RDWR
   5: S_IFIFO mode:0000 dev:294,0 ino:12236 uid:0 gid:0 size:0
      O_RDWR
   6: S_IFIFO mode:0000 dev:294,0 ino:12237 uid:0 gid:0 size:0
      O_RDWR
   8: S_IFIFO mode:0000 dev:294,0 ino:12238 uid:0 gid:0 size:0
      O_RDWR
bash-3.00# gcore 13675
gcore: core.13675 dumped
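A small wrapper can collect all of that output in one pass. The sketch below uses a directory layout of my own choosing (/var/tmp/zone-debug.<zone>) rather than anything Oracle mandates, and hard-codes the stuck zlogin PID from the example; substitute the PID reported by pgrep on your system.

```shell
#!/bin/sh
# Collect the diagnostics requested by support into one directory.
ZONE=${1:-sol1}
OUT=/var/tmp/zone-debug.$ZONE
mkdir -p "$OUT"

# run NAME CMD... : capture stdout+stderr of CMD into $OUT/NAME.txt
run() {
    name=$1; shift
    "$@" > "$OUT/$name.txt" 2>&1
}

run zoneadm-list zoneadm list -cv
run pgrep        pgrep -fl "$ZONE"

# PID of the stuck "zlogin <zone> halt" process, as found with pgrep
pid=13675
run pstack pstack "$pid"
run ptree  ptree  "$pid"
run pfiles pfiles "$pid"
( cd "$OUT" && gcore "$pid" )    # drops core.<pid> alongside the logs
```

Everything under /var/tmp/zone-debug.<zone> can then be tarred up and attached to the support case.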
Provide all of the above command output and the core file to Oracle support so they can find the root cause of the issue.
2. In some cases, none of these commands will work. In that situation, it is better to reboot the global zone with "reboot -d" to generate a crash dump. After the global zone comes back up, you can upload the crash dump to Oracle support for root-cause analysis.
A zone can become stuck in one of these states if it is unable to tear down the application environment (such as mounted file systems) or if some portion of the virtual platform cannot be destroyed. Such cases require operator intervention, and most of the time you end up rebooting the global zone to resolve them.
The conclusion is that there is no way to force-halt a zone once it is stuck in one of the states described above; you have to reboot the global zone to recover.
Thank you for reading this article. Please leave a comment if you have any questions.