
How to Rebuild Hardware RAID from the OS Level?

Is it possible to rebuild a hardware RAID from the operating system? Yes. Using MegaCLI or LSIutil, we can rebuild the hardware RAID for non-OS disks from within the OS. In my case, we lost one of the HDDs that was part of a RAID 0 volume. It held non-critical data, and we planned to restore the data from backup after replacing the faulty drive. A RAID 0 array does not rebuild automatically: it has no redundancy, so the whole array fails when a single disk fails. You need to re-create the RAID 0 array from scratch after replacing the drive. Let’s assume such a scenario here.

1. List the available SCSI devices.

UA-RHEL-6# lsscsi
[1:0:1:0]    disk    CISCO-UA TBE2846RC        SC19  -
[1:0:2:0]    disk    CISCO-UA TBE2846RC        SC19  -
[1:0:3:0]    disk    CISCO-UA TBE2846RC        SC19  -
[1:0:4:0]    disk    CISCO-UA TBE2846RC        SC19  -
[1:0:5:0]    disk    CISCO-UA TBE2846RC        SC19  /dev/sdf
[1:0:6:0]    disk    CISCO-UA TBE2846RC        SC19  /dev/sdg
[1:0:7:0]    disk    CISCO-UA TBE2846RC        SC19  /dev/sdh
[1:1:1:0]    disk    LSILOGIC Logical Volume   3000  /dev/sdj
[1:1:3:0]    disk    LSILOGIC Logical Volume   3000  /dev/sdi
UA-RHEL-6#
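
On a system with many disks, it can help to filter the listing down to the RAID logical volumes. The following is a minimal sketch and not part of the original procedure: list_logical_volumes is a hypothetical helper name, and it assumes the lsscsi output format shown above, where the device node is the last column.

```shell
# list_logical_volumes
# Read lsscsi output on stdin and print the SCSI address and the device
# node of every entry whose model string is "Logical Volume".
list_logical_volumes() {
    awk '/Logical Volume/ { print $1, $NF }'
}

# Usage on a live system (assumes lsscsi is installed):
#   lsscsi | list_logical_volumes
```

Matching on the model string rather than on a device name is deliberate, since device names such as /dev/sdi can change after a volume is re-created.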

 

At the OS level, we could see that /dev/sdi had failed and the filesystem was reporting I/O errors. From the hardware console, we could see that one of the HDDs in the RAID 0 volume had failed. We opened a vendor case to replace the HDD. The hardware vendor advised us to re-create the RAID 0 array, which had failed because of the disk failure.

Remove the failed device from the OS device tree using “echo 1 > /sys/block/sdi/device/delete”. This removes /dev/sdi from the system.
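
The removal can be wrapped in a small function that first checks whether the disk is still mounted. This is only a sketch under stated assumptions: remove_scsi_disk is a hypothetical name, SYSFS_ROOT is a hypothetical override that exists only so the function can be tried against a scratch directory, and the mount check is deliberately simple.

```shell
# remove_scsi_disk DISK
# Detach a failed disk (for example "sdi") from the kernel's device tree.
# SYSFS_ROOT is a hypothetical override used only for dry runs against a
# scratch directory; on a real system it defaults to /sys.
remove_scsi_disk() {
    local disk=$1
    local sysfs=${SYSFS_ROOT:-/sys}
    local node="$sysfs/block/$disk/device/delete"

    # Simple safety check: refuse if the disk or one of its partitions is
    # listed in /proc/mounts. LVM or multipath setups need a stricter check.
    if grep -q "^/dev/${disk}[0-9]* " /proc/mounts 2>/dev/null; then
        echo "/dev/$disk is still mounted; unmount it first" >&2
        return 1
    fi

    if [ ! -e "$node" ]; then
        echo "$node not found; is $disk a SCSI disk?" >&2
        return 1
    fi

    # Writing 1 to the delete node asks the kernel to remove the device.
    echo 1 > "$node"
}

# Usage as root on the affected host:
#   remove_scsi_disk sdi
```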

2. Verify the volume status using “lsiutil”.

UA-RHEL-6# lsiutil
LSI Logic MPT Configuration Utility, Version 1.60, July 21, 2010
1 MPT Ports found

     Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
1.  /proc/mpt/ioc1    LSI Logic SAS1086E B3     105      011e0a00     0  -->  Select the controller by selecting "1".

Select a device:  [1 or 0 to quit] 1

1.  Identify firmware, BIOS, and/or FCode
2.  Download firmware (update the FLASH)
4.  Download/erase BIOS and/or FCode (update the FLASH)
8.  Scan for devices
10.  Change IOC settings (interrupt coalescing)
13.  Change SAS IO Unit settings
16.  Display attached devices
20.  Diagnostics
21.  RAID actions              --------------> Select the RAID actions by selecting "21".
22.  Reset bus
23.  Reset target
42.  Display operating system names for devices
45.  Concatenate SAS firmware and NVDATA files
59.  Dump PCI config space
60.  Show non-default settings
61.  Restore default settings
66.  Show SAS discovery errors
69.  Show board manufacturing information
97.  Reset SAS link, HARD RESET
98.  Reset SAS link
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 21

1.  Show volumes    ------------------------------> Check the volume status by selecting "1"
2.  Show physical disks
3.  Get volume state
4.  Wait for volume resync to complete
23.  Replace physical disk
26.  Disable drive firmware update mode
27.  Enable drive firmware update mode
30.  Create volume
31.  Delete volume
32.  Change volume settings
33.  Change volume name
50.  Create hot spare
51.  Delete hot spare
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 1

1 volumes are active, 2 physical disks are active

Volume 0 is Bus 0 Target 3, Type IS (Integrated Striping)
  Volume Name:
  Volume WWID:  0ed4cb5783a5b6dg
  Volume State:  failed, enabled
  Volume Settings:  write caching disabled, auto configure
  Volume draws from Hot Spare Pools:  0
  Volume Size 418164 MB, Stripe Size 64 KB, 3 Members
  Member 0 is PhysDisk 4 (Bus 0 Target 9)
  Member 1 is PhysDisk 3 (Bus 0 Target 11)
  Member 2 is PhysDisk 0 (     -         )

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit]   ------> To Quit , enter "0" .

Here we can see that “Volume 0” is in a failed state. Physical disk 0 is missing and needs to be replaced by the hardware vendor.
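
If you want to check the volume state from a script instead of reading the menu output by eye, the text can be filtered with awk. This is a rough sketch: non_optimal_volumes is a hypothetical helper, and it assumes the “Show volumes” layout printed above. The non-interactive lsiutil invocation in the usage comment is an assumption about your lsiutil build; verify it against the tool's help text before relying on it.

```shell
# non_optimal_volumes
# Read the "Show volumes" text on stdin and print every volume whose
# state line does not contain "optimal".
non_optimal_volumes() {
    awk '
        # Remember the number of the volume currently being described.
        /^Volume [0-9]+ is / { vol = $2 }

        # Report the state line unless it says "optimal".
        /Volume State:/ && !/optimal/ {
            sub(/^.*Volume State:[ \t]*/, "")
            print "Volume " vol ": " $0
        }
    '
}

# Possible usage (ASSUMPTION: this lsiutil build accepts -p for the port
# and -a for a comma-separated list of menu answers):
#   lsiutil -p 1 -a 21,1,0,0 | non_optimal_volumes
```

An empty result means every listed volume reported an optimal state.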

Once the disk has been replaced, the volume state remains the same, but you will be able to see the newly inserted disk, as shown below.

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 1

1 volumes is active, 3 physical disks are active

Volume 0 is Bus 0 Target 3, Type IS (Integrated Striping)
  Volume Name:
  Volume WWID:  0ed4cb5783a5b6dg
  Volume State:  failed, enabled
  Volume Settings:  write caching disabled, auto configure
  Volume draws from Hot Spare Pools:  0
  Volume Size 418164 MB, Stripe Size 64 KB, 3 Members
  Member 0 is PhysDisk 4 (Bus 0 Target 9)
  Member 1 is PhysDisk 3 (Bus 0 Target 11)
  Member 2 is PhysDisk 0 (Bus 0 Target 5)

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit]

In the above output, we can see that the system now sees “Target 5”, which was not available before the disk replacement. At this point all three disks are available for the RAID 0 volume, but the array is still in a failed state, so let's delete the failed array.
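
Before deleting anything, it may be worth confirming that every member now reports a bus and target. A minimal sketch, assuming the “Member N is PhysDisk M (...)” lines look exactly like the output above; missing_members is a hypothetical helper name.

```shell
# missing_members
# Read the "Show volumes" text on stdin and print each member line that
# has no "Target <n>" entry, i.e. a member whose disk is not visible.
missing_members() {
    awk '/Member [0-9]+ is PhysDisk/ && !/Target [0-9]+/ {
        print "Member " $2 " (PhysDisk " $5 ") has no target"
    }'
}

# Usage: save or pipe the "Show volumes" text into the function, e.g.
#   missing_members < show_volumes.txt
```

Here show_volumes.txt is a placeholder for wherever you captured the menu output. No output means all members are visible to the controller.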

3. Delete the failed array. From lsiutil -> RAID actions, select option 31 (Delete volume).

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit]

1.  Show volumes
2.  Show physical disks
3.  Get volume state
4.  Wait for volume resync to complete
23.  Replace physical disk
26.  Disable drive firmware update mode
27.  Enable drive firmware update mode
30.  Create volume
31.  Delete volume     --------------------------------------> Delete the volume .
32.  Change volume settings
33.  Change volume name
50.  Create hot spare
51.  Delete hot spare
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 31

Volume 0 is Bus 0 Target 3, Type IS (Integrated Striping)

Volume:  [0-1 or RETURN to quit] 0

All data on Volume 0 will be lost!

Are you sure you want to continue?  [Yes or No, default is No] yes
Zero the first block of all volume members?  [Yes or No, default is No]

Volume 0 is being deleted

RAID ACTION returned IOCLogInfo = 00000001

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit]

4. Re-create the RAID 0 array with three physical disks.

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit]

1.  Show volumes
2.  Show physical disks
3.  Get volume state
4.  Wait for volume resync to complete
23.  Replace physical disk
26.  Disable drive firmware update mode
27.  Enable drive firmware update mode
30.  Create volume
31.  Delete volume
32.  Change volume settings
33.  Change volume name
50.  Create hot spare
51.  Delete hot spare
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 30

     B___T___L  Type       Vendor   Product          Rev   Disk Blocks  Disk MB
1.  0   3   0  Disk       CISCO-UA TBE2846RC        SC16    286748000   140013
2.  0   5   0  Disk       CISCO-UA TBE2846RC        SC16    286748000   140013
3.  0   6   0  Disk       CISCO-UA TBE2846RC        SC16    286748000   140013
4.  0   7   0  Disk       CISCO-UA TBE2846RC        SC16    286748000   140013
5.  0   8   0  Disk       CISCO-UA TBE2846RC        SC16    286748000   140013
6.  0  11   0  Disk       CISCO-UA TBE2846RC-89     B63K    286748000   140013

To create a volume, select 1 or more of the available targets
  select 3 to 10 targets for a mirrored volume
  select 1 to 10 targets for a striped volume

Select a target:  [1-6 or RETURN to quit] 3      -------- > Enter the first disk Number 
Select a target:  [1-6 or RETURN to quit] 4      -------- > Enter the second disk Number
Select a target:  [1-6 or RETURN to quit] 6      -------- > Enter the Third disk Number
Select a target:  [1-6 or RETURN to quit]        ---------> Just Press Enter 

3 physical disks were created

Select volume type:  [0=Mirroring, 1=Striping, default is 0] 1
Select volume size:  [1 to 418164 MB, default is 418164]
A stripe size of 64 KB will be used
Enable write caching:  [Yes or No, default is No]
Zero the first and last blocks of the volume?  [Yes or No, default is No]
Skip initial volume resync?  [Yes or No, default is No]

Volume was created

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit]

 

5. Check the volume status.

1.  Show volumes
2.  Show physical disks
3.  Get volume state
4.  Wait for volume resync to complete
23.  Replace physical disk
26.  Disable drive firmware update mode
27.  Enable drive firmware update mode
30.  Create volume
31.  Delete volume
32.  Change volume settings
33.  Change volume name
50.  Create hot spare
51.  Delete hot spare
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 1

1 volume is active, 3 physical disks are active

Volume 1 is Bus 0 Target 6, Type IS (Integrated Striping)
  Volume Name:
  Volume WWID:  08703748d1be5b0e
  Volume State:  optimal, enabled
  Volume Settings:  write caching disabled, auto configure
  Volume Size 418164 MB, Stripe Size 64 KB, 3 Members
  Member 0 is PhysDisk 0 (Bus 0 Target 9)
  Member 1 is PhysDisk 3 (Bus 0 Target 7)
  Member 2 is PhysDisk 4 (Bus 0 Target 11)

RAID actions menu, select an option:  [1-99 or e/p/w or 0 to quit] 0

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 0

We have successfully re-created the RAID 0 volume using lsiutil.

6. Check the new logical RAID 0 volume using lsscsi.

UA-RHEL-6# lsscsi
[1:0:1:0]    disk    CISCO-UA TBE2846RC        SC19  -
[1:0:2:0]    disk    CISCO-UA TBE2846RC        SC19  -
[1:0:3:0]    disk    CISCO-UA TBE2846RC        SC19  -
[1:0:4:0]    disk    CISCO-UA TBE2846RC        SC19  -
[1:0:5:0]    disk    CISCO-UA TBE2846RC        SC19  /dev/sdi
[1:0:6:0]    disk    CISCO-UA TBE2846RC        SC19  /dev/sdg
[1:0:7:0]    disk    CISCO-UA TBE2846RC        SC19  /dev/sdh
[1:1:1:0]    disk    LSILOGIC Logical Volume   3000  /dev/sdj
[1:1:3:0]    disk    LSILOGIC Logical Volume   3000  /dev/sdf
UA-RHEL-6#

The logical volume that was previously /dev/sdi is now presented as /dev/sdf. “LSILOGIC Logical Volume” represents the RAID logical volume. Create a new filesystem on it and restore the lost data from backup.
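
The filesystem step might look like the sketch below. Everything specific in it is an assumption for illustration: prepare_volume is a hypothetical helper, /data is a hypothetical mount point, and ext4 is only a default that you should change to match the filesystem you had before. By default the function only prints the commands; set RUN=1 to execute them.

```shell
# prepare_volume DEVICE MOUNTPOINT [FSTYPE]
# Print (or, with RUN=1, execute) the commands that create a filesystem
# on the new logical volume and mount it. FSTYPE defaults to ext4.
prepare_volume() {
    local dev=$1 mnt=$2 fstype=${3:-ext4}
    local run=echo              # dry run: prefix every command with echo

    if [ "${RUN:-0}" = "1" ]; then
        run=                    # real run: execute the commands directly
    fi

    $run mkfs -t "$fstype" "$dev"
    $run mkdir -p "$mnt"
    $run mount "$dev" "$mnt"
}

# Dry run first, then the real thing once the device name is confirmed
# with lsscsi:
#   prepare_volume /dev/sdf /data
#   RUN=1 prepare_volume /dev/sdf /data
```

Running the dry run first is a cheap guard against formatting the wrong device, since device names can shift after the volume is re-created.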

Hope this article is informative to you. Thank you for visiting UnixArena.
