Tuesday, January 12, 2016

Solaris 11: Hands-on Lab – Replacing Failed Disks in ZFS Pools (Simple/Mirrored/RaidZ)

Quick Recap about ZFS Pools
  1. Simple and striped pool (equivalent to RAID-0; data is non-redundant)
  2. Mirrored pool (equivalent to RAID-1)
  3. RAIDZ pool (single parity, comparable to RAID-5; can withstand a single disk failure)
  4. RAIDZ-2 pool (double parity, comparable to RAID-6; can withstand two disk failures)
  5. RAIDZ-3 pool (triple parity; can withstand three disk failures)
RAIDZ Configuration Requirements and Recommendations

A RAIDZ configuration with N disks of size X with P parity disks can hold approximately (N-P)*X bytes and can withstand P device(s) failing before data integrity is compromised.
  • Start a single-parity RAIDZ (raidz) configuration at 3 disks (2+1)
  • Start a double-parity RAIDZ (raidz2) configuration at 6 disks (4+2)
  • Start a triple-parity RAIDZ (raidz3) configuration at 9 disks (6+3)
  • (N+P) with P = 1 (raidz), 2 (raidz2), or 3 (raidz3) and N equals 2, 4, or 6
  • The recommended number of disks per group is between 3 and 9. If you have more disks, use multiple groups
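The sizing rules above are easy to capture in a small helper script. This is a sketch only; the function names are mine, not Solaris commands:

```shell
# Sketch of the RAIDZ sizing rules of thumb (helper names are hypothetical).

# Approximate usable space of a RAIDZ vdev: (N - P) * X.
raidz_usable_gb() {    # args: total disks N, parity disks P, disk size X in GB
    echo $(( ($1 - $2) * $3 ))
}

# Recommended starting width per RAIDZ level (N data + P parity disks).
raidz_start_width() {
    case "$1" in
        raidz)  echo 3 ;;   # 2+1
        raidz2) echo 6 ;;   # 4+2
        raidz3) echo 9 ;;   # 6+3
        *)      return 1 ;;
    esac
}

raidz_usable_gb 6 2 4    # six 4 GB disks in raidz2 -> 16
```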
General consideration: is your goal maximum disk space or maximum performance?
  • A RAIDZ configuration maximizes disk space and generally performs well when data is written and read in large chunks (128K or more).
  • A RAIDZ-2 configuration offers better data availability, and performs similarly to RAIDZ. RAIDZ-2 has significantly better mean time to data loss (MTTDL) than either RAIDZ or 2-way mirrors.
  • A RAIDZ-3 configuration maximizes disk space and offers excellent availability because it can withstand 3 disk failures.
  • A mirrored configuration consumes more disk space but generally performs better with small random reads. 
Disk Failure Scenario for a Simple/Striped (Non-Redundant) ZFS Pool
Disk Configuration:
root@solarisbox:~# echo|format
Searching for disks…done

AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t4d0
          /pci@0,0/pci8086,2829@d/disk@4,0
       3. c3t5d0
          /pci@0,0/pci8086,2829@d/disk@5,0
       4. c3t6d0
          /pci@0,0/pci8086,2829@d/disk@6,0

Creating Simple ZFS Storage Pool
root@solarisbox:/dev/chassis# zpool create poolnr c3t2d0 c3t3d0
'poolnr' successfully created, but with no redundancy; failure of one device will cause loss of the pool
root@solarisbox:/dev/chassis# zpool list
NAME     SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
poolnr  3.97G  92.5K  3.97G   0%  1.00x  ONLINE  -
rpool   63.5G  5.21G  58.3G   8%  1.00x  ONLINE  -
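The CAP column of `zpool list` is handy for scripted checks. A hypothetical awk one-liner to pull it out, assuming the default column layout shown above (NAME SIZE ALLOC FREE CAP ...):

```shell
# Hypothetical sketch: print "pool capacity%" pairs from `zpool list` output,
# e.g. to flag pools that are filling up. Assumes the default column order,
# where CAP (with a trailing %) is the fifth field.
pool_caps() {
    awk 'NR > 1 { sub(/%$/, "", $5); print $1, $5 }'
}
# usage: zpool list | pool_caps
```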

Creating a Sample Filesystem on the New Pool
root@solarisbox:/dev/chassis# zfs create poolnr/testfs

root@solarisbox:/downloads# zpool status poolnr
  pool: poolnr
 state: ONLINE
  scan: none requested
config:

        NAME      STATE     READ WRITE CKSUM
        poolnr    ONLINE       0     0     0
          c3t2d0  ONLINE       0     0     0
          c3t3d0  ONLINE       0     0     0

errors: No known data errors


After manually simulating a failure of disk c3t2d0:
root@solarisbox:~# echo|format
Searching for disks…done

AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t4d0
          /pci@0,0/pci8086,2829@d/disk@4,0
       3. c3t5d0
          /pci@0,0/pci8086,2829@d/disk@5,0
       4. c3t6d0
          /pci@0,0/pci8086,2829@d/disk@6,0

root@solarisbox:~# zpool status poolnr
pool: poolnr
state: UNAVAIL
status: One or more devices are faulted in response to persistent errors.  There are insufficient replicas for the pool to
        continue functioning.
action: Destroy and re-create the pool from a backup source.  Manually marking the device repaired using 'zpool clear' may allow some data to be recovered.
  scan: none requested
config:
        NAME      STATE     READ WRITE CKSUM
        poolnr    UNAVAIL      0     0     0  insufficient replicas
          c3t2d0  FAULTED      1     0     0  too many errors
          c3t6d0  ONLINE       0     0     0


As the above scenario shows, a simple (striped) ZFS pool cannot withstand any disk failure: losing a single device makes the whole pool unavailable.


Disk Failure Scenario for Mirror Pool  

Initial Disk Configuration
root@solarisbox:~# echo|format
Searching for disks…done

AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t3d0
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c3t4d0
          /pci@0,0/pci8086,2829@d/disk@4,0               
       4. c3t7d0
          /pci@0,0/pci8086,2829@d/disk@7,0
Specify disk (enter its number): Specify disk (enter its number):


Create Mirror Pool
root@solarisbox:~# zpool create mpool mirror c3t4d0 c3t7d0

root@solarisbox:~# zfs create mpool/mtestfs

                >>> Copy some sample data to the new filesystem

root@solarisbox:~# df -h|grep  /mpool/mtestfs
mpool                  2.0G    32K       2.0G     1%    /mpool
mpool/mtestfs          2.0G    31K       2.0G     1%    /mpool/mtestfs

 
root@solarisbox:~# zpool status mpool
  pool: mpool
 state: ONLINE
  scan: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        mpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0
errors: No known data errors

After Manually simulating the Disk Failure
root@solarisbox:~# echo|format
Searching for disks…done

AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t3d0
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c3t4d0
          /pci@0,0/pci8086,2829@d/disk@4,0               
Specify disk (enter its number): Specify disk (enter its number):
               <== we lost the disk c3t7d0
           

Checking Pool Status after the Disk Failure
root@solarisbox:~# zpool status mpool
  pool: mpool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mpool       DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t7d0  UNAVAIL      0     0     0  cannot open

errors: No known data errors

After physically replacing the failed disk (placing the new disk in the same location)
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t3d0
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c3t4d0
          /pci@0,0/pci8086,2829@d/disk@4,0
       4. c3t7d0
          /pci@0,0/pci8086,2829@d/disk@7,0  << New Disk
>>> Label the new disk with an SMI (VTOC) label (a requirement before attaching it to the ZFS pool)

root@solarisbox:~# format -L vtoc -d c3t7d0
Searching for disks…done
selecting c3t7d0
[disk formatted]
c3t7d0 is labeled with VTOC successfully.


Replacing the Failed Disk in the ZFS Pool
root@solarisbox:~# zpool replace  mpool c3t7d0

root@solarisbox:~# zpool status -x mpool
pool 'mpool' is healthy

root@solarisbox:~# zpool status  mpool
  pool: mpool
 state: ONLINE
  scan: resilvered 210M in 0h0m with 0 errors on Sun Sep 16 10:41:21 2012
config:

        NAME        STATE     READ WRITE CKSUM
        mpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0    <<< Disk Online

errors: No known data errors

root@solarisbox:~#
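The whole mirror-repair sequence boils down to three commands. As a recap, here is a hypothetical helper that only prints the steps for review; it does not execute anything:

```shell
# Hypothetical recap helper: print the disk-replacement steps used above
# for a given pool/disk, so they can be reviewed before running by hand.
replace_disk_cmds() {    # args: pool, disk
    echo "format -L vtoc -d $2"      # label the new disk (SMI/VTOC)
    echo "zpool replace $1 $2"       # start the resilver
    echo "zpool status -x $1"        # confirm the pool is healthy again
}

replace_disk_cmds mpool c3t7d0
```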


Single and Double Disk Failure Scenarios for ZFS Raid-Z Pool 

Disk Configuration Available for new Raid-Z pool Creation
root@solarisbox:~# echo|format
Searching for disks…done

AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t3d0
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c3t4d0
          /pci@0,0/pci8086,2829@d/disk@4,0
       4. c3t7d0
          /pci@0,0/pci8086,2829@d/disk@7,0
Specify disk (enter its number): Specify disk (enter its number):

Creating New RaidZ Pool 
root@solarisbox:~# zpool create rzpool raidz c3t2d0 c3t3d0 c3t4d0 c3t7d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c3t2d0s0 is part of exported or potentially active ZFS pool poolnr. Please see zpool(1M).
        ==> Here one of the disks selected for the pool was rejected, because it had previously been used by another zpool. That old zpool no longer exists, and we want to reuse the disk for the new pool.
        ==> There are two ways to solve this:
1. Use the -f option to override the check.
2. Reinitialize the disk's partition table (Solaris x86 only).
        ==> In this example the whole disk is reinitialized as a Solaris partition with the command below.

root@solarisbox:~# fdisk -B /dev/rdsk/c3t3d0p0

root@solarisbox:~# zpool create rzpool raidz c3t2d0 c3t3d0 c3t4d0 c3t7d0

root@solarisbox:~# zpool status rzpool
  pool: rzpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rzpool      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0

errors: No known data errors


Create File system and Copy some test data to rzpool/r5testfs
root@solarisbox:~# zfs create rzpool/r5testfs
root@solarisbox:/downloads# df -h|grep test
rzpool/r5testfs        5.8G   575M       5.3G    10%    /rzpool/r5testfs

root@solarisbox:/downloads# cd /rzpool/r5testfs/

root@solarisbox:/rzpool/r5testfs# ls -l
total 1176598
-rw-r--r--   1 root     root     602057762 Sep 16 11:09 OLE6-U2-VM-Template.zip

root@solarisbox:/rzpool/r5testfs#


After manually simulating a disk failure (i.e. c3t7d0)
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t3d0
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c3t4d0
          /pci@0,0/pci8086,2829@d/disk@4,0             <<== c3t7d0 missing
Specify disk (enter its number): Specify disk (enter its number):


Checking the zpool Status – it is in Degraded State 
root@solarisbox:~# zpool status -x rzpool
  pool: rzpool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        rzpool      DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t7d0  UNAVAIL      0     0     0  cannot open
errors: No known data errors


Checking if the File system is Still Accessible
root@solarisbox:~# df -h |grep testfs
rzpool/r5testfs        5.8G   575M       5.3G    10%    /rzpool/r5testfs
root@solarisbox:~# cd /rzpool/r5testfs
root@solarisbox:/rzpool/r5testfs# ls -l
total 1176598
-rw-r--r--   1 root     root     602057762 Sep 16 11:09 OLE6-U2-VM-Template.zip

root@solarisbox:/rzpool/r5testfs#


After replacing the failed disk with new disk, in the same location
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t3d0
          /pci@0,0/pci8086,2829@d/disk@3,0
       3. c3t4d0
          /pci@0,0/pci8086,2829@d/disk@4,0
       4. c3t7d0
          /pci@0,0/pci8086,2829@d/disk@7,0
Specify disk (enter its number): Specify disk (enter its number):
root@solarisbox:~# zpool status -x
  pool: rzpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
  scan: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        rzpool      DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t7d0  FAULTED      0     0     0  corrupted data     <<== the state changed to FAULTED because the pool can now see the new disk, but the disk carries no valid label/data
errors: No known data errors


Replacing the Failed Disk Component in the Zpool
root@solarisbox:~# zpool replace rzpool c3t7d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c3t7d0s0 is part of exported or potentially active ZFS pool mpool. Please see zpool(1M).

root@solarisbox:~# zpool replace -f rzpool c3t7d0   <<== using -f option to override above message

root@solarisbox:~# zpool status -x
all pools are healthy

root@solarisbox:~# zpool status rzpool
  pool: rzpool
 state: ONLINE
  scan: resilvered 192M in 0h1m with 0 errors on Sun Sep 16 11:50:49 2012
config:
        NAME        STATE     READ WRITE CKSUM
        rzpool      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0
errors: No known data errors


Two-Disk Failure Scenario for a RAIDZ Pool – And It Fails

Zpool Status Before Disk Failure
root@solarisbox:~# zpool status rzpool
  pool: rzpool
 state: ONLINE
  scan: resilvered 192M in 0h1m with 0 errors on Sun Sep 16 11:50:49 2012
config:
        NAME        STATE     READ WRITE CKSUM
        rzpool      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0

Disk Configuration After Simulating double disk failure
root@solarisbox:~# echo|format
Searching for disks…done
AVAILABLE DISK SELECTIONS:
       0. c3t0d0
          /pci@0,0/pci8086,2829@d/disk@0,0
       1. c3t2d0
          /pci@0,0/pci8086,2829@d/disk@2,0
       2. c3t3d0
          /pci@0,0/pci8086,2829@d/disk@3,0   <== c3t4d0 & c3t7d0 missing
Specify disk (enter its number): Specify disk (enter its number):

Zpool Status after the Double Disk Failure 
root@solarisbox:~# zpool status -x
  pool: rzpool
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
  scan: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        rzpool      UNAVAIL      0     0     0  insufficient replicas
          raidz1-0  UNAVAIL      0     0     0  insufficient replicas
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  UNAVAIL      0     0     0  cannot open
            c3t7d0  UNAVAIL      0     0     0  cannot open

Conclusion: the /rzpool/r5testfs filesystem is no longer available, and the pool cannot be recovered from this state; a raidz1 pool can survive only a single disk failure.
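Failures like these are much cheaper to handle when caught early. A minimal health check suitable for cron; this is a sketch, assuming `zpool status -x` prints exactly "all pools are healthy" when nothing is wrong, as shown earlier:

```shell
# Hypothetical health-check sketch for cron: decide from `zpool status -x`
# output whether an alert is needed.
pools_need_attention() {
    [ "$1" != "all pools are healthy" ]
}

# usage (on a real system; mailx invocation is illustrative):
#   status=$(zpool status -x)
#   pools_need_attention "$status" && echo "ZFS ALERT: $status" | mailx root
```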

Solaris 11.2 – Security Compliance

With version 11.1 Oracle added OpenSCAP to its Solaris IPS repository.
OpenSCAP uses NIST standards (SCAP) to verify the compliance of a system, whether that concerns installed packages or certain system configurations. This sounds really great, but it is not that easy to handle: there are several tools for the different data-exchange formats and for creating your own checks, so you end up with a handful of utilities to manage the compliance topic. Still better than nothing, though, or doing it all by hand.
The Solaris engineers, though, seem to have sympathized with their users and used their Python expertise to simplify the experience. With Solaris 11.2 there are only a few things to know to get started.
OpenSCAP is still installed underneath, but you no longer need its complex command structure. With Solaris 11.2 it is all about compliance, and that is the name of the command, too. Easy, right?
Let’s start with the compliance command.
# compliance
No command specified
Usage:
        compliance list [-v] [-p]
        compliance list -b [-v] [-p] [benchmark ...]
        compliance list -a [-v] [assessment ...]
        compliance guide [-p profile] [-b benchmark] [-o file]
        compliance guide -a
        compliance assess [-p profile] [-b benchmark] [-a assessment]
        compliance report [-f format] [-s what] [-a assessment] [-o file]
        compliance delete assessment
As you can see, this is almost trivial to use; the command speaks for itself. list shows information about benchmarks, profiles, and assessments. guide is great for people who like to read about a feature before using it ;). assess really gets you going and by default prints everything to stdout. report lets you generate reports in three formats (log, xccdf, and html).
After you have installed compliance
# pkg install compliance
you are ready to run compliance checks. And as I said before, it is simple, with no additional configuration needed.
# compliance assess
Assessment will be named 'solaris.Baseline.2015-02-02,11:14'
        Package integrity is verified
        OSC-54005
...
        Check all default audit properties
        OSC-02000
        pass
Done. If you just want to get started with compliance and get the hang of it, this is all you need. Plain assess uses the default benchmark and its default profile, in this case solaris with Baseline. You could also type compliance assess -b solaris -p Baseline, but there is no need for all the extra typing unless you want a different benchmark and/or profile.
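If you save the assess output (or read the log file it writes), a quick tally is one grep away. A hypothetical sketch, assuming the verdict (pass/fail) appears on a line of its own, as in the output above:

```shell
# Hypothetical sketch: count pass/fail verdicts in saved `compliance assess`
# output, assuming each verdict sits on its own (possibly indented) line.
count_verdicts() {    # arg: verdict word, e.g. pass or fail
    grep -c "^[[:space:]]*$1[[:space:]]*\$"
}
# usage: count_verdicts pass < assessment.log
```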
#  compliance list -p
Benchmarks:
pci-dss:        Solaris_PCI-DSS
solaris:        Baseline, Recommended
Assessments:
        solaris.Baseline.2014-12-22,20:52
As you can see above, -p lists not only the available assessment(s) and benchmarks but also their profile(s).
The following will run the pci-dss benchmark.
# compliance assess -b pci-dss
Let’s check out the report command. As I mentioned earlier in this post, compliance in Solaris 11.2 is all about giving the user a simple, administrative way to take care of compliance.
So this is how you generate a html report:
# compliance report
/var/share/compliance/assessments/solaris.Baseline.2015-02-02,11:14/report.html
 

The report header includes information such as the hostname, date, and profile. The score indicates how many of the executed tests passed or failed. For more details, look at the Rule Results Summary: out of 200 rules, 125 passed, 18 failed, and 57 were not selected. If a rule fails, just click its link and more information is provided.

For example, the following command creates an assessment using the Recommended profile.
# compliance assess -p Recommended -a recommended
The command creates a directory in /var/share/compliance/assessments named recommended that contains the assessment in three files: a log file, an XML file, and an HTML file.
# cd /var/share/compliance/assessments/recommended
# ls
recommended.html
recommended.txt
recommended.xml
If you run this command again, the files are not replaced. You must remove the files before reusing an assessment directory.
(Optional) Create a customized report.
# compliance report -s -pass,fail,notselected
/var/share/compliance/assessments/recommended/report.-pass,fail,notselected.html
This command creates a report that contains failed and not selected items in HTML format. The report is run against the most recent assessment.
You can run customized reports repeatedly. However, the full report (that is, the assessment itself) can be generated only once in the original directory.
View the full report.
You can view the log file in a text editor, view the HTML file in a browser, or view the XML file in an XML viewer.
For example, to view the customized HTML report from the preceding step, type the following browser entry:
file:///var/share/compliance/assessments/recommended/report.-pass,fail,notselected.html

Fix any failures that your security policy requires to pass.
  1. Complete the fix for the entry that failed.
  2. If the fix includes rebooting the system, reboot the system before running the assessment again.
(Optional) Run the compliance command as a cron job.
# crontab -e
For daily compliance assessments at 2:30 a.m., root adds the following entry:
30 2 * * * /usr/bin/compliance assess -b solaris -p Baseline
For weekly compliance assessments at 1:15 a.m. Sundays, root adds the following entry:
15 1 * * 0 /usr/bin/compliance assess -b solaris -p Recommended
For monthly assessments on the first of the month at 4:00 a.m., root adds the following entry:
0 4 1 * * /usr/bin/compliance assess -b pci-dss
For assessments on the first Monday of the month at 3:45 a.m., root adds the following entry:
45 3 1,2,3,4,5,6,7 * 1 /usr/bin/compliance assess


(Optional) Create a guide for some or all of the benchmarks that are installed on your system.
# compliance guide -a
A guide contains the rationale for each security check and the steps to fix a failed check. Guides can be useful for training and as guidelines for future testing. By default, guides for each security profile are created at installation. If you add or change a benchmark, you might create a new guide.