If the SYSTEMDG diskgroup does not mount (for example, because of multiple disk failures), the following procedure can be used to recreate it.

Follow this procedure only if you are NOT able to mount the SYSTEMDG diskgroup even with the FORCE option.

Sometimes a diskgroup fails to mount with an error about a missing disk. If all the other member disks are available, the diskgroup can be mounted with the FORCE option and the missing disk dropped afterwards. Once the disk is replaced, it can be added back to the diskgroup.

Fix

This note assumes that the OCR and voting disk are on the SYSTEMDG diskgroup and that the diskgroup is not mounting, so crs and css do not come up.

1. You should have a backup of the OCR file.

    Check the following location on all the db nodes and pick the latest backup of the OCR:
   

$GRID_HOME/crsdata//backup.ocr

    
   Copy this file to the node from which you will perform the steps in this note.
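
Once the backups from all nodes have been gathered in one place, picking the latest one can be sketched as below. This is only a minimal sketch and not part of the original procedure: the function name newest_file and the staging directory /tmp/ocr_backups are hypothetical, and it assumes the copies kept their original modification times (for example, copied with scp -p).

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified file among the
# paths given as arguments. Assumes file names do not contain newlines.
newest_file() {
    ls -t -- "$@" 2>/dev/null | head -n 1
}

# Example (hypothetical staging directory holding copies from all nodes):
#   newest_file /tmp/ocr_backups/*.ocr
```

If the modification times were not preserved during the copy, compare the timestamps on the source nodes instead.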
   
2. Start crs in exclusive mode. Note that crs cannot be started in normal mode,
    because the diskgroup (SYSTEMDG) that holds the OCR either does not exist or is not mounted.
 

   -- For 11.2.0.1:
      # crsctl start crs -excl
      # crsctl stop res ora.crsd -init

   -- For 11.2.0.2 onwards:
      # crsctl start crs -excl -nocrs

 

3. List all the grid disks that belonged to the SYSTEMDG diskgroup:
      
   

# dcli -g cell_group -l root "cellcli -e list griddisk where name like \'.*SYSTEM.*\' "
    
         SYSTEMDG_CD_02_dmorlcel01       active
         SYSTEMDG_CD_03_dmorlcel01       active
         SYSTEMDG_CD_04_dmorlcel01       active
         SYSTEMDG_CD_05_dmorlcel01       active
         SYSTEMDG_CD_06_dmorlcel01       active
         SYSTEMDG_CD_07_dmorlcel01       active
         SYSTEMDG_CD_08_dmorlcel01       active
         SYSTEMDG_CD_09_dmorlcel01       active
         SYSTEMDG_CD_10_dmorlcel01       active
         SYSTEMDG_CD_11_dmorlcel01       active

  Run the above command from any db node where the cell_group file lists the IPs or names of all the cell nodes, and make sure every griddisk is in a usable state, that is, its status is not 'not present'.
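
With many cells the listing gets long, so the status check can be sketched as a small filter over the saved output. This is a minimal sketch under stated assumptions: the function name find_unusable_griddisks is hypothetical, the grid disk names are assumed to contain SYSTEMDG, and any grid disk whose status is something other than active is reported. It accepts lines with or without a leading "cellname:" prefix.

```shell
#!/bin/sh
# Hypothetical helper: read "list griddisk" output on stdin and print every
# SYSTEMDG grid disk whose status is not "active" (for example "not present").
find_unusable_griddisks() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /SYSTEMDG/) {
                # Everything after the disk name is the status (may be two words).
                status = ""
                for (j = i + 1; j <= NF; j++)
                    status = status (j > i + 1 ? " " : "") $j
                if (status != "active")
                    print $i ": " status
                break
            }
        }
    }'
}

# Example (griddisks.txt is a hypothetical file holding the saved output):
#   find_unusable_griddisks < griddisks.txt
```

No output means every listed grid disk reported active.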
   
4. Recreate the diskgroup.

-- Find the compatible parameter setting for the diskgroup:
Open the ASM alert.log file and look for the following entry for this diskgroup:
“NOTE: Instance updated compatible.asm to 11.2.0.2.0 for grp “
Specify the same value for 'compatible.asm' in the command below.

   Then log in to the ASM instance and execute the following command:

   sql> create diskgroup SYSTEMDG normal redundancy disk 'o/*/SYSTEMDG_CD*' force attribute 'compatible.rdbms'='11.2', 'compatible.asm'='11.2', 'au_size'='4M',
           'cell.smart_scan_capable'='TRUE';
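
Looking up the compatible.asm value in the alert log, as described at the start of this step, can be sketched with a small filter. This is a minimal sketch, not an official tool: the function name last_compatible_asm is hypothetical, and it simply prints the version from the last matching line, so you still need to confirm that the line belongs to the SYSTEMDG group.

```shell
#!/bin/sh
# Hypothetical helper: read an ASM alert log on stdin and print the version
# from the last "Instance updated compatible.asm to <version> for grp" line.
last_compatible_asm() {
    sed -n 's/.*compatible\.asm to \([0-9.]*\) for grp.*/\1/p' | tail -n 1
}

# Example (the alert log path is a placeholder, not a real location):
#   last_compatible_asm < /path/to/asm/alert.log
```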

   

  

If you replaced any disk after SYSTEMDG was dismounted, you may get an error stating that this particular disk does not require the FORCE option. This happens because all the other disks still carry an ASM header, whereas the new disk is clean, has no header, and so does not need FORCE. In that case, drop the griddisk from the cell for the time being:

 

 cellcli > drop griddisk force

   After this, re-execute the diskgroup creation command. This time it will not detect the newly inserted disk (since we have dropped it) and will create the diskgroup with the remaining disks. Once you have finished the steps in this note, you can recreate this griddisk and add it to ASM (steps 11 and 12).

 
5. cd to the directory that holds the OCR backup copied in step 1 and restore the OCR:
 

 # ocrconfig -restore backup00.ocr   

 

6. Start the crs daemon (only needed in 11.2.0.1):
   

# crsctl start res ora.crsd -init  

7. Recreate the Voting disk:
 

 # crsctl replace votedisk +SYSTEMDG

8. Recreate the ASM spfile (only if the ASM spfile was on the SYSTEMDG diskgroup):
   Open the ASM alert log, go to the point where the ASM instance last started successfully before SYSTEMDG was dismounted, and note the ASM parameters listed there.
   
 

 processes                = 1000

 large_pool_size          = 16M
 instance_type            = "asm"
 cluster_interconnects    = "192.168.10.20"
 memory_target            = 1040M
 remote_login_passwordfile= "EXCLUSIVE"
 asm_diskstring           = "o/*/*"
 asm_diskgroups           = "DATA"
 asm_diskgroups           = "RECO"
 asm_power_limit          = 1024
 diagnostic_dest          = "/u01/app/oracle"

 
Certain parameters are instance specific, for example cluster_interconnects. Open the ASM alert.log on every node and find the correct value of each such parameter in order to prepare the pfile.

  

After collecting all the parameters, prepare the final pfile, which will look like the following:
 

*.processes                 = 1000
*.large_pool_size           = 16M
*.instance_type             = "asm"
+ASM1.cluster_interconnects = "192.168.10.20"
+ASM2.cluster_interconnects = "192.168.10.21"
*.memory_target             = 1040M
*.remote_login_passwordfile = "EXCLUSIVE"
*.asm_diskstring            = "o/*/*"
+ASM1.asm_diskgroups        = "DATA"
+ASM1.asm_diskgroups        = "RECO"
*.asm_power_limit           = 4
*.diagnostic_dest           = "/u01/app/oracle"
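
Turning the parameter lines copied from the alert log into pfile entries can be sketched as follows. This is a minimal sketch under stated assumptions: the function name to_pfile_lines is hypothetical, the input is assumed to contain only the copied "name = value" lines, and every parameter is emitted with the `*.` prefix.

```shell
#!/bin/sh
# Hypothetical helper: read "name = value" parameter lines on stdin and
# print them as pfile entries that apply to all instances ("*." prefix).
# Lines that do not look like "name = value" are skipped.
to_pfile_lines() {
    sed -n 's/^[[:space:]]*\([a-z_][a-z_0-9]*\)[[:space:]]*=[[:space:]]*\(.*\)$/*.\1 = \2/p'
}

# Example (asm_params.txt is a hypothetical file with the copied lines):
#   to_pfile_lines < asm_params.txt > /tmp/asmpfile.ora
```

Instance-specific parameters such as cluster_interconnects still have to be edited by hand afterwards, replacing `*.` with the instance name (for example `+ASM1.`) and adding one line per instance.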

   

Now the SPFILE can be created using this PFILE:

$ sqlplus / as sysasm
 
SQL> create spfile='+SYSTEMDG' from pfile='/tmp/asmpfile.ora';

9. Stop the crs 

   

# $GRID_HOME/bin/crsctl stop crs -f

10. Start crs

   

# $GRID_HOME/bin/crsctl start crs

Execute steps 11 and 12 only if you dropped a griddisk in step 4.

11. If you had to drop a griddisk in step 4 (because of the newly inserted disk), create it once again:

 

   cellcli> create griddisk celldisk=
   
   For example:
   cellcli> create griddisk SYSTEMDG_CD_02_dmorlcel01 celldisk=CD_02_dmorlcel01

12. Add this griddisk to the SYSTEMDG diskgroup:

 

    Log in to the ASM instance:
    $ sqlplus / as sysasm
    For example:
    sql> alter diskgroup SYSTEMDG add disk 'o/192.168.10.6/SYSTEMDG_CD_02_dmorlcel01';