An Alternative Suggestion – Undrop ASM Disk

Although this document covers the steps followed at the time for this case, another suggestion came in once the work was completed:

 

alter diskgroup OCR_VOTE undrop disks;

This seems like a sensible approach, given an error received earlier in the day when trying to add the erroneous disk back without clearing it out first:

 

ORA-15033: disk '/dev/oracleasm/disks/OCR_VOTE5' belongs to diskgroup "OCR_VOTE"

Had this been attempted and been successful, it would probably have been enough once the final checks had been carried out. I say this because ocrcheck had shown a clean bill of health for the OCR throughout this process.

 

Reference

http://docs.oracle.com/cd/E11882_01/rac.112/e16794/votocr.htm#CHDHBBIJ

Steps Required to Restore a Voting Disk

Make a note of the current Voting disk details.

[root]# /oracle/dbadmin/scripts/multipath_l.ksh -a

RAW Device      Size           ASM Disk    Based on    Minor,Major
==========      ====           ========    ========    ===========
:
VOTE1_01        2.0G           OCR_VOTE1   /dev/dm-54  [253,54]
VOTE2_01        2.0G           OCR_VOTE2   /dev/dm-56  [253,56]
VOTE3_01        2.0G           OCR_VOTE3   /dev/dm-58  [253,58]
VOTE4_01        2.0G           OCR_VOTE4   /dev/dm-59  [253,59]
VOTE5_01        2.0G           OCR_VOTE5   /dev/dm-60  [253,60]
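If you want to script the "make a note of it" step, here is a minimal sketch. The `listing` sample data is the multipath output captured above, and the `asm_device` helper name is my own invention, not part of any Oracle tooling:

```shell
# Sketch: keep the ASM-disk-to-device mapping to hand before any
# destructive work. In practice "listing" would come from the multipath
# script; here it is the captured output from above.
listing='VOTE1_01        2.0G           OCR_VOTE1   /dev/dm-54  [253,54]
VOTE2_01        2.0G           OCR_VOTE2   /dev/dm-56  [253,56]
VOTE3_01        2.0G           OCR_VOTE3   /dev/dm-58  [253,58]
VOTE4_01        2.0G           OCR_VOTE4   /dev/dm-59  [253,59]
VOTE5_01        2.0G           OCR_VOTE5   /dev/dm-60  [253,60]'

# Look up the raw device (column 4) behind a given ASM disk label (column 3).
asm_device() {
  printf '%s\n' "$listing" | awk -v disk="$1" '$3 == disk { print $4 }'
}

asm_device OCR_VOTE5    # prints /dev/dm-60
```

Having that lookup saved me later when the disk had to be re-created against the right device.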

 

Take a Manual Backup (just in case)

As root on one node

 

cd /oracle/GRID/11203/bin

./ocrconfig -manualbackup

:

2012/06/21 15:30:34     /oracle/GRID/11203/cdata/clustername/backup_20120621_153034.ocr

./ocrconfig -showbackup

:

wyclorah011     2012/06/21 15:30:34     /oracle/GRID/11203/cdata/racsaplp1a/backup_20120621_153034.ocr
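If scripting this, the backup path can be pulled straight out of the showbackup output. A minimal sketch, assuming the newest backup is listed first (as it was here); the variable names are mine and the sample line is the output captured above:

```shell
# Sketch: extract the path of the most recent backup from
# "ocrconfig -showbackup" output (last field of the first line).
showbackup='wyclorah011     2012/06/21 15:30:34     /oracle/GRID/11203/cdata/racsaplp1a/backup_20120621_153034.ocr'

latest_backup=$(printf '%s\n' "$showbackup" | awk 'NR == 1 { print $NF }')
echo "$latest_backup"
```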

Shut down CRS and restart on one node in exclusive mode.

[root]# pwd

/oracle/GRID/11203/bin

[root]# ./crsctl stop crs

[root]# ./crsctl stop crs

[root]# ./crsctl stop crs

CRS-4000: Command Stop failed, or completed with errors.

So I forced the issue

[root]# ./crsctl stop crs -f

This hung trying to stop the ASM instance (the alert log showed this), so I killed the ASM pmon process, which immediately freed up the stop crs, which in turn completed successfully.

Then restart on one node in exclusive mode.

[root]# ./crsctl start crs -excl -nocrs

Ensure that the crsd process did not start:

[root]# ./crsctl stat res -init -t

--------------------------------------------------------------------------------
NAME                            TARGET   STATE    SERVER   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm                       1 ONLINE   ONLINE   node1    Started
ora.cluster_interconnect.haip 1 ONLINE   ONLINE   node1
ora.crf                       1 OFFLINE  OFFLINE
ora.crsd                      1 OFFLINE  OFFLINE
ora.cssd                      1 ONLINE   ONLINE   node1
ora.cssdmonitor               1 ONLINE   ONLINE   node1
ora.ctssd                     1 ONLINE   ONLINE   node1    OBSERVER
ora.diskmon                   1 OFFLINE  OFFLINE
ora.drivers.acfs              1 ONLINE   ONLINE   node1
ora.evmd                      1 OFFLINE  OFFLINE
ora.gipcd                     1 ONLINE   ONLINE   node1
ora.gpnpd                     1 ONLINE   ONLINE   node1
ora.mdnsd                     1 ONLINE   ONLINE   node1
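That "crsd stayed down, cssd came up" check is easy to script. A minimal sketch against the output above; the sample lines are abbreviated from it and the variable names are mine:

```shell
# Sketch: from "crsctl stat res -init -t" output, confirm ora.crsd stayed
# OFFLINE while ora.cssd came up. Sample lines taken from the output above.
statres='ora.asm    1 ONLINE   ONLINE   node1   Started
ora.crsd   1 OFFLINE  OFFLINE
ora.cssd   1 ONLINE   ONLINE   node1'

# Column 3 is the TARGET, column 4 the actual STATE.
crsd_state=$(printf '%s\n' "$statres" | awk '$1 == "ora.crsd" { print $4 }')
cssd_state=$(printf '%s\n' "$statres" | awk '$1 == "ora.cssd" { print $4 }')
echo "crsd=$crsd_state cssd=$cssd_state"
```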

Re-create the errant OCR Disk.

We can see from this query that the disk is still a valid ASM disk and marked as a Voting disk.

oracle wyclorah010> . ./crs_env

wyclorah010[+ASM1]> sqlplus / as sysasm

SQL> select group_number, name, failgroup, path from v$asm_disk where voting_file='Y';

GROUP_NUMBER NAME            FAILGROUP       PATH
------------ --------------- --------------- ------------------------------
           0                                 /dev/oracleasm/disks/OCR_VOTE5
          16 OCR_VOTE_0003   OCR_VOTE_0003   /dev/oracleasm/disks/OCR_VOTE4
          16 OCR_VOTE_0002   OCR_VOTE_0002   /dev/oracleasm/disks/OCR_VOTE3
          16 OCR_VOTE_0001   OCR_VOTE_0001   /dev/oracleasm/disks/OCR_VOTE2
          16 OCR_VOTE_0000   OCR_VOTE_0000   /dev/oracleasm/disks/OCR_VOTE1
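The errant disk stands out as the row with GROUP_NUMBER 0 (a voting file no longer in any diskgroup), and that spot-check can be scripted. A sketch; the sample rows are taken from the query output above and the variable names are mine:

```shell
# Sketch: a voting disk with group_number 0 is no longer part of any
# diskgroup and is the one needing re-creation. Rows from the query above.
vasm='0                              /dev/oracleasm/disks/OCR_VOTE5
16 OCR_VOTE_0003  OCR_VOTE_0003  /dev/oracleasm/disks/OCR_VOTE4
16 OCR_VOTE_0002  OCR_VOTE_0002  /dev/oracleasm/disks/OCR_VOTE3'

# Orphan = first column 0; the path is the last column.
orphan=$(printf '%s\n' "$vasm" | awk '$1 == 0 { print $NF }')
echo "$orphan"    # the disk to re-create
```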

Earlier in the day I had tried to add it back into the diskgroup and was given short shrift.

From the ASM alert log:

ORA-15033: disk '/dev/oracleasm/disks/OCR_VOTE5' belongs to diskgroup "OCR_VOTE"

ERROR: ALTER DISKGROUP OCR_VOTE ADD DISK '/dev/oracleasm/disks/OCR_VOTE5' SIZE 2048M /* ASMCA */

So I deleted it, followed by a scandisks on the other nodes.

[root]# oracleasm querydisk '/dev/oracleasm/disks/OCR_VOTE5'

Device "/dev/oracleasm/disks/OCR_VOTE5" is marked an ASM disk with the label "OCR_VOTE5"

[root]# oracleasm deletedisk OCR_VOTE5

Clearing disk header: done

Dropping disk: done

[root]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Cleaning disk "OCR_VOTE5"
Scanning system for ASM disks...

[root]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Cleaning disk "OCR_VOTE5"
Scanning system for ASM disks...

And then I re-created the ASM disk. Good job I made a note of this earlier.

[root]# oracleasm createdisk OCR_VOTE5 /dev/mapper/VOTE5_01

Writing disk header: done

Instantiating disk: done

[root]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "OCR_VOTE5"

[root]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "OCR_VOTE5"

Add the disk to the diskgroup.

[root]# su - oracle

Emergency Local Admin Environment configured

oracle > . ./crs_env

[+ASM1]>sqlplus / as sysasm

SQL> ALTER DISKGROUP OCR_VOTE ADD DISK '/dev/oracleasm/disks/OCR_VOTE5' SIZE 2048M;

Diskgroup altered.

Restore the OCR

I'm not sure that I needed to do this; ocrcheck had always returned a valid status when run before attempting this fix. I wish I had run another ocrcheck and crsctl query css votedisk before doing this restore.

Anyway, the restore was run as follows:

[root]# ./ocrconfig -restore /oracle/GRID/11203/cdata/clustername/day.ocr

The note I was following suggested that I should run the following on the other nodes:

ocrconfig -repair -replace

but I missed this; it doesn't seem to have mattered.

Check Voting Diskgroup and OCR Integrity.

[+ASM1]>sqlplus / as sysasm

SQL> select group_number, name, failgroup, path from v$asm_disk where voting_file='Y';

GROUP_NUMBER NAME            FAILGROUP       PATH
------------ --------------- --------------- ------------------------------
          16 OCR_VOTE_0004   OCR_VOTE_0004   /dev/oracleasm/disks/OCR_VOTE5
          16 OCR_VOTE_0003   OCR_VOTE_0003   /dev/oracleasm/disks/OCR_VOTE4
          16 OCR_VOTE_0002   OCR_VOTE_0002   /dev/oracleasm/disks/OCR_VOTE3
          16 OCR_VOTE_0001   OCR_VOTE_0001   /dev/oracleasm/disks/OCR_VOTE2
          16 OCR_VOTE_0000   OCR_VOTE_0000   /dev/oracleasm/disks/OCR_VOTE1

[root]# ./ocrcheck

Status of Oracle Cluster Registry is as follows :

Version                  :          3

Total space (kbytes)     :     262120

Used space (kbytes)      :       5260

Available space (kbytes) :     256860

ID                       :  207396515

Device/File Name         :  +OCR_VOTE

Device/File integrity check succeeded

Device/File not configured

Device/File not configured

Device/File not configured

Device/File not configured

Cluster registry integrity check succeeded

Logical corruption check succeeded

[root]# ./crsctl query css votedisk

##  STATE    File Universal Id                File Name                        Disk group

1. ONLINE   16ab9ac4f2d34f69bf4537800239bef7 (/dev/oracleasm/disks/OCR_VOTE1) [OCR_VOTE]

2. ONLINE   01d692b759e94f0cbf1bd86fb62b4ccf (/dev/oracleasm/disks/OCR_VOTE2) [OCR_VOTE]

3. ONLINE   a06ebbed329c4f7bbfc496b73d506d6f (/dev/oracleasm/disks/OCR_VOTE3) [OCR_VOTE]

4. ONLINE   32b346e3daed4f75bf54fc7628d02ae2 (/dev/oracleasm/disks/OCR_VOTE4) [OCR_VOTE]

5. ONLINE   1ff50824870d4ffdbf9d9cd4fe4df1dd (/dev/oracleasm/disks/OCR_VOTE5) [OCR_VOTE]

Located 5 (yes five) voting disk(s).
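Counting the ONLINE entries is also scriptable, which saves squinting at the output. A minimal sketch; the sample lines are the votedisk output captured above:

```shell
# Sketch: count ONLINE voting disks in "crsctl query css votedisk" output;
# after the fix we expect all five. Sample lines from the output above.
votedisk='1. ONLINE   16ab9ac4f2d34f69bf4537800239bef7 (/dev/oracleasm/disks/OCR_VOTE1) [OCR_VOTE]
2. ONLINE   01d692b759e94f0cbf1bd86fb62b4ccf (/dev/oracleasm/disks/OCR_VOTE2) [OCR_VOTE]
3. ONLINE   a06ebbed329c4f7bbfc496b73d506d6f (/dev/oracleasm/disks/OCR_VOTE3) [OCR_VOTE]
4. ONLINE   32b346e3daed4f75bf54fc7628d02ae2 (/dev/oracleasm/disks/OCR_VOTE4) [OCR_VOTE]
5. ONLINE   1ff50824870d4ffdbf9d9cd4fe4df1dd (/dev/oracleasm/disks/OCR_VOTE5) [OCR_VOTE]'

online_count=$(printf '%s\n' "$votedisk" | grep -c ' ONLINE ')
echo "$online_count voting disk(s) ONLINE"
```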

Stop CRS on the exclusive node and restart on the other three.

[root]# ./crsctl stop crs

And then restart

[root]# ./crsctl start crs

CRS-4123: Oracle High Availability Services has been started.

[root]# ./crsctl start crs

CRS-4123: Oracle High Availability Services has been started.

[root]# ./crsctl start crs

CRS-4123: Oracle High Availability Services has been started.

Check everything comes up on all nodes.

[root]# ./crsctl stat res -init -t

On all nodes

[root]# ./crsctl query css votedisk

##  STATE    File Universal Id                File Name                        Disk group

1. ONLINE   16ab9ac4f2d34f69bf4537800239bef7 (/dev/oracleasm/disks/OCR_VOTE1) [OCR_VOTE]

2. ONLINE   01d692b759e94f0cbf1bd86fb62b4ccf (/dev/oracleasm/disks/OCR_VOTE2) [OCR_VOTE]

3. ONLINE   a06ebbed329c4f7bbfc496b73d506d6f (/dev/oracleasm/disks/OCR_VOTE3) [OCR_VOTE]

4. ONLINE   32b346e3daed4f75bf54fc7628d02ae2 (/dev/oracleasm/disks/OCR_VOTE4) [OCR_VOTE]

5. ONLINE   1ff50824870d4ffdbf9d9cd4fe4df1dd (/dev/oracleasm/disks/OCR_VOTE5) [OCR_VOTE]

Located 5 voting disk(s).

[root]# ./ocrcheck

Status of Oracle Cluster Registry is as follows :

Version                  :          3

Total space (kbytes)     :     262120

Used space (kbytes)      :       5260

Available space (kbytes) :     256860

ID                       :  207396515

Device/File Name         :  +OCR_VOTE

Device/File integrity check succeeded

Device/File not configured

Device/File not configured

Device/File not configured

Device/File not configured

Cluster registry integrity check succeeded

Logical corruption check succeeded
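For a belt-and-braces check, the ocrcheck output can be grepped for any failures. A sketch; the sample lines are the check lines from the output above, and the variable names are mine:

```shell
# Sketch: ocrcheck prints one "... check succeeded" line per check; any
# line containing "failed" would mean trouble. Sample lines from above.
ocr_out='Device/File integrity check succeeded
Cluster registry integrity check succeeded
Logical corruption check succeeded'

ok_count=$(printf '%s\n' "$ocr_out" | grep -c 'succeeded')
fail_count=$(printf '%s\n' "$ocr_out" | grep -c 'failed' || true)
echo "ok=$ok_count fail=$fail_count"
```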

oracle +ASM1 > cluvfy comp ocr -n all -verbose

Verifying OCR integrity

Checking OCR integrity...

Checking the absence of a non-clustered configuration...

All nodes free of non-clustered, local-only configurations

ASM Running check passed. ASM is running on all specified nodes

Checking OCR config file "/etc/oracle/ocr.loc"...

OCR config file "/etc/oracle/ocr.loc" check successful

Disk group for ocr location "+OCR_VOTE" available on all the nodes

NOTE:

This check does not verify the integrity of the OCR contents. Execute 'ocrcheck' as a privileged user to verify the contents of OCR.

OCR integrity check passed

Verification of OCR integrity was successful.

oracle +ASM2 > crsstat | grep OFFL

ora.gsd                        OFFLINE, OFFLINE, OFFLINE

OK
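ora.gsd being OFFLINE is expected on 11.2 (it only exists to support pre-10g databases), so this final check can be scripted to allow it and flag anything else. A sketch; the one sample line is the crsstat output above:

```shell
# Sketch: after the restart, the only OFFLINE resource we expect is
# ora.gsd (left offline by default on 11.2). Flag anything else.
offline_list='ora.gsd                        OFFLINE, OFFLINE, OFFLINE'

# Count OFFLINE resources other than ora.gsd.
unexpected=$(printf '%s\n' "$offline_list" | grep -cv '^ora\.gsd' || true)
echo "unexpected offline resources: $unexpected"
```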