Oracle 11g RAC: Recovering After the CRS Disk Is Lost (Part 1)

2015-07-16 12:07:26

1. Overview


2. Procedure:
Some time after the virtual machines were started, I checked the cluster status and got the following:


[grid@rac01 ~]$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
[grid@rac01 ~]$ crsctl status res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.


Check the status of the CRS services:


[root@rac01 rac-cluster]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
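The check above can be reduced to a quick summary. The following is a minimal sketch, not part of the original procedure: a hypothetical helper named crs_check_summary reads `crsctl check crs` output on standard input and, assuming that a healthy component is always reported with the words "is online" (as in the CRS-4638 line above), lists the message code of every other line.

```shell
#!/bin/sh
# Hypothetical helper: summarize `crsctl check crs` output read from stdin.
# Assumption: a healthy component is reported with the words "is online";
# every other CRS-nnnn line is counted as a problem.
crs_check_summary() {
    awk '
        /^CRS-[0-9]+:/ {
            code = $1
            sub(/:$/, "", code)              # strip the trailing colon
            if ($0 ~ /is online/)
                ok++
            else
                bad = bad (bad ? " " : "") code
        }
        END {
            printf "online=%d problems=%s\n", ok + 0, (bad ? bad : "none")
        }
    '
}
```

It would be used as `crsctl check crs | crs_check_summary`. For the four lines shown above it prints `online=1 problems=CRS-4535 CRS-4530 CRS-4534`, which makes it clear that only the High Availability Services layer is up.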


Try to start the cluster resources:


[root@rac01 bin]# crsctl start cluster
CRS-2800: Cannot start resource 'ora.asm' as it is already in the INTERMEDIATE state on server 'rac01'
CRS-4000: Command Start failed, or completed with errors.


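I checked the related logs and found the following. The other logs contained nothing more useful; if you have a better suggestion, please get in touch.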


---alert.log
[ohasd(2017)]CRS-2807:Resource 'ora.crsd' failed to start automatically.

---ocssd.log
2015-06-12 03:07:14.722: [    CLSF][2402883328]Allocated CLSF context
2015-06-12 03:07:14.723: [   SKGFD][2402883328]Handle 0x16f57d0 from lib :UFS:: for disk :/dev/asm-diskb:
2015-06-12 03:07:14.723: [    CSSD][2402883328]clssnmlalloccx:phyname rac01
2015-06-12 03:07:14.742: [    CSSD][2402883328]clssnmvDiskAvailabilityChange: voting file /dev/asm-diskb now online
2015-06-12 03:07:14.742: [    CSSD][2402883328]clssnmlgetfileslot: found expired slot 1 for host rac01 leasename rac01
2015-06-12 03:07:14.747: [   SKGFD][2381424384]NOTE: No asm libraries found in the system
2015-06-12 03:07:14.747: [    CLSF][2381424384]Allocated CLSF context
2015-06-12 03:07:14.748: [   SKGFD][2381424384]Handle 0x7f4d7008e6b0 from lib :UFS:: for disk :/dev/asm-diskb:
2015-06-12 03:07:14.748: [   SKGFD][2381424384]Lib :UFS:: closing handle 0x7f4d7008e6b0 for disk :/dev/asm-diskb:
2015-06-12 03:07:15.749: [   SKGFD][2381424384]NOTE: No asm libraries found in the system


Check the CSS voting disk information:


[grid@rac01 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   aaaf9f57bc9c4fc7bfb57ac937d2d149 (/dev/asm-diskb) [CRS]
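Because the voting file is reported as ONLINE, it can help to confirm that the device path in that row is at least visible to the operating system. The sketch below is an illustration only and was not part of the original troubleshooting: the helpers votedisk_paths and check_paths are hypothetical names, and the parsing assumes the row layout shown above (a row number followed by a dot, with the path in parentheses).

```shell
#!/bin/sh
# Hypothetical helper: print the path in parentheses from each voting file
# row of `crsctl query css votedisk` output read from stdin.
# Assumption: data rows start with a number followed by a dot.
votedisk_paths() {
    sed -n 's/^[[:space:]]*[0-9][0-9]*\.[[:space:]].*(\(.*\)).*$/\1/p'
}

# Hypothetical helper: report whether each path read from stdin exists.
check_paths() {
    while IFS= read -r path; do
        if [ -e "$path" ]; then
            echo "present $path"
        else
            echo "missing $path"
        fi
    done
}
```

It would be used as `crsctl query css votedisk | votedisk_paths | check_paths`. Note that `-e` only tests that the path exists; it says nothing about whether ASM can read the disk header, so a "present" result does not rule out the mount failure that follows.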


Next, I queried the ASM instance for the state of the ASM disk groups:


SQL> select NAME , STATE FROM V$ASM_DISKGROUP;

NAME                           STATE
------------------------------ -----------
DATA                           DISMOUNTED
CRS                            DISMOUNTED
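When this listing is captured to a file or piped from a scripted session, the disk groups that still need attention can be picked out mechanically. This is a small sketch under stated assumptions, not something the original procedure used: unmounted_groups is a hypothetical helper, and it assumes the two-column NAME/STATE layout shown above with no spaces inside a disk group name.

```shell
#!/bin/sh
# Hypothetical helper: from a NAME/STATE listing on stdin, print the name of
# every disk group whose state is not MOUNTED.
# Assumption: data rows have exactly two whitespace-separated fields; the
# header row, the dashed separator, and the SQL> line are skipped.
unmounted_groups() {
    awk 'NF == 2 && $1 != "NAME" && $1 !~ /^-+$/ && $2 != "MOUNTED" { print $1 }'
}
```

Fed the listing above, it prints DATA and CRS, one per line.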


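OK, try to mount the CRS disk group. (While writing this up later, I noticed something odd: the CSS query above showed the voting disk as ONLINE, yet the disk group cannot be mounted here. I did not try a forced mount; this needs further investigation.)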


SQL> alter diskgroup crs mount;
alter diskgroup crs mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "1"


Try to mount the DATA disk group:


SQL> alter diskgroup data mount;

Diskgroup altered.

SQL> select NAME , STATE FROM V$ASM_DISKGROUP;

NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
CRS                            DISMOUNTED


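Note: this is a record of how the problem was handled at the time, without digging deeply into the cause. Writing it up prompted further thoughts, which I will leave aside for now.
Since the DATA disk group is usable, we will first store the CRS information in the DATA disk group. No manual backup of this information had been taken, so it can only be restored from the automatic backups.
Stop the CRS services; run this on both nodes: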


[root@rac01 rac-cluster]# crsctl stop has -f


Start the stack again in exclusive mode without CRSD (the -excl -nocrs options); run this on node 1:


[root@rac01 rac-cluster]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac01'
CRS-2676: Start of 'ora.mdnsd' on 'rac01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac01'


CRS-2676: