11.2.0.3 Linux RAC error CRS-5018:(:CLSN00037:) Removed unused HAIP route: (Part 1)

2014-11-24 17:21:30

On an 11.2.0.3 RAC cluster backing a payment system, one node suddenly failed to start.


1. Stop the cluster stack on the node so that it can be restarted


[root@rac2 ~]$ crsctl stop crs -f


2. Try starting the cluster stack again


[root@rac2 ~]$ crsctl start crs


The start command completed without reporting any errors, and at this point ASM appeared to be up:


[grid@rac2 ~]$ ps -ef | grep smon
root 8472 1 1 14:14 00:00:01 /u01/app/11.2/product/crs_1/bin/osysmond.bin
grid 9238 1 0 14:15 00:00:00 asm_smon_+ASM2
grid 9500 6212 0 14:16 pts/5 00:00:00 grep smon
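Seeing asm_smon_+ASM2 in the process list only shows that the ASM background process exists; it does not confirm that the clusterware daemons are healthy. A per-component check can be sketched as follows, assuming `crsctl check crs` prints one status line per component and that a healthy component's line contains "is online" (the usual 11.2 wording). The helper name `not_online` is hypothetical.

```shell
# Print the lines of `crsctl check crs` output (read from stdin) that do not
# report a component as online. Blank lines are ignored.
# Assumption: a healthy component is reported with the words "is online".
not_online() {
    awk '!/is online/ && NF'
}
```

It would be used as `crsctl check crs | not_online` (run as root or grid); empty output means every reported component was online.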


Next I connected to ASM to check the disk group status, and something odd happened: the login succeeded, but querying the ASM disks failed.


[grid@rac2 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Wed May 21 15:35:05 2014

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select status from v$asm_disk;
select status from v$asm_disk
*
ERROR at line 1:
ORA-01034: ORACLE not available
Process ID: 14465
Session ID: 2707 Serial number: 3



Checking the d.bin processes showed that crsd had not started:


[root@rac01 ~]# ps -ef | grep d.bin
root 4142 1 0 09:27 00:00:05 /u01/app/11.2.0/grid/product/db_1/bin/ohasd.bin reboot
grid 4548 1 0 09:28 00:00:00 /u01/app/11.2.0/grid/product/db_1/bin/mdnsd.bin
grid 4558 1 0 09:28 00:00:01 /u01/app/11.2.0/grid/product/db_1/bin/gpnpd.bin
grid 4568 1 0 09:28 00:00:05 /u01/app/11.2.0/grid/product/db_1/bin/gipcd.bin
root 4590 1 0 09:28 00:00:10 /u01/app/11.2.0/grid/product/db_1/bin/osysmond.bin
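Spotting a missing daemon by eye in a listing like this is easy to get wrong. The comparison can be scripted; the sketch below is a minimal example that assumes the daemons run from a path ending in `/<name>.bin`. The `EXPECTED_DAEMONS` list is an illustrative subset of the 11.2 Grid Infrastructure daemons, not an exhaustive one, and the function name `missing_daemons` is hypothetical.

```shell
# Illustrative subset of Grid Infrastructure daemons expected on a healthy node.
EXPECTED_DAEMONS="ohasd.bin gpnpd.bin gipcd.bin mdnsd.bin ocssd.bin crsd.bin evmd.bin"

# Read a process listing (e.g. `ps -ef` output) from stdin and print each
# expected daemon that does not appear in it, one per line.
missing_daemons() {
    listing=$(cat)
    for d in $EXPECTED_DAEMONS; do
        case "$listing" in
            *"/$d"*) ;;          # daemon binary found in the listing
            *) echo "$d" ;;      # not found: report it
        esac
    done
}
```

It would be used as `ps -ef | missing_daemons`. Fed the listing above, it reports crsd.bin along with the other expected daemons that are absent from that listing.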




Checking the ASM processes again, the ASM instance process was still present:


[grid@rac2 ~]$ ps -ef | grep smon
root 17377 1 2 14:31 00:00:09 /u01/app/11.2/product/crs_1/bin/osysmond.bin
grid 21518 1 0 14:37 00:00:00 asm_smon_+ASM2
grid 21615 18834 0 14:37 pts/5 00:00:00 grep smon




Starting asmcmd raised the same error:


[grid@rac2 ~]$ asmcmd
ORA-01034: ORACLE not available
Process ID: 21716
Session ID: 2707 Serial number: 1 (DBD ERROR: OCIStmtExecute/Describe)



The OCR check also failed:



[root@rac2 bin]# ./ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
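The ocrcheck output is an error stack: each line narrows the failure down, and the last one (ORA-15077) points at ASM not serving the disk group that holds the OCR. When collecting such stacks from several commands or logs, it can help to reduce them to their codes for lookup (for instance with `oerr ora 15077`). The following is a minimal sketch; the function name `error_codes` and the list of facility prefixes are assumptions chosen for this example.

```shell
# Read text from stdin and print each distinct Oracle-style error code
# (e.g. ORA-15077), in order of first appearance.
# The prefix list (ORA, PROT, PROC, CRS) is illustrative, not exhaustive.
error_codes() {
    { grep -oE '(ORA|PROT|PROC|CRS)-[0-9]+' || true; } | awk '!seen[$0]++'
}
```

It would be used as `./ocrcheck 2>&1 | error_codes`.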



I shut down ASM and tried to start the instance manually, but it would not start either:


[grid@rac2 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Wed May 21 14:33:37 2014

Copyright (c) 1982, 2011, Oracle. All rights reserved.

SQL> shutdown abort
ASM instance shutdown
SQL> startup
ORA-27103: internal error
Linux-x86_64 Error: 2: No such file or directory
Additional information: 1
Additional information: 25919497
Additional information: 2




I then created a new pfile from the +ASM1 instance's spfile and tried to start +ASM2 with it, but +ASM2 still failed to start:



SQL> startup nomount pfile='/tmp/init+asm2.ora';
ORA-24324: service handle not initialized
ORA-01041: internal error. hostdef extension doesn't exist




Initially I suspected a storage problem, so I started by checking the storage. This RAC uses a multipath + ASMLib setup.



The multipath status showed every path as active:



[root@rac2 bin]# multipath -ll
360a9800044336b327a24446172587864 dm-5 NETAPP,LUN
[size=200G][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=50][active]
 \_ 4:0:1:0 sdau 66:224 [active][ready]
 \_ 3:0:1:0 sdq 65:0 [active][ready]
\_ round-robin 0 [prio=10][enabled]
 \_ 4:0:0:0 sdaf 65:240 [active][ready]
 \_ 3:0:0:0 sdb 8:16 [active][ready]
360a9800044336b327a24446172587862 dm-6 NETAPP,LUN


[size=200G][features=3 queue_if_no_pat