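On a payment system running an 11.2.0.3 RAC cluster, one node suddenly failed to start.
1. Try stopping the cluster stack on the node so it can be restarted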
[root@rac2 ~]$ crsctl stop crs -f
2. Try starting the cluster stack again
[root@rac2 ~]$ crsctl start crs
The cluster stack started with no errors reported, and at this point ASM appeared to be up.
[grid@rac2 ~]$ ps -ef | grep smon
root 8472 1 1 14:14 00:00:01 /u01/app/11.2/product/crs_1/bin/osysmond.bin
grid 9238 1 0 14:15 00:00:00 asm_smon_+ASM2
grid 9500 6212 0 14:16 pts/5 00:00:00 grep smon
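Note that `ps -ef | grep smon` also matches osysmond.bin and the grep command itself. A minimal sketch of a tighter check follows; the function name `asm_smon_instances` is hypothetical, and it only reports that an smon process exists, not that the instance is usable.

```shell
# Hypothetical helper: read `ps -ef`-style output on stdin and print the
# instance name of every process whose command is asm_smon_<instance>.
asm_smon_instances() {
    awk '{
        cmd = $NF                       # last whitespace-separated field of the line
        if (cmd ~ /^asm_smon_/) {       # skips osysmond.bin and "grep smon"
            sub(/^asm_smon_/, "", cmd)  # keep only the instance name
            print cmd
        }
    }'
}
```

Usage: `ps -ef | asm_smon_instances`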
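Next I tried connecting to ASM to check the disk group status, and something odd happened: the login succeeded, but querying v$asm_disk returned an error.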
[grid@rac2 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.3.0 Production on Wed May 21 15:35:05 2014
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> select status from v$asm_disk;
select status from v$asm_disk
*
ERROR at line 1:
ORA-01034: ORACLE not available
Process ID: 14465
Session ID: 2707 Serial number: 3
Checking the d.bin processes showed that crsd had not started.
[root@rac01 ~]# ps -ef | grep d.bin
root 4142 1 0 09:27 00:00:05 /u01/app/11.2.0/grid/product/db_1/bin/ohasd.bin reboot
grid 4548 1 0 09:28 00:00:00 /u01/app/11.2.0/grid/product/db_1/bin/mdnsd.bin
grid 4558 1 0 09:28 00:00:01 /u01/app/11.2.0/grid/product/db_1/bin/gpnpd.bin
grid 4568 1 0 09:28 00:00:05 /u01/app/11.2.0/grid/product/db_1/bin/gipcd.bin
root 4590 1 0 09:28 00:00:10 /u01/app/11.2.0/grid/product/db_1/bin/osysmond.bin
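Spotting a missing daemon by eye in a `ps` listing is easy to get wrong. The sketch below reports which clusterware daemons are absent. The function name `missing_gi_daemons` is hypothetical, and the daemon list is an assumption about what a healthy 11.2 node typically runs; adjust it for your environment. Run against the listing above, it reports not only crsd but also ocssd, octssd and evmd as absent.

```shell
# Hypothetical helper: read `ps -ef`-style output on stdin and print each
# expected Grid Infrastructure daemon that does not appear in it.
# The daemon list is an assumption, not taken from the original post.
missing_gi_daemons() {
    ps_output=$(cat)
    for d in ohasd mdnsd gpnpd gipcd osysmond ocssd octssd evmd crsd; do
        case "$ps_output" in
            *"/${d}.bin"*) ;;       # found as .../bin/<name>.bin
            *) echo "$d" ;;         # not found: report it
        esac
    done
}
```

Usage: `ps -ef | missing_gi_daemons`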
Checking the ASM processes again, the ASM instance process was still present.
[grid@rac2 ~]$ ps -ef | grep smon
root 17377 1 2 14:31 00:00:09 /u01/app/11.2/product/crs_1/bin/osysmond.bin
grid 21518 1 0 14:37 00:00:00 asm_smon_+ASM2
grid 21615 18834 0 14:37 pts/5 00:00:00 grep smon
Starting asmcmd raised the same error.
[grid@rac2 ~]$ asmcmd
ORA-01034: ORACLE not available
Process ID: 21716
Session ID: 2707 Serial number: 1 (DBD ERROR: OCIStmtExecute/Describe)
ocrcheck could not complete normally either.
[root@rac2 bin]# ./ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
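The three errors above read from symptom (PROT-602) down to the underlying cause (ORA-15077). When output like this is buried in longer logs, it can help to pull out just the error codes. A small sketch, assuming a grep that supports `-o` (GNU and BSD grep do); the function name `extract_error_codes` is hypothetical.

```shell
# Hypothetical helper: read text on stdin and print each distinct
# ORA-/PROT-/PROC- error code once, in order of first appearance.
extract_error_codes() {
    grep -oE '(ORA|PROT|PROC)-[0-9]+' | awk '!seen[$0]++'
}
```

Usage: `./ocrcheck 2>&1 | extract_error_codes`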
I shut ASM down and tried to start the instance manually; it still would not start.
[grid@rac2 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.3.0 Production on Wed May 21 14:33:37 2014
Copyright (c) 1982, 2011, Oracle. All rights reserved.
SQL> shutdown abort
ASM instance shutdown
SQL> startup
ORA-27103: internal error
Linux-x86_64 Error: 2: No such file or directory
Additional information: 1
Additional information: 25919497
Additional information: 2
I then created a new pfile from the +ASM1 instance's spfile and tried to start +ASM2 with it; that failed as well.
SQL> startup nomount pfile='/tmp/init+asm2.ora';
ORA-24324: service handle not initialized
ORA-01041: internal error. hostdef extension doesn't exist
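At first I suspected a storage problem, so I started checking the storage. This RAC uses a multipath + ASMLib setup.
Checking the multipath status showed every path as active.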
[root@rac2 bin]# multipath -ll
360a9800044336b327a24446172587864 dm-5 NETAPP,LUN
[size=200G][features=3 queue_if_no_path pg_init_retries 50][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=50][active]
\_ 4:0:1:0 sdau 66:224 [active][ready]
\_ 3:0:1:0 sdq 65:0 [active][ready]
\_ round-robin 0 [prio=10][enabled]
\_ 4:0:0:0 sdaf 65:240 [active][ready]
\_ 3:0:0:0 sdb 8:16 [active][ready]
360a9800044336b327a24446172587862 dm-6 NETAPP,LUN
[size=200G][features=3 queue_if_no_pat