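Environment: AIX 5.3 + Oracle 10.2.0.5 RAC
Scenario: After the RAC was shut down and restarted, node 1 failed to start while node 2 started normally.
Troubleshooting process:
1. Try to start the CRS service on node 1
root# ./init.crs start crs
2. Monitor the CRS logs during startup
Contents of OCSSD.log: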
[ CSSD]2014-01-16 09:27:54.730 >USER: Copyright 2014, Oracle version 10.2.0.5.0
[ CSSD]2014-01-16 09:27:54.730 >USER: Starting CSS daemon on node nxjcdb1, number 1, in cluster crs_dljc
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=nxjcdb1DBG_CSSD))
[ CSSD]2014-01-16 09:27:54.790 [1]>TRACE: clssscmain: RT queuesetting: ON
[ CSSD]2014-01-16 09:27:55.081 [1]>TRACE: clssscmain: local-only set to false
[ CSSD]2014-01-16 09:27:55.349 [1]>TRACE: clssnmReadNodeInfo: added node 1 (nxjcdb1) to cluster
[ CSSD]2014-01-16 09:27:55.672 [1]>TRACE: clssnmReadNodeInfo: added node 2 (nxjcdb2) to cluster
[ CSSD]2014-01-16 09:27:55.673 [1]>TRACE: clssnmInitNMInfo: Initialized with unique 1389835674
[ CSSD]2014-01-16 09:27:55.704 [1]>TRACE: clssNMInitialize: Initializing with OCR id (1516675067)
[ CSSD]2014-01-16 09:27:55.705 [1029] >TRACE: clssnm_skgxninit: HACMP clusterware detected
[ CSSD]2014-01-16 09:27:56.822 [1]>TRACE: clssnmNMInitialize: misscount set to (30)
[ CSSD]2014-01-16 09:27:56.900 [1]>TRACE: clssnmStartNM: reboottime set to (3) sec
[ CSSD]2014-01-16 09:27:56.900 [1]>TRACE: clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 15000 ms, reconfig start (misscount) 30000 ms
[ CSSD]2014-01-16 09:27:57.108 [1]>TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/rlvjc_voting)
[ CSSD]2014-01-16 09:27:57.108 [1030]>TRACE: clssnmvDPT: spawned for disk 0 (/dev/rlvjc_voting)
[ CSSD]2014-01-16 09:27:57.146 [1030]>TRACE: clssnmvDiskOpen: Overwrote kill block for voting disk /dev/rlvjc_voting
[ CSSD]2014-01-16 09:27:59.163 [1030]>TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/rlvjc_voting)
[ CSSD]2014-01-16 09:27:59.164 [1]>ERROR: Internal Error Information:
Category: 1234
Operation: scls_scr_setval
Location: open
Other: cant open file
Dep: 2
[ CSSD]2014-01-16 09:27:59.164 [1]>ERROR: clssscSclsFatal: failure 8 reading fatal mode
[ CSSD]2014-01-16 09:27:59.164 [1]>ERROR: ###################################
[ CSSD]2014-01-16 09:27:59.164 [1]>ERROR: clssscExit: CSSD aborting from thread Main
[ CSSD]2014-01-16 09:27:59.164 [1]>ERROR: ###################################
→ Based on the error messages, the preliminary diagnosis is that OCSSD fails to start because node 1 cannot access the voting disk.
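
To follow up on a diagnosis like this, it can help to isolate the ERROR entries from the log and to check whether the voting disk device named in the log (/dev/rlvjc_voting) is readable. The following is only a minimal sketch: the function names cssd_errors and device_status are hypothetical helpers introduced here, not Oracle tools, and the log file location is an assumption that depends on the installation.

```shell
# Hypothetical helper: print only the ERROR entries of a CSSD log.
# $1 is the path to the ocssd.log file (location depends on the installation).
cssd_errors() {
    grep -E '>[[:space:]]*ERROR:' "$1"
}

# Hypothetical helper: report whether the current user can read a path,
# for example the voting disk device that appears in the log.
# Note: the check reflects the permissions of the user running it.
device_status() {
    if [ -r "$1" ]; then
        echo "readable"
    else
        echo "not readable"
    fi
}
```

For example, running `cssd_errors` against the OCSSD.log file and `device_status /dev/rlvjc_voting` as the user that runs the CSS daemon would show the fatal entries and whether that user can read the device. A readable device does not by itself rule out other causes.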
[ CSSD]--- DUMP GROCK STATE DB ---
[ CSSD]--- END OF GROCK STATE DUMP ---
[ CSSD]2014-01-16 09:27:59.169 [1030]>TRACE: clssnmvReadDskHeartbeat: read ALL for Joining
[ CSSD]2014-01-16 09:27:59.169 [1030]>TRACE: clssnmvReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(126947) LATS(1038806686) Disk lastSeqNo(126947)
[ CSSD]------- Begin Dump -------
[ CSSD]
[ CSSD]
[ CSSD]
[ CSSD]
[ CSSD]
[ CSSD]2014-01-16 09:28:00.166 [1]>TRACE: 0x1100863c0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
[ CSSD]2014-01-16 09:28:00.166 [1]>TRACE: 0x1100863d0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
[ CSSD]2014-01-16 09:28:00.166 [1]>TRACE: 0x1100863e0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
[ CSSD]2014-01-16 09:28:00.166 [1]>TRACE: 0x1100863f0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
[ CSSD]2014-01-16 09:28:00.166 [1]>TRACE: 0x110086400 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
[ CSSD]2014-01-16 09:28:00.166 [1]>TRACE: 0x110086410 00 0