[RAC] Node 1 Fails to Start Because the cssfatal File Is Missing (Part 1)

2014-11-24 16:56:09

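Environment: AIX 5.3 + Oracle 10.2.0.5 RAC

Scenario: after the RAC was shut down and restarted, node 1 failed to start while node 2 started normally.

Troubleshooting process:

1. Try to start the CRS service on node 1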

root# ./init.crs start crs

2. Monitor the CRS logs during startup
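When watching the log during startup, it can help to filter out the TRACE noise and look only at the ERROR entries. The following is a minimal sketch, not an Oracle tool. It assumes entries follow the `[ CSSD]<timestamp> [<thread>]>LEVEL: message` layout of the ocssd.log excerpt in this article; the names `parse_cssd_line` and `entries_at_level` and the regular expression are illustrative assumptions.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed line layout, inferred only from the ocssd.log excerpt in this article:
#   [ CSSD]2014-01-16 09:27:59.164 [1]>ERROR: message
CSSD_LINE = re.compile(
    r"^\[\s*CSSD\]"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s*"
    r"(?:\[(?P<thread>\d+)\])?\s*"
    r">(?P<level>[A-Z]+):\s*(?P<msg>.*)$"
)


class CssdEntry(NamedTuple):
    timestamp: str
    thread: Optional[int]  # some lines (e.g. >USER:) carry no thread id
    level: str
    message: str


def parse_cssd_line(line: str) -> Optional[CssdEntry]:
    """Return a structured entry, or None if the line does not match the layout."""
    m = CSSD_LINE.match(line.strip())
    if m is None:
        return None
    thread = int(m.group("thread")) if m.group("thread") else None
    return CssdEntry(m.group("ts"), thread, m.group("level"), m.group("msg"))


def entries_at_level(lines: Iterable[str], level: str = "ERROR") -> Iterator[CssdEntry]:
    """Yield only the entries whose level equals `level`."""
    for line in lines:
        entry = parse_cssd_line(line)
        if entry is not None and entry.level == level:
            yield entry
```

Passing an open file object for the log to `entries_at_level` yields the ERROR entries in order. Lines that do not match the assumed layout, such as dump separators, are skipped.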

ocssd.log output:

[ CSSD]2014-01-16 09:27:54.730 >USER: Copyright 2014, Oracle version 10.2.0.5.0

[ CSSD]2014-01-16 09:27:54.730 >USER: Starting CSS daemon on node nxjcdb1, number 1, in cluster crs_dljc

[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=nxjcdb1DBG_CSSD))

[ CSSD]2014-01-16 09:27:54.790 [1]>TRACE: clssscmain: RT queue setting: ON

[ CSSD]2014-01-16 09:27:55.081 [1]>TRACE: clssscmain: local-only set to false

[ CSSD]2014-01-16 09:27:55.349 [1]>TRACE: clssnmReadNodeInfo: added node 1 (nxjcdb1) to cluster

[ CSSD]2014-01-16 09:27:55.672 [1]>TRACE: clssnmReadNodeInfo: added node 2 (nxjcdb2) to cluster

[ CSSD]2014-01-16 09:27:55.673 [1]>TRACE: clssnmInitNMInfo: Initialized with unique 1389835674

[ CSSD]2014-01-16 09:27:55.704 [1]>TRACE: clssNMInitialize: Initializing with OCR id (1516675067)

[ CSSD]2014-01-16 09:27:55.705 [1029] >TRACE: clssnm_skgxninit: HACMP clusterware detected

[ CSSD]2014-01-16 09:27:56.822 [1]>TRACE: clssnmNMInitialize: misscount set to (30)

[ CSSD]2014-01-16 09:27:56.900 [1]>TRACE: clssnmStartNM: reboottime set to (3) sec

[ CSSD]2014-01-16 09:27:56.900 [1]>TRACE: clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 15000 ms, reconfig start (misscount) 30000 ms

[ CSSD]2014-01-16 09:27:57.108 [1]>TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/rlvjc_voting)

[ CSSD]2014-01-16 09:27:57.108 [1030]>TRACE: clssnmvDPT: spawned for disk 0 (/dev/rlvjc_voting)

[ CSSD]2014-01-16 09:27:57.146 [1030]>TRACE: clssnmvDiskOpen: Overwrote kill block for voting disk /dev/rlvjc_voting

[ CSSD]2014-01-16 09:27:59.163 [1030]>TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/rlvjc_voting)

[ CSSD]2014-01-16 09:27:59.164 [1]>ERROR: Internal Error Information:

Category: 1234

Operation: scls_scr_setval

Location: open

Other: cant open file

Dep: 2

[ CSSD]2014-01-16 09:27:59.164 [1]>ERROR: clssscSclsFatal: failure 8 reading fatal mode

[ CSSD]2014-01-16 09:27:59.164 [1]>ERROR: ###################################

[ CSSD]2014-01-16 09:27:59.164 [1]>ERROR: clssscExit: CSSD aborting from thread Main

[ CSSD]2014-01-16 09:27:59.164 [1]>ERROR: ###################################

Based on the error messages, the initial assessment was that OCSSD could not start because node 1 was unable to access the voting disk.
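Reading the "Internal Error Information" block field by field shows which operation failed (`scls_scr_setval`) and at which step (`open`, "cant open file"). The sketch below collects those fields into a dictionary. It is only an illustration under stated assumptions: the field names are taken from the excerpt above, the block is assumed to end at the first line that is not one of those fields, and the function name `internal_error_info` is hypothetical.

```python
from typing import Dict, Iterable

# Field names as they appear in the excerpt above; this is an assumption
# about the block layout, not a documented format.
FIELDS = ("Category", "Operation", "Location", "Other", "Dep")


def internal_error_info(lines: Iterable[str]) -> Dict[str, str]:
    """Collect the key/value lines that follow 'Internal Error Information:'."""
    info: Dict[str, str] = {}
    in_block = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # the excerpt has blank lines between entries
        if "Internal Error Information:" in line:
            in_block = True
            continue
        if in_block:
            key, sep, value = line.partition(":")
            if sep and key.strip() in FIELDS:
                info[key.strip()] = value.strip()
            else:
                break  # first non-field line ends the block
    return info
```

Applied to the excerpt above, the result maps `Operation` to `scls_scr_setval` and `Other` to `cant open file`, which makes the failing step easy to quote when searching for the cause.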

[ CSSD]--- DUMP GROCK STATE DB ---

[ CSSD]--- END OF GROCK STATE DUMP ---

[ CSSD]2014-01-16 09:27:59.169 [1030]>TRACE: clssnmvReadDskHeartbeat: read ALL for Joining

[ CSSD]2014-01-16 09:27:59.169 [1030]>TRACE: clssnmvReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(126947) LATS(1038806686) Disk lastSeqNo(126947)

[ CSSD]------- Begin Dump -------

[ CSSD]

[ CSSD]

[ CSSD]

[ CSSD]

[ CSSD]

[ CSSD]2014-01-16 09:28:00.166 [1]>TRACE: 0x1100863c0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................

[ CSSD]2014-01-16 09:28:00.166 [1]>TRACE: 0x1100863d0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................

[ CSSD]2014-01-16 09:28:00.166 [1]>TRACE: 0x1100863e0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................

[ CSSD]2014-01-16 09:28:00.166 [1]>TRACE: 0x1100863f0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................

[ CSSD]2014-01-16 09:28:00.166 [1]>TRACE: 0x110086400 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................

[ CSSD]2014-01-16 09:28:00.166 [1]>TRACE: 0x110086410 00 0