3. Locating the cause of the failure
(1) Check the operating system log
[oracle@rac2 ~]$ su - root
Password:
[root@rac2 ~]# tail -30f /var/log/messages
I reproduced the failure once more. The log is very large, so I have picked out only the alert messages related to the network.
Jul 17 20:05:25 rac2 avahi-daemon[3659]: Withdrawing address record for 192.168.2.102 on eth1.
The IP address on eth1 is withdrawn, so node 1 evicts node 2 and node 2 reboots automatically.
Jul 17 20:05:25 rac2 avahi-daemon[3659]: Leaving mDNS multicast group on interface eth1.IPv4 with address 192.168.2.102.
eth1 leaves the mDNS multicast group.
Jul 17 20:05:25 rac2 avahi-daemon[3659]: iface.c: interface_mdns_mcast_join() called but no local address available.
Jul 17 20:05:25 rac2 avahi-daemon[3659]: Interface eth1.IPv4 no longer relevant for mDNS.
eth1 is no longer relevant for mDNS.
Jul 17 20:09:54 rac2 logger: Oracle Cluster Ready Services starting up automatically.
The Oracle clusterware starts up automatically.
Jul 17 20:09:59 rac2 avahi-daemon[3664]: Registering new address record for fe80::20c:29ff:fe8f:f191 on eth1.
Jul 17 20:09:59 rac2 avahi-daemon[3664]: Registering new address record for 192.168.2.102 on eth1.
The new IP addresses are registered on eth1.
Jul 17 20:10:17 rac2 logger: Cluster Ready Services completed waiting on dependencies.
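CRS has finished waiting on its dependencies.
From the messages above we can roughly tell that node 2 rebooted because of a problem with the eth1 interface. To analyze the problem further, we also need to look at the CRS log.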
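Picking these lines out of a large /var/log/messages by hand is tedious. The filtering could be sketched roughly as follows. This is only a sketch: `net_events` is a hypothetical helper name, and the two patterns are simply taken from the messages quoted above (the interface name, and the "Cluster Ready Services" lines written via logger), so they may need adjusting on other systems.

```shell
#!/bin/sh
# net_events is a hypothetical helper, not part of any Oracle or Linux tool.
# It prints the lines of a syslog file that either mention the given
# interface as a whole word or are clusterware startup messages from logger.
# Usage: net_events <logfile> <interface>
net_events() {
    logfile=$1
    iface=$2
    # First pattern: interface name not surrounded by letters or digits,
    # so "eth1" does not also match "eth10".
    # Second pattern: the "Cluster Ready Services" lines tagged "logger:".
    grep -E \
        -e "(^|[^[:alnum:]])${iface}([^[:alnum:]]|\$)" \
        -e 'logger: .*Cluster Ready Services' \
        "$logfile"
}
```

For example, `net_events /var/log/messages eth1` run as root on rac2 would keep the avahi-daemon lines for eth1 and the clusterware startup lines, and drop everything else.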
[root@rac2 crsd]# tail -100f $ORA_CRS_HOME/log/rac2/crsd/crsd.log
Abnormal termination by CSS, ret = 8
The CRS daemon was terminated abnormally by CSS.
2013-07-17 20:11:18.115: [ default][1244944]0CRS Daemon Starting
2013-07-17 20:11:18.116: [ CRSMAIN][1244944]0Checking the OCR device
2013-07-17 20:11:18.303: [ CRSMAIN][1244944]0Connecting to the CSS Daemon
The CRS daemon restarts and reconnects to the CSS daemon.
[root@rac2 cssd]# pwd
/u01/crs1020/log/rac2/cssd
[root@rac2 cssd]# more ocssd.log    # view the cssd process log
[CSSD]2013-07-17 17:26:18.319 [86104976] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac2_crs))
Here we can see that something went wrong with the cssd listener on rac2.
[CSSD]2013-07-17 17:26:19.296 [75615120] >TRACE: clssnmHandleSync: Acknowledging sync: src[1] srcName[rac1] seq[13] sync[12]
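rac2 acknowledges a sync request from rac1, so the synchronization between the two nodes needs to be checked.
Taken together, these messages point to a private interconnect communication problem: the two nodes could not synchronize, so they could not share cluster state, and this led to a split-brain condition.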
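One quick way to test the interconnect hypothesis on a live node is to confirm that the private interface (eth1 in this environment) still holds its IPv4 address. The following is a minimal sketch, assuming the classic net-tools `ifconfig` output format with an indented `inet addr:` line; `iface_addr` is a hypothetical helper name, and newer systems that print `inet 192.168...` instead would need a different pattern.

```shell
#!/bin/sh
# iface_addr is a hypothetical helper. It reads net-tools style ifconfig
# output on stdin and prints the IPv4 address of the named interface.
# Exit status is 0 if an address was found, 1 otherwise.
# Usage: ifconfig | iface_addr eth1
iface_addr() {
    awk -v ifc="$1" '
        # A non-indented line starts a new interface block; its first
        # field is the interface name (e.g. eth0, eth0:1, eth1).
        /^[^ \t]/ { current = $1 }
        # Inside the requested block, extract the value after "inet addr:".
        current == ifc && /inet addr:/ {
            sub(/.*inet addr:/, "")   # $0 now starts with the address
            print $1
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

If `ifconfig | iface_addr eth1` prints nothing and returns a non-zero status while the clusterware is running, the private address has been lost, which matches the "Withdrawing address record" message seen in the system log.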
4. Node 2 reboots and automatically returns to a normal state
[root@rac2 cssd]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:0C:29:8F:F1:87
inet addr:192.168.1.102 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe8f:f187/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:567 errors:0 dropped:0 overruns:0 frame:0
TX packets:901 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:65402 (63.8 KiB) TX bytes:96107 (93.8 KiB)
Interrupt:185 Base address:0x14a4
eth0:1 Link encap:Ethernet HWaddr 00:0C:29:8F:F1:87
inet addr:192.168.1.202 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:185 Base address:0x14a4
eth1 Link encap:Ethernet HWaddr 00:0C:29:8F:F1:91
inet addr:192.168.2.102 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe8f:f191/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:76659 errors:0 dropped:0 overruns:0 frame:0
TX packets:51882 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:61625763 (58.7 MiB) TX bytes:26779167 (25.5 MiB)
Interrupt:193 Base address:0x1824
eth2 Link encap:Ethernet HWaddr 00:0C:29:8F:F1:9B
inet addr:192.168.203.129 Bcast:192.168.203.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe8f:f19b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:409 error