11.2.0.3 RAC (VCS): Analysis of a Node Crash and Hang (Part 7)
2014-11-23 23:30:24
Apr 22 21:36:03 xhdb-server4 AgentFramework[5814]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13067 Thread(4) Agent is calling clean for resource(cssd_oaora) because the resource became OFFLINE unexpectedly, on its own.
Apr 22 21:36:03 xhdb-server4 Had[5704]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13067 (xhdb-server4) Agent is calling clean for resource(cssd_oaora) because the resource became OFFLINE unexpectedly, on its own.
Apr 22 21:36:06 xhdb-server4 AgentFramework[5814]: [ID 702911 daemon.notice] VCS ERROR V-16-2-13068 Thread(4) Resource(cssd_oaora) - clean completed successfully.
Apr 22 21:40:51 xhdb-server4 reboot: [ID 662345 auth.crit] rebooted by root
Apr 22 21:41:14 xhdb-server4 amf: [ID 451996 kern.notice] AMF NOTICE V-292-1-67 Signal received while waiting for event on reaper 'CFSMount'. Returning.
Apr 22 21:41:14 xhdb-server4 amf: [ID 451996 kern.notice] AMF NOTICE V-292-1-67 Signal received while waiting for event on reaper 'CFSfsckd'. Returning.
Apr 23 00:31:59 xhdb-server4 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_147440-25 64-bit

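We can see that Veritas VVR ran into a problem at 17:19:16. This explains why, in the earlier analysis, Node 2 started waiting on control file sequential read at 17:19:18. The VCS engine log, however, shows nothing for that time: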

2014/04/22 14:39:12 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/04/22 18:39:13 VCS INFO V-16-1-53504 VCS Engine Alive message!!
2014/04/22 21:36:03 VCS ERROR V-16-2-13067 (xhdb-server4) Agent is calling clean for resource(cssd_oaora) because the resource became OFFLINE unexpectedly, on its own.
2014/04/22 21:36:06 VCS INFO V-16-2-13068 (xhdb-server4) Resource(cssd_oaora) - clean completed successfully.
2014/04/22 21:36:09 VCS INFO V-16-1-10307 Resource cssd_oaora (Owner: Unspecified, Group: sg_oaora) is offline on xhdb-server4 (Not initiated by VCS)
2014/04/22 21:36:09 VCS NOTICE V-16-1-10446 Group sg_oaora is offline on system xhdb-server4
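For readers who want to repeat this kind of check on their own logs, here is a minimal Python sketch that parses engine-log lines and lists the entries falling inside a time window. The line layout is inferred only from the excerpt above, and the names `ENTRY`, `parse_entry` and `entries_between`, as well as the 17:00 to 17:40 window, are illustrative assumptions rather than anything provided by VCS.

```python
import re
from datetime import datetime

# Layout assumed from the engine-log excerpt above:
# "YYYY/MM/DD HH:MM:SS VCS <SEVERITY> <V-code> <message>"
ENTRY = re.compile(
    r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) VCS (\w+) (V-\d+-\d+-\d+) (.*)"
)

def parse_entry(line):
    """Return (timestamp, severity, code, message), or None if no match."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
    return ts, m.group(2), m.group(3), m.group(4)

def entries_between(lines, start, end):
    """Yield parsed entries whose timestamp lies in [start, end]."""
    for line in lines:
        entry = parse_entry(line)
        if entry is not None and start <= entry[0] <= end:
            yield entry

# Three lines copied from the excerpt above.
sample = [
    "2014/04/22 14:39:12 VCS INFO V-16-1-53504 VCS Engine Alive message!!",
    "2014/04/22 18:39:13 VCS INFO V-16-1-53504 VCS Engine Alive message!!",
    "2014/04/22 21:36:03 VCS ERROR V-16-2-13067 (xhdb-server4) Agent is "
    "calling clean for resource(cssd_oaora) because the resource became "
    "OFFLINE unexpectedly, on its own.",
]

# Window around the 17:19:16 VVR problem; the width is an arbitrary choice.
window = list(entries_between(
    sample,
    datetime(2014, 4, 22, 17, 0, 0),
    datetime(2014, 4, 22, 17, 40, 0),
))
print(len(window))  # 0: no engine-log entry falls inside the window
```

On these sample lines the window comes back empty, which matches the observation that the engine log records nothing around 17:19. A real check would of course read the complete log file rather than a short excerpt.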

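So in the end, my feeling is that the root cause lies with VCS, even though the Veritas engineers have kept explaining that the Log overflow protection seen here has little to do with it.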

Discussion of this problem is welcome.

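Addendum: Oracle does have quite a few bugs in which ORA-29770 crashes an instance, but so far I have not found one that matches this case. The following are the search results from Oracle MOS: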

Bug 11890804: LMHB crashes instance with ORA-29770 after long “control file sequential read” waits
Bug 8888434: LMHB crashes the instance with LMON waiting on controlfile read
Bug 11890804: LMHB TERMINATE INSTANCE WHEN LMON WAIT CHANGE FROM CF READ AFTER 60 SEC
Bug 13467673: CSS MISSCOUNT AND ALL ASM DOWN WITH ORA-29770 BY LMHB
Bug 13390052: KJFMGCR_HBCHECKALL MESSAGES ARE CONTINUOUSLY LOGGED IN LMHB TRACE FILE.
Bug 13322797: LMHB TERMINATES THE INSTANCE DUE TO ERROR 29770
Bug 13061883: LMHB IS TERMINATING THE INSTANCE DURING SHUTDOWN IMMEDIATE
Bug 12886605: ESSC: LMHB TERMINATE INSTANCE DUE TO 29770 LMON WAIT ENQ: AM DISK OFFLINE
Bug 12757321: LMHB TERMINATING THE INSTANCE DUE TO ERROR 29770
Bug 10296263: LMHB (OSPID: 15872): TERMINATING THE INSTANCE DUE TO ERROR 29770
Bug 10431752: SINGLE NODE RAC: LMHB TERMINATES INSTANCE DUE TO 29770
Bug 11656856: LMHB (OSPID: 27701): TERMINATING THE INSTANCE DUE TO ERROR 29770
Bug 10411143: INSTANCE CRASHES WITH IPC SEND TIMEOUT AND LMHB TERMINATES WITH ORA-29770
Bug 11704041: DATABASE INSTANCE CRASH BY LMHB PROCESS
Bug 10412545: ORA-29770 LMHB TERMINATE INSTANCE DUE TO VARIOUS LONG CSS WAIT
Bug 10147827: INSTANCE TERMINATED BY LMHB WITH ERROR ORA-29770
Bug 10016974: ORA-29770 LMD IS HUNG FOR MORE THAN 70 SECONDS AND LMHB TERMINATE INSTANCE
Bug 9376100: LMHB TERMINATING INSTANCE DUE ERROR 29770
