Analysis of node crash and hang problems on 11.2.0.3 RAC (VCS), Part 4
2014-11-23 23:30:24
36:31 2014

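We can see that at Apr 22 17:26:59 2014, Node2 recorded the message that the LMHB process of Node1 terminated the instance. Hang messages are thrown after that, but Oracle resolved the hung session automatically. Now let's look at the trace of the LMON process on Node2: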

    *** 2014-04-22 17:26:59.377
    Process diagnostic dump for oracle@xhdb-server4 (LMON), OS id=13752,
    pid: 11, proc_ser: 1, sid: 353, sess_ser: 1
    -------------------------------------------------------------------------------
    current sql:
    Current Wait Stack:
     0: waiting for 'control file sequential read'
        file#=0x0, block#=0x23, blocks=0x1
        wait_id=272969233 seq_num=24337 snap_id=1
        wait times: snap=7 min 42 sec, exc=7 min 42 sec, total=7 min 42 sec   --- already waited 7 min 42 sec
        wait times: max=infinite, heur=7 min 42 sec
        wait counts: calls=0 os=0
        in_wait=1 iflags=0x5a0
    There are 1 sessions blocked by this session.
    Dumping one waiter:
      inst: 2, sid: 1092, ser: 49369
      wait event: 'name-service call wait'
        p1: 'waittime'=0x32
        p2: ''=0x0
        p3: ''=0x0
      row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0
      min_blocked_time: 0 secs, waiter_cache_ver: 6248
    Wait State:
      fixed_waits=0 flags=0x22 boundary=0x0/-1
    Session Wait History:

From the LMON trace we can see that the process is waiting on 'control file sequential read' and has already been waiting for 7 minutes 42 seconds.
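When many trace files have to be checked, the event name and the wait time can be pulled out of the wait stack mechanically. The following is only a minimal sketch: the function name `parse_wait` is hypothetical, the excerpt is copied from the LMON dump above, and the regular expression assumes the `N min M sec` layout seen in this trace (other traces may print the time differently).

```python
import re

# Excerpt of the LMON wait stack quoted above.
TRACE = """\
 0: waiting for 'control file sequential read'
    file#=0x0, block#=0x23, blocks=0x1
    wait times: snap=7 min 42 sec, exc=7 min 42 sec, total=7 min 42 sec
"""

def parse_wait(trace):
    """Return (event name, total wait in seconds) from a wait-stack excerpt.

    Assumes the 'total=N min M sec' layout shown in this trace; the
    'N min' part is treated as optional.
    """
    event = re.search(r"waiting for '([^']+)'", trace).group(1)
    m = re.search(r"total=(?:(\d+) min )?(\d+) sec", trace)
    minutes = int(m.group(1) or 0)
    seconds = int(m.group(2))
    return event, minutes * 60 + seconds

event, waited = parse_wait(TRACE)
print(event, waited)  # control file sequential read 462
```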

Counting back 7 minutes 42 seconds from the timestamp of the trace (17:26:59), the wait started at roughly 17:19:17.
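The subtraction can be checked with a few lines of Python. This is just a sketch of the arithmetic; `wait_start` is a hypothetical helper, and because the trace reports the wait in whole seconds, the result is only accurate to about a second.

```python
from datetime import datetime, timedelta

def wait_start(dump_time, minutes, seconds):
    """Subtract the reported wait time from the dump timestamp."""
    ts = datetime.strptime(dump_time, "%Y-%m-%d %H:%M:%S.%f")
    return ts - timedelta(minutes=minutes, seconds=seconds)

# Timestamp and wait time taken from the LMON dump above.
lmon = wait_start("2014-04-22 17:26:59.377", 7, 42)
print(lmon.strftime("%H:%M:%S.%f")[:-3])  # 17:19:17.377
```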

Since the wait is on the controlfile, we need to look at what the CKPT process on Node2 was doing. Here is the CKPT process information:

    Redo thread mounted by this instance: 2
    Oracle process number: 26
    Unix process pid: 13788, image: oracle@xhdb-server4 (CKPT)

    *** 2014-04-22 17:26:59.882
    *** SESSION ID:(833.1) 2014-04-22 17:26:59.882
    *** 2014-04-22 17:26:59.882
    Process diagnostic dump for oracle@xhdb-server4 (CKPT), OS id=13788,
    pid: 26, proc_ser: 1, sid: 833, sess_ser: 1
    -------------------------------------------------------------------------------
    current sql:
    Current Wait Stack:
     0: waiting for 'control file sequential read'
        file#=0x0, block#=0x1, blocks=0x1
        wait_id=14858985 seq_num=48092 snap_id=1
        wait times: snap=7 min 40 sec, exc=7 min 40 sec, total=7 min 40 sec   ---- waited 7 min 40 sec
        wait times: max=infinite, heur=7 min 40 sec
        wait counts: calls=0 os=0
        in_wait=1 iflags=0x5a0
    There are 2 sessions blocked by this session.
    Dumping one waiter:
      inst: 2, sid: 291, ser: 59157
      wait event: 'DFS lock handle'
        p1: 'type|mode'=0x43490005
        p2: 'id1'=0xa
        p3: 'id2'=0x2
      row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0
      min_blocked_time: 352 secs, waiter_cache_ver: 6248
    Wait State:
      fixed_waits=0 flags=0x22 boundary=0x0/-1

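We can see that the CKPT process on Node2 has been waiting on 'control file sequential read' for 7 minutes 40 seconds. Note also that CKPT is blocking 2 sessions, in other words it has 2 waiters. The waiter that was dumped is sid 291, ser 59157, and its wait event is 'DFS lock handle', which is a fairly dangerous event. At this point we cannot yet tell what this waiter is, nor why the CKPT process has been waiting for so long.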
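Although the dump does not tell us which process the waiter is, the p1 value of 'DFS lock handle' can at least be decoded. The sketch below assumes the usual packing of the 'type|mode' parameter: two ASCII characters of the lock type in the two high bytes and the requested mode in the low 16 bits. The function name `decode_type_mode` is hypothetical. It only identifies the lock type and mode; it says nothing about who holds the lock or why.

```python
def decode_type_mode(p1):
    """Split a 'type|mode' wait parameter into (lock type, mode).

    Assumes the two high bytes are ASCII characters of the lock type
    and the low 16 bits are the requested mode.
    """
    lock_type = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    mode = p1 & 0xFFFF
    return lock_type, mode

# p1, p2 and p3 as dumped for the waiter (sid 291) above.
lock_type, mode = decode_type_mode(0x43490005)
id1, id2 = 0xA, 0x2
print(lock_type, mode, id1, id2)  # CI 5 10 2
```

Under that assumption the waiter is requesting a CI lock in mode 5, with id1=10 and id2=2.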

As you know, 11g introduced automatic hang resolution, so let's look at the diag (DIA0) information on Node1:

    Unix process pid: 27571, image: oracle@xhdb-server3 (DIA0)

    *** 2014-04-22 17:22:01.536
    *** SESSION ID:(961.1) 2014-04-22 17:22:01.536
    *** CLIENT ID:() 2014-04-22 17:22:01.536
    *** SERVICE NAME:(SYS$BACKGROUND) 2014-04-22 17:22:01.536
    *** MODULE NAME:() 2014-04-22 17:22:01.536
    *** ACTION NAME:() 2014-04-22 17:22:01.536

    One or more possible hangs have been detected on your system.
    These could be genuine hangs in which no further progress will
    be made without intervention, or it may be very slow progress
    in the system d