11.2.0.3RAC(VCS)节点crash以及hang的问题分析(六)

2014-11-23 23:30:24 · 作者: · 浏览: 89
34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 *** 2014-04-22 17:24:08.363 Verified Hangs in the System Root Chain Total Hang Hang Hang Inst Root #hung #hung Hang Hang Resolution ID Type Status Num Sess Sess Sess Conf Span Action ----- ---- -------- ---- ----- ----- ----- ------ ------ ------------------- 7 HANG VICSELTD 2 801 2 3 HIGH LOCAL IGNRD:InstKillNotA Hang Ignored Reason: Since instance termination is not allowed, automatic hang resolution cannot be performed. inst# SessId Ser# OSPID PrcNm Event ----- ------ ----- --------- ----- ----- 2 996 39917 6598 FG log file sync 2 801 1 13786 LGWR log file parallel write 。。。。。省略部分内容 ------------------------------------------------------------------------------- Chain 2: ------------------------------------------------------------------------------- Oracle session identified by: { instance: 1 (authorcl.authorcl1) os id: 22323 process id: 64, oracle@xhdb-server3 session id: 14 session serial #: 29257 } is waiting for 'log file sync' with wait info: { p1: 'buffer#'=0x2ab p2: 'sync scn'=0x1ed4d8be time in wait: 4 min 18 sec timeout after: never wait id: 13287 blocking: 0 sessions wait history: * time between current wait and wait #1: 0.000388 sec 1. event: 'SQL*Net message from client' time waited: 0.000486 sec wait id: 13286 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 * time between wait #1 and #2: 0.000027 sec 2. event: 'SQL*Net message to client' time waited: 0.000003 sec wait id: 13285 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 * time between wait #2 and #3: 0.001494 sec 3. event: 'SQL*Net message from client' time waited: 0.002089 sec wait id: 13284 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 } and is blocked by => Oracle session identified by: { instance: 1 (authorcl.authorcl1) os id: 27634 process id: 20, oracle@xhdb-server3 (LGWR) session id: 386 session serial #: 1 } which is waiting for 'log file parallel write' with wait info: { p1: 'files'=0x1 p2: 'blocks'=0x2 p3: 'requests'=0x1 time in wait: 4 min 32 sec -----等待了4分32秒 timeout after: never wait id: 51742574 blocking: 17 sessions ------阻塞了17个Session wait history: * time between current wait and wait #1: 0.000176 sec 1. event: 'rdbms ipc message' time waited: 0.047194 sec wait id: 51742573 p1: 'timeout'=0x22 * time between wait #1 and #2: 0.000399 sec 2. event: 'log file parallel write' time waited: 0.004006 sec wait id: 51742572 p1: 'files'=0x1 p2: 'blocks'=0x2 p3: 'requests'=0x1 * time between wait #2 and #3: 0.000318 sec 3. event: 'rdbms ipc message' time waited: 2.654606 sec wait id: 51742571 p1: 'timeout'=0x12c } Chain 2 Signature: 'log file parallel write'<='log file sync'

我们可以看到,Node1上,lgwr进程阻塞了17个会话,等待log file parallel write,一直持续了4分32秒。

如果将时间点2014-04-22 17:24:08,向前推进4分32秒,那么就是2014-04-22 17:19:36.

我们再来检查Node2的操作系统日志你会发现有意思的事情:

1 2 3 4 5 6 7 8 9 10 11 Apr 22 17:19:16 xhdb-server4 vxio: [ID 442312 kern.notice] NOTICE: VxVM VVR vxio V-5-0-209 Log overflow protection on rlink oa_zs_bj triggered DCM protection Apr 22 17:19:16 xhdb-server4 vxio: [ID 636438 kern.warning] WARNING: VxVM VVR vxio V-5-0-1905 Replication stopped for RVG rvg_oa due to SRL overflow, DCM protection is triggered. To start replication, perform DCM resynchronization using "vradmin resync" command. Apr 22 19:41:19 xhdb-server4 su: [ID 810491 auth.crit] 'su root' failed for oa on /dev/pts/5 Apr 22 21:31:38 xhdb-server4 su: [ID 810491 auth.crit] 'su grid' failed for root on /dev/pts/14 Ap