NDMCDB Database Hang Analysis: cursor: pin S wait on X (Part 1)

2015-07-24 11:34:38

Problem description:

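I had just arrived at the office in the morning when the monitoring staff emailed to report that the NDMCDB407 database had been restarted the previous night, and asked me to analyze why. A business release had gone live that night, so SMS alerting had been switched off and no alert reached my phone. Nobody involved notified me while the failure was happening either.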

1 Check the alert log

The alert log shows that the first event was a job failure at 03:29:
Fri Aug 22 03:29:29 2014
Errors in file /opt/oracle/diag/rdbms/ndmcdb/NDMCDB/trace/NDMCDB_j000_28856.trc:
ORA-12012: error on auto execute of job 31
ORA-04023: Object NDMC.DELETE_ANONY_RSHARE_INFO could not be validated or authorized
ORA-06512: at "NDMC.PROC_NDMC_CANCEL_OPEN", line 5
ORA-06512: at line 1
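When the alert log is long, the same kind of timeline can be pulled out mechanically instead of by scrolling. The following is a minimal sketch, not part of the original analysis: it assumes the 11g text alert log layout shown above (a timestamp line followed by message lines), and the function name `error_timeline` is hypothetical.

```python
import re
from datetime import datetime

# Timestamp lines in the text alert log look like "Fri Aug 22 03:29:29 2014".
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")
# Error codes such as ORA-12012, ORA-609 or TNS-12535.
CODE_RE = re.compile(r"\b(?:ORA|TNS)-\d+\b")


def error_timeline(lines):
    """Return [(timestamp, [codes])] for every timestamped entry that contains codes."""
    timeline = []
    current_ts, codes = None, []
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            # A new timestamp closes the previous entry.
            if current_ts is not None and codes:
                timeline.append((current_ts, codes))
            current_ts = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            codes = []
        else:
            codes.extend(CODE_RE.findall(line))
    # Flush the last entry.
    if current_ts is not None and codes:
        timeline.append((current_ts, codes))
    return timeline


if __name__ == "__main__":
    sample = [
        "Fri Aug 22 03:29:29 2014",
        "ORA-12012: error on auto execute of job 31",
        "ORA-06512: at line 1",
    ]
    for ts, found in error_timeline(sample):
        print(ts.strftime("%H:%M:%S"), found)
```

Entries without any ORA-/TNS- code are dropped on purpose, so routine messages do not clutter the timeline. Parsing the day and month names with `strptime` relies on the default English (C) locale.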
Then, at 03:49, connection timeouts began, and the errors continued until 05:00:08:
Fri Aug 22 03:49:43 2014
***********************************************************************
 
Fatal NI connect error 12170.
 
 VERSION INFORMATION:
       TNS for Linux: Version 11.1.0.7.0 - Production
       Oracle Bequeath NT Protocol Adapter for Linux: Version 11.1.0.7.0 - Production
       TCP/IP NT Protocol Adapter for Linux: Version 11.1.0.7.0 - Production
 Time: 22-AUG-2014 03:49:43
 Tracing not turned on.
  Tns error struct:
   ns main err code: 12535
   
TNS-12535: TNS:operation timed out
   ns secondary err code: 12606
   nt main err code: 0
   nt secondary err code: 0
   nt OS err code: 0
 Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.130.87)(PORT=36628))
WARNING: inbound connection timed out (ORA-3136)
Fri Aug 22 03:49:44 2014
……
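Each timeout block carries a "Client address" line, so it is easy to tally which hosts the timed-out connections came from. This is a minimal sketch under the assumption that the address line has the exact shape shown above; the function name `timeouts_by_host` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.130.87)(PORT=36628))
ADDR_RE = re.compile(
    r"Client address: \(ADDRESS=\(PROTOCOL=(\w+)\)\(HOST=([^)]+)\)\(PORT=(\d+)\)\)"
)


def timeouts_by_host(lines):
    """Count 'Client address' lines per client host."""
    hosts = Counter()
    for line in lines:
        match = ADDR_RE.search(line)
        if match:
            hosts[match.group(2)] += 1
    return hosts


if __name__ == "__main__":
    sample = [
        " Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.130.87)(PORT=36628))",
        "WARNING: inbound connection timed out (ORA-3136)",
    ]
    for host, count in timeouts_by_host(sample).most_common():
        print(host, count)
```

A single dominant host would point at one application server, while an even spread suggests that the database itself had stopped responding to new logins.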
The instance also ran out of processes:
Fri Aug 22 03:49:50 2014
ORA-00020: maximum number of processes 0 exceeded
   ns secondary err code: 12560
   
   ns secondary err code: 12560
   ns main err code: 12537
Fri Aug 22 03:49:50 2014
……
Fri Aug 22 03:51:48 2014
***********************************************************************
 
Fatal NI connect error 12537, connecting to:
 (LOCAL=NO)
 
 VERSION INFORMATION:
       TNS for Linux: Version 11.1.0.7.0 - Production
       Oracle Bequeath NT Protocol Adapter for Linux: Version 11.1.0.7.0 - Production
       TCP/IP NT Protocol Adapter for Linux: Version 11.1.0.7.0 - Production
 Time: 22-AUG-2014 03:51:48
 Tracing not turned on.
 Tns error struct:
   ns main err code: 12537
   
TNS-12537: TNS:connection closed
   ns secondary err code: 12560
   nt main err code: 0
   nt secondary err code: 0
   nt OS err code: 0
ORA-609 : opiodr aborting process unknown ospid (30476_47044991385184)
Fri Aug 22 04:14:15 2014
ORA-28 : opiodr aborting process unknown ospid (24925_46986315964000)
Fri Aug 22 04:16:27 2014
ORA-28 : opiodr aborting process unknown ospid (22475_47013891882592)
Fri Aug 22 04:16:28 2014
ORA-28 : opiodr aborting process unknown ospid (21356_47116835528288)
Fri Aug 22 04:16:29 2014
ORA-28 : opiodr aborting process unknown ospid (24947_47774766210656)
ORA-28 : opiodr aborting process unknown ospid (14958_47053435166304)
……
Fri Aug 22 05:00:05 2014
ORA-28 : opiodr aborting process unknown ospid (25765_46941307182688)
Fri Aug 22 05:00:08 2014
ORA-28 : opiodr aborting process unknown ospid (4949_47396524895840)
The database was then shut down at 05:04. Judging by the log, this was a normal shutdown, so the initial suspicion is that someone shut it down manually or that the VCS cluster shut it down automatically:
Fri Aug 22 05:04:10 2014
Stopping background process SMCO
Stopping background process FBDA
Shutting down instance: further logons disabled
Fri Aug 22 05:04:12 2014
Stopping background process CJQ0
Stopping background process QMNC
Stopping background process MMNL
Stopping background process MMON
Shutting down instance (immediate)
License high water mark = 1220
Stopping Job queue slave processes, flags = 7
Fri Aug 22 05:04:20 2014
Waiting for Job queue sl