Oracle Bug Causes Instance Crash: ORA-07445

2014-11-24 17:12:04

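Analysis:
ORA-07445 errors are usually caused by bugs in Oracle itself.
I first used IPS to collect the error information recorded in the alert log (for how to use IPS, see my other article, "A Simple Guide to Using IPS").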
A search of MetaLink turned up three Notes describing bugs similar to the customer's problem:
ORA-7445 (kcbw_get_bh) [ID 1341402.1]
Bug 9728912 [https://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top rptno=9728912] - PMON terminates instance due to ORA-7445 [kcbw_numperchunk] / ORA-7445 [kcbw_get_bh] [ID 9728912.8]
Instance Crashed On ORA-7445 kcbw_numperchunk [ID 1364264.1]
However, according to these Notes, the related bugs had already been fixed in 11.1.0.6.
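The reasoning here is a version comparison: a bug listed as fixed in a given release should not reproduce on a database already running that release or a later one. The following is a minimal sketch, assuming plain dotted numeric version strings such as `11.1.0.6.0` (the DB Version the Triage Tool reports for this database); `parse_version` and `includes_fix` are hypothetical helpers, and the check ignores one-off patches and backports.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "11.1.0.6.0" into (11, 1, 0, 6, 0) so the parts compare as numbers."""
    return tuple(int(part) for part in version.split("."))


def includes_fix(running: str, fixed_in: str) -> bool:
    """True if the running release is the fix release or a later one (hypothetical helper)."""
    a, b = parse_version(running), parse_version(fixed_in)
    # Pad with zeros so that "11.1.0.6" and "11.1.0.6.0" compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    print(includes_fix("11.1.0.6.0", "11.1.0.6"))  # True: the fix is already included
    print(includes_fix("11.1.0.6.0", "11.1.0.7"))  # False: the fix is not included
```

Comparing the parts as integers rather than as strings is the point of the design: a plain string comparison would sort `11.1.0.10` before `11.1.0.7`. The same check confirms that both upgrade targets recommended in the solution, 11.1.0.7 and 11.2.0.3, are newer than 11.1.0.6.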
So let's look at the other critical errors recorded in the customer's database:
Node1:
adrci> show problem

ADR Home = /oracle/app/11gR1/diag/rdbms/xij/xij1:
*************************************************************************
PROBLEM_ID           PROBLEM_KEY                                                 LAST_INCIDENT        LASTINC_TIME
-------------------- ----------------------------------------------------------- -------------------- ----------------------------------------
5                    ORA 7445 [kcbw_get_bh()+67]                                 298938               2013-06-23 01:00:06.373716 +08:00
11                   ORA 600                                                     276161               2013-06-04 18:12:12.709933 +08:00
10                   ORA 600 [729]                                               276160               2013-06-04 18:09:27.857128 +08:00
7                    ORA 7445 [kgghash()+367]                                    253234               2013-06-03 15:27:04.349337 +08:00
9                    ORA 7445 [kksMapCursor()+323]                               256538               2013-05-27 09:54:58.684956 +08:00
8                    ORA 7445 [qkabxo()+22]                                      251194               2013-05-01 22:03:37.715416 +08:00
2                    ORA 600 [kghfrh:ds]                                         238818               2013-01-28 11:35:23.755034 +08:00
6                    ORA 7445 [eoa_pm_push()+31]                                 239218               2013-01-28 11:24:42.835685 +08:00
3                    ORA 7445 [ioei_get_method_counts()+39]                      71129                2012-10-17 11:17:39.735719 +08:00
4                    ORA 7445 [jol_calculate_transitive_interface_set()+1165]    74233                2012-10-17 11:05:51.570021 +08:00
1                    ORA 600 [kghfru:ds]                                         6369                 2012-09-07 17:35:55.001585 +08:00
11 rows fetched
Node2:
[oracle@XIJ02 ~]$ adrci

ADRCI: Release 11.1.0.6.0 - Beta on Mon Jun 24 14:59:37 2013

Copyright (c) 1982, 2007, Oracle. All rights reserved.
ADR base = "/oracle/app/11gR1"
adrci>
adrci>
adrci> set homepath diag/rdbms/xij/xij2
adrci>
adrci> show problem
ADR Home = /oracle/app/11gR1/diag/rdbms/xij/xij2:
*************************************************************************
PROBLEM_ID           PROBLEM_KEY                                                 LAST_INCIDENT        LASTINC_TIME
-------------------- ----------------------------------------------------------- -------------------- ----------------------------------------
1                    ORA 7445 [kgghash()+367]                                    209965               2013-06-16 23:34:39.333982 +08:00
2                    ORA 7445 [kksMapCursor()+323]                               190129               2013-05-27 09:54:56.121652 +08:00
2 rows fetched
adrci>
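To double-check how many bug-suspected problems the two nodes report, the `show problem` rows can be tallied by error code. Each row is an ADR problem, which can group several incidents that share the same problem key. This is a minimal sketch, assuming the rows are copied out of adrci as plain text in the format shown above; `tally_problems` and the `NODE1`/`NODE2` strings are hypothetical, not part of any Oracle tool.

```python
import re
from collections import Counter

# Rows copied from the "show problem" output above, one ADR problem per line.
NODE1 = """\
5 ORA 7445 [kcbw_get_bh()+67] 298938 2013-06-23 01:00:06.373716 +08:00
11 ORA 600 276161 2013-06-04 18:12:12.709933 +08:00
10 ORA 600 [729] 276160 2013-06-04 18:09:27.857128 +08:00
7 ORA 7445 [kgghash()+367] 253234 2013-06-03 15:27:04.349337 +08:00
9 ORA 7445 [kksMapCursor()+323] 256538 2013-05-27 09:54:58.684956 +08:00
8 ORA 7445 [qkabxo()+22] 251194 2013-05-01 22:03:37.715416 +08:00
2 ORA 600 [kghfrh:ds] 238818 2013-01-28 11:35:23.755034 +08:00
6 ORA 7445 [eoa_pm_push()+31] 239218 2013-01-28 11:24:42.835685 +08:00
3 ORA 7445 [ioei_get_method_counts()+39] 71129 2012-10-17 11:17:39.735719 +08:00
4 ORA 7445 [jol_calculate_transitive_interface_set()+1165] 74233 2012-10-17 11:05:51.570021 +08:00
1 ORA 600 [kghfru:ds] 6369 2012-09-07 17:35:55.001585 +08:00
"""

NODE2 = """\
1 ORA 7445 [kgghash()+367] 209965 2013-06-16 23:34:39.333982 +08:00
2 ORA 7445 [kksMapCursor()+323] 190129 2013-05-27 09:54:56.121652 +08:00
"""

# PROBLEM_ID, "ORA <code>", an optional [first argument], LAST_INCIDENT, then a date.
ROW = re.compile(r"(\d+)\s+ORA\s+(\d+)(?:\s+\[([^\]]*)\])?\s+(\d+)\s+\d{4}-\d{2}-\d{2}")


def tally_problems(text: str) -> Counter:
    """Count ADR problems per error code, e.g. Counter({"ORA-07445": 7, "ORA-00600": 4})."""
    counts = Counter()
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # header, separator and "rows fetched" lines do not match
            counts[f"ORA-{int(match.group(2)):05d}"] += 1
    return counts


if __name__ == "__main__":
    total = Counter()
    for node, rows in (("Node1", NODE1), ("Node2", NODE2)):
        counts = tally_problems(rows)
        total.update(counts)
        print(node, sum(counts.values()), dict(counts))
    print("Total", sum(total.values()), dict(total))
```

Run against these rows, the sketch reports 11 problems on Node1 (7 ORA-07445 and 4 ORA-00600) and 2 on Node2 (both ORA-07445), matching the total of 13 given in the solution that follows.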
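Solution:
Across the customer's two nodes, a total of 13 database failures suspected to be caused by bugs were found (11 problems on Node1 and 2 on Node2). Overall, Oracle 11.1.0.6 is not a particularly stable release and contains a variety of bugs.
Oracle fixed most of the bugs found in 11.1.0.6 in 11.1.0.7, which makes 11.1.0.7 considerably more stable. We therefore recommended that the customer upgrade the database to 11.1.0.7 or 11.2.0.3.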



(Triage Tool 3.01, routed by file analysis):
Failing Function: kcbw_get_bh
Route To: BUFFER CACHE:MANAGEABILITY
Error Argument: [kcbw_get_bh]
Type of Error: ORA-07445
File Name: xij1_mman_2015_i298938.trc
Comment: Routed by Error Argument, Conventional routing
DB Version: 11.1.0.6.0
Platform: Linux CPU: x86_64
OS Version: 2.6.18-194.el5
Stack Trace: kcbw_get_bh kcbw_get_first_buffer kcbw_next_free kmgs_extract_mem_from_granule kmgs_process_request_immediate kmgs_process_request kmgsdrv ksbabs ksbrdp opirip
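The Triage Tool report can also be related back to the Notes found earlier by comparing the functions in its stack trace with the function names that appear in the Note titles. This is a minimal sketch, assuming the report is pasted as the `Key: Value` lines shown above; `NOTE_SIGNATURES` is a hypothetical lookup table built by hand from the three Note titles, not an Oracle-provided mapping, and a name match only suggests a related bug rather than confirming one.

```python
# Key/value lines copied from the Triage Tool report above.
REPORT = """\
Failing Function: kcbw_get_bh
Route To: BUFFER CACHE:MANAGEABILITY
Error Argument: [kcbw_get_bh]
Type of Error: ORA-07445
File Name: xij1_mman_2015_i298938.trc
Comment: Routed by Error Argument, Conventional routing
DB Version: 11.1.0.6.0
Platform: Linux CPU: x86_64
OS Version: 2.6.18-194.el5
Stack Trace: kcbw_get_bh kcbw_get_first_buffer kcbw_next_free kmgs_extract_mem_from_granule kmgs_process_request_immediate kmgs_process_request kmgsdrv ksbabs ksbrdp opirip
"""

# Hypothetical table: Note ID -> function names mentioned in that Note's title.
NOTE_SIGNATURES = {
    "1341402.1": {"kcbw_get_bh"},
    "9728912.8": {"kcbw_numperchunk", "kcbw_get_bh"},
    "1364264.1": {"kcbw_numperchunk"},
}


def parse_report(text: str) -> dict:
    """Split each line on its first colon; values such as "BUFFER CACHE:MANAGEABILITY" stay whole."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def matching_notes(fields: dict) -> list:
    """Return the Notes whose title functions appear anywhere in the stack trace."""
    frames = set(fields.get("Stack Trace", "").split())
    return sorted(note for note, funcs in NOTE_SIGNATURES.items() if funcs & frames)


if __name__ == "__main__":
    report = parse_report(REPORT)
    print(report["Failing Function"], matching_notes(report))
```

For this report the sketch returns Notes 1341402.1 and 9728912.8, whose titles name kcbw_get_bh; Note 1364264.1 names only kcbw_numperchunk, which does not appear in this stack.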