今天遇到一起ORACLE
数据库宕机案例,下面是对这起
数据库宕机案例的原因进行分析、解读。分析过程中顺便记录一下这个案例的前因后果,攒点经验值,培养一下分析、解决问题的能力。
?
案例环境:
?
操作系统 :
Oracle Linux Server release 5.7 64 bit
?
数据库版本:Oracle Database 10g Release 10.2.0.4.0 - 64bit Production
?
案例分析:
?
收到告警去检查数据库时,发现实例已经宕机。检查告警日志,发现下面错误信息:
?
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
Mon Nov 2 11:43:00 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","tmp")
Mon Nov 2 11:43:00 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
Mon Nov 2 11:43:05 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","tmp")
Mon Nov 2 11:43:05 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
Mon Nov 2 11:43:08 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_reco_6569.trc:
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select host,userid,password,...","sql area","tmp")
Mon Nov 2 11:43:08 2015
RECO: terminating instance due to error 4031
Mon Nov 2 11:43:08 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_pmon_6555.trc:
ORA-04031: unable to allocate bytes of shared memory ("","","","")
Instance terminated by RECO, pid = 6569
?
从告警日志我们可以看到ORA-00604与ORA-04031错误导致了这次宕机事故(RECO: terminating instance due to error 4031):
?
$ oerr ora 4031
?
04031, 00000, "unable to allocate %s bytes of shared memory (\"%s\",\"%s\",\"%s\",\"%s\")"
?
// *Cause: More shared memory is needed than was allocated in the shared
?
// pool.
?
// *Action: If the shared pool is out of memory, either use the
?
// dbms_shared_pool package to pin large packages,
?
// reduce your use of shared memory, or increase the amount of
?
// available shared memory by increasing the value of the
?
// INIT.ORA parameters "shared_pool_reserved_size" and
?
// "shared_pool_size".
?
// If the large pool is out of memory, increase the INIT.ORA
?
// parameter "large_pool_size".
?
一般出现ORA-04031错误可能由两个原因引起:
?
1:内存中存在大量碎片,导致在分配内存的时候,没有连续的内存可存放,此问题一般是需要在开发的角度上入手,比如增加绑定变量,减少硬解析来改善和避免;
?
2.内存容量不足,需要扩大内存。
?
这台机器分配的物理内存为8G,结果检查发现SGA只分配了1168M,不到2G,瞬时碉堡了。此时真是很无语。ASH Report分析宕机前后的Buffer Cache和Shared Pool大小如下所示。
?
查看跟踪文件,可以看到SGA: allocation forcing component growth等待事件,可以确认的是由于SGA无法增长导致,也就是SGA被撑爆了,结合ASH Report我们可以看到当时Shared Pool的大小已经接近SGA的69.6%大小。
?
SO: 0xa617d9c0, type: 4, owner: 0xa8a26c68, flag: INIT/-/-/0x00
(session) sid: 932 trans: (nil), creator: 0xa