原文博客链接地址:数据库open报错ORA-01555: snapshot too old
今天正在东莞蜜月的时候,一个学生说他管理的测试库出问题了,无法open,我们先来看看是什么问题:
Recovery of Online Redo Log: Thread 1 Group 4 Seq 4 Reading mem 0
Mem# 0: /onlinelog/shr/redo04.log
Completed redo application of 0.00MB
Completed crash recovery at
Thread 1: logseq 4, block 3, scn 7755957
0 data blocks read, 0 data blocks written, 0 redo k-bytes read
Thread 1 advanced to log sequence 5 (thread open)
Thread 1 opened at log sequence 5
Current log# 5 seq# 5 mem# 0: /onlinelog/shr/redo05.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Thu Jun 19 13:31:35 2014
SMON: enabling cache recovery
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0000.007658ba):
select ctime, mtime, stime from obj$ where obj# = :1
Errors in file /oracle/diag/rdbms/shr/shr/trace/shr_ora_5262.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 6 with name "_SYSSMU6_1263032392$" too small
Errors in file /oracle/diag/rdbms/shr/shr/trace/shr_ora_5262.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 6 with name "_SYSSMU6_1263032392$" too small
Error 704 happened during db open, shutting down database
USER (ospid: 5262): terminating the instance due to error 704
Instance terminated by USER, pid = 5262
ORA-1092 signalled during: ALTER DATABASE OPEN...
opiodr aborting process unknown ospid (5262) as a result of ORA-1092
Thu Jun 19 13:31:37 2014
ORA-1092 : opitsk aborting process |
从上面的错误来看,该数据库之所以open失败,是由于Oracle在bootstrap阶段执行递归SQL时出现ora-01555错误,
这样bootstrap过程无法继续下去,也就导致数据库无法open。我们可以看到报错的SQL语句如下:
select ctime, mtime, stime from obj$ where obj# = :1
这是很熟悉的一个SQL,通过10046 trace跟踪Oracle open的过程你会发现该SQL。
针对该错误,或许有人以为是回滚段的问题,实际上并不是,这种情况下推进下SCN 就可以很顺利的把数据库open。
但是这里有个问题:该兄弟的数据库是Oracle 11.2.0.4,已经不支持传统的10015 event的方式了。
下面我们通过oradebug 来解决该问题:
SQL> conn /as sysdba
Connected to an idle instance.
SQL> startup mount
ORACLE instance started.
Total System Global Area 4275781632 bytes
Fixed Size 2260088 bytes
Variable Size 989856648 bytes
Database Buffers 3271557120 bytes
Redo Buffers 12107776 bytes
Database mounted.
SQL>
SQL> oradebug poke 0x06001AE70 4 0x859AFA
ORA-00074: no process has been specified
SQL> oradebug setmypid
Statement processed.
SQL> oradebug DUMPvar SGA kcsgscn_
kcslf kcsgscn_ [06001AE70, 06001AEA0) = 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 6001AB50 00000000
SQL> oradebug poke 0x06001AE70 4 0x859AFA
BEFORE: [06001AE70, 06001AE74) = 00000000
AFTER: [06001AE70, 06001AE74) = 00859AFA
SQL> alter database open;
Database altered.
SQL> |
这里简单解释一下,4 为长度,0x859AFA是16进制,我在原来的v$datafile_header.checkpoint_change#的基础之上
加上上1000000得到该值。
我们可以看到,顺利打开了数据库。最后再出观察下alert log发现居然有ora-00600 4194错误。
Thu Jun 19 14:48:43 2014
Dumping diagnostic data in directory=[cdmp_20140619144843], requested by (instance=1, osid=9140 (MMON)), summary=[incident=132122].
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x3D6C3836] [PC:0x97F4DF6, kgegpa()+40] [flags: 0x0, count: 1]
Exception [type: SIGSEGV, Address not mapped to obj |