D PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S oracle 3046 1 0 75 0 - 184266 - 17:24 ? 00:00:00 ora_ckpt_ora10g
[Thread debugging using libthread_db enabled]
0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6
#0 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6
#1 0x0000000003be9b09 in sskgpwwait ()
#2 0x0000000003bccdf0 in skgpwwait ()
#3 0x0000000000855f4a in ksdxsus ()
#4 0x0000000000857118 in ksdxffrz ()
#5 0x0000000000853003 in ksdxcb ()
#6 0x0000000001ebd0cf in sspuser ()
#7
#8 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6
#9 0x0000000003be9b09 in sskgpwwait ()
#10 0x0000000003bccdf0 in skgpwwait ()
#11 0x0000000000798319 in kslwaitns_timed ()
#12 0x00000000008c3b1d in kskthbwt ()
#13 0x0000000000797e54 in kslwait ()
#14 0x00000000029f3f3b in ksarcv ()
#15 0x000000000082e8bf in ksbabs ()
#16 0x0000000000835822 in ksbrdp ()
#17 0x0000000002f4d840 in opirip ()
#18 0x000000000132b016 in opidrv ()
#19 0x0000000001eb3146 in sou2o ()
#20 0x0000000000723245 in opimai_real ()
#21 0x00000000007230fc in main ()
A debugging session is active.
Inferior 1 [process 3046] will be detached.
Quit anyway? (y or n) [answered Y; input not from terminal]
*** 2015-03-21 17:34:29.895
*** 2015-03-21 17:34:39.899
Waited for detached process: CKPT for 320 seconds:
*** 2015-03-21 17:34:39.899
Dumping diagnostic information for CKPT:
OS pid = 3046
loadavg : 0.06 0.01 0.00
memory info: free memory = 0.00M
swap info: free = 0.00M alloc = 0.00M total = 0.00M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S oracle 3046 1 0 75 0 - 184266 - 17:24 ? 00:00:00 ora_ckpt_ora10g
[Thread debugging using libthread_db enabled]
0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6
#0 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6
#1 0x0000000003be9b09 in sskgpwwait ()
#2 0x0000000003bccdf0 in skgpwwait ()
#3 0x0000000000855f4a in ksdxsus ()
#4 0x0000000000857118 in ksdxffrz ()
#5 0x0000000000853003 in ksdxcb ()
#6 0x0000000001ebd0cf in sspuser ()
#7
#8 0x00000035cc2d517a in semtimedop () from /lib64/libc.so.6
#9 0x0000000003be9b09 in sskgpwwait ()
#10 0x0000000003bccdf0 in skgpwwait ()
#11 0x0000000000798319 in kslwaitns_timed ()
#12 0x00000000008c3b1d in kskthbwt ()
#13 0x0000000000797e54 in kslwait ()
#14 0x00000000029f3f3b in ksarcv ()
#15 0x000000000082e8bf in ksbabs ()
#16 0x0000000000835822 in ksbrdp ()
#17 0x0000000002f4d840 in opirip ()
#18 0x000000000132b016 in opidrv ()
#19 0x0000000001eb3146 in sou2o ()
#20 0x0000000000723245 in opimai_real ()
#21 0x00000000007230fc in main ()
A debugging session is active.
Inferior 1 [process 3046] will be detached.
Quit anyway? (y or n) [answered Y; input not from terminal]
*** 2015-03-21 17:34:40.598
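The log shows that the SMON process is waiting to communicate with the CKPT process. Could the CKPT process itself be misbehaving? Turning our attention to CKPT, here is its trace file:
*** 2015-03-21 17:26:15.972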
*** SERVICE NAME:(SYS$BACKGROUND) 2015-03-21 17:26:15.972
*** SESSION ID:(165.1) 2015-03-21 17:26:15.972
Received ORADEBUG command 'suspend' from process Unix process pid: 3062, image: <<<<< This is where the problem lies!
Received ORADEBUG command 'tracefile_name' from process Unix process pid: 3062, image:
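At this point, the chain of events is clear:
1. Because the CKPT process was suspended, checkpoints could not complete and the SMON process held the resource Enqueue CI-00000001-00000005 for a long time, which eventually caused the database to hang.
2. Because the resource Enqueue CI-00000001-00000005 was held by SMON, a series of wait events followed: library cache lock, enq: CI - contention, and log file switch (checkpoint incomplete).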
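The key clue above is the "Received ORADEBUG command 'suspend'" line in the CKPT trace. As a rough sketch, a check like this can be scripted so that a background-process trace is scanned for ORADEBUG commands automatically. The function and class names below are hypothetical, the line format is taken only from the trace excerpt shown above, and the assumption that a later 'resume' would be logged in the same format has not been verified here.

```python
import re
from typing import Iterable, List, NamedTuple


class OradebugEvent(NamedTuple):
    """One ORADEBUG command recorded in a trace file (hypothetical helper type)."""
    command: str
    sender_pid: int


# Matches lines shaped like the excerpt above, e.g.
# Received ORADEBUG command 'suspend' from process Unix process pid: 3062, image:
_ORADEBUG_RE = re.compile(
    r"Received ORADEBUG command '([^']+)' from process Unix process pid: (\d+)"
)


def find_oradebug_events(lines: Iterable[str]) -> List[OradebugEvent]:
    """Return every ORADEBUG command found in the given trace lines, in order."""
    events = []
    for line in lines:
        match = _ORADEBUG_RE.search(line)
        if match:
            events.append(OradebugEvent(match.group(1), int(match.group(2))))
    return events


def is_left_suspended(events: Iterable[OradebugEvent]) -> bool:
    """True if the most recent suspend/resume command seen is 'suspend'.

    Assumption: a 'resume' command would appear in the trace in the same
    format as 'suspend'. Other commands (e.g. 'tracefile_name') are ignored.
    """
    suspended = False
    for event in events:
        if event.command == "suspend":
            suspended = True
        elif event.command == "resume":
            suspended = False
    return suspended


if __name__ == "__main__":
    # Lines copied from the CKPT trace excerpt above.
    trace = [
        "Received ORADEBUG command 'suspend' from process Unix process pid: 3062, image:",
        "Received ORADEBUG command 'tracefile_name' from process Unix process pid: 3062, image:",
    ]
    found = find_oradebug_events(trace)
    for event in found:
        print(event.command, event.sender_pid)
    print("left suspended:", is_left_suspended(found))
```

In practice the lines would come from reading the trace file of the process under suspicion. The sender pid (3062 in the excerpt) identifies the session that issued the command.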
Resume the CKPT process:
SQL> oradebug setospid 3046
Oracl |