CASE:DBshutdown/open过程中发生异常导致JOB不能自动执行(一)

2014-11-24 15:34:30 · 作者: · 浏览: 4
CASE:DB shutdown/open 过程中发生异常导致JOB不能自动执行

现象:
一个DB中的所有JOB在3月25日之后就不再自动运行,查询DBA_JOBS,发现LAST_DATE定格在3月25日之前。
SQL> show parameter job

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes integer 5

SQL> select job,LAST_DATE,LAST_SEC,NEXT_DATE,NEXT_SEC,TOTAL_TIME,BROKEN,INTERVAL,what from dba_jobs;

JOB LAST_DATE LAST_SEC NEXT_DATE NEXT_SEC TOTAL_TIME BROKEN INTERVAL WHAT
---------- ----------- ---------------- ----------- ---------------- ---------- ------ ---------------------------------------- --------------------------------------------------
1 2013/3/25 1 17:26:39 2013/3/25 1 17:27:39 180875 N sysdate + 1 / (24 * 60) EMD_MAINTENANCE.EXECUTE_EM_DBMS_JOB_PROCS();
3 2013/3/25 6 06:10:18 2013/5/3 11 11:22:47 758193 N sysdate+1 P_IMG_CSU_STAT_INFO;
4 2013/3/22 1 12:07:20 2013/5/3 10 10:33:37 4165016 N sysdate + 1 p_IMG_csu_stat_daily_ex_exec;
5 2013/3/23 1 14:27:31 2013/4/10 6 06:00:00 13238996 N sysdate + 1 P_IMG_CSU_TWS_MONTHLY_EXEC;
122 2013/3/25 1 10:55:58 2013/4/10 1 10:55:58 140067 N sysdate+1 p_IMG_csu_stat_wt_daily_exec;
42 2013/3/25 1 17:23:29 2013/4/10 8 08:00:00 6803650 N SYSDATE+1/288 qpg_kj_tj;
62 2013/3/22 0 00:10:05 2013/4/10 0 00:10:00 7147390 N TRUNC(SYSDATE)+1/(24*6)+1 neimg.pack_qpg_IMG_dailytj.DAILY_STATISTICS;
63 2013/3/22 0 00:00:01

分析:
Step1,查看dba_jobs等视图
Step2,查看alert日志:
Mon Mar 25 09:05:22 2013
Errors in file /app/oracle/admin/neimg/bdump/neimg_j006_27877.trc: <===
ORA-12012: error on auto execute of job 42
ORA-01403: no data found
ORA-06512: at "neimg.qpg_KJ_TJ", line 33
ORA-06512: at line 1
Mon Mar 25 09:08:29 2013

通过alert日志的报错,追溯到neimg_j006_27877.trc,但没有有价值的信息。
Step3,从OS层面trace进程cjq0
[oracle@IMG_PIDB udump]$ ps -ef|grep cjq
oracle 4161 4051 0 08:30 pts/0 00:00:00 grep cjq
oracle 11984 1 0 Mar25 00:00:02 ora_cjq0_neimg
[oracle@IMG_PIDB udump]$ strace -p 11984
Process 11984 attached - interrupt to quit
times(NULL) = 790492277
semtimedop(393217, 0x7fff39107150, 1, {1, 840000000}) = -1 EAGAIN (Resource temporarily unavailable)
times(NULL) = 790492461
times(NULL) = 790492461
semtimedop(393217, 0x7fff39107150, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable) <<====
getrusage(RUSAGE_SELF, {ru_utime={1, 664746}, ru_stime={0, 500923}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={1, 664746}, ru_stime={0, 500923}, ...}) = 0
times(NULL) = 790492761
times(NULL) = 790492761
semtimedop(393217, 0x7fff39107150, 1, {3, 0}
Process 11984 detached

semtimedop(393217, 0x7fff39107150, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
通过trace,发现自从3月25日DB重启后一直在semtimedop内存段’0x7fff39107150'

Step4,删除job 4,并重建,新的job号是292,但是新建的JOB 292仍然不能自动执行。列THIS_DATE,THIS_SEC值为空。

Step5,做10046,如下:
SQL> conn /as sysdba
Connected.
SQL> oradebug setospid 11984;
oradebug unlimit;
oradebug event 10046 trace name context forever,level 12;
oradebug tracefile_name Oracle pid: 13, Unix process pid: 11984, image: oracle@IMG_PIDB (CJQ0)
SQL> Statement processed.
SQL> Statement processed.
SQL>
/app/oracle/admin/neimg/bdump/neimg_cjq0_11984.trc
SQL> conn neimg
Enter password:
Connected.
SQL> exec DBMS_JOB.NEXT_DATE(292,sysdate+1/24/60); <===设定为1分钟后执行

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

SQL>

Unix process pid: 11984, image: oracle@IMG_PIDB (CJQ0)

*** 2013