Several of our customers have recently had issues related to Oracle job queue processes not running properly. The symptom was jobs in the job queue that were not marked broken, yet had not run at their scheduled start time. In addition, there were no Oracle trace files reporting job processing errors, so the problem was difficult to debug. Since all of Ignite 6.5’s monitoring and alerting processes are scheduled using Oracle jobs, it is important that job processing runs automatically and consistently.
To view Ignite’s jobs in the job queue, issue the following command in the repository database as the repository owner:
select job, what, next_date, broken, interval, failures from user_jobs order by next_date;

Figure 1: Ignite’s Repository Job Queue
Brief Description of Ignite’s Jobs
Concap.agent_scheduler is a job that runs every 5 minutes throughout the day to determine whether the quickpoll monitor for each monitored database is running or is scheduled to run. The agent_scheduler checks the monitoring schedule, then manually runs the quickpoll job with a ‘dbms_job.run(job)’ command for each database where monitoring needs to be started.
Concap.alert_job(‘database’,True) is a job that runs every 5 minutes for a specific database and checks for alerts to run. If an alert is scheduled to run, the alert_job will then execute the specific alert.
Concap.quickpoll(‘database’,True) is a job that runs essentially continuously, executing the quickpoll monitor and summary process. This job is never started automatically by the job queue; instead, the agent_scheduler starts it manually when it is scheduled to run. Note: Do not rely on the next_date and interval fields in the user_jobs table when diagnosing quickpoll monitoring problems, because these jobs are never started automatically.
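Since next_date and interval are not meaningful for the quickpoll jobs, one way to see whether they are actually executing is to look at this_date instead, which Oracle populates only while a job is running. The query below is a minimal sketch, assuming the quickpoll jobs can be identified by the text "quickpoll" in the what column, as in the job descriptions above:

```sql
-- Run as the repository owner.
-- THIS_DATE is non-null only while the job is currently executing;
-- LAST_DATE is the last time the job completed successfully.
select job, what, this_date, last_date, broken, failures
from user_jobs
where lower(what) like '%quickpoll%'   -- assumption: job text contains "quickpoll"
order by job;
```

A quickpoll job with a null this_date during a period when monitoring is scheduled suggests that the agent_scheduler has not been able to start it.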
Other important background jobs may also be running in the database. To view the full job queue for the entire database instance, the DBA_JOBS and DBA_SCHEDULER_JOBS views can be queried in much the same way as the query above.
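As a rough sketch of those instance-wide queries, assuming an account with access to the DBA views (and Oracle 10g or later for DBA_SCHEDULER_JOBS):

```sql
-- All DBMS_JOB jobs in the instance, regardless of owner.
select job, schema_user, what, last_date, next_date, broken, failures
from dba_jobs
order by next_date;

-- All DBMS_SCHEDULER jobs in the instance (10g and later).
select owner, job_name, enabled, state, last_start_date, next_run_date, failure_count
from dba_scheduler_jobs
order by owner, job_name;
```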
A job can fail to run as scheduled for several reasons, including several documented Oracle bugs. Oracle’s Metalink site has a useful troubleshooting document, Note:313102.1, which describes the most common reasons for jobs not executing and steps you through solutions for each. The note is written for Oracle versions 9.2.0.4 to 10.1.0.2, but the checklist is a great reference regardless of version if you are having issues with job processing. For convenience, I’ve included the brief checklist here:
Checklist For Job Issues:
- Is the Database Instance in RESTRICTED SESSIONS mode?
- Is the parameter JOB_QUEUE_PROCESSES set to 0?
- Is the hidden parameter ‘_SYSTEM_TRIG_ENABLED’ set to FALSE?
- Is the job marked ‘BROKEN’ in the DBA_JOBS table?
- Have you issued a commit after submitting the job?
- Have you tried setting the hidden parameter, “_job_queue_interval” to a different value (default is 5 seconds)?
- Has the database server (machine) been up for more than 497 days?
  - Check internal Oracle bug 3427424, “SLGCSF / SLGCS STOP INCREMENTING AFTER 497 DAYS UPTIME”.
  - It is fixed in 10.1.0.2.0. The workaround is to reboot the database server.
- Is the job still running in DBA_JOBS_RUNNING?
- Do the LAST_DATE and NEXT_DATE fields in DBA_JOBS make sense for the particular job?
  - If LAST_DATE is null, the job has never executed automatically.
- Does the NEXT_DATE field change per the INTERVAL field in DBA_JOBS?
  - If not, the job is not running automatically.
- Have you tried changing the value for JOB_QUEUE_PROCESSES to ‘0’ and then back again?
  - This restarts the CJQ process.
  - See Oracle Bug 2649244 (fixed by: 9015, 9203, 10201)
- Finally, if you have upgraded or refreshed the database, try executing ‘exec dbms_ijob.set_enabled(true);’ as sysadm.
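Several of the checklist items can be checked with simple queries. The script below is a sketch rather than a complete diagnostic: it assumes a DBA account with access to the V$ and DBA views, it skips the hidden-parameter checks, and the value 10 in the commented-out section is a hypothetical placeholder for whatever JOB_QUEUE_PROCESSES was set to originally.

```sql
-- Restricted session mode? LOGINS shows RESTRICTED or ALLOWED.
select instance_name, logins from v$instance;

-- JOB_QUEUE_PROCESSES set to 0? A value of 0 disables job execution.
select name, value from v$parameter where name = 'job_queue_processes';

-- Any jobs marked BROKEN?
select job, schema_user, what, failures
from dba_jobs
where broken = 'Y';

-- Jobs that Oracle believes are still running.
select job, sid, this_date, failures from dba_jobs_running;

-- Do LAST_DATE and NEXT_DATE make sense, and does NEXT_DATE follow INTERVAL?
select job, what, last_date, next_date, interval
from dba_jobs
order by next_date;

-- Restarting the CJQ process by toggling JOB_QUEUE_PROCESSES.
-- Left commented out on purpose: note the current value first and
-- substitute it for the hypothetical 10 below.
-- alter system set job_queue_processes = 0;
-- alter system set job_queue_processes = 10;
```

Running the last query twice, a few minutes apart, shows whether NEXT_DATE is advancing for jobs with short intervals.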
As I stated earlier, several customers have run into the job queue processing problem lately. Coincidentally, they all encountered the 497 day bug (one of the checks above) and had to reboot their database server to fix the issue. It is interesting to note that one customer looked at the uptime on their server (which had only been up 126 days) and initially dismissed the 497 day bug. Only after working through the entire checklist did they decide to try rebooting the server, which fixed the issue. I recall earlier Oracle 8 versions having a similar 248 day bug which required a reboot of the system, so the issue may present itself differently depending on the OS and/or OS version.
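The 497 and 248 day figures are consistent with a 32-bit (or signed 32-bit) tick counter that increments 100 times per second and eventually wraps around. That is an assumption on my part and is not stated in the checklist above, but the arithmetic is easy to check:

```sql
-- Days until a counter of 1/100-second ticks overflows,
-- assuming 100 ticks per second and 86400 seconds per day.
select round(power(2, 32) / 100 / 86400, 1) as days_unsigned_32_bit,
       round(power(2, 31) / 100 / 86400, 1) as days_signed_32_bit
from dual;
-- days_unsigned_32_bit = 497.1, days_signed_32_bit = 248.6
```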
Solution for the 497 day Bug
- Shut down all applications, including databases, on the database server.
- Shut down and restart the database server (machine).
- Restart all applications, including databases.
- Check that jobs are executing automatically.
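One way to do the final check is to look for jobs that match the original symptom: not broken, not currently running, and overdue. This is a sketch under a few assumptions: a DBA account, a 10 minute grace period chosen arbitrarily, and quickpoll jobs excluded because their next_date is not meaningful.

```sql
-- Jobs that should have run by now but have not been picked up.
select job, schema_user, what, last_date, next_date, failures
from dba_jobs
where broken = 'N'
  and this_date is null                     -- not currently executing
  and next_date < sysdate - 10 / 1440       -- more than 10 minutes overdue (assumed grace period)
  and lower(what) not like '%quickpoll%'    -- quickpoll is started manually by agent_scheduler
order by next_date;
```

If this returns no rows some minutes after the restart, and last_date is advancing for the 5 minute jobs, the job queue is processing again.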

