Get Failed Jobs with T-SQL

It’s been quite some time since I did a post relating to SQL. I’ve just been sharing a lot of PowerShell scripts lately because almost all my views are related to PowerShell.

Monitoring SQL Jobs can be done in many ways, and I highly suggest your jobs send an email notification to your DBA group on failure. No not just you. What happens when you are on vacation or you leave permanently and your account gets disabled? No notifications. Don’t do that. You can also set up fancy reports to display information about your job status using SSRS, but I always like to have something that is easily portable and can be ran in SSMS quickly.

Uses

In the middle of the night when email isn’t loading and a call comes in that something important failed, but the helpdesk can’t articulate what exactly it is over the phone…this script can help a lot. Suddenly confusion and trying to piece together a string of letters and numbers over the phone turns into a simple, “Oh, yes, I see Server X has failed Job Y 3 times. It ran successfully last time, I’ll go ahead and connect to that server to look into it but I suspect someone might have been doing Z.”

I developed this failed job script to run on a single SQL server or through a multi-server query to see all failed jobs across all servers. I commonly run this script in the mornings as a summary when there are multiple failed job emails that came in over night. It’s a much easier way for me to see the history this way. If a job is continuing to fail and it’s high priority, I start there.

Explanation

The job provides the name, run date, and run time for each failed job, but it also provides some extra information for sorting priority. I’ve used ROW_NUMBER to sort job history and retrieve the latest run results of each failed job. That’s the only purpose of using a CTE here, getting Row 1, or the most recent run result for each job. Using the latest result helps determine how problematic the reported job failure is. Did the latest trans log backup run successfully even though one of them failed in the past? OK, that’s probably a minor issue then. Are they still failing? Well, then we have something a bit more worrisome to consider.

I always limit the results to a 24 hour period to limit the result set. I’ve also included a clause in the past to exclude a specific job that was prone to failing due to network issues or a colleague testing something.

WITH JobInfo AS
(
 SELECT
  j.job_id
  ,j.name
  ,jh.run_status
  ,jh.run_date
  ,jh.run_time
  ,jh.[message]
  ,[PriorStatus] = --'NA' --2008
    LAG(jh.run_status,1) OVER (ORDER BY jh.job_id,jh.run_date,jh.run_time)  --2012
  ,[NextStatus] = --'NA' --2008
    LEAD(jh.run_status,1) OVER (ORDER BY jh.job_id,jh.run_date,jh.run_time) --2012
  ,[RN] = ROW_NUMBER() OVER (PARTITION BY jh.job_id
 ORDER BY jh.job_id DESC, jh.run_date DESC, jh.run_time DESC)
 FROM msdb.dbo.sysjobs j
  INNER JOIN msdb.dbo.sysjobhistory jh ON jh.job_id = j.job_id
 WHERE jh.run_date >= CONVERT(INT,CONVERT(VARCHAR,GETDATE()-1,112))
  AND jh.step_id = 0
)
SELECT
  j.Name
  ,[RunDate] = CONVERT(DATE,CONVERT(VARCHAR,j.run_date,112))
  ,[RunTime] = CONVERT(TIME(0),STUFF(STUFF(REPLACE(STR(RTRIM(j.run_time),6,0),' ','0'),3,0,':'),6,0,':'))
  ,[PriorStatus] = CASE j.PriorStatus WHEN '1' THEN 'SUCCESS' WHEN '0' THEN 'FAIL' ELSE 'N/A' END
  ,[NextStatus] = CASE j.NextStatus WHEN '1' THEN 'SUCCESS' WHEN '0' THEN 'FAIL' ELSE 'N/A' END
  ,[LatestStatus] = CASE ji.run_status WHEN '1' THEN 'SUCCESS' ELSE 'FAIL' END
  ,j.[message]
 FROM JobInfo j
   INNER JOIN JobInfo ji ON ji.job_id = j.job_id AND ji.RN = 1
 WHERE j.run_status = 0
 ORDER BY j.name,[RunDate],[RunTime]

Exceptions

If you are running the script on SQL 2012, you can get a bit more information using LEAD/LAG in the form of the completion status of the job before and after the reported failure. I use this to pattern failures and to further prioritize multiple job failures. All other things being equal, I’m going to be more concerned about the job that has consecutive failure than a job that fails intermittently.

Sadly, IF statements, LEAD/LAG, and SQL 2008R2 or older do not play well with each other. I’ve tried making IF statements that run the LEAD/LAG version of the job only on SQL 2012 while running the NA version of the job otherwise, but SQL always wants to parse the 2012 functions, resulting in errors. I’ve resigned to just commenting out the LEAD/LAG functions when running this script on SQL 2008R2 and older. The benefits of the PriorStatus and NextStatus is fairly minimal, so it’s not a huge loss. I find the LatestStatus to be the most important of the three by far.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s