It’s been quite some time since I did a post relating to SQL. I’ve just been sharing a lot of PowerShell scripts lately because almost all my views are related to PowerShell.
Monitoring SQL Jobs can be done in many ways, and I highly suggest your jobs send an email notification to your DBA group on failure. No not just you. What happens when you are on vacation or you leave permanently and your account gets disabled? No notifications. Don’t do that. You can also set up fancy reports to display information about your job status using SSRS, but I always like to have something that is easily portable and can be ran in SSMS quickly.
In the middle of the night when email isn’t loading and a call comes in that something important failed, but the helpdesk can’t articulate what exactly it is over the phone…this script can help a lot. Suddenly confusion and trying to piece together a string of letters and numbers over the phone turns into a simple, “Oh, yes, I see Server X has failed Job Y 3 times. It ran successfully last time, I’ll go ahead and connect to that server to look into it but I suspect someone might have been doing Z.”
I developed this failed job script to run on a single SQL server or through a multi-server query to see all failed jobs across all servers. I commonly run this script in the mornings as a summary when there are multiple failed job emails that came in over night. It’s a much easier way for me to see the history this way. If a job is continuing to fail and it’s high priority, I start there.
The job provides the name, run date, and run time for each failed job, but it also provides some extra information for sorting priority. I’ve used
ROW_NUMBER to sort job history and retrieve the latest run results of each failed job. That’s the only purpose of using a CTE here, getting Row 1, or the most recent run result for each job. Using the latest result helps determine how problematic the reported job failure is. Did the latest trans log backup run successfully even though one of them failed in the past? OK, that’s probably a minor issue then. Are they still failing? Well, then we have something a bit more worrisome to consider.
I always limit the results to a 24 hour period to limit the result set. I’ve also included a clause in the past to exclude a specific job that was prone to failing due to network issues or a colleague testing something.
WITH JobInfo AS ( SELECT j.job_id ,j.name ,jh.run_status ,jh.run_date ,jh.run_time ,jh.[message] ,[PriorStatus] = --'NA' --2008 LAG(jh.run_status,1) OVER (ORDER BY jh.job_id,jh.run_date,jh.run_time) --2012 ,[NextStatus] = --'NA' --2008 LEAD(jh.run_status,1) OVER (ORDER BY jh.job_id,jh.run_date,jh.run_time) --2012 ,[RN] = ROW_NUMBER() OVER (PARTITION BY jh.job_id ORDER BY jh.job_id DESC, jh.run_date DESC, jh.run_time DESC) FROM msdb.dbo.sysjobs j INNER JOIN msdb.dbo.sysjobhistory jh ON jh.job_id = j.job_id WHERE jh.run_date &amp;gt;= CONVERT(INT,CONVERT(VARCHAR,GETDATE()-1,112)) AND jh.step_id = 0 ) SELECT j.Name ,[RunDate] = CONVERT(DATE,CONVERT(VARCHAR,j.run_date,112)) ,[RunTime] = CONVERT(TIME(0),STUFF(STUFF(REPLACE(STR(RTRIM(j.run_time),6,0),' ','0'),3,0,':'),6,0,':')) ,[PriorStatus] = CASE j.PriorStatus WHEN '1' THEN 'SUCCESS' WHEN '0' THEN 'FAIL' ELSE 'N/A' END ,[NextStatus] = CASE j.NextStatus WHEN '1' THEN 'SUCCESS' WHEN '0' THEN 'FAIL' ELSE 'N/A' END ,[LatestStatus] = CASE ji.run_status WHEN '1' THEN 'SUCCESS' ELSE 'FAIL' END ,j.[message] FROM JobInfo j INNER JOIN JobInfo ji ON ji.job_id = j.job_id AND ji.RN = 1 WHERE j.run_status = 0 ORDER BY j.name,[RunDate],[RunTime]
If you are running the script on SQL 2012, you can get a bit more information using
LEAD/LAG in the form of the completion status of the job before and after the reported failure. I use this to pattern failures and to further prioritize multiple job failures. All other things being equal, I’m going to be more concerned about the job that has consecutive failure than a job that fails intermittently.
LEAD/LAG, and SQL 2008R2 or older do not play well with each other. I’ve tried making
IF statements that run the
LEAD/LAG version of the job only on SQL 2012 while running the
NA version of the job otherwise, but SQL always wants to parse the 2012 functions, resulting in errors. I’ve resigned to just commenting out the
LEAD/LAG functions when running this script on SQL 2008R2 and older. The benefits of the
NextStatus is fairly minimal, so it’s not a huge loss. I find the
LatestStatus to be the most important of the three by far.