Multi-Server Job Scripting Failure

Central Management Servers are probably the single most important thing in my SQL life right now. I manage a lot of servers, so having an organized way to keep track of them is paramount. Multi-Server Jobs are a great way to centralize jobs that need to be on all your boxes. If you are not familiar with the idea, I suggest you look into it if you have more than a handful of servers.

I was building a new CMS the other day and had to get dozens of Multi-Server Jobs transferred to the new instance. Some of the jobs have 30 steps, so using Object Explorer to script out the jobs was the obvious choice. However, I was immediately hit with an ambiguous error stating that SQL Server was “Unable to cast object of type ‘System.DBNull’ to type ‘System.String’.”

Test

My original error had another, equally unhelpful line.

Knowing that some object is Null doesn’t help very much. After browsing through the job steps, everything looked in order. Internet searches provided surprisingly little on this subject, and nothing I found really explained how it happens. I know I was going to be migrating a CMS again, and soon, so I wanted to know how to fix it. Surely it wasn’t just a bug in Management Studio? On to the test case!

First of all, you need to create a Multi-Server job. I’m assuming you already have a CMS set up, if not, you need one.
Right-click Multi-Server Jobs, create a new job, and then assign at least two targets for testing. I’m going to use the name TestJob for the rest of this scenario.

test

Make sure you have at least 2 target servers

Once TestJob is made, (it does not matter what it does; mine just performed SELECT @@VERSION) you need to run the job. Execute the following code:
msdb.dbo.sp_start_job 'TestJob' or just start the job using the GUI.

Test

Your job can fail, I don’t care. Just run it.

Now that you have some job history for 2 servers, let’s look at that history:

SELECT *
FROM msdb.dbo.sysjobservers js
INNER JOIN msdb.dbo.sysjobs j ON j.job_id = js.job_id
WHERE j.name = 'TestJob'

This will display one line per targeted server. This query is included for your own reference, we are concerned about the next one’s more targeted results. The query is checking your multi-server job history, specifically the server list, and comparing that to the servers that exist on the CMS. Right now, everything looks fine, and matches.

SELECT
[JobName] = j.Name
,[JobServerId] = js.Server_id
,[TargetServerId] = ts.Server_id
FROM msdb.dbo.sysjobservers js
INNER JOIN msdb.dbo.sysjobs j ON j.job_id = js.job_id
LEFT JOIN msdb.dbo.systargetservers ts ON ts.server_id = js.server_id
WHERE j.name = 'TestJob'

Test

Job History Server_Id matches Target Server_Id, awesome!

Again, at this point, all is well, and you can script out the multi-server job to your heart’s content. That would be fine if you saved a copy and never, ever changed the job while it sat on your CMS. Fast forward a few years though, and weird things may have happened. Let’s simulate a weird thing. I’m going to change the Server_Id in the history to an invalid value. That’s right, there are no constraints preventing me from doing this! This is going to simulate removing a server from the CMS after history already exists for it, or someone just going into your history and changing things because they hate you.

You will need to change the numeric values of the server_ids to match invalid and valid server_ids on your CMS, respectively.
UPDATE js
SET Server_id = 7
FROM msdb.dbo.sysjobservers js
INNER JOIN msdb.dbo.sysjobs j ON j.job_id = js.job_id
WHERE j.name = 'TestJob' and server_id = 5

Now we are going to rerun the script from earlier to verify the job history information.

SELECT
[JobName] = j.Name
,[JobServerId] = js.Server_id
,[TargetServerId] = ts.Server_id
FROM msdb.dbo.sysjobservers js
INNER JOIN msdb.dbo.sysjobs j ON j.job_id = js.job_id
LEFT JOIN msdb.dbo.systargetservers ts ON ts.server_id = js.server_id
WHERE j.name = 'TestJob'

Test

The history now has a server_id that doesn’t exist on the CMS

Great…now we have a null server. If you guessed this is a problem, you guessed correctly. If you try to script the job out now, you will receive a failure notification like the one at the top. Now that we have recreated the issue, we know how to fix it though!

Run the following code to find any null servers and just clean out that history. I’m going to assume if your server doesn’t exist, you don’t care about the history. I didn’t at least.

DELETE FROM js
FROM msdb.dbo.sysjobservers js
INNER JOIN msdb.dbo.sysjobs j ON j.job_id = js.job_id
LEFT JOIN msdb.dbo.systargetservers ts ON ts.server_id = js.server_id
WHERE j.name = 'TestJob' AND ts.server_id IS NULL

Finally, let’s look at the job history again, I’ve supplied the code for a third time, just in case you forgot it.

SELECT
[JobName] = j.Name
,[JobServerId] = js.Server_id
,[TargetServerId] = ts.Server_id
FROM msdb.dbo.sysjobservers js
INNER JOIN msdb.dbo.sysjobs j ON j.job_id = js.job_id
LEFT JOIN msdb.dbo.systargetservers ts ON ts.server_id = js.server_id
WHERE j.name = 'TestJob'

Test

Now we have clean history with no null values

If you try to script the job, it will work, and you can copy your multi-server job over to the new server.

Let’s take it a step further, and expand the script to check all the multi-server jobs. Notice that it is only looking at Category_Id=2, or Multi-Server jobs, and only those with invalid Server_Ids.

SELECT
[JobName] = j.Name
,[JobServerId] = js.Server_id
,[TargetServerId] = ts.Server_id
FROM msdb.dbo.sysjobservers js
INNER JOIN msdb.dbo.sysjobs j ON j.job_id = js.job_id
LEFT JOIN msdb.dbo.systargetservers ts ON ts.server_id = js.server_id
WHERE j.category_id = 2 AND ts.server_id IS NULL

Hopefully you got 0 results, and everything on your CMS is happy.
Even though I changed the Server_Id to recreate this scenario, the original problem occurred due to servers defecting from the CMS, but leaving behind their server history. Anytime a server defects, it needs to automatically remove the history from the sysjobservers table as well, otherwise people will continue to have this problem.

Advertisements

One thought on “Multi-Server Job Scripting Failure

  1. Pingback: Migração do MSX SQL Server 2005 para SQL Server 2014 | Comunidade SQL Server

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s