Failover Cluster Manager Connection Error Fix

A few days ago I encountered a new error with Failover Cluster Manager.  A couple of servers had been rebuilt to upgrade them from Windows Server 2008 to 2012. They were added back to the cluster successfully. However, one of the servers would not open Failover Cluster Manager properly, and tracking down the solution took a long time.

The problem server successfully joined the cluster, but now it would not connect to the cluster using Failover Cluster Manager. If you opened up the application, it didn’t try to automatically connect, and manually connecting with the fully qualified name failed too. Below is the generated error.

failoverclustermanager_wmierror

I love how this error has absolutely no useful information to it. Luckily I was able to track Error 0x80010002 down online.

Research indicated that there was some sort of WMI error on the computer. Rebooting didn’t help anything, and after numerous attempts to correct/rebuild the WMI repository, not much was accomplished. Eventually, the server could connect to the cluster, but that only worked about 30% of the time, and it nearly timed out even when it did succeed! The cluster still never connected automatically.

After further poking around on the internet, I found a few suggested solutions, with my ultimate fix closely following this post. I still had to combine everything together and run scripts all over the cluster before things returned to normal.

First of all, this is a condensed version of the Cluster Query from the TechNet post linked above.

1) Cluster Query


$Nodes = Get-ClusterNode
ForEach ($Node in $Nodes)
{
 If($Node.State -eq "Down")
  { Write-Host "$Node : Node down skipping" }
 Else
 {
  Try
  {
   $Result = (Get-WmiObject -Class "MSCluster_CLUSTER" -NameSpace "root\MSCluster" -Authentication PacketPrivacy -ComputerName $Node -ErrorAction Stop).__SERVER
   Write-Host -ForegroundColor Green "$Node : WMI query succeeded"
  }
  Catch
  {
   Write-Host -ForegroundColor Red "$Node : WMI Query failed" -NoNewline
   Write-Host  " //"$_.Exception.Message
  }
 }
}

Any server that throws an error with the above query needs to have the following scripts ran on it:

2) MOF Parser
This will parse data for the cluster file. 

cd c:\windows\system32\wbem
mofcomp.exe cluswmi.mof

FCM was still not working correctly, so I reset WMI with the following command.

3) Reset WMI Repository


Winmgmt /resetrepository

That will restart the WMI service, so you’ll probably have to try running it multiple times until all the dependent services are stopped. The command shouldn’t take more than a few seconds to process either way though.

After that, the server that failed the Cluster Query (1) was reporting good connections, but FCM still wouldn’t open properly!

I decided to try the two WMI commands (2 & 3) again on the original server that couldn’t connect to FCM. I had already ran those commands there during the initial troubleshooting, so I was starting to think this was a dead end. Still, it couldn’t hurt, so I gave it a shot.

I reopened FCM and voila! Now the cluster was automatically connecting and looking normal.

As a further note, after everything appeared to be working correctly, SQL was having trouble validating connections to each node in the cluster during install, and I had to run commands 2 & 3 on yet another node in the cluster before things worked 100%, even though that node never had a connection error using the Cluster Query (1).

Advertisements

Getting Cluster Nodes through T-SQL

One of my most frustrating moments is when I need to log into the active node of a cluster. I inevitably choose a passive node first, then wind up opening the Failover Cluster Manager to find the active node then finally log in to the correct server. Well, I finally took some time to get it right by using my trusty Central Management Server and T-SQL.

Running all of these scripts on a CMS will allow you to hit every SQL server in your domain. When I first decided to pull Server data using a script, I immediately thought of @@SERVERNAME. However, this returns the cluster name (for clusters), not the active server node. What I really needed took a minor amount of research.

Another possible choice included sys.dm_os_cluster_nodes, but this had surprising limitations. While it does provide node names, if you try to query all servers using a CMS, the script can fail to merge the result sets, displaying the following error message. In my case, it also failed to return results for one entire cluster.

An error occurred while executing batch. Error message is: The result set could not be merged because the result schema did not match the schema from the first responding server.

The next option was to use SERVERPROPERTY to find the actual server name hosting SQL. When using a CMS, the script will also return the results of @@SERVERNAME as a bonus.

SELECT [ActiveServer] = SERVERPROPERTY('ComputerNamePhysicalNetBIOS')

While this is good enough for some scenarios, I wanted a more targeted answer, so I kept digging. First, I wanted a script that would only show the active nodes in each cluster. I’ll mostly use this query to ensure I’m avoiding the primary server during maintenance and to verify which server is the active node when developers or management start asking questions and want immediate answers. (You know, as opposed to when they aren’t in a hurry…)

To get the exact results I want will require a bit of string parsing. I suggest you review SUBSTRING and CHARINDEX if you aren’t positive on how they work. In short, Substring will display the server name up to where Charindex determines the ‘\’ appears. This will provide a column consisting of only the cluster name. The next column will display the active node. I then filter out all servers other than clusters and exclude default instance names as well.

SELECT
[ClusterName] = SUBSTRING(@@SERVERNAME,0,CHARINDEX('\',@@SERVERNAME))
,[ActiveNode] = SERVERPROPERTY('ComputerNamePhysicalNetBIOS')
WHERE SERVERPROPERTY('ComputerNamePhysicalNetBIOS') <> SUBSTRING(@@SERVERNAME,0,CHARINDEX('\',@@SERVERNAME))
AND @@SERVERNAME <> SERVERPROPERTY('ComputerNamePhysicalNetBIOS')

This final query returns every node in each cluster, with the last column denoting the active node. Useful when you need to remember which servers belong to what and your documentation has gone missing – or doesn’t exist yet.

SELECT
[ClusterName] = SUBSTRING(@@SERVERNAME,0,CHARINDEX('\',@@SERVERNAME))
,[Nodes] = NodeName
,[IsActiveNode] = CASE WHEN NodeName = SERVERPROPERTY('ComputerNamePhysicalNetBIOS') THEN '1' ELSE '' END
FROM sys.dm_os_cluster_nodes
WHERE SERVERPROPERTY('ComputerNamePhysicalNetBIOS') <> SUBSTRING(@@SERVERNAME,0,CHARINDEX('\',@@SERVERNAME))
AND @@SERVERNAME <> SERVERPROPERTY('ComputerNamePhysicalNetBIOS')

Summary I shared these scripts with my colleagues, who were very excited about not having to play “Guess the Active Node” anymore. There might be more eloquent ways to determine cluster nodes, but this method is perfect for my environment. I suggest saving the query to a code repository project in Solution Explorer or as a Snippet so that it’s always easily accessible.

I’m participating in the many SQL Blog challenges this month, and I’d like to thank each of them for the push and community support they provide.