Failover Cluster Manager Connection Error Fix

A few days ago I encountered a new error with Failover Cluster Manager.  A couple of servers had been rebuilt to upgrade them from Windows Server 2008 to 2012. They were added back to the cluster successfully. However, one of the servers would not open Failover Cluster Manager properly, and tracking down the solution took a long time.

The problem server successfully joined the cluster, but now it would not connect to the cluster using Failover Cluster Manager. If you opened up the application, it didn’t try to automatically connect, and manually connecting with the fully qualified name failed too. Below is the generated error.

failoverclustermanager_wmierror

I love how this error has absolutely no useful information to it. Luckily I was able to track Error 0x80010002 down online.

Research indicated that there was some sort of WMI error on the computer. Rebooting didn’t help anything, and after numerous attempts to correct/rebuild the WMI repository, not much was accomplished. Eventually, the server could connect to the cluster, but that only worked about 30% of the time, and it nearly timed out even when it did succeed! The cluster still never connected automatically.

After further poking around on the internet, I found a few suggested solutions, with my ultimate fix closely following this post. I still had to combine everything together and run scripts all over the cluster before things returned to normal.

First of all, this is a condensed version of the Cluster Query from the TechNet post linked above.

1) Cluster Query


$Nodes = Get-ClusterNode
ForEach ($Node in $Nodes)
{
 If($Node.State -eq "Down")
  { Write-Host "$Node : Node down skipping" }
 Else
 {
  Try
  {
   $Result = (Get-WmiObject -Class "MSCluster_CLUSTER" -NameSpace "root\MSCluster" -Authentication PacketPrivacy -ComputerName $Node -ErrorAction Stop).__SERVER
   Write-Host -ForegroundColor Green "$Node : WMI query succeeded"
  }
  Catch
  {
   Write-Host -ForegroundColor Red "$Node : WMI Query failed" -NoNewline
   Write-Host  " //"$_.Exception.Message
  }
 }
}

Any server that throws an error with the above query needs to have the following scripts ran on it:

2) MOF Parser
This will parse data for the cluster file. 

cd c:\windows\system32\wbem
mofcomp.exe cluswmi.mof

FCM was still not working correctly, so I reset WMI with the following command.

3) Reset WMI Repository


Winmgmt /resetrepository

That will restart the WMI service, so you’ll probably have to try running it multiple times until all the dependent services are stopped. The command shouldn’t take more than a few seconds to process either way though.

After that, the server that failed the Cluster Query (1) was reporting good connections, but FCM still wouldn’t open properly!

I decided to try the two WMI commands (2 & 3) again on the original server that couldn’t connect to FCM. I had already ran those commands there during the initial troubleshooting, so I was starting to think this was a dead end. Still, it couldn’t hurt, so I gave it a shot.

I reopened FCM and voila! Now the cluster was automatically connecting and looking normal.

As a further note, after everything appeared to be working correctly, SQL was having trouble validating connections to each node in the cluster during install, and I had to run commands 2 & 3 on yet another node in the cluster before things worked 100%, even though that node never had a connection error using the Cluster Query (1).

Advertisements

SQL Server Storage: Reading Block Size

This is the final post in the SQL Server Storage line of blog posts I’ve made. First we discussed Pages & Extents, then how to Read A Page using T-SQL, and finally this post will be about Disk Partition Offsets and Allocation.

Knowing that Pages are 8 KB and Extents are 64 KB, it’s understandable that SQL would have the best performance when the disks are aligned in the same manner. However, in earlier versions of Windows, this was never the case. Instead, Windows used an alignment configuration of 63 512 Byte sectors for a total of 31.5 KB. Since the way SQL Server reads and writes was not aligned with Windows, havoc ensued in the form of increased I/O operations. As of Windows 2008, partitions are aligned at 1024 KB by default, providing fewer I/O operations since 1024 is a multiple of 64.

I tried to keep the above explanation simple, but it still got wordy. The takeaway should be: you want SQL Server disks to have a Starting Offset that is a multiple of 64 KB, with the desirable default being 1024 KB. Higher values are fine for special circumstances, just so long as your Starting Offset is evenly divisible by 64.
As for your block size, that should also be 64 KB, or the size of a single Extent, to improve performance.

There is a single script to find both Block Size and Starting Offset, and it almost even works!

Get-WmiObject win32_DiskPartition |
	SELECT SystemName, Name, BlockSize, StartingOffSet |
	FT -Auto

That script returns the correct Starting Offset, but that Block Size is a screwy number that should not be trusted. To make matters worse, you get a Partition Number, but no Drive Letter. That doesn’t help me understand if my data disk is formatted correctly. In comes another query.

Get-WmiObject -Class win32_Volume |
    FT DriveLetter, Label, BlockSize -Auto

Great, now we have an accurate BlockSize and a Drive Letter! The only problem is, now there is no Partition Number to match up with the OffSet from the first query. Getting ALL the information at one time is a pain. Luckily there are Custom Objects to help us smash results together and Win32_LogicalDisk to help us relate the drives and partitions.

$Report = @()
$Disks = Get-WMIObject Win32_logicaldisk | WHERE { $_.DriveType -eq '3' }
$Result = ForEach ( $Disk in $Disks)
{
    $OffSet = Get-WmiObject -Query "Associators of {Win32_LogicalDisk.DeviceID='$($Disk.DeviceID)'} WHERE ResultRole=Antecedent"
    $Report = [PSCustomObject] @{
		ComputerName = $Disk.SystemName
		DriveLetter = $Disk.DeviceID
		Partition = $OffSet.Name
		BlockSize = ( Get-WmiObject Win32_Volume | WHERE {$_.DriveLetter -eq "$($Disk.DeviceID"} | SELECT -expand BlockSize )
		StartingOffSet = $OffSet.StartingOffSet
    }
    $Report
}
$Result | FT -Auto

Now you can determine the Block Size and Starting Offset of your drives easily. Changing those values, well that’s another story. Normally that will require formatting the drive, so its best to get it right in the first place. Make sure all your SQL Server drives are formatted correctly before you get started. Just educate your friendly Storage Admin on what SQL Server needs so everything is correct by the time you get it.

SQL Server Storage: Reading Pages with T-SQL

This is a continuation post from last weeks’ SQL Server Storage: Pages and Extents explanation. Since the description was a bit longer than I originally expected it to be, I decided to split the scripts into more posts.

First, I’m going to quickly cover how you can see information about a specific page using T-SQL. Now, normally this won’t be of much use to you, but it’s fun to play around with a bit just to see how things are actually stored. You might have need to read a page during some heavy troubleshooting at some point in the future too.  This procedure is undocumented though, so information is scarce and the feature could disappear without notice in future versions.

To read a page, you’ll need to utilize DBCC PAGE which I’ve listed the basic layout for below.


DBCC PAGE
(
  'DbName' OR DbId -- Database name or Database ID, either one!
  ,FileNumber -- File Number of the Page
  ,PageNumber -- Page Number in the File
  ,PrintOption -- Display option ranging from 0-3 with different info from each
);

Now we need to know what FileNumber and PageNumber to supply to DBCC PAGE though. Random numbers might work, but if you are actually trying to do anything halfway useful, they won’t get you far. To solve this problem, we have to utilize another procedure…DBCC IND


DBCC IND
(
  'DbName' OR DbId -- Database name or Database ID, either one!
  ,TableName -- Table Name...I don't think this really needs a comment
  ,IndexId -- index_id from sys.indexes; -1 = indexes and IAMs, -2 = IAMs
);

Using DBCC IND we can get some relevant data to pass into DBCC PAGE. The problem is, you still need a relevant Index_ID  for the selected table. The following query can help with that. Just supply the desired TableName in the WHERE clause.


--Get the index_id relating to your desired TableName to pass into DBCC IND
SELECT * FROM sys.indexes
WHERE OBJECT_NAME(object_id) = 'TableName'

SysIndexesResults

A sample result of sys.indexes. Grab the index_id and plug that into DBCC IND

 

With an index_id and a TableName in mind, we can get some results from DBCC IND.

DBCC_Ind_Results

If you are just testing, index_id = 1 is not a bad idea to check.

Awesome, now we have meaningful ids to use with DBCC PAGE. You’ll need to do one more thing before you run it though. Trace flag 3604 has to be set for SQL to provide output. Without it, you won’t get any results at all.

NoTraceFlagSet

No Trace Flag means no results


DBCC TRACEON(3604)
DBCC PAGE('DbName',1,100,3) WITH TABLERESULTS;
GO

With the Trace Flag on, you’ll get more data than you probably know what to do with.

DBCC_Page_Results

DBCC Page provides LOTS of information, I only captured a tiny snippet for your visual delight

Now you know how to get page information about your tables, so long as these undocumented procedures are available. Since this post again got longer than I originally expected, I’ll cover reading and setting block sizes on your disks next week. That will involve some fun PowerShell too!