Remotely Enable Always On

Always On Availability Groups is the new feature for High Availability in SQL Server 2012. It’s been out for awhile now, but unless you have Enterprise Edition SQL, you might not have been able to use it much.

Of course you need a cluster to utilize Always On, but once that is complete, you also have to enable Always On in Configuration Manager on all your servers that will be participating in the AG as well.

Continuing on with my lazy, automated DBA goals of logging into computers as rarely as possible, I developed the below PowerShell script to connect to SQL Servers, enable Always On, and then restart the SQL Service in order for the changes to take effect.

The only thing you need to change below is the computer names, it should automatically detect your SQL instance names. If that doesn’t work (I haven’t been able to test every possible name parsing possibility), you can supply the instance names yourself.


## List Servers in AG Here ##
$Computers = 'Computer1','Computer2'
## Everything Else is Automated ##

# Finds the servers running the services, and the services' names
Invoke-Command -ComputerName $Computers -Scriptblock {
$Services = (Get-Service -Include MsSql* | Where { $_.Status -eq 'Running' } )
$Nodes = @()

# Parses the names of the SQL Instances
ForEach( $Service in $Services )
{ $Nodes += $Env:ComputerName'\'+$Service.DisplayName.Split('(')[1].Replace(')','') }

# Loops through each instance and enables AlwaysOn, restarting with -Force
ForEach ( $Node in $Nodes )
{ Enable-SQLAlwaysOn -ServerInstance $Node -Force }

# Starts SQL Service on the affected server(s) if it's still stopped
If($Services -ne $NULL)
{ Start-Service -DisplayName ($Services.DisplayName) }
}

The key code here for enabling Always On is this snippet below.

Enable-SQLAlwaysOn -ServerInstance $Node -Force

If the longer script cannot automatically detect your ServerInstance, you can provide it manually and run the command. Restart your SQL Service for the change to take affect.

SQL Server Changing Passwords and an SSPI Context Error

The other day I encountered a login error when connecting to a SQL Server. The circumstance seemed strange compared to similar errors described online with many of those seeming rather complicated to find the real solution. Since this server had been around for awhile, it was unlikely that some major Active Directory change would be necessary to resolve the issue.

This SQL Server was part of an Availability Group, and the connection worked fine when connecting using the Server/Instance name, however, when attempting to connect via the Listener, the following error occurred.

Cannot connect to Server/Instance.
The target principal name is incorrect.
Cannot generate SSPI context (Microsoft SQL Server)

Articles online indicated this was an SPN, Kerberos, and/or Active Directory issue, and something needed to be reset, but the only way to know for sure was to continue down a long troubleshooting list. Luckily, the problem was simpler than that, but still very strange.

I had reset the service account passwords the afternoon before this error became apparent. Each service was restarted afterwards to verify the change worked properly and SQL had been successfully connected to. Everything seemed fine from my perspective.

The next day, some users attempted to connect using the Listener and that’s when the errors started. I don’t normally connect via the Listener, so I hadn’t thought to check that, didn’t think it would be necessary.

Troubleshooting the easy solutions first seemed like a good idea, so I decided to try restarting the SQL service, which failed everything to another server in the cluster immediately. The services came online, and now both the instance and Listener could be connected to. OK, well probably sort of solved.

I failed it to a third node in the cluster, everything still worked great. Cool. This was looking even better.

Next I failed it back to the original node.  This time, the SQL Service came online, but not the Listener. Strange, how did it work in the first place? Everything was running on that server before I restarted the service, even if it wasn’t running correctly. I reset the passwords in SQL Configuration Manager, and then restarted the services. Everything worked perfectly now.

In summary, somehow all the services restarted on the server after the password change, but the Listener had a bad password and was not allowing connections. When I attempted to restart the Listener again, it failed until the password was corrected. I still don’t know how this happened, but it’s a good reminder to be especially careful when changing service passwords.  Changing passwords on a cluster can be even more dangerous since you have extra services to update that may not even be running on the server at the time, so verifying everything went smoothly can take a few extra steps.

Tales of when a Log Fails to Shrink in an Availability Group

I received a report that one of my servers had 7% free space on its log drive. Sounded like something fun to resolve. I checked on what was going on and found a log file that was 99% free and a hundred gb in size. While shrinking a log file is not a good practice and I’m not advocating it by any means because it’s just going to grow again and your storage is there specifically to hold logs, this situation was a out of the ordinary and we needed space.

The problem was, this log would not shrink. It was being extremely uncooperative. I took a backup, log backups, multiple shrink attempts, but it wouldn’t budge. The message returned was a big clue though.

<code>The log for database ‘dbname’ cannot be shrunk until all secondaries have moved past the point where the log was added.</code>

As you might have guessed, this server was a SQL Server 2012 instance and in an Always On Availability Group. The database in question could not shrink because it was participating in the AG.

It wasn’t an ideal fix, but by removing the database from the Availability Group, I was able to perform a log shrink to get the size to a more manageable amount. No, I did not truncate it to minimum size, I adjusted it to a reasonable amount based on its normal work. I didn’t want the log to just have to grow again. The shrink worked flawlessly, and with adequate drive space, I attempted to add the database back to the AG via the wizard.

The AG wizard refused to help. The database was encrypted and the AG wizard will not let you add a database if it is encrypted. No explanation why, it just doesn’t like that. You can add an encrypted database to an AG via script though. You can even script the change from the wizard by using a non-encrypted database then changing the database name in the scripted result. The resulting script is exactly what the AG wizard would do, it just cannot execute it automatically.


ALTER AVAILABILITY GROUP AgName
ADD DATABASE DbName;
GO

With free space and an encrypted database safely back in my AG, I was off to new adventures!

Database Owner Verification

There’s often low hanging fruit once you start looking into servers, or you just need to verify settings are correct.  Recently I had to verify that all databases had the correct owner for auditing purposes. There are many scripts out there to do this, but I wanted to make a few exceptions, and I didn’t need a lot of extra fluff for this specific task either. I also had a plan to convert this to a policy afterwards to ensure settings were not changed later.

I used two different scripts, one for 2012 servers that might be utilizing Availability Groups and one for 2008 servers that might be using Mirroring. I wanted to allow exceptions for databases that were mirrors or secondaries, because without failing them over, I could not update the owner. These will be fixed on the next scheduled failover, but until then, I don’t want the exceptions mucking up my results.

The WHERE clause excludes my allowed owner while also verifying that I don’t have any databases where the ownership has been broken and listed as NULL.  Server roles for AGs and Mirrors are included in the result set so that when I want everything, I can exclude the WHERE clause and quickly see what’s what.

SQL 2012 (Availability Groups)

SELECT
d.name
,[Owner] = suser_sname(owner_sid)
,s.role_desc
FROM sys.databases d
,msdb.sys.dm_hadr_availability_replica_states s
WHERE (suser_name(owner_sid) <> 'SA' OR SUSER_NAME(owner_sid) IS NULL)
AND s.role_desc = 'Primary'

SQL 2008 (Mirrors)

Same as the above query, but while that query above will error when you try to run it on an older version of SQL than 2012 because of the HADR view, this one works while determining if any of your servers have the wrong database owner. Just exclude the AND statement to show the mirrors.


SELECT
d.name
,[Owner] = suser_sname(owner_sid)
,m.mirroring_role_desc
,d.state_desc
FROM sys.databases d
LEFT JOIN sys.database_mirroring m ON m.database_id = d.database_id
WHERE
(suser_name(owner_sid) <> 'SA' OR SUSER_NAME(owner_sid) IS NULL)
AND (m.mirroring_role_desc <> 'mirror' OR m.mirroring_role_desc IS NULL)

Afterwards, I used Policy Based Management to create a policy checking for DB ownership.  I chose the simple way of creating a Database Facet checking that @Owner = 'SA'. (Rename your account, but for demonstration purposes this is the most obvious). The policy will check every database, but automatically skip mirrors/secondaries because it cannot read those databases. The script remains useful for checking the databases that the policy cannot read.

Windows Server 2012 Training

This week is exciting because instead of an average work week, I will be attending some free Windows Server 2012 training provided by my employer. I love free training; I wouldn’t care if it was in fletching or how to make home-made soap (actually those sound neat). Something actually related to my career though? That’s awesome training! I’ve never used Server 2012 outside of a lab environment, and even then I’ve only touched it a few times in Virtual Machines since no one seems fast to move to it around here. I’m looking forward to checking it out seriously and uncovering all its mysteries.

I think that I have a slight advantage over some of the people who will be in the training, because I actually have Windows 8 on all my home computers. Obviously a lot of people have been slow to adopt the unfamiliar operating system and Server 2012 follows the same style as Windows 8. I’m one of the many with hopes for Windows 10 improvements (obviously so amazing it required a whole number skipped) but I also have realistic expectations of mediocrity for that.

Although, on second thought about my advantage, I also modified Windows 8 to avoid the Metro/Start screen and always boot to desktop. I then installed Classic Shell to get the start menu back. Maybe I’m not so savvy with the new OS after all. I did try to use Windows 8 without any modifications for a few months, hoping that 8.1 would have enough corrections to keep me happy. I caved after too many minutes spent in frustration trying to track down various programs and administrative settings. The second time I had to resort to the Run command to adjust a setting, I was done.

I know Server 2012 improved clustering considerably, and that’s what I’m hoping to demo the most while in the training class. I’ve been using Availability Clusters with Server 2008 and I know that’s probably causing issues that could be fixed with an upgrade. I can only hope that this ~30 hour class will cover clustering in detail.

On the other hand, I’m most concerned that the title of the training is “Capabilities, Administration and Support.” Learning how to troubleshoot the OS is great, but I don’t want to be stuck learning how to provide support for it to others. Since I don’t really know what I’m getting into today, I’m more than a little afraid too much time will be spent on that.

I plan to take extensive notes on the training so that I can report the highlights and disappointments, which I will then report next week.