SQL Server Troubleshooting: Token-based Login Failure

In continuation of the forced mirroring failover procedure I posted last week, this post describes the another level of pain you may encounter.

After forcibly failing a mirroring session to the secondary server, users were unable to connect to the SQL Server. The SQL Error Log was full of the same error message.

Error_TokenBasedAuthentication

Login failed for user

 

Our SQL Server uses Windows Based Authentication so that was a major hint. The solution was actually incredibly easy. Originally I assumed that an account was locked out or perhaps missing from the mirror server – who knew how long ago everything had been correctly synched.

There are two likely solutions to this issue.

UAC is blocking the connection attempts

I find this to be a less likely scenario, and since changing UAC settings require a server restart, I highly suggest testing the next method first. It will probably solve your problem without restart. If the issue is UAC, the server is probably a recent setup, otherwise I feel you would have noticed this.

Recreate the User Account

More likely, the account in question needs to be dropped and recreated in SQL Server. If the user connects though SQL using a group or service account, you’ll need to track down the appropriate account that they are a member of in Active Directory. Ask your AD administrator to look up the groups which the user is a member of if you don’t have access yourself, or intuitively already know the answer.

Once you have the correct account, you’ll need to take note of the existing permissions via screenshot, scripting the login as a create to statement, or just remembering everything. Delete the existing account in the server-wide Security as well as Database-level Security. Deleting an account in server Security does not cascade to the Database level. In fact, you’ll be warned of this by SQL Server when you attempt the delete.

Now you’ll just need to recreate the account providing the correct permissions and any relevant mappings. The token-based server access validation errors should be solved now.

Advertisements

SQL Server Forced Mirror Failover

Failing a SQL Server Mirror over is normally an incredibly simple process. If you use the GUI, you can accomplish the failover in moments. However, it is not always a planned event. Yesterday we had just such an issue.

If your principal server goes down and you do not have a witness setup for automatic failovers, you will need to perform a forced manual failover on the mirror server. There is no way to do this via the GUI, scripts only. Opening the database properties on a mirror results in a greyed out pane of information and zero options.

MirrorProperties.png

Not very helpful when this is all you have to work with.

Forced Failover Method 1 (PREFERRED)

In a perfect world, you will be able to connect to the mirror database and quickly run the following script, substituting in your database name. This will force the mirror to become the principal, and voila, you are up and running again. If you use an alias in Active Directory, just make sure the alias is changed to point to the new server.

ALTER DATABASE dbname
SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS

However, in the less perfect world we live in, you may receive the following error:

Cannot alter database because database is not in the correct state to become the principal

This error results for one of two reasons:

a. Full Safety Mode

If the database is not in Full Safety Mode, otherwise known as Synchronous commit, the failover will error. If your mirror database is in a remote location, it’s probably not in Full Safety. Asynchronous mirrors are High Performance for data transfers, but Low Performance when it comes to forced failovers. Try running the following to force the database into Full Safety mode, which will then allow the failover to occur.

ALTER DATABASE dbname
SET PARTNER SAFETY FULL

However, this script will fail too, if the principal database is still available to SQL. It might still fail regardless. It did for me, so I had to resort to Method 2 below.

b. Principal Database can still be reached by Mirror

If your principal database is still accessible by the SQL mirror, but having too many connectivity issues for you or users to actually reach it, you may have to shut down the principal in order to run commands from the mirror database. This is a bit of a strange scenario, but if your principal cannot be connected to, you’ll need to failover somehow. In this situation, someone will have to make the call of what to do.

Note: If you failover using the forced manual failover script, you will be able to manually resume mirroring after the principal server becomes available again. All your mirroring settings will be saved for the database, making failback relatively painless.

Break Mirror Method 2 (LAST RESORT)

If Method 1 of forcing the failover does not work, even after trying to set the database to Full Safety Mode, you can run the following script and break the mirroring session and restore the mirror with recovery. This is the quick and dirty method.

ALTER DATABASE dbname
SET PARTNER OFF
GO
RESTORE DATABASE dbname
WITH RECOVERY
GO

If you destroy the mirroring session in this manner, you will not be able to resume mirroring. All the settings will be erased and you will have to recreate mirroring from scratch. You’ll have a usable database though, and if the principal died suddenly, you shouldn’t actually have much, if any, data loss.

RAID Storage and SQL Server

If you are a SQL Server Administrator, eventually you are going to need to request specific storage for your servers. Depending on the setup at your current company, that is all handled in the background by your Storage Administrators, but if you have the power or are asked for your opinion, knowing about RAID (Redundant Array of Independent Disks) technology is important. You can find full technical explanations on the web for this, but I’ll cover the basics from a SQL Server perspective.

RAID Overview

RAID uses multiple hard drives to improve availability and/or performance. RAID can overcome I/O bottlenecks that would result from using a single disk, provide resiliency from data loss through mirroring, and remove a single point of failure from a single drive being used.

To understand RAID, there are three terms we need to define first.

Mirroring is an idea you should understand intuitively, but perhaps not the exact details in relation to RAID. Disk Mirroring replicates data from one disk to another, providing a second copy for disaster recovery, and thus requiring at least two disks. RAID mirroring is performed synchronously and mirrored data can be read from either disk. More on Mirroring.

Striping means that the data is being separated onto multiple drives in a consecutive fashion. By separating the data onto different drives, the I/O load can be balanced across the disks and read times are faster. The more disks that the data is striped across, the faster the data throughput will be; however, if one device fails, since the data is spread evenly across all the disks involved in the striping, all the data will be corrupted and unable to be read. More on Striping.

Parity is probably the hardest term to understand or explain. In the most basic, mathematical sense, parity refers to whether an integer is even or odd. For computing, the definition is specifically whether the total value of 1’s occurring in a given binary number is even or odd. For RAID, parity data bits are a combination of existing data bits to provide redundancy. In the case of any single drive failure, the remaining data can be combined with the parity data to reconstruct the missing data. More on parity.

RAID Levels

Although there are more levels of RAID, for instance, 2, 3, 4…etc., they are rarely used, especially for SQL Server. I’ll just be explaining the four main types here.

 

Raid0RAID 0 (Striping) This basic form of RAID stripes data across multiple disks. Reads and writes occur simultaneously across all disks involved and thus provides faster reads and writes compared to a single disk. The more disks involved, the faster the performance. This creates multiple points of failure though, and is not really recommended for database use due to the increased vulnerability.

Pros: Improved Performance of Read and Writes
Cons: No Redundancy and any drive failure corrupts all data.

Raid1RAID 1 (Mirroring) This RAID level mirrors, or duplicates, data between a minimum of two disks. Mirroring requires 50% more storage since the mirror is an exact copy of the original data. Read speeds are faster since any disk can respond to a read request. Write speeds are reduced due to copying the data to multiple locations. Read times can be as fast as the fastest drive, while write times are often as slow as the slowest drive. If you need a relatively cheap method to protect your data, this is a good option. If one drive fails, you still have a perfect copy of the data on the other.

Pros: Redundancy, Faster Reads
Cons: 50% Extra Storage, Slower Writes

Raid5RAID 5 (Striping & Parity) This is the likely the most common type of RAID used, but requires at least three disks. Data and parity calculations are striped across all the disks. Since data is not mirrored, less storage is ‘wasted’, resulting in only a minimum of 1/3rd (1 / total # of drives) of the storage space used for redundancy. In the event of a drive failure, the data can be reconstructed using the parity data, but at a cost. There is a significant impact when one disk fails due to the parity data reconstruction overhead. Losing the 2nd drive in a three disk RAID 5 configuration will result in the entire array going offline and data being lost. Always replace after the first failure as soon as possible! Since write speeds are slower with RAID 5, it is not the best choice for Transaction Logs or Data Files. Backup drives are a prime candidate for this RAID level though since write speed is not as important.

RAID 6 is growing in popularity; it’s identical to RAID 5, except it adds an extra disk with another set of parity data. This RAID level requires a minimum of four disks and can handle up to two drive failures.

Pros: Fault Tolerance, Fast Reads (until a drive fails)
Cons: Slower Write Speeds, More Expensive than RAID 1 or 0

RAID 0+1 (Mirroring & Striping) Data is striped and subsequently mirrored in this RAID level. This incurs the 50% storage redundancy loss. There is fault tolerance for a single drive failure, but this reduces the RAID to essentially RAID 0 with no redundancy.

Raid10RAID 10 (Striping & Mirroring) Data is mirrored first and then striped with this method. As long as one drive from each side of the mirror remains, no outage will occur. With more fault tolerance, this is the preferred high-end method over RAID 0+1.

Pros: Fault Tolerance, Best Read/Write Speeds
Cons: Lots of Disks and Lots of Money

Both RAID 0+1 and 10 are hybrid RAID levels and provide the best read and write performance, but double the required storage. This is what everyone wants but not everyone can afford. It combines the best of the previous configurations but requires many disks and therefore a big budget. These RAID levels are best suited for high traffic, like your Data and Log files. If you cannot afford RAID 10, RAID 5 is a decent alternative though.

Database Mirror Monitoring with Scripts

The database mirror monitor is a nice tool, but registering all your databases and clicking through each one can be time consuming when you have dozens. I want quick, easy to read results when an error comes in. If things look bad, I’ll open the monitor for a specific database to start an in depth look.

And yes, I do have reasonable alerts set up, but scripts are for proactive searching, dealing with new servers, and, most importantly, a jumping off point when an alert is generated. My normal process is:

  1. Read alert email
  2. Open multi-server query and run all the mirroring scripts below
  3. Investigate specific databases that are still reporting errors from the scripts

Sometimes I get to stop at step 2 because the issue has already been resolved due to a network glitch or someone else responding faster than me. I keep all the scripts saved in a single file which is always open and I run periodically.

The scripts are not as refined as I’d like, so please tweak them. They do the job when errors are coming in though. The only time I can think of any changes are when they are showing a slew of horrific errors, and at that time I just don’t have time to think about changing a script to make it prettier.

Find un-Synchronized Mirrors
Running this as a multi-server query will provide a list of all mirrored databases not currently synchronized. This is my most used method for verifying that a mirror is still in trouble.

SELECT *
FROM sys.database_mirroring
WHERE mirroring_state_desc IS NOT NULL
    AND mirroring_state_desc <> 'SYNCHRONIZED'

Find Databases that are not Online or are Restoring
This script is a bit more specialized. I rarely get results (which is good), but it has indicated anomalies before. This will return any databases that are offline or in a restoring state, but not part of a mirror. Sometimes this indicates a test database or one left in the process of being mirrored, but at least in my environment, I never want to let a database be left in that mode for very long.

The worst scenario I’ve had with this script was a time that it found a database that had mirroring removed, so the partner had just been left in restoring state. Somehow that had been overlooked until I found it with this.

SELECT
   d.name
  ,d.user_access_desc
  ,d.state_desc
  ,m.mirroring_state_desc
  ,m.mirroring_role_desc
FROM sys.databases d
   LEFT JOIN sys.database_mirroring m ON m.database_id = d.database_id
WHERE (d.state_desc <> 'ONLINE'
OR d.state_desc IS NULL)
   AND (m.mirroring_role_desc IS NULL
OR m.mirroring_role_desc <> 'MIRROR')

DB Mirror Monitor Script
This mimics the database mirroring monitor. Main benefits here are you do not have to preregister every single database to the monitor and then wait for each one to refresh and click through each database. Just run the script, and apply any filters that you want. When run as a multi-server query it will automatically increment through each mirrored database.

If you want to see your entire mirroring monitor in one moment, this is for you.

The reason for this script is because sometimes I get mirrors that are still considered synchronized, but are not pushing data fast enough. They throw strange alerts, if any, when this happens. Resolving this normally involves finding out why the network is moving slowly. Ultimately there is little I can do while I wait for a fix, but it’s good to know what is going on in the meantime.

DECLARE @Mirror TABLE
 (database_name SYSNAME,[role] INT,mirroring_state INT
,witness_status INT,log_generation_rate INT,unsent_log INT
,send_rate INT,unrestored_log INT,recovery_rate INT
,transaction_delay INT,transactions_per_sec INT,average_delay INT
,time_recorded DATETIME,time_behind DATETIME,local_time DATETIME)

-- @id will increment and pull each mirrored database
DECLARE  @id        INT
 ,@db        VARCHAR(256)
 ,@command    VARCHAR(2000)
SELECT @id = MIN(database_id) FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL

-- Loop through each database to pull mirror monitor information
WHILE @id IS NOT NULL
BEGIN
  SELECT @db = d.name FROM sys.databases d WHERE d.database_id = @id
  SELECT @command = 'EXEC msdb.dbo.sp_dbmmonitorresults @database_name='+QUOTENAME(@db)+''
  PRINT @command
  INSERT INTO @Mirror
    EXEC (@command)
--Increment @id
  SELECT @id = MIN(database_id) FROM sys.database_mirroring
  WHERE mirroring_guid IS NOT NULL AND database_id > @id
END
-- Your WHERE clause/values will vary
SELECT * FROM @Mirror
WHERE unsent_log > 10

Database Owner Verification

There’s often low hanging fruit once you start looking into servers, or you just need to verify settings are correct.  Recently I had to verify that all databases had the correct owner for auditing purposes. There are many scripts out there to do this, but I wanted to make a few exceptions, and I didn’t need a lot of extra fluff for this specific task either. I also had a plan to convert this to a policy afterwards to ensure settings were not changed later.

I used two different scripts, one for 2012 servers that might be utilizing Availability Groups and one for 2008 servers that might be using Mirroring. I wanted to allow exceptions for databases that were mirrors or secondaries, because without failing them over, I could not update the owner. These will be fixed on the next scheduled failover, but until then, I don’t want the exceptions mucking up my results.

The WHERE clause excludes my allowed owner while also verifying that I don’t have any databases where the ownership has been broken and listed as NULL.  Server roles for AGs and Mirrors are included in the result set so that when I want everything, I can exclude the WHERE clause and quickly see what’s what.

SQL 2012 (Availability Groups)

SELECT
d.name
,[Owner] = suser_sname(owner_sid)
,s.role_desc
FROM sys.databases d
,msdb.sys.dm_hadr_availability_replica_states s
WHERE (suser_name(owner_sid) <> 'SA' OR SUSER_NAME(owner_sid) IS NULL)
AND s.role_desc = 'Primary'

SQL 2008 (Mirrors)

Same as the above query, but while that query above will error when you try to run it on an older version of SQL than 2012 because of the HADR view, this one works while determining if any of your servers have the wrong database owner. Just exclude the AND statement to show the mirrors.


SELECT
d.name
,[Owner] = suser_sname(owner_sid)
,m.mirroring_role_desc
,d.state_desc
FROM sys.databases d
LEFT JOIN sys.database_mirroring m ON m.database_id = d.database_id
WHERE
(suser_name(owner_sid) <> 'SA' OR SUSER_NAME(owner_sid) IS NULL)
AND (m.mirroring_role_desc <> 'mirror' OR m.mirroring_role_desc IS NULL)

Afterwards, I used Policy Based Management to create a policy checking for DB ownership.  I chose the simple way of creating a Database Facet checking that @Owner = 'SA'. (Rename your account, but for demonstration purposes this is the most obvious). The policy will check every database, but automatically skip mirrors/secondaries because it cannot read those databases. The script remains useful for checking the databases that the policy cannot read.

Mirroring with TDE

Recently I’ve had to set up a lot of database mirroring, and about half of those databases were encrypted. Setting up mirroring on unencrypted databases is normally a quick process, but encrypted mirroring requires a bit more work. While there are many examples of mirroring out there, it can be hard to find an example of encrypted mirroring that is not missing a step or two. You cannot use Object Explorer to do everything, so below are the necessary scripts to not only set up encrypted mirroring, but also how to setup Transparent Data Encryption.

1. Create and Backup Master Key and Certificate TDE requires a master key and a certificate, so those need to be created first and then immediately backed up. Backing these up is best practice, but the files will also need to be restored to the secondary server. If you already have TDE on the primary server, you can skip this step and go straight to Step 3.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
/********** On Principal **********/

--Create Master Key
 USE MASTER
 GO
 CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'StongP@ssword'
 GO


--Backup Master Key
 BACKUP MASTER KEY TO FILE = 'FilePath\MasterKey.KEY'
 ENCRYPTION BY PASSWORD = 'StrongP@ssword'
 GO

--Create Certificate
 CREATE CERTIFICATE CertName
 WITH SUBJECT = 'CertDesc', EXPIRY_DATE = '1/1/2900'
 GO

--Backup Cert; using a different key name than the original to avoid errors
 BACKUP CERTIFICATE CertName
 TO FILE = 'FilePath\CertName.CER'
 WITH PRIVATE KEY (FILE='FilePath\CertKey.KEY',
 ENCRYPTION BY PASSWORD = 'StrongP@ssword')
 GO
 CLOSE MASTER KEY
 GO


2. Enable Transparent Data Encryption
 The next step is to enable TDE using the new certificate, and then turn TDE on. If TDE has been enabled previously, you should have already skipped to Step 3.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
--Enable TDE on database
USE DbName
GO
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256 ENCRYPTION BY SERVER CERTIFICATE CertName
GO

--Turn TDE On
ALTER DATABASE DbName
SET ENCRYPTION ON
GO


3. Restore Key and Certificate to Secondary
This is your first step if you already have TDE turned on. Just make sure you copy the backup files to the secondary server so you can restore them.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
/********** ON MIRROR **********/

--Restore Master key to Secondary server
USE MASTER
GO
RESTORE MASTER KEY
FROM FILE = 'FilePath\MasterKey.KEY'
DECRYPTION BY PASSWORD = 'StrongP@ssword'
ENCRYPTION BY PASSWORD = 'StrongP@ssword'
GO

--Restore the Certificate to Secondary server
OPEN MASTER KEY DECRYPTION BY PASSWORD = 'StrongP@ssword'
GO
CREATE CERTIFICATE CertName
FROM FILE = 'FilePath\CertName.CER'
WITH PRIVATE KEY ( FILE = 'FilePath\CertKey.key',
DECRYPTION BY PASSWORD = 'StrongP@ssword')
GO
CLOSE MASTER KEY
GO


4. Create Endpoints on Principal and Secondary
You can do this at any time that makes sense to you, but it makes sense to me to ensure all the settings are up to date before moving onto the mirroring setup. Encrypted mirroring requires an encrypted endpoint (seems logical enough) so notice the specified encryption algorithm in the script below. Many other mirroring examples leave this critical piece out.

1
2
3
4
5
6
7
8
/********** ON MIRROR & Principal **********/

--Create Mirroring Endpoint on Principal & Secondary
CREATE ENDPOINT Name
STATE = STARTED
AS TCP (LISTENER_PORT=5022)
FOR DATABASE_MIRRORING (ROLE=PARTNER,ENCRYPTION=SUPPORTED ALGORITHM AES)
GO


5. Backup Database on Principal
You will need to perform a full backup along with a Log backup on the database you plan to mirror. There is nothing special about the backup, just make sure that you have the latest log backups and that you finish before the next scheduled backup starts, otherwise you will have to restore another log. I hate having to do that.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
/********** On Principal **********/

--Backup Primary Database - Full
BACKUP DATABASE [DbName]
TO DISK = 'FilePath\DbNameFULL.BAK'
WITH NOFORMAT,INIT,COMPRESSION,SKIP,NOREWIND,NOUNLOAD,STATS=10,
NAME = 'DbFullBackup'
GO

--Backup Primary Database - Log
BACKUP LOG [DbName]
TO DISK = 'FilePath\DbNameLOG.TRN'
WITH NOFORMAT,INIT,COMPRESSION,SKIP,NOREWIND,NOUNLOAD,STATS=10,
NAME = 'DbLogBackup'
GO


6. Restore Backup to Secondary
Restoring the backups for an encrypted database is only slightly more involved than normal. You will have to Open the Master Key before the restore, otherwise it will fail.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
/********** ON MIRROR **********/

--Restore the Full Backup on the Secondary
USE MASTER
GO
OPEN MASTER KEY DECRYPTION BY PASSWORD = 'StrongP@ssword' --Don't forget this!
RESTORE DATABASE [DbName]
FROM DISK = 'FilePath\DbNameFULL.BAK'
WITH NORECOVERY,NOUNLOAD,STATS=10
GO

--Restore the Log Backup on the Secondary
RESTORE LOG [DbName]
FROM DISK = 'FileName\DbNameLOG.TRN'
WITH NORECOVERY,NOUNLOAD,STATS=10
GO
CLOSE MASTER KEY --Remember to close the master key once complete
GO


7. Enable Mirroring on Secondary
 Once the database is in place, you may want to verify that the secondary database is in a Restoring state. This is required to mirror successfully. The scripts I’ve provided will restore correctly, but restoring the database improperly is one of the most common and embarrassing issues you’ll encounter while trying to turn on mirroring. Set the partner by using the fully qualified server name or by the IP address. Theoretically either will work, but it depends on your network.

The most important thing in the script below is to include the line to Add Encryption by Service. This can be done earlier and in a separate script, but I find it easiest to include while turning on mirroring. If you leave this step out, you’ll receive a misleading error after you run the script in Step 8 that will leave you depressed and frustrated because it won’t tell you anything useful. Something like:

Msg 1416, Level 16, State 31
Database "DbName" is not configured for database mirroring

So make sure you run the scripts with all the necessary lines!

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
/********** ON MIRROR **********/

--Enable Encrypted Mirroring on Secondary
USE MASTER
GO
OPEN MASTER KEY DECRYPTION BY PASSWORD = 'StrongP@ssword'
ALTER MASTER KEY ADD ENCRYPTION BY SERVICE MASTER KEY  --Most Important step!!
ALTER DATABASE DbName
SET PARTNER = 'TCP://PrincipalServer:5022'
GO
CLOSE MASTER KEY
GO


8. Enable Mirroring on Principal
This is the moment of truth. Did you follow all the steps correctly? If so, you should be fine. If you receive a database mirroring is not configured error, refer back to Step 7 and verify that encryption was added when you enabled mirroring. There is also a chance you will need to do another log restore to bring the log chain up to date. That’s purely based on how often you do log backups and how big the database is though.

1
2
3
4
5
6
7
8
/********** On Principal **********/

--Enable Encrypted Mirroring on Principal
USE MASTER
GO
ALTER DATABASE DbName
SET PARTNER = 'TCP://SecondaryServer:5022'
GO


Summary
Setting up Encrypted Mirroring is not overly difficult, but there are some quirks to it, and you can easily forget an integral step. I always refer to saved scripts to keep track of where I am in the process, and to make the overall setup faster.