Troubleshooting Policys and xp_dirtree

Today we had a policy failure on a newly built SQL server. The policy verifies that every SQL server has a backup file stored on a remote directory. Simple enough, but in the past, this policy has been prone to permissions errors.

In case you would like to implement something like this on your system, the dumbed-down condition of the policy is below:


ExecuteSql('NUMERIC', '
--Get DBs on server
DECLARE @DBCount INT,@File VARCHAR(250)
SELECT  @DBCount = COUNT(*) FROM sys.databases WHERE database_id <> 2
SELECT  @File = ''\\BACKUPSERVER\BACKUP\''+CONVERT(VARCHAR(100),SERVERPROPERTY(''MachineName''))
--Insert DirTree results into a table variable to return a numeric result
DECLARE @Check TABLE (SubDirectory VARCHAR(250), Depth BIT, [File] BIT)
INSERT  @Check
EXEC master.dbo.xp_dirtree @File,1,1
--Numeric result for easy policy evaluation 1=Pass if( # files = # dbs )
IF((SELECT COUNT(*) FROM @Check) = @DBCount)
SELECT 1
ELSE
SELECT 0
')

Policy Explanation

The dumbed-down policy checks a backup server, which has a folder named for each SQL server, to see if it has the correct count of backups.  Each folder should have a backup file for each database, except for TempDB. This isn’t exactly how we do it, but it gets my point across and it provides a good example of how a moderately complex policy works using ExecuteSQL. There aren’t a lot of examples on the web, so I try to provide as many as I can.

The Troubleshooting

The new server wasn’t processing the policy correctly. My first idea was, “Well, there’s no files!” Easy! Problem solved!  Or not…the files were in the folder, and the number was even correct. So it wasn’t the backups or copy failing. That meant that there was something wrong with the permissions, the problem was, what?

I verified the logins were all present in SQL by reviewing Security -> Logins through Management Studio. This was a new server, so I assumed it would be something simple like that. The account in question was the SQL Server service account.  Policy Based Management and xp_dirtree use this account. The account was in SQL and it had all the proper permissions. OK, next thing.

The folder where all the files are stored is a shared folder, so I went into the Properties of the folder and verified the account that was shared. Everything looked right there. Every time I’ve had an issue with the permissions of xp_dirtree reading servers, it’s been because the service account in the shared folder was wrong, so my next stop was checking out Active Directory.

FileSharing

Sharing a folder makes the file path easier. Just make sure your service account has permission here.

Drilling down through Active Directory, I found the account used for sharing the folder. This account had every SQL service account as a member. It looked fine in AD, it looked fine on the Shared folder. Everything worked perfectly except on this new server. What was wrong?!?

At this point, I just decided to throw a hammer at it and throw in extra service accounts until hopefully one worked. Eventually I added the SQL Agent account. Suddenly, the policy passed. The new server could read the backup directory! So I took the normal account out of the permissions. Now everything failed except this new server. Wait, what…?

I logged into the new server and loaded up SQL Configuration Manager…

Oh…

When the server was setup, someone switched the accounts. SQL Server was running using the SQL Agent service account. SQL Agent was running using the SQL Server service account. I restarted each with the correctly corresponding service and account. I then fixed the permissions for the shared backup directory back to the original setup.

Voila! Everything worked perfectly!

The Moral of the Story

Make sure new servers and settings are configured properly before trying to fix things that weren’t broken before the new stuff existed. I’m sad to say it took me so long to understand the problem. Even with other DBAs chiming in. None of us thought about checking the Configuration Manager…that or they assumed I had already checked that.

Sometimes, it’s the little things that get you…OK, normally its the little things that get you.

Enterpise Policy Management Framework – Skip the Report

Policy-Based Management in SQL Server is a wonderful and underused feature. The Enterprise Policy Management Framework available through Codeplex further enhances this with reports and history tables, but if you have ever used the main report, you know how distressingly slow it can be. Not only that, but drilling through the report to find failed policies on instances is an extremely painful process. Visual reports are pretty, and the aggregation is a really cool feature, but expanding nodes are not practical for a large environment.

Test

It depressed me just pulling up the slow report to screen shot this. Expanding nodes is even worse.

So while I use EPMF, I do not check the reports daily. Instead, I script the daily failures and errors to quickly review them. Below are three scripts that can provide policy results, use whichever you prefer.

Script 1: Short and sweet. This provides all failures and errors for the current day, including the unshredded XML evaluation results. This is my go to script each morning. Find failures and investigate.


SELECT
phd.EvaluatedServer
,phd.EvaluationDateTime
,phd.EvaluatedPolicy
,CategoryName
,PolicyResult
,ph.EvaluationResults
FROM EESqlReports.policy.PolicyHistoryDetail phd
INNER JOIN EESqlReports.policy.PolicyHistory ph ON ph.PolicyHistoryID = phd.PolicyHistoryID
WHERE
PolicyResult <> 'PASS'
AND CONVERT(DATE,phd.EvaluationDateTime) = CONVERT(DATE,GETDATE())
ORDER BY PolicyResult DESC,Policy_id,phd.EvaluatedServer;

Script 2: This script is essentially the same as above, but it includes the evaluated object or database along with a cleaner XML result. While still XML, it is easier to read, and the best solution I’ve found to read the evaluation results reliably for any policy. Sometimes the XML will not provide an actual value on a failed result. I’ve yet to discover what causes this. Theories include network issues or unexpected values.


WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/DMF/2007/08'AS DMF)
SELECT
phd.EvaluatedServer
,phd.EvaluationDateTime
,phd.EvaluatedPolicy
,phd.CategoryName
,phd.PolicyResult
,[CheckParam1] = res.Expr.value('Function[1]/Constant[2]/Value[1]','nvarchar(100)')
,[ExpectedValue1] = res.Expr.value('Constant[1]/Value[1]','nvarchar(100)')
,[ActualValue1] = res.Expr.value('Function[1]/ResultValue[1]','nvarchar(100)')
FROM policy.PolicyHistoryDetail phd
INNER JOIN policy.PolicyHistory ph on ph.PolicyHistoryID = phd.PolicyHistoryID
CROSS APPLY ResultDetail.nodes('Operator') AS Res(Expr)
WHERE
PolicyResult <> 'PASS'
AND CONVERT(DATE,phd.EvaluationDateTime) = CONVERT(DATE,GETDATE())
ORDER BY PolicyResult DESC,EvaluatedServer;

Script 3: This script attempts to shred the XML results, showing you the check parameter, expected value, and actual value. This sample script is very short because shredding a policy result gets complicated fast. I’m only shredding the first line of a policy here. Since most policies have more than one parameter and thus values, this script gets excessive fast.  Further, if the policy doesn’t fit the XML format scripted perfectly, you’ll get NULL results. Before you know it, you have to tailor-make a script for almost every policy in your environment. It’s just not effective, and why I suggest just reading the XML from the above script if you really need to view results. As terrible as XML is to read, it’s better than crafting a script for dozens or even hundreds of policies.


<span style="color: #808080; font-family: Consolas; font-size: small;">Consolas; font-size: small;"><span style="color: #808080; font-family: Consolas; font-size: small;">WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/DMF/2007/08' AS DMF)
SELECT
PH.EvaluatedServer
,PH.EvaluationDateTime
,PH.EvaluatedPolicy
,Res.Expr.value('(../DMF:TargetQueryExpression)[1]', 'nvarchar(150)') AS EvaluatedObject
,(CASE  WHEN Res.Expr.value('(../DMF:Result)[1]','nvarchar(150)') = 'FALSE'
AND Expr.value('(../DMF:Exception)[1]','nvarchar(max)') = '' THEN 'FAIL'
WHEN Res.Expr.value('(../DMF:Result)[1]','nvarchar(150)') = 'FALSE'
AND Expr.value('(../DMF:Exception)[1]','nvarchar(max)') <> '' THEN 'ERROR'
ELSE 'PASS'
END) AS PolicyResult
,CAST(Expr.value('(../DMF:ResultDetail)[1]','nvarchar(max)') AS XML) AS ResultDetail
FROM policy.PolicyHistory PH
INNER JOIN policy.PolicyHistoryDetail phd ON ph.PolicyHistoryID = phd.PolicyHistoryID
INNER JOIN msdb.dbo.syspolicy_policies p ON p.name = PH.EvaluatedPolicy
CROSS APPLY EvaluationResults.nodes('
declare default element namespace "http://schemas.microsoft.com/sqlserver/DMF/2007/08";
//TargetQueryExpression') AS Res(Expr)
WHERE phd.PolicyResult = 'FAIL'
AND CONVERT(DATE,phd.EvaluationDateTime) = CONVERT(DATE,GETDATE())
ORDER BY PolicyResult DESC,EvaluatedServer;</span>

 

Database Owner Verification

There’s often low hanging fruit once you start looking into servers, or you just need to verify settings are correct.  Recently I had to verify that all databases had the correct owner for auditing purposes. There are many scripts out there to do this, but I wanted to make a few exceptions, and I didn’t need a lot of extra fluff for this specific task either. I also had a plan to convert this to a policy afterwards to ensure settings were not changed later.

I used two different scripts, one for 2012 servers that might be utilizing Availability Groups and one for 2008 servers that might be using Mirroring. I wanted to allow exceptions for databases that were mirrors or secondaries, because without failing them over, I could not update the owner. These will be fixed on the next scheduled failover, but until then, I don’t want the exceptions mucking up my results.

The WHERE clause excludes my allowed owner while also verifying that I don’t have any databases where the ownership has been broken and listed as NULL.  Server roles for AGs and Mirrors are included in the result set so that when I want everything, I can exclude the WHERE clause and quickly see what’s what.

SQL 2012 (Availability Groups)

SELECT
d.name
,[Owner] = suser_sname(owner_sid)
,s.role_desc
FROM sys.databases d
,msdb.sys.dm_hadr_availability_replica_states s
WHERE (suser_name(owner_sid) <> 'SA' OR SUSER_NAME(owner_sid) IS NULL)
AND s.role_desc = 'Primary'

SQL 2008 (Mirrors)

Same as the above query, but while that query above will error when you try to run it on an older version of SQL than 2012 because of the HADR view, this one works while determining if any of your servers have the wrong database owner. Just exclude the AND statement to show the mirrors.


SELECT
d.name
,[Owner] = suser_sname(owner_sid)
,m.mirroring_role_desc
,d.state_desc
FROM sys.databases d
LEFT JOIN sys.database_mirroring m ON m.database_id = d.database_id
WHERE
(suser_name(owner_sid) <> 'SA' OR SUSER_NAME(owner_sid) IS NULL)
AND (m.mirroring_role_desc <> 'mirror' OR m.mirroring_role_desc IS NULL)

Afterwards, I used Policy Based Management to create a policy checking for DB ownership.  I chose the simple way of creating a Database Facet checking that @Owner = 'SA'. (Rename your account, but for demonstration purposes this is the most obvious). The policy will check every database, but automatically skip mirrors/secondaries because it cannot read those databases. The script remains useful for checking the databases that the policy cannot read.

Renaming Policies, Conditions, and Categories

I have a habit of creating a new policy with only a vague idea of what I want to accomplish with it, and by the time I’m finished, the policy may have taken on a new tone. Other times, I make a test policy that turns out to be a great idea that I want to keep around. The problem with this? I start by naming the policy something terrible like TestPolicyDoThisThing. After making a few grouped expressions and half a dozen lines or more in the condition, I really don’t want to recreate it by using the GUI or script. I just want to rename the policy to something better, like RealPolicyDescriptiveName.

Renaming Policies is not easy or intuitive.

Right click on the policy, Rename…or not.

Rename Policy Missing

Where is Rename?

OK, click on the policy, then slowly click again to edit the name…

Why is there no interface for renaming a policy? I can click on a table to rename it, but not a policy? Explain to me how it is less dangerous to rename a table accidentally (by misclicking and typing something without paying attention) than to rename a policy. It isn’t. In fact, it’s much worse to randomly rename your tables. I hear that might break some applications.

So I poked around a bit. I really had no desire to recreate policies from scratch, and it absolutely should not be necessary. Luckily, there is a tsql command to rename a policy, and it’s fairly simple to use. SP_SysPolicy_Rename_Policy can rename a policy, just supply the @name parameter and the @new_name parameter.

Execution Failed

Wait! I know this exists!

It will help if you specify the correct database to use. Try executing the procedure with the fully qualified name, msdb.dbo.sp_syspolicy_rename_policy, and it will work successfully.

Execution Successful with FQN

Fully Qualified Names are best practice for a reason

Renaming Conditions works exactly the same way as policies when it comes to renaming them, Rename Condition is oddly absent, but a similarly named command will solve the problem. You can execute msdb.dbo.sp_syspolicy_rename_condition using the same parameters as the policy rename.

Missing Rename Condition

Renaming Categories is easy, strangely enough. There is a user interface for policy Categories. Click on Policy Management, then Manage Categories. This window will allow you to add new, and edit existing, categories. This is a great feature, and helps you keep track of Condition names. Ironically, it is also the feature I would use the least out of the renaming options.

Manage Categories Exists

So how did this sneak into the interface?

Edit Categories with Manage

Renaming and adding categories is as simple as typing in the Name field.

Summary As much as I like Policy Based Management, I wish that Microsoft would have finished building the feature – at least it feels like PBM is unfinished every time I work with it. So many facets are apparently missing, PowerShell execution is convoluted at best, and interface options are just strangely absent. Something as simple as renaming policies and conditions should not be available only through a hidden stored procedure. Policy Based Management is a great feature of SQL Server, and it’s one of the few standard edition features they have added lately. Why couldn’t Microsoft put a bit more effort into it and make it as powerful and easy to use as its potential implies?

Policy Based Management – Tricks & Tips

In continuation of my original post for TSQL Tuesday on the benefits and issues with Policy Based Management, I’d like to discuss options and suggestions for a basic, yet powerful, policy for evaluating the last backup date of each database. While this policy sounds simple enough, successfully evaluating it against every server in your environment may not be so easy. Each server might have special rules, and if you use high availability, you have an even bigger mess to deal with.

The first step is to create a blank policy, which I covered in detail in my original post. To verify a last backup, we need to use the Database Facet. This will cause the policy to evaluate against all databases by default, with some exceptions – see Targets below, but that can be altered per your whims, which I will address at the end.

Once the Create Policy dialogue box is open, we need to create a detailed condition for evaluation. This is going to require the easy-to-use but hard-to-figure-out-which-one-you-really-need built in fields, as well as some specialized ExecuteSql statements to make a robust policy.

Starting off, we only want to check the backups on databases that are in a normal state. We do this by selecting the Field, Operator, and Value of @Status = Normal. Simple enough, but now let’s dive into the trickier parts.

CreateCondition

The meat of the policy are the next two steps. Select AND @LastBackupDate = DATEADD(‘DAY’,-1,GETDATE()). Immediately create another line OR LastDifferentialBackupDate = DATEADD(‘DAY’,-1,GETDATE()). Note that unlike the normal sql script, you need single quotes around DAY. The Condition will error and remind you about this if you forget though. These expressions will check that a full or differential backup has been taken within the last 24 hours. Modify the date to fit your needs and remove the differential test if you never use them. You can then group the full and differential backup clauses together, and insert another clause to verify that a full backup has been taken within the last week. You can try using DATEADD(‘WEEK’) and that will not even error, however, it does not evaluate successfully. Stick with days and make your life easier.  Getting comfortable making grouped clauses is an imperative to successfully implementing complicated policies and this one ensures that some form of backup has been done every day, but still verifies that a full backup was done at least each week.

AddBackUpExpressions

If you need to apply different rules for specific databases or instances, you can do that too, but it won’t be fun. Create an ExecuteSQL statement that will test the Database name using @Name = ‘YourDatabase’ or create a custom field using ExecuteSql(‘String’, ‘SELECT SERVERPROPERTY(”InstanceName”)’) = ‘InstanceNotToBackup’. You can also evaluate based on database size if necessary. Perhaps you only perform full backups nightly on databases that are less than 100g in size, while larger databases only receive a differential backup. You can use the @Size Field (which is in megabytes) to check this value. Perform some grouping and you have a targeted policy.

Troubleshooting ExecuteSql If you have any problems with your ExecuteSql statements, test them by running the query in SSMS without the ExecuteSql wrapping. Get the query working first, then move onto evaluating the ExecuteSql string in the policy. When you write statements this way, don’t forget to add double quotes to your query since it will reside in a string when evaluated from the policy.

Mirroring By default, policies target Every Database. However, this is a lie. Policy Based Management actually targets every non system database that is accessible. This means that Mirroring takes care of itself. It will only evaulate against principal databases. The secondaries will not show up, and good luck trying to convince a policy to run against them. Even if you expicitly tell a database target to look at restoring or databases using mirroring, it will still skip them.

Availability Groups Since Avaibility Groups and their corresponding views did not exist prior to SQL2012, they provide a problem when you query a variety of SQL Server versions. In a multi-server query, you deal with this issue by first using the test: IF(SERVERPROPERTY('IsHadrEnabled')=1)

You’ll need to include that line in the condition to test AG queries successfully, so add it to your ExecuteSql string. If you forget, you will pay the price whenever you evaluate the policy against an older version of SQL.

Hadr

In verions prior to 2012, SQL cannot use sys.fn_hadr_backup_is_preferred_replica because it does not exist. Using an IF statement will let you bypass this error.

If you perform your backups on a preferred replica, you can use the following expression, evaluating to equal two. This passes the policy for non preferred replicas.

ExecuteSql('Numeric','
IF(SERVERPROPERTY(''IsHadrEnabled'')=1)
BEGIN
'SELECT sys.fn_hadr_backup_is_preferred_replica(DB_NAME())
END')

Perhaps you don’t care about preferred replicas though; you have no performance issues ever and live on the wild side – always performing your backups on the primary replica. You can allow all secondaries to automatically pass using the below condition, still checking for value two.

ExecuteSql('Numeric','
IF(SERVERPROPERTY(''IsHadrEnabled'')=1)
BEGIN
SELECT a.role FROM sys.dm_hadr_availability_replica_states a WHERE is_local = 1
END')

This all hinges on you having readable secondaries for your AGs, if you do not, the policy will skip over them regardless since the databases are not accessible.

AddAGExpression

Grouping the backup expressions again ensures they function properly

Arrays All the above conditions assume that you want to see every database, even if you are automatically passing them. Well, obviously TempDB is never going to be backed up, so we need to provide an expression for that, luckily this is simple and done via the @Name Field with the Value ‘Tempdb‘. If you want to exclude other databases, you can do so using the Operator IN, but if you think this works exactly like SQL and write (‘DB1′,’DB2’), it will never work. After much frustration, I realized that you need to set the value to Array(‘DB1′,’DB2’). Without the word Array, the IN clause will not evaluate correctly. Aren’t you glad I saved you all that frustration?

Test

I used an array to demonstrate how they are used, not because it was necessary for one value.

Targets Finally, you have the option of including all of these automatic passes not in the Condition you are checking, but in a custom Condition. Yes, it is very unfortunate that everything is a condition in PBM. So to make things easier, let’s call them Target Conditions and Evaluation Conditions. The same rules apply for creating an Evaluation Condition, but any databases that evaluate to True in the expression will be excluded from the evaluation results entirely. The main thing you need to be concious of is that when evaluating on the Database Facet, the default Every condition is false. It does not evaluate against nonaccessible databases (as I mentioned with Mirroring), and it does not target System databases. If you really want everything to be targeted, you need to create a Every Database Target Condition. This will seem rather redundant, but it works, and without explicitly stating you want both System and NonSystem objects, the target will try to do as little work as possible. It’s lazier than I am.

Don't even try making this a one line array, conditions hate that if you use standard fields.

Don’t even try making this a one line array, conditions hate that if you use standard fields.

Testing and Evaluation I’ve already mentiond that you should test ExecuteSql statements by running the query seperately. You’ll still have to do a lot of grouping testing and evaluating to get a complicated policy to work the way you want. When you evaluate a new policy, always view the results on a few databases and servers to verify that the actual results are in line with what your expected results are. If you get a bad statement, you might not be getting any actual results at all!

Test, test, test new policies!

Test, test, test new policies!

Automatic Passes vs. Targets For most policies I like to see every server and automatically pass the exceptions, hence why I presented conditions with so many automatic passes. Evaluating like this allows me to quickly determine that the server count is correct. It also quells the fear whenever anyone new evaluates a policy. Normally when someone notices a database is not evaluating, alarms go off in their head that the policy is broken and thus that it has never worked properly before. It can be quite a chore to calm them down and explain that was the intended result. That being said, all the Evaluation Conditions can be moved into the Target conditions instead, removing any unwanted results. You know TempDb is never going to be backed up, so moving it to the Target condition is of no consequence.

The main point to be stressed about Policy Management is that it is a powerful feature, but it does require you to learn how to use it through trial and error.

T-SQL Tuesday #63: How do you Manage Security

TSQL2sDay150x150_388014A5

This is my first attempt at contributing to T-SQL Tuesday. This month’s topic is hosted by Kenneth Fisher (blog | twitter) with the subject of security.

 

 

Security is one of those subjects that most DBAs have to deal with regardless of specialty. So as something we all have to work with at some point or another what are some tips you’d like to share? What’s the best security design? You’ve picked up a legacy system and the security is awful, how do you fix it? Any great tools out there you’d like to share? Hate it or love it I’m betting we all have something to say.

Security is an important subject that can be addressed on countless levels. I’m going to focus on an incredible way to monitor the security settings on any number of servers. This is a topic that I think is very poorly documented, very powerful, and largely ignored by Microsoft and the SQL Server Community. Today’s topic will be Policy Based Management.

Policy Based Management was introduced in SQL Server 2008, and since then has seen very little support. There is basically one book over the topic that only skims the surface over the myriad of options that it provides.  While the book explains the idea very well, there remains very little documentation over what exactly PBM can do.

Basic policy creation is very simple. The problem is wading through the dozens of facets for a condition (what piece of the server the policy is checking) and creating a valid target to evaluate against.

Test

The Policy Creation Window appears simple at first

The difficult part of policy based management is deciding which facet you should use for your condition. Microsoft provides many facets, each with many drop down options. You can stack boolean operators to combine a few fields from one facet, but you cannot evaluate more than one facet in a single policy.

Test

Creating a condition to check that each database is using the default SA account, please rename your account!

Once you have your condition created, you can create a specific target or just evaluate against everything within the facet. Target conditions are grouped together by facet, so if you have a dozen target conditions created for one facet, then create a policy using another facet and are suddenly distressed that your target conditions are missing, remember that targets are unique to a facet.

Test

Policy creation with a valid condition and target selected

Once your policy is created, you can evaluate it against a single server, or use a CMS to evaluate against a selection of servers. That is the real purpose for policy based management. Create a handful of policies on your CMS, then evaluate them all targeting a group of servers. In moments you have verified that your security is correct for all your servers!

Test

Policy evaluation against a single server on all databases

The way that most of my policies are created is through Advanced Edit. This allows you to perform an ExecuteSQL or WMI query targeting anything you can script out for evaluation. The downside of this is that SQL Server will not be able to automatically correct failed policies, but I rarely use that feature. By using Advanced Edit, you can essentially combine facets, and you don’t have to hunt through each facet for the one field you need. It’s much faster, and you can easily convert existing jobs or scripts into a policy by modifying them to return an expected numeric or string value. Below I’ve used an ExecuteSQL statement to verify that each database owner is the default SA account (which you should rename in your real environment) and that the account is disabled. Using existing facets, this would have taken two facets and thus two policies. If  you have figured out a way to do this in a single, built-in facet, please feel free to correct me, but even if you do, creating an ExecuteSQL statement was faster I bet!

Test

Advanced Edit is faster and easier than guessing which facet to use

Currently we use over 700 security policies targeting over a hundred servers. It took forever to make, but audits are a breeze now since we have instant verification of security policies and a history of how often they have been met. Most of the policies are ExecuteSQL statements verifying SQL settings (including Availability Groups, Mirroring, Drive Space, etc.) and Windows settings. We also use Enterprise Policy Management Framework to provide easily read reports and statistics on all of our policies. I highly suggest you take the time to look into this project if you use Policy Management.