Hyper-V VM Troubleshooting

I’ve made VMs in Hyper-V before; it’s a nice way to keep things separate from your main OS and test out configurations. When you haven’t used it lately, though, it can also be a lesson in frustration.

My solution? It was just embarrassing.

I had a VM set up and working fine; however, I didn’t need that OS anymore and wanted a brand-new VM to play with. I spun up a new VM with the same configuration settings as last time, just a different OS. Every time I tried to boot the VM, though, I got the same error.

 

The boot loader failed – time out.

 

Maybe the new ISO file was corrupt? I switched back to the original that had worked for Server 2012 R2 in my old VM. That didn’t make a difference.

I hunted online and asked around. There were a few suggestions.

Review Configuration Settings. Maybe I had screwed up the configuration? I rebuilt the VM and made sure all the file paths were perfect, with a new virtual hard disk, just in case I had moved files or changed some folders. That didn’t change anything though.

Disable Secure Boot. I had heard that caused OS boot failures. Except that didn’t change anything either, and it didn’t really apply to my situation.

Unblock the files. I hear that’s always a problem with new downloads, but I’ve never seen it actually happen to me. My problems are never that simple. This was the first time I actually checked the file properties, and – they were blocked! I was very excited, but this did not make a difference. It’s still a good idea to check this anytime you run a new file, as it is a common issue.

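If you would rather check and clear the block from a console instead of the file properties dialog, PowerShell can do it; a minimal sketch (the ISO path is hypothetical):

# Show the Zone.Identifier stream that marks a downloaded file as blocked
Get-Item 'C:\ISOs\Server2012R2.iso' -Stream Zone.Identifier -ErrorAction SilentlyContinue

# Clear the block, same as clicking Unblock in the file's properties
Unblock-File -Path 'C:\ISOs\Server2012R2.iso'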

The Solution

Finally, at wits’ end, I reopened the VM console, started the machine, and tried again. I smashed the keyboard in frustration as it came up. This time, it went straight to installing Windows.

My nemesis in this case was a simple five-word phrase that disappeared almost instantly.

Press any key to continue...

It only shows up for a couple of seconds at most, and if you start the VM before you connect to it, you’ll never have a chance to hit a key. VMs don’t automatically go into boot mode; instead, they just try to load the (non)existent OS.

So after all that confusion, I just wasn’t hitting a key FAST enough. Sure, all those other things can be important, and you should always verify your settings, but it shouldn’t have been this difficult.

Next week I’ll share the fun I had trying to get internet connectivity on my VM…

 

 


Changing Roles

I’ve made a major career change as of last month. I didn’t even get a blog posted last week due to the uncertainty and travel I’ve been doing. It’s already been a huge change, and I haven’t done much more than my new hire orientation yet!

As of last week, I am now a Microsoft employee. I’ve accepted a position as a Premier Field Engineer in SQL Server. Microsoft has been on my list as kind of a “capstone” company that I would like to work for, so when I got the chance to actually apply, I couldn’t pass it up. Working for the company that produces the product I work on will be an amazing experience, and I count myself extremely lucky to have achieved this at such a relatively young age for a SQL Server professional.

Normally this type of role would entail a great deal of travel, but I expressed my distaste for flying and the company was willing to work with me. Instead, I’ve opted to relocate about 1,000 miles, all the way to Arizona. This new experience is both exciting and stressful. It’s a new climate and a smaller town. I’m not a very outgoing person, so meeting new people here is going to be tough, and frankly I’m not even sure how to go about it. That’s going to be an ongoing challenge…

I have already done, and will continue to do, a lot of flying around the country as my onboarding with Microsoft continues. The consequence (other than having to fly) is that blogs may continue to be a bit haphazard for the next month or so. Hopefully I will find some spare time between unpacking, stocking the house, and learning the area to pick a good subject and queue up a stock of scheduled posts. That is the only reason I had any posts while I was moving!

I’m hoping that as I brush up on some skills and build some new test environments, I’ll have some good topics to cover in the upcoming weeks. I am very excited to start this new role that I am sure will provide me with a wealth of knowledge in the coming years.

 

SQL Server Forced Mirror Failover

Failing over a SQL Server mirror is normally an incredibly simple process. If you use the GUI, you can accomplish the failover in moments. However, it is not always a planned event. Yesterday we had just such an issue.

If your principal server goes down and you do not have a witness set up for automatic failovers, you will need to perform a forced manual failover on the mirror server. There is no way to do this via the GUI; it’s scripts only. Opening the database properties on a mirror results in a greyed-out pane of information and zero options.

Not very helpful when this is all you have to work with.

Forced Failover Method 1 (PREFERRED)

In a perfect world, you will be able to connect to the mirror database and quickly run the following script, substituting in your database name. This will force the mirror to become the principal, and voila, you are up and running again. If you use an alias in Active Directory, just make sure the alias is changed to point to the new server.

ALTER DATABASE dbname
SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS

However, in the less perfect world we live in, you may receive the following error:

Cannot alter database because database is not in the correct state to become the principal

This error results for one of two reasons:

a. Full Safety Mode

If the database is not in Full Safety Mode, otherwise known as synchronous commit, the failover will error. If your mirror database is in a remote location, it’s probably not in Full Safety. Asynchronous mirrors are High Performance for data transfers, but Low Performance when it comes to forced failovers. Try running the following to force the database into Full Safety Mode, which will then allow the failover to occur.

ALTER DATABASE dbname
SET PARTNER SAFETY FULL

However, this script will also fail if the principal database is still available to SQL. It might fail regardless; it did for me, so I had to resort to Method 2 below.

b. Principal Database can still be reached by Mirror

If your principal database is still accessible to the SQL mirror, but is having too many connectivity issues for you or your users to actually reach it, you may have to shut down the principal in order to run commands from the mirror database. This is a bit of a strange scenario, but if your principal cannot be connected to, you’ll need to fail over somehow. In this situation, someone will have to make the call on what to do.

Note: If you fail over using the forced manual failover script, you will be able to manually resume mirroring after the principal server becomes available again. All your mirroring settings are saved for the database, making failback relatively painless.
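Once the old principal is reachable again, resuming is a one-liner run against the new principal; a sketch, assuming the session was only suspended and not removed, substituting your database name:

ALTER DATABASE dbname
SET PARTNER RESUME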

Break Mirror Method 2 (LAST RESORT)

If Method 1 of forcing the failover does not work, even after trying to set the database to Full Safety Mode, you can run the following script to break the mirroring session and restore the mirror with recovery. This is the quick and dirty method.

ALTER DATABASE dbname
SET PARTNER OFF
GO
RESTORE DATABASE dbname
WITH RECOVERY
GO

If you destroy the mirroring session in this manner, you will not be able to resume mirroring. All the settings will be erased and you will have to recreate mirroring from scratch. You’ll have a usable database though, and if the principal died suddenly, you shouldn’t actually have much, if any, data loss.

String Manipulation in T-SQL or PowerShell

I don’t often have to do string manipulation in T-SQL, but when I do, it’s rare enough that I don’t remember the syntax. I’ve done it in the past, though, so I keep a script handy with a working example. I recently added to that example after wanting to know not only the file names, but also the folder location. This article will cover getting a string after a specific character, which can be used to get a folder path or just a file name.

The scripts are cumbersome, to say the least, in SQL. Some people use this problem as an interview question, but I feel like the only way you’d be able to answer it off the top of your head is if you did it on a daily basis. While I remember the general idea of reversing the order and checking the text after a specific character, I couldn’t write this from scratch if my life depended on it. Even with IntelliSense helping me out, it’s hard to remember how CHARINDEX can be manipulated to get the correct answer here, since I use it so rarely. I can pull it from my script library quickly though, and now I’ll be able to refer to my own blog as well. (I do that every time I set up Transparent Data Encryption.)

I had commented a quick explanation of each command into the code, but then I found out WordPress refused to color-code the comments correctly. So I removed them, and I’ll just explain it here instead.

SQL can do it

The below script will find the File Name of each master file by reversing the direction of the text and using CHARINDEX to get the length of the string until it reaches a specific character, in this case, the \. The RIGHT command then returns the number of characters at the end of the original string based on the inner index command. The -1 removes the \ from the returned results.

To get just the folder path, perform the same command again, but this time issue a REPLACE command for the FileName, replacing it with an empty string.

In order to get a single folder, you combine the steps above to get just the FolderPath, and then once again get the last value after the \. This returns the final folder in the list. This may not be the optimal way to find the results, but it uses the same methods in order to keep it relatively simple.


SELECT
     mf.name
    ,mf.physical_name
    ,[FileName] = RIGHT(physical_name,CHARINDEX('\',REVERSE(physical_name)) -1)
    ,[FolderPath] = REPLACE(physical_name,'\' + RIGHT(physical_name,CHARINDEX('\',REVERSE(physical_name)) -1),'')
    ,[FileFolder] = RIGHT(REPLACE(physical_name,'\' + RIGHT(physical_name,CHARINDEX('\',REVERSE(physical_name)) -1),''),CHARINDEX('\',REVERSE(REPLACE(physical_name,'\' + RIGHT(physical_name,CHARINDEX('\',REVERSE(physical_name)) -1),''))) -1)
FROM sys.master_files mf

PowerShell just does it better

On the other hand, I feel the syntax to perform the same string manipulation is much more intuitive in PowerShell. It’s definitely less code, although I will concede that I assigned variables here, while in the SQL approach I chose to do it all in a single statement. Having no variable assignments makes it much harder to read.

Leveraging the Split method and the Length property is about all you need in PowerShell. Split creates an array from the string, with each new value separated by the character supplied in the parentheses, that \ again. Returning an array value based on the length of the newly split variable gives you any folder level or the file name alone; it just depends on the number you subtract from the length of the split.


$FullPath = "C:\Users\Test\Desktop\Text.txt"

$Split = $FullPath.Split('\')
$File = $Split[$Split.Length-1]
$LastFolder = $Split[$Split.Length-2]
$FolderPath = $FullPath.Replace($File,'')

$FullPath
$FolderPath
$LastFolder
$File
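If you would rather skip the array math entirely, the built-in Split-Path cmdlet gets the same three answers; a minimal sketch using the same example path:

$FullPath = "C:\Users\Test\Desktop\Text.txt"

Split-Path $FullPath -Leaf                       # Text.txt  (file name)
Split-Path $FullPath -Parent                     # C:\Users\Test\Desktop  (folder path)
Split-Path (Split-Path $FullPath -Parent) -Leaf  # Desktop  (last folder)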

 

Getting Jobs by Run Time

New month, so I have to make sure I write something about SQL again, instead of babbling about PowerShell all the time! I manage a lot of servers, so scripting out something that I can run on all the servers rather than pulling up the GUI on one server at a time is always important.

Checking to see when jobs ran on a single server is a pretty quick task. You just have to pull up the Job Monitor, and you’ll have times and even more information easily accessible. Doing that on a dozen servers is tedious though. Doing it on a hundred would be a nightmare. Beyond that, checking specific times through the Job Monitor can be a bit of a hassle, as it displays a summary of the last execution times. What if you want to know what happened 5 hours ago, regardless of whether jobs have run since then?

You can query SQL for job history, and with a lot of time conversions, you can get a list of jobs that ran within a specific time period. I use this to narrow down what was happening on a box while troubleshooting. If you need to review long-running jobs over a period of time, compare run times of jobs across servers, or just see what jobs might have been slowing performance at 3:03 am, this script can help you out.

The script takes a start and end time and checks for any jobs that started within that window, or that started earlier but were still running at your supplied start time.

Calculating the Start and End times from job history is ugly. You’ll see that the datetime conversions below are disgustingly complicated. The short and sweet explanation is that date, time, and duration are all stored as integer values, because Microsoft thinks everything should be a number. To properly calculate the date and time, then display them in a readable format, you have to stuff in colons and use modulus division to get the hours, minutes, and seconds displaying correctly. (Modulus division, if you really like math like I do, just returns the remainder after division.)
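As a quick worked example of that math, a run_duration of 13542 unpacks to 1 hour, 35 minutes, 42 seconds:

DECLARE @run_duration INT = 13542 --Duration stored as HHMMSS packed into an integer
SELECT
 [Hours] = @run_duration / 10000 -- 1
,[Minutes] = @run_duration / 100 % 100 -- 35
,[Seconds] = @run_duration % 100 -- 42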

The run status will tell you if the job succeeded or not. Now, according to the SQL Server 2000 TechNet documentation, there is a run_status of 4, meaning the job is still In Progress. Newer versions don’t mention a run_status of 4, and I’ve never seen one in a result set, even when I had jobs running at the time. Will it work for you? I really don’t know. I don’t trust it though. For newer versions of SQL, I’d suggest running a system stored procedure instead to get currently executing jobs; I’ve provided the code for that after my script.


DECLARE  @StartTime INT = '195500' --HHMMSS
,@EndTime INT = '195600' --HHMMSS
,@Date  INT = CONVERT(INT,CONVERT(VARCHAR,GETDATE(),112))
,@MinTime INT = 10  --Job Run Time Minimum in Seconds

SELECT
[JobName] = j.name
,[StartDate] = CONVERT(VARCHAR,CONVERT(DATETIME,RTRIM(jh.run_date)),101)
,[StartTime] = CONVERT(TIME(0),STUFF(STUFF(REPLACE(STR(RTRIM(jh.run_time),6,0),' ','0'),3,0,':'),6,0,':'))
,[EndTime] = DATEADD(second,run_duration%100,DATEADD(minute,run_duration / 100%100,DATEADD(hour,run_duration / 10000,CONVERT(TIME(0),STUFF(STUFF(REPLACE(STR(RTRIM(jh.run_time),6,0),' ','0'),3,0,':'),6,0,':')))))
,[Duration] = STUFF(STUFF(REPLACE(STR(jh.run_duration,7,0),' ','0'),4,0,':'),7,0,':')
,[RunStatus] =
CASE jh.run_status
WHEN 0 THEN 'Failed'
WHEN 1 THEN 'Succeeded'
WHEN 2 THEN 'Retry'
WHEN 3 THEN 'Cancelled'
WHEN 4 THEN 'InProgress'
ELSE 'Unknown'
END
FROM msdb.dbo.sysjobs j
INNER JOIN msdb.dbo.sysjobhistory jh ON jh.job_id = j.job_id
WHERE run_date = @Date
AND step_id = 0 -- Entire Job
AND run_duration > @MinTime
AND
(
( run_time >= @StartTime AND run_time <= @EndTime )
OR
( Run_Time < @StartTime
AND
DATEADD(second,run_duration%100,DATEADD(minute,run_duration / 100%100,DATEADD(hour,run_duration / 10000,CONVERT(TIME,STUFF(STUFF(REPLACE(STR(RTRIM(jh.run_time),6,0),' ','0'),3,0,':'),6,0,':'))))) > CONVERT(TIME,STUFF(STUFF(REPLACE(STR(RTRIM(@StartTime),6,0),' ','0'),3,0,':'),6,0,':'))
)
)
ORDER BY Run_Date,Run_Time

Note that jobs that are still executing do not have an entry in job history, so combining this script with the executing-jobs system stored procedure is a good idea, especially if you are dealing with a recent time window. You can add the parameter @Category_Name = 'jobs from msx' to return only the jobs managed by your Central Management Server.


EXEC msdb.dbo.sp_help_job @Execution_Status = 1 --,@Category_Name = 'jobs from msx'

PowerShell: Custom Objects and Tables

Returning data in PowerShell is normally an automatic feature, but oftentimes you will need to specially format or splice some data into a single table. There are a lot of ways to go about it, but none of them are exactly intuitive. I’ll list my four preferred methods, all using a moderately complex statement that I’ve previously mentioned.

Method 1 – String

The first method is probably the easiest one to remember syntax-wise. You create an empty variable, pipe column names to it, add results to the table, and finally return the result. This will leave the data as a string, which has certain pros and cons. It supports multiple rows of data just fine.


## Simple Example - String ##
$Results = '' | SELECT ComputerName,UserName,Processes,Services
$Results.ComputerName = $Env:ComputerName
$Results.Username = $Env:UserName
$Results.Processes = (Get-Process *).Count
$Results.Services = (Get-Service *).Count
$Results | Format-Table -Autosize

A sample result of the custom table

A much more complex, but realistic, example is below. I’m querying network adapter data to get the name of each adapter, match it to its IP address, and return the result set.


## Advanced Example - String ##
$Adapters = Get-WmiObject win32_networkadapter | WHERE {$_.NetConnectionId -NOTLIKE '' -AND $_.NetConnectionId -NOTLIKE 'Local Area Connection*' }
#  $ErrorActionPreference = 'SilentlyContinue'
$Info = ForEach($Adapter in $Adapters)
{
$Results = '' | SELECT ComputerName,AdapterName,IpAddress
$Results.ComputerName = $Env:ComputerName
$Results.AdapterName = $Adapter.NetConnectionId
$Results.IpAddress = (Get-WmiObject win32_networkadapterconfiguration -Filter "Index = $($Adapter.DeviceId)").IpAddress
$Results
}
$Info | FT -auto

I’ll continue to show the simple and complex example for each custom object method.

Method 2 – Hashtable to PsCustomObject

Method 2 is very similar to Method 1. It’s actually the first one I learned, so I use it the most. This creates a hashtable instead of a string, then converts that to a custom object so it will display nicely as a table. Older versions of PowerShell might not format it to look like a table, instead listing out the results in two columns. One of my machines displays a list, while the other, with a newer version of PowerShell, displays a table.

## Simple Example - HashTable to PsCustomObject ##
$Results = @{}
$Results.ComputerName = $Env:ComputerName
$Results.Username = $Env:UserName
$Results.Processes = (Get-Process *).Count
$Results.Services = (Get-Service *).Count
[PsCustomObject]$Results | Format-Table -Autosize

The advanced hash table isn’t overly complicated; the setup is only a bit different.

## Advanced Example - HashTable ##
$Results = @()
$Adapters = Get-WmiObject win32_networkadapter | WHERE {$_.NetConnectionId -NOTLIKE '' -AND $_.NetConnectionId -NOTLIKE 'Local Area Connection*' }
ForEach($Adapter in $Adapters)
{
 $Results += [PSCustomObject] @{
 ComputerName = $Env:ComputerName
 AdapterName = $Adapter.NetConnectionId
 IpAddress = (Get-WmiObject win32_networkadapterconfiguration -Filter "Index = $($Adapter.DeviceId)").IpAddress
}#EndCustomObject
}#EndForEach
$Results | FT -auto

Method 3 – Custom Table

This method is very syntax-heavy, with lots of punctuation. It’s a good method for creating custom column names in a table, as you can just use the @{l='ColumnName';e={$Value}} format for any column. Just remember that l is for label and e is for expression, and you can mix custom column names with default ones.

## Simple Example - Custom Table ##
Get-Process * | SELECT -First 1 |
Format-Table @{l='ComputerName';e={$Env:ComputerName}},@{l='UserName';e={$Env:UserName}},
@{l='Processes';e={(Get-Process *).Count}},@{l='Services';e={(Get-Service *).Count}} -AutoSize

The advanced example is very similar to the simple example for this method. Just remember to keep track of those brackets. They can be overwhelming.

## Advanced Example - Custom Table ##
Get-WmiObject win32_networkadapter |
WHERE {$_.NetConnectionId -NOTLIKE '' -AND $_.NetConnectionId -NOTLIKE 'Local Area Connection*' } |
Format-Table @{l='ComputerName';e={$Env:ComputerName}},@{l='AdapterName';e={$_.NetConnectionId}},
@{l='IpAddress';e={Get-WmiObject win32_networkadapterconfiguration -Filter "Index = $($_.DeviceId)" |
Select-Object -Expand IpAddress}} -auto

Method 4 – Add Member

Coders may tell you that this is the preferred method, but I feel the syntax and execution are too troublesome for just displaying some query results. This method returns a PSCustomObject. There is a lot of syntax here, but it’s mostly repetitive and easy to read. You also have the benefit of explicitly defining the MemberType with this method, although you’d normally just leave it as NoteProperty. My major complaint is that it does not easily support multiple rows of data…I’ll explain in the advanced example.

$Results = New-Object PSObject
Add-Member -InputObject $Results -MemberType NoteProperty -Name ComputerName -Value $Env:ComputerName
Add-Member -InputObject $Results -MemberType NoteProperty -Name UserName -Value $Env:UserName
Add-Member -InputObject $Results -MemberType NoteProperty -Name Processes -Value (Get-Process *).Count
Add-Member -InputObject $Results -MemberType NoteProperty -Name Services -Value (Get-Service *).Count
$Results | Format-Table -Autosize

The advanced example requires some consideration. In this case, you’ll need to make sure that the New-Object command is inside the ForEach loop, creating a new object each time; otherwise, you’ll just repeat the same first line of data over and over. The Add-Member command creates columns, so if they already exist…you just get errors. With the looping, we are generating a few tables and splicing them into one. Each pass through the ForEach appends the generated object to another variable, which populates the custom table we display at the end.

$Adapters = Get-WmiObject win32_networkadapter | WHERE {$_.NetConnectionId -NOTLIKE '' -AND $_.NetConnectionId -NOTLIKE 'Local Area Connection*' }
$Results = @() # Start with an empty array so += can append one object per loop
ForEach($Adapter in $Adapters)
{
$obj = New-Object PSObject
Add-Member -InputObject $obj -MemberType NoteProperty -Name ComputerName -Value $Env:ComputerName
Add-Member -InputObject $obj -MemberType NoteProperty -Name AdapterName -Value $Adapter.NetConnectionId
Add-Member -InputObject $obj -MemberType NoteProperty -Name MacAddress -Value $Adapter.MacAddress
Add-Member -InputObject $obj -MemberType NoteProperty -Name IpAddress -Value (Get-WmiObject win32_networkadapterconfiguration -Filter "Index = $($Adapter.DeviceId)").IpAddress
$Results += $obj
}#EndForEach
$Results | Format-Table -Autosize

In my opinion, use whichever method fits your needs and is the easiest to implement and remember. Personally, I use methods 1 and 2 the most.

Troubleshooting Policies and xp_dirtree

Today we had a policy failure on a newly built SQL server. The policy verifies that every SQL server has a backup file stored in a remote directory. Simple enough, but in the past, this policy has been prone to permissions errors.

In case you would like to implement something like this on your system, the dumbed-down condition of the policy is below:


ExecuteSql('NUMERIC', '
--Get DBs on server
DECLARE @DBCount INT,@File VARCHAR(250)
SELECT  @DBCount = COUNT(*) FROM sys.databases WHERE database_id <> 2
SELECT  @File = ''\\BACKUPSERVER\BACKUP\''+CONVERT(VARCHAR(100),SERVERPROPERTY(''MachineName''))
--Insert DirTree results into a table variable to return a numeric result
DECLARE @Check TABLE (SubDirectory VARCHAR(250), Depth BIT, [File] BIT)
INSERT  @Check
EXEC master.dbo.xp_dirtree @File,1,1
--Numeric result for easy policy evaluation 1=Pass if( # files = # dbs )
IF((SELECT COUNT(*) FROM @Check) = @DBCount)
SELECT 1
ELSE
SELECT 0
')

Policy Explanation

The dumbed-down policy checks a backup server, which has a folder named for each SQL server, to see if it has the correct count of backups. Each folder should have a backup file for each database, except for TempDB. This isn’t exactly how we do it, but it gets my point across, and it provides a good example of how a moderately complex policy works using ExecuteSQL. There aren’t a lot of examples on the web, so I try to provide as many as I can.

The Troubleshooting

The new server wasn’t processing the policy correctly. My first idea was, “Well, there’s no files!” Easy! Problem solved! Or not…the files were in the folder, and the number was even correct. So it wasn’t the backups or the copy failing. That meant there was something wrong with the permissions; the problem was, what?

I verified the logins were all present in SQL by reviewing Security -> Logins in Management Studio. This was a new server, so I assumed it would be something simple like that. The account in question was the SQL Server service account, which both Policy Based Management and xp_dirtree use. The account was in SQL and it had all the proper permissions. OK, next thing.

The folder where all the files are stored is a shared folder, so I went into its Properties and verified the account it was shared with. Everything looked right there. Every time I’ve had an issue with xp_dirtree reading remote servers, it’s been because the service account on the shared folder was wrong, so my next stop was checking out Active Directory.

Sharing a folder makes the file path easier. Just make sure your service account has permission here.

Drilling down through Active Directory, I found the group used for sharing the folder. It had every SQL service account as a member. It looked fine in AD, and it looked fine on the shared folder. Everything worked perfectly except on this new server. What was wrong?!?

At this point, I just decided to throw a hammer at it, adding extra service accounts until hopefully one worked. Eventually I added the SQL Agent account. Suddenly, the policy passed. The new server could read the backup directory! So I took the normal account out of the permissions. Now everything failed except this new server. Wait, what…?

I logged into the new server and loaded up SQL Configuration Manager…

Oh…

When the server was set up, someone had switched the accounts. SQL Server was running under the SQL Agent service account, and SQL Agent was running under the SQL Server service account. I restarted each service with its correct account, then fixed the permissions for the shared backup directory back to the original setup.

Voila! Everything worked perfectly!
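In hindsight, one WMI query would have exposed the swapped accounts without ever opening Configuration Manager; a sketch using the same Get-WmiObject approach as my PowerShell posts:

# List the SQL services and the accounts they actually log on as
Get-WmiObject win32_service -Filter "Name LIKE 'MSSQL%' OR Name LIKE 'SQLAgent%'" |
    Select-Object Name,StartName,State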

The Moral of the Story

Make sure new servers and settings are configured properly before trying to fix things that weren’t broken before the new stuff existed. I’m sad to say it took me so long to understand the problem, even with other DBAs chiming in. None of us thought about checking the Configuration Manager…that, or they assumed I had already checked it.

Sometimes, it’s the little things that get you…OK, normally it’s the little things that get you.

Get Failed Jobs with T-SQL

It’s been quite some time since I did a post relating to SQL. I’ve just been sharing a lot of PowerShell scripts lately because almost all my views are related to PowerShell.

Monitoring SQL jobs can be done in many ways, and I highly suggest your jobs send an email notification to your DBA group on failure. No, not just to you. What happens when you are on vacation, or you leave permanently and your account gets disabled? No notifications. Don’t do that. You can also set up fancy reports to display information about your job status using SSRS, but I always like to have something that is easily portable and can be run in SSMS quickly.
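Wiring a job up to notify a group is quick; a sketch using msdb’s stored procedures (the operator and job names here are hypothetical):

-- Create a group operator once, then point each job at it
EXEC msdb.dbo.sp_add_operator
     @name = N'DBA Team'
    ,@email_address = N'dba-team@example.com'

EXEC msdb.dbo.sp_update_job
     @job_name = N'Nightly Backup'
    ,@notify_level_email = 2 -- 2 = notify on failure
    ,@notify_email_operator_name = N'DBA Team'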

Uses

In the middle of the night when email isn’t loading and a call comes in that something important failed, but the helpdesk can’t articulate what exactly it is over the phone…this script can help a lot. Suddenly confusion and trying to piece together a string of letters and numbers over the phone turns into a simple, “Oh, yes, I see Server X has failed Job Y 3 times. It ran successfully last time, I’ll go ahead and connect to that server to look into it but I suspect someone might have been doing Z.”

I developed this failed-job script to run on a single SQL server or through a multi-server query to see all failed jobs across all servers. I commonly run it in the mornings as a summary when multiple failed-job emails came in overnight; it’s a much easier way to see the history. If a job is continuing to fail and it’s high priority, I start there.

Explanation

The script provides the name, run date, and run time for each failed job, but it also provides some extra information for sorting priority. I’ve used ROW_NUMBER to sort job history and retrieve the latest run result of each failed job. That’s the only purpose of the CTE here: getting Row 1, the most recent run result for each job. Using the latest result helps determine how problematic the reported job failure is. Did the latest trans log backup run successfully even though one of them failed in the past? OK, that’s probably a minor issue then. Are they still failing? Well, then we have something a bit more worrisome to consider.

I always limit the results to a 24-hour period to keep the result set small. I’ve also included a clause in the past to exclude a specific job that was prone to failing due to network issues or a colleague testing something.

WITH JobInfo AS
(
 SELECT
  j.job_id
  ,j.name
  ,jh.run_status
  ,jh.run_date
  ,jh.run_time
  ,jh.[message]
  ,[PriorStatus] = --'NA' --2008
    LAG(jh.run_status,1) OVER (ORDER BY jh.job_id,jh.run_date,jh.run_time)  --2012
  ,[NextStatus] = --'NA' --2008
    LEAD(jh.run_status,1) OVER (ORDER BY jh.job_id,jh.run_date,jh.run_time) --2012
  ,[RN] = ROW_NUMBER() OVER (PARTITION BY jh.job_id
 ORDER BY jh.job_id DESC, jh.run_date DESC, jh.run_time DESC)
 FROM msdb.dbo.sysjobs j
  INNER JOIN msdb.dbo.sysjobhistory jh ON jh.job_id = j.job_id
 WHERE jh.run_date >= CONVERT(INT,CONVERT(VARCHAR,GETDATE()-1,112))
  AND jh.step_id = 0
)
SELECT
  j.Name
  ,[RunDate] = CONVERT(DATE,CONVERT(VARCHAR,j.run_date,112))
  ,[RunTime] = CONVERT(TIME(0),STUFF(STUFF(REPLACE(STR(RTRIM(j.run_time),6,0),' ','0'),3,0,':'),6,0,':'))
  ,[PriorStatus] = CASE j.PriorStatus WHEN '1' THEN 'SUCCESS' WHEN '0' THEN 'FAIL' ELSE 'N/A' END
  ,[NextStatus] = CASE j.NextStatus WHEN '1' THEN 'SUCCESS' WHEN '0' THEN 'FAIL' ELSE 'N/A' END
  ,[LatestStatus] = CASE ji.run_status WHEN '1' THEN 'SUCCESS' ELSE 'FAIL' END
  ,j.[message]
 FROM JobInfo j
   INNER JOIN JobInfo ji ON ji.job_id = j.job_id AND ji.RN = 1
 WHERE j.run_status = 0
 ORDER BY j.name,[RunDate],[RunTime]

Exceptions

If you are running the script on SQL 2012, you can get a bit more information using LEAD/LAG, in the form of the completion status of the job before and after the reported failure. I use this to find patterns in failures and to further prioritize multiple job failures. All other things being equal, I’m going to be more concerned about a job with consecutive failures than a job that fails intermittently.

Sadly, IF statements, LEAD/LAG, and SQL 2008 R2 or older do not play well with each other. I’ve tried making IF statements that run the LEAD/LAG version of the query only on SQL 2012 while running the 'N/A' version otherwise, but SQL always wants to parse the 2012 functions, resulting in errors. I’ve resigned myself to just commenting out the LEAD/LAG functions when running this script on SQL 2008 R2 and older. The benefit of PriorStatus and NextStatus is fairly minimal, so it’s not a huge loss. I find LatestStatus to be the most important of the three by far.

2015 in Review

With 2015 coming to a close, so does my first year of blogging. Technically though, I started this blog around February, so I’m still a couple of months shy of a full year. I’m glad that I started along this path. It’s definitely gotten me to study some topics and practice writing.

My original plan was to write two posts a week, every week. That would have put me at around 100 posts for the year. I didn’t even get close to that, but I did manage to average about two a month. That’s not terrible, but I hit a few weeks where I just couldn’t bring myself to write. I also helped train two new DBAs this year.  Coming up with fun challenges for them took a lot of time out of my normal blogging schedule.

Hopefully next year I can write 1.5x as many blogs as this year. I’d like to set the goal of writing twice as many blogs as I did this year, but after my initial lofty goal, I think I should set my sights on something a bit more realistic.

As for blog views, initially I had very few, and in the grand scheme of the internet my views are still abysmally low, but for almost no advertisement, I feel they are respectable. Consistently, my most popular posts were anything related to PowerShell automation, which people found through desperate Google attempts. Each of my TSQL Tuesday posts saw a lot of traffic in the month it was published as well.

Finally, I also started a Twitter account this year. My number of followers is still only a handful, but I cannot expect much since my posts are few and far between and far from exciting. I’m not much of a social media person. It seems I’m not just an introvert in real life, but online too.

Here’s to a strong 2016 with lots of blogs, tweets, and learning!

Here’s an excerpt:

A San Francisco cable car holds 60 people. This blog was viewed about 1,600 times in 2015. If it were a cable car, it would take about 27 trips to carry that many people.


Security of PWDCOMPARE and SQL Hashing

Few people are as concerned about security as they should be, and fewer still are proficient in it. Even if you are, while you might be able to tackle the major security flaws, there’s always going to be something new that can get you, unless you implement an unhackable system – turned off and unplugged.

In SQL 2008, Microsoft added an interesting new function, PWDCOMPARE. The idea was to provide DBAs with the ability to check whether existing passwords were too simple, then require them to be changed. I was immediately concerned after reading about this. Researching the topic for this post actually alleviated most of my concerns, but there are a few things to know.

Function Explanation
PWDCOMPARE works by comparing a provided plain-text password against a password hash, both of which are supplied as parameters to the function. A result of 1 means the password matches, while 0 means the password is incorrect. You have to type the password exactly to get a match; that doesn’t seem so dangerous, right?

SELECT
name
,password_hash
,is_disabled
,[Pwd] = 'App!3'
,[IsMatch] = PWDCOMPARE('App!3',password_hash)
FROM sys.sql_logins

All users have access to query sys.sql_logins, and by default they can see their own login as well as the SA account. We all (should) know that best practice is to disable and rename the SA account. (Just knowing it’s the right thing to do isn’t enough, though; you should also be doing it!) Renaming the account doesn’t help here; users can still see the account with principal_id = 1, regardless of what it’s called and whether or not it’s disabled.
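For example, any login can run the following by default and still spot the original SA account, no matter what it has been renamed to:

-- principal_id 1 is always the original SA account
SELECT name, principal_id, is_disabled
FROM sys.sql_logins
WHERE principal_id = 1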

Now let me walk you through my original concern.

Curious User George logs in and performs a query on sys.sql_logins. He notices that the SA account is visible and someone accidentally left it enabled. Curious User George happens also to be Malicious User George. He sees a password_hash column in his query results. Here’s where SQL redeems some security though. If a user does not have the permission CONTROL SERVER, this field is left NULL. George cannot even see his own password hash.


When George attempts to use PWDCOMPARE, the result is also NULL. Crisis averted; basic users cannot crack passwords this way.


However, all those elevated users that should not exist, but do, in small companies? They have the power. They already have all the power, so this happens to be one of the smallest concerns.

PWDCOMPARE Concerns
MSDN explicitly states that “The PWDCOMPARE function is not a threat against the strength of password hashes because the same test could be performed by trying to log in using the password provided as the first parameter.”

However, if you try to log in with a bad password, you should get an alert. (I’m assuming you are auditing failed logins?) If you try repeatedly, you’ll be locked out of the account, again assuming routine security practices are in effect. But if you compare a billion passwords using PWDCOMPARE, no alerts are fired and no one gets locked out, even if it takes forever. How is that the same thing?

I created a temp table in six seconds that contained a million randomly generated passwords. It took 12-15 minutes to compare the resulting seven million rows and return the one password that matched. This really isn’t a viable method for anything other than comparing some common passwords, due to the time it takes. But then again, all it took was a process running for a fraction of my day on a small development box. A targeted list will have significantly better results than using random GUIDs like in my example below.

Create a temp table with a million random passwords (no, not reasonable passwords, just passwords):

WITH Test AS
(
SELECT [ID] = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a
CROSS JOIN sys.all_columns b
)
SELECT TOP 1000000
ID,[Pass]=NEWID()
INTO #Password
FROM Test

Find weak passwords; provide script to force users to change
This will generate ALTER LOGIN statements that force users with discovered passwords to change their password at their next login. It’s still up to you to educate them on better password policies, though.

SELECT
	 name
	,Pass
	,[IsMatch] = PWDCOMPARE(Pass,password_hash)
	,[ForceChange] = 'ALTER LOGIN ' + QUOTENAME(name) + '
	 WITH PASSWORD = ''' + Pass + ''' MUST_CHANGE,CHECK_POLICY=ON'
FROM sys.sql_logins
	CROSS JOIN #Password
WHERE PWDCOMPARE(Pass,password_hash)=1

Hashing Concerns
My friend integgroll is a penetration tester. As suggestive as that title is, he is essentially an ethical hacker. When I asked him about cracking passwords, he provided me with a lot of interesting information. There are quality free programs out there that will crack a password in less time and with less effort than using PWDCOMPARE, and without even using SQL Server to test against. Such a test/attack is especially effective if you use a Mask Attack (a targeted list) instead of just a simple Brute Force Attack (every possible combination).

SQL Server uses the twelve-year-old hashing algorithm SHA-512, which is slowly being broken through collision attacks. SQL appends a 32-bit salt to complicate the hashed password. However, my friend pointed out that salting does not mean passwords are suddenly uncrackable; it just takes more time. With a salted hash, he estimated it would only increase the time required to crack the password by around 20%. Not too bad when most users follow a password pattern and you can potentially test 414 million combinations a second if you have a quality graphics card to boost Hashcat’s performance.

With sufficient permissions, one could viably steal any hashed password this way. Just another reason to stick to Windows logins, which do not display at all in sys.sql_logins. They only show up in sys.syslogins, which still has a password field, but it is left NULL for Windows accounts, keeping the hash secure (at least from SQL).
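You can verify that yourself with a quick query; a minimal sketch:

-- Windows logins keep a NULL password hash, even in sys.syslogins
SELECT [name], [password], isntname
FROM sys.syslogins
WHERE isntname = 1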

I have yet to rationalize why we need to see a password hash at all. So long as there are password complexity requirements, I’d rather keep the chances for stealing and cracking passwords to a minimum, meaning that I’d rather not be able to see a plain-text or even a hashed password of any user.

EDIT: It was pointed out to me, and then I immediately remembered I’ve done it many times in the past, that password hashes let us script out logins without knowing the original password. I didn’t think about that while posting this, probably because I was so caught up in the disturbingly simple methods of password cracking.

The most important thing you can do here is improve password strength. Cracking a hashed password is disturbingly easy and quick, but each character you add to a password exponentially increases the cracking time required.
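To put rough numbers on that, here is a sketch using the 414-million-guesses-per-second figure from above and assuming the full set of 95 printable ASCII characters (the character set size is my assumption, not a measured value). At that rate, exhausting every eight-character password takes on the order of half a year, while twelve characters pushes it into the millions of years:

DECLARE @rate FLOAT = 414000000 -- guesses per second, figure quoted above
DECLARE @charset FLOAT = 95 -- printable ASCII characters (assumption)
SELECT
 [PasswordLength] = n
,[DaysToExhaust] = POWER(@charset, n) / @rate / 86400
FROM (VALUES (8),(10),(12)) v(n)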

Administrators:

  • Implement a minimum password length
    • Eight characters is too short for a reasonably secure password; more is better
  • Do not have a maximum length (or at least make it huge)
    • If users want a 75-character password, let them. Their password will be more secure than your admin account.
  • Allow special characters and numbers
    • This substantially increases the possibilities and thus increases cracking time. Anytime I see a policy that still only allows alpha characters, I get angry or depressed – depending on how much energy I have.
  • Do not advertise your password policy on failed login or password creation
    • Displaying the policy is convenient for users, but it also lets hackers know how they can structure their Mask Attack
    • Controversial and may not really help vs. a determined Mask Attacker

Users:

  • Every character added to your password makes it exponentially harder to crack
  • Do not follow common patterns!
    • Timothy1950 is a common password pattern. Most passwords only contain a capital letter as the first character. Mix things up, and don’t end with a year. Even changing the password to 19Tim50othy makes a Mask Attack a lot more time-consuming.
  • Do not share passwords between accounts
    • If one is cracked, hackers normally test other popular websites
    • Even if you follow a pattern, they now have a very powerful Mask Attack to use against you – changing a number or two is easier than guessing 15 characters.