RAID Storage and SQL Server

If you are a SQL Server Administrator, eventually you are going to need to request specific storage for your servers. Depending on the setup at your current company, that may all be handled in the background by your Storage Administrators, but if you have the power to choose or are asked for your opinion, knowing about RAID (Redundant Array of Independent Disks) technology is important. You can find full technical explanations on the web, but I’ll cover the basics from a SQL Server perspective.

RAID Overview

RAID uses multiple hard drives to improve availability and/or performance. It can overcome the I/O bottleneck of a single disk, provide resiliency against data loss through mirroring, and eliminate the single point of failure that relying on one drive creates.

To understand RAID, there are three terms we need to define first.

Mirroring is an idea you should understand intuitively, but perhaps not the exact details in relation to RAID. Disk Mirroring replicates data from one disk to another, providing a second copy for disaster recovery, and thus requiring at least two disks. RAID mirroring is performed synchronously and mirrored data can be read from either disk. More on Mirroring.

Striping means that the data is separated onto multiple drives in a consecutive fashion. By spreading the data across different drives, the I/O load can be balanced across the disks and reads are faster. The more disks the data is striped across, the greater the throughput; however, if one device fails, since the data is spread evenly across all the disks in the stripe, the entire set becomes unreadable. More on Striping.

Parity is probably the hardest term to understand or explain. In the most basic, mathematical sense, parity refers to whether an integer is even or odd. In computing, it refers specifically to whether the count of 1s in a given binary value is even or odd. For RAID, parity bits are computed from the existing data bits to provide redundancy. If any single drive fails, the remaining data can be combined with the parity data to reconstruct the missing data. More on Parity.
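
To make that concrete, here is a minimal PowerShell sketch (not tied to any real RAID implementation) showing how XOR parity lets you rebuild a lost block from the surviving block and the parity:

# Two simplified one-byte "data blocks" and the parity a RAID 5 set would store
$block1 = 0xB2
$block2 = 0x6C
$parity = $block1 -bxor $block2            # parity = XOR of the data blocks

# Simulate losing $block1: XOR the surviving block with the parity to rebuild it
$rebuilt = $block2 -bxor $parity
"Rebuilt block matches the original: $($rebuilt -eq $block1)"   # True

The same idea scales to any number of drives: XOR all the surviving blocks with the parity block and the missing block falls out.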

RAID Levels

Although more RAID levels exist (2, 3, 4, and so on), they are rarely used, especially for SQL Server. I’ll just be explaining the four main types here.

 

RAID 0 (Striping) This basic form of RAID stripes data across multiple disks. Reads and writes occur simultaneously across all disks involved, providing faster reads and writes than a single disk. The more disks involved, the faster the performance. Every disk added is another point of failure, though, and any one failure destroys the array, so it is not recommended for database use.

Pros: Improved Read and Write Performance
Cons: No Redundancy; any drive failure destroys all the data.

RAID 1 (Mirroring) This RAID level mirrors, or duplicates, data between a minimum of two disks. Because the mirror is an exact copy of the original data, only half of the raw storage is usable. Reads are faster since either disk can respond to a read request, and can be as fast as the fastest drive; writes are slower because the data must be written to both locations, and are often as slow as the slowest drive. If you need a relatively cheap way to protect your data, this is a good option: if one drive fails, you still have a perfect copy on the other.

Pros: Redundancy, Faster Reads
Cons: Only Half the Raw Storage Is Usable, Slower Writes

RAID 5 (Striping & Parity) This is likely the most common type of RAID used, and it requires at least three disks. Data and parity calculations are striped across all the disks. Since data is not mirrored, less storage is ‘wasted’: only 1 / (total number of drives) of the space goes to parity, so a three-disk array gives up a third and a five-disk array only a fifth. In the event of a drive failure, the data can be reconstructed using the parity data, but at a cost; there is a significant performance impact while the array runs degraded because of the parity reconstruction overhead. Losing a second drive in a three-disk RAID 5 configuration takes the entire array offline and the data is lost, so always replace a failed drive as soon as possible! Since write speeds are slower with RAID 5 (every write must also update parity), it is not the best choice for Transaction Logs or Data Files. Backup drives are a prime candidate for this RAID level, though, since write speed is less important there.

RAID 6 is growing in popularity; it’s identical to RAID 5, except it adds an extra disk with another set of parity data. This RAID level requires a minimum of four disks and can handle up to two drive failures.

Pros: Fault Tolerance, Fast Reads (until a drive fails)
Cons: Slower Write Speeds, More Expensive than RAID 1 or 0

RAID 0+1 (Striping, then Mirroring) Data is striped first and the stripe sets are then mirrored, which incurs the same 50% usable-storage loss as mirroring. The array can survive a single drive failure, but that failure leaves you running on what is essentially RAID 0 with no remaining redundancy.

RAID 10 (Mirroring, then Striping) Data is mirrored first and then striped across the mirrored pairs. As long as one drive from each mirrored pair survives, no outage occurs. With more fault tolerance, this is the preferred high-end method over RAID 0+1.

Pros: Fault Tolerance, Best Read/Write Speeds
Cons: Lots of Disks and Lots of Money

Both RAID 0+1 and 10 are hybrid RAID levels; they provide the best read and write performance but double the required storage. This is what everyone wants and not everyone can afford: they combine the best of the previous configurations but require many disks and therefore a big budget. These levels are best suited to high-traffic files like your Data and Log files. If you cannot afford RAID 10, RAID 5 is a decent alternative.


T-SQL Tuesday #68: Just Say No to Defaults

This month’s topic for T-SQL Tuesday is hosted by Andy Yun (blog | twitter) with the subject of defaults:

This month, I’d like to ask everyone to blog about SQL Server Defaults. While SQL Server CAN be run with just the default settings, doing so is far from ideal. As we progress through our careers, many of us build lists of things we change. Let’s blog about those! And don’t think this month’s topic is constrained to just the SQL Server engine. SSIS & SSAS are absolutely fair game as well. SQL Server developer tools – sure! Want to take a different spin on “defaults,” go right ahead and be creative!

My job practically revolves around building out new SQL Servers, so having an organized method is imperative. The list of security changes we make is a mile long; I may not even be exaggerating. However, there are a few changes from the default settings that I feel are the most important. I’m only going to touch on a few, to avoid an impossibly long read and the likely redundant mentions.

Add TempDB Data Files

While Aaron Bertrand has some amazing ideas for how TempDB configuration should look during installation, at least SQL 2016 is making some improvements to encourage multiple data files. No one should be using a single data file by default. A good start is to have one data file per core, up to eight. Afterwards, it gets a bit more complicated.

[Screenshot: TempDB left with a single data file]

If this doesn’t look wrong to you…well I’m sorry.
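
As a rough sketch of the fix, something like the following adds extra TempDB data files using Invoke-Sqlcmd (from the SqlServer/SQLPS module). The instance name, path, sizes, and file count are placeholders; size them for your own core count and drive layout.

$instance = 'MyServer\MyInstance'          # placeholder instance
$dataPath = 'T:\TempDB'                    # hypothetical dedicated TempDB drive

# Add files 2 through 4 so a four-core box ends up with one data file per core (cap at eight)
2..4 | ForEach-Object {
    $sql = "ALTER DATABASE tempdb
            ADD FILE (NAME = tempdev$_, FILENAME = '$dataPath\tempdb$_.ndf',
                      SIZE = 1024MB, FILEGROWTH = 256MB);"
    Invoke-Sqlcmd -ServerInstance $instance -Query $sql
}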

Adjust the Autogrowth Settings

One of my biggest pet peeves is looking at a new SQL Server and seeing default autogrowth settings. It just drives me crazy. Just yesterday, I found a database about a gigabyte in size whose autogrowth was still set to 1MB. Can you imagine how many times it has had to autogrow? That’s a resource hit every single time. Increase your autogrowth to a reasonable size so the file isn’t growing constantly! Percent autogrowth is even more dangerous, especially for a log file. Logs can grow quickly, especially during something like index maintenance. While 10% of 100MB may not seem like much, 10% of 1TB can be temporarily devastating during high-traffic times. Even worse, what do you do when that 10% growth triggers but your drive isn’t big enough to handle it?

[Screenshot: a database still using the default autogrowth settings]

This angers me so much!
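
A minimal sketch of the change, again assuming Invoke-Sqlcmd is available and using placeholder database, file, and growth values: switch both files to fixed-size growth increments that suit the database’s size.

$instance = 'MyServer\MyInstance'          # placeholder instance
$sql = @"
ALTER DATABASE MyDatabase MODIFY FILE (NAME = MyDatabase_Data, FILEGROWTH = 256MB);
ALTER DATABASE MyDatabase MODIFY FILE (NAME = MyDatabase_Log,  FILEGROWTH = 128MB);
"@
Invoke-Sqlcmd -ServerInstance $instance -Query $sql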

Change Max Server Memory

There is a lot to be said about changing the maximum memory allowed for SQL Server. I’ve previously shared a way to get it for all your servers and some suggestions on what to set max memory to. How much to hold back varies, but the most important thing is to set a limit at all; otherwise your OS may be starved for memory and bring everything to its knees.
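
As a sketch only, assuming the script runs on the SQL Server host itself and that reserving roughly 4GB for the OS suits that box (it often won’t, so adjust):

# Figure out total physical memory on this box and leave some for the OS
$instance    = 'MyServer\MyInstance'       # placeholder instance
$totalMB     = (Get-CimInstance Win32_ComputerSystem).TotalPhysicalMemory / 1MB
$maxMemoryMB = [int]($totalMB - 4096)      # reserve ~4GB for the OS; tune for your server

$sql = @"
EXEC sp_configure 'show advanced options', 1;  RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', $maxMemoryMB;  RECONFIGURE;
"@
Invoke-Sqlcmd -ServerInstance $instance -Query $sql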

Change Cost Threshold for Parallelism & Max Degree

I listed these together because they go hand in hand. There are innumerable debates on what these should be, and it really depends on your server, but almost everyone agrees that the defaults allow far too much parallelism for cheap queries. I’ve seen arguments for a Cost Threshold of 30–60 as a new default. Starting at a number in that range is a good idea; you can adjust it after testing your more problematic queries. However, as more people start using Columnstore indexes, you may need lower thresholds or forced serial plans. Perhaps someday we will need a separate setting for a Columnstore-only cost threshold?

As for the max degree, even Microsoft agrees the default is bad. Changing it appropriately requires knowing your hardware and maybe doing a bit of math. It should be based on your number of CPUs, but you’ve got some wiggle room depending on how busy your system is: many queries firing at once point toward a smaller Max DOP, while a handful of heavy queries can justify a larger one.
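
A rough sketch of both changes with placeholder values (the 50 and 8 below are starting points to test against your own workload, not recommendations):

$instance = 'MyServer\MyInstance'          # placeholder instance
$sql = @"
EXEC sp_configure 'show advanced options', 1;            RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;  RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 8;        RECONFIGURE;
"@
Invoke-Sqlcmd -ServerInstance $instance -Query $sql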

Change the Default SQL Server Port

Everyone knows that the default port for SQL Server is 1433 and that the Browser uses 1434. You can change the default port, and it’s a very easy way to prevent the simplest hacking attempts. Sure, you can still be port scanned, but why skip an easy configuration change if it helps your security even a little? You can take it a step further and blacklist (or whitelist only what you want) any address that attempts to connect through the default port, though that requires a bit more planning.
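
One way to script the change is through the SMO WMI provider, sketched below with placeholder server, instance, and port values; the firewall side isn’t shown, and the SQL Server service must be restarted before the new port takes effect.

# Assumes the SQL Server management assemblies (installed with SSMS) are available
[void][System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SqlWmiManagement')

$wmi   = New-Object Microsoft.SqlServer.Management.Smo.Wmi.ManagedComputer 'MyServer'
$tcp   = $wmi.ServerInstances['MSSQLSERVER'].ServerProtocols['Tcp']
$ipAll = $tcp.IPAddresses['IPAll']
$ipAll.IPAddressProperties['TcpDynamicPorts'].Value = ''     # turn off dynamic ports
$ipAll.IPAddressProperties['TcpPort'].Value = '50123'        # hypothetical non-default port
$tcp.Alter()                                                 # restart the service afterwards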

Make Management Studio Yours

There are a lot more server settings out there, but I want to mention something about Management Studio: configure it so that it works for you. Personally, I’m OK with the default color scheme; light text on a dark background bothers my eyes, but some people love it.

I always dock my toolbars using Auto Hide. I want to get to Registered Servers and the like quickly, but I still need the full screen real estate the rest of the time; there’s just too much stuff otherwise.

[Screenshot: toolbars docked with Auto Hide]

Finally, my favorite change is to move the Status Bar to the top of the screen. It felt weird the first time, but now it’s distracting whenever I open a fresh install of Management Studio and the bar is back on the bottom. It’s much more natural for me to see what’s going on when the bar is at the top. The problem is, I change this so rarely that I normally spend five minutes hunting for the setting. You can find it under Tools -> Options -> Text Editor -> Editor Tab and Status Bar.

I also change the defaults under Tab Text to include only the file name. Everything else is just fluff that I can find on the status bar.

 

PowerShell Serial Function to Parallel Workflow

I’ve been feeling rather uninspired about blogging for the past week, so I needed something short and simple to discuss this time. Luckily, I had an old script that I could adapt and explain.

PowerShell introduced Workflows in version 3, and they are particularly wonderful for the ability to run a ForEach loop in parallel. This can significantly reduce your run times when making numerous remote connections, though as with most things, it depends on your usage and environment.

A standard PowerShell function executes against multiple computers serially, meaning one at a time. This is perfectly fine for a small number of computers, but as the list grows, so does the time required. Adapting a function to a Workflow is not especially difficult, but you may run into some invalid commands. In the examples below, I built a function and a workflow with nearly identical syntax. If you look closely, though, you’ll see that the workflow’s Get-Service uses a -PSComputerName parameter instead of the standard -ComputerName parameter. Also, Out-Null is not available inside Workflows, so I had to pipe to an InlineScript block to get that to work. Further details about Workflow scripting can be found here.

The example functions execute serially while the workflow executes in parallel. Get-ServiceTimed calls the cmdlet once against the whole list and, like the rest of the examples, calculates the total run time; I’ve included it solely for comparison with Get-ServiceSerial, which uses a ForEach loop so that it is as similar to the workflow as possible.

Each function calculates and returns the runtime.

Cmdlet
Executed in 33.51 seconds against 32 computers.
Executed in 0.03 seconds against 2 computers.

Function Get-ServiceTimed
{
    Param($Computers)
    $StartTime = Get-Date
    # One call for the whole list; Get-Service accepts an array of computer names
    Get-Service -ComputerName $Computers | Out-Null
    "Run Time - Cmdlet - $(((Get-Date) - $StartTime).TotalSeconds) seconds."
}

ForEach Serial
Executed in 29.30 seconds against 32 computers, just as slow as using the cmdlet normally.
Executed in 0.03 seconds against 2 computers.

Function Get-ServiceSerial
{
    Param($Computers)
    $StartTime = Get-Date
    # One call per computer, one at a time
    ForEach ($Computer In $Computers)
    { Get-Service -ComputerName $Computer | Out-Null }
    "Run Time - Serial - $(((Get-Date) - $StartTime).TotalSeconds) seconds."
}

ForEach Parallel
Executed in 12.36 seconds against 32 computers. More than twice as fast as the serial execution.
Executed in 5.88 seconds against 2 computers. Significantly slower than any serial test.

WorkFlow Get-ServiceParallel
{
    Param($Computers)
    $StartTime = Get-Date
    # -Parallel runs the loop body for every computer at the same time
    ForEach -Parallel ($Computer In $Computers)
    {
        # Workflow activities take -PSComputerName; output is suppressed via InlineScript
        Get-Service -PSComputerName $Computer | InlineScript { Out-Null }
    }
    "Run Time - Parallel - $(((Get-Date) - $StartTime).TotalSeconds) seconds."
}
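
For reference, a hypothetical call pattern (servers.txt is a made-up file containing one computer name per line; Workflows require PowerShell 3.0 or later):

# Read a list of computer names and compare the serial and parallel versions
$servers = Get-Content -Path '.\servers.txt'
Get-ServiceSerial   -Computers $servers
Get-ServiceParallel -Computers $servers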

Summary

I only supplied a few example execution times because they vary wildly depending on network traffic and configuration, so your results may differ greatly. Serial execution was actually faster than parallel execution on average until I was connecting to more than ten computers, so if you only work with a handful of computers, you may not need to worry about this. I’m often connecting to at least 30 computers at once, in which case parallel execution is always much faster.