SQL Server Storage: Reading Block Size

This is the final post in my SQL Server Storage series. First we discussed Pages & Extents, then how to Read A Page using T-SQL, and this post covers Disk Partition Offsets and Allocation Unit (Block) Size.

Knowing that Pages are 8 KB and Extents are 64 KB, it’s understandable that SQL Server performs best when the disks are aligned in the same manner. However, in earlier versions of Windows, this was not the default. Instead, Windows started the first partition after 63 sectors of 512 bytes each, an offset of 31.5 KB. Since that offset is not a multiple of 64 KB, SQL Server’s reads and writes did not line up with the underlying disk boundaries, and havoc ensued in the form of increased I/O operations. As of Windows Server 2008, partitions are aligned at 1024 KB by default, providing fewer I/O operations since 1024 is a multiple of 64.
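To see why the old 31.5 KB offset hurts, here is a minimal Python sketch that counts how many stripe units of the underlying array a single 64 KB extent touches. The 64 KB stripe unit and the function name are assumptions made for illustration; your storage may use a different stripe size.

```python
KB = 1024

def stripe_units_touched(partition_offset, io_offset, io_size, stripe_unit=64 * KB):
    """Count the stripe units covered by one I/O (all arguments in bytes)."""
    first_byte = partition_offset + io_offset
    last_byte = first_byte + io_size - 1
    return last_byte // stripe_unit - first_byte // stripe_unit + 1

extent = 64 * KB
legacy = stripe_units_touched(63 * 512, 0, extent)    # old 31.5 KB offset
aligned = stripe_units_touched(1024 * KB, 0, extent)  # 1024 KB offset
print(legacy, aligned)  # prints: 2 1
```

Under these assumptions, every extent on the misaligned partition straddles two stripe units, so one logical read turns into two physical ones.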

I tried to keep the above explanation simple, but it still got wordy. The takeaway should be: you want SQL Server disks to have a Starting Offset that is a multiple of 64 KB, with the desirable default being 1024 KB. Higher values are fine for special circumstances, just so long as your Starting Offset is evenly divisible by 64 KB.

As for your block size (the allocation unit size chosen when the drive is formatted), that should also be 64 KB, the size of a single Extent, to improve performance.
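The two rules above can be expressed as a quick check. This is only a sketch in Python: the function name is made up, and it assumes both values are supplied in bytes, which is how the WMI queries later in this post report them.

```python
EXTENT_BYTES = 64 * 1024  # one extent = eight 8 KB pages

def check_volume(starting_offset, block_size):
    """Return a list of findings for one volume (both inputs in bytes)."""
    findings = []
    if starting_offset % EXTENT_BYTES != 0:
        findings.append("starting offset is not a multiple of 64 KB")
    if block_size != EXTENT_BYTES:
        findings.append("block size is not 64 KB")
    return findings

print(check_volume(1048576, 65536))  # prints: []
print(check_volume(32256, 4096))     # reports both problems
```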

There is a single script to find both Block Size and Starting Offset, and it almost even works!

Get-WmiObject win32_DiskPartition |
	SELECT SystemName, Name, BlockSize, StartingOffSet |
	FT -Auto

That script returns the correct Starting Offset, but that Block Size is a screwy number that should not be trusted. To make matters worse, you get a Partition Number, but no Drive Letter. That doesn’t help me understand if my data disk is formatted correctly. In comes another query.

Get-WmiObject -Class win32_Volume |
    FT DriveLetter, Label, BlockSize -Auto

Great, now we have an accurate BlockSize and a Drive Letter! The only problem is, now there is no Partition Number to match up with the OffSet from the first query. Getting ALL the information at one time is a pain. Luckily there are Custom Objects to help us smash results together and Win32_LogicalDisk to help us relate the drives and partitions.

# Fixed (local) disks only; DriveType 3 = Local Disk
$Disks = Get-WmiObject Win32_LogicalDisk | WHERE { $_.DriveType -eq 3 }
# Query the volumes once instead of once per disk
$Volumes = Get-WmiObject Win32_Volume
$Result = ForEach ( $Disk in $Disks )
{
    # Follow the logical disk back to the partition it lives on
    $OffSet = Get-WmiObject -Query "Associators of {Win32_LogicalDisk.DeviceID='$($Disk.DeviceID)'} WHERE ResultRole=Antecedent"
    [PSCustomObject] @{
        ComputerName   = $Disk.SystemName
        DriveLetter    = $Disk.DeviceID
        Partition      = $OffSet.Name
        BlockSize      = ( $Volumes | WHERE { $_.DriveLetter -eq $Disk.DeviceID } | SELECT -Expand BlockSize )
        StartingOffSet = $OffSet.StartingOffSet
    }
}
$Result | FT -Auto

Now you can determine the Block Size and Starting Offset of your drives easily. Changing those values, well, that’s another story. Normally that will require formatting the drive, so it’s best to get it right in the first place. Make sure all your SQL Server drives are formatted correctly before you get started. Just educate your friendly Storage Admin on what SQL Server needs so everything is correct by the time you get it.

SQL Server Storage: Reading Pages with T-SQL

This is a continuation post from last week’s SQL Server Storage: Pages and Extents explanation. Since the description was a bit longer than I originally expected it to be, I decided to split the scripts into more posts.

First, I’m going to quickly cover how you can see information about a specific page using T-SQL. Now, normally this won’t be of much use to you, but it’s fun to play around with a bit just to see how things are actually stored. You might need to read a page during some heavy troubleshooting at some point in the future too. This command is undocumented though, so information is scarce and the feature could disappear without notice in future versions.

To read a page, you’ll need to use DBCC PAGE. The basic layout is listed below.


DBCC PAGE
(
  'DbName' OR DbId -- Database name or Database ID, either one!
  ,FileNumber -- File Number of the Page
  ,PageNumber -- Page Number in the File
  ,PrintOption -- Display option ranging from 0-3 with different info from each
);

Now we need to know what FileNumber and PageNumber to supply to DBCC PAGE though. Random numbers might work, but if you are actually trying to do anything halfway useful, they won’t get you far. To solve this problem, we have to use another command: DBCC IND.


DBCC IND
(
  'DbName' OR DbId -- Database name or Database ID, either one!
  ,TableName -- Table Name...I don't think this really needs a comment
  ,IndexId -- index_id from sys.indexes; -1 = indexes and IAMs, -2 = IAMs
);

Using DBCC IND we can get some relevant data to pass into DBCC PAGE. The problem is, you still need a relevant index_id for the selected table. The following query can help with that. Just supply the desired TableName in the WHERE clause.


--Get the index_id relating to your desired TableName to pass into DBCC IND
SELECT * FROM sys.indexes
WHERE OBJECT_NAME(object_id) = 'TableName'

A sample result of sys.indexes. Grab the index_id and plug that into DBCC IND.

With an index_id and a TableName in mind, we can get some results from DBCC IND.

If you are just testing, index_id = 1 is not a bad idea to check.

Awesome, now we have meaningful ids to use with DBCC PAGE. You’ll need to do one more thing before you run it though. Trace flag 3604 has to be set for SQL to provide output. Without it, you won’t get any results at all.

No Trace Flag means no results


-- Send DBCC output to the client session instead of the error log
DBCC TRACEON(3604);
DBCC PAGE('DbName',1,100,3) WITH TABLERESULTS;
GO

With the Trace Flag on, you’ll get more data than you probably know what to do with.

DBCC PAGE provides LOTS of information; I only captured a tiny snippet for your visual delight.

Now you know how to get page information about your tables, so long as these undocumented procedures are available. Since this post again got longer than I originally expected, I’ll cover reading and setting block sizes on your disks next week. That will involve some fun PowerShell too!

SQL Server Storage: Pages and Extents

It’s time for another SQL Server refresher today! This time we will discuss some storage basics, specifically Pages and Extents and how they relate to each other. There are a lot of resources out there discussing these storage units, but I’ve tried to put my own spin on things and aggregate as much data as I could find about them.

Pages

Naturally we are going to discuss pages first, since they are the most fundamental unit of storage for SQL Server. Pages store everything in the database and are only 8 KB in size. All disk I/O is performed at the page level.

An example data page layout is easier to visualize than explain

Pages have three major components: a page header, records, and the offset array. Pages start with a 96 byte header which contains meta-data, like the page number, owner’s object id, and page type. Pages end with the offset array, which holds a 2 byte pointer to each row stored in the page. These pointers are stored last to first, so the array grows backward from the end of the page as rows are added. The offset array is essentially the index for the page. The middle of the page is the records: the header leaves 8096 bytes to be shared by the stored data and the offset array, and a single row can be at most 8060 bytes.
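The byte budget of a page is easy to work through. The Python sketch below is a simplification: it assumes a 96 byte header and a 2 byte offset array entry per row, and it treats row_bytes as the full on-disk size of a record, so real tables with per-row overhead will fit fewer rows than a naive column-width estimate suggests. The function name is hypothetical.

```python
PAGE_BYTES = 8 * 1024   # 8192 bytes per page
HEADER_BYTES = 96       # page header
SLOT_BYTES = 2          # assumed offset array entry per row

def max_rows_per_page(row_bytes):
    """Rows of a fixed on-disk size that fit on one page."""
    usable = PAGE_BYTES - HEADER_BYTES          # 8096 bytes
    return usable // (row_bytes + SLOT_BYTES)   # each row also costs a slot

print(max_rows_per_page(100))   # prints: 79
print(max_rows_per_page(8060))  # prints: 1
```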

There are different types of pages, such as data, index, image, and a number of informational pages. Probably the most interesting type is the overflow page. If a row is greater than 8060 bytes, the data can be stored on overflow pages which are linked together. Overflow pages can store as much as 2 GB in a single column, but obviously this is less than ideal: each extra page adds reads, so the performance impact grows with the size of the data. The most obvious example of this situation is the VARCHAR(MAX) or VARBINARY(MAX) datatype. Data type limits normally relate directly to the size of a page; for instance, VARCHAR(8000) and NVARCHAR(4000) are the normal limits, and both work out to 8000 bytes so that a value fits on a single page. (MAX) datatypes effectively bypass that limit, and values too large for the page spill onto overflow pages. Queries like SELECT * can grab these poorly performing datatypes accidentally and should be avoided as much as possible.
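A quick way to convince yourself that the two limits are really the same limit is to count bytes. This Python sketch assumes VARCHAR stores one byte per character (a single-byte code page) and NVARCHAR stores two bytes per character, which covers the common case but not every collation.

```python
varchar_value = "a" * 8000    # VARCHAR(8000) at its limit
nvarchar_value = "a" * 4000   # NVARCHAR(4000) at its limit

varchar_bytes = len(varchar_value.encode("latin-1"))      # 1 byte per character
nvarchar_bytes = len(nvarchar_value.encode("utf-16-le"))  # 2 bytes per character

print(varchar_bytes, nvarchar_bytes)  # prints: 8000 8000
```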

Extents

Extents can be Uniform and have all the same type of pages or Mixed with a combination of page types

Simply put, extents are groups of pages. Extents consist of exactly eight contiguous pages, with every page being part of an extent. In other words, pages always come in groups of eight, so data grows in a minimum of 64 KB increments. Unlike the many types of pages, there are only two types of extents.
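Because extents are fixed groups of eight pages, you can work out where any page lives with integer arithmetic. This is a small Python sketch with a made-up function name, assuming pages are numbered from zero within a data file.

```python
PAGE_BYTES = 8 * 1024
PAGES_PER_EXTENT = 8

def locate_page(page_number):
    """Map a page number to its extent and byte position in the data file."""
    extent_index = page_number // PAGES_PER_EXTENT
    return {
        "extent_index": extent_index,
        "first_page_in_extent": extent_index * PAGES_PER_EXTENT,
        "byte_offset_in_file": page_number * PAGE_BYTES,
    }

print(locate_page(100))
# prints: {'extent_index': 12, 'first_page_in_extent': 96, 'byte_offset_in_file': 819200}
```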

Mixed Extents: In these extents, pages are allocated to multiple objects, or different types of pages. New tables or indexes are put into mixed extents for storage efficiency. When a small table is made that would consist of fewer than eight pages, it gets stored in a mixed extent with other similarly small objects. If the table grows large enough to fill an entire extent, it can utilize the next type of extent.

Uniform Extents: These extents have pages that are all allocated to the same object. Larger tables often have extents with identical page types, such as data pages or index pages. More data can be read in a single read operation with uniform extents, so performance can see an improvement.

Originally I had planned to provide some example scripts to discover information about your pages, and storage setup, but in an effort to keep the information in byte-sized chunks, I’ll continue with that next week.

RAID Storage and SQL Server

If you are a SQL Server Administrator, eventually you are going to need to request specific storage for your servers. Depending on the setup at your current company, that may all be handled in the background by your Storage Administrators, but if you have the power or are asked for your opinion, knowing about RAID (Redundant Array of Independent Disks) technology is important. You can find full technical explanations on the web for this, but I’ll cover the basics from a SQL Server perspective.

RAID Overview

RAID uses multiple hard drives to improve availability and/or performance. RAID can overcome I/O bottlenecks that would result from using a single disk, provide resiliency from data loss through mirroring, and remove a single point of failure from a single drive being used.

To understand RAID, there are three terms we need to define first.

Mirroring is an idea you should understand intuitively, but perhaps not the exact details in relation to RAID. Disk Mirroring replicates data from one disk to another, providing a second copy for disaster recovery, and thus requiring at least two disks. RAID mirroring is performed synchronously and mirrored data can be read from either disk.

Striping means that the data is being separated onto multiple drives in a consecutive fashion. By separating the data onto different drives, the I/O load can be balanced across the disks and read times are faster. The more disks that the data is striped across, the faster the data throughput will be; however, since the data is spread evenly across all the disks involved in the striping, if one device fails, all the data will be corrupted and unable to be read.

Parity is probably the hardest term to understand or explain. In the most basic, mathematical sense, parity refers to whether an integer is even or odd. For computing, the definition is specifically whether the total number of 1s occurring in a given binary number is even or odd. For RAID, parity data bits are a combination of existing data bits to provide redundancy. In the case of any single drive failure, the remaining data can be combined with the parity data to reconstruct the missing data.
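Parity is easier to see than to describe. The Python sketch below uses XOR, the usual way to combine bits so that any one missing piece can be recomputed from the rest. The two "disks" are just four-byte strings invented for the example, and a real controller works on much larger blocks.

```python
from functools import reduce

def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

disk_a = b"SQL8"
disk_b = b"PAGE"
parity = xor_blocks(disk_a, disk_b)   # what the parity disk would hold

# Pretend disk_b failed: rebuild it from the survivor plus the parity data
rebuilt_b = xor_blocks(disk_a, parity)
print(rebuilt_b)  # prints: b'PAGE'
```

The same trick works with more data disks, since XOR-ing every surviving block with the parity block always yields the missing one.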

RAID Levels

Although there are more levels of RAID (for instance 2, 3, 4, etc.), they are rarely used, especially for SQL Server. I’ll just be explaining the four main types here.

RAID 0 (Striping) This basic form of RAID stripes data across multiple disks. Reads and writes occur simultaneously across all disks involved and thus are faster than with a single disk. The more disks involved, the faster the performance. This creates multiple points of failure though, and is not really recommended for database use due to the increased vulnerability.

Pros: Improved Performance of Read and Writes
Cons: No Redundancy and any drive failure corrupts all data.

RAID 1 (Mirroring) This RAID level mirrors, or duplicates, data between a minimum of two disks. Mirroring requires double the storage since the mirror is an exact copy of the original data, so only 50% of the raw capacity is usable. Read speeds are faster since any disk can respond to a read request. Write speeds are reduced due to copying the data to multiple locations. Read times can be as fast as the fastest drive, while write times are often as slow as the slowest drive. If you need a relatively cheap method to protect your data, this is a good option. If one drive fails, you still have a perfect copy of the data on the other.

Pros: Redundancy, Faster Reads
Cons: Double the Storage, Slower Writes

RAID 5 (Striping & Parity) This is likely the most common type of RAID used, but it requires at least three disks. Data and parity calculations are striped across all the disks. Since data is not mirrored, less storage is ‘wasted’: only one drive’s worth of space (1 / total # of drives, so at most 1/3rd) is used for redundancy. In the event of a drive failure, the data can be reconstructed using the parity data, but at a cost. There is a significant performance impact when one disk fails due to the parity data reconstruction overhead. Losing a 2nd drive before the first has been replaced and rebuilt will result in the entire array going offline and data being lost. Always replace a failed drive as soon as possible! Since write speeds are slower with RAID 5, it is not the best choice for Transaction Logs or Data Files. Backup drives are a prime candidate for this RAID level though since write speed is not as important.

RAID 6 is growing in popularity; it works like RAID 5, except it stores a second set of parity data, which costs one more disk’s worth of space. This RAID level requires a minimum of four disks and can handle up to two drive failures.

Pros: Fault Tolerance, Fast Reads (until a drive fails)
Cons: Slower Write Speeds, More Expensive than RAID 1 or 0

RAID 0+1 (Mirroring & Striping) Data is striped and subsequently mirrored in this RAID level. This incurs the same 50% storage loss as RAID 1. There is fault tolerance for a single drive failure, but that failure reduces the array to essentially RAID 0 with no redundancy.

RAID 10 (Striping & Mirroring) Data is mirrored first and then striped with this method. As long as one drive from each mirrored pair remains, no outage will occur. With more fault tolerance, this is the preferred high-end method over RAID 0+1.

Pros: Fault Tolerance, Best Read/Write Speeds
Cons: Lots of Disks and Lots of Money

Both RAID 0+1 and 10 are hybrid RAID levels and provide the best read and write performance, but double the required storage. This is what everyone wants but not everyone can afford. It combines the best of the previous configurations but requires many disks and therefore a big budget. These RAID levels are best suited for high traffic, like your Data and Log files. If you cannot afford RAID 10, RAID 5 is a decent alternative though.
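When you are pricing out a request for your Storage Admin, it helps to compare how much usable space each level leaves you. The Python sketch below is a rough calculator, not a sizing tool: the function name is invented, it assumes identical disks, and it ignores hot spares and controller overhead.

```python
# Minimum number of disks for each level covered in this post
MIN_DISKS = {"0": 2, "1": 2, "5": 3, "6": 4, "0+1": 4, "10": 4}

def usable_capacity(level, disks, disk_size_tb):
    """Usable capacity in TB for an array of identical disks."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == "0":
        data_disks = disks          # no redundancy at all
    elif level == "1":
        data_disks = 1              # every disk holds the same copy
    elif level == "5":
        data_disks = disks - 1      # one disk's worth of parity
    elif level == "6":
        data_disks = disks - 2      # two disks' worth of parity
    else:                           # 0+1 and 10 mirror half the disks
        if disks % 2:
            raise ValueError("mirrored stripes need an even number of disks")
        data_disks = disks // 2
    return data_disks * disk_size_tb

for level in ("0", "5", "6", "10"):
    print(level, usable_capacity(level, 4, 1.0))
```

With four 1 TB disks this gives 4 TB for RAID 0, 3 TB for RAID 5, and 2 TB for both RAID 6 and RAID 10.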