SQL Server: To Compress or Not to Compress, That Is the Question

Are you ready to have some fun with SQL Server compression?  Come on, let's party!  As with most things in life, you seek out a solution because you have a need, and SQL Server is no different: compression is a solution, and it should be matched to a genuine need.  Welcome to SQL Server data compression. I hope you enjoy the ride!

Compression

So what does table compression do?  It decreases your storage requirements on disk by compressing your data.  “So it saves disk space?” you may be asking.  It sure does!  But it also means that more rows fit on each 8 KB page, both on disk and in memory, which helps you squeeze more out of that buffer pool.  One other cool thing about compression is that it is pretty darn flexible: you can pick and choose exactly which objects (and even which partitions) to compress.
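
Because compression is configured per object and per partition, you can see exactly what is and isn't compressed right now. Here is a minimal sketch, using the sys.partitions catalog view and its data_compression_desc column, that lists the current compression setting for every user table and index in the current database:

select
	SCH.name as schema_name,
	OBJ.name as object_name,
	IDX.name as index_name,
	PAR.partition_number,
	PAR.data_compression_desc   -- NONE, ROW or PAGE (plus columnstore values on newer versions)
from
	sys.partitions as PAR
		inner join sys.objects as OBJ on PAR.object_id = OBJ.object_id
		inner join sys.schemas as SCH on OBJ.schema_id = SCH.schema_id
		left join sys.indexes as IDX on PAR.object_id = IDX.object_id and PAR.index_id = IDX.index_id
where
	OBJ.is_ms_shipped = 0
order by
	schema_name, object_name, index_name, PAR.partition_number;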

So here are the steps you’ll take. First, find out whether you would benefit from compression at all. If you would, decide which kind of compression is going to work best (aka, the most bang for the buck). Lastly, implement compression and determine what kind of downtime you’ll need or performance impacts you’ll see.
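
Before we get to the symptoms, it's worth knowing that SQL Server will help you estimate step one. Here is a minimal sketch using the built-in sp_estimate_data_compression_savings procedure; dbo.YourBigTable is a placeholder, so substitute one of your own tables:

-- estimate the space savings of page compression for a single object
exec sys.sp_estimate_data_compression_savings
	@schema_name = N'dbo',
	@object_name = N'YourBigTable',   -- placeholder; use your own table
	@index_id = null,                 -- null = every index on the object
	@partition_number = null,        -- null = every partition
	@data_compression = N'PAGE';      -- run again with N'ROW' to compare

Comparing the current and estimated sizes it returns for 'PAGE' vs. 'ROW' feeds directly into the "most bang for the buck" decision.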

I often consider using table/index compression as a solution when I hear about these needs:

  • Disk space pressure
  • Memory performance issues
  • Disk I/O performance issues

There are several different ways you can identify these symptoms.  Everyone has their own preferences for how they make their determinations, but I am going to talk about how I like to do my investigation.

Is Compression the Cure?

Disk Space Pressure

This one’s pretty straightforward.  Do you have much, or any, space available on the presented drive that your database files are stored on?  More importantly, if this is a shared storage environment, is it possible to easily add additional capacity to the LUN(s)?  The first question can be answered simply by looking at disk management (diskmgmt.msc).  The second one may involve taking your wonderful SAN administrator out to lunch to ask (and it certainly wouldn’t hurt to pay for said lunch and bat your eyes at him/her as you beg, plead and/or threaten to key their car for additional capacity).
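
If you would rather not pull up disk management on every server, the same answer is available from a query window. Here is a minimal sketch using sys.dm_os_volume_stats (available in SQL Server 2008 R2 SP1 and later) that reports free space on every volume hosting a database file:

-- free space on each volume that hosts a database file
select distinct
	VS.volume_mount_point,
	cast( VS.total_bytes / 1048576. as decimal( 18, 2 ) ) as Total_MB,
	cast( VS.available_bytes / 1048576. as decimal( 18, 2 ) ) as Available_MB,
	cast( ( VS.available_bytes * 100. ) / VS.total_bytes as decimal( 5, 2 ) ) as Percent_Free
from
	sys.master_files as MF
		cross apply sys.dm_os_volume_stats( MF.database_id, MF.file_id ) as VS;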

It is generally considered a best practice to right-size your data files to account for growth, and to monitor the free space within them so that you can grow them in a manual, controlled manner; managed that way, your file sizes should not fluctuate very much.  You will more than likely want to look at the free space within your data files to find out how close you are to an autogrowth event or to outgrowing your available disk space, depending on how much room you left yourself on disk for growth.  I put together the following query to let you look at that information (it executes at the instance level and iterates over every user database in the instance):

set nocount on;

if object_id( 'tempdb..#results' ) is not null
	drop table #results;

create table #results
(
	name nvarchar( 256 ) null,
	Unallocated_Size_in_DataFile_MB decimal( 18, 4 ) null,
	Allocated_Size_in_DataFile_MB decimal( 18, 4 ) null,
	Total_DataFile_Size_on_Disk_MB decimal( 18, 4 ) null,
	Percent_DataFile_Allocated decimal( 18, 4 ) null,
	Percent_Mixed_Extents decimal( 18, 4 ) null
);

declare @currDbId as int;
declare @currDbName as nvarchar( 256 );
declare @sqlStatement as nvarchar( max );

-- iterate the user databases (database_id > 4 skips the system databases);
-- offline databases are excluded so the dynamic SQL does not fail
declare dbCursor cursor local fast_forward for
	select 
		database_id,
		name
	from 
		sys.databases
	where 
		database_id > 4
		and state_desc = N'ONLINE';

open dbCursor;

fetch next from dbCursor into @currDbId, @currDbName;
while( @@FETCH_STATUS = 0 )
	begin
		-- quotename() protects against unusual database names;
		-- pages are 8 KB, so page_count * 8 / 1024 converts to MB
		set @sqlStatement = N'
			select 
				db_name( FSU.database_id ) as name,
				cast( ( cast( FSU.unallocated_extent_page_count as decimal( 18, 4 ) ) * 8 ) / 1024 as decimal( 18, 4 ) ) as Unallocated_Size_in_DataFile_MB,
				cast( ( cast( FSU.allocated_extent_page_count as decimal( 18, 4 ) ) * 8 ) / 1024 as decimal( 18, 4 ) ) as Allocated_Size_in_DataFile_MB,
				cast( ( cast( FSU.total_page_count as decimal( 18, 4 ) ) * 8 ) / 1024 as decimal( 18, 4 ) ) as Total_DataFile_Size_on_Disk_MB,
				cast( ( cast( FSU.allocated_extent_page_count as decimal( 18, 4 ) ) / cast( FSU.total_page_count as decimal( 18, 4 ) ) ) * 100. as decimal( 18, 4 ) ) as Percent_DataFile_Allocated,
				cast( ( cast( FSU.mixed_extent_page_count as decimal( 18, 4 ) ) / cast( FSU.total_page_count as decimal( 18, 4 ) ) ) * 100. as decimal( 18, 4 ) ) as Percent_Mixed_Extents
			from 
				' + quotename( @currDbName ) + '.sys.database_files as FI
					inner join ' + quotename( @currDbName ) + '.sys.dm_db_file_space_usage as FSU on FSU.file_id = FI.file_id
			where
				FI.type_desc = N''ROWS'';
			';

		insert into #results
			exec sp_executesql @stmt = @sqlStatement;

		fetch next from dbCursor into @currDbId, @currDbName;
	end;

close dbCursor;
deallocate dbCursor;

select * from #results;

Memory Performance

Memory performance is best looked at from a holistic view and not from the silo of a singular memory counter. If you are fortunate enough to have a SQL monitoring solution, you will likely already have historical information that you can reference. In the event that you do not have a monitoring solution already in place, the quickest way to get this information is to use the Performance Monitor (perfmon.msc) and the PAL (Performance Analysis of Logs) tool.

I’m going to describe the minimum set of counters that I like to use. But before I get to that, let me say a few things about the PAL tool and thresholds. Most of the rule-of-thumb thresholds I use come from the PAL Tool’s SQL Server 2012 threshold file. If you haven’t heard of this tool, it lives on the CodePlex website and is a wonderful tool for diagnosing system issues (not just SQL Server ones). You can, in fact, use the PAL Tool to export a canned performance template that gathers most of this information for you and makes life easy (https://pal.codeplex.com/). I will caution you to let your Collector Set run somewhere (anywhere!) that isn’t production first, so you can see what the size and performance impacts are going to be (especially if you are running the full set of counters that the PAL Tool exports for you in the template).

The memory counters that I like to look at are:

  • SQLServer: Buffer Manager – Page life expectancy
    • Unhealthy < 300s (5 minutes)
    • This is the expected time, in seconds, that a data page will stay in the buffer pool.  Higher is better for this counter.
    • Note: if your SQL Server has multiple NUMA nodes, it is a good idea to also include SQLServer: Buffer Node – Page life expectancy.
  • SQLServer: SQL Statistics – Batch Requests/sec
    • There is no unhealthy threshold for this one; it is merely a metric to give you an idea of how busy your SQL Server is at the time
    • This statistic is included for contextual purposes to help identify if there are a lot of simultaneous requests vs. few, heavy queries.  Why do you care?  Because it’s infinitely less effort and less impactful to the system to tune queries and indexes than it is to compress your data.
  • Memory – Pages/sec
    • Unhealthy > 500
    • A high value doesn’t necessarily indicate an issue, but it is contextually valid for getting a view of the big picture.
  • SQLServer: Buffer Manager – Lazy writes/sec
    • Unhealthy > 20
    • This is how many times per second the lazy writer is flushing dirty pages from the buffer pool to disk.  This counter should increase as page life expectancy decreases.
  • Paging File – % Usage
    • Unhealthy > 60-70%
    • This is more indicative of external memory pressures (assuming maximum server memory has been configured for the instance)
  • Memory – Available Mbytes
    • Unhealthy < 200 MB
    • Your SQL Server instances should have their maximum server memory settings configured so that the operating system and any other applications on the server have adequate memory available to do their jobs.  If you see this value decreasing, the OS or other applications on the server may require additional memory.

Some of the counters above are not explicitly relevant for identifying SQL Server memory pressure (and are not in the PAL template), but they are useful for determining whether memory issues are internal or external to SQL Server.  Why do I like to know internal vs. external?  Internal memory pressure is more directly addressed by compression; you can also address external memory pressure with compression by decreasing SQL Server’s memory footprint (but hey, we’re DBAs, and we don’t usually want to decrease our memory footprint!).  The most significant indicators of internal memory pressure are high Lazy writes/sec and low or decreasing Page life expectancy.  External memory pressure will be evident if you notice low Memory – Available Mbytes and high Paging File – % Usage.
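
If you just want a quick point-in-time spot check before committing to a Collector Set, the SQL Server counters above are also exposed through the sys.dm_os_performance_counters DMV. A minimal sketch for the two internal-pressure indicators:

-- point-in-time look at the two internal memory pressure indicators;
-- note: Lazy writes/sec is cumulative in this DMV, so sample it twice
-- and divide the difference by the elapsed seconds to get a true rate
select
	object_name,
	counter_name,
	cntr_value
from
	sys.dm_os_performance_counters
where
	counter_name in ( N'Page life expectancy', N'Lazy writes/sec' )
	and object_name like N'%Buffer Manager%';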

Once you have an adequate sample size, you can evaluate it either by eyeballing it or by pointing the PAL Tool at the counter log path to do the analysis for you.  You're looking for any signs of memory pressure (even external memory pressure, although there are other things you can do to mitigate that as well).  One last comment: this is just one piece of the puzzle.  Use it in conjunction with the other clues.

Disk I/O Performance

Now, you would think that disk I/O performance would be the only thing to look at when deciding whether to compress your data (after all, making the access methods pull fewer pages is the most important thing, right?! Nope…). To figure out if your storage subsystem needs a break, it’s time to dust off PerfMon again and pull another two counters for investigation. Here are the counters:

  • Physical Disk: Avg. disk sec/Read
    • Less than stellar > 0.010 (10 ms)
    • Unhealthy > 0.020 (20 ms)
    • This counter tells you how long (in seconds) it takes to complete a read operation from disk.  Remember, this increases as the demand on the storage subsystem increases.  Make sure you obtain samples over a long enough period to catch, ideally, a few of the times when you have depleted the reserves of the SAN’s disk caches (if you are in a SAN environment).  We want figures that observe every piece of your storage access profile.
  • Physical Disk: Avg. disk sec/Write
    • Less than stellar > 0.010 (10 ms)
    • Unhealthy > 0.020 (20 ms)
    • This counter tells you how long (in seconds) it takes to complete a write operation to disk.  I could copy and paste the same information as what was written for Avg. disk sec/Read, but I think we get the point.

If those disk metrics are unhealthy, your system could certainly benefit from compression (less so if you have poorly tuned queries and indexes doing a lot of scanning, but there is still some benefit).
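
PerfMon is the authoritative view here, but if you want a quick cross-check from inside SQL Server, here is a minimal sketch using sys.dm_io_virtual_file_stats. Keep in mind that these figures are averages since the instance last started, so they smooth over the peaks that PerfMon would catch:

-- average read/write latency (ms) per database file since the last restart
select
	db_name( VFS.database_id ) as database_name,
	MF.physical_name,
	case when VFS.num_of_reads = 0 then 0
		else VFS.io_stall_read_ms / VFS.num_of_reads end as Avg_Read_Latency_ms,
	case when VFS.num_of_writes = 0 then 0
		else VFS.io_stall_write_ms / VFS.num_of_writes end as Avg_Write_Latency_ms
from
	sys.dm_io_virtual_file_stats( null, null ) as VFS
		inner join sys.master_files as MF on VFS.database_id = MF.database_id
			and VFS.file_id = MF.file_id
order by
	Avg_Read_Latency_ms desc;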

Free Data Compression Guide!

So now you've determined whether or not you can benefit from compression. If compression is the answer, you next need to decide which kind of compression is going to work best, and ultimately what kind of downtime you'll need or performance impacts you'll see.
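
When you do reach the implementation step, the mechanics are a single rebuild per object. Here is a minimal sketch; dbo.YourBigTable and IX_YourBigTable_SomeColumn are placeholders, and ONLINE = ON requires Enterprise Edition (the rebuild still costs CPU and transaction log while it runs):

-- apply page compression to a table (its heap or clustered index)
alter table dbo.YourBigTable
	rebuild with ( data_compression = page, online = on );

-- or apply row compression to a single nonclustered index
alter index IX_YourBigTable_SomeColumn on dbo.YourBigTable
	rebuild with ( data_compression = row, online = on );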

Quick Guide to Data Compression

The guide delves into some hoops you'll have to jump through, the different types of data compression, when NOT to compress, balanced scorecards, page compression, row compression, and more!


Next Steps

Review our case studies and engagements where we helped companies just like yours address a variety of business needs.


About Oakwood

Since 1981, Oakwood has been helping companies of all sizes, across all industries, solve their business problems.  We bring world-class consultants to architect, design, and deploy technology solutions to move your company forward.  Our proven approach guarantees better business outcomes.  With flexible engagement options, your project is delivered on time and on budget.  11,000 satisfied clients can’t be wrong.
