Table partitioning and some numbers.

by Allan Svelmøe Hansen March 17, 2008 09:00

Partitioning is a nifty feature of SQL Server , and in SQL Server 2005 it has been made easy.
I’ll not go into the inner workings of partitioning, for that many other resources exists – for example, Microsoft’s own article is a good place to start. What I’m interested in – mostly – is how it can affect me and how I can utilize it. So I decided to run some small testing to see what I could pull from it.

I started by creating a database spanning multiple filegroups on different physical drives.
CREATE DATABASE ASH
ON PRIMARY (name 'ash'filename 'F:\ASHTest\ash.mdf'size=4mbMAXSIZE=10MBFILEGROWTH=1MB),
filegroup ash_fg1 (name 'ash_fg1'filename='F:\ASHTest\fg1.ndf'size=4mbMAXSIZE=10MBFILEGROWTH=1MB),
filegroup ash_fg2 (name 'ash_fg2'filename='W:\ASHTest\fg2.ndf'size=4mbMAXSIZE=10MB,  FILEGROWTH=1MB)
LOG ON (name 'ash_log'filename='F:\ASHTest\ash_log.ldf'size=1mbMAXSIZE=10MBFILEGROWTH=1MB);
GO
 

Then I created my partition function and partition schema:
CREATE PARTITION FUNCTION ash_partfunc(INT)
AS RANGE LEFT FOR VALUES (12);
GO

CREATE PARTITION SCHEME ash_partscheme
AS PARTITION ash_partfunc
TO (ash_fg1ash_fg2[PRIMARY]);
GO
 

After which I created two identical tables, with the only difference that TestOne is partitioned using the schema and TestTwo is not.
CREATE TABLE TestOne
(
 
MyID INT NOT NULL,
 
MyValue NVARCHAR(255) NOT NULL
)
ON ash_partscheme (MyID)
GO

CREATE TABLE TestTwo
(
 
MyID INT NOT NULL,
 
MyValue NVARCHAR(255) NOT NULL
)
GO
 

Then I filled the tables with the same data
DECLARE @i INT
DECLARE 
@j INT
SET 
@j 0
SET @i 0

WHILE @i 1000000 BEGIN
 INSERT INTO 
TestOne VALUES
 
(@j'abc' LTRIM(STR(@i)))
 
INSERT INTO TestTwo VALUES
 
(@j'abc' LTRIM(STR(@i)))
 
 
SET @j @j 1
 
IF @j 3
  
SET @j 0

 
SET @i @i+1
END
 

Then afterwards it was just running a single query to see the difference.
SELECT *
FROM TestTwo
WHERE MyID 1;
GO

SELECT *
FROM TestOne 
WHERE MyID 1;
GO 
 

This gave me the following estimated execution plan.
(click to enlarge)
Estimated Execution Plan For Partitioning 1

I was very surprised to see so big difference on such simple testing. Note there are no indexes or any other ways of optimizing my queries other then the partitioning.
Remember TestTwo is my unpartioned table and the estimated total cost of the query was about 0.882487 with 0.66 of that being I/O.
For my partitioned table the cost was 0.246486 with 0.221722 of that being labeled as I/O. This is a large difference for such a simple example and exemplifies clearly the benefit of partitioning.
 
So then I thought – what happens when I need not one, but two MyID at the same time, so I ran this query:
SELECT *
FROM TestTwo
WHERE MyID OR MyID 2;
GO

SELECT *
FROM TestOne 
WHERE MyID OR MyID 2;
GO 
 

And it now showed an advantage towards the unpartitioned table, which is logical enough, because now the sql engine has to combine results from 2 filegroups. My surprise however comes from the fact that there seems to be a big difference in I/O still and it is only more CPU resource which is requiered.
(click to enlarge)
Estimated Execution Plan For Partitioning 2

The query run against TestTwo displays the same numbers as before, which is logical because it does a complete table scan. That can only be done in one way.
However the I/O cost of the second query is interesting.
As can be seen from the execution plan the query against TestOne table no consists of a Table Scan which has an I/O cost of 0.221644, as well as a Constant Scan and Nested Loop – both of which does not use I/O.

This means that even though the total cost (1.19178) of the TestOne – the partitioned table – is higher than the one of the TestTwo (0.882487) – the unpartitioned table – it still uses significantly less I/O. This result repeats itself if I run selects without any where clause
SELECT *
FROM TestTwo
GO

SELECT *
FROM TestOne 
GO
 

(click to enlarge)
Estimated Execution Plan For Partitioning 3

The total cost for querying the partitioned table is higher (1.73213 versus 0.882487), the total I/O cost is still significantly lower.
SET STATISTICS IO ON reveals however that TestOne query needs 3 logical scans, which makes sense, seeing as there are 3 partitions and thus 3 “tables”.

Anyways, this clearly shows – at least in my view – that if you can indeed group your data in a manner such as this, it is possible to get a nice performance boost by doing it. However, it is necessary – or at least advisable, if the grouping is so unique that you most likely only would use data from one group the majority of the time as there otherwise looks to be a performance hit. But the advantege is not counting managebility and scalability by having data on multiple files and drives which also can be a factor.

(I have not tested the overhead by inserting data or maintaining the data, that’ll be a blog for another time.)

 

Bookmark and Share DotnetKicks dotnetshoutout

Add comment




  Country flag

biuquote
  • Comment
  • Preview
Loading



Powered by BlogEngine.NET 1.6.1.0
Theme by Mads Kristensen | Modified by Mooglegiant

About:
Allan Svelmøe Hansen

My real name is Allan Svelmøe Hansen.
I live in Denmark, where I work as a developer for hedal:kruse:brohus using SQL Server and the .NET framework since 2004.
My primary fields of expertise is back-end data integration, database design and optimization.


       View Allan Svelmøe Hansen's profile on LinkedIn     

Disclaimer

The opinions expressed herein are my own personal opinions and thoughts and does not represent my employers view in any way, nor are my results guaranteed for all situations.
Content is presented "as is", with no warranty.