Partition removed by adding index

by Allan Svelmøe Hansen April 29, 2008 13:17

During some testing of partitioned tables to give myself more insight in the possibilities and effects (more numbers), I ran into something weird.
I created tables as I did in my earlier testing, however now when I added indexes to the tables, I found all performance gain was gone, and when I removed indexes, I noticed my tables were no longer partitioned.
Testing some more, backtracking my steps, I found out that it was the creation of indexes which caused my partitioned tables to become non-partitioned.

I have created the tables via SQL scripts but I created the indexes via SQL Server Management Studio (SSMS as it is usually abbreviated). This creation of indexes did not take into considerations that I had partitioned tables, because I found that adding an index changed the table to non-partitioned.

Then I started to read the documentation for create index SQL command (try first, then read the documentation …… isn’t that how it usually goes? Innocent ) and there I read that I could specify an partition scheme.

CREATE CLUSTERED INDEX <INDEX NAME>
ON TABLE  (<COLUMNS>)
ON <PARTITION SCHEME>

So while I did know that you could partition your indexes, I was not aware of the fact that apparently SSMS alters your table and removes the partitioning somehow.
Annoying, but at least now I know.

Bookmark and Share DotnetKicks dotnetshoutout

More table partitioning and numbers

by Allan Svelmøe Hansen April 02, 2008 12:50

Following up on my previous testing of table partitioning, I wanted to see if I could identify any notable overhead when using partitioned tables other then in selects, meaning delete, update and of course inserts. At the same time I thought I could test out with a partitioned table but where the files was located on the same physical drive.
So I created a new database, 2 partition functions and schemas and 3 new tables. TestOne was partitioned as my last time, and so was TestTwo. TestThree table was partitioned exactly like TestOne except it utilized file groups located on the same physical drive, whereas TestOne had its filegroups split across two physical drives.
I found out that selects provided no noticeable performance difference between TestOne and TestThree tables. I would think the main area where one would see performance differences here is dependent on hardware and load on the drives, and less so with the SQL Server performance.  So further testing into this will be outside the scope of at least this blog entry.

So – onwards to testing other SQL then selects. The first I tested was ordinary inserts for each of my 3 tables.

INSERT INTO <TABLE>
VALUES (1'test')
GO

When running this it provided me with this execution plan:

(click to enlarge)
Estimated Execution Plan For Insert

Not surprisingly there is an overhead visible when it comes to inserting into a partitioned table compared to a non-partitioned one.  Plus the plans do not look the same. I would simply attribute the overhead to the fact that SQL Server needs to look up which file group to locate the data in based on the partition function.
The difference in this instance does however look relative small. The entire subtree cost for inserting into a partitioned table (TestOne and TestThree) were 0.010471 whereas for TestTwo it was 0.01.
So while the specific numbers might differ from system to system etc, it does show an overhead, albeit a small one.
When it came to both delete and updates, I found that the same issues as with select statements were in effect.
When I only needed to delete or update on a specific "grouping" of data, the partitioned tables were actually faster than the non-partitioned one. When I had to update across "groups" then the non-partitioned one was fastest.
For example:

UPDATE TestOne
SET MyValue='Test2'
WHERE MyID 1
GO
 

was faster then

UPDATE TestTwo
SET MyValue='Test2'
WHERE MyID 1
GO
 

by a factor around 20%, whereas

UPDATE TestOne
SET MyValue='Test2'
WHERE MyID OR MyID  2
 

was about 10% slower then

UPDATE TestTwo
SET MyValue='Test2'
WHERE MyID OR MyID  2
 

This leads back to my conclusion in the last piece that if you in fact can group your data in a manner which makes sense and avoids “cross-grouping data”, then it is faster performance wise to do so. The only time it looks to be noticeable slower to have partitioned data is with inserts.
This of course is not an ever valid conclusion, and I’m sure more concrete and scientifically gathered data could present cases against. But if your data abides to some rules which makes a grouping possible and the main focus on the data is reading versus inserting; then there looks to be mainly advantages when it comes to performance with table partitioning. And especially this can run transparent for normal users and database developers it is definitely something worth a thought.

 

 

Bookmark and Share DotnetKicks dotnetshoutout

Table partitioning and some numbers.

by Allan Svelmøe Hansen March 17, 2008 09:00

Partitioning is a nifty feature of SQL Server , and in SQL Server 2005 it has been made easy.
I’ll not go into the inner workings of partitioning, for that many other resources exists – for example, Microsoft’s own article is a good place to start. What I’m interested in – mostly – is how it can affect me and how I can utilize it. So I decided to run some small testing to see what I could pull from it.

I started by creating a database spanning multiple filegroups on different physical drives.
CREATE DATABASE ASH
ON PRIMARY (name 'ash'filename 'F:\ASHTest\ash.mdf'size=4mbMAXSIZE=10MBFILEGROWTH=1MB),
filegroup ash_fg1 (name 'ash_fg1'filename='F:\ASHTest\fg1.ndf'size=4mbMAXSIZE=10MBFILEGROWTH=1MB),
filegroup ash_fg2 (name 'ash_fg2'filename='W:\ASHTest\fg2.ndf'size=4mbMAXSIZE=10MB,  FILEGROWTH=1MB)
LOG ON (name 'ash_log'filename='F:\ASHTest\ash_log.ldf'size=1mbMAXSIZE=10MBFILEGROWTH=1MB);
GO
 

Then I created my partition function and partition schema:
CREATE PARTITION FUNCTION ash_partfunc(INT)
AS RANGE LEFT FOR VALUES (12);
GO

CREATE PARTITION SCHEME ash_partscheme
AS PARTITION ash_partfunc
TO (ash_fg1ash_fg2[PRIMARY]);
GO
 

After which I created two identical tables, with the only difference that TestOne is partitioned using the schema and TestTwo is not.
CREATE TABLE TestOne
(
 
MyID INT NOT NULL,
 
MyValue NVARCHAR(255) NOT NULL
)
ON ash_partscheme (MyID)
GO

CREATE TABLE TestTwo
(
 
MyID INT NOT NULL,
 
MyValue NVARCHAR(255) NOT NULL
)
GO
 

Then I filled the tables with the same data
DECLARE @i INT
DECLARE 
@j INT
SET 
@j 0
SET @i 0

WHILE @i 1000000 BEGIN
 INSERT INTO 
TestOne VALUES
 
(@j'abc' LTRIM(STR(@i)))
 
INSERT INTO TestTwo VALUES
 
(@j'abc' LTRIM(STR(@i)))
 
 
SET @j @j 1
 
IF @j 3
  
SET @j 0

 
SET @i @i+1
END
 

Then afterwards it was just running a single query to see the difference.
SELECT *
FROM TestTwo
WHERE MyID 1;
GO

SELECT *
FROM TestOne 
WHERE MyID 1;
GO 
 

This gave me the following estimated execution plan.
(click to enlarge)
Estimated Execution Plan For Partitioning 1

I was very surprised to see so big difference on such simple testing. Note there are no indexes or any other ways of optimizing my queries other then the partitioning.
Remember TestTwo is my unpartioned table and the estimated total cost of the query was about 0.882487 with 0.66 of that being I/O.
For my partitioned table the cost was 0.246486 with 0.221722 of that being labeled as I/O. This is a large difference for such a simple example and exemplifies clearly the benefit of partitioning.
 
So then I thought – what happens when I need not one, but two MyID at the same time, so I ran this query:
SELECT *
FROM TestTwo
WHERE MyID OR MyID 2;
GO

SELECT *
FROM TestOne 
WHERE MyID OR MyID 2;
GO 
 

And it now showed an advantage towards the unpartitioned table, which is logical enough, because now the sql engine has to combine results from 2 filegroups. My surprise however comes from the fact that there seems to be a big difference in I/O still and it is only more CPU resource which is requiered.
(click to enlarge)
Estimated Execution Plan For Partitioning 2

The query run against TestTwo displays the same numbers as before, which is logical because it does a complete table scan. That can only be done in one way.
However the I/O cost of the second query is interesting.
As can be seen from the execution plan the query against TestOne table no consists of a Table Scan which has an I/O cost of 0.221644, as well as a Constant Scan and Nested Loop – both of which does not use I/O.

This means that even though the total cost (1.19178) of the TestOne – the partitioned table – is higher than the one of the TestTwo (0.882487) – the unpartitioned table – it still uses significantly less I/O. This result repeats itself if I run selects without any where clause
SELECT *
FROM TestTwo
GO

SELECT *
FROM TestOne 
GO
 

(click to enlarge)
Estimated Execution Plan For Partitioning 3

The total cost for querying the partitioned table is higher (1.73213 versus 0.882487), the total I/O cost is still significantly lower.
SET STATISTICS IO ON reveals however that TestOne query needs 3 logical scans, which makes sense, seeing as there are 3 partitions and thus 3 “tables”.

Anyways, this clearly shows – at least in my view – that if you can indeed group your data in a manner such as this, it is possible to get a nice performance boost by doing it. However, it is necessary – or at least advisable, if the grouping is so unique that you most likely only would use data from one group the majority of the time as there otherwise looks to be a performance hit. But the advantege is not counting managebility and scalability by having data on multiple files and drives which also can be a factor.

(I have not tested the overhead by inserting data or maintaining the data, that’ll be a blog for another time.)

 

Bookmark and Share DotnetKicks dotnetshoutout

Powered by BlogEngine.NET 1.6.1.0
Theme by Mads Kristensen | Modified by Mooglegiant

About:
Allan Svelmøe Hansen

My real name is Allan Svelmøe Hansen.
I live in Denmark, where I work as a developer for hedal:kruse:brohus using SQL Server and the .NET framework since 2004.
My primary fields of expertise is back-end data integration, database design and optimization.


       View Allan Svelmøe Hansen's profile on LinkedIn     

Disclaimer

The opinions expressed herein are my own personal opinions and thoughts and does not represent my employers view in any way, nor are my results guaranteed for all situations.
Content is presented "as is", with no warranty.