Delete using join-queries

by Allan Svelmøe Hansen April 24, 2008 18:29

Occasionally I run into having to clean or modify data using a range of more or less complex conditions.
Then the following syntax comes in handy, because it is actually possible to use complex queries utilizing joins, to delete information from.

The example is realtive simple but it illustrates the point quite well ....... I hope :)

DELETE T1
FROM Table1 T1
INNER JOIN Table2 T2 ON T1.TableOneID T2.TableOneID

As can be seen the syntax is a little bit different from a normal delete. This query will delete all rows in T1 aka Table1 which would be selected out by my query.

A good rule when building such queries to delete from is to build them up as selects, and then when you have selected the correct data out, you can substitute the SELECT with a DELETE <ALIAS>

I use this type of synax a lot in at work and just wanted to share it, in case it wasn't as known for people. It can quickly become a powerful tool for deleting the correct data when normal subqueries aren't useful.

Bookmark and Share DotnetKicks dotnetshoutout

Tags: , ,

SQL

More table partitioning and numbers

by Allan Svelmøe Hansen April 02, 2008 12:50

Following up on my previous testing of table partitioning, I wanted to see if I could identify any notable overhead when using partitioned tables other then in selects, meaning delete, update and of course inserts. At the same time I thought I could test out with a partitioned table but where the files was located on the same physical drive.
So I created a new database, 2 partition functions and schemas and 3 new tables. TestOne was partitioned as my last time, and so was TestTwo. TestThree table was partitioned exactly like TestOne except it utilized file groups located on the same physical drive, whereas TestOne had its filegroups split across two physical drives.
I found out that selects provided no noticeable performance difference between TestOne and TestThree tables. I would think the main area where one would see performance differences here is dependent on hardware and load on the drives, and less so with the SQL Server performance.  So further testing into this will be outside the scope of at least this blog entry.

So – onwards to testing other SQL then selects. The first I tested was ordinary inserts for each of my 3 tables.

INSERT INTO <TABLE>
VALUES (1'test')
GO

When running this it provided me with this execution plan:

(click to enlarge)
Estimated Execution Plan For Insert

Not surprisingly there is an overhead visible when it comes to inserting into a partitioned table compared to a non-partitioned one.  Plus the plans do not look the same. I would simply attribute the overhead to the fact that SQL Server needs to look up which file group to locate the data in based on the partition function.
The difference in this instance does however look relative small. The entire subtree cost for inserting into a partitioned table (TestOne and TestThree) were 0.010471 whereas for TestTwo it was 0.01.
So while the specific numbers might differ from system to system etc, it does show an overhead, albeit a small one.
When it came to both delete and updates, I found that the same issues as with select statements were in effect.
When I only needed to delete or update on a specific "grouping" of data, the partitioned tables were actually faster than the non-partitioned one. When I had to update across "groups" then the non-partitioned one was fastest.
For example:

UPDATE TestOne
SET MyValue='Test2'
WHERE MyID 1
GO
 

was faster then

UPDATE TestTwo
SET MyValue='Test2'
WHERE MyID 1
GO
 

by a factor around 20%, whereas

UPDATE TestOne
SET MyValue='Test2'
WHERE MyID OR MyID  2
 

was about 10% slower then

UPDATE TestTwo
SET MyValue='Test2'
WHERE MyID OR MyID  2
 

This leads back to my conclusion in the last piece that if you in fact can group your data in a manner which makes sense and avoids “cross-grouping data”, then it is faster performance wise to do so. The only time it looks to be noticeable slower to have partitioned data is with inserts.
This of course is not an ever valid conclusion, and I’m sure more concrete and scientifically gathered data could present cases against. But if your data abides to some rules which makes a grouping possible and the main focus on the data is reading versus inserting; then there looks to be mainly advantages when it comes to performance with table partitioning. And especially this can run transparent for normal users and database developers it is definitely something worth a thought.

 

 

Bookmark and Share DotnetKicks dotnetshoutout

Powered by BlogEngine.NET 1.6.1.0
Theme by Mads Kristensen | Modified by Mooglegiant

About:
Allan Svelmøe Hansen

My real name is Allan Svelmøe Hansen.
I live in Denmark, where I work as a developer for hedal:kruse:brohus using SQL Server and the .NET framework since 2004.
My primary fields of expertise is back-end data integration, database design and optimization.


       View Allan Svelmøe Hansen's profile on LinkedIn     

Disclaimer

The opinions expressed herein are my own personal opinions and thoughts and does not represent my employers view in any way, nor are my results guaranteed for all situations.
Content is presented "as is", with no warranty.