The MERGE syntax - syntax and performance

by Allan Svelmøe Hansen May 21, 2010 15:40

Normally when you have rows from one table you want to move over into the other, you'll have to run both an update query to get your existing rows updated with the new values and insert query to get new rows over to your target. That means you have to write two queries. With the merge, you only need one as it performs both the update and the insert.
So let's take a look at it.

First, I'll create 2 tables with dummy data; a tblSource and a tblTarget.
The scripts for creating the tables and data can be found here merge_setup_20100521.sql (3.58 kb), but basically it's just a target and source table with an ID, ValA, ValB and ValC with a clustered index on ID and some dummy data.
So to do the update/insert it would look like this:
UPDATE MyTarget SET 
   
MyTarget.ValA MySource.ValA

   
MyTarget.ValB MySource.ValB

   
MyTarget.ValC 
MySource.ValC 
FROM dbo.tblTargetInUp AS 
MyTarget  
INNER JOIN dbo.tblSourceInUp AS MySource ON MyTarget.ID 
MySource.ID 

INSERT INTO 
dbo.tblTargetInUp 
SELECT IDValAValB
ValC 
FROM dbo.tblSourceInUp AS 
MySource  
WHERE NOT EXISTS (SELECT FROM dbo.tblTargetInUp AS MyTarget WHERE MyTarget.ID MySource.ID
)  
 

If we look at the execution plan, I get an estimated cost of 9.1754 for the update and 2.48725 for the insert, meaning a combined cost of 11.66265 for 33.333 rows updated and 33.3334 rows inserted, into a target of 66.666 rows.

Now do remember, the estimated cost is just a number for how the query runs in my environment, it can't be taken as a direct number and transferred to another system - I'm only interested in the relative comparison with the merge. More data, more indexes, more variations will all affect the actual numbers. Also if looking at the execution plan, it is clear it is two queries we fire, meaning that all the overhead which goes into running one query will be doubled for this. But it was how you'd have to do merges in the past.

Now, with the merge syntax we can do it like this:
MERGE dbo.TblTargetMerge AS MyTarget USING 
(   
   
SELECT FROM dbo.TblSourceMerge AS 
TS 
AS 
MySource 
ON MyTarget.ID 
MySource.ID 
WHEN MATCHED 
THEN  
   UPDATE SET 
       
MyTarget.ValA MySource.ValA

       
MyTarget.ValB MySource.ValB

       
MyTarget.ValC 
MySource.ValC 
WHEN NOT MATCHED 
THEN 
   INSERT  
   
(IDValAValBValC

   
VALUES 
   
(MySource.IDMySource.ValAMySource.ValBMySource.ValC

  

Note that the syntax takes both the update and insert in the WHEN MATCHED and WHEN NOT MATCHED.
For the complete overview over the syntax, I'll refer you to the documentation by microsoft: MERGE (Transact-SQL).
But basically – you MERGE into a table using a source, and then define the ON clause (as you would a join), and then specify the WHEN MATCHED and the WHEN NOT MATCHED clauses.

One thing I'll expand on myself though, is the OUTPUT clause which can also be coupled on to the merge. I mentioned the OUTPUT clause myself recently.
The important thing is that you can couple the $action to the output clause and get information about whether you merged the data or you inserted the data, meaning whether the row was matched or not matched.
Like this:

MERGE dbo.TblTargetMerge AS MyTarget USING 
<...snipped FOR being brief...

OUTPUT $ACTION
rest_of_select_here 
;  
  

Nifty.
Anyways - once we've build this query, we can look at the execution plan, and here we clearly see it is handled as one query.
And in my case, the estimated cost is 8.407 for the same number of rows as above, meaning we’ve saved about 27.9% just by changing syntax.

Now, this structure I used to compare is very simple - with a simple matching of ID to ID and then just insert/overwrite everything so the actual result may vary (naturally), however the merge syntax does appear to be faster and with the added bonus of keeping the query combined into one syntax rather than divided into two different queries.
 
I must admit, I do like the merge syntax myself once I learned to read and write them.

Bookmark and Share DotnetKicks dotnetshoutout

Rollback the transaction

by Allan Svelmøe Hansen April 21, 2009 11:22

Last night I wrote a bit about an experience I had where the transaction blocked access.
So I decided to quickly write about how to rollback a transaction.
Suppose we have two tables:

Tables for Transaction example

I run the following SQL to illustrate wraping some INSERTs in a transaction.

BEGIN TRAN

BEGIN 
try

INSERT INTO TableOne VALUES ('aa')
INSERT INTO TableTwo  VALUES (SCOPE_IDENTITY(), 'bb')
INSERT INTO TableTwo  VALUES (-1'cc')
INSERT INTO TableOne VALUES ('aa')

END try
BEGIN catch
ROLLBACK TRAN
END 
catch

IF @@TRANCOUNT 
   
COMMIT TRAN

What happens here is the first insert will insert 'aa' into TableOne.
Then I use the identity value from TableOne and inserts into TableTwo in the subsequent insert. This will succeed. The third insert will try to insert a value -1 which will throw a foreign key restraint error.

Because I have wrapped in a transaction, and utilize the Try/Catch functionality (like in most other programming languages) – I can within rollback nicely within the "catch", and make sure nothing gets committed to the database.

The end of the query tests the transaction counter (@@TRANCOUNT) and whether or not the transaction should be committed. This way I avoid having open transactions, and I avoid trying to close a transaction which have already been handled. Each BEGIN TRAN(saction) will increment @@TRANCOUNT, whereas ROLLBACK and COMMIT will decrement it.

Bookmark and Share DotnetKicks dotnetshoutout

Using @@IDENTITY and SCOPE_IDENTITY()

by Allan Svelmøe Hansen February 16, 2009 20:23

Often I see people just using @@IDENTITY when needing the latest used identity value inserted into a table.
However this is dangerous behavior seeing as a potential bug resides there.

As mentioned in the documentation for the functionalities @@IDENTITY returns the last used identity value within the given session, but the globally used identity. This means that if something else, within the same session (not to be mistaken with a HTTP session ;) ) uses identity insert, you risk getting the wrong value back.
SCOPE_IDENTITY() only returns within the same scope.
One exists called IDENT_CURRENT which can return the identity for any session for a given table. However, I’ll only focus on @@IDENTITY and SCOPE_IDENTITY() as they are most common.

An easy way to see the difference is to make two tables with an identity column and a dummy value field like this.

CREATE TABLE TableOne(
   
[ID] [int] IDENTITY(1,1
) NOT NULL,
   
[Value] [nvarchar](50
) NULL
)
CREATE TABLE TableTwo
(
   
[ID] [int] IDENTITY(10,1
) NOT NULL,
   
[Value] [nvarchar](50
) NULL
)
   

Note that the second table has a different initial identity value then TableOne. This will help with the illustration.
Now create a trigger on TableOne to run after Insert which inserts another row into TableTwo. Just some dummy data like this:

CREATE TRIGGER [dbo].[TriggerInsert]
   
ON  
[dbo].[TableOne]
   AFTER 
INSERT
AS 
BEGIN
   SET 
NOCOUNT ON

    
INSERT INTO TableTwo VALUES ('ss'
)
END
GO 
 
Now try running the following query to illustrate the difference between the two IDENTITY functionalities:
INSERT INTO TableOne VALUES ('test')
SELECT 
@@IDENTITY
SELECT SCOPE_IDENTITY
() 
 

You’ll notice that the first SELECT @@IDENTITY returns a different identity value then just inserted into TableOne, and that is because it takes the last used identity value for the given session. SCOPE_IDENTITY() returns the expected value.

This is a potential problem if not aware of the differences and for example triggers are used. It can often provide in foreign key integrity problems and can be difficult to debug if multiple developers work on the same database.

Bookmark and Share DotnetKicks dotnetshoutout

UPDATE/INSERT syntax trick

by Allan Svelmøe Hansen June 08, 2008 09:17

A little trick you can use with an update or insert is an automatic assignment.
For example at times I've had to update all rows in a (temporay) table with an consecutive number and a fast way of doing this is utilizing this technique:

DECLARE @I INT
SET 
@I 0
UPDATE Table_1
SET @I MyID @I 1
 

This example instantiates an interger (I) to the value 0. Then I update a table where I set an ID column to the value of I + 1, but at the same time I set my I variable to the value of MyID. This means that the first row will get the value 1 into MyID and the second will get 2 and so forth.

And of course this example is very simple, and in such a case one might as well just use identity. However because you can make any calculations you want, and because it does not have to be an integer data type - you can use this for various situations.

It just goes to show that T-SQL syntax allows for playing around.

Bookmark and Share DotnetKicks dotnetshoutout

Tags: ,

SQL

Powered by BlogEngine.NET 1.6.1.0
Theme by Mads Kristensen | Modified by Mooglegiant

About:
Allan Svelmøe Hansen

My real name is Allan Svelmøe Hansen.
I live in Denmark, where I work as a developer for hedal:kruse:brohus using SQL Server and the .NET framework since 2004.
My primary fields of expertise is back-end data integration, database design and optimization.


       View Allan Svelmøe Hansen's profile on LinkedIn     

Disclaimer

The opinions expressed herein are my own personal opinions and thoughts and does not represent my employers view in any way, nor are my results guaranteed for all situations.
Content is presented "as is", with no warranty.