SQL Server has two commands useful for finding similar rows and different rows between two tables.
These commands are EXCEPT and INTERSECT.
They work by comparing the entire row of one query/table with another query/table.
EXCEPT then returns the rows which are in the first, but not the second table and INTERSECT returns the rows from the first which are also in the second table. But how do they perform?
In real life I find it is rare that I have to compare two tables or queries (or more) which possible have completely indentical data, but it does happen once in a while. However most of the time, these rows also have a unique primary key or otherwise index, so most of the time I use IN and NOT IN myself.
So I'll try EXCEPT/INTERSECT against IN/NOT IN and see how they fare.
I've made two identical tables, Table1 and Table1 both with an ID (int) and Value (nvarchar(10)), and filled them with some test data, ensuring a number of identical rows and a number of different rows using the following query:
DECLARE @I INT
SET @I = 0
WHILE @I < 10000 BEGIN
IF @I % 5 <> 0
INSERT INTO Table1 VALUES (@I, REPLICATE('A', 10))
IF @I % 4 <> 0 INSERT
INTO Table2 VALUES (@I, REPLICATE('A', 10))
SET @I = @I + 1
END
I've not used any indexes et al, because it isn't needed. I am only after the comparable values.
Anyway, using what I know of common problems from work, I now want to find rows in Table1 which aren't in Table2.
Because I know I have "ID" which I assume right now is a unique key (I have no duplicates, because I know what data I've inserted), I'll only use an NOT IN on the ID column, like this:
SELECT *
FROM Table1
WHERE ID NOT IN (
SELECT ID FROM Table2
)
GO
SELECT * FROM Table1
EXCEPT
SELECT * FROM Table2
GO
This performs as one might expect vastly superior for the NOT IN syntax with a ratio of 23% vs. 77%.
However, suppose that I also needed to make sure that Value was the same and that I thus couldn't rely only on the ID, making the following query:
SELECT *
FROM Table1
WHERE ID NOT IN (
SELECT ID FROM Table2
)
OR [Value] NOT IN (
SELECT [Value] FROM Table2
)
GO
SELECT * FROM Table1
EXCEPT
SELECT * FROM Table2
GO
And now the result changes with the NOT IN syntax taking 78% and the EXCEPT only 22%.
So if you need more then one value to be different, then it starts to look like you can use the EXCEPT with performance gain versus the NOT IN syntax.
If you only need to compare on one, it is a waste of ressources to use EXCEPT – lest it is a one time query of course. But it if is something which needs to run "often", I'd think twice.
Conclusions on the results of INTERSECT are exactly (well, the numbers differ of course) the same.
One column to compare, and it is better to use IN, like this:
SELECT *
FROM Table1
WHERE ID IN (
SELECT ID FROM Table2
)
GO
Compared to
SELECT * FROM Table1
INTERSECT
SELECT * FROM Table2
GO
If needing more values, then INTERSECT is better.