Mar 022011

When using COUNT with LEFT JOINs, it is worth noting that is that it does not count NULL values if counting over something different than *.
This is useful if you wish to count how many of the rows fulfill some sort of join clause.

Take the following example.
I’ve created two tables consisting of a unique identifier and a text column. I’ve added test data and for every two row I placed into Tabel_1, I placed one row into Table_2. Then I made a left join query which looks like:

SELECT *
FROM Table_1 AS T
LEFT JOIN Table_2 AS T2 ON T.PK = T2.FK

Which provide an output like this:

LEFT JOIN example for using COUNT | EXEC(@sql)

LEFT JOIN example for using COUNT

Now to illustrate the difference in count, run a query like this:

SELECT COUNT(*) AS AllRows, COUNT(PK) AS RowsFromLeft, COUNT(FK) AS RowsFromRight
FROM Table_1 AS T
LEFT JOIN Table_2 AS T2 ON T.PK = T2.FK

Which provide the following output:

COUNT result for LEFT JOIN | EXEC(@sql)

COUNT result for LEFT JOIN

I’ve seen many more or less creative solutions in my time to how to count over the Right table in a Left Join, because few people seem to read the documentation on something as simple as a COUNT.
However often it is well worth it just to take a peek.

I ran into the error: The select list for the INSERT statement contains fewer items than the insert list. The number of SELECT values must match the number of INSERT columns.
Well I’ve run into it often.

Normally this simply means you have specified fewer items in values for an INSERT statement than you have specified you need. For example:

INSERT INTO Table1
( ID, [Column], Column1 )
SELECT 1,
'Value', 

Now, that’s easily rectified, by including the columns you need or removing the one from INSERT you do not need.

However, I also noticed that in my situation I had all the columns needed, but I was lacking a comma – like this:

INSERT INTO Table1
( ID, [Column], Column1 )
SELECT 1,
'Value1'
'Value2'

Notice the lack of comma between Value1 and Value2.

This gives the mentioned error and not a syntax error in the SELECT statement. The reason is that SQL Server concatenates the Value1 and Value2 fields into one column, and therefore the SELECT only produces 2 columns.
SQL Server will be able to concatenate two columns into one without needing to add them together, but only two columns.
So if I were to forget the third comma I would get an “Incorrect syntax near Value2″ error message alerting me to the issue. But forget one comma between 2 columns and you get no such information.

I didn’t see a mention of this in the MSDN documentation.

Using @@rowcount is nice at times to get the number of rows affected by the last SQL statement. However sometimes you get an unexpected result.
One such situation is with the print statement which is often used as debugging statement in a query.
If looking at the SQL statement:
SELECT 1
UNION
SELECT
2
SELECT @@ROWCOUNT

Then the @@rowcount naturally will return the value 2.
However if we insert a print statement just before the rowcount like this:

SELECT 1
UNION
SELECT
2
PRINT
‘test’
SELECT @@ROWCOUNT

Rowcount now suddenly is 0.

That means the print statement changes the rowcount despite it being just a print to the console window.
Something to be aware of if relying on @@rowcount.

Normally when you have rows from one table you want to move over into the other, you’ll have to run both an update query to get your existing rows updated with the new values and insert query to get new rows over to your target. That means you have to write two queries. With the merge, you only need one as it performs both the update and the insert.
So let’s take a look at it.

First, I’ll create 2 tables with dummy data; a tblSource and a tblTarget.
The scripts for creating the tables and data can be found here merge_setup_20100521.sql (3.58 kb), but basically it’s just a target and source table with an ID, ValA, ValB and ValC with a clustered index on ID and some dummy data.
So to do the update/insert it would look like this:
UPDATE MyTarget SET
MyTarget.ValA = MySource.ValA
,
MyTarget.ValB = MySource.ValB
,
MyTarget.ValC =
MySource.ValC
FROM dbo.tblTargetInUp AS
MyTarget
INNER JOIN dbo.tblSourceInUp AS MySource ON MyTarget.ID =
MySource.ID

INSERT INTO dbo.tblTargetInUp
SELECT ID, ValA, ValB,
ValC
FROM dbo.tblSourceInUp AS
MySource
WHERE NOT EXISTS (SELECT 1 FROM dbo.tblTargetInUp AS MyTarget WHERE MyTarget.ID = MySource.ID
)

If we look at the execution plan, I get an estimated cost of 9.1754 for the update and 2.48725 for the insert, meaning a combined cost of 11.66265 for 33.333 rows updated and 33.3334 rows inserted, into a target of 66.666 rows.

Now do remember, the estimated cost is just a number for how the query runs in my environment, it can’t be taken as a direct number and transferred to another system – I’m only interested in the relative comparison with the merge. More data, more indexes, more variations will all affect the actual numbers. Also if looking at the execution plan, it is clear it is two queries we fire, meaning that all the overhead which goes into running one query will be doubled for this. But it was how you’d have to do merges in the past.

Now, with the merge syntax we can do it like this:
MERGE dbo.TblTargetMerge AS MyTarget USING
(
SELECT * FROM dbo.TblSourceMerge AS
TS
) AS
MySource
ON MyTarget.ID =
MySource.ID
WHEN MATCHED
THEN
UPDATE SET
MyTarget.ValA = MySource.ValA
,
MyTarget.ValB = MySource.ValB
,
MyTarget.ValC =
MySource.ValC
WHEN NOT MATCHED
THEN
INSERT
(ID, ValA, ValB, ValC
)
VALUES
(MySource.ID, MySource.ValA, MySource.ValB, MySource.ValC
)
;

Note that the syntax takes both the update and insert in the WHEN MATCHED and WHEN NOT MATCHED.
For the complete overview over the syntax, I’ll refer you to the documentation by microsoft: MERGE (Transact-SQL).
But basically – you MERGE into a table using a source, and then define the ON clause (as you would a join), and then specify the WHEN MATCHED and the WHEN NOT MATCHED clauses.

One thing I’ll expand on myself though, is the OUTPUT clause which can also be coupled on to the merge. I mentioned the OUTPUT clause myself recently.
The important thing is that you can couple the $action to the output clause and get information about whether you merged the data or you inserted the data, meaning whether the row was matched or not matched.
Like this:

MERGE dbo.TblTargetMerge AS MyTarget USING
<…snipped FOR being brief…
>
OUTPUT $ACTION,
rest_of_select_here
;

Nifty.
Anyways – once we’ve build this query, we can look at the execution plan, and here we clearly see it is handled as one query.
And in my case, the estimated cost is 8.407 for the same number of rows as above, meaning we’ve saved about 27.9% just by changing syntax.

Now, this structure I used to compare is very simple – with a simple matching of ID to ID and then just insert/overwrite everything so the actual result may vary (naturally), however the merge syntax does appear to be faster and with the added bonus of keeping the query combined into one syntax rather than divided into two different queries.

I must admit, I do like the merge syntax myself once I learned to read and write them.

May 202010

I’ve seen many uses, and some misuses, of the STR function over time.

One of the more “easy to spot problems” is that the STR function returns char datatype, meaning it’ll pad the result with spaces and people need to trim the result.

Today I saw an even worse issue. The STR function takes a float argument. This means if you feed it a string (yes, I’ve seen it done) it will implicit convert that string to a float if it can. That in itself can cause an error, but even worse, it can cause a difficult to find bug.
Suppose you have STR(’0001′) and run that, you’ll after trimming end up with the result of ’1′. Why?
Because ’0001′ is converted to a float, which – as we all know – is 1, which then will be cast to a string.

Mar 202010

The output clause is a really nifty thing in SQL Server 2008, which provides you access to the same “inserted” and “deleted” tables you get access to via triggers, but from within the same query.
To illustrate, suppose we have a table:

CREATE TABLE [dbo].[OutputEx](
[Key] [int] IDENTITY(1,1) NOT NULL,
[Value] [nchar](10) NULL,
[SecondValue] [datetime2](7) NULL,
CONSTRAINT [PK_OutputEx] PRIMARY KEY CLUSTERED
(
[Key] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS
= ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[OutputEx] ADD  CONSTRAINT [DF_OutputEx_SecondValue] DEFAULT (GETDATE()) FOR [SecondValue]

This is a table which has an identity column in [Key] and a datetime value field in SecondValue2 which defaults to the current date (GETDATE())

Now, if I insert a [Value] into this table SQL Server will automatically insert the identity value and the current date SecondValue, but if I wanted to get these automated values out, I would have had to select them out manually afterwards.
This might not seem like much of a problem given these two values (Identity and date) however it might be much more complex constraints and default values.

So what I can do is use the OUTPUT clause like this:

INSERT INTO OutputEx ( Value)
OUTPUT inserted.[key], inserted.secondvalue
VALUES ( 'a')

This will provide me with a result of the identity value inserted into [key] and the date value inserted into SecondValue.
I can use this result in an outer query by nesting the insert in a sub-query, or I can insert it into another table directly like this:

DECLARE @T TABLE(ID INT, [Date] DATETIME2)

INSERT INTO OutputEx ( Value)
OUTPUT inserted.[key], inserted.secondvalue INTO @T
VALUES ( ‘a’)

My table variable @T will now hold the values inserted
This also works with the “deleted” table instead of the inserted and thus both in insert/update and deleted queries.
It is a useful technique.

I was stuck with a problem at work where I had a very complex query pulling data out of some tables, some XML and what not, and a lot of data manipulating in the query to be able to easier populate a DataWarehouse.
I then had to expand this query and pull some extra Boolean information out, which were based on a logical comparison of two strings, which I would normal do with a CASE like CASE string1 = string2 THEN 1.

However because I had a GROUP BY clause, I could not simply do this, because the columns the strings were taken from was not in the group clause and I didn’t want them there.
So I found out that you can actually put an aggregated function around the case using the following syntax:

SELECT
<GroupByFields
>,
MAX
(
CASE
WHEN <NonGroupField> = ‘some value’ THEN
1
ELSE
0
END
)
FROM <TABLE
>
GROUP BY <GroupByFields
>

This saved me from using a common table expression or two more, and kept things simple. In essense this allows you to make conditional SUM and similar using this technique.
Just more evidence that a lot is possible in SQL, and just trying something often reveals interesting results.

A common problem to solve is to list, for example the current price for a product or latest order for a customer or similar.
It is a problem which quickly can seem complex, but once understanding the situation, then – as everything else – it is relative simple, and I’ll show how using both window functions (in this instance, RANK) from SQL Server 2005+ and using a sub-query in case working on an earlier version, or other databases.

I’ll take offset in the Northwind database, which I’ve installed on a SQL Server 2008.
It has an Order table with a foreign key to a Customer table; however for this the Order table is the only one of interest:
The Order table contains the following information:
SELECT [OrderID]
,
[CustomerID]
,
[EmployeeID]
,
[OrderDate]
,
[RequiredDate]
,
[ShippedDate]
,
[ShipVia]
,
[Freight]
,
[ShipName]
,
[ShipAddress]
,
[ShipCity]
,
[ShipRegion]
,
[ShipPostalCode]
,
[ShipCountry]
FROM [Northwind].[dbo].[Orders]

In this situation, we’ll get the latest order per customer, but the problem is the same as if you’d need the current price – the only thing which differs will be the tables and where clauses and so on.
Solving the issue using the RANK window function, and a common table expression, it would look like this:

;WITH CTE AS (
SELECT
RANK() OVER (Partition BY T1.CustomerID ORDER BY OrderDate DESC) AS OrderRank
,
T1.
*
FROM
Orders T1
)
SELECT * FROM CTE WHERE OrderRank =
1
ORDER BY CustomerID

What happens here is that we use RANK to give us a number partitioned (grouped) by CustomerID and sorted by OrderDate descending. This will give each row selected the number/rank that order is historically, from newest to earliest. That means each row which have the rank 1 will be the latest/newest order.
You can then expand on the joins inside/outside the common table expression to get information about customers or order details or what not.

It is also solvable without using the window function RANK, and would look for example something like this:

SELECT *
FROM
Orders T1
INNER JOIN
(
SELECT CustomerID, MAX(OrderDate) AS
OrderDate
FROM
Orders T2
GROUP BY
T2.CustomerID
) T3 ON T3.OrderDate = T1.OrderDate AND T1.CustomerID =
T3.CustomerID
ORDER BY T1.CustomerID

Here we utilize a sub-query where we select the CustomerID and the highest order date from Orders, which we then can join into Orders again with a self-join on customer id and order date.

Note that these are just examples. Many similar solutions exists, but they all follow the same methodology.
I’ve also not taken optimizaiton or anything like that into account. It was just to illustrate a solution to a common problem.

Sep 042009

As mentioned – casually – in the msdn documentation the IsNumeric function will return 1 for some values which aren’t actually numbers.  The currency sign $ is mentioned as well are plus/minus.
Period and comma will also be returning 1.

This essentially means that you can’t be sure that it is actually a number/numeric value which passes the IsNumeric check.
And least of all, you can’t be assured of the semantic value returned is what you expect.
This can – if unaware and not paying attention – be a problem when dealing with number values for countries which do not follow the same period/decimal convention as the US.

Also – “funnily” enough due to the caveats in the IsNumeric, you can’t actually be assured that it can be converted to a Numeric data type. (It can normally always be converted into a Money, but then the name is misleading :) )
To illustrate notice for example the following snippet:

DECLARE @VAR VARCHAR(20)
SET @VAR = ‘€,.,,’

This will pass a “IsNumeric” check with the result 1, but it can’t be cast into a numeric/decimal. It can be cast into money but will return 0.00
And that is despite it is not in any form an actual numeric value.
There are some semantic checks built into the IsNumeric such as you can’t have values infront of the currency sign, so

SET @VAR = ’123$’

would fail a “IsNumeric” check. Also some checks on the numbers of “plus/minus” signs and so on……
So when using IsNumeric, it is time to be careful and not just accept a success for actual success.

I was recently debugging a problem in a Stored Procedure where the wrong values where inserted into a table, in the middle of a long running cursor operation.
The culprit turned out to be the scope of a variable in SQL versus the scope most (object orientated) developers are used to in the language they usually code in.

The problem can be illustrated with this syntax:
DECLARE @Counter INT = 1

WHILE @Counter < 5 BEGIN
DECLARE
@Var VARCHAR(MAX
)
IF @Counter =
2
SET @Var =
‘SomeValue’
SELECT
@Var
SET @Counter +=
1
END

The way scope works for variables in T-SQL is “batch scope”, whereas most OO developers are used to the scope being limited within the innermost context.

The way it functions in SQL is that in the first iteration of the while loop the variable “@Var” gets declared but not assigned a value.
Thus, when selecting it out, the result will be NULL.

However in the second iteration, the variable is already declared once, so SQL Server will not “recreate” it, because the scope is “batch scope”.
So here it’ll just be assigned the value “SomeValue”.

The third iteration is like the second. The variable is already declared, and because it in the previous iteration had the value set to “SomeValue”, that value will persist in this iteration, and all subsequent iterations until another value is set.

This is a source of errors if not careful, because the object orientated developer will – if not aware of this – most likely read the above as only iteration 2 will contain the value “SomeValue”, whereas all other iterations will contain NULL.
But because of the batch scope, it is not so.

As the result from the query shows: