Allan S. Hansen

I'm a developer working primarily with .NET programming (VB.NET/C#) but have a (secret) passion for databases, performance and their usage. So I'm using this blog to share things I work on, scripts I make, or concepts and techniques I find interesting and hopefully others might find them useful as well. I’m focusing mainly on practical problems and situations. I've been developing professionally in .NET and MS SQL Server since 2004.

A little quick note, when needing composite keys in your relationship in SQL XML Bulk Load the syntax is as follow:

<sql:relationship name=“Detail”
parent
=
“Import.Product”
parent
-KEY=
“ProductID FileID”
child
=
“Import.ProductDetails”
child
-KEY=“ProductID FileID”/>

meaning they’re simply space separated in the relationship annotation. The composite key in this aspect are the columns ProductID and FileID.
As can also be seen in this annotation, I’m using a schema qualifiaction with Import, so if you have other schemas, you can simply write them in the parent/child.

For more syntax, please refer to the blog post SQL XML Bulk Load – basic XSD syntax

Mar 202010

The output clause is a really nifty thing in SQL Server 2008, which provides you access to the same “inserted” and “deleted” tables you get access to via triggers, but from within the same query.
To illustrate, suppose we have a table:

CREATE TABLE [dbo].[OutputEx](
[Key] [int] IDENTITY(1,1) NOT NULL,
[Value] [nchar](10) NULL,
[SecondValue] [datetime2](7) NULL,
CONSTRAINT [PK_OutputEx] PRIMARY KEY CLUSTERED
(
[Key] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS
= ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[OutputEx] ADD  CONSTRAINT [DF_OutputEx_SecondValue] DEFAULT (GETDATE()) FOR [SecondValue]

This is a table which has an identity column in [Key] and a datetime value field in SecondValue2 which defaults to the current date (GETDATE())

Now, if I insert a [Value] into this table SQL Server will automatically insert the identity value and the current date SecondValue, but if I wanted to get these automated values out, I would have had to select them out manually afterwards.
This might not seem like much of a problem given these two values (Identity and date) however it might be much more complex constraints and default values.

So what I can do is use the OUTPUT clause like this:

INSERT INTO OutputEx ( Value)
OUTPUT inserted.[key], inserted.secondvalue
VALUES ( 'a')

This will provide me with a result of the identity value inserted into [key] and the date value inserted into SecondValue.
I can use this result in an outer query by nesting the insert in a sub-query, or I can insert it into another table directly like this:

DECLARE @T TABLE(ID INT, [Date] DATETIME2)

INSERT INTO OutputEx ( Value)
OUTPUT inserted.[key], inserted.secondvalue INTO @T
VALUES ( ‘a’)

My table variable @T will now hold the values inserted
This also works with the “deleted” table instead of the inserted and thus both in insert/update and deleted queries.
It is a useful technique.

I’m in the process of upgrading to new blogengine.net version, and in that process there might be some issues while I get it all set up.

I’m also going to clean up the comment spam pretty heftily – so in case your legit comment vanishes, I’m sorry in advanced, but I just had to prune some 3000 spam comments
I’ll also look at incorporate a catpcha as well as automated spam detection built into the engine, so hopefully comment spam and moderation of them will be easier.

And finally – I’m going to try out feedburner for my RSS feed, so I hope I do not break anything by it :)

Thanks for your patience.

I have been working with SQL XML Bulk Load (version 4.0) over the last period of time, and suddenly I encountered an issue where the imported data didn’t match the data in the XML files I received.
Looking closer into the issue, I discovered it was because the SQL XML Bulk Load removed my line breaks entirely from the imported data.

In XML line breaks are defined as standard using only “line feed” (LF). However working in a windows environment line breaks are almost always both carriage return and line feed combined (CR+LF).
However when the SQL XML Bulk Load parser (I would think it is a parsing issue) encounters a CR+LF in a node value it strips both characters from the text which results in the data being imported without the line break. If however the parser encounters only a LF, then that LF is retained in the imported data.

Now this seemed like a strange behaviour, because why would both CR and LF be removed and not only the CR, if it was to rely on the standard. However even more annoying, when the CR+LF is used as line breaks most every other place in Windows environments, why even remove those two characters to begin with?

I contacted Microsoft technical support which seems to have confirmed that it isn’t an error on my account, and that they maybe would look at it for the next release.  (*fingers crossed*)
Whether or not they view it as a bug I do not know – but there is definitely something wrong there. And I would call it a bug, because removing extra information can’t be intentional.

To illustrate my issue I made a very simple XML file containing two nodes:

<root>
<
myNode>Before CRLF
After CRLF
</myNode>
<
myNode>Before LF
After LF
</myNode>
</
root>

In this file, the first line break was made with a CR+LF (or rather Environment.NewLine via .NET, but parsing it matches CR+LF characters)  and the second was made only with the line feed as a line break.
I then imported that file into a database using the schema file:

<?xml version=1.0 encoding=utf-8?>
<
xs:schema xmlns:xs=http://www.w3.org/2001/XMLSchema
xmlns:dt=urn:schemas-microsoft-com:datatypes
xmlns:sql=urn:schemas-microsoft-com:mapping-schema>

<xs:element name=root sql:is-constant=1>
<
xs:complexType>
<
xs:sequence>
<
xs:element xml:space=preserve type=xs:token dt:type=string name=myNode sql:relation=LineBreak />
</
xs:sequence>
</
xs:complexType>
</
xs:element>
</
xs:schema>

Where I tried to do all sorts of XSD “keep my line breaks as they are” tricks (space=”preserve” and so on).
I then ran the import (vb.net code)

Dim strInFile As String = “c:\linebreak.xml”
Dim strSchema As String =
“c:\linebreak.xsd”
Dim objBL As Object = CreateObject(“SQLXMLBulkLoad.SQLXMLBulkload.4.0″
)
Try
objBL.ConnectionString
=
“connection string”
objBL.KeepIdentity
= False
objBL.XMLFragment
= True
objBL.ErrorLogFile =
“error log file path”
objBL.Execute(strSchema, strInFile
)
Catch ex As
Exception
Debug.WriteLine(ex.ToString
)
Finally
If Not objBL
Is Nothing Then
objBL
= Nothing
End If
End
Try

And when I viewed the imported data, the CR+LF was removed from the first “myNode” and it had no breaks what so ever, and in the second “myNode” the LF was retained.
I was asked by the Microsoft employee I was in contact with to create the issue on Microsoft Connect, so I eagerly await what comes of this for the next release(s) of SQL XML Bulk Load. Hopefully they’ll have this fixed.

The only work around I have found myself is to pre-/post process the files. Before importing, run through the XML file and replace your line breaks with some special character not present in the file (usually ¤ is a good candidate) – and once the import has run, then post process your data and replace the used character with CR+LF again.
It is not a beautiful solution, but it works around the issue and helped myself move forward with the project.

I was stuck with a problem at work where I had a very complex query pulling data out of some tables, some XML and what not, and a lot of data manipulating in the query to be able to easier populate a DataWarehouse.
I then had to expand this query and pull some extra Boolean information out, which were based on a logical comparison of two strings, which I would normal do with a CASE like CASE string1 = string2 THEN 1.

However because I had a GROUP BY clause, I could not simply do this, because the columns the strings were taken from was not in the group clause and I didn’t want them there.
So I found out that you can actually put an aggregated function around the case using the following syntax:

SELECT
<GroupByFields
>,
MAX
(
CASE
WHEN <NonGroupField> = ‘some value’ THEN
1
ELSE
0
END
)
FROM <TABLE
>
GROUP BY <GroupByFields
>

This saved me from using a common table expression or two more, and kept things simple. In essense this allows you to make conditional SUM and similar using this technique.
Just more evidence that a lot is possible in SQL, and just trying something often reveals interesting results.

A common problem to solve is to list, for example the current price for a product or latest order for a customer or similar.
It is a problem which quickly can seem complex, but once understanding the situation, then – as everything else – it is relative simple, and I’ll show how using both window functions (in this instance, RANK) from SQL Server 2005+ and using a sub-query in case working on an earlier version, or other databases.

I’ll take offset in the Northwind database, which I’ve installed on a SQL Server 2008.
It has an Order table with a foreign key to a Customer table; however for this the Order table is the only one of interest:
The Order table contains the following information:
SELECT [OrderID]
,
[CustomerID]
,
[EmployeeID]
,
[OrderDate]
,
[RequiredDate]
,
[ShippedDate]
,
[ShipVia]
,
[Freight]
,
[ShipName]
,
[ShipAddress]
,
[ShipCity]
,
[ShipRegion]
,
[ShipPostalCode]
,
[ShipCountry]
FROM [Northwind].[dbo].[Orders]

In this situation, we’ll get the latest order per customer, but the problem is the same as if you’d need the current price – the only thing which differs will be the tables and where clauses and so on.
Solving the issue using the RANK window function, and a common table expression, it would look like this:

;WITH CTE AS (
SELECT
RANK() OVER (Partition BY T1.CustomerID ORDER BY OrderDate DESC) AS OrderRank
,
T1.
*
FROM
Orders T1
)
SELECT * FROM CTE WHERE OrderRank =
1
ORDER BY CustomerID

What happens here is that we use RANK to give us a number partitioned (grouped) by CustomerID and sorted by OrderDate descending. This will give each row selected the number/rank that order is historically, from newest to earliest. That means each row which have the rank 1 will be the latest/newest order.
You can then expand on the joins inside/outside the common table expression to get information about customers or order details or what not.

It is also solvable without using the window function RANK, and would look for example something like this:

SELECT *
FROM
Orders T1
INNER JOIN
(
SELECT CustomerID, MAX(OrderDate) AS
OrderDate
FROM
Orders T2
GROUP BY
T2.CustomerID
) T3 ON T3.OrderDate = T1.OrderDate AND T1.CustomerID =
T3.CustomerID
ORDER BY T1.CustomerID

Here we utilize a sub-query where we select the CustomerID and the highest order date from Orders, which we then can join into Orders again with a self-join on customer id and order date.

Note that these are just examples. Many similar solutions exists, but they all follow the same methodology.
I’ve also not taken optimizaiton or anything like that into account. It was just to illustrate a solution to a common problem.

Seeing as I have now worked somewhat the SQL XML Bulk load, I have decided to start compiling a list of the errors I got and what I did to solve them to help myself (and others) the next time I encounter one of these errors.

The statement has been terminated.
This usually means some SQL Error has occurred within the bulk load, for example a wrong data type, truncating of data and so on.
Check the ErrorLog file for more information.

Schema: the relationship XXX referred to in the node ‘YYY’ is not defined.
This means – as the error says – that a relationship is missing in the schema file. Usually more often than not, this means I’ve misspelled something in my relationship definitions or where I use the relationship. Otherwise make sure your relationship is defined if it is missing.

Schema: the annotation ‘field’ on the empty content element ‘XXX’ is not supported.
The “field” annotation is set on what is a container tag. You can’t assign a field to a container tag, but it needs to be set on an element.

Schema: the parent/child table of the relationship on ‘XXX’ does not match.
This for me usually means that I’ve forgotten to add  the relation on the field which has a relationship, or that I’m accessing the wrong tables.

The column ‘XXX’ was defined in the schema, but does not exist in the database.
Well, obviously, this means that there’s a field mapping in the schema file which doesn’t match the database. Check the spelling of the field/column name in the schema file.

Invalid character value for cast specification.
This means that you’re trying to make an invalid cast, for example trying to cast a string into a number field.

Sep 042009

As mentioned – casually – in the msdn documentation the IsNumeric function will return 1 for some values which aren’t actually numbers.  The currency sign $ is mentioned as well are plus/minus.
Period and comma will also be returning 1.

This essentially means that you can’t be sure that it is actually a number/numeric value which passes the IsNumeric check.
And least of all, you can’t be assured of the semantic value returned is what you expect.
This can – if unaware and not paying attention – be a problem when dealing with number values for countries which do not follow the same period/decimal convention as the US.

Also – “funnily” enough due to the caveats in the IsNumeric, you can’t actually be assured that it can be converted to a Numeric data type. (It can normally always be converted into a Money, but then the name is misleading :) )
To illustrate notice for example the following snippet:

DECLARE @VAR VARCHAR(20)
SET @VAR = ‘€,.,,’

This will pass a “IsNumeric” check with the result 1, but it can’t be cast into a numeric/decimal. It can be cast into money but will return 0.00
And that is despite it is not in any form an actual numeric value.
There are some semantic checks built into the IsNumeric such as you can’t have values infront of the currency sign, so

SET @VAR = ’123$’

would fail a “IsNumeric” check. Also some checks on the numbers of “plus/minus” signs and so on……
So when using IsNumeric, it is time to be careful and not just accept a success for actual success.

Some time ago Google announced more insight in how the nofollow
attribute worked, and since then I’ve received an explosion in comment
spam.

So now I’ve taken the liberty of enabling moderating of comments, to stem up against this inrease in spams – too bad people have to ruin it for others.

Funnily enough some of these spamming idiots have the gall to put
“nospam” in as e-mail address ….. so I guess spam is an issue for them
as well. 

With any luck, moderated comments will just be temporary, but
well – it never hurt anybody to wait a little while with putting a
comment on-line.

I was recently debugging a problem in a Stored Procedure where the wrong values where inserted into a table, in the middle of a long running cursor operation.
The culprit turned out to be the scope of a variable in SQL versus the scope most (object orientated) developers are used to in the language they usually code in.

The problem can be illustrated with this syntax:
DECLARE @Counter INT = 1

WHILE @Counter < 5 BEGIN
DECLARE
@Var VARCHAR(MAX
)
IF @Counter =
2
SET @Var =
‘SomeValue’
SELECT
@Var
SET @Counter +=
1
END

The way scope works for variables in T-SQL is “batch scope”, whereas most OO developers are used to the scope being limited within the innermost context.

The way it functions in SQL is that in the first iteration of the while loop the variable “@Var” gets declared but not assigned a value.
Thus, when selecting it out, the result will be NULL.

However in the second iteration, the variable is already declared once, so SQL Server will not “recreate” it, because the scope is “batch scope”.
So here it’ll just be assigned the value “SomeValue”.

The third iteration is like the second. The variable is already declared, and because it in the previous iteration had the value set to “SomeValue”, that value will persist in this iteration, and all subsequent iterations until another value is set.

This is a source of errors if not careful, because the object orientated developer will – if not aware of this – most likely read the above as only iteration 2 will contain the value “SomeValue”, whereas all other iterations will contain NULL.
But because of the batch scope, it is not so.

As the result from the query shows: