I have posted a couple of times about the XML/XSD syntax used for SQL XML Bulk Load, but mostly the syntax has been pretty basic, so I wanted to expand a little on it with a more complex example.

So this time I’ll have an XML file structured like this. As can be seen, this XML file consists of a couple of nested tags and is a structure that’s realistic when it comes to data imports. In fact, it is a trimmed-down version of an existing one I have used for work, where I have removed duplicated structures to provide a cleaner example.
The file consists of repeated details tags with content and attributes, a product files section containing a repeated tag, and a nesting of product lists. It should cover most of the situations one would encounter.
The only thing I do not use in this instance is composite keys, but they’re no different from using a single key. I’ve written about composite keys and SQL XML Bulk Load here.
Anyway, let’s tackle the XSD for this file. (If you’re new to SQL XML Bulk Load, please read my SQL XML Bulk Load – basic XSD syntax post first.)

The first tag, ProductID, is simple enough and covered in the basic syntax blog post.
It requires a database table containing this field. Any subsequent values placed on the same level of the XML file will be mapped into the same table.
So we’ll make a Product table:
CREATE TABLE Product (
    ProductID INT NOT NULL
)

The XSD field will then be mapped to this table. And because I’ve named the field the same as the tag in the XML file, I do not need to map the field explicitly.
This gives me the segment:
<xs:element name="Product" sql:relation="Product" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="ProductID" />

The details section is next. For this we need to set up a relationship, because the section can be repeated several times, so we need to map it out into a related table.
For this section we’ll need the table:
CREATE TABLE Details (
    Name VARCHAR(MAX) NULL,
    [Description] VARCHAR(MAX) NULL,
    LanguageCode VARCHAR(5) NOT NULL,
    ProductID INT NOT NULL
)

This can then hold the Name and Description tags from the section, the “LanguageCode” attribute, and the ProductID key mapping it to the Product table/Product section.
Because we need a relationship we’ll have to map it out:
<sql:relationship name="Detail"
    parent="Product"
    parent-key="ProductID"
    child="Details"
    child-key="ProductID" />

And this then provides the following section of the XSD:
<xs:element name="Details" sql:relation="Details" sql:relationship="Detail">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Name" />
      <xs:element name="Description" />
    </xs:sequence>
    <xs:attribute name="LanguageCode" type="xs:string" use="required" />
  </xs:complexType>
</xs:element>

As can be seen, we map the section to the relationship and to the table relation.
For the Files section, we’ll need to expand a little more. Again we need a relationship to tell the bulk load which keys to use.
First the table to import into:
CREATE TABLE Files (
    ProductID INT NOT NULL,
    [Type] VARCHAR(255) NULL,
    [FileName] VARCHAR(255) NULL
)

Then the relationship in the XSD file
<sql:relationship name="ProductFiles"
    parent="Product"
    parent-key="ProductID"
    child="Files"
    child-key="ProductID" />

Just like the previous one.

The XSD section will then look like the following:
<xs:element name="ProductFiles" sql:is-constant="1">
  <xs:complexType>
    <xs:sequence>
      <xs:element maxOccurs="unbounded" name="FileName"
                  sql:relationship="ProductFiles" sql:relation="Files">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute name="type" type="xs:string" use="required" />
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>


We have is-constant="1" because this section can only exist once. The relationship and relation are put on the “FileName” element. The “tricky” part here is the xs:extension, which is needed because otherwise the Bulk Load will throw an error saying you’re trying to map multiple FileName elements.

On to the List section.
Now this is a more difficult one, because here we’ll need a double relationship. We need to map the ProductID to the list, and then the list to the multiple ProductIDs within that section. So we need the following tables:
CREATE TABLE ProductList (
    [ID] INT IDENTITY(1,1) NOT NULL,
    ProductID INT NOT NULL,
    Header VARCHAR(255) NULL
)

CREATE TABLE ProductListProduct (
    ProductListID INT NOT NULL,
    ProductID INT
)

Here we make an identity column, ID, in ProductList, which will be the foreign key in the ProductListProduct.ProductListID column.
So we need the following two relationships in our XSD to map it out:
<sql:relationship name="ProductToList"
    parent="Product"
    parent-key="ProductID"
    child="ProductList"
    child-key="ProductID" />

<sql:relationship name="ListToProduct"
    parent="ProductList"
    parent-key="ID"
    child="ProductListProduct"
    child-key="ProductListID" />

The only difference here is that the parent-key for ListToProduct is the identity column from the ProductList table, and it will be used as the child-key in the ProductListID column.
And this gives the following XSD section:

<xs:element name="ProductList" sql:is-constant="1">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="List" sql:relationship="ProductToList" sql:relation="ProductList">
        <xs:complexType>
          <xs:sequence>
            <xs:element maxOccurs="unbounded" name="ProductID" type="xs:integer"
                        sql:relationship="ListToProduct" sql:relation="ProductListProduct" />
          </xs:sequence>
          <xs:attribute name="header" type="xs:string" use="required" />
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Again we have the is-constant annotation.
We then map the List element to the relationship ProductToList defined previously, and to the table ProductList. Because header is an attribute named the same as the field in the database, it’ll map automatically as usual.
We then map the inner “ProductID” element within the List element to the next relationship, ListToProduct, and to the table ProductListProduct.

The SQL XML Bulk Load will then automatically handle the identity column in the ProductList table and map it to the ProductListID column in the ProductListProduct table.

This gives us the following complete XSD file.
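Assembled from the fragments above, the full schema looks roughly like the sketch below. I’ve placed the sql:relationship definitions in the xs:annotation/xs:appinfo header, which is where SQLXML mapping schemas normally declare them; the exact original file may differ slightly in ordering:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
  <xs:annotation>
    <xs:appinfo>
      <sql:relationship name="Detail"
          parent="Product" parent-key="ProductID"
          child="Details" child-key="ProductID" />
      <sql:relationship name="ProductFiles"
          parent="Product" parent-key="ProductID"
          child="Files" child-key="ProductID" />
      <sql:relationship name="ProductToList"
          parent="Product" parent-key="ProductID"
          child="ProductList" child-key="ProductID" />
      <sql:relationship name="ListToProduct"
          parent="ProductList" parent-key="ID"
          child="ProductListProduct" child-key="ProductListID" />
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="Product" sql:relation="Product" maxOccurs="unbounded">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="ProductID" />
        <xs:element name="Details" sql:relation="Details" sql:relationship="Detail">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Name" />
              <xs:element name="Description" />
            </xs:sequence>
            <xs:attribute name="LanguageCode" type="xs:string" use="required" />
          </xs:complexType>
        </xs:element>
        <xs:element name="ProductFiles" sql:is-constant="1">
          <xs:complexType>
            <xs:sequence>
              <xs:element maxOccurs="unbounded" name="FileName"
                          sql:relationship="ProductFiles" sql:relation="Files">
                <xs:complexType>
                  <xs:simpleContent>
                    <xs:extension base="xs:string">
                      <xs:attribute name="type" type="xs:string" use="required" />
                    </xs:extension>
                  </xs:simpleContent>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="ProductList" sql:is-constant="1">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="List" sql:relationship="ProductToList" sql:relation="ProductList">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element maxOccurs="unbounded" name="ProductID" type="xs:integer"
                                sql:relationship="ListToProduct" sql:relation="ProductListProduct" />
                  </xs:sequence>
                  <xs:attribute name="header" type="xs:string" use="required" />
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>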

And when running the bulk load with this XML and this XSD we get the following result:

[Image: result of importing with SQL XML Bulk Load]

A quick note: when you need composite keys in your relationship in SQL XML Bulk Load, the syntax is as follows:

<sql:relationship name="Detail"
    parent="Import.Product"
    parent-key="ProductID FileID"
    child="Import.ProductDetails"
    child-key="ProductID FileID" />

meaning the key columns are simply space separated in the relationship annotation. The composite key in this case consists of the columns ProductID and FileID.
As can also be seen in this annotation, I’m using a schema qualification with Import, so if you have other schemas, you can simply write them in the parent/child attributes.

For more syntax, please refer to the blog post SQL XML Bulk Load – basic XSD syntax

Mar 20 2010

The OUTPUT clause is a really nifty thing in SQL Server 2008, which gives you access to the same “inserted” and “deleted” tables you get via triggers, but from within the query itself.
To illustrate, suppose we have a table:

CREATE TABLE [dbo].[OutputEx](
    [Key] [int] IDENTITY(1,1) NOT NULL,
    [Value] [nchar](10) NULL,
    [SecondValue] [datetime2](7) NULL,
    CONSTRAINT [PK_OutputEx] PRIMARY KEY CLUSTERED
    (
        [Key] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
            ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[OutputEx] ADD CONSTRAINT [DF_OutputEx_SecondValue] DEFAULT (GETDATE()) FOR [SecondValue]

This is a table which has an identity column in [Key] and a datetime value field in SecondValue, which defaults to the current date (GETDATE()).

Now, if I insert a [Value] into this table, SQL Server will automatically insert the identity value and the current date into SecondValue, but if I wanted to get these automated values out, I would have had to select them manually afterwards.
This might not seem like much of a problem given these two values (identity and date); however, there might be much more complex constraints and default values involved.

So what I can do is use the OUTPUT clause like this:

INSERT INTO OutputEx (Value)
OUTPUT inserted.[Key], inserted.SecondValue
VALUES ('a')

This will provide me with a result containing the identity value inserted into [Key] and the date value inserted into SecondValue.
I can use this result in an outer query by nesting the insert in a sub-query (a sketch of that follows below), or I can insert it into another table directly like this:

DECLARE @T TABLE (ID INT, [Date] DATETIME2)

INSERT INTO OutputEx (Value)
OUTPUT inserted.[Key], inserted.SecondValue INTO @T
VALUES ('a')

My table variable @T will now hold the values inserted.
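As a sketch of the nested variant mentioned above (composable DML; the outer INSERT...SELECT consumes the columns produced by the inner OUTPUT clause – the table variable and aliases here are just for illustration):

DECLARE @T2 TABLE (ID INT, [Date] DATETIME2)

-- The outer INSERT ... SELECT reads the rows the inner OUTPUT clause produces
INSERT INTO @T2 (ID, [Date])
SELECT [Key], SecondValue
FROM (
    INSERT INTO OutputEx (Value)
    OUTPUT inserted.[Key], inserted.SecondValue
    VALUES ('b')
) AS ins ([Key], SecondValue)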
This also works with the “deleted” table instead of the inserted one, and thus with UPDATE and DELETE queries as well.
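A quick sketch with “deleted” on a DELETE against the same example table:

DECLARE @Removed TABLE (ID INT, [Date] DATETIME2)

-- "deleted" exposes the rows as they looked before removal
DELETE FROM OutputEx
OUTPUT deleted.[Key], deleted.SecondValue INTO @Removed
WHERE [Value] = 'a'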
It is a useful technique.

I’m in the process of upgrading to the new blogengine.net version, and in that process there might be some issues while I get it all set up.

I’m also going to clean up the comment spam pretty heftily – so in case your legit comment vanishes, I’m sorry in advance, but I just had to prune some 3000 spam comments.
I’ll also look at incorporating a captcha as well as the automated spam detection built into the engine, so hopefully comment spam and moderation will be easier.

And finally – I’m going to try out feedburner for my RSS feed, so I hope I do not break anything by it :)

Thanks for your patience.

I have been working with SQL XML Bulk Load (version 4.0) for a while now, and suddenly I encountered an issue where the imported data didn’t match the data in the XML files I received.
Looking closer into the issue, I discovered it was because the SQL XML Bulk Load removed my line breaks entirely from the imported data.

In XML, line breaks are by the standard defined using only “line feed” (LF). However, working in a Windows environment, line breaks are almost always carriage return and line feed combined (CR+LF).
When the SQL XML Bulk Load parser (I would think it is a parsing issue) encounters a CR+LF in a node value, it strips both characters from the text, which results in the data being imported without the line break. If, however, the parser encounters only an LF, that LF is retained in the imported data.

Now this seemed like strange behaviour: why would both CR and LF be removed, and not only the CR, if it was meant to follow the standard? And even more annoying, when CR+LF is used for line breaks most everywhere else in Windows environments, why remove those two characters to begin with?

I contacted Microsoft technical support, which seems to have confirmed that it isn’t an error on my part, and that they may look at it for the next release. (*fingers crossed*)
Whether or not they view it as a bug I do not know – but there is definitely something wrong there. And I would call it a bug, because removing information can’t be intentional.

To illustrate my issue I made a very simple XML file containing two nodes:

<root>
<myNode>Before CRLF
After CRLF
</myNode>
<myNode>Before LF
After LF
</myNode>
</root>

In this file, the first line break was made with a CR+LF (or rather Environment.NewLine via .NET, but parsing it matches the CR+LF characters) and the second was made with only a line feed as the line break.
I then imported that file into a database using the schema file:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dt="urn:schemas-microsoft-com:datatypes"
           xmlns:sql="urn:schemas-microsoft-com:mapping-schema">

  <xs:element name="root" sql:is-constant="1">
    <xs:complexType>
      <xs:sequence>
        <xs:element xml:space="preserve" type="xs:token" dt:type="string"
                    name="myNode" sql:relation="LineBreak" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Here I tried all sorts of XSD “keep my line breaks as they are” tricks (xml:space="preserve" and so on).
I then ran the import (vb.net code)

Dim strInFile As String = "c:\linebreak.xml"
Dim strSchema As String = "c:\linebreak.xsd"
Dim objBL As Object = CreateObject("SQLXMLBulkLoad.SQLXMLBulkload.4.0")
Try
    objBL.ConnectionString = "connection string"
    objBL.KeepIdentity = False
    objBL.XMLFragment = True
    objBL.ErrorLogFile = "error log file path"
    objBL.Execute(strSchema, strInFile)
Catch ex As Exception
    Debug.WriteLine(ex.ToString)
Finally
    If Not objBL Is Nothing Then
        objBL = Nothing
    End If
End Try

And when I viewed the imported data, the CR+LF was removed from the first “myNode”, so it had no breaks whatsoever, while in the second “myNode” the LF was retained.
I was asked by the Microsoft employee I was in contact with to create the issue on Microsoft Connect, so I eagerly await what comes of this in the next release(s) of SQL XML Bulk Load. Hopefully they’ll have it fixed.

The only workaround I have found myself is to pre-/post-process the files. Before importing, run through the XML file and replace your line breaks with some special character not present in the file (usually ¤ is a good candidate) – and once the import has run, post-process your data and replace the used character with CR+LF again. A sketch of the post-processing step follows below.
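As a minimal sketch of the post-processing step in T-SQL, assuming ¤ was used as the placeholder and reusing the Details table and Description column from the earlier example (adjust to your own schema):

-- Swap the placeholder back to a real CR+LF after the bulk load has run
UPDATE Details
SET [Description] = REPLACE([Description], N'¤', CHAR(13) + CHAR(10))
WHERE [Description] LIKE N'%¤%'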
It is not a beautiful solution, but it works around the issue and helped me move forward with the project.

I was stuck with a problem at work where I had a very complex query pulling data out of some tables, some XML and whatnot, with a lot of data manipulation in the query, to make it easier to populate a data warehouse.
I then had to expand this query and pull out some extra Boolean information, based on a logical comparison of two strings, which I would normally do with a CASE, like CASE WHEN string1 = string2 THEN 1 END.

However, because I had a GROUP BY clause, I could not simply do this: the columns the strings were taken from were not in the group clause, and I didn’t want them there.
I then found out that you can actually put an aggregate function around the CASE, using the following syntax:

SELECT
    <GroupByFields>,
    MAX(
        CASE
            WHEN <NonGroupField> = 'some value' THEN 1
            ELSE 0
        END
    )
FROM <Table>
GROUP BY <GroupByFields>

This saved me from using a common table expression or two more, and kept things simple. In essence, this also allows you to make conditional SUMs and the like using the same technique, as sketched below.
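For instance, a conditional count per group could look like this (the Orders table and Status column are hypothetical example names):

-- Count shipped orders per customer without adding Status to the GROUP BY
SELECT CustomerID,
       SUM(CASE WHEN Status = 'Shipped' THEN 1 ELSE 0 END) AS ShippedOrders
FROM Orders
GROUP BY CustomerID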
Just more evidence that a lot is possible in SQL, and just trying something often reveals interesting results.

A common problem to solve is to list, for example, the current price for a product, or the latest order for a customer, or similar.
It is a problem which can quickly seem complex, but once you understand the situation it is – like everything else – relatively simple. I’ll show how to solve it both using a window function (in this instance, RANK) from SQL Server 2005+, and using a sub-query in case you’re working on an earlier version or another database.

I’ll base this on the Northwind database, which I’ve installed on a SQL Server 2008.
It has an Orders table with a foreign key to a Customers table; however, for this problem the Orders table is the only one of interest.
The Orders table contains the following information:
SELECT [OrderID]
      ,[CustomerID]
      ,[EmployeeID]
      ,[OrderDate]
      ,[RequiredDate]
      ,[ShippedDate]
      ,[ShipVia]
      ,[Freight]
      ,[ShipName]
      ,[ShipAddress]
      ,[ShipCity]
      ,[ShipRegion]
      ,[ShipPostalCode]
      ,[ShipCountry]
FROM [Northwind].[dbo].[Orders]

In this situation, we’ll get the latest order per customer, but the problem is the same as if you’d need the current price – the only thing which differs will be the tables and where clauses and so on.
Solving the issue using the RANK window function, and a common table expression, it would look like this:

;WITH CTE AS (
    SELECT
        RANK() OVER (PARTITION BY T1.CustomerID ORDER BY OrderDate DESC) AS OrderRank,
        T1.*
    FROM Orders T1
)
SELECT * FROM CTE
WHERE OrderRank = 1
ORDER BY CustomerID

What happens here is that we use RANK to give us a number, partitioned (grouped) by CustomerID and ordered by OrderDate descending. This gives each selected row the rank of that order historically, from newest to oldest. That means each row which has rank 1 will be the latest/newest order.
You can then expand on the joins inside/outside the common table expression to get information about customers or order details or what not.

It is also solvable without using the window function RANK, and would look for example something like this:

SELECT *
FROM Orders T1
INNER JOIN (
    SELECT CustomerID, MAX(OrderDate) AS OrderDate
    FROM Orders T2
    GROUP BY T2.CustomerID
) T3 ON T3.OrderDate = T1.OrderDate AND T1.CustomerID = T3.CustomerID
ORDER BY T1.CustomerID

Here we utilize a sub-query where we select the CustomerID and the highest order date from Orders, which we then can join into Orders again with a self-join on customer id and order date.

Note that these are just examples. Many similar solutions exist, but they all follow the same methodology.
I’ve also not taken optimization or anything like that into account; this was just to illustrate a solution to a common problem.

Seeing as I have now worked a fair amount with SQL XML Bulk Load, I have decided to start compiling a list of the errors I got and what I did to solve them, to help myself (and others) the next time I encounter one of these errors.

The statement has been terminated.
This usually means some SQL error has occurred within the bulk load, for example a wrong data type, truncation of data and so on.
Check the ErrorLog file for more information.

Schema: the relationship XXX referred to in the node ‘YYY’ is not defined.
This means – as the error says – that a relationship is missing in the schema file. More often than not, it means I’ve misspelled something in my relationship definitions or where I use the relationship. Otherwise, make sure your relationship is defined if it is missing.

Schema: the annotation ‘field’ on the empty content element ‘XXX’ is not supported.
The “field” annotation is set on what is a container tag. You can’t assign a field to a container tag; the annotation needs to be set on an element.

Schema: the parent/child table of the relationship on ‘XXX’ does not match.
This, for me, usually means that I’ve forgotten to add the relation on the field which has a relationship, or that I’m accessing the wrong tables.

The column ‘XXX’ was defined in the schema, but does not exist in the database.
Well, obviously, this means that there’s a field mapping in the schema file which doesn’t match the database. Check the spelling of the field/column name in the schema file.

Invalid character value for cast specification.
This means that you’re trying to make an invalid cast, for example trying to cast a string into a number field.

Sep 04 2009

As mentioned – casually – in the MSDN documentation, the IsNumeric function will return 1 for some values which aren’t actually numbers. The currency sign $ is mentioned, as are plus/minus.
Period and comma will also return 1.

This essentially means that you can’t be sure a value which passes the IsNumeric check is actually a number/numeric value.
Least of all, you can’t be assured that the semantic value returned is what you expect.
This can – if you’re unaware and not paying attention – be a problem when dealing with number values for countries which do not follow the same period/decimal convention as the US.

Also – “funnily” enough – due to the caveats in IsNumeric, you can’t actually be assured that the value can be converted to a Numeric data type. (It can normally always be converted into Money, but then the name is misleading :) )
To illustrate, consider the following snippet:

DECLARE @VAR VARCHAR(20)
SET @VAR = '€,.,,'

This will pass an IsNumeric check with the result 1, but it can’t be cast into a numeric/decimal. It can be cast into money, but will return 0.00.
And that is despite it not being, in any form, an actual numeric value.
There are some semantic checks built into IsNumeric – for example, you can’t have values in front of the currency sign – so

SET @VAR = '123$'

would fail an IsNumeric check. There are also some checks on the number of plus/minus signs and so on.
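To see the caveats in one batch – the expected results in the comments follow the behaviour described above:

-- Values that pass or fail ISNUMERIC in surprising ways
SELECT ISNUMERIC('$')    AS CurrencySign,  -- 1
       ISNUMERIC('.')    AS PeriodOnly,    -- 1
       ISNUMERIC(',')    AS CommaOnly,     -- 1
       ISNUMERIC('123$') AS TrailingSign   -- 0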
So when using IsNumeric, be careful and don’t just accept a successful check as actual success.

Some time ago Google announced more insight into how the nofollow attribute works, and since then I’ve received an explosion in comment spam.

So now I’ve taken the liberty of enabling comment moderation, to stem this increase in spam – too bad people have to ruin it for others.

Funnily enough, some of these spamming idiots have the gall to put “nospam” in as their e-mail address… so I guess spam is an issue for them as well.

With any luck, moderated comments will just be temporary – but then, it never hurt anybody to wait a little while for a comment to go on-line.