Handling namespaces in SQL XML Bulk Load can be a headache as there’s not really much documentation about it. However while laborious, it is basically following the normal XML Schema structure.
I’ll not go deep into the SQL XML Bulk Load syntax in this blog post, if you’re looking for that, I’ll advice you to read my SQL XML Bulk Load – basic XSD syntax or SQL XML Bulk Load – more XSD syntax blog posts before this one.

I’ve made this simplified example to illustrate. Suppose you have an XML file something akin to this:

<?xml version="1.0" encoding="utf-8"?>
<Catalogue xmlns:sdt="urn:oasis:names:specification:ubl:schema:xsd:SpecializedDatatypes-2"
           xmlns="urn:oasis:names:specification:ubl:schema:xsd:Catalogue-2"
           xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
           xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Catalogue-2 UBL-Catalogue-2.0.xsd">
  <cbc:UBLVersionID>2.0</cbc:UBLVersionID>
   <cac:ValidityPeriod>
    <cbc:StartDate>2011-03-02</cbc:StartDate>
  </cac:ValidityPeriod>
   <cac:CatalogueLine>
    <cbc:ID>11002</cbc:ID>    
  </cac:CatalogueLine>
   <cac:CatalogueLine>
    <cbc:ID>11003</cbc:ID>
  </cac:CatalogueLine>
</Catalogue>

I have two custom namespaces, and a mixture of the elements within each namespace. This is simplified, but once the logic is known, it is just expanding it with more namespaces, and more mapping.

First we need to make the base XML schema/XSD file. A way to handle namespaces in the XSD file is to use import statements and split each namespace element in to its own separate file. And that technique works for SQL XML Bulk Load as well.
So we’ll make the base XSD like this

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"            
           xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
           xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sql="urn:schemas-microsoft-com:mapping-schema">

  <xs:annotation>
    <xs:appinfo>
      <sql:relationship name="SomeName"
                        parent="Import"
                        parent-key="PK"
                        child="Import2"
                        child-key="FK" />
    </xs:appinfo>
  </xs:annotation>

  <xs:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" schemaLocation="cbc.xsd" />
  <xs:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" schemaLocation="cac.xsd" />

  <xs:element name="Catalogue" sql:relation="Import">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="cbc:UBLVersionID" />
        <xs:element ref="cac:ValidityPeriod" />
        <xs:element maxOccurs="unbounded" ref="cac:CatalogueLine" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Notice that each of the 3 elements simply has a reference to another element. So the XML tag cbc:UBLVersionID will in the XSD simply be element which ref=”cbc:UBLVersionID”, just as cac:ValidityPeriod element is also just a reference.
The key here however is that I’ve imported two namespaces with a schema location

 <xs:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" schemaLocation="cbc.xsd" />
<xs:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" schemaLocation="cac.xsd" />

These lines will tell the base schema file where to look for the specific definitions for the namespaces.

Now, I can make a specific schema file for the cbc namespace and the cac namespace.
The cac one looks like

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:tns="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
                attributeFormDefault="unqualified" elementFormDefault="qualified"
                targetNamespace="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
           xmlns:sql="urn:schemas-microsoft-com:mapping-schema"
           xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" schemaLocation="cbc.xsd" />

  <xs:element name="ValidityPeriod" sql:is-constant="1" >
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="cbc:StartDate" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="CatalogueLine" sql:relationship="SomeName" sql:relation="Import2">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="cbc:ID"  />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Here the same principle of using import statement to map a namespace

<xs:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" schemaLocation="cbc.xsd" />

is used similar to the base file.

The cac file is used to define the contents of all cac namespace elements (the naming of the file is irrelevant, I simply named them this way to help myself keeping track). And where needed in this cac file, a reference is made to an element in the cbc file (in this case cbc:StartDate and cbc:ID).
The cbc file will then qualify all cbc elements

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:tns="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
           attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sql="urn:schemas-microsoft-com:mapping-schema"

           >
  <xs:element name="UBLVersionID" type="xs:decimal" sql:field="VersionID" />
  <xs:element name="StartDate" type="xs:date" sql:field="DateX" />
  <xs:element name="ID" type="xs:unsignedShort" sql:field="LineID" />
</xs:schema>

As can be seen, the sql:field, sql:relationship and sql:relation are all used across all 3 files.
In the base file, I’ve defined a relationship, and I’m using the relationship in the cac file around a cbc element.

Running this through a SQL XML Bulk Load, and I get my data imported into two tables like this

This was just a simplified example, but I hope it helps to show how the syntax is build up in a XSD and then spice it up with some SQL mapping.

For more insight into SQL XML Bulk Load and the syntax, you can check out my other posts on the subject.

I have posted a couple of times about the XML/XSD syntax used for SQL XML Bulk Load but mostly the syntax have been pretty basic, so I wanted to expand a little on the syntax with a more complex example.

So this time I’ll have an XML file structured like this. As can be seen this XML file consists of a couple of nested tags and is a structure that’s realistic when it comes to data imports. In fact it is a minimization of an existing one I have used for work, where I have removed duplicated structures to provide a cleaner example.
The file consists of repeated details tags with content and attributes used, a product files section containing a repeated tag and a nesting of Product Lists. It should cover most of the situations which one would encounter.
The only thing I do not use in this instance is composite keys, but they’re no different compared to using a singular one. I’ve written about composite keys and SQL XML Bulk Load here.
Anyways, let’s tackle the XSD for this file. (If new to the SQL XML Bulk Load, please read my SQL XML Bulk Load – basic XSD syntax post first)

The first tag, ProductID, is simple enough and covered in the basic syntax blog post.
It requires a database table containing this field. Any subsequent values placed on the same level of the XML file would be included into the same table.
So we’ll make a Product table:
CREATE TABLE Product(
ProductID INT
NOT NULL
)

The XSD field will then be mapped to this table. And because I’ve named the field the same as the tag in the XML file I do not need to map the field.
This gives me the segment:
<xs:element Name=“Product” sql:relation=“Product” maxOccurs=“unbounded”>
<
xs:complexType
>
<
xs:sequence
>
<
xs:element Name=“ProductID” />

The details section is next. For this we need to set up a relationship because the section can be repeated several times, so we need to map it out into a related table.
For this section we’ll need the table:
CREATE TABLE Details(
Name VARCHAR(MAX
) NULL,
[Description] VARCHAR(MAX
) NULL,
LanguageCode VARCHAR(5
) NOT NULL,
ProductID INT
NOT NULL
)

This can then hold the Name and Description tags from the section, and the attribute “LanguageCode” and the key ProductID to map it to the Product table/Product section.
Because we need a relationship we’ll have to map it out:
<sql:relationship Name=“Detail”
parent=
“Product”
parent-key=
“ProductID”
child=
“Details”
child-key=
“ProductID”
/>

And this then provides the following section or the XSD
<xs:element Name=“Details” sql:relation=“Details” sql:relationship=“Detail”>
<
xs:complexType
>
<
xs:sequence
>
<
xs:element Name=“Name”
/>
<
xs:element Name=“Description”
/>
</
xs:sequence
>
<
xs:attribute Name=“LanguageCode” Type=“xs:string” use=“required”
/>
</
xs:complexType
>
</
xs:element
>

As can be seen we map the section to the relationship and the table relation.
For the section of Files, we’ll need to expand a little more. Again we need a relationship to tell the bulk load which keys to use.
First the table to import into:
CREATE TABLE Files (
ProductID INT
NOT NULL,
[Type] VARCHAR(255
) NULL,
[FileName] VARCHAR(255
) NULL
)

Then the relationship in the XSD file
<sql:relationship Name=“ProductFiles”
parent=
“Product”
parent-key=
“ProductID”
child=
“Files”
child-key=
“ProductID”
/>

Just like the previous one.

The XSD section will then look like the following:
<xs:element Name=“ProductFiles” sql:is-constant=“1″>
<
xs:complexType
>
<
xs:sequence
>
<
xs:element maxOccurs=“unbounded” Name=“FileName”
sql:relationship=“ProductFiles” sql:relation=“Files”>
<
xs:complexType
>
<
xs:simpleContent
>
<
xs:extension base=“xs:string”
>
<
xs:attribute Name=“type” Type=“xs:string” use=“required”
/>
</
xs:extension
>
</
xs:simpleContent
>
</
xs:complexType
>
</
xs:element
>
</
xs:sequence
>
</
xs:complexType
>
</
xs:element>


We have the is-constant=”1” because this section can only exists once. The relationship and relation is put on the “FileName” element. The “tricky” part here is the xs:extension which is needed because otherwise the Bulk Load will throw an error that you’re trying to map multiple FileName elements. <ERROR>

On to the List section.
Now this is a more difficult one, because here we’ll need a double relationship. We need to map the ProductID to the Type and then the Type to the multiple ProductID’s within that section. So we need the following tables:
CREATE TABLE ProductList (
[ID] INT IDENTITY(1,1
) NOT NULL,
ProductID INT
NOT NULL,
Header VARCHAR(255
) NULL
)
CREATE TABLE ProductListProduct
(
ProductListID INT
NOT NULL,
ProductID
INT
)

Here we make an Identity ID in the ProductList which will be the foreign key in the ProductListProduct:ProductListID column.
So we need the two following relationships in our XSD
<sql:relationship Name=“ProductToList”
parent=
“Product”
parent-key=
“ProductID”
child=
“ProductList”
child-key=
“ProductID”
/>

<sql:relationship Name=“ListToProduct”
parent=
“ProductList”
parent-key=
“ID”
child=
“ProductListProduct”
child-key=
“ProductListID”
/>

To map the relationship. The only difference here is that the parent-key for ListToProduct is the identity column from the ProductList table, and it will be used as the child-key in the column ProductListID.
And this gives the following XSD section

<xs:element Name=“ProductList” sql:is-constant=“1″>
<
xs:complexType
>
<
xs:sequence
>
<
xs:element Name=“List” sql:relationship=“ProductToList” sql:relation=“ProductList”
>
<
xs:complexType
>
<
xs:sequence
>
<
xs:element maxOccurs=“unbounded” Name=“ProductID” Type=“xs:integer”
sql:relationship=“ListToProduct” sql:relation=“ProductListProduct”
/>
</
xs:sequence
>
<
xs:attribute Name=“header” Type=“xs:string” use=“required”
/>
</
xs:complexType
>
</
xs:element
>
</
xs:sequence
>
</
xs:complexType
>
</
xs:element
>

Again we have the is-constant annotation.
We then map the element List to the relationship ProductToList as defined previously and to the table ProductList. Because header is then an attribute named the same as the field in the database, it’ll map automatically as usual.
However we’ll then map the inner element “ProductID” within the List element to the next relationship ListToProduct and the table ProductListProduct.

The SQL XML Bulk Load will then automatically handle the identity column in the ProductList table and map it to the ProductListID column in the ProductListProduct table.

This gives us this following complete XSD file

And when running the bulk load with this XML and this XSD we get the following result:

Result for importing SQL XML Bulk Load

A little quick note, when needing composite keys in your relationship in SQL XML Bulk Load the syntax is as follow:

<sql:relationship name=“Detail”
parent
=
“Import.Product”
parent
-KEY=
“ProductID FileID”
child
=
“Import.ProductDetails”
child
-KEY=“ProductID FileID”/>

meaning they’re simply space separated in the relationship annotation. The composite key in this aspect are the columns ProductID and FileID.
As can also be seen in this annotation, I’m using a schema qualifiaction with Import, so if you have other schemas, you can simply write them in the parent/child.

For more syntax, please refer to the blog post SQL XML Bulk Load – basic XSD syntax

I have been working with SQL XML Bulk Load (version 4.0) over the last period of time, and suddenly I encountered an issue where the imported data didn’t match the data in the XML files I received.
Looking closer into the issue, I discovered it was because the SQL XML Bulk Load removed my line breaks entirely from the imported data.

In XML line breaks are defined as standard using only “line feed” (LF). However working in a windows environment line breaks are almost always both carriage return and line feed combined (CR+LF).
However when the SQL XML Bulk Load parser (I would think it is a parsing issue) encounters a CR+LF in a node value it strips both characters from the text which results in the data being imported without the line break. If however the parser encounters only a LF, then that LF is retained in the imported data.

Now this seemed like a strange behaviour, because why would both CR and LF be removed and not only the CR, if it was to rely on the standard. However even more annoying, when the CR+LF is used as line breaks most every other place in Windows environments, why even remove those two characters to begin with?

I contacted Microsoft technical support which seems to have confirmed that it isn’t an error on my account, and that they maybe would look at it for the next release.  (*fingers crossed*)
Whether or not they view it as a bug I do not know – but there is definitely something wrong there. And I would call it a bug, because removing extra information can’t be intentional.

To illustrate my issue I made a very simple XML file containing two nodes:

<root>
<
myNode>Before CRLF
After CRLF
</myNode>
<
myNode>Before LF
After LF
</myNode>
</
root>

In this file, the first line break was made with a CR+LF (or rather Environment.NewLine via .NET, but parsing it matches CR+LF characters)  and the second was made only with the line feed as a line break.
I then imported that file into a database using the schema file:

<?xml version=1.0 encoding=utf-8?>
<
xs:schema xmlns:xs=http://www.w3.org/2001/XMLSchema
xmlns:dt=urn:schemas-microsoft-com:datatypes
xmlns:sql=urn:schemas-microsoft-com:mapping-schema>

<xs:element name=root sql:is-constant=1>
<
xs:complexType>
<
xs:sequence>
<
xs:element xml:space=preserve type=xs:token dt:type=string name=myNode sql:relation=LineBreak />
</
xs:sequence>
</
xs:complexType>
</
xs:element>
</
xs:schema>

Where I tried to do all sorts of XSD “keep my line breaks as they are” tricks (space=”preserve” and so on).
I then ran the import (vb.net code)

Dim strInFile As String = “c:\linebreak.xml”
Dim strSchema As String =
“c:\linebreak.xsd”
Dim objBL As Object = CreateObject(“SQLXMLBulkLoad.SQLXMLBulkload.4.0″
)
Try
objBL.ConnectionString
=
“connection string”
objBL.KeepIdentity
= False
objBL.XMLFragment
= True
objBL.ErrorLogFile =
“error log file path”
objBL.Execute(strSchema, strInFile
)
Catch ex As
Exception
Debug.WriteLine(ex.ToString
)
Finally
If Not objBL
Is Nothing Then
objBL
= Nothing
End If
End
Try

And when I viewed the imported data, the CR+LF was removed from the first “myNode” and it had no breaks what so ever, and in the second “myNode” the LF was retained.
I was asked by the Microsoft employee I was in contact with to create the issue on Microsoft Connect, so I eagerly await what comes of this for the next release(s) of SQL XML Bulk Load. Hopefully they’ll have this fixed.

The only work around I have found myself is to pre-/post process the files. Before importing, run through the XML file and replace your line breaks with some special character not present in the file (usually ¤ is a good candidate) – and once the import has run, then post process your data and replace the used character with CR+LF again.
It is not a beautiful solution, but it works around the issue and helped myself move forward with the project.

Some time ago I started a blog post on how to use SQL XML Bulk Load for SQL Server, and this time around I’ll tackle some of the syntax for how to map a XML file to the Bulk Load.

When mapping an XML file to the schema for XML SQL Bulk Load the easiest method is to get an IDE (for example Visual Studio) to create the schema file for you based on your XML, if you don’t have the schema outright.

In this blog post, I’ll use this XML file as an example, because it holds both how to read fields, attributes and relationships, and thus should cover many of the basics of the bulk load. I made my schema file via Visual Studio, and once you have a schema file you can start mapping the elements to the database tables and columns.

The first things first is to import the namespace: urn:schemas-microsoft-com:xml-sql into the file with a prefix. Usually I suggest (as others do) using “sql” for this prefix, simply to illustrate clearly that the namespace is tied to the bulk import and not to confuse it with other namespaces.
Now –a word of warning. The bulk load can only insert values into the database, so the way I normally handle imports in a production environment is to make an “import structure” which mirrors the imported data and will act as temporary storage. Then I make some SQL scripts to moving the data over into the production tables from my temporary import structure.
This also allows for interjecting a sort of validation layer for your data if you choose.

For this xml import, I make this import structure in the database:

CREATE TABLE [dbo].[BulkLoadedProduct](
[InternalProductID] [int] IDENTITY(1,1
) NOT NULL,
[ProductID] [nvarchar](200
) NOT NULL
)
ON
[PRIMARY]

CREATE TABLE [dbo].[BulkLoadedProductDetails](
[InternalProductID] [int]
NOT NULL,
[LanguageCode] [varchar](5
) NOT NULL,
[Name] [nvarchar](255
) NOT NULL,
[ProductDescription1] [nvarchar](MAX
) NOT NULL
)
ON [PRIMARY]

Now the sharp observer will notice some inconsistency in the naming of my columns; it is quite intentional, because I’ll use it later to illustrate how to map elements to fields in two different methods.

Just to recap, the method of calling the Bulk Load was written in an earlier blog piece which can be found here.

So, down to business. The first keyword which is needed to be mentioned is the “is-constant“.
This is used to indicate an element which is only appearing once per “relationship”, and which isn’t to be mapped to a field in the database.
So in my instance the “Products” root element is one of those, so I annotate this field with the is-constant=”1″ like this:
<xs:element Name=“Products” sql:is-constant=“1″>

The next annotation which is important to know is “relation“. This is used to map elements to a table in the database. So in this case, I want to map Products elements to the table BulkLoadedProduct which happens like this:
<xs:element Name=“Product” maxOccurs=“unbounded” sql:relation=“BulkLoadedProduct”>

After this we need to tell the bulk load where to put the field “ID” into the database. That’s done with the sql:field annotation like this:
<xs:element Name=“ID” Type=“xs:string” sql:field=“ProductID” />

Right now I want to introduce an annotation called “mapped” which tells the bulk load that the field is not mapped to the database. So adding the sql:mapped=”0″ (or false if you’d rather use that) to an element then you’ll not get the field into the database.
So if we for example add the sql:mapped=”0″ to the element LanguageValues like this
<xs:element Name=“LanguageValues” sql:mapped=“0″>
then it will tell the bulk load to ignore all child structure under that element.
But seeing as I want the details imported, I’ll skip that part myself – but it is useful for debugging purpose as well, so it is important to know.

So, the next problem is getting the details loaded. This is done with relationships.
So remove the mapped annotation and write the following in the top of your schema file. I personally prefer to keep the relationships in the top of my schema file for easier reference, but they can be added directly on the element. I’ll however only show the one way of doing it.
<xs:annotation>
<
xs:appinfo
>
<
sql:relationship Name=
“Detail”
parent=
“BulkLoadedProduct”
parent-key=
“ProductID”
child=
“BulkLoadProductDetails”
child-key=“ProductID”
/>
</
xs:appinfo
>
</
xs:annotation>

This tells the bulk load to create a relationship between the two tables based on the parent/child key naming.
This relationship is then added to the LanguageValue element:
<xs:element maxOccurs=“unbounded” Name=“LanguageValue”
sql:relation=“BulkLoadProductDetails” sql:relationship=“Detail”>

One thing to be wary of is that the more relationships you have, i.e. a more complex XML structure, the slower your import will be. So if speed is important, make your XML import file simple as possible, then you can load big files very fast.

Finally, as I mentioned earlier, you can map elements to columns in two different ways. One was via the “field” annotation, however if the element is called the very same as the column in the target table relation, you do not need to add a field. So, for the fields Name and LangaugeCode, I do not need to add any “sql:field” annotation because the Bulk Load automatically will map it to the fields in the table when they’re named the same. This can simplify a schema file tremendously, so it is worth remembering.

Anyways, putting it all together and I end up with the following schema file.
Running the XML through the XML SQL Bulk Load with the scehma file, and I end up with a result which looks like this:

Result SQL selection from XML SQL Bulk Load
(click for larger view)

and now I’ve successfully imported the data from my XML into the database.