SQL XML Bulk Load - more XSD syntax

by Allan Svelmøe Hansen April 17, 2010 13:43

I have posted a couple of times about the XML/XSD syntax used for SQL XML Bulk Load but mostly the syntax have been pretty basic, so I wanted to expand a little on the syntax with a more complex example.

So this time I’ll have an XML file structured like this. As can be seen this XML file consists of a couple of nested tags and is a structure that’s realistic when it comes to data imports. In fact it is a minimization of an existing one I have used for work, where I have removed duplicated structures to provide a cleaner example.
The file consists of repeated details tags with content and attributes used, a product files section containing a repeated tag and a nesting of Product Lists. It should cover most of the situations which one would encounter.
The only thing I do not use in this instance is composite keys, but they’re no different compared to using a singular one. I’ve written about composite keys and SQL XML Bulk Load here.
Anyways, let’s tackle the XSD for this file. (If new to the SQL XML Bulk Load, please read my SQL XML Bulk Load - basic XSD syntax post first)

The first tag, ProductID, is simple enough and covered in the basic syntax blog post.
It requires a database table containing this field. Any subsequent values placed on the same level of the XML file would be included into the same table.
So we’ll make a Product table:
CREATE TABLE Product(
 
ProductID INT 
NOT NULL
)


The XSD field will then be mapped to this table. And because I’ve named the field the same as the tag in the XML file I do not need to map the field.
This gives me the segment:
<xs:element Name="Product" sql:relation="Product" maxOccurs="unbounded">
          <
xs:complexType
>
            <
xs:sequence
>
              <
xs:element Name="ProductID" />
  


The details section is next. For this we need to set up a relationship because the section can be repeated several times, so we need to map it out into a related table.
For this section we’ll need the table:
CREATE TABLE Details(
 
Name VARCHAR(MAX
) NULL,
 
[Description] VARCHAR(MAX
) NULL, 
 
LanguageCode VARCHAR(5
) NOT NULL,
 
ProductID INT 
NOT NULL
)
  


This can then hold the Name and Description tags from the section, and the attribute “LanguageCode” and the key ProductID to map it to the Product table/Product section.
Because we need a relationship we’ll have to map it out:
<sql:relationship Name="Detail"
            
parent=
"Product"
            
parent-key=
"ProductID"
            
child=
"Details"
            
child-key=
"ProductID"
            
/> 
  


And this then provides the following section or the XSD
<xs:element Name="Details" sql:relation="Details" sql:relationship="Detail"
   <
xs:complexType

       <
xs:sequence
>
            <
xs:element Name="Name" 
/>
            <
xs:element Name="Description" 
/> 
       </
xs:sequence

       <
xs:attribute Name="LanguageCode" Type="xs:string" use="required" 
/> 
   </
xs:complexType
>
</
xs:element
> 


As can be seen we map the section to the relationship and the table relation.
For the section of Files, we’ll need to expand a little more. Again we need a relationship to tell the bulk load which keys to use.
First the table to import into:
CREATE TABLE Files 
  
ProductID INT 
NOT NULL, 
  
[Type] VARCHAR(255
) NULL, 
  
[FileName] VARCHAR(255
) NULL
)
  

Then the relationship in the XSD file
<sql:relationship Name="ProductFiles" 
        
parent=
"Product" 
        
parent-key=
"ProductID" 
        
child=
"Files" 
        
child-key=
"ProductID" 
        
/> 
  


Just like the previous one.

The XSD section will then look like the following:
<xs:element Name="ProductFiles" sql:is-constant="1">
  <
xs:complexType
>
    <
xs:sequence
>
      <
xs:element maxOccurs="unbounded" Name="FileName" 
          
sql:relationship="ProductFiles" sql:relation="Files">
        <
xs:complexType
>
          <
xs:simpleContent
>
            <
xs:extension base="xs:string"
>
              <
xs:attribute Name="type" Type="xs:string" use="required" 
/>
            </
xs:extension
>
          </
xs:simpleContent
>
        </
xs:complexType
>
      </
xs:element
>
    </
xs:sequence
>
  </
xs:complexType
>
</
xs:element>


We have the is-constant=”1” because this section can only exists once. The relationship and relation is put on the “FileName” element. The “tricky” part here is the xs:extension which is needed because otherwise the Bulk Load will throw an error that you’re trying to map multiple FileName elements. <ERROR>

On to the List section.
Now this is a more difficult one, because here we’ll need a double relationship. We need to map the ProductID to the Type and then the Type to the multiple ProductID’s within that section. So we need the following tables:
CREATE TABLE ProductList (
                
[ID] INT IDENTITY(1,1
) NOT NULL,
                
ProductID INT 
NOT NULL,
                
Header VARCHAR(255
) NULL
    )
CREATE TABLE ProductListProduct
(
                
ProductListID INT 
NOT NULL,
                
ProductID 
INT
    
  


Here we make an Identity ID in the ProductList which will be the foreign key in the ProductListProduct:ProductListID column.
So we need the two following relationships in our XSD
<sql:relationship Name="ProductToList"
                       
parent=
"Product"
                       
parent-key=
"ProductID"
                       
child=
"ProductList"
                       
child-key=
"ProductID"
                        
/>


<
sql:relationship Name=
"ListToProduct"
                  
parent=
"ProductList"
                  
parent-key=
"ID"
                  
child=
"ProductListProduct"
                  
child-key=
"ProductListID"
                        
/>


To map the relationship. The only difference here is that the parent-key for ListToProduct is the identity column from the ProductList table, and it will be used as the child-key in the column ProductListID.
And this gives the following XSD section

<xs:element Name="ProductList" sql:is-constant="1">
  <
xs:complexType
>
    <
xs:sequence
>
      <
xs:element Name="List" sql:relationship="ProductToList" sql:relation="ProductList"
>
        <
xs:complexType
>
          <
xs:sequence
>
            <
xs:element maxOccurs="unbounded" Name="ProductID" Type="xs:integer" 
             
sql:relationship="ListToProduct" sql:relation="ProductListProduct" 
/>
          </
xs:sequence
>
          <
xs:attribute Name="header" Type="xs:string" use="required" 
/>
        </
xs:complexType
>
      </
xs:element
>
    </
xs:sequence
>
  </
xs:complexType
>
</
xs:element

 
Again we have the is-constant annotation.
We then map the element List to the relationship ProductToList as defined previously and to the table ProductList. Because header is then an attribute named the same as the field in the database, it’ll map automatically as usual.
However we’ll then map the inner element “ProductID” within the List element to the next relationship ListToProduct and the table ProductListProduct.

The SQL XML Bulk Load will then automatically handle the identity column in the ProductList table and map it to the ProductListID column in the ProductListProduct table.

This gives us this following complete XSD file

And when running the bulk load with this XML and this XSD we get the following result:

Result for importing SQL XML Bulk Load

Bookmark and Share DotnetKicks dotnetshoutout

SQL XML Bulk Load, Relationships and composite keys

by Allan Svelmøe Hansen April 14, 2010 10:31

A little quick note, when needing composite keys in your relationship in SQL XML Bulk Load the syntax is as follow:

<sql:relationship name="Detail"
                  parent
=
"Import.Product"
                  parent
-KEY=
"ProductID FileID"
                  child
=
"Import.ProductDetails"
                  child
-KEY="ProductID FileID"/>
  


meaning they’re simply space separated in the relationship annotation. The composite key in this aspect are the columns ProductID and FileID.
As can also be seen in this annotation, I'm using a schema qualifiaction with Import, so if you have other schemas, you can simply write them in the parent/child.

For more syntax, please refer to the blog post SQL XML Bulk Load - basic XSD syntax

Bookmark and Share DotnetKicks dotnetshoutout

SQL XML Bulk Load and line breaks

by Allan Svelmøe Hansen February 09, 2010 13:12

I have been working with SQL XML Bulk Load (version 4.0) over the last period of time, and suddenly I encountered an issue where the imported data didn’t match the data in the XML files I received.
Looking closer into the issue, I discovered it was because the SQL XML Bulk Load removed my line breaks entirely from the imported data.

In XML line breaks are defined as standard using only "line feed" (LF). However working in a windows environment line breaks are almost always both carriage return and line feed combined (CR+LF).
However when the SQL XML Bulk Load parser (I would think it is a parsing issue) encounters a CR+LF in a node value it strips both characters from the text which results in the data being imported without the line break. If however the parser encounters only a LF, then that LF is retained in the imported data.

Now this seemed like a strange behaviour, because why would both CR and LF be removed and not only the CR, if it was to rely on the standard. However even more annoying, when the CR+LF is used as line breaks most every other place in Windows environments, why even remove those two characters to begin with?

I contacted Microsoft technical support which seems to have confirmed that it isn’t an error on my account, and that they maybe would look at it for the next release.  (*fingers crossed*)
Whether or not they view it as a bug I do not know – but there is definitely something wrong there. And I would call it a bug, because removing extra information can’t be intentional.

To illustrate my issue I made a very simple XML file containing two nodes:

<root>
<
myNode>Before CRLF
After CRLF
</myNode>
<
myNode>Before LF
After LF
</myNode>
</
root>

In this file, the first line break was made with a CR+LF (or rather Environment.NewLine via .NET, but parsing it matches CR+LF characters)  and the second was made only with the line feed as a line break.
I then imported that file into a database using the schema file:

<?xml version="1.0" encoding="utf-8"?>
<
xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:dt="urn:schemas-microsoft-com:datatypes"
xmlns:sql="urn:schemas-microsoft-com:mapping-schema"> 

  <xs:element name="root" sql:is-constant="1">
   <
xs:complexType>
    <
xs:sequence>
     <
xs:element xml:space="preserve" type="xs:token" dt:type="string" name="myNode" sql:relation="LineBreak" />
   </
xs:sequence>
  </
xs:complexType>
 </
xs:element>
</
xs:schema>

Where I tried to do all sorts of XSD “keep my line breaks as they are” tricks (space=”preserve” and so on).
I then ran the import (vb.net code)

Dim strInFile As String = "c:\linebreak.xml" 
Dim strSchema As String = 
"c:\linebreak.xsd" 
Dim objBL As Object = CreateObject("SQLXMLBulkLoad.SQLXMLBulkload.4.0"
)
Try
    objBL.ConnectionString 
"connection string"
    
objBL.KeepIdentity 
= False
    
objBL.XMLFragment 
= True
    
objBL.ErrorLogFile 
"error log file path"
    
objBL.Execute(strSchemastrInFile
)
Catch ex As 
Exception
    
Debug.WriteLine(ex.ToString
)
Finally
    
If Not objBL 
Is Nothing Then
        
objBL 
= Nothing
    End If
End 
Try
  

And when I viewed the imported data, the CR+LF was removed from the first “myNode” and it had no breaks what so ever, and in the second “myNode” the LF was retained.
I was asked by the Microsoft employee I was in contact with to create the issue on Microsoft Connect, so I eagerly await what comes of this for the next release(s) of SQL XML Bulk Load. Hopefully they’ll have this fixed.

The only work around I have found myself is to pre-/post process the files. Before importing, run through the XML file and replace your line breaks with some special character not present in the file (usually ¤ is a good candidate) – and once the import has run, then post process your data and replace the used character with CR+LF again.
It is not a beautiful solution, but it works around the issue and helped myself move forward with the project.
 

Bookmark and Share DotnetKicks dotnetshoutout

SQL XML Bulk Load - some common errors

by Allan Svelmøe Hansen October 22, 2009 11:50

Seeing as I have now worked somewhat the SQL XML Bulk load, I have decided to start compiling a list of the errors I got and what I did to solve them to help myself (and others) the next time I encounter one of these errors.

The statement has been terminated.
This usually means some SQL Error has occurred within the bulk load, for example a wrong data type, truncating of data and so on.
Check the ErrorLog file for more information.

Schema: the relationship XXX referred to in the node 'YYY' is not defined.
This means – as the error says – that a relationship is missing in the schema file. Usually more often than not, this means I’ve misspelled something in my relationship definitions or where I use the relationship. Otherwise make sure your relationship is defined if it is missing.

Schema: the annotation 'field' on the empty content element 'XXX' is not supported.
The “field” annotation is set on what is a container tag. You can’t assign a field to a container tag, but it needs to be set on an element.

Schema: the parent/child table of the relationship on 'XXX' does not match.
This for me usually means that I’ve forgotten to add  the relation on the field which has a relationship, or that I’m accessing the wrong tables.

The column 'XXX' was defined in the schema, but does not exist in the database.
Well, obviously, this means that there’s a field mapping in the schema file which doesn’t match the database. Check the spelling of the field/column name in the schema file.

Invalid character value for cast specification.
This means that you’re trying to make an invalid cast, for example trying to cast a string into a number field.

 

Bookmark and Share DotnetKicks dotnetshoutout

SQL XML Bulk Load - basic XSD syntax

by Allan Svelmøe Hansen July 13, 2009 20:05

Some time ago I started a blog post on how to use SQL XML Bulk Load for SQL Server, and this time around I’ll tackle some of the syntax for how to map a XML file to the Bulk Load.

When mapping an XML file to the schema for XML SQL Bulk Load the easiest method is to get an IDE (for example Visual Studio) to create the schema file for you based on your XML, if you don’t have the schema outright.

In this blog post, I’ll use this XML file as an example, because it holds both how to read fields, attributes and relationships, and thus should cover many of the basics of the bulk load. I made my schema file via Visual Studio, and once you have a schema file you can start mapping the elements to the database tables and columns.

The first things first is to import the namespace: urn:schemas-microsoft-com:xml-sql into the file with a prefix. Usually I suggest (as others do) using "sql" for this prefix, simply to illustrate clearly that the namespace is tied to the bulk import and not to confuse it with other namespaces.
Now –a word of warning. The bulk load can only insert values into the database, so the way I normally handle imports in a production environment is to make an "import structure" which mirrors the imported data and will act as temporary storage. Then I make some SQL scripts to moving the data over into the production tables from my temporary import structure.
This also allows for interjecting a sort of validation layer for your data if you choose.

For this xml import, I make this import structure in the database:

CREATE TABLE [dbo].[BulkLoadedProduct](
   
[InternalProductID] [int] IDENTITY(1,1
) NOT NULL,
   
[ProductID] [nvarchar](200
) NOT NULL
ON 
[PRIMARY]

CREATE TABLE [dbo].[BulkLoadedProductDetails]
(
   
[InternalProductID] [int] 
NOT NULL,
   
[LanguageCode] [varchar](5
) NOT NULL,
   
[Name] [nvarchar](255
) NOT NULL,
   
[ProductDescription1] [nvarchar](MAX
) NOT NULL
ON [PRIMARY]
  

Now the sharp observer will notice some inconsistency in the naming of my columns; it is quite intentional, because I’ll use it later to illustrate how to map elements to fields in two different methods.

Just to recap, the method of calling the Bulk Load was written in an earlier blog piece which can be found here.

So, down to business. The first keyword which is needed to be mentioned is the "is-constant".
This is used to indicate an element which is only appearing once per “relationship”, and which isn’t to be mapped to a field in the database.
So in my instance the “Products” root element is one of those, so I annotate this field with the is-constant="1" like this:
<xs:element Name="Products" sql:is-constant="1">

The next annotation which is important to know is "relation". This is used to map elements to a table in the database. So in this case, I want to map Products elements to the table BulkLoadedProduct which happens like this:
<xs:element Name="Product" maxOccurs="unbounded" sql:relation="BulkLoadedProduct">    

After this we need to tell the bulk load where to put the field "ID" into the database. That’s done with the sql:field annotation like this:
<xs:element Name="ID" Type="xs:string" sql:field="ProductID" />

Right now I want to introduce an annotation called "mapped" which tells the bulk load that the field is not mapped to the database. So adding the sql:mapped="0" (or false if you’d rather use that) to an element then you’ll not get the field into the database.
So if we for example add the sql:mapped="0" to the element LanguageValues like this
<xs:element Name="LanguageValues" sql:mapped="0">  
then it will tell the bulk load to ignore all child structure under that element.
But seeing as I want the details imported, I’ll skip that part myself - but it is useful for debugging purpose as well, so it is important to know. 

So, the next problem is getting the details loaded. This is done with relationships.
So remove the mapped annotation and write the following in the top of your schema file. I personally prefer to keep the relationships in the top of my schema file for easier reference, but they can be added directly on the element. I’ll however only show the one way of doing it.
<xs:annotation
  <
xs:appinfo

    <
sql:relationship Name=
"Detail" 
                      
parent=
"BulkLoadedProduct" 
                      
parent-key=
"ProductID" 
                      
child=
"BulkLoadProductDetails" 
                      
child-key="ProductID" 
/> 
  </
xs:appinfo

</
xs:annotation>
  

This tells the bulk load to create a relationship between the two tables based on the parent/child key naming.
This relationship is then added to the LanguageValue element:
<xs:element maxOccurs="unbounded" Name="LanguageValue" 
      
sql:relation="BulkLoadProductDetails" sql:relationship="Detail">
  

One thing to be wary of is that the more relationships you have, i.e. a more complex XML structure, the slower your import will be. So if speed is important, make your XML import file simple as possible, then you can load big files very fast.

Finally, as I mentioned earlier, you can map elements to columns in two different ways. One was via the “field” annotation, however if the element is called the very same as the column in the target table relation, you do not need to add a field. So, for the fields Name and LangaugeCode, I do not need to add any “sql:field” annotation because the Bulk Load automatically will map it to the fields in the table when they’re named the same. This can simplify a schema file tremendously, so it is worth remembering.

Anyways, putting it all together and I end up with the following schema file.
Running the XML through the XML SQL Bulk Load with the scehma file, and I end up with a result which looks like this:

Result SQL selection from XML SQL Bulk Load
(click for larger view)

and now I’ve successfully imported the data from my XML into the database.

Bookmark and Share DotnetKicks dotnetshoutout

Powered by BlogEngine.NET 1.6.1.0
Theme by Mads Kristensen | Modified by Mooglegiant

About:
Allan Svelmøe Hansen

My real name is Allan Svelmøe Hansen.
I live in Denmark, where I work as a developer for hedal:kruse:brohus using SQL Server and the .NET framework since 2004.
My primary fields of expertise is back-end data integration, database design and optimization.


       View Allan Svelmøe Hansen's profile on LinkedIn     

Disclaimer

The opinions expressed herein are my own personal opinions and thoughts and does not represent my employers view in any way, nor are my results guaranteed for all situations.
Content is presented "as is", with no warranty.