I have been working with SQL XML Bulk Load (version 4.0) over the last period of time, and suddenly I encountered an issue where the imported data didn’t match the data in the XML files I received.
Looking closer into the issue, I discovered it was because the SQL XML Bulk Load removed my line breaks entirely from the imported data.

In XML line breaks are defined as standard using only “line feed” (LF). However working in a windows environment line breaks are almost always both carriage return and line feed combined (CR+LF).
However when the SQL XML Bulk Load parser (I would think it is a parsing issue) encounters a CR+LF in a node value it strips both characters from the text which results in the data being imported without the line break. If however the parser encounters only a LF, then that LF is retained in the imported data.

Now this seemed like a strange behaviour, because why would both CR and LF be removed and not only the CR, if it was to rely on the standard. However even more annoying, when the CR+LF is used as line breaks most every other place in Windows environments, why even remove those two characters to begin with?

I contacted Microsoft technical support which seems to have confirmed that it isn’t an error on my account, and that they maybe would look at it for the next release.  (*fingers crossed*)
Whether or not they view it as a bug I do not know – but there is definitely something wrong there. And I would call it a bug, because removing extra information can’t be intentional.

To illustrate my issue I made a very simple XML file containing two nodes:

<root>
<
myNode>Before CRLF
After CRLF
</myNode>
<
myNode>Before LF
After LF
</myNode>
</
root>

In this file, the first line break was made with a CR+LF (or rather Environment.NewLine via .NET, but parsing it matches CR+LF characters)  and the second was made only with the line feed as a line break.
I then imported that file into a database using the schema file:

<?xml version=1.0 encoding=utf-8?>
<
xs:schema xmlns:xs=http://www.w3.org/2001/XMLSchema
xmlns:dt=urn:schemas-microsoft-com:datatypes
xmlns:sql=urn:schemas-microsoft-com:mapping-schema>

<xs:element name=root sql:is-constant=1>
<
xs:complexType>
<
xs:sequence>
<
xs:element xml:space=preserve type=xs:token dt:type=string name=myNode sql:relation=LineBreak />
</
xs:sequence>
</
xs:complexType>
</
xs:element>
</
xs:schema>

Where I tried to do all sorts of XSD “keep my line breaks as they are” tricks (space=”preserve” and so on).
I then ran the import (vb.net code)

Dim strInFile As String = “c:\linebreak.xml”
Dim strSchema As String =
“c:\linebreak.xsd”
Dim objBL As Object = CreateObject(“SQLXMLBulkLoad.SQLXMLBulkload.4.0″
)
Try
objBL.ConnectionString
=
“connection string”
objBL.KeepIdentity
= False
objBL.XMLFragment
= True
objBL.ErrorLogFile =
“error log file path”
objBL.Execute(strSchema, strInFile
)
Catch ex As
Exception
Debug.WriteLine(ex.ToString
)
Finally
If Not objBL
Is Nothing Then
objBL
= Nothing
End If
End
Try

And when I viewed the imported data, the CR+LF was removed from the first “myNode” and it had no breaks what so ever, and in the second “myNode” the LF was retained.
I was asked by the Microsoft employee I was in contact with to create the issue on Microsoft Connect, so I eagerly await what comes of this for the next release(s) of SQL XML Bulk Load. Hopefully they’ll have this fixed.

The only work around I have found myself is to pre-/post process the files. Before importing, run through the XML file and replace your line breaks with some special character not present in the file (usually ¤ is a good candidate) – and once the import has run, then post process your data and replace the used character with CR+LF again.
It is not a beautiful solution, but it works around the issue and helped myself move forward with the project.

2 Responses to “SQL XML Bulk Load and line breaks”

  1. First off, let me state that your internet site is awesome. I like the design that you have. It turned out to be non-taxing on the eyes. Appreciate your write-up as well. Definitely subscribed to your feed to ensure I will not be missing out on any the latest. Wonderful job! Cheers to a successful company

  2. Do you have a spam problem here; I also use Blog Engine, and I was wondering about your situation; we have developed some good practices and we would like to exchange practices with others, please Email me if you are interested.

Leave a Reply

(required)

(required)

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>