SQL XML Bulk Load, Relationships and composite keys

by Allan Svelmøe Hansen April 14, 2010 10:31

A little quick note, when needing composite keys in your relationship in SQL XML Bulk Load the syntax is as follow:

<sql:relationship name="Detail"
                  parent
=
"Import.Product"
                  parent
-KEY=
"ProductID FileID"
                  child
=
"Import.ProductDetails"
                  child
-KEY="ProductID FileID"/>
  


meaning they’re simply space separated in the relationship annotation. The composite key in this aspect are the columns ProductID and FileID.
As can also be seen in this annotation, I'm using a schema qualifiaction with Import, so if you have other schemas, you can simply write them in the parent/child.

For more syntax, please refer to the blog post SQL XML Bulk Load - basic XSD syntax

Bookmark and Share DotnetKicks dotnetshoutout

The OUTPUT clause

by Allan Svelmøe Hansen March 20, 2010 12:25

The output clause is a really nifty thing in SQL Server 2008, which provides you access to the same "inserted" and "deleted" tables you get access to via triggers, but from within the same query.
To illustrate, suppose we have a table:

CREATE TABLE [dbo].[OutputEx]
   
[Key] [int] IDENTITY(1,1) NOT NULL, 
   
[Value] [nchar](10) NULL, 
   
[SecondValue] [datetime2](7) NULL, 
 
CONSTRAINT [PK_OutputEx] PRIMARY KEY CLUSTERED  

   
[Key] ASC 
)WITH (PAD_INDEX  = OFFSTATISTICS_NORECOMPUTE  = OFFIGNORE_DUP_KEY = OFF
ALLOW_ROW_LOCKS  
= ON,
ALLOW_PAGE_LOCKS  = ONON [PRIMARY] 
ON [PRIMARY] 

GO 

ALTER TABLE [dbo].[OutputEx] ADD  CONSTRAINT [DF_OutputEx_SecondValue]  DEFAULT (GETDATE()) FOR [SecondValue] 

This is a table which has an identity column in [Key] and a datetime value field in SecondValue2 which defaults to the current date (GETDATE())

Now, if I insert a [Value] into this table SQL Server will automatically insert the identity value and the current date SecondValue, but if I wanted to get these automated values out, I would have had to select them out manually afterwards.
This might not seem like much of a problem given these two values (Identity and date) however it might be much more complex constraints and default values.

So what I can do is use the OUTPUT clause like this:

INSERT INTO OutputEx Value)
OUTPUT inserted.[key]inserted.secondvalue         
VALUES  'a')  
 
This will provide me with a result of the identity value inserted into [key] and the date value inserted into SecondValue.
I can use this result in an outer query by nesting the insert in a sub-query, or I can insert it into another table directly like this:

DECLARE @T TABLE(ID INT[Date] DATETIME2)

INSERT INTO OutputEx Value)
OUTPUT inserted.[key]inserted.secondvalue  INTO @T
VALUES  'a')
 

 
My table variable @T will now hold the values inserted
This also works with the "deleted" table instead of the inserted and thus both in insert/update and deleted queries.
It is a useful technique.

Bookmark and Share DotnetKicks dotnetshoutout

Upgrading to new engine

by Allan Svelmøe Hansen March 20, 2010 09:41

I'm in the process of upgrading to new blogengine.net version, and in that process there might be some issues while I get it all set up.

I'm also going to clean up the comment spam pretty heftily - so in case your legit comment vanishes, I'm sorry in advanced, but I just had to prune some 3000 spam comments
I'll also look at incorporate a catpcha as well as automated spam detection built into the engine, so hopefully comment spam and moderation of them will be easier.

And finally - I'm going to try out feedburner for my RSS feed, so I hope I do not break anything by it :)

Thanks for your patience.

Bookmark and Share DotnetKicks dotnetshoutout

Tags:

SQL XML Bulk Load and line breaks

by Allan Svelmøe Hansen February 09, 2010 13:12

I have been working with SQL XML Bulk Load (version 4.0) over the last period of time, and suddenly I encountered an issue where the imported data didn’t match the data in the XML files I received.
Looking closer into the issue, I discovered it was because the SQL XML Bulk Load removed my line breaks entirely from the imported data.

In XML line breaks are defined as standard using only "line feed" (LF). However working in a windows environment line breaks are almost always both carriage return and line feed combined (CR+LF).
However when the SQL XML Bulk Load parser (I would think it is a parsing issue) encounters a CR+LF in a node value it strips both characters from the text which results in the data being imported without the line break. If however the parser encounters only a LF, then that LF is retained in the imported data.

Now this seemed like a strange behaviour, because why would both CR and LF be removed and not only the CR, if it was to rely on the standard. However even more annoying, when the CR+LF is used as line breaks most every other place in Windows environments, why even remove those two characters to begin with?

I contacted Microsoft technical support which seems to have confirmed that it isn’t an error on my account, and that they maybe would look at it for the next release.  (*fingers crossed*)
Whether or not they view it as a bug I do not know – but there is definitely something wrong there. And I would call it a bug, because removing extra information can’t be intentional.

To illustrate my issue I made a very simple XML file containing two nodes:

<root>
<
myNode>Before CRLF
After CRLF
</myNode>
<
myNode>Before LF
After LF
</myNode>
</
root>

In this file, the first line break was made with a CR+LF (or rather Environment.NewLine via .NET, but parsing it matches CR+LF characters)  and the second was made only with the line feed as a line break.
I then imported that file into a database using the schema file:

<?xml version="1.0" encoding="utf-8"?>
<
xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:dt="urn:schemas-microsoft-com:datatypes"
xmlns:sql="urn:schemas-microsoft-com:mapping-schema"> 

  <xs:element name="root" sql:is-constant="1">
   <
xs:complexType>
    <
xs:sequence>
     <
xs:element xml:space="preserve" type="xs:token" dt:type="string" name="myNode" sql:relation="LineBreak" />
   </
xs:sequence>
  </
xs:complexType>
 </
xs:element>
</
xs:schema>

Where I tried to do all sorts of XSD “keep my line breaks as they are” tricks (space=”preserve” and so on).
I then ran the import (vb.net code)

Dim strInFile As String = "c:\linebreak.xml" 
Dim strSchema As String = 
"c:\linebreak.xsd" 
Dim objBL As Object = CreateObject("SQLXMLBulkLoad.SQLXMLBulkload.4.0"
)
Try
    objBL.ConnectionString 
"connection string"
    
objBL.KeepIdentity 
= False
    
objBL.XMLFragment 
= True
    
objBL.ErrorLogFile 
"error log file path"
    
objBL.Execute(strSchemastrInFile
)
Catch ex As 
Exception
    
Debug.WriteLine(ex.ToString
)
Finally
    
If Not objBL 
Is Nothing Then
        
objBL 
= Nothing
    End If
End 
Try
  

And when I viewed the imported data, the CR+LF was removed from the first “myNode” and it had no breaks what so ever, and in the second “myNode” the LF was retained.
I was asked by the Microsoft employee I was in contact with to create the issue on Microsoft Connect, so I eagerly await what comes of this for the next release(s) of SQL XML Bulk Load. Hopefully they’ll have this fixed.

The only work around I have found myself is to pre-/post process the files. Before importing, run through the XML file and replace your line breaks with some special character not present in the file (usually ¤ is a good candidate) – and once the import has run, then post process your data and replace the used character with CR+LF again.
It is not a beautiful solution, but it works around the issue and helped myself move forward with the project.
 

Bookmark and Share DotnetKicks dotnetshoutout

Aggregated functions and CASE

by Allan Svelmøe Hansen February 03, 2010 13:11

I was stuck with a problem at work where I had a very complex query pulling data out of some tables, some XML and what not, and a lot of data manipulating in the query to be able to easier populate a DataWarehouse.
I then had to expand this query and pull some extra Boolean information out, which were based on a logical comparison of two strings, which I would normal do with a CASE like CASE string1 = string2 THEN 1.

However because I had a GROUP BY clause, I could not simply do this, because the columns the strings were taken from was not in the group clause and I didn’t want them there.
So I found out that you can actually put an aggregated function around the case using the following syntax:

SELECT 
<GroupByFields
>,
MAX
(
    
CASE 
        
WHEN <NonGroupField=  'some value' THEN 
1
        
ELSE 
0
    
END
)
FROM <TABLE
>
GROUP BY <GroupByFields

 

This saved me from using a common table expression or two more, and kept things simple. In essense this allows you to make conditional SUM and similar using this technique.
Just more evidence that a lot is possible in SQL, and just trying something often reveals interesting results.

Bookmark and Share DotnetKicks dotnetshoutout

Powered by BlogEngine.NET 1.6.1.0
Theme by Mads Kristensen | Modified by Mooglegiant

About:
Allan Svelmøe Hansen

My real name is Allan Svelmøe Hansen.
I live in Denmark, where I work as a developer for hedal:kruse:brohus using SQL Server and the .NET framework since 2004.
My primary fields of expertise is back-end data integration, database design and optimization.


       View Allan Svelmøe Hansen's profile on LinkedIn     

Disclaimer

The opinions expressed herein are my own personal opinions and thoughts and does not represent my employers view in any way, nor are my results guaranteed for all situations.
Content is presented "as is", with no warranty.