XML (RSS)

XML

Limitation in SQL's XML Support

I managed to outsmart myself again....

I've been working on an application that processes large XML files.  One of the routines involves passing the XML to a stored procedure, which then inserts certain parts of the XML into various tables.  This is done via sp_xml_preparedocument and openxml.  This is fairly trivial and works quite well.

Now, this system processes a lot of XML.  Some of the tables involved have reached 15+ million rows, and insert performance has decreased to a point where I had to take action.  I decided that, in addition to normal SQL tuning, I would break this task into two parts.  First, the XML would be placed, unparsed, into a staging table.  A separate process would take the data from this table, perform the proper parsing via openxml,  and insert the resulting data into the various destination tables.  This separation of work would allow the main process to operate more quickly. 

I created the staging table and tested the system performance.  The system ran 400% faster.  Cool. 

Next, I had to write the process that would retrieve the data from this staging table and insert it into the destination tables.  This is when I discovered my mistake.

The XML is stored in NTEXT fields in the staging table.  I needed to read this into a local variable, so that I could pass it to the sp_xml_preparedocument stored procedure.  This can't be done, because local variables can't be declared as NTEXT.  I was stuck.

I found a good article on the net that explained a workaround.  However, in the article, the author states that he has used this workaround on documents that were “over 36,000 bytes in size.”  My documents sometimes reach 20 MB in size.  His method doesn't work on documents over about 300K.  (You should check it out, though.  It's a good read.)  I had to solve my own problem.

So, as I was running out of time, I decided to fall back to the one solution that I knew would work.  I wrote a .NET process that runs on a background thread.  This process reads the XML from the staging table (via an FOR XML EXPLICIT query), parses this data via a couple of calls to the XmlDocument's SelectSingleNode method, and then uses another stored procedure to insert the data into the destination tables.  This works, but I don't like it.  I would rather have all of this data manipulation occur on the SQL Server, instead of being passed back and forth between the SQL Server and the Application Server.  If anyone wants to send some alternatives my way, I would be most appreciative. 

I can't wait for SQL 2005.  Maybe this kind of thing will go away....

OpenXML Problem in SQL 2000 Standard Edition

I've been developing more stored procedures lately that use the OpenXML function.  I developed these on SQL Server Developer Edition.  Normally, I then deploy them to a server running SQL Server Enterprise Edition.  I've been doing this for years with no problems.

This time, however, the procs did not work.  I kept receiving the following error:

OLE DB provider 'OpenXML' reported an error. The provider did not give any information about the error.

I had deployed the stored procedures to a server running SQL Server Standard Edition.  There is a bug that affectes OpenXML on this edition only.  The KB article and work around can be found here.  I changed the code as the KB suggested, and all worked well.

It figures, doesn't it?  The first time I don't use SQL Enterprise, I write code that exposes a SQL Standard-specific bug?

Don't forget the <xsl:output encoding=""/> when creating text files!

I've written many XSLTs that create formatted text files.  I don't do this very often, and there are usually many months between each occurrence.  So, I often find that the first three characters in the resulting file are the characters 0xEF, 0xBB, and 0xBF.  (These apparently indicate that UTF8 encoding is being used.)  Because these characters do not show up in normal text editors, I don't always notice that they are there.  When I look at the file with a hex editor, however, BINGO!.  There they are.

As you can imagine, this causes problems for some of my clients that do not expect these characters.

To solve the problem, I place <xsl:output indent="no" method="text" encoding="ISO-8859-1"/> in the XSLT.  This removes the encoding characters from the resulting text file.  Now my clients are happy, and so am I.

Note that this works in my situation, but that you may need a different encoding.  Just don't forget the <xsl:output encoding=""/>!

Embeddng XML in an XSLT in .NET

I have a XSLT that needs to reference a table of data, such as states.  This data is very static, so it didn't seem to be worth the trouble to store this data in a table and pass it in as a node set.  (See one my previous posts for an example of this.)  I decided to embed this data as an XML “data island” in my XSLT and reference it via an XSLT variable.  This is pretty standard stuff, but I couldn't get it to work. 

I consulted my copy of XSLT and XPath On The Edge, by Jeni Tennison, and found that I was doing everything correctly.  It still didn't work, so what was wrong?

.NET was the problem.  In the 1.1 version of the framework, the transform method has a fourth argument, which is for a resolver.  I've always passed null (or nothing) here, and everything worked well.  When using the XSLT document() function, however, the resolver becomes important.  Without a resolver, the document function is ignored.  Examine the working code:

XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:st="http://www.mysite.com/States">
     
<xsl:output method="xml" indent="yes" />

      <st:States>
           
<st:State Code="AL" Name="Alabama"/>
           
<st:State Code="MS" Name="Mississippi"/>
           
<st:State Code="TN" Name="Tennessee"/>
     
</st:States>

      <xsl:variable name="VisaCodes" select="document('')//st:States/st:State"/>
</
xsl:stylesheet>

.NET
myTransform.Transform(myXPathDoc, myArgumentList, OutputFile, new XmlUrlResolver());

Pass an XML node set to XSLT from .NET

I recently needed to pass some XML into my XSLT before performing a transform.  Normally, this isn't very useful, because the XML gets converted into a string, not a node set.  However, .NET provides a way around this problem.  Because it took quite a while to discover the method, I thought I would note it here.

.NET Snippet
// create an XsltArgumentList to contain the node set
myArgumentList = new XsltArgumentList();

// load the nodes
NodeDoc = new XmlDocument();
NodeDoc.LoadXml(<Place XML here>);

// create a navigator, so that an iterator can be created
NodeNav = NodeDoc.CreateNavigator();
NodeItr = NodeNav.Select("/Parent/Child");

// populate argument list
myArgumentList.AddParam("NodeInfo", "", NodeItr);

// transform the XML directly to the file
myTransform.Transform(myXPathDoc, myArgumentList, OutputFile, new XmlUrlResolver());

XSLT Snippet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:vc="http://www.me.com/Codes">
<xsl:param name="NodeInfo" select="/.."/>

<xsl:template match="/">
     
<xsl:for-each select="$NodeInfo">
            .
..
      </xsl:for-each>
</xsl:stylesheet>

XPath Performance in .NET (Revisited)

In a previous post, I mentioned the performance increase that can be had when using XPath queries in .NET.  As I said then, you need to use an XPathDocument instead of an XmlDocument.  Well, being the hard-headed person that I am, I had to learn this the hard way, AGAIN.

I needed to process an XML document that contains hundreds of nodes.  These nodes themselves are quite large.  (My test document was 11MB, but in real life, the documents could reach 80MB.)  I apparently lost my mind, and wrote some code similar to this:  (code changed to protect my client!)

ReturnValue = false;
foreach
(XmlNode Node in _SingleNodeList)
{
     if (<PSEUDOCODE: node contains required data?>)
     {
          ReturnValue = true;
          break;
     }
}


During testing, I found that processing my one test document required 22 minutes.  WAY TOO LONG!  I went home and slept on the problem.

The next day, I wrote something like this:

Result = XPathNav.Evaluate(count(<PSEUDOCODE: XPath expression that returns matching nodes>));
NodeCount = (double)Result;
ReturnValue = (NodeCount > 0);

I ran the test again.  It took at total of 12 seconds.  That's right, I reduced the processing time from 22 minutes to 12 seconds simply by using XPath queries properly instead of iterating over an XMLDocument.  When I ran the REAL test, processing 2,029 files, it took 54 minutes.  Looks like I learned a lesson again.  Hopefully, this will save someone else the time that I wasted.

Custom XML Serialization

I recently created an application that needed a fairly deep class hierarchy.  One class contained another class, which contained another, and so on.  I needed to find an easy way to serialize this object and store it in a database.  I found a post on Jeff Kirwan's blog that showed how to serialize an object to XML, which could then be placed into a database.  This seemed to be an easy solution to my problem....

Then I got into trouble.  I needed to store the class data in relational form in the database, so that some of this data could be referenced by other stored procedures.  (The class hierarchy actually represents configuration data for an application.)  This is easy to do with SQL via OPENXML.  (See my Memphis .NET User's Group presentation.)  However, pulling this data out of SQL proved to be a problem.  The class structure was DEEP.  It took a LOT of SQL code to rebuild the XML, so that the object could be deserialized.  I had to find a better way.

I turned to the ISerializable interface.  By implementing this interface, you can control how your class is serialized.  Cool!  This was the solution to my problem, right?  Wrong!  The ISerializable interface does not work when you use the XMLSerialier.  I wanted XML and the ISerializable interface gave me a binary serialization. 

After thinking a few unmentionable thoughts about the world in general, I finally found a real solution.  There is an undocumented interface called IXmlSerializable.  If you look this up in the .NET help, you will see that they say this is an internal interface and that you should not use it.  However, if you look here, you can see that this interface IS documentated in Longhorn.  Feeling a little better now, I tried it.  It works great.

Serialization Methods:


XmlSchema IXmlSerializable.GetSchema()
{
     return null;
}

void IXmlSerializable.ReadXml(XmlReader Reader)
{
     // code to read in XML document goes here
}

void IXmlSerializable.WriteXml(XmlWriter Writer)
{
     // code to write XML document goes here
}

Now I can have a deep class structure in my code, and serialize this as a very flat XML document.  Then, instructing SQL Server to create this XML document from the database is very simple and easy to maintain.

Add IntelliSense to your XSLT files in .NET

Have you tried creating and editing XSLT files in .NET?  If so, you know that the XML editor doesn't provide much in the way of help.  Fesersoft comes to the rescue!  The have developed an XSLT schema, so that .NET will provide IntelliSense for XSLT files.  Check out their .NET page and see for yourself.

Way to go Fesersoft!

XPath Performance in .NET

I continually hear people complaining about the performance of XPath queries in .NET, so I thought I would address it here.  Most of the time, the problem is that people are using the wrong classes.  (You know, with 6,000+ classes in .NET, I am always afraid that I will spend hours coding something just to find out that .NET provides it for free.)  So, the trick here is to use the correct class. Don't perform XPath queries against an XMLDocument or an XMLDataDocument. Use an XPathDocument instead.

Data Layer Sample:

' create a command object to run the stored procedure
Command = New SqlCommand("uspEmployeeListXML",Connection)
Command.CommandType = CommandType.StoredProcedure

' open the connection
Connection.Open()

' create xml read to acces the xml
XmlRdr = Command.ExecuteXmlReader()

' move to the first content node,
' bypassing any schema, comments, etc.
XmlRdr.MoveToContent()

' populate xml XPath document
XPathDoc = New XPathDocument(XmlRdr)

' return the document
Return XPathDoc

ASP.NET Code-behind Sample:

' obtain an XML XPath Document
myXPathDoc = Data.GetXPathDoc

' create a Transform object to represent the XSLT stylesheet
myTransform = New XslTransform()

' load the stylesheet into the object
myTransform.Load(Server.MapPath("~/ManualXSLT.xslt"))

' transform the XML directly to the page
myTransform.Transform(myXPathDoc,Nothing, Response.Output, Nothing)

In one of my tests, an HTML page that took 30 seconds to render with an XMLDocument took 3 seconds with an XPathDocument.

For more examples of this kind of code, refer to my presentation entitled “XSLT & XPath”, found on the Memphis .NET Users Group Presentation page.