31 August 2013

Introducing Xml Specification Compare and Bonus XPath Discovery

A while back I faced the problem of unit testing code that generated XML according to some specification. Using the XNode.DeepEquals method was not possible because it is too restrictive: it requires all nodes (and other parts of the XML) to be the same, and in the same order.
Many specifications that use XML documents as messages or configuration do not require this rigour; two documents may be considered equal even if they differ in the order of siblings, namespace prefixes, comments, white space, etc.

I confess I somehow managed to overlook Microsoft's XML Diff and Patch Tool when I first looked for a solution to this problem.
As it was, I solved my unit testing needs with a quick and dirty function that generated "normalized" XML documents by sorting or ignoring nodes and other relevant parts of the XML, and then deep comparing the normalized results.
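
The normalize-then-compare idea can be sketched in a few lines. This is not the original function (which was written against LINQ to XML); it is a language-neutral illustration in Python using only the standard library, and the names normalize and xml_equal are hypothetical. It assumes that sibling order, attribute order, surrounding white space, comments and namespace prefixes should all be ignored.

```python
import xml.etree.ElementTree as ET


def normalize(elem):
    """Return an order-independent, comparable representation of an element."""
    # Surrounding white space in text content is treated as insignificant.
    text = (elem.text or "").strip()
    tail = (elem.tail or "").strip()
    # Attribute order never matters, so sort attributes by name.
    attrs = tuple(sorted(elem.attrib.items()))
    # Sorting the normalized children makes sibling order irrelevant.
    children = tuple(sorted(normalize(child) for child in elem))
    # ElementTree stores tags as "{namespace-uri}local", so prefixes are
    # already ignored; its default parser also drops comments.
    return (elem.tag, attrs, text, tail, children)


def xml_equal(xml_a, xml_b):
    """Compare two XML strings after normalizing both."""
    return normalize(ET.fromstring(xml_a)) == normalize(ET.fromstring(xml_b))


doc_a = '<order id="1"><item>apple</item><item>pear</item><!-- note --></order>'
doc_b = '<order id="1">\n  <item>pear</item>\n  <item>apple</item>\n</order>'
doc_c = '<order id="1"><item>apple</item><item>plum</item></order>'

print(xml_equal(doc_a, doc_b))  # True: only order, white space and a comment differ
print(xml_equal(doc_a, doc_c))  # False: the content of one item differs
```

A strict deep comparison would reject doc_a and doc_b because the item elements are in a different order; the normalized comparison accepts them.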

While this solution was enough for my needs, it was not a full solution, and I resolved to write a true XML comparer suitable for specifications.
It was only after I had completed most of the coding for this comparer that I discovered the aforementioned XML Diff and Patch Tool. The result of my efforts is the XML Specification Compare library (also available as NuGet packages).

Performance-wise, my implementation and Microsoft's XML Diff and Patch Tool are roughly the same. My tests show an 8% speed advantage for XML Specification Compare when comparing equal files. I cannot say, however, whether this is a consistent advantage or just an artifact of the test documents I selected. When comparing unequal documents, XML Specification Compare has a greater speed advantage, but this is to be expected, since Microsoft's XML Diff and Patch Tool generates a diff document while XML Specification Compare stops at the first node it cannot match. My implementation still has some advantages:

  • It's open source and available as source to embed in your project.
  • It ignores placement of namespace definitions (xmlns attributes).
    The following documents are considered equal by XML Specification Compare but are, erroneously in my opinion, reported as different by Microsoft's XML Diff and Patch Tool (with all ignore options set except for ignore namespace):

    <?xml version="1.0" encoding="UTF-8"?>
    <root xmlns="http://default.com">
       <a:elem xmlns:a="http://a.com"/>
    </root>

    <?xml version="1.0" encoding="UTF-8"?>
    <root xmlns:a="http://a.com" xmlns="http://default.com">
       <a:elem />
    </root>
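
One way to see why the two documents above should compare equal is to look at the expanded (namespace-qualified) element names, which do not depend on where the xmlns attributes are declared. The sketch below uses Python's standard xml.etree.ElementTree purely as an illustration; it says nothing about how either .NET library is implemented. The helper name expanded_names is hypothetical, and the XML declarations are omitted for brevity.

```python
import xml.etree.ElementTree as ET

DOC_1 = """<root xmlns="http://default.com">
   <a:elem xmlns:a="http://a.com"/>
</root>"""

DOC_2 = """<root xmlns:a="http://a.com" xmlns="http://default.com">
   <a:elem />
</root>"""


def expanded_names(xml_text):
    """List every element name in document order as '{namespace-uri}local'."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


print(expanded_names(DOC_1))
# ['{http://default.com}root', '{http://a.com}elem']
print(expanded_names(DOC_1) == expanded_names(DOC_2))
# True
```

Both documents contain the same two elements in the same two namespaces; only the position of the declarations differs.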

The downside of XML Specification Compare is that it currently lacks any option to turn off ignores, while Microsoft's XML Diff and Patch Tool allows you to specify which features of the XML document to ignore. I will probably implement some of these configuration options in the future (an option to respect sibling order in particular).

XPath Discovery - Bonus Implementation

Xml Specification Compare has classes for integration with NUnit. When an assertion failed I wanted to provide a useful error message with a simple XPath to the node that could not be matched.

I tried to find an existing solution, but neither then nor since have I found code that provides a nice XPath to elements, attributes or text nodes. So I've implemented this as well.

The XPath discovery library supports the following LINQ to XML classes: XElement, XAttribute, XText, XCData and XComment.
Given an instance of any of these classes, it will generate a simple XPath of the form: /elemA[i]/elemB[j]/.../(elemN[u]|@attr|text()|comment()).
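
To make the shape of these paths concrete, here is a minimal sketch of how a path of that form can be derived by walking from a node up to the root and counting same-named siblings. It is written in Python against the standard xml.etree.ElementTree rather than LINQ to XML, so it is not the library's code. All function names are hypothetical, the target is assumed to belong to the given tree, and namespaces and comment() nodes are left out.

```python
import xml.etree.ElementTree as ET


def element_xpath(root, target):
    """Build /a[i]/b[j]/... using 1-based positions among same-named siblings."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not None:
        parent = parent_of.get(node)
        if parent is None:
            index = 1  # the root element has no siblings
        else:
            same_name = [sib for sib in parent if sib.tag == node.tag]
            index = same_name.index(node) + 1
        steps.append(f"{node.tag}[{index}]")
        node = parent
    return "/" + "/".join(reversed(steps))


def attribute_xpath(root, elem, name):
    """Path to an attribute of `elem`."""
    return f"{element_xpath(root, elem)}/@{name}"


def text_xpath(root, elem):
    """Path to the text content of `elem`."""
    return f"{element_xpath(root, elem)}/text()"


doc = ET.fromstring(
    '<order><item sku="a1">apple</item><item sku="b2">pear</item></order>'
)
second_item = doc.findall("item")[1]

print(element_xpath(doc, second_item))           # /order[1]/item[2]
print(attribute_xpath(doc, second_item, "sku"))  # /order[1]/item[2]/@sku
print(text_xpath(doc, second_item))              # /order[1]/item[2]/text()
```

Always emitting the position, even when it is 1, keeps every path unambiguous, which is what matters in an assertion failure message.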

The XPath discovery library is available, in code only, as a NuGet package.