31 August 2013

Introducing Xml Specification Compare and Bonus XPath Discovery

A while back I faced the problem of unit testing code that generated XML according to some specification. Using the XNode.DeepEquals method was not possible as it is too restrictive; it requires all nodes (and other parts of the XML) to be the same, and in the same order.
The XML documents used as messages or configuration in many specifications do not require this rigour; XML documents may be considered equal even if they differ in order of siblings, namespace prefixes, comments, white space, etc.

I confess I somehow managed to overlook Microsoft's XML Diff and Patch Tool when I first looked for a solution to this problem.
As it was, I solved my unit testing needs with a quick and dirty function that generated "normalized" XML documents, sorting or ignoring nodes and other relevant parts of the XML, and then deep compared the normalized results.
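
The normalize-then-compare idea can be sketched in a few lines. This is a Python illustration written for this explanation, not the original .NET function; the names `normalize` and `spec_equal` are hypothetical. It assumes the standard library parser's default behaviour of dropping comments and expanding namespace prefixes into `{uri}name` tags, and it deliberately ignores mixed-content tail text.

```python
import xml.etree.ElementTree as ET


def normalize(elem):
    """Return an order-independent summary of an element (hypothetical helper).

    Surrounding white space in text is stripped, attributes are sorted by
    name, and child summaries are sorted so sibling order does not matter.
    """
    text = (elem.text or "").strip()
    attrs = tuple(sorted(elem.attrib.items()))
    children = tuple(sorted(normalize(child) for child in elem))
    return (elem.tag, attrs, text, children)


def spec_equal(xml_a, xml_b):
    """Compare two XML strings after normalizing both."""
    return normalize(ET.fromstring(xml_a)) == normalize(ET.fromstring(xml_b))
```

With this sketch, `spec_equal('<root><a x="1"/><b>hi</b></root>', '<root> <b>hi</b> <a x="1"/> </root>')` holds, while changing the attribute value makes the comparison fail.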

While this solution was enough for my needs, it was not a full solution, and I resolved to write a true XML comparer suitable for specifications.
It was only after I had completed most of the coding for this comparer that I discovered the aforementioned XML Diff and Patch Tool. The result of my efforts is the XML Specification Compare library (also available as NuGet packages).

Performance-wise, my implementation and Microsoft's XML Diff and Patch Tool are roughly the same. My tests show an 8% speed advantage to XML Specification Compare when comparing equal files. I cannot say, however, whether this is a consistent advantage or just an artifact of the test documents I selected. When comparing unequal documents XML Specification Compare has a greater speed advantage, but this is to be expected since Microsoft's XML Diff and Patch Tool generates a diff document while XML Specification Compare stops on the first node it cannot match. There are still some advantages to my implementation:

  • It's open source and available as source to embed in your project.
  • It ignores placement of namespace definitions (xmlns attributes).
    The following documents are considered equal in XML Specification Compare but are, erroneously in my opinion, reported as different by Microsoft's XML Diff and Patch Tool (with all ignore options set except for ignore namespace):

    <?xml version="1.0" encoding="UTF-8"?>
    <root xmlns="http://default.com">
       <a:elem xmlns:a="http://a.com"/>
    </root>

    <?xml version="1.0" encoding="UTF-8"?>
    <root xmlns:a="http://a.com" xmlns="http://default.com">
       <a:elem />
    </root>
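
To see why the placement of the xmlns attributes should not matter, it helps to look at the expanded (namespace URI plus local name) element names a namespace-aware parser produces. The following is a small Python check using the standard library, offered only as an illustration and not as part of either tool; `expanded_names` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The two documents from the example above, written without indentation.
DOC_A = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<root xmlns="http://default.com">'
    b'<a:elem xmlns:a="http://a.com"/>'
    b'</root>'
)
DOC_B = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<root xmlns:a="http://a.com" xmlns="http://default.com">'
    b'<a:elem />'
    b'</root>'
)


def expanded_names(xml_bytes):
    """List every element's expanded name in document order."""
    return [elem.tag for elem in ET.fromstring(xml_bytes).iter()]
```

Both `expanded_names(DOC_A)` and `expanded_names(DOC_B)` give `['{http://default.com}root', '{http://a.com}elem']`: once the prefixes are resolved, the two documents name exactly the same elements.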

The downside of XML Specification Compare is that it currently lacks any option to turn off ignores, while Microsoft's XML Diff and Patch Tool allows you to specify which features of the XML document to ignore. I will probably implement some of the configuration options in the future (an option to respect sibling order in particular).

XPath Discovery - Bonus Implementation

Xml Specification Compare has classes for integration with NUnit. When an assertion failed I wanted to provide a useful error message with a simple XPath to the node that could not be matched.

I tried to find an existing solution, but to date (then and now) I have not found code that would provide a nice XPath to elements, attributes or text nodes. So I've implemented this as well.

The XPath discovery library supports the following LINQ to XML classes: XElement, XAttribute, XText, XCData and XComment.
Given an instance of any of these classes it will generate a simple XPath of the form: /elemA[i]/elemB[j]/.../(elemN[u]|@attr|text()|comment()).
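
The general shape of such a path can be sketched as follows. This is a Python illustration of the idea, not the library's .NET implementation; `xpath_of` is a hypothetical name, only elements and attributes are handled, and the document is assumed to use no namespaces. The index in each step counts same-named siblings starting from 1, as XPath does.

```python
import xml.etree.ElementTree as ET


def xpath_of(root, target, attribute=None):
    """Build a simple positional XPath to `target` (hypothetical helper).

    `target` is assumed to be `root` or one of its descendants.
    """
    # ElementTree elements have no parent pointer, so build a lookup first.
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not None:
        parent = parents.get(node)
        if parent is None:
            index = 1  # the root element has no siblings
        else:
            same_name = [c for c in parent if c.tag == node.tag]
            index = next(i for i, c in enumerate(same_name, 1) if c is node)
        steps.append(f"{node.tag}[{index}]")
        node = parent
    path = "/" + "/".join(reversed(steps))
    if attribute is not None:
        path += f"/@{attribute}"
    return path
```

For the document `<order><item id="a"/><note/><item id="b"><sku>1</sku></item></order>`, the `sku` element resolves to `/order[1]/item[2]/sku[1]`, and the `id` attribute of the second item to `/order[1]/item[2]/@id`.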

The XPath discovery library is available in source form only, as a NuGet package.

22 February 2013

Software Practices and Fairy Tales

Let me tell you a fairy tale

Once upon a time, not so long ago and not really that far away, there was a Manager.
He was a good Manager, well liked by his subordinates and appreciated by his superiors, and he liked his job. But lately he had been charged with a new type of Project: a Software Project.
This caused the Manager much anguish, for Software Projects were difficult beasts to master. They always seemed to take longer than expected, even when taking into account that they would take longer than expected, and even when completed the client never got what he actually wanted.
So the Manager went on top of a tall office building to gaze at the city and reflect; and lo and behold he saw a skyscraper being built, hundreds of people working in harmony toward one end.
"Why," he thought, "can architects build a glorious skyscraper while I struggle to produce simple programs?"
And he was enlightened.
Architects, after all, did not gather together hundreds of workers and tell them, "I want you to build me a skyscraper." There were months, even years, of preparation before even a single stone was moved. The same should be true for software:
First, business analysts would analyse the problem to be solved along with the client, until a scope and solution were agreed upon.
Then software architects would create a model (a blueprint) for the application that would implement the solution, and only then would programmers start coding.
Finally, after testing that everything was as it should be, the application would be deployed.
And so it was, and from that day forward software projects were always on time, and the Manager lived stress free ever after.

The Waterfall Model Was Actually Successful

The tale above describes the Waterfall methodology of the software development life cycle. It is now, in the age of the Internet and Agile methodologies, in vogue to look down upon the Waterfall model as antiquated and out of touch with reality.

Some detractors of the Waterfall method claim it aims to turn software development into an assembly line where programmers mindlessly follow the architect's blueprint to churn out applications. In their view the Waterfall method fails because it does not recognize that software development is a creative and complex process, and that any "blueprint", no matter how carefully designed, is going to differ to some degree from the actual implementation. This is a great disservice to the pioneers of software project management.
Waterfall is modeled after civil engineering and manufacturing, but its goal is not to treat software development as an assembly line; rather it aims to impose the same rigor necessary to bring new products to market.
Building a skyscraper, after all, is not just a bunch of builders following an architect's master plan; behind any such enterprise there's an army of people whose job is to deal with all the inconsistencies between the plan and reality, and still deliver a skyscraper (hopefully without going over budget too much).
The failure of the Waterfall and related methodologies in the Internet age is due to a change in circumstances. For projects running in Internet time it is more important to deliver incremental functionality fast and cope with constantly changing requirements than to predict the resources and time needed to deliver the ultimate product.
To claim any non-Agile methodology is pure useless bureaucracy, imposed by control-obsessed managers on programmers without regard to how software is really developed, is going too far.

Everything Has a Fairy Tale

My intention in this article is not to praise or condemn the Waterfall model or any other methodology, but rather to address the "Fairy Tale" description that invariably accompanies methodologies and processes.

Software development practices are always evolving, and any new practice, whether it is a new design concept, a methodology or even a new library, is first described using a "Fairy Tale".
Let's take ORMs, for instance. According to the Fairy Tale description, an ORM library makes writing applications that use a relational database easy by mapping object-oriented models into relational schemas in a database. But if they are so good, how come untold hours are wasted trying to make the ORM do things that would be trivial using raw database access and SQL? Does this mean the ORM promise of data-oriented applications without SQL is a, er, fairy tale? Well, in this case probably, or maybe not; object-relational mapping is a complex problem.

When first analyzing requirements and organizing them as user stories or use cases, it is customary to start with the most common scenario without problems: the Main Success Scenario (MSS) in use case parlance. How to deal with all the problems and edge cases is left for later user stories, so it is not surprising "Fairy Tale" descriptions are used.
The thing to remember is they are just that, fairy tales: the best possible outcome when there are no incompatibilities. There are no panaceas in software development. Practices may deliver on their Fairy Tale, but only if you take the time to learn how to apply them, when to apply them and when you are better off without them.