Thursday, August 28, 2008

Compare two XML strings in Python

I had to compare two XML strings for some unit tests, and doing it while ignoring indentation and newlines turns out to be a little tricky.

I thought that by parsing the original XML and serializing it again with minidom I'd get a raw string without any meaningless spaces or newlines, but it actually returned the original string. The toprettyxml() method doesn't help either: its output still carries the whitespace of the original string (even when you specify the indent and newline characters).
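To make that behaviour concrete, here is a small sketch. The sample strings and variable names are made up for illustration. It shows that minidom keeps the whitespace between tags as text nodes, so serializing the parsed document gives the indented input back:

```python
from xml.dom import minidom

# Two illustrative inputs: same elements, different whitespace.
pretty = "<root>\n  <item>1</item>\n</root>"
compact = "<root><item>1</item></root>"

doc = minidom.parseString(pretty)

# Serializing the parsed tree reproduces the original indentation,
# because the whitespace between tags was stored as text nodes.
round_tripped = doc.documentElement.toxml()

# Even with empty indent and newline strings, toprettyxml() still
# writes out those whitespace text nodes.
flat_attempt = doc.toprettyxml(indent="", newl="")

print(round_tripped == pretty)   # round trip keeps the whitespace
print(round_tripped == compact)  # so it does not match the compact form
```

So a plain parse-and-serialize round trip is not enough to normalize the two strings.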

So the best way I've found so far is to write a custom function that returns what I want: an XML string with no insignificant whitespace between tags. Here is the code:

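from xml.dom import minidom

def raw_xml(xml_str):
    # Pretty-print the parsed document, then strip every line and join
    # them, which drops the indentation and newlines between tags.
    xml = minidom.parseString(xml_str)
    return u''.join([line.strip() for line in xml.toprettyxml().splitlines()])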
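As a usage sketch, this is how the function might be used in a test. The sample XML strings are invented for illustration, and raw_xml is restated here (in a form that assumes Python 3) only so the snippet runs on its own:

```python
from xml.dom import minidom

def raw_xml(xml_str):
    # Same idea as the function above, restated so this snippet is standalone.
    doc = minidom.parseString(xml_str)
    return "".join(line.strip() for line in doc.toprettyxml().splitlines())

# Illustrative inputs: a and b differ only in indentation and newlines,
# c has different content.
a = "<root>\n  <item>1</item>\n</root>"
b = "<root><item>1</item></root>"
c = "<root><item>2</item></root>"

# The return values are ordinary strings, so a normal == comparison works.
same_layout_ignored = raw_xml(a) == raw_xml(b)
different_content = raw_xml(a) != raw_xml(c)
print(same_layout_ignored, different_content)
```

Keep in mind that every line is stripped, so text content that spans several lines is also joined together without its line breaks. For test fixtures where that whitespace is not significant this is usually fine.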

2 comments:

  1. and can we convert the return values to string and do normal string compare?
