Wednesday, April 13, 2011

XSL remove empty nodes

Below are two neat XSL snippets that I wrote a few years back that remove empty nodes in a transformation.

Sometimes, XML payloads contain tens or even hundreds of NULL elements, as in the following example:
<Person>
  <FirstName>Ahmed</FirstName>
  <MiddleName/>
  <LastName>Aboulnaga</LastName>
  <CompanyInfo>
    <CompanyName>IPN Web</CompanyName>
    <Title/>
    <Department>
      <DepartmentId/>
      <DepartmentName/>
      <DepartmentLocation/>
    </Department>
  </CompanyInfo>
</Person>

For performance reasons, it may be best to remove these empty nodes, resulting in a much smaller payload such as:
<Person>
  <FirstName>Ahmed</FirstName>
  <LastName>Aboulnaga</LastName>
  <CompanyInfo>
    <CompanyName>IPN Web</CompanyName>
  </CompanyInfo>
</Person>

In some cases, the cost savings are huge. For example, the first payload is 279 bytes while the second one, after removing empty nodes, is 148 bytes, a 47% decrease in message size. In a high-volume environment, that means roughly 47% less memory, network bandwidth, and database space for payloads like this one.

As for the snippets below, either add them as a new transformation in your SOA code or include them in an existing one. I suggest trying both yourself and comparing to see which one works best for you.

This first one works great, but removes all empty nodes up to and including the root element. This has caused issues when there is no data in the payload at all, because nothing is left in the output. But try it to see if it suits your needs, as it has worked great for us in the past:
<?xml version = '1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns:dvm="http://www.oracle.com/XSL/Transform/java/oracle.tip.dvm.LookupValue">
  <xsl:output method="xml" indent="yes"/>
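  <!-- Visit every attribute and node; copy it only if it carries data -->
  <xsl:template match="@*|node()">
    <!-- Keep the node if its string value is non-empty or one of its own attributes is non-empty -->
    <xsl:if test=". != '' or ./@* != ''">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>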
</xsl:stylesheet>

A client of mine later updated it to keep at least the root element of every tree, avoiding the scenario above. Since it is more cautious in nature, it does not remove every NULL element (a parent whose children are all removed is itself kept), but it avoids the single pitfall in the transformation above:
<?xml version = '1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns:dvm="http://www.oracle.com/XSL/Transform/java/oracle.tip.dvm.LookupValue">

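  <xsl:output method="xml" indent="yes"/>
  <!-- Identity template: copy every attribute and node by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Drop attributes whose value is empty -->
  <xsl:template match="@*[.='']"/>
  <!-- Drop elements with no child nodes; attributes are not child nodes, so an element holding only attributes is dropped too -->
  <xsl:template match="*[not(node())]"/>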
</xsl:stylesheet>
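
One caveat with both snippets: they compare against the empty string, so in a pretty-printed payload like the Person example, the indentation inside Department is whitespace-only text and counts as content unless the parser has already stripped it. Below is a minimal sketch of a variation on the first snippet, not the stylesheet we actually ran. It assumes that whitespace-only text should be treated as empty and that the root element should always survive. xsl:strip-space and normalize-space() are standard XSLT 1.0, but whether your SOA runtime already strips whitespace is something to check for yourself:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>
  <!-- Assumption: indentation between elements carries no meaning, so discard whitespace-only text nodes -->
  <xsl:strip-space elements="*"/>

  <!-- Root element: always copied, even when the payload holds no data -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Everything else: same test as the first snippet, with whitespace normalized first -->
  <xsl:template match="@*|node()">
    <xsl:if test="normalize-space(.) != '' or ./@* != ''">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

The /* template wins for the document element because its default priority (0.5) is higher than that of @*|node() (-0.5). Applied to the Person payload above, this sketch drops MiddleName, Title, and the whole Department element while keeping Person itself.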

Ahmed Aboulnaga

3 comments:

AndreasT said...

Very useful tip, thanks.
In my XML file it works very well, but for one circumstance.

Consider the following structure in the source:

<CopyDest>
<CopyDestRole>
<MsgType V="H" DN="Some data" />
</CopyDestRole>
<!-- more elements -->
</CopyDest>

Applying your test transforms this into:

<CopyDest>
<CopyDestRole />
<!-- more elements -->
</CopyDest>

The MsgType element is removed even though it contains attributes. (Because it is further down in a hierarchy of otherwise empty elements?)

Changing your test to:
test=". != '' or .//@* != ''"
solves this problem and only one remaining situation needs to be handled and here is where I have problems.

Consider this structure in the source where everything is empty:

...
<Dept>
<Name />
<Id />
<TypeId V="" DN="" />
</Dept>
...

Applying the test transforms this into:

<Dept />

What I am looking for is a way to get rid of this last empty element.
Any ideas?

Anonymous said...

AndreasT, thanks to your reply to the post, you quickly ended a day of beating my head against the wall.

As for your remaining problem, I do not experience the same issue. With your scenario, everything is removed, leaving it empty, when I use the first set of XSL code above with your suggested modification to the test clause.

Anonymous said...

2 Andreas:

This will remove all empty attributes and nodes which do not have any text or subnodes with non-empty attributes
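
The stylesheet this comment refers to is not included above. A minimal sketch of one that fits the description could look like the following. It is not the commenter's original, which is unknown; it assumes XSLT 1.0 and that whitespace-only text counts as empty:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>
  <!-- Assumption: whitespace-only text nodes are formatting, not data -->
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy every attribute and node by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop attributes whose value is empty -->
  <xsl:template match="@*[. = '']"/>

  <!-- Drop elements that contain no text and have no non-empty attribute
       on themselves or on any descendant element -->
  <xsl:template match="*[not(normalize-space()) and not(descendant-or-self::*/@*[. != ''])]"/>
</xsl:stylesheet>
```

With the Dept example above, the whole Dept element is removed, since none of its descendants carries text or a non-empty attribute. With the CopyDest example, MsgType and its ancestors are kept because V and DN are non-empty. Note that if the entire payload is empty, this sketch removes the root element as well.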