Tech Blog

Aggregating messages and removing duplicates in a BizTalk Map

Aggregating messages is a fairly common task in BizTalk.
By “aggregating” I mean taking two separate messages with repeating elements
and combining them into a new message which contains the elements of both messages,
much like a UNION ALL in SQL, which keeps duplicate rows.

However, what if you want to remove duplicates?

It’s not as easy as it seems, and in truth the only way I have found to do this is
via custom XSLT.

Combining two messages

This is actually fairly easy: you use a single Looping functoid, with two inputs
and one output.
You can then either link the elements, or use the Mass Copy functoid to copy
the element data across:
Figure: Standard BizTalk aggregation map, which allows duplicates

So if I had these two messages:

Message 1:
<ns0:Employees xmlns:ns0="http://TestIndexMap.Employees">
  <Employee firstName="Karin" lastName="Smith" dept="Managers" empNumber="100" />
  <Employee firstName="Daniel" lastName="Smith" dept="Staff" empNumber="101" />
</ns0:Employees>

Message 2:
<ns0:Employees xmlns:ns0="http://TestIndexMap.Employees">
  <Employee firstName="Heidi" lastName="Klum" dept="Models" empNumber="200" />
  <Employee firstName="Elle" lastName="MacPherson" dept="Models" empNumber="201" />
  <Employee firstName="Daniel" lastName="Smith" dept="Staff" empNumber="101" />
  <Employee firstName="Naomi" lastName="Campbell" dept="Models" empNumber="203" />
</ns0:Employees>

I would end up with this message:
<ns0:Employees xmlns:ns0="http://TestIndexMap.Employees">
  <Employee firstName="Karin" lastName="Smith" dept="Managers" empNumber="100" />
  <Employee firstName="Daniel" lastName="Smith" dept="Staff" empNumber="101" />
  <Employee firstName="Heidi" lastName="Klum" dept="Models" empNumber="200" />
  <Employee firstName="Elle" lastName="MacPherson" dept="Models" empNumber="201" />
  <Employee firstName="Daniel" lastName="Smith" dept="Staff" empNumber="101" />
  <Employee firstName="Naomi" lastName="Campbell" dept="Models" empNumber="203" />
</ns0:Employees>

Note that the Employee with empNumber 101
is repeated.

Removing Duplicates

What if I wanted to remove duplicates from the messages?
i.e. instead of the above Combined Message, suppose that I wanted this:
<ns0:Employees xmlns:ns0="http://TestIndexMap.Employees">
  <Employee firstName="Karin" lastName="Smith" dept="Managers" empNumber="100" />
  <Employee firstName="Daniel" lastName="Smith" dept="Staff" empNumber="101" />
  <Employee firstName="Heidi" lastName="Klum" dept="Models" empNumber="200" />
  <Employee firstName="Elle" lastName="MacPherson" dept="Models" empNumber="201" />
  <Employee firstName="Naomi" lastName="Campbell" dept="Models" empNumber="203" />
</ns0:Employees>

When you use the Looping Functoid with two inputs, you will end up with two separate <xsl:for-each> loops
in the XSLT.
The XSLT for the above map looks like this:
<?xml version="1.0" encoding="UTF-16"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:msxsl="urn:schemas-microsoft-com:xslt"
                xmlns:var="http://schemas.microsoft.com/BizTalk/2003/var"
                exclude-result-prefixes="msxsl var s0"
                version="1.0"
                xmlns:s0="http://schemas.microsoft.com/BizTalk/2003/aggschema"
                xmlns:ns0="http://TestIndexMap.Employees">

  <xsl:output omit-xml-declaration="yes" method="xml" version="1.0" />

  <xsl:template match="/">
    <xsl:apply-templates select="/s0:Root" />
  </xsl:template>

  <xsl:template match="/s0:Root">
    <ns0:Employees>
      <xsl:for-each select="InputMessagePart_0/ns0:Employees/Employee">
        <Employee>
          <xsl:copy-of select="./@*" />
          <xsl:copy-of select="./*" />
        </Employee>
      </xsl:for-each>
      <xsl:for-each select="InputMessagePart_1/ns0:Employees/Employee">
        <Employee>
          <xsl:copy-of select="./@*" />
          <xsl:copy-of select="./*" />
        </Employee>
      </xsl:for-each>
    </ns0:Employees>
  </xsl:template>

</xsl:stylesheet>
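The select expressions in this stylesheet imply the shape of the document the map actually receives: the two input messages are wrapped in a single Root element, with one InputMessagePart_N child per input. As a sketch inferred from those XPaths (the exact envelope BizTalk produces on your system may differ in prefixes or whitespace), the two sample messages would arrive looking something like this, which is also handy if you want to test the XSLT outside BizTalk:

```xml
<s0:Root xmlns:s0="http://schemas.microsoft.com/BizTalk/2003/aggschema">
  <!-- First input of the map (Message 1) -->
  <InputMessagePart_0>
    <ns0:Employees xmlns:ns0="http://TestIndexMap.Employees">
      <Employee firstName="Karin" lastName="Smith" dept="Managers" empNumber="100" />
      <Employee firstName="Daniel" lastName="Smith" dept="Staff" empNumber="101" />
    </ns0:Employees>
  </InputMessagePart_0>
  <!-- Second input of the map (Message 2) -->
  <InputMessagePart_1>
    <ns0:Employees xmlns:ns0="http://TestIndexMap.Employees">
      <Employee firstName="Heidi" lastName="Klum" dept="Models" empNumber="200" />
      <Employee firstName="Elle" lastName="MacPherson" dept="Models" empNumber="201" />
      <Employee firstName="Daniel" lastName="Smith" dept="Staff" empNumber="101" />
      <Employee firstName="Naomi" lastName="Campbell" dept="Models" empNumber="203" />
    </ns0:Employees>
  </InputMessagePart_1>
</s0:Root>
```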

What you need to do is put some sort of condition over one of the loops that says
“only copy the current item if it doesn’t exist in the other message”.

It’s this “if it doesn’t exist in the other message” that can be tricky.
If you use a plain XPath predicate to search the other message, then you incur the penalty of
re-scanning that message each time you iterate through the loop,
so the work grows with the product of the two message sizes.
Depending on the size of your messages, this can be costly.
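For comparison, the plain XPath version of that condition might look like the sketch below. It assumes the same input shape and namespaces as the generated XSLT above. The predicate on Employee is evaluated against the first message for every Employee in the second message, which is the repeated scan being described:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:s0="http://schemas.microsoft.com/BizTalk/2003/aggschema"
                xmlns:ns0="http://TestIndexMap.Employees"
                exclude-result-prefixes="s0">

  <xsl:output omit-xml-declaration="yes" method="xml" version="1.0" />

  <xsl:template match="/s0:Root">
    <ns0:Employees>
      <!-- Copy every Employee from the first message -->
      <xsl:for-each select="InputMessagePart_0/ns0:Employees/Employee">
        <Employee>
          <xsl:copy-of select="./@*" />
          <xsl:copy-of select="./*" />
        </Employee>
      </xsl:for-each>

      <xsl:for-each select="InputMessagePart_1/ns0:Employees/Employee">
        <!-- Naive check: search the first message for an Employee with the
             same empNumber. current() is the Employee being looped over;
             this search is repeated on every iteration. -->
        <xsl:if test="not(/s0:Root/InputMessagePart_0/ns0:Employees/Employee[@empNumber = current()/@empNumber])">
          <Employee>
            <xsl:copy-of select="./@*" />
            <xsl:copy-of select="./*" />
          </Employee>
        </xsl:if>
      </xsl:for-each>
    </ns0:Employees>
  </xsl:template>

</xsl:stylesheet>
```

This is fine for small messages, but it is the pattern to avoid once the inputs grow.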

A better way of doing it is to build an index of unique IDs (i.e. primary key
values!) that you can use to check if the item exists.
Luckily, XSLT has a dedicated feature for this: the <xsl:key> element.

This builds an index of items which you can search with the key() function.
Lookups against the index are generally much faster than re-scanning the message on every iteration.
You can read more about it here and here

(Unfortunately, there’s no functoid for this element).

Expanding the above sample, we will use the empNumber attribute
as our unique ID.
The pseudo code for the map will be:

  1. Build an index of all the empNumber attributes in Message 1
  2. Loop through all the items in Message 1, and copy them all to the Combined Message
  3. Loop through the items in Message 2: if there is no empNumber in our index which matches the current empNumber, then copy the item across

The XSLT for this is:
(Note: to get the base XSLT, I created the above map with a single
Looping functoid and two Mass Copy functoids, and exported the XSLT using the Validate Map
command. I then modified this XSLT file, and set the Custom XSL Path property
on the map to point to my modified file.)
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:msxsl="urn:schemas-microsoft-com:xslt"
                xmlns:var="http://schemas.microsoft.com/BizTalk/2003/var"
                exclude-result-prefixes="msxsl var s0"
                version="1.0"
                xmlns:s0="http://schemas.microsoft.com/BizTalk/2003/aggschema"
                xmlns:ns0="http://TestIndexMap.Employees">

  <xsl:output omit-xml-declaration="yes" method="xml" version="1.0" />

  <!-- This next line generates an index of the empNumber values in the first message -->
  <xsl:key name="duplicates" match="InputMessagePart_0/ns0:Employees/Employee" use="@empNumber" />

  <xsl:template match="/">
    <xsl:apply-templates select="/s0:Root" />
  </xsl:template>

  <xsl:template match="/s0:Root">
    <ns0:Employees>

      <!-- Loop through the Employee elements in the first message -->
      <xsl:for-each select="InputMessagePart_0/ns0:Employees/Employee">
        <Employee>
          <!-- Copy across all elements and attributes in this element -->
          <xsl:copy-of select="./@*" />
          <xsl:copy-of select="./*" />
        </Employee>
      </xsl:for-each>

      <!-- Loop through the Employee elements in the second message -->
      <xsl:for-each select="InputMessagePart_1/ns0:Employees/Employee">
        <!-- We query the index to see if there is an Employee element with
             this empNumber value in the first message.
             If not, then we copy across this Employee element -->
        <xsl:if test="count(key('duplicates', @empNumber)) = 0">
          <Employee>
            <!-- Copy across all elements and attributes in this element -->
            <xsl:copy-of select="./@*" />
            <xsl:copy-of select="./*" />
          </Employee>
        </xsl:if>
      </xsl:for-each>

    </ns0:Employees>
  </xsl:template>

</xsl:stylesheet>

The <xsl:key> element
takes three attributes:
<xsl:key name="" match="" use="" />

name is a unique name for this index (can be anything you want)
match is a pattern that selects the elements you want to index (here, the path to the
Employee elements in the input message)
use is an XPath expression, evaluated relative to each matched element, that gives the value
you want to search on.

(match and use are actually more powerful than I’ve described, but that’s
beyond the scope of this post – see the links above for further reading on what you
can do with these parameters).

So in my example, match points to the Employee element
(i.e. I will create an index of Employee elements), and use is the empNumber attribute
(as this is what I want to search on).
In fact, the <xsl:key> element
is very powerful, and you can use it to create quite complicated indexes.

To perform a lookup in the index, you use the key() function.
This function takes two parameters: key(name, value)

name is the name of the index to use
value is the value to look up in the index (i.e. the value referred to in the use attribute)
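One thing to be aware of: because the duplicates index only covers the first message, an Employee repeated within the second message itself would still be copied twice. A possible variation is sketched below, assuming the same input shape and namespaces as above (the key name employeesByNumber is made up for this example). It indexes the Employee elements of both messages and keeps only the first element that key() returns for each empNumber:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:s0="http://schemas.microsoft.com/BizTalk/2003/aggschema"
                xmlns:ns0="http://TestIndexMap.Employees"
                exclude-result-prefixes="s0">

  <xsl:output omit-xml-declaration="yes" method="xml" version="1.0" />

  <!-- Index the Employee elements of BOTH messages by empNumber
       (hypothetical key name, not from the original solution) -->
  <xsl:key name="employeesByNumber" match="ns0:Employees/Employee" use="@empNumber" />

  <xsl:template match="/s0:Root">
    <ns0:Employees>
      <!-- key() returns its matches in document order, so [1] is the first
           Employee with this empNumber. generate-id() tests whether the
           Employee being considered is that first one. -->
      <xsl:for-each select="*/ns0:Employees/Employee[generate-id() = generate-id(key('employeesByNumber', @empNumber)[1])]">
        <Employee>
          <xsl:copy-of select="./@*" />
          <xsl:copy-of select="./*" />
        </Employee>
      </xsl:for-each>
    </ns0:Employees>
  </xsl:template>

</xsl:stylesheet>
```

With the two sample messages this should give the same five employees as the version above, since empNumber 101 first appears in Message 1. The trade-off is that the index now covers both messages rather than just the first.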

I’ve put together a sample solution which shows how this works.
You can download it here:    TestAggregateMaps
Solution.zip (25.78 KB)