How to validate XML Documents against schemas in BizTalk

I was asked a question the other day: how would you validate an incoming message
against a schema if the message was the request part of a request-response pair, and
you wanted to return a response if the request wasn’t valid?

In the example given, an orchestration had been exposed as a web service, and the
requirement was to validate the incoming message. If the message did not validate,
they wanted to return a response message with an error message in it.

I described two ways I would do it, but neither was what they were expecting: they
were expecting the simplest (and computationally slowest) way of doing it. I realised
that many people use this mechanism because they don’t know there’s any other way.

Why do I say this? I’ll explain as I give my solutions.

First of all, the expected solution was to do the validation in an orchestration. As
the person explained to me, that was the only way to get the response message back on
the same “connection”, i.e. to have it go back out as the response to the matching
request. As you’ll see, this is not true.

In this post I’ll cover the ways to do validation. In the next post, I’ll cover how you
correlate the response back to the client that is waiting for it.

Let me say one thing: BizTalk is not magic. There is no magic (thanks, Nakor).
There are simply some COM+ applications, some .NET assemblies, instances of a Windows
service, some database tables… and a lot of unmanaged code.
What gets confusing is all the concepts layered on top of this: BizTalk does its
best to “hide” what’s really going on from you, and unfortunately a lot of BizTalk
developers don’t dig any deeper than that.

Schema Validation and SOA

When you create a web service, you are explicitly defining a contract between that
service and its clients. There are many parts to that contract, but to keep things
simple I’m only interested in the schema part of it, i.e. which message the interface
accepts as input and which message it returns as output. In a document/literal
(doc/lit) world, it should be one XML message in, one XML message out.

Options for Schema Validation

1. Validating at the End Point

So where’s the best place to validate messages against your schemas? At the end point,
that is, in the Web Service ASMX code itself.
More importantly, if the incoming message isn’t valid, you should raise a SOAP
fault rather than return an error message. To me, this is a fundamental tenet
of good SOA design.
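As a rough illustration of validating at the end point and faulting on failure, here is a minimal sketch in Java using the standard javax.xml.validation API. The post’s own context is ASMX/C#, where you would throw a SoapException instead. The one-element schema, the SchemaFault exception and the forwardToBackEnd hand-off are all hypothetical, chosen only to show the shape: validate first, reject with a fault, and pass only valid messages on.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical stand-in for a SOAP fault (in ASMX you would throw a SoapException).
class SchemaFault extends RuntimeException {
    SchemaFault(String message, Throwable cause) {
        super(message, cause);
    }
}

class RequestEndpoint {
    // Illustrative contract: <Request><Number>xs:int</Number></Request>
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Request'><xs:complexType><xs:sequence>"
      + "<xs:element name='Number' type='xs:int'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Compile the schema once; Schema objects are thread-safe and can be shared.
    private static final Schema SCHEMA = compile(XSD);

    private static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("Contract schema does not compile", e);
        }
    }

    // The web method: validate first, and only pass valid requests on.
    static String submit(String requestXml) {
        try {
            // Validators are not thread-safe, so create one per call.
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(requestXml)));
        } catch (SAXException | IOException e) {
            // Reject at the boundary: the caller gets a fault, the back end never sees the message.
            throw new SchemaFault("Request does not match the contract: " + e.getMessage(), e);
        }
        return forwardToBackEnd(requestXml);
    }

    // Hypothetical hand-off point, e.g. where the message would be submitted to BizTalk.
    private static String forwardToBackEnd(String requestXml) {
        return "<Response><Status>Accepted</Status></Response>";
    }
}
```

A caller sending `<Request><Number>fred</Number></Request>` gets a SchemaFault back, and nothing downstream is touched.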

Think about what happens if you call a method in a class.
Say the method signature was:
string DoSomething(string number)

Assume that this method expects a number passed in as a string, and returns
some information about that number (I’ll gloss over why you’d ever have a method like
this!).

If you pass it “fred” (instead of “123”), you’d expect the method to throw an exception,
not to return a string with a message saying an error had occurred.
Why should a Web Service be any different?
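Here is a minimal sketch of that expectation, written in Java rather than C#. The method body, which reports whether the number is odd or even, is invented purely for illustration.

```java
// Hypothetical Java counterpart of DoSomething(string number): bad input raises an
// exception instead of coming back as an "error" string the caller must inspect.
class NumberInfo {
    static String doSomething(String number) {
        final int value;
        try {
            value = Integer.parseInt(number);
        } catch (NumberFormatException e) {
            // The caller broke the contract, so signal it through the exception channel.
            throw new IllegalArgumentException("Not a number: \"" + number + "\"", e);
        }
        // Invented "information about the number", for illustration only.
        return value + " is " + (value % 2 == 0 ? "even" : "odd");
    }

    public static void main(String[] args) {
        System.out.println(doSomething("123")); // prints "123 is odd"
        try {
            doSomething("fred");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```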

Why go to all the trouble of rolling your own message schema for dealing with invalid
messages when you already have a system for returning detailed error information:
SOAP faults?
Additionally, when you’re using BizTalk, why would you knowingly allow an invalid message
into BizTalk? You wouldn’t allow a stranger into your home whilst you checked their
credentials, would you? Why waste processor cycles on the BizTalk server (and trips
to the MessageBox) dealing with a message it can’t process?

[If you want a hassle-free way of validating messages in the Web Service, look at the
sample code in my earlier post: Validating Schemas in Web Methods using Attributes.

It provides a way of decorating a WebMethod with an attribute that does all the
validation work for you, so no validation code needs to go in your methods.
It also explains the problem with using auto-generated schemas in your WSDL
(which is what happens when you use the Web Services Publishing Wizard in BizTalk).

Aside 1: I should add that you also need to question the wisdom of relying on schema
validation. It can never guarantee that a message is valid: a message might pass schema
validation and still be invalid. Unfortunately, XML Schema Definitions don’t allow a
completely unambiguous specification of a message. You have to accept this when you
choose to use XSDs, and therefore understand the complexities they can add.]
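To illustrate the aside, here is a hypothetical Java sketch. A booking schema can require that Start and End are dates, but XSD 1.0 has no way to say that End must not come before Start, so an instance can pass schema validation and still be wrong. The element names and the rule are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.time.LocalDate;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class BookingCheck {
    // Illustrative schema: two required xs:date elements, with no relationship between them.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Booking'><xs:complexType><xs:sequence>"
      + "<xs:element name='Start' type='xs:date'/>"
      + "<xs:element name='End' type='xs:date'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Schema check: does the instance satisfy the XSD?
    static boolean schemaValid(String xml) {
        Validator validator;
        try {
            validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator();
        } catch (SAXException e) {
            throw new IllegalStateException("Schema does not compile", e);
        }
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    // Business rule the XSD 1.0 schema cannot express: End must not be before Start.
    // Assumes plain yyyy-MM-dd values (xs:date also allows a timezone suffix, not handled here).
    static boolean businessValid(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            LocalDate start = LocalDate.parse(text(doc, "Start"));
            LocalDate end = LocalDate.parse(text(doc, "End"));
            return !end.isBefore(start);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot read booking", e);
        }
    }

    private static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}
```

A booking whose End date precedes its Start date passes `schemaValid` but fails `businessValid`. The schema tells you the shape is right, not that the message makes sense.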

2. Validating in the Pipeline

This is probably the most common way of dealing with things.
You create a custom receive pipeline containing both the XmlDisassembler and XmlValidator
components, and you set the “Validate document structure” property to true on the
XmlDisassembler. (It’s important to know the difference between the two here: the
XmlDisassembler will validate the document structure, while the XmlValidator will
additionally validate any restrictions specified in the schema.)
Note: for a send pipeline, you can just use the XmlValidator, plus the XmlAssembler
if you wish to demote Context Properties into your sent message.

What happens if the document doesn’t validate?
In BizTalk 2004, an exception would be thrown in the pipeline. If you didn’t handle
it, the message would be lost, and the only way you’d know something had gone
wrong was when your client timed out (if the process started from a Web Service call)
and you found an entry in the event log.
To get around this we ended up using components like Stephane Bouillon’s EnhancedValidator,
which would wrap the XmlValidator, catch the validation exception and generate a new
message which was placed in the MessageBox. We could then write an orchestration or
send port to process this message.
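The wrap-and-catch idea can be sketched as below. This is not Stephane Bouillon’s component, whose code isn’t shown here. It is a hypothetical Java illustration in which a list stands in for the MessageBox and an invented ValidationError envelope stands in for the generated message.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ErrorWrappingValidator {
    // Illustrative contract: <Request><Number>xs:int</Number></Request>
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Request'><xs:complexType><xs:sequence>"
      + "<xs:element name='Number' type='xs:int'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Hypothetical stand-in for the MessageBox: everything published ends up here.
    final List<String> messageBox = new ArrayList<>();
    private final Schema schema;

    ErrorWrappingValidator() {
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema does not compile", e);
        }
    }

    // Valid messages are published as-is; an invalid one is turned into an error
    // message (instead of an unhandled exception) so something can subscribe to it.
    void receive(String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            messageBox.add(xml);
        } catch (SAXException | IOException e) {
            messageBox.add("<ValidationError><Reason>" + escape(e.getMessage())
                    + "</Reason><Original>" + escape(xml) + "</Original></ValidationError>");
        }
    }

    // Minimal XML text escaping so the embedded reason and original stay well-formed.
    private static String escape(String s) {
        if (s == null) {
            return "";
        }
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

An orchestration or send port would then subscribe to the ValidationError messages, which is the role the generated message played in the approach described above.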

In BizTalk 2006, if you turn off Failed Message Routing, you get the 2004 behaviour.
If you turn on Failed Message Routing, an Error Report is created by demoting
certain Context Properties, promoting some new Context Properties to indicate that
the message is now an Error Report, and dropping the message in the MessageBox. (It’s
important to realise that an Error Report is not a new message: it’s the received
message with some additional Context Properties.)
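The point that an Error Report is the same message with changed context, rather than a new message, can be modelled with a toy Java class. This is not BizTalk’s API. Only ErrorReport.ErrorType (with the value FailedMessage), ErrorReport.ReceivePortName and BTS.MessageType are real BizTalk property names. Having the model demote every promoted property is a simplifying assumption, not a description of exactly which properties BizTalk demotes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: shows that failed message routing changes the received message's
// context in place instead of creating a new message.
class FailedMessageRoutingModel {
    // A message is a body plus context properties; promoted ones can be used for routing.
    static final class Message {
        final String body;
        final Map<String, String> promoted = new HashMap<>();
        final Map<String, String> written = new HashMap<>(); // in the context, but not routable

        Message(String body) {
            this.body = body;
        }
    }

    // Turns the received message into an Error Report in place and returns the same object.
    static Message toErrorReport(Message received, String receivePortName) {
        // Demote: keep the values, but stop them matching normal subscriptions.
        // (Assumption: everything is demoted; the real set of demoted properties may differ.)
        received.written.putAll(received.promoted);
        received.promoted.clear();
        // Promote the properties that error-handling subscribers filter on.
        received.promoted.put("ErrorReport.ErrorType", "FailedMessage");
        received.promoted.put("ErrorReport.ReceivePortName", receivePortName);
        return received;
    }

    public static void main(String[] args) {
        // Hypothetical message type and port name, for illustration only.
        Message m = new Message("<Request><Number>fred</Number></Request>");
        m.promoted.put("BTS.MessageType", "http://example.org#Request");
        Message report = toErrorReport(m, "ReceivePort1");
        System.out.println(report == m); // prints "true": same message, new context
        System.out.println(report.promoted);
    }
}
```

An error-handling send port or orchestration would subscribe with a filter on ErrorReport.ErrorType, and it receives the original message body untouched.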

3. Validating in an Orchestration

This is the option that people seem to go for when they want to send a response back
to a waiting Web Service client, as it’s the easiest way of doing this.
For validating in an orchestration, you have to write the validation code in a C#
class, and call that class from within your orchestration (examples of how to validate
an XML instance against a schema in C# can be found here).
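A helper of this kind might look like the hypothetical sketch below. It is written in Java with javax.xml.validation; in C# the usual route is an XmlReader created with XmlReaderSettings whose ValidationType is Schema. It collects every error rather than stopping at the first, so an orchestration could branch on an empty or non-empty result and build its response message from the error text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class SchemaValidationHelper {
    // Returns every validation error found; an empty list means the instance is valid.
    static List<String> validate(String xml, String xsd) {
        Validator validator;
        try {
            validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)))
                    .newValidator();
        } catch (SAXException e) {
            throw new IllegalArgumentException("Schema does not compile", e);
        }

        final List<String> errors = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                // Warnings are ignored in this sketch.
            }

            @Override
            public void error(SAXParseException e) {
                errors.add(describe(e)); // record and keep validating
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                errors.add(describe(e));
                throw e; // e.g. not well-formed: parsing cannot continue
            }
        });

        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Normally already recorded by fatalError; keep the message if it wasn't.
            if (errors.isEmpty()) {
                errors.add(String.valueOf(e.getMessage()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return errors;
    }

    private static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ": " + e.getMessage();
    }
}
```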
It’s interesting to note that if you use a Transform in your orchestration, this will
also cause the message to be semi-validated: under the covers, BizTalk uses the XslTransform
and XPathDocument classes, which need well-formed XML to work correctly. However, that
might be a bit late to discover that your message is invalid.
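The difference between “well-formed” and “valid” is easy to see with a standard XSLT processor. This Java sketch uses javax.xml.transform as a stand-in for XslTransform, and the stylesheet is a hypothetical map. A message that breaks the schema (“fred” is not an integer) is transformed without complaint, while a message that is not well-formed makes the transform fail.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformAsValidation {
    // Hypothetical map: copies /Request/Number into /Output/Value.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<Output><Value><xsl:value-of select='/Request/Number'/></Value></Output>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compile the stylesheet once; Templates objects are thread-safe.
    private static final Templates MAP = compile();

    private static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSLT)));
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("Stylesheet does not compile", e);
        }
    }

    // Runs the map. Input that is not well-formed makes the transform fail;
    // well-formed but schema-invalid input is mapped without any complaint.
    static String map(String inputXml) {
        StringWriter out = new StringWriter();
        try {
            MAP.newTransformer().transform(
                    new StreamSource(new StringReader(inputXml)), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Input could not be transformed: " + e.getMessage(), e);
        }
        return out.toString();
    }
}
```

So a map only catches the grossest problems, and only after the message is already inside the orchestration.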

Personally, I’ve never seen the point of doing this validation in an orchestration:
why write code when it has already been written for you in the XmlValidator component?
😉

If you’ve found another mechanism for validating instances (or I’ve missed one) please
let me know in the comments, or using the mail icon at the bottom of the page.

The next post will cover how to respond when the request part of a request/response
pair of messages is invalid, i.e. how you send a response back to the client.
