Background: The large(ish) BizTalk system I’ve been working on for the last few months consumes a few external Web Services. Whilst doing some early performance tuning, I noticed that I was getting pretty serious request-response latency from the primary BizTalk Web Service… Like over 45 secs, instead of the expected 0.5 – 2 secs…
Fortunately, I routinely put tracing/audit calls in my BizTalk code which include timing info – and I always log the times around orchestration entry/exit… So I was quite surprised to see that a request to one of the external Web Services was taking 15 secs to respond… especially as the web service was hosted (temporarily) on the same server…
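The habit is cheap to adopt in any stack. As a rough sketch (in Python for brevity – the real code was BizTalk/.NET, and the names here are purely illustrative), entry/exit timing can be as simple as a decorator:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

def audit_timing(func):
    """Log entry/exit around a call with elapsed time - the kind of
    tracing that made the 15-second gap visible in the first place."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        logging.info("ENTER %s", func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            logging.info("EXIT  %s after %.3fs", func.__name__, elapsed)
    return wrapper
```

Wrap the suspect calls and the log timestamps tell you exactly where the time is going.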
Although I didn’t have the source code, with the help of Lutz Roeder’s trusty .NET Reflector I was poking around the innards within minutes… and couldn’t find anything that would cause a delay of 15 secs.
What’s more, after further testing I found that the problem only occurred the first time after IIS was restarted. After the first time, it would return within 2 secs (still not very good, but much better). One thing I noticed about the Web Service was that it made use of some code in an unmanaged assembly (old school C++ DLL)…
On a hunch, I wrote a simple test Web Service which simply referenced some functions in the DLL (using the DllImport attribute)… and bang, a 15 sec delay before the Web Service constructor was called.
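The real test service was C# using the DllImport attribute; as a cross-platform analogue (a sketch, not the original code), Python's ctypes does the equivalent job of binding to functions in an unmanaged library:

```python
import ctypes
import time

# Load the C runtime already linked into this process - the rough
# analogue of [DllImport("some.dll")] binding to an unmanaged library.
# (This form works on Linux/macOS; on Windows you'd name the DLL.)
libc = ctypes.CDLL(None)

# Time the first call into unmanaged code. In the problem described
# here, the delay happened at load/verification time - before any
# function in the DLL was ever actually called.
start = time.perf_counter()
value = libc.abs(-42)
elapsed = time.perf_counter() - start
print(value, f"first unmanaged call took {elapsed:.4f}s")
```

The point of the test service was to isolate the unmanaged reference from everything else the real Web Service did.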
I also tried the same trick in a console app… and there was no delay.
By this time I was fairly stumped. This issue only happened on our Load Test server: it didn’t happen on the Dev or Test servers. And it seemed to only happen to code hosted under IIS.
I figured it was time to dig a bit deeper and see what IIS was doing: starting up Process Monitor (to monitor file and registry access) and TCP View (to monitor network activity), I called my test Web Service.
Filtering the Process Monitor logs to the w3wp.exe process (the IIS Worker Process for Application Pools) I noticed that w3wp was reading a lot of security and cryptography keys from the registry.
Then I looked at the TCP View log. And I noticed something very interesting coming from the w3wp process: a connection was being attempted to crl.verisign.com and was being blocked.
How do I know it was being blocked? Well, the TCP View log showed a SYN_SENT, which appeared and remained visible for 15 secs before disappearing.
Now if you’re not familiar with how TCP works, then you might miss the significance of this: when a TCP client attempts to contact a TCP server, the first thing performed is a three-way handshake. This involves the client sending a SYN, the server responding with a SYN-ACK, and the client completing the handshake with an ACK.
In the TCP View tool, the rightmost column indicates the state of the local TCP connection end-point. You’ll see that in this case, the state is SYN_SENT: this state indicates that the local TCP client is waiting for a SYN-ACK response from the SYN it sent.
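You can see the difference between a connection that is actively rejected (an immediate RST) and one whose SYN is silently dropped (stuck in SYN_SENT until a timeout fires) with a small sketch – the timeout value here is illustrative:

```python
import socket
import time

def timed_connect(host, port, timeout=3.0):
    """Attempt the TCP handshake and report what happened.

    If the SYN is silently dropped (e.g. by a firewall configured to
    ignore rather than reject), the local socket sits in SYN_SENT
    until the timeout fires. A closed port on a reachable host
    instead answers with RST, which fails immediately.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    start = time.perf_counter()
    try:
        s.connect((host, port))
        return "connected", time.perf_counter() - start
    except socket.timeout:
        return "timed out in SYN_SENT", time.perf_counter() - start
    except OSError as exc:
        return f"refused/unreachable: {exc}", time.perf_counter() - start
    finally:
        s.close()
```

Note that the rejected case returns in milliseconds, while the silently-dropped case always burns the full timeout – which is exactly the behaviour that matters in this story.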
Why would this be? Well, there are many possible reasons, but usually it’s because the address can’t be reached – e.g. a proxy server is required, or a firewall rule is silently dropping the packets (a firewall that actively rejected the connection would produce an immediate failure rather than a hang).
And what’s the relevance of crl.verisign.com? A CRL (Certificate Revocation List) is used to indicate which Certificates (issued by a Certificate Authority) are no longer valid.
And this check is only performed when you are using an assembly which has been digitally (Authenticode) signed.
(Note: Microsoft became really, really serious about CRLs back in 2001, when VeriSign accidentally issued two Microsoft certificates to a third party – which meant that the third party could sign code that appeared to have come from Microsoft.)
So what does this have to do with my Web Service and a 15 second delay?
Looking through the assemblies in the Web Service, I noticed that the unmanaged assembly was digitally signed. Aha! So when the unmanaged assembly was referenced in a Web Service, IIS would hand off control to the Windows Cryptography API (CAPI) and ask it to verify the certificate. And this involves retrieving a CRL for the certificate.
Searching around online for more information, the thing that drew my attention was this comment:
Some people behind a very strict firewall, ignoring outgoing connections instead of rejecting them, had a timeout while loading our assemblies. That timeout (at least 15 seconds) is hardcoded in WinVerifyTrust!
Eureka! That’s exactly my problem!
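The failure mode can be modelled in a few lines. This is a toy sketch – the function names and the treat-fetch-failure-as-not-revoked behaviour are my reading of what happens, not WinVerifyTrust’s actual code, and the 15-second figure comes from the comment quoted above:

```python
import time

CRL_FETCH_TIMEOUT = 15  # seconds - reportedly hardcoded in WinVerifyTrust

def verify_signed_assembly(fetch_crl):
    """Toy model of the trust check: verifying the signature itself is
    fast; it is the CRL retrieval that can block for the full timeout
    when outgoing connections are silently dropped by a firewall."""
    start = time.monotonic()
    try:
        revoked_serials = fetch_crl(timeout=CRL_FETCH_TIMEOUT)
        trusted = b"cert-serial" not in revoked_serials
    except TimeoutError:
        # Fetch failed: the certificate is treated as not revoked,
        # so the load *succeeds* - just 15 seconds late.
        trusted = True
    return trusted, time.monotonic() - start

# Fast path: the CRL endpoint is reachable, verification is quick.
ok, elapsed = verify_signed_assembly(lambda timeout: set())
print(ok, f"{elapsed:.3f}s")

# Blocked path: the firewall drops the SYN and the fetch times out.
# (Simulated here by raising immediately rather than really waiting.)
def blocked_fetch(timeout):
    raise TimeoutError

ok2, elapsed2 = verify_signed_assembly(blocked_fetch)
print(ok2)
```

Which matches the symptom exactly: the assembly loads fine either way, but the first load after an IIS restart eats the whole timeout.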
And it explained why it was only happening on our Load Test server: Our Dev and Test servers have access to the internet, whilst the Load Test server does not: it has no proxy server configured.
To test this theory, I configured the Load Test server to use a proxy server. If you haven’t done this before: you can’t do it via the Internet Explorer settings – those only configure a proxy for applications which go via wininet.dll (i.e. the Internet Explorer API), using the settings in the Internet Explorer/Internet Options/Connections/LAN Settings panel. Services such as IIS use WinHTTP instead, which keeps its own proxy settings – and those are configured with the proxycfg tool.
If you run proxycfg -u, it will import your Internet Explorer proxy server settings into WinHTTP.
So I did this, restarted IIS and presto! The problem went away.
However, giving the Load Test server access to the Internet is a fairly large security hole (at least it is in our network, where the Load test server is behind both the firewall and DMZ).
So, we contacted the vendor who supplied the unmanaged DLL, and managed to get them to release a new version which wasn’t digitally signed.
That solved the latency problem in the Web Service call – which accounted for 15 secs of the 45 secs of latency.
Solving the remaining 30 secs latency is the subject of next week’s post.