
BizTalk and Certificate Revocation Lists (CRLs) – Part II

(This is the third in a series of three posts about CRLs: the first was Web Services
and the 15 Second Delay, and the second was BizTalk and Certificate Revocation Lists
(CRLs) – Part I.)

Note: A lot of the information in this post comes from a great MSDN article
located here.

Caveat: My client uses 64-bit servers (AMD Opterons), running 64-bit versions of
Windows 2003 R2 and BizTalk 2006. IIS is running in 32-bit compatibility mode (as we
use SharePoint). I haven’t yet worked out whether the CRL problem occurs on 32-bit
servers; I certainly haven’t noticed it on our 32-bit servers so far.

For 2 months, my BizTalk application was working fine. The system passed performance
testing, and was deployed on the Live servers in preparation for final connectivity
testing.

Then one Monday, last week, the test team complained that they were experiencing sporadic
timeouts. On the same day, I was doing some testing on an unrelated BizTalk application
on a separate server… and I noticed that I would occasionally get request-response
latency approaching 70 secs…

Given that I’d also lost access to iTunes Radio that same morning (bah!), I assumed
that changes had been made to our proxy server or firewall. I fired up TCP View on the
server I was working on, and there was our old friend SYN_SENT: something was blocking
access to the CRL again. I spoke to the Tech Support team and discovered that no changes
had been made to the proxy server. Leaving them to check for changes to our firewall and
security policies, I decided to do some research into why this delay exists (if the call
is blocked) and whether there was a way around it. Here’s what I discovered (refer to
this article for a more in-depth explanation of Certificates and CRLs):

  • Any given Digital Certificate contains a property called the CRL Distribution Point
    which is a collection of URIs.

  • When a certificate is validated, a CRL retrieval attempt is made using each URI in
    the list. Retrieval stops with the first URI to return a valid CRL

  • When a valid CRL is obtained, it is stored in the Certificate Store for
    the Local Machine (under Certificates (Local Computer)/Intermediate Certification
    Authorities/Certificate Revocation Lists)

  • A CRL is a signed object in its own right, much like a certificate, and as such
    it contains an expiry/update date called the Next Update date

  • If the CRL already exists in the Certificate Store and is still valid
    then this CRL is used; otherwise an attempt is made to download an updated CRL

  • URI schemes valid for CRLs include http://, ldap://, and file://; it is the publisher
    (issuer) of the certificate who decides upon the contents of the CRL Distribution Point

  • In large corporations, it is common to use Active Directory (AD) as the provider
    of CRLs: AD can download the required CRLs and either publish them to a master
    location, or distribute them to servers that need them
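The lookup order described in the list above can be sketched in portable C. This is only an illustration of the logic as I understand it, not how CAPI is actually implemented: the Crl struct, the FetchFn callback, resolve_crl and demo_fetch are all hypothetical names invented for this sketch, and demo_fetch simply pretends that only file:// URIs are reachable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical, simplified model of a CRL: only the fields this sketch needs. */
typedef struct {
    int    present;      /* non-zero if a CRL is held */
    time_t next_update;  /* the CRL's Next Update date */
} Crl;

/* Hypothetical fetch callback: returns non-zero and fills *out on success. */
typedef int (*FetchFn)(const char *uri, Crl *out);

/* A CRL is usable if it exists and its Next Update date is still in the future. */
int crl_is_current(const Crl *crl, time_t now)
{
    return crl != NULL && crl->present && now < crl->next_update;
}

/*
 * Returns 1 if a usable CRL ends up in *cache, 0 otherwise.
 * *attempts is set to the number of distribution point URIs that were tried.
 */
int resolve_crl(Crl *cache, const char *const *uris, size_t uri_count,
                FetchFn fetch, time_t now, size_t *attempts)
{
    size_t i;

    assert(cache != NULL && fetch != NULL && attempts != NULL);
    *attempts = 0;

    /* A cached CRL that has not reached its Next Update date is used as-is. */
    if (crl_is_current(cache, now))
        return 1;

    /* Otherwise try each URI in order, stopping at the first valid CRL. */
    for (i = 0; i < uri_count; i++) {
        Crl fetched = {0, 0};
        (*attempts)++;
        if (fetch(uris[i], &fetched) && crl_is_current(&fetched, now)) {
            *cache = fetched;  /* keep it for later validations */
            return 1;
        }
    }
    return 0;  /* every distribution point failed */
}

/* Demo fetcher (an assumption for illustration): only file:// URIs succeed,
   loosely mimicking a proxy that blocks outbound http:// requests. */
int demo_fetch(const char *uri, Crl *out)
{
    if (strncmp(uri, "file://", 7) != 0)
        return 0;
    out->present = 1;
    out->next_update = (time_t)2000;
    return 1;
}
```

Calling resolve_crl with an empty cache and a list whose first entry is an http:// URI makes the sketch fall through to the next URI; calling it again before the Next Update date touches no URI at all, which is why a manually installed CRL hides the problem until it expires.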

One thing I was curious about was this 15 second delay which kept popping up.

The Xceed Software post I had read made reference to there being a 15 second delay
hard-coded into the WinVerifyTrust API call.

Looking through the documentation for WinVerifyTrust I noticed two things:

  1. Microsoft recommend the use of CertGetCertificateChain for
    validating a certificate (instead of WinVerifyTrust)

  2. That WinVerifyTrust enumerated a registry key (HKLM\SOFTWARE\Microsoft\Cryptography\Providers\Trust)
    to find out what API call to use to verify the trust of the given object

I’m not about to trace what WinVerifyTrust does to actually check the CRL, but I’d
suspect that it ends up delegating to either CertGetCertificateChain or CertVerifyRevocation (and
I’d bet that internally, CertGetCertificateChain calls CertVerifyRevocation to verify
the CRL for a given certificate).

Suffice to say that CertGetCertificateChain will build a chain of certificates starting
from the given certificate, and building the chain all the way up to the root CA,
and will optionally check the revocation status for each certificate in the chain;
whilst CertVerifyRevocation will verify the revocation status for a single certificate.
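To make the difference between the two calls concrete, here is a toy model in portable C. It is a sketch under heavy assumptions: the Cert struct and both functions are hypothetical stand-ins, the revoked flag replaces a real CRL lookup, and the real CertGetCertificateChain has flags (for example, to exclude the root from revocation checking) that this ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified certificate: a revocation flag plus a link to its issuer. */
typedef struct Cert {
    const char        *subject;
    int                revoked;  /* stands in for "listed in the issuer's CRL" */
    const struct Cert *issuer;   /* NULL for a self-signed root CA */
} Cert;

/* Single-certificate check, in the spirit of CertVerifyRevocation. */
int cert_is_revoked(const Cert *cert)
{
    assert(cert != NULL);
    return cert->revoked;
}

/*
 * Chain check, in the spirit of CertGetCertificateChain with revocation
 * checking switched on: walk from the given certificate up to the root and
 * return the first revoked certificate found, or NULL if none is revoked.
 */
const Cert *first_revoked_in_chain(const Cert *leaf)
{
    const Cert *c;

    for (c = leaf; c != NULL; c = c->issuer) {
        if (cert_is_revoked(c))
            return c;
    }
    return NULL;
}
```

The point of the toy is that a chain check can need one CRL per certificate in the chain, so a single blocked distribution point anywhere between the signing certificate and the root is enough to trigger the delay.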

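CertVerifyRevocation takes, as one of its parameters, a struct called CERT_REVOCATION_PARA
(CertGetCertificateChain takes the closely related CERT_CHAIN_PARA, which carries an
equivalent timeout member).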

The format of that structure is:

typedef struct _CERT_REVOCATION_PARA {
  DWORD cbSize;
  PCCERT_CONTEXT pIssuerCert;
  DWORD cCertStore;
  HCERTSTORE* rgCertStore;
  HCERTSTORE hCrlStore;
  LPFILETIME pftTimeToUse;
  DWORD dwUrlRetrievalTimeout;
  BOOL fCheckFreshnessTime;
  DWORD dwFreshnessTime;
  LPFILETIME pftCurrentTime;
  PCERT_REVOCATION_CRL_INFO pCrlInfo;
  LPFILETIME pftCacheResync;
  PCERT_REVOCATION_CHAIN_PARA pChainPara;
} CERT_REVOCATION_PARA, *PCERT_REVOCATION_PARA;

Heh, look, there’s a member called dwUrlRetrievalTimeout.

Wonder if that’s relevant??? 😉

The documentation has this to say:

This member contains the time-out limit,
in milliseconds. If zero, the revocation handler’s default time-out is used.

And what’s the revocation handler’s default time-out?

Well, Microsoft doesn’t specify this directly… but I notice in a related knowledge base
post that a value of 15000 milliseconds is used… i.e. 15 seconds!
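The "zero means default" rule is small enough to write down. The sketch below is portable C rather than a real CAPI call, and the 15000 ms constant is only the value inferred above from the knowledge base post, not a documented default; both names are made up for illustration.

```c
/* Assumption: 15000 ms is the default inferred from a knowledge base post;
   it is not a documented constant. */
#define ASSUMED_DEFAULT_TIMEOUT_MS 15000UL

/* Mirrors the documented rule for dwUrlRetrievalTimeout: zero means "use the
   revocation handler's default", any other value is used as given. */
unsigned long effective_timeout_ms(unsigned long requested_ms)
{
    return requested_ms == 0 ? ASSUMED_DEFAULT_TIMEOUT_MS : requested_ms;
}
```

So a caller that leaves the member zeroed (which is what you get from zero-initialising the struct) would wait the full assumed 15 seconds on a blocked URL, whereas a caller passing, say, 2000 would give up after 2 seconds.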

So that’s as far as we can go with that: unless IIS includes an option to configure
this timeout, we can’t change it (and it does, sort of; see option 4 below).

Whilst researching this post, I noticed that one solution that is frequently touted
is to modify the following registry key:

HKCU\Software\Microsoft\Windows\CurrentVersion\WinTrust\Trust Providers\Software Publishing\State

But that’s not much use, as that’s for the Current User (hence the HKCU). Great if
I was using my own local user account for the application pools, bad if I’m using
a non-interactive user account (which we are). Plus I’m not sure this would work for
IIS… maybe I’ll try it at some stage.

(Note: it looks like Microsoft are aware of this issue, because in Windows Vista/Longhorn
there’s now a Group Policy setting which lets you set this default timeout for
non-interactive processes, i.e. IIS App Pools!!)

So what’s the solution in this case?

Well, unless the technical support guys can work out what they changed to block CRL
access (I suspect they turned on authentication on the proxy), we have four choices:

  1. Use Active Directory to store and publish CRLs (which we should have been
    doing from the start IMO)
    This is Microsoft’s preferred way of doing it for large customers.
    More information on configuring CRLs with AD can be found here.
  2. Manually download the required CRL and install it
    This is my preferred solution for this particular issue, and is detailed
    below.
  3. Disable CRL checking for the server
    This is an interesting one. I’m not convinced that this can be done – there
    are a few posts about how to do this, including one on how to do it for IIS here.
    However, this seems to be related to certificate exchange for HTTP request/responses,
    as opposed to certificate validation for signed code, which is a whole different thing.
    Plus, turning off certificate checking is a rather large security hole as you don’t
    know if a given certificate is still valid.

  4. Change the default CRL timeout period for CAPI
    I noticed in the Knowledge Base article for an update to IIS 5.0 that new registry
    keys had been added, including one allowing a value called
    ChainUrlRetrievalTimeoutMilliseconds to be set.

    Then when browsing through the PKI documentation, I noticed a reference to the same
    registry keys, plus a note saying “this setting was first introduced with MS04-011”
    (the IIS 5.0 update linked to above).

    So it looks like it is possible to set the default timeout.

    I haven’t tried this, so can’t verify that it works, but to me it’s not the correct
    solution: the CRL should be available, either from AD or the URL, or by installing
    it manually. Setting the timeout to a lower value just ignores the problem, and it
    creates a potential security hole, as you can no longer be sure that the certificate
    used to sign code is still valid.

Manually downloading and installing a CRL

Needless to say, I thought I’d have a go manually downloading the CRL and installing
it – and it worked a treat. Problem solved (at least until the next CRL update is
needed, which is August 2007). Still, gives us a breather to get it properly sorted.

Finding the URL of the CRL is easy: look in the certificate details for the
CRL Distribution Point, and copy the URL from there. In this case, it’s the Microsoft
Code Signing Public Certification Authority CRL: http://crl.microsoft.com/pki/crl/products/CodeSignPCA.crl

You can put this URL in a web browser, and download the CRL file.

(Note: if you’re doing this in Windows Server 2003, you’ll need to add crl.microsoft.com
to your list of Trusted Sites, otherwise you won’t be able to download the CRL file)

Once you have the file, you can install it following the instructions here:


And lo and behold, the problem was fixed.
At least, it is fixed until August 30th 2007 when the CodeSignPCA.crl expires… 😉
But by then, I’m sure we’ll have found a permanent fix!
