I just renewed my SSL certificate with StartSSL (using their very nice and fairly straightforward automated system), and it got me thinking about all the issues I have encountered or heard about regarding SSL certificate and domain renewal.

In a previous job there was an incident where an SSL certificate expired, and this went unnoticed (it was an analytics service, not a core service) for 5 weeks. Not too bad, right? Well, unfortunately, due to an additional oversight in the client code, a failed request would be retried after a 0.5 second timeout, indefinitely! This led to hundreds of thousands of clients hammering the (auto-scaling) load balancer with requests that were dropped because SSL verification failed. The end result: a large bill that could have been avoided. This issue is hardly unique; even the largest cloud providers have been hit by similar problems (see Windows Azure Service Disruption from Expired Certificate, Google.com domain getting transferred, or Microsoft losing its Hotmail domain).
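For contrast, here is a minimal Python sketch of what safer client retry logic could look like: capped exponential backoff with random jitter and a hard limit on attempts. It assumes a generic `request` callable; the function name, parameters, and default values are hypothetical illustrations, not the original client code.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryLimitExceeded(Exception):
    """Raised when every attempt has failed (hypothetical error type)."""


def retry_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying failures with capped exponential backoff.

    Instead of a fixed 0.5 s retry forever, the delay ceiling doubles after
    each failure (0.5, 1, 2, 4, ... seconds, capped at max_delay) and the
    client gives up after max_attempts, so a permanent fault such as an
    expired certificate does not become an endless stream of requests.
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception as err:  # real code should catch only retryable errors
            last_error = err
            if attempt == max_attempts - 1:
                break  # no point sleeping after the final attempt
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # "Full jitter": sleep a random fraction of the ceiling so that
            # many clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise RetryLimitExceeded(f"gave up after {max_attempts} attempts") from last_error
```

The `sleep` parameter exists only so the delays can be observed or skipped in tests; production callers would leave it as `time.sleep`.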

It is hard to have a universal list of best practices, as it depends on the size of the organisation, but here are some good ideas:

  • Avoid a single point of failure: never have an SSL cert (or domain name) associated with an individual developer’s email address. Instead, use a dedicated mailing list.
  • Monitoring: even secondary services (those considered ‘best effort’) need active monitoring.
    • Monitoring results should be sent to a mailing list, not a specific individual.
    • When setting up monitoring, don’t forget to monitor costs as well as uptime!
  • Consolidate all domains and SSL certs with a single trusted provider.
  • Purchase with a company credit card and enable auto-renew.
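As a starting point for the monitoring bullet, here is a small Python sketch (standard library only) that connects to a host over TLS and reports how many days its certificate has left. The helper names, messages, and the 30-day warning threshold are illustrative assumptions; in practice the result would be routed to the team mailing list rather than printed.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before a certificate's notAfter timestamp.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jun  1 12:00:00 2030 GMT". A negative result means it has expired.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Open a TLS connection and return the server certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return cert["notAfter"]


def check_host(host: str, warn_days: int = 30) -> str:
    """Return a one-line status suitable for an alert email (hypothetical format)."""
    try:
        remaining = days_until_expiry(fetch_not_after(host))
    except ssl.SSLCertVerificationError as err:
        # With the default context an already-expired certificate fails the
        # handshake here, which is itself something to alert on.
        return f"ALERT {host}: certificate failed verification ({err.verify_message})"
    except OSError as err:
        return f"ALERT {host}: could not connect ({err})"
    if remaining < warn_days:
        return f"WARN {host}: certificate expires in {remaining:.0f} days"
    return f"OK {host}: {remaining:.0f} days left"
```

Run from a scheduled job, something like `print(check_host("example.com"))` gives a warning well before renewal is due. Note that the `SSLCertVerificationError` handler must come before the generic `OSError` one, since it is a subclass of it.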

What best practices would you recommend? What bullets have you dodged (or been hit by!)? Are there any services out there that you use to avoid these issues?

