What is Disaster Recovery (DR)
Some people say DR is impossible or difficult to implement on a cloud platform that follows the PaaS model. This is in contrast to IaaS, where you have more control over hardware and configuration than you do in PaaS. Hopefully, after reading this article, you'll realise it is now very easy in Windows Azure.
|Figure 1: Basic disaster recovery failover configuration|
This means the Europe data centre serves all user requests and the USA data centre sits in a passive state; in other words, it is not used unless the Europe data centre fails. If that failure occurs, we need to switch from the Europe data centre to the USA data centre. This is not easy with the current configuration because users would have to switch URLs from the Europe to the USA data centre. That is far from ideal, as users often wouldn't know when to try the other URL; it really needs to be seamless.
So we really want a single URL that can reference either data centre when needed, without the users knowing.
A simple way to implement DR in Windows Azure
This is where DNS (the Domain Name System) plays a very important role and helps us solve this problem relatively easily.
Now consider an amendment to the diagram above (figure 1) that abstracts the user from the *.cloudapp.net domain name using an internet DNS registrar and a CNAME record that resolves to the required Azure data centre. Remember, each cloudapp subdomain represents a single data centre region. When you design a disaster recovery solution, you wouldn't normally deploy the standby to the same data centre as the primary, as that defeats the purpose of having a DR strategy.
|Figure 2: Basic disaster recovery failover configuration with DNS|
Users never see which data centre is serving them. Of course, they could run a trace route (TRACERT) on http://myamazingapp.com and see where it resolves to. In fact, I have applied the above configuration to an application I have deployed in Azure right now. If I run TRACERT on my subdomain, http://remotemedia.simonrhart.com, I get the following:
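To make the indirection concrete, here is a minimal Python sketch of what the CNAME record does for us. It models DNS as a plain dictionary rather than querying a real server, and it borrows the host names from my own deployment used elsewhere in this article. The `resolve_cname` helper is hypothetical and exists only for illustration.

```python
def resolve_cname(name, records, max_hops=8):
    """Follow CNAME records from `name` until a name without a CNAME is reached."""
    for _ in range(max_hops + 1):
        if name not in records:
            return name
        name = records[name]
    raise RuntimeError("CNAME chain is too long or contains a loop")


# Toy zone data: the custom domain is an alias for the Europe deployment.
records = {"remotemedia.simonrhart.com": "remotemedia.cloudapp.net"}
before = resolve_cname("remotemedia.simonrhart.com", records)

# Failing over means editing one record; the URL users type stays the same.
records["remotemedia.simonrhart.com"] = "remotemedia-dr.cloudapp.net"
after = resolve_cname("remotemedia.simonrhart.com", records)

print(before)
print(after)
```

The point of the sketch is that the only thing that changes during a failover is the value of one record, never the name the users know.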
|Figure 3: Running TRACERT on my sample app hosted in Azure|
We know that IP address belongs to a real Azure data centre because it is registered to Microsoft. Here is the result of running a whois on the resolved IP address:
WHOIS information for 126.96.36.199:
[Querying whois.arin.net]
[Redirected to whois.ripe.net:43]
[Querying whois.ripe.net]
[whois.ripe.net]
% This is the RIPE Database query service.
% The objects are in RPSL format.
%
% The RIPE Database is subject to Terms and Conditions.
% See http://www.ripe.net/db/support/db-terms-conditions.pdf
% Note: this output has been filtered.
% To receive output for a database update, use the "-B" flag.
% Information related to '188.8.131.52 - 184.108.40.206'

inetnum: 220.127.116.11 - 18.104.22.168
descr: Microsoft Limited
org: ORG-MA42-RIPE
netname: UK-MICROSOFT-20081107
country: GB
admin-c: AS9763-RIPE
tech-c: EN603-RIPE
tech-c: BR329-ARIN
status: ALLOCATED PA
mnt-by: RIPE-NCC-HM-MNT
mnt-lower: MICROSOFT-MAINT
mnt-domains: MICROSOFT-MAINT
mnt-routes: MICROSOFT-MAINT
source: RIPE # Filtered

organisation: ORG-MA42-RIPE
org-name: Microsoft Limited
org-type: LIR
address: Microsoft Darren Norman One Microsoft Way WA 98052 Redmond UNITED STATES
phone: +1 (425) 703 6647
fax-no: +1 425 936 7329
e-mail: email@example.com
admin-c: NORM1-RIPE
admin-c: NORM1-RIPE
admin-c: NORM1-RIPE
mnt-ref: MICROSOFT-MAINT
mnt-ref: RIPE-NCC-HM-MNT
mnt-by: RIPE-NCC-HM-MNT
source: RIPE # Filtered

person: Allie Settlemyre
address: Microsoft Limited
address: One Microsoft Way,
address: Redmond, WA 98052
address: USA
phone: +1 (425) 705 0516
phone: +1 (425) 936 7329
e-mail: firstname.lastname@example.org
nic-hdl: AS9763-RIPE
source: RIPE # Filtered

person: Bharat Ranjan
address: Microsoft Corporation
address: Redmond, WA, 98102
address: One Microsoft Way
address: USA
phone: +1 (425) 706 3230
fax-no: +1 (425) 936 7329
nic-hdl: BR329-ARIN
source: RIPE # Filtered
e-mail: email@example.com

person: Edet Nkposong
address: Microsoft, One Microsoft Way,Redmond, WA 98052
address: USA
e-mail: firstname.lastname@example.org
phone: +14257071045
nic-hdl: EN603-RIPE
mnt-by: MICROSOFT-MAINT
source: RIPE # Filtered
So that is wonderful, isn't it? DR and failover problem sorted. Well, kind of. It's not perfect because it's very manual. If the European data centre where my application is deployed goes down, I need to know about it so I can tell my DNS registrar to change the CNAME record to point to the application deployed in the DR data centre, North Central US.
This means I will have to log into my DNS registrar and change the CNAME when a failure occurs like so:
|Figure 4: Setting up a CNAME record|
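The "I need to know about it" part can at least be scripted. The following is a rough Python sketch, assuming each deployment answers plain HTTP on its root URL. `is_up` and `choose_cname_target` are hypothetical helpers of my own, and updating the record itself still depends on whatever interface your registrar offers, which is not shown here.

```python
import http.client
import urllib.error
import urllib.request


def is_up(host, timeout=5.0):
    """Return True if an HTTP GET to the host's root URL succeeds."""
    try:
        with urllib.request.urlopen("http://" + host + "/", timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def choose_cname_target(primary, standby, probe=is_up):
    """Return the host the CNAME should point to right now."""
    return primary if probe(primary) else standby


# Demonstration with a stand-in probe, so the example runs without a network.
def europe_is_down(host):
    return host != "remotemedia.cloudapp.net"


target = choose_cname_target(
    "remotemedia.cloudapp.net",
    "remotemedia-dr.cloudapp.net",
    probe=europe_is_down,
)
print(target)
```

Even with a script like this running on a schedule, somebody (or something) still has to push the new CNAME value to the registrar, so the process remains only semi-automatic.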
Surely there is a better way?
Windows Azure Traffic Manager
What I have described above works, it's fairly simple, and I have done it this way for some time. Thankfully, there is a better way. Microsoft has made available, as a Community Technology Preview (CTP), a feature called Windows Azure Traffic Manager.
Unlike the beta programmes in Azure, you can start using Traffic Manager right away. There is no need to request access first, as there is with the beta programmes.
Windows Azure Traffic Manager can handle your DR failover strategy without you having to touch any DNS server or registrar once it's set up, and it does more besides. It supports the following:
- Performance – traffic is forwarded to the closest hosted service in terms of network latency
- Round Robin – traffic is distributed equally across all hosted services
- Failover – traffic is sent to a primary service and, if this service goes offline, to the next available service in a list
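The three options above can be sketched in a few lines of Python. This is only an illustration of the selection rules as described in the list, not how Traffic Manager is implemented; the function names are hypothetical and the latency figures are invented for the example.

```python
from itertools import cycle


def pick_performance(latency_ms):
    """Performance: choose the service with the lowest measured latency."""
    return min(latency_ms, key=latency_ms.get)


def round_robin(services):
    """Round Robin: hand out the services in turn, forever."""
    return cycle(services)


def pick_failover(ordered_services, online):
    """Failover: choose the first service in the list that is online."""
    for service in ordered_services:
        if service in online:
            return service
    return None


services = ["remotemedia.cloudapp.net", "remotemedia-dr.cloudapp.net"]

# Invented latencies in milliseconds, as seen by one particular user.
print(pick_performance({services[0]: 35.0, services[1]: 110.0}))

rotation = round_robin(services)
print([next(rotation) for _ in range(3)])

# The Europe service is offline, so the next one in the list is chosen.
print(pick_failover(services, online={services[1]}))
```

For DR, the Failover rule is the one we care about: the order of the list decides which data centre is active and which is the standby.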
So Traffic Manager will solve our problem of having to manually update the DNS registrar with the new Azure data centre DNS cloudapp domain name. Great, how do I do it?
Enabling Traffic Manager
To start using Traffic Manager you need to use the Windows Azure Management Portal to create a policy.
To do this, navigate to the Windows Azure Management Portal and sign in: http://windows.azure.com. Then click Virtual Network > Get Started With Traffic Manager.
See figure 5 below:
|Figure 5: Getting started with Windows Azure Traffic Manager|
Once you click the Get Started with Traffic Manager button, you'll see a dialog box similar to the following:
|Figure 6: Creating a Traffic Manager Policy|
Note that there is a lab covering all of this Traffic Manager setup here: http://msdn.microsoft.com/en-us/gg197529. I have included the steps here to give the bigger picture of what Traffic Manager is designed to solve and how you would solve these problems without it.
I have filled in the policy above as per the original high-level architecture diagram in figure 1. Note: the DNS names are different from my diagram, but the concept and design are the same.
Some of the data here is important. One piece is the DNS time to live (TTL). This controls how long DNS servers cache the answer, so it is the maximum time users will have to wait until their DNS server picks up the new address should a failure occur. The default is 5 minutes (300 seconds). The other important piece of information is the Traffic Manager DNS Prefix field.
Well, the Traffic Manager DNS Prefix field can be anything we want (so long as it hasn't been used already) as the users will never see it. Later we will reconfigure our DNS registrar to point to this DNS address.
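To see why the DNS TTL mentioned above matters for failover, here is a small Python sketch of a caching resolver. It is a toy model with a simulated clock, not a real DNS client; `CachingResolver` and the policy name `remotemedia-policy.example` are hypothetical and used purely for illustration.

```python
class CachingResolver:
    """Toy DNS cache: an answer is reused until its TTL runs out."""

    def __init__(self, lookup, ttl_seconds=300):
        self._lookup = lookup      # function: name -> current answer
        self._ttl = ttl_seconds
        self._cache = {}           # name -> (answer, expires_at)

    def resolve(self, name, now):
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]
        answer = self._lookup(name)
        self._cache[name] = (answer, now + self._ttl)
        return answer


name = "remotemedia-policy.example"  # hypothetical Traffic Manager DNS name
authoritative = {name: "remotemedia.cloudapp.net"}
resolver = CachingResolver(authoritative.get, ttl_seconds=300)

first = resolver.resolve(name, now=0)                # cached until t=300
authoritative[name] = "remotemedia-dr.cloudapp.net"  # failover happens
stale = resolver.resolve(name, now=299)              # still the cached answer
fresh = resolver.resolve(name, now=300)              # cache expired, new answer

print(first)
print(stale)
print(fresh)
```

A lower TTL shortens the window in which users are still sent to the failed data centre, at the cost of more DNS lookups.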
Once I click OK, the policy is then created and it is active in Traffic Manager:
|Figure 7: Our policy in traffic manager|
|Figure 8: That's it, our DNS configured and never needs to change again!|
We are simply handing the problem of failover over to Windows Azure. So in the above case, Azure will handle changing the DNS CNAME configuration should a failure occur.
Making Sure Traffic Manager is Working
What we now need to do is test that the Traffic Manager failover feature is working correctly.
If we now run a trace route on our new Traffic Manager URL, it should resolve to the Europe data centre (in my case http://remotemedia.cloudapp.net). Remember, I have two data centres: one in Europe (active) and one in North Central US (passive):
|Figure 9: Tracing traffic manager configuration|
Now I want to force a failure so I can test the failover. This is easy: all I need to do is shut down the Europe data centre services, like so:
|Figure 10: Shutting down active node in Windows Azure|
Once 5 minutes have elapsed, I'll run the same trace route command from a command prompt, like so:
|Figure 11: Tracing now that Europe services are down|
Also, if I run the trace one layer out from my custom domain: http://remotemedia.simonrhart.com, I get the expected failover data centre as above [remotemedia-dr.cloudapp.net]:
|Figure 12: Running a trace route from my custom domain|
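Rather than waiting and re-running the trace by hand, the check can be polled from a script. The following Python sketch is one way to do it under a few assumptions: `socket.gethostbyname_ex` is a real standard library call, but how faithfully the primary name it returns reflects the end of the CNAME chain depends on the platform's resolver, and a real lookup can raise `socket.gaierror`. The `wait_for_failover` helper and its parameters are hypothetical.

```python
import socket
import time


def canonical_name(host):
    """Ask the system resolver for the primary host name behind `host`."""
    primary, _aliases, _addresses = socket.gethostbyname_ex(host)
    return primary


def wait_for_failover(host, expected, resolve=canonical_name,
                      interval=30.0, attempts=20, sleep=time.sleep):
    """Poll until `host` resolves to `expected`.

    Returns the attempt number on success, or None if it never happened.
    """
    for attempt in range(1, attempts + 1):
        if resolve(host) == expected:
            return attempt
        if attempt < attempts:
            sleep(interval)
    return None


# Demonstration with canned answers, so the example runs without a network.
answers = iter([
    "remotemedia.cloudapp.net",
    "remotemedia.cloudapp.net",
    "remotemedia-dr.cloudapp.net",
])
result = wait_for_failover(
    "remotemedia.simonrhart.com",
    "remotemedia-dr.cloudapp.net",
    resolve=lambda host: next(answers),
    sleep=lambda seconds: None,
)
print(result)
```

To run it against a live deployment, drop the `resolve` and `sleep` arguments so the defaults are used.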
How does all this look? Consider the amended high-level architecture diagram in figure 13 below:
|Figure 13: Complete high-level architecture diagram using Traffic Manager for DR|
So I think Windows Azure Traffic Manager is a good solution for your Windows Azure failover needs. Check out the Traffic Manager training lab for a hands-on exercise on how to use it in more detail.
In this article I have used a public DNS registrar, but if your users are within a corporate LAN and you still want to make use of a public cloud platform like Windows Azure, the same concepts apply to an internal DNS server farm.
In this blog post, I wanted to show how DR can be done in a PaaS model like Windows Azure - hopefully you can see how easy it is with Windows Azure Traffic Manager.