[Progress News] [Progress OpenEdge ABL] Always On: Multi-Region Failover and Disaster Recovery in Sitefinity Cloud

Status
Not open for further replies.

Anton Tenev

Guest
This is not a drill. The What’s New in Sitefinity Cloud series is back. With a bang. Without a hitch.


Read on for the full scoop on the latest Sitefinity Cloud add-on, which ensures the highest possible uptime of your website or application and the business continuity of your project. Multi-region failover is like an insurance policy: great to have, but hopefully never needed.

PaaS-grade Availability: Continuity-as-a-Service


One of the key jobs of a platform as a service, such as Sitefinity Cloud, is to ensure uninterrupted service and operations. From small business to enterprise, organizations just need their websites and applications up and running. Business continuity is critical. Simple as that.

The solution? Multi-region Failover is a decisive next step in Sitefinity Cloud’s sustained effort towards providing a 99.9% uptime SLA. We’re bolstering protection against critical incidents and infrastructure outages, raising the bar in terms of availability and disaster recovery.

Multi-region Failover in Sitefinity Cloud


Multi-region failover provides a robust disaster recovery option to Sitefinity Cloud customers, enabling engineering teams to restore service availability to visitors, content creators and developers in 30 minutes or less in the event of an infrastructure outage.
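To put the 30-minute figure next to the 99.9% uptime target mentioned earlier, here is a rough back-of-the-envelope sketch. It assumes a 30-day measurement window and that every minute of an outage counts against the budget; how the SLA actually measures downtime is not covered here, so treat both as assumptions. At 99.9%, a 30-day window allows 0.1% of 43,200 minutes, that is 43.2 minutes of downtime.

```python
from datetime import timedelta


def downtime_budget(uptime_pct: float, window: timedelta) -> timedelta:
    """Maximum downtime allowed within `window` at the given uptime percentage."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    return window * (1 - uptime_pct / 100)


def fits_budget(recovery_time: timedelta, budget: timedelta) -> bool:
    """True if a single outage of length `recovery_time` stays within `budget`."""
    return recovery_time <= budget


# Assumed values for illustration: 30-day window, one 30-minute recovery.
budget = downtime_budget(99.9, timedelta(days=30))
print(budget, fits_budget(timedelta(minutes=30), budget))
```

Under these assumptions, a single recovery that takes the full 30 minutes still fits inside the monthly budget, with roughly 13 minutes to spare.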

In a multi-region failover setup, every environment for which you have enabled failover has a replica of its entire service stack in a secondary region. In the event of an infrastructure failure or outage affecting the primary region that cannot be resolved quickly enough, the secondary region will take on the traffic and restore services.

The failover mechanism in Sitefinity Cloud is based on a hot-warm model with a Primary and a Secondary region, where the latter is kept on standby, ready to switch to in case of a disaster or critical failure affecting the Primary region.

In a hot-warm model, the Secondary region stack is not always on, making it easier to manage the process and resources. When needed though, it can be deployed quickly enough and conveniently take over until the service in the Primary region is fully restored.

Under the Hood: Infrastructure for Failover


The Failover mechanism has been configured to work across the multiple levels of the Sitefinity Cloud infrastructure. Parts of it are automated, but human intervention is required at critical stages of the process to prevent a disproportionate response to false-positive incidents. In other words, the final decision whether to switch to the Failover region or, if troubleshooting succeeds, roll back to the Primary region rests with the Sitefinity Cloud on-duty team, all within those 30 minutes or less.

The diagram below describes the process and the infrastructure components involved in detail, from the CDN level where the web application firewall and caching rules are applied, through the Azure stack (App Service instances, Redis cache, SQL database and Azure Search), to the Sitefinity Cloud Management Portal.


[Figure: Sitefinity Cloud Failover Infrastructure]


The Failover mechanism is perfectly aligned with the Sitefinity Cloud Management Portal pipelines. To ensure code consistency across all regions, the CI/CD process is configured to deploy new releases to the Secondary region App Service whenever a package is successfully pushed through the environments in the Primary region.
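The deployment rule described above can be sketched as a simple gate: a package reaches the Secondary region only after it has succeeded in every Primary-region environment. This is a minimal illustration, not the actual Sitefinity Cloud pipeline; the `Region` class, the `mirror_release` function and the environment names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical stand-in for a region's App Service deployment history."""
    name: str
    deployed: list[str] = field(default_factory=list)


def mirror_release(package: str, primary_results: dict[str, bool],
                   secondary: Region) -> bool:
    """Deploy `package` to the secondary region only if every primary
    environment reported success. Returns True if the deploy happened."""
    if primary_results and all(primary_results.values()):
        secondary.deployed.append(package)
        return True
    return False


secondary = Region("secondary")
# Assumed environment names, for illustration only.
mirror_release("release-1", {"staging": True, "production": True}, secondary)
mirror_release("release-2", {"staging": True, "production": False}, secondary)
print(secondary.deployed)  # only release-1 was mirrored
```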

At the database level, synchronous data replication keeps the SQL instances in both regions in sync and prevents data loss in the event of a failover. The search indexes are also synced between the Primary and the Secondary region, every two hours. This implies a potential delta of up to two hours in the search index data after a disaster recovery, but shorter sync intervals can be considered on a case-by-case basis.
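To make the search index delta concrete, here is a small sketch that, given the time of the last index sync and the time of a failover, picks out the content changes the Secondary region's index would not yet contain. The two-hour interval comes from the text; the function name and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

SYNC_INTERVAL = timedelta(hours=2)  # interval stated in the article


def unsynced_changes(changes: list[datetime], last_sync: datetime,
                     failover_at: datetime) -> list[datetime]:
    """Content changes made after the last index sync and up to the failover.
    These are the changes missing from the secondary search index."""
    if failover_at < last_sync:
        raise ValueError("failover cannot precede the last sync")
    return [t for t in changes if last_sync < t <= failover_at]


last_sync = datetime(2024, 1, 1, 10, 0)      # hypothetical timestamps
failover_at = datetime(2024, 1, 1, 11, 30)
changes = [
    datetime(2024, 1, 1, 9, 45),   # already in the secondary index
    datetime(2024, 1, 1, 10, 20),  # missing until the next sync
    datetime(2024, 1, 1, 11, 5),   # missing until the next sync
]
missing = unsynced_changes(changes, last_sync, failover_at)
print(len(missing), failover_at - last_sync <= SYNC_INTERVAL)
```

In the worst case, a failover right before a scheduled sync leaves almost a full interval of changes unindexed, which is why a shorter interval narrows the gap.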

Moreover, the auto-scaling rules you may have in place in your Primary region are fully accounted for in the Failover mechanism. For example, if a critical incident occurs during a traffic spike, with extra instances spun up to accommodate the increased load, the Secondary region will auto-scale to the same number of app instances.
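As a rough illustration of that behavior, the sketch below computes how many instances the Secondary region would start with: the Primary region's current count, kept within the bounds of the same scaling rule. The `ScaleRule` type and its fields are assumptions for this example and do not reflect the actual Azure or Sitefinity Cloud configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleRule:
    """Hypothetical auto-scaling bounds shared by both regions."""
    min_instances: int
    max_instances: int


def secondary_target(primary_running: int, rule: ScaleRule) -> int:
    """Instance count for the secondary region: match the primary's current
    count, clamped to the rule's minimum and maximum."""
    return max(rule.min_instances, min(primary_running, rule.max_instances))


rule = ScaleRule(min_instances=2, max_instances=8)
print(secondary_target(6, rule))  # traffic spike: primary is running 6 instances
```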

Another important thing to note is how Failover works with the CDN. Sitefinity Cloud employs Cloudflare’s content delivery network, a geo-distributed system of servers for ultrafast delivery of content and media. The assets served by Sitefinity are cached on these CDN nodes, which are fully independent of the server hosting the Sitefinity application.

Besides speeding up delivery, the CDN itself is an extra layer of insurance in the event of Sitefinity application downtime. Effectively, your website(s) remain available to visitors, serving the content that has been cached on the CDN.
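The idea of cached content outliving an origin outage can be shown with a toy model. This is not how Cloudflare is implemented or configured; `EdgeCache` and its behavior (no expiry, cache checked first) are simplifying assumptions made purely to illustrate the point.

```python
from typing import Callable, Optional


class EdgeCache:
    """Toy edge node: serves cached copies and only contacts the origin on a miss."""

    def __init__(self, fetch_origin: Callable[[str], str]) -> None:
        self._fetch_origin = fetch_origin
        self._cache: dict[str, str] = {}

    def get(self, path: str) -> Optional[str]:
        # Cache hit: the origin is not contacted, so its health does not matter.
        if path in self._cache:
            return self._cache[path]
        # Cache miss: try the origin; if it is down, there is nothing to serve.
        try:
            body = self._fetch_origin(path)
        except ConnectionError:
            return None
        self._cache[path] = body
        return body
```

In this model, a page requested before the outage stays available from the cache, while a page that was never cached cannot be served until the origin (or the Secondary region) is back.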

Wrap-up


To sum it all up, in the event of a critical failure in the Primary region, the Sitefinity Cloud on-duty team gets notified and starts troubleshooting. At the same time, they start warming up the Secondary region, bringing it to hot standby, ready to take on the web application traffic.

Depending on whether the troubleshooting succeeds, the on-duty team has two options: call the failover off, or choose Complete Failover to make the switch to the Secondary region. So, two automated processes and a qualified decision-maker see to it that your application is back up and running in 30 minutes or less in the event of a critical outage.
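The flow summarized above can be modeled as a small state machine. This is a simplified sketch based only on the steps described in this post; the state and event names are invented for illustration and do not correspond to actual Sitefinity Cloud Management Portal actions.

```python
from enum import Enum


class State(Enum):
    PRIMARY_ACTIVE = "primary_active"        # normal operation
    WARMING_SECONDARY = "warming_secondary"  # troubleshooting while the secondary warms up
    SECONDARY_ACTIVE = "secondary_active"    # traffic served from the secondary region


class Event(Enum):
    INCIDENT_DETECTED = "incident_detected"  # automated: on-duty team is notified
    CALL_OFF = "call_off"                    # manual: troubleshooting succeeded
    COMPLETE_FAILOVER = "complete_failover"  # manual: switch to the secondary


TRANSITIONS = {
    (State.PRIMARY_ACTIVE, Event.INCIDENT_DETECTED): State.WARMING_SECONDARY,
    (State.WARMING_SECONDARY, Event.CALL_OFF): State.PRIMARY_ACTIVE,
    (State.WARMING_SECONDARY, Event.COMPLETE_FAILOVER): State.SECONDARY_ACTIVE,
}


def step(state: State, event: Event) -> State:
    """Return the next state, or raise ValueError for a transition that is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event.name} is not valid in state {state.name}") from None


state = step(State.PRIMARY_ACTIVE, Event.INCIDENT_DETECTED)
state = step(state, Event.COMPLETE_FAILOVER)
print(state.name)
```

Note that the table has no direct transition from `PRIMARY_ACTIVE` to `SECONDARY_ACTIVE`: in this model the switch always passes through the warm-up stage, where the human decision is made.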

The Failover mechanism is available per environment, so you can enable it on Production, Staging and even Content Authoring, which may be a good idea if you have already opted for the Content Pipeline.

Here’s the rundown of the key benefits of the Sitefinity Cloud Multi-region Failover:

  1. Reliable disaster recovery mechanism to guarantee business continuity
  2. Available as an add-on regardless of the Sitefinity Cloud tier you’re on
  3. Available per environment (Production, Staging, Content Authoring)
  4. Sufficiently automated but triggered manually to prevent false positives
  5. Auto-scaling rules and data replication between regions

Multi-region Failover is available as an add-on service at an extra cost. Check out the Sitefinity Cloud Tiers for details and get in touch with your Sitefinity Cloud rep today. If you want to know more about the nuts and bolts, take a closer look at the Sitefinity Cloud Failover Mechanism documentation. In case of emergency, break glass. Oh wait, with the Multi-region Failover add-on you have another option. Do nothing and do it well.

Get in Touch to Learn More
