It’s one of the most important topics for BizTalk architects, but also one of the most confusing. I’ll try my best to present the pros and cons of the different HA and DR options available and used in the BizTalk world.
What is High Availability (HA) and Disaster Recovery (DR) and why do we need them?
High Availability – High availability means the system operates continuously without interruption. It should have no single point of failure, so that it stays up even when an individual component fails.
Disaster Recovery – Disaster Recovery is a detailed plan on how to respond and regain access and functionality in case of unplanned incidents such as natural disasters, power outages, cyber-attacks and any other disruptive events.
BizTalk Server consists of 3 main components –
- BizTalk Processing Server (Host Instances for processing)
- BizTalk Databases (Persistence of Data)
- ENTSSO – Single Sign-On (Encryption)
Because data and processing are clearly separated, BizTalk hosts (processing servers) and BizTalk databases can be scaled independently, which addresses both high availability and load balancing.
Note – For true high availability or disaster recovery, every end-to-end component (frontend, middleware, backend) should be highly available and configured for DR. Here our focus is only on BizTalk Server.
BizTalk Server High Availability
Designing a BizTalk Server deployment that provides high availability involves implementing redundancy for each functional component involved in an application integration or business process integration scenario.
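To see why redundancy is needed for each component rather than for just one, here is a rough back-of-the-envelope sketch. It assumes that failures are independent and that any single redundant instance can carry the full load; the 99% figures are hypothetical and are not measurements of BizTalk Server.

```python
from math import prod
from typing import Sequence


def parallel_availability(instance_availability: float, instances: int) -> float:
    """Availability of N redundant instances where any one can carry the load."""
    return 1.0 - (1.0 - instance_availability) ** instances


def series_availability(tier_availabilities: Sequence[float]) -> float:
    """Availability of a chain of tiers that must all be up at the same time."""
    return prod(tier_availabilities)


# Hypothetical figures: three tiers (processing, databases, SSO), each 99% available.
single = series_availability([0.99, 0.99, 0.99])
# The same three tiers, each with two redundant instances.
redundant = series_availability([parallel_availability(0.99, 2)] * 3)

print(f"single instance per tier: {single:.4%}")
print(f"two instances per tier:   {redundant:.4%}")
```

In this simplified model a chain is only as available as the product of its tiers, so leaving one tier without redundancy caps the availability of the whole deployment.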
BizTalk Server HA focusses broadly on the below points –
- Processing Server High Availability – BizTalk Application Server
- In-Proc Host Instances
- Isolated Host Instances (For incoming message from IIS)
- BizTalk Databases High Availability Options
- SQL Active-Passive Failover Cluster
- SQL Server Always On
- SSO Master Secret High Availability Options
- Cluster ENTSSO Service on Existing SQL Server Cluster or independent cluster
BizTalk Application (Processing) Server High Availability
BizTalk Server lets you separate hosts and run multiple host instances to provide high availability for key functions such as receiving messages, processing orchestrations, and sending messages.
In-Process Host Instances
BizTalk Server automatically distributes workload across multiple servers through host instances, so you do not require any additional clustering or load-balancing mechanism.
Every host should have at least 2 host instances running.
Note – Adapters such as FTP, SFTP and POP3 can process messages twice or incompletely when multiple host instances are running. These protocols don’t take an exclusive lock on the file or message while it is being read, so two host instances can read the same item at the same time.
To avoid this, many customers disable all host instances except one, but this provides only manual failover. It’s recommended to use Windows Clustering instead, which ensures that only one host instance is active at a time.
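The single-active-instance behaviour can be pictured with a small toy model. This is only a sketch of the idea and not how Windows Clustering is implemented; the class name and the node names BTS01 and BTS02 are hypothetical.

```python
class ActivePassiveHost:
    """Toy model of a clustered host: exactly one node polls at a time."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        # Insertion order doubles as the failover preference order.
        self.healthy = {node: True for node in nodes}
        self.active = nodes[0]

    def report_failure(self, node):
        """Mark a node as down and fail over if it was the active one."""
        self.healthy[node] = False
        if node == self.active:
            self.active = next((n for n, ok in self.healthy.items() if ok), None)

    def report_recovery(self, node):
        """Mark a node as healthy again; it becomes active only if none is."""
        self.healthy[node] = True
        if self.active is None:
            self.active = node

    def pollers(self):
        """Nodes allowed to poll the FTP/SFTP/POP3 endpoint right now."""
        return [] if self.active is None else [self.active]


host = ActivePassiveHost(["BTS01", "BTS02"])
print(host.pollers())          # only the first node polls
host.report_failure("BTS01")
print(host.pollers())          # the passive node has taken over
```

Because `pollers()` never returns more than one node, two host instances cannot read the same file at the same time, and the failover still happens without manual intervention.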
Isolated Host Instances –
Hosts running the receive handler for the HTTP, SOAP, WCF-BasicHTTP, WCF-WebHTTP and other IIS-related adapters require a load-balancing mechanism such as Network Load Balancing (NLB) to provide high availability.
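As a rough illustration of what a load balancer in front of the isolated hosts does, the sketch below rotates requests across servers and skips any that are marked unhealthy. It is a simplification: Windows NLB uses its own distribution algorithm, and the class and server names here are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: rotate over servers, skipping unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        """Return the next healthy server, or fail if none is left."""
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy server available")


balancer = RoundRobinBalancer(["IIS01", "IIS02"])
print([balancer.route() for _ in range(4)])  # alternates between both servers
balancer.mark_down("IIS01")
print([balancer.route() for _ in range(2)])  # all traffic goes to IIS02
```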
BizTalk Databases High Availability
High availability for the BizTalk Server databases can be achieved in two ways –
- SQL Server Active-Passive Failover Cluster (Traditional)
- SQL Server Always On (BizTalk 2016 onwards)
SQL Server Active-Passive Failover Cluster
This setup typically consists of two or more database computers configured in an active/passive server cluster configuration. These computers share a common disk resource (such as a RAID5 SCSI disk array or storage area network) and use Windows Clustering to provide backup redundancy and fault tolerance. The shared disk should itself be highly available, so many customers use SAN drives.
- Traditional SQL Server clustering method.
- Performs slightly better than SQL Server Always On.
- As the data files are stored in a common location, that storage may act as a single point of failure. Many customers use highly available shared storage such as a SAN to overcome this limitation.
- Secondary node can’t be used for any operation.
SQL Server Always-On
Always-On works on the concept of synchronous data replication from the active (primary) node to the passive (secondary) nodes. All the SQL Servers configured in one availability group have their own data files, which are not shared.
An availability group supports a replicated environment for a discrete set of databases, known as availability databases.
You can create an availability group for high availability (HA) – A group of databases that failover together.
Deploying AlwaysOn Availability Groups requires a Windows Server Failover Clustering (WSFC) cluster. Each availability replica of a given availability group must reside on a different node of the same WSFC cluster. A WSFC resource group is created for every availability group that you create. The WSFC cluster monitors this resource group to evaluate the health of the primary replica. The following illustration shows an availability group that contains one primary replica and four secondary replicas.
Clients can connect to the primary replica of a given availability group using an availability group listener. An availability group supports one set of primary databases and one to eight sets of corresponding secondary databases.
- Doesn’t have a single point of failure, as every SQL Server has its own copy of the databases, kept up to date through synchronous replication.
- Secondary databases can be used in read-only mode. This may be helpful for serving reports, websites etc. for BAM or other databases from the secondary nodes.
- As every request goes to the primary node first and is then replicated to the secondary nodes, it may add a little latency. A faster network can mitigate this issue.
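The latency point above can be made concrete with a simplified model. It assumes that the primary flushes its own log while the log records are shipped in parallel, and that the commit completes only after every synchronous secondary has acknowledged. The function name and all the numbers are hypothetical.

```python
from typing import Iterable, Tuple


def sync_commit_latency_ms(
    primary_flush_ms: float,
    sync_secondaries: Iterable[Tuple[float, float]],
) -> float:
    """Estimate commit latency under synchronous replication.

    Each secondary is a (network_round_trip_ms, log_flush_ms) pair.
    The slowest acknowledgement sets the pace for the whole commit.
    """
    slowest_ack = max((rtt + flush for rtt, flush in sync_secondaries), default=0.0)
    return max(primary_flush_ms, slowest_ack)


# Hypothetical numbers: a secondary in the same rack versus one in a remote site.
print(sync_commit_latency_ms(1.0, []))                          # no secondaries
print(sync_commit_latency_ms(1.0, [(0.5, 1.0)]))                # same-rack secondary
print(sync_commit_latency_ms(1.0, [(0.5, 1.0), (20.0, 1.0)]))   # remote secondary added
```

In this model the network round trip to the slowest synchronous secondary dominates the commit time, which is why a faster network helps.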
Master Secret (SSO) High Availability
It has two main components –
- SSO key – Generated at the time of configuration as an SSO<text>.bak file. It should be backed up carefully (in TFS, mail etc.) along with its password, as both are required at the time of restore.
- SSO Service – It should be clustered to ensure that at least one Master Secret sub-service is running to cache the key.
Detailed Explanation –
BizTalk Server uses SSO to secure port configuration information, which is encrypted and stored in the SSO database. Each BizTalk server runs an Enterprise Single Sign-On service (ENTSSO.exe), which performs all these operations.
When an SSO service starts up, it retrieves the encryption key from the master secret server. This encryption key is called the master secret. The master secret server has an additional sub-service that maintains and distributes the master secret. After a master secret is retrieved, the SSO service caches it. Every 60 seconds, the SSO service synchronizes the master secret with the master secret server.
If the master secret server fails and the SSO service detects the failure in one of its refresh intervals, the SSO service and all run-time operations that were running before the server failed, including decryption of credentials, continue successfully. However, you cannot encrypt new credentials or port configuration data. Therefore, the BizTalk Server environment has a dependency on the availability of the master secret server.
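The caching behaviour described above can be sketched as a toy model. This is not the ENTSSO implementation: the class names are hypothetical, and the "encryption" is a plain-text placeholder used only to show which operations keep working when the master secret server is down.

```python
class MasterSecretServer:
    """Stand-in for the master secret server (hypothetical class)."""

    def __init__(self, secret):
        self.secret = secret
        self.online = True

    def get_secret(self):
        if not self.online:
            raise ConnectionError("master secret server unavailable")
        return self.secret


class SsoService:
    """Stand-in for an SSO service that caches the master secret."""

    REFRESH_SECONDS = 60  # refresh interval stated in the text

    def __init__(self, master):
        self.master = master
        self.cached_secret = master.get_secret()  # retrieved at start-up
        self.master_reachable = True

    def refresh(self):
        """Run once per refresh interval; keep the old cache on failure."""
        try:
            self.cached_secret = self.master.get_secret()
            self.master_reachable = True
        except ConnectionError:
            self.master_reachable = False

    def encrypt(self, value):
        # Placeholder only: tags the value with the secret, no real cryptography.
        if not self.master_reachable:
            raise RuntimeError("cannot encrypt while the master secret server is down")
        return f"{self.cached_secret}:{value}"

    def decrypt(self, blob):
        secret, _, value = blob.partition(":")
        if secret != self.cached_secret:
            raise ValueError("blob was protected with a different master secret")
        return value


master = MasterSecretServer("secret-1")
sso = SsoService(master)
blob = sso.encrypt("port-password")
master.online = False
sso.refresh()                 # failure is detected at the next refresh
print(sso.decrypt(blob))      # decryption still works from the cache
```

After the failed refresh, `decrypt` keeps working from the cached secret while `encrypt` raises an error, which mirrors the dependency on the master secret server described above.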
Making the Master Secret Server Key HA
It’s critical to back up the master secret as soon as it is generated. If you lose it, you lose the data that the SSO system encrypted by using that master secret. For more information about backing up the master secret, see How to Back Up the Master Secret (http://go.microsoft.com/fwlink/?LinkID=151934) in BizTalk Server Help.
Making the Master Secret Service HA
Use Windows Clustering on a separate master secret server cluster or use an existing database cluster. The services provided by the master secret server do not consume many resources, and typically do not affect database functionality or performance when installed on a database cluster.
The following figure shows how you can make the master secret server highly available.
Cluster ENTSSO Service on existing SQL Server Cluster
We can configure ENTSSO on the existing SQL Server Cluster as shown below –
- Limited change and no extra resources – As the Master Secret service is configured on the existing SQL Server Failover Cluster, it reuses the existing setup and requires very few changes.
- Messages suspended on one server can resume processing on another.
- Limitation – The system will require downtime for patching etc.
Custom HA – Creating Parallel BizTalk Server Environment
Another option is to create a parallel environment (PROD2). Incoming traffic can then be distributed across the PROD1 and PROD2 environments using a Network Load Balancer.
- Increased Processing Capacity (Load Balancing) – The additional BizTalk and SQL Server setup brings additional processing capacity, which will be needed as more applications are onboarded to the existing system.
- Zero Downtime – Downtime during patching or any other incident can be avoided. Deployment or patching can be performed on PROD1 first while PROD2 serves all the requests, providing true HA. PROD1 can then be brought back online and the same changes applied to PROD2.
- High Risk Tolerance – Patching or other maintenance often brings a complete environment to a halt. This setup avoids that, as one environment can act as a playground or buffer for changes.
- Requires Additional Resources – However, if two additional processing servers are already needed because of performance limitations, it’s better to add them as a parallel setup, as the only extra cost is one SQL Server.
- Both environments act as separate BizTalk groups – The two groups are totally unrelated, and a message in one group has no relation to the other. So, messages suspended in one group can’t be retrieved in the other. However, as per ABFSS requirements, suspended messages are not required to be processed after some time.
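The patching order behind the zero-downtime point can be written down as a small sketch. It only produces the sequence of steps and checks that at least one environment stays in rotation; the function name is hypothetical, and the actual draining and patching would be done with your own load balancer and deployment tooling.

```python
def rolling_patch(environments):
    """Return the ordered steps for patching each environment in turn."""
    if len(environments) < 2:
        raise ValueError("zero-downtime patching needs at least two environments")
    in_rotation = set(environments)
    steps = []
    for env in environments:
        in_rotation.remove(env)
        # Guard: never take the last serving environment out of rotation.
        assert in_rotation, "at least one environment must keep serving traffic"
        serving = ", ".join(sorted(in_rotation))
        steps.append(f"drain {env}; traffic served by {serving}")
        steps.append(f"patch {env}")
        in_rotation.add(env)
        steps.append(f"return {env} to rotation")
    return steps


for step in rolling_patch(["PROD1", "PROD2"]):
    print(step)
```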
Custom HA – Creating Secondary setup in cloud for HA and DR
The secondary setup can also be hosted in the Azure cloud as Infrastructure as a Service (IaaS) VMs.
- Increased Processing Capability in cloud.
- High Availability in Cloud
- Access to Azure Capabilities and Benefits
- Integration to Azure SaaS/PaaS becomes easy
- Provides Disaster Recovery as well if configured in different region
Note – Using ExpressRoute can provide fast connectivity between on-prem and cloud servers.
Limitation – Requires additional resources.
Benefits of IaaS –
- Eliminates capital expense and reduces ongoing cost. IaaS sidesteps the upfront expense of setting up and managing an on-site datacentre, making it an economical option for start-ups and businesses testing new ideas.
- Improves business continuity and disaster recovery. Achieving high availability, business continuity and disaster recovery is expensive since it requires a significant amount of technology and staff. But with the right service level agreement (SLA) in place, IaaS can reduce this cost and let you access applications and data as usual during a disaster or outage.
- Innovate rapidly. As soon as you have decided to launch a new product or initiative, the necessary computing infrastructure can be ready in minutes or hours, rather than the days or weeks—and sometimes months—it could take to set up internally.
- Respond quicker to shifting business conditions. IaaS enables you to quickly scale up resources to accommodate spikes in demand for your application— during the holidays, for example—then scale resources back down again when activity decreases to save money.
- Focus on your core business. IaaS frees up your team to focus on your organization’s core business rather than on IT infrastructure.
- Increase stability, reliability and supportability. With IaaS there is no need to maintain and upgrade software and hardware or troubleshoot equipment problems. With the appropriate agreement in place, the service provider assures that your infrastructure is reliable and meets SLAs.
- Better security. With the appropriate service agreement, a cloud service provider can provide security for your applications and data that may be better than what you can attain in-house.
- Gets new apps to users faster. Because you don’t need to first set up the infrastructure before you can develop and deliver apps, you can get them to users faster with IaaS.
- Inbuilt monitoring of BizTalk using Application Insights, Log Analytics and Alerts.
Disaster Recovery of BizTalk Server
Disaster recovery procedures improve system availability by employing steps to resume operation of a failed system. Disaster recovery differs from fault tolerance in that disaster recovery typically requires manual intervention to restore the failed system whereas a fault-tolerant design can continue to operate without manual intervention if the system encounters a failure condition.
Refer to the articles below for detailed steps –
- Planning for Disaster Recovery
- Configuring for Disaster Recovery
- Recovering from a Disaster
- Advanced Information About Backup and Restore
Here we will only focus on DR of BizTalk Server, excluding –
- Non-BizTalk applications
- Application source code
- Application operations
Using BizTalk Backup Job and Log Shipping – The Only MS-Supported DR Method
The only supported way of backing up the BizTalk databases is to use the out-of-the-box Backup BizTalk Server job, combined with BizTalk Log Shipping for automatic restoration.
Configure and activate the BizTalk Backup Job to generate the only supported BizTalk backup files, and make sure you are able to restore them by planning and testing a disaster recovery plan. A disaster recovery plan can be considered successful only once you have tested your scenario.
BizTalk DR Setup High Level Overview –
1) BizTalk Application Servers –
Configure DR BizTalk Servers with “Join an existing group” settings. BizTalk Servers should be part of same group to understand and process messages.
2) ENTSSO Servers –
Configure ENTSSO with “Join an existing group” settings.
3) BizTalk Databases –
Configure the BizTalk Backup Job to create the backup files, and log shipping to restore them to the DR servers.
Detailed steps are given in another document.
- Only supported DR methodology.
- DR servers can be brought online in very little time.
- As the DR and DC systems are both part of the same group, messages suspended in the DC can be recovered in DR.
Limitations – Bringing DR servers into operation requires manual intervention and may take some time.
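To get a feel for how backup files and log shipping fit together, here is a generic sketch of picking a restore chain: the latest full backup before the disaster, followed by every later log backup. It is not the logic of the BizTalk jobs themselves, which also rely on marked transactions to keep all BizTalk databases consistent with each other. The names and timings are hypothetical.

```python
from typing import List, NamedTuple


class Backup(NamedTuple):
    kind: str      # "full" or "log"
    taken_at: int  # minutes since an arbitrary start, for simplicity


def restore_chain(backups: List[Backup], disaster_at: int) -> List[Backup]:
    """Latest full backup before the disaster plus every later log backup."""
    usable = sorted(
        (b for b in backups if b.taken_at <= disaster_at),
        key=lambda b: b.taken_at,
    )
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available before the disaster")
    last_full = fulls[-1]
    logs = [b for b in usable if b.kind == "log" and b.taken_at > last_full.taken_at]
    return [last_full] + logs


def data_loss_window(chain: List[Backup], disaster_at: int) -> int:
    """Minutes of work not covered by the last backup in the chain."""
    return disaster_at - chain[-1].taken_at


# Hypothetical schedule: one full backup, then a log backup every 15 minutes.
history = [Backup("full", 0), Backup("log", 15), Backup("log", 30), Backup("log", 45)]
chain = restore_chain(history, disaster_at=50)
print([f"{b.kind}@{b.taken_at}" for b in chain])
print("minutes of data at risk:", data_loss_window(chain, disaster_at=50))
```

The more frequently log backups are taken and shipped, the smaller the window of data at risk, at the cost of more backup activity on the production servers.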
Using Parallel DR Setup in Cloud/On-Prem (Custom – Not Supported by MS)
Create parallel BizTalk Setup in DR environment in On-Prem or Cloud.
- Load Balancing – Load can be split between the DR and DC servers if latency is not a major concern.
- No Downtime – As DR is always available, no downtime is required during a disaster.
- DR Is Always Tested and Updated – As DR is always used for message processing, it is always updated with the latest code and tested.
- Limitation – It’s a custom DR setup in which the two BizTalk groups are unrelated. Messages being processed in one group are not present in the other, which poses a risk of data loss during a disaster. Where complete message processing is atomic, resuming suspended or in-flight instances on the DR server is not required.
Configuring this parallel setup in the cloud provides both HA and DR, as the cloud setup can be placed in another region.