Step-by-Step Guide for SQL Server Always on Failover
Introduction
SQL Server Always On Availability Groups provide high availability and disaster recovery capabilities for SQL Server databases. Synchronous commit mode ensures that data is committed to both the primary and secondary replicas, guaranteeing data integrity during failover. This guide provides detailed steps for performing both manual and automatic failovers in an Always On Availability Group.
Part 1: Preliminary Checks and Preparations
Before initiating any failover, whether manual or automatic, ensure the environment is healthy and ready for the transition.
Step 1: Prerequisites Check
– Synchronization State: Confirm that all secondary replicas are in the SYNCHRONIZED state.
– Query to Check Synchronization State:
SELECT
ag.name AS [AvailabilityGroupName],
ar.replica_server_name AS [ReplicaServerName],
drs.synchronization_state_desc AS [SynchronizationState]
FROM sys.dm_hadr_availability_replica_states AS drs
JOIN sys.availability_replicas AS ar
ON drs.replica_id = ar.replica_id
JOIN sys.availability_groups AS ag
ON ar.group_id = ag.group_id
WHERE drs.synchronization_state_desc = ‘SYNCHRONIZED’;
– Action: Ensure that all replicas are in the SYNCHRONIZED state to avoid any data loss during failover.
– Verify Health of Availability Group:
– Query to Check Availability Group Health:
SELECT
ag.name AS [AvailabilityGroupName],
ags.primary_replica AS [PrimaryReplica],
ags.operational_state_desc AS [OperationalState]
FROM sys.dm_hadr_availability_group_states AS ags
JOIN sys.availability_groups AS ag
ON ags.group_id = ag.group_id;
– Action: Ensure that the OperationalState indicates a healthy state for a successful failover.
Step 2: Validate Readiness for Failover
– Check Active Transactions: Ensure no active transactions could be disrupted during the failover.
– Query to Identify Open Transactions:
DBCC OPENTRAN;
– Action: Make sure there are no long-running transactions that could be affected during the failover.
Part 2: Manual Failover Procedure
If a manual failover is required, follow these steps to initiate and verify the failover.
Step 3: Initiate Manual Failover
– Using SSMS:
- Connect to the primary replica in SQL Server Management Studio (SSMS).
- Navigate to Always On High Availability > Availability Groups.
- Right-click the desired Availability Group and select Failover.
- Follow the Failover Wizard steps to complete the failover.
– Using T-SQL:
ALTER AVAILABILITY GROUP [YourAvailabilityGroupName] FAILOVER;
– Action: This command initiates the failover to a secondary replica.
Step 4: Validate Manual Failover
– Verify New Primary Replica:
SELECT
ag.name AS [AvailabilityGroupName],
ags.primary_replica AS [PrimaryReplica]
FROM sys.dm_hadr_availability_group_states AS ags
JOIN sys.availability_groups AS ag
ON ags.group_id = ag.group_id;
– Action: Ensure that the PrimaryReplica has been updated to the intended secondary replica.
– Verify Database State:
SELECT
db.name AS [DatabaseName],
dbrs.synchronization_state_desc AS [SynchronizationState],
dbrs.database_state_desc AS [DatabaseState]
FROM sys.databases AS db
JOIN sys.dm_hadr_database_replica_states AS dbrs
ON db.database_id = dbrs.database_id
WHERE db.replica_id = dbrs.replica_id;
– Action: Ensure that all databases are ONLINE and SYNCHRONIZED.
Part 3: Automatic Failover Configuration and Monitoring
Automatic failover ensures minimal downtime by automatically transferring the primary role to a secondary replica in case of a failure.
Step 5: Configure Automatic Failover
– Requirements for Automatic Failover:
– Synchronous Commit Mode: Both primary and secondary replicas must be in synchronous commit mode.
– Automatic Failover Set: Both replicas must be part of an automatic failover set.
– Configure Automatic Failover:
– Add Secondary Replica with Automatic Failover Capability:
ALTER AVAILABILITY GROUP [YourAvailabilityGroupName]
MODIFY REPLICA ON ‘YourSecondaryReplica’
WITH (
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
FAILOVER_MODE = AUTOMATIC
);
– Action: This ensures that the secondary replica can assume the primary role automatically in the event of failure.
Step 6: Validate Automatic Failover Configuration
– Query to Validate Failover Configuration:
SELECT
replica_server_name,
availability_mode_desc,
failover_mode_desc
FROM sys.availability_replicas
WHERE group_id = (SELECT group_id FROM sys.availability_groups WHERE name = ‘YourAvailabilityGroupName’);
– Action: Verify that availability_mode_desc is SYNCHRONOUS_COMMIT and failover_mode_desc is AUTOMATIC for relevant replicas.
Part 4: Post-Failover Actions and Validation
After any failover, it is crucial to validate that all components are operating correctly.
Step 7: Monitor Post-Failover Performance
– Monitor Replication Health:
– Query to Monitor Health:
SELECT
ag.name AS [AvailabilityGroupName],
drs.replica_server_name AS [ReplicaServerName],
drs.synchronization_health_desc AS [SynchronizationHealth],
drs.role_desc AS [CurrentRole]
FROM sys.dm_hadr_availability_replica_states AS drs
JOIN sys.availability_groups AS ag
ON drs.group_id = ag.group_id;
– Action: Ensure that the synchronization health is HEALTHY for all replicas.
Step 8: Reconfigure Backup Jobs
– Update Backup Jobs: Adjust any backup jobs to reflect the new primary replica to ensure backups are taken from the correct server.
– Action: Modify SQL Server Agent jobs or other automation tools to maintain continuity of backups.
Step 9: Testing and Validation
– Test Failover: Regularly conduct DR drills to validate the failover process and ensure there are no issues.
– Communicate with Stakeholders: Notify all relevant stakeholders of the failover, including application teams, users, and management.
Conclusion
Failing over an Always on Availability Group requires careful preparation and validation to ensure data integrity and minimize downtime. By following these steps, you can execute a successful failover while maintaining high availability and consistency. Continuous monitoring and proactive testing are essential to ensure the robustness of your SQL Server Always On environment.
