
Introduction / Issue
During routine monitoring of SQL Server Always On Availability Groups, one of the secondary replica databases was found in a NOT SYNCHRONIZING state. Upon further investigation, its status showed RECOVERY_PENDING, meaning SQL Server could not bring the database online. This directly impacted the high availability setup, as data movement between replicas had stopped.
Why We Need to Do / Cause of the Issue
An Always On Availability Group (AG) is only reliable when all its underlying components—disk, transaction log, network, and replica synchronization—are functioning properly.
Common causes for a database entering RECOVERY_PENDING in an AG include:
- Low disk space on data or log drives.
- Corrupted or missing log/data files.
- Transaction log sequence breaks between replicas.
- Network interruptions during log shipping/synchronization.
- Database replication suspended due to excessive log growth.
The impact is significant: application failover protection is weakened, reports and queries depending on the replica fail, and the HA/DR promise of Always On is compromised.
How Do We Solve
The following steps were executed to troubleshoot and resolve the issue:
- Check SQL Error Log for Clues:
EXEC sp_readerrorlog;
This helps identify if the problem is related to disk space, log corruption, or replication issues.
- Confirm Database State:
SELECT name, state_desc, recovery_model_desc
FROM sys.databases
WHERE name = ‘DatabaseName’;
Status confirmed as RECOVERY_PENDING.
- Check AG Health:
SELECT d.name, drs.synchronization_state_desc, drs.synchronization_health_desc, drs.is_suspended
FROM sys.databases d
LEFT JOIN sys.dm_hadr_database_replica_states drs
ON d.database_id = drs.database_id
WHERE d.name = ‘DatabaseName’;
Output showed NOT SYNCHRONIZING and is_suspended = 1.
- First Attempt – Resume Data Movement:
ALTER DATABASE [DatabaseName] SET HADR RESUME;
If the issue is due to temporary network glitches, disk pressure, or a restart, this step can restore synchronization.
- If Resume Fails – Reinitialize Database on Secondary:
- Remove database from AG:
ALTER AVAILABILITY GROUP [AGName] REMOVE DATABASE [DatabaseName];
-
- Restore a fresh full + log backup to the secondary in NORECOVERY mode:
BACKUP DATABASE [DatabaseName] TO DISK=’C:\Backup\DatabaseName.bak’ WITH INIT, COMPRESSION;
BACKUP LOG [DatabaseName] TO DISK=’C:\Backup\DatabaseName_Log.trn’ WITH INIT, COMPRESSION;
RESTORE DATABASE [DatabaseName] FROM DISK=’C:\Backup\DatabaseName.bak’ WITH NORECOVERY;
RESTORE LOG [DatabaseName] FROM DISK=’C:\Backup\DatabaseName_Log.trn’ WITH NORECOVERY;
-
- Re-add to AG:
ALTER AVAILABILITY GROUP [AGName] ADD DATABASE [DatabaseName];
- Prevention & Monitoring:
- Keep at least 20% free space on data and log drives.
- Ensure frequent log backups to avoid log bloat.
- Automate monitoring of suspended states with a resume script:
IF EXISTS (
SELECT 1 FROM sys.dm_hadr_database_replica_states
WHERE database_name = ‘DatabaseName’ AND is_suspended = 1
)
BEGIN
ALTER DATABASE [DatabaseName] SET HADR RESUME;
END
Conclusion
The Always On Availability Group issue was resolved by systematically checking the error logs, confirming the state, attempting to resume synchronization, and finally reinitializing the database on the secondary replica. This ensured no data loss and restored synchronization between primary and secondary replicas. Preventive measures such as disk monitoring, regular log backups, and automated suspension handling were put in place to avoid recurrence.
Through this structured approach, the high availability promise of Always On was effectively restored, ensuring application continuity and database reliability.