Problem
My company depends heavily on SQL Server transactional replication. Once in a while the default alerts are not sufficient, and sometimes we want to disable the alerts while we perform maintenance. In this tip I will show you a few scripts I have implemented to better manage our transactional replication.
Solution
I wrote a few customized jobs that cover the following:
- Putting Replication Alerts in Maintenance Mode
- Check Replication Distributor Agents for Continuous Replication
- Check Replication LogReader Agents
Assumptions for the provided SQL Agent job scripts:
- Assumes there is a database called "DBA"
- Assumes the code (stored procedure) exists in the "DBA" database
- Assumes the operator name for the DBA is "Database Administrator"
Based on the above you may need to make some adjustments to the scripts for your environment.
Putting Replication Alerts in Maintenance Mode
When I am working on replication issues I do not want to get the alerts every minute, so I used to disable the alerts manually using SSMS.
The problem is that if I disable the alerts and the issue takes a long time to fix, it is easy to forget to turn them back on. I also want other people to know that I have disabled the alerts. So, I created two simple jobs to handle this.
DBA - Replication Alert Disable (Click here to get Scripts)
This job disables the alerts and also notifies the DBA Group that the alerts have been disabled.
EXEC msdb.dbo.sp_update_alert @name = N'Replication: agent failure', @enabled = 0 ;
EXEC msdb.dbo.sp_update_alert @name = N'Replication: agent retry', @enabled = 0 ;
DBA - Replication Alert Enable (Click here to get Scripts)
In my environment this job runs at 10am every weekday for DEV and TEST, and at 10am every day for Production. If DEV or TEST fails on the weekend I typically don't worry about it, but if Production fails I want to know right away.
However, if I know the issue doesn't need to be fixed right away, I can always change the job's schedule start date to a later date. Another thing I could have done is check whether the alerts were actually disabled and, if they were, send an email saying that the status was changed.
EXEC msdb.dbo.sp_update_alert @name = N'Replication: agent failure', @enabled = 1 ;
EXEC msdb.dbo.sp_update_alert @name = N'Replication: agent retry', @enabled = 1 ;
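The "check first, then notify" idea mentioned above could look roughly like the sketch below. It is not part of the downloadable scripts. It assumes Database Mail is configured with a default profile and that the "Database Administrator" operator from the assumptions list has an email address; the subject and body text are made up for illustration.

```sql
-- Sketch only: re-enable the two replication alerts and send an email
-- only if at least one of them was actually disabled.
IF EXISTS (SELECT 1
           FROM msdb.dbo.sysalerts
           WHERE name IN (N'Replication: agent failure',
                          N'Replication: agent retry')
             AND enabled = 0)
BEGIN
    EXEC msdb.dbo.sp_update_alert @name = N'Replication: agent failure', @enabled = 1;
    EXEC msdb.dbo.sp_update_alert @name = N'Replication: agent retry', @enabled = 1;

    -- Assumes the operator name from the assumptions list above
    -- and the default Database Mail profile.
    EXEC msdb.dbo.sp_notify_operator
         @name    = N'Database Administrator',
         @subject = N'Replication alerts re-enabled',
         @body    = N'One or more replication alerts were disabled and have been turned back on.';
END
```

With this variant the daily job stays silent on normal days and only sends mail when it really changed something.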
Check Continuous Replication Distributor Agents (Click here to download Script)
For continuous replication agents, once in a while I find that someone stopped an agent for maintenance and forgot to turn it back on, or that the agent failed for some reason. So, I wrote a script that finds all the continuous distribution jobs; if one of them is not running, I get an email notification.
DECLARE @is_sysadmin INT
DECLARE @job_owner sysname
DECLARE @job_id uniqueidentifier
DECLARE @job_name sysname
DECLARE @running int
DECLARE @cnt int
DECLARE @msg varchar(8000)
DECLARE @msg_header varchar(4000)
DECLARE @categoryid int

SELECT @job_owner = SUSER_SNAME()
      ,@is_sysadmin = 1
      ,@running = 0
      ,@categoryid = 10 -- Distributor jobs

CREATE TABLE #jobStatus (
    job_id UNIQUEIDENTIFIER NOT NULL,
    last_run_date INT ,
    last_run_time INT ,
    next_run_date INT ,
    next_run_time INT ,
    next_run_schedule_id INT ,
    requested_to_run INT ,
    request_source INT ,
    request_source_id sysname COLLATE database_default NULL,
    running int ,
    current_step INT ,
    current_retry_attempt INT ,
    job_state INT)

INSERT INTO #jobStatus
EXECUTE master.dbo.xp_sqlagent_enum_jobs @is_sysadmin, @job_owner--, @job_id

select j.name, js.command, jss.running
from msdb.dbo.sysjobsteps js
join msdb.dbo.sysjobs j on js.job_id = j.job_id
join #jobStatus jss on js.job_id = jss.job_id
where js.step_id = 2
and js.subsystem = 'Distribution'
and js.command like '%-Continuous'
and jss.running <> 1 -- Not running
I have made this a stored procedure called replDistributorStatusGet. You can create the procedure and schedule it to run every hour or so (depending on your environment) to get an email notification if a job is not running.
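As a rough illustration of the scheduling step, the sketch below creates a SQL Agent job that runs the procedure once an hour. The job name, step name and schedule name are hypothetical. It also assumes the procedure lives in the "DBA" database under the dbo schema and that the "Database Administrator" operator exists, as listed in the assumptions above.

```sql
-- Sketch only: hourly SQL Agent job for the distributor check.
-- The job, step and schedule names are made up for this example.
EXEC msdb.dbo.sp_add_job
     @job_name = N'DBA - Replication Distributor Check',
     @enabled = 1,
     @notify_level_email = 2,  -- email the operator if the job itself fails
     @notify_email_operator_name = N'Database Administrator';

EXEC msdb.dbo.sp_add_jobstep
     @job_name = N'DBA - Replication Distributor Check',
     @step_name = N'Run replDistributorStatusGet',
     @subsystem = N'TSQL',
     @database_name = N'DBA',
     @command = N'EXEC dbo.replDistributorStatusGet;';

EXEC msdb.dbo.sp_add_jobschedule
     @job_name = N'DBA - Replication Distributor Check',
     @name = N'Hourly',
     @freq_type = 4,             -- daily
     @freq_interval = 1,         -- every day
     @freq_subday_type = 8,      -- repeat in units of hours
     @freq_subday_interval = 1,  -- every 1 hour
     @active_start_time = 0;     -- starting at midnight

EXEC msdb.dbo.sp_add_jobserver
     @job_name = N'DBA - Replication Distributor Check',
     @server_name = N'(local)';
```

The same pattern could be reused for the log reader check by swapping the job name and the command.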
Check Replication LogReader Agents (Click here to download Script)
In the middle of troubleshooting, I have sometimes found that the Log Reader Agent had stopped. Either someone stopped it and forgot to restart it, or it stopped unexpectedly, but in both cases I didn't get an alert. So I created a job that runs once an hour and checks whether the Log Reader Agents are running.
The basic idea is that I want to make sure the log reader is either In Progress or Idle at all times.
select la.name, la.publisher_db,
       case lh.runstatus
            when 1 then 'Start'
            when 2 then 'Succeed'
            when 3 then 'In progress'
            when 4 then 'Idle'
            when 5 then 'Retry'
            when 6 then 'Fail'
            else 'Unknown'
       end as runstatus,
       lh.time, lh.comments
from distribution..MSlogreader_history lh
inner join distribution..MSlogreader_agents la on lh.agent_id = la.id
inner join (
      select lh.agent_id, max(lh.time) as LastTime
      from distribution..MSlogreader_history lh
      inner join distribution..MSlogreader_agents la on lh.agent_id = la.id
      group by lh.agent_id) r
   on r.agent_id = lh.agent_id and r.LastTime = lh.time
where lh.runstatus not in (3,4) -- 3:In Progress, 4: Idle
I made this a stored procedure called replLogReaderStatusGet, and my colleague Roumen Raditchkov added an extra check to see whether the job itself is running. This was needed because, due to performance issues we had, history logging was set to Non Logged for some of the log reader agents, so the history table alone does not tell the whole story for those agents.
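The extra job-level check is not shown in this tip, so the query below is only one possible way to do it, not necessarily what the downloadable script does. It assumes the log reader jobs are in the standard "REPL-LogReader" job category and treats a job as running when it has a started but unfinished entry in the current SQL Agent session.

```sql
-- Sketch only: list Log Reader Agent jobs that are not currently executing.
SELECT j.name AS job_name
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.syscategories AS c
  ON c.category_id = j.category_id
WHERE c.name = N'REPL-LogReader'
  AND NOT EXISTS (
        SELECT 1
        FROM msdb.dbo.sysjobactivity AS ja
        WHERE ja.job_id = j.job_id
          -- only look at the most recent SQL Agent session
          AND ja.session_id = (SELECT MAX(s.session_id)
                               FROM msdb.dbo.syssessions AS s)
          AND ja.start_execution_date IS NOT NULL  -- the job was started
          AND ja.stop_execution_date IS NULL       -- and has not stopped
      );
```

Any row returned here is a log reader job that is not running, regardless of what its history logging level is.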
Next Steps
- I tried to make these scripts as simple as possible, doing the minimum work needed to give you an idea of what can be done. If you are really into this, you can create a table-driven approach and write a script or application that puts a specific server and/or database(s) in maintenance mode.
- It is a good idea to create a central place to configure all the alerts and maintenance tasks, so you get fewer alerts and also have a central location for history
- Download all of the scripts and enhance your replication configuration
- Read additional tips on replication
- Read more tips from this author
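To give a flavor of the table-driven approach suggested in the first bullet above, here is a minimal sketch. The table dbo.ReplAlertMaintenance and its columns are hypothetical; the idea is that a job running in the "DBA" database every few minutes keeps each listed alert disabled until its maintenance window has passed and enables it afterwards.

```sql
-- Sketch only: hypothetical table with one row per alert
-- that has been put in maintenance mode.
CREATE TABLE dbo.ReplAlertMaintenance (
    alert_name     sysname  NOT NULL PRIMARY KEY,
    disabled_until datetime NOT NULL
);
GO

DECLARE @alert_name sysname, @enabled tinyint;

-- For every listed alert: 0 (disabled) while the window is open, else 1.
DECLARE alert_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT a.name,
           CASE WHEN m.disabled_until > GETDATE() THEN 0 ELSE 1 END
    FROM msdb.dbo.sysalerts AS a
    JOIN dbo.ReplAlertMaintenance AS m
      ON m.alert_name = a.name COLLATE database_default;

OPEN alert_cursor;
FETCH NEXT FROM alert_cursor INTO @alert_name, @enabled;

WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC msdb.dbo.sp_update_alert @name = @alert_name, @enabled = @enabled;
    FETCH NEXT FROM alert_cursor INTO @alert_name, @enabled;
END

CLOSE alert_cursor;
DEALLOCATE alert_cursor;
```

Putting an alert in maintenance mode then becomes a single INSERT with an end time, and forgetting to re-enable it is no longer possible.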