昨天一台SQL Server 2008R2的数据库在凌晨5点多抛出下面告警信息:
The log scan number (620023:3702:1) passed to log scan in database 'xxxx' is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication. Otherwise, restore from backup if the problem results in a failure during startup.
乍一看,还以为数据库损坏了(data corruption),但是在做完DBCC CHECKDB后,发现其实数据库其实是完好无损的。那么肯定是跟Replication有关。但是在搜索了相关资料,仅仅在The process could not execute ‘sp_repldone/sp_replcounters” 这篇博客中找到了类似错误的资料:
Common Causes
- The last LSN in Transaction Log is less than what the LSN Log Reader is trying to find. An old backup may have been restored on top of Published Database. After the restore, the new Transaction Log doesn't contain the data now distributor & subscriber(s) have.
- Database corruption.
How to fix this
- Ensure database consistency by running DBCC CHECKDB on the database.
- If an old backup was restored on top of published database then use sp_replrestart
- If going back to the most recent transaction log backup is not an option then execute sp_replrestart on publisher in published database. This stored procedure is used when the highest log sequence number (LSN) value at the Distributor does match the highest LSN value at the Publisher.
- This stored procedure will insert compensating LSNs (No Operation) in the publisher database log file till one the compensating LSN becomes more than the highest distributed LSN in distribution database for this published database. After this it inserts this new high LSN in the msrepl_transactions table in the distribution database and executes sp_repldone on published database to update the internal structures to mark a new starting point for log reader agent.
- Ensure that the log reader agent is stopped and there is no incoming transactions on the published database, when this SP is executed.
- Since transactions may have been lost, we recommend to reinitialize the subscriber(s) and/or recreate publication/subscription(s). For large databases consider using “Initialize from Backup” as discussed in SQL Book Online.
但是在这个案例当中, 数据库既没有损坏,也没有还原过。 只能是Replication出现了错误,但是在SQL Server的Replication中又没有找到相关错误信息,本身这个是AWS的DMS自动生成的Replication,很多内部信息不太清楚(例如,是否出现异常),官方也没有找到很详细的介绍这个错误的相关资料。在此记录一下。
参考资料:
https://blogs.msdn.microsoft.com/repltalk/2010/02/19/the-process-could-not-execute-sp_repldonesp_replcounters/