一、redis持久化对于灾难恢复的意义
在实际使用中,redis突然挂掉了,进程死了,或者所在的机器没了,遇到了灾难性的故障,因为redis的数据存在内存中,内存中的数据就都没有了,很重要的缓存数据等等,redis会重启,重启之后要费很大的劲去恢复。
如果单单把数据放到内存中,是没有任何办法应对一些灾难性的故障的,所以redis中的持久化是很重要的。再通过定期备份数据,是可以恢复一部分数据的。
意义
在企业级的redis集群架构中,持久化的主要意义就是做灾难恢复,数据恢复。
二、Redis的RDB和AOF两种持久化机制
RDB 是 Redis 默认的持久化方案。在指定的时间间隔内,执行指定次数的写操作,则会将内存中的数据写入到磁盘中。即在指定目录下生成一个dump.rdb文件。Redis 重启会通过加载dump.rdb文件恢复数据。
AOF :Redis 默认不开启。它的出现是为了弥补RDB的不足(数据的不一致性),所以它采用日志的形式来记录每个写操作,并追加到文件中。Redis 重启的会根据日志文件的内容将写指令从前到后执行一次以完成数据的恢复工作。
1.1 RDB
RDB核心规则配置,打开redis配置文件/etc/redis/6379.conf(默认是redis.conf)
################################ SNAPSHOTTING ################################
#
# Save the DB on disk:
#
# save <seconds> <changes>
#
# Will save the DB if both the given number of seconds and the given
# number of write operations against the DB occurred.
#
# In the example below the behaviour will be to save:
# after 900 sec (15 min) if at least 1 key changed
# after 300 sec (5 min) if at least 10 keys changed
# after 60 sec if at least 10000 keys changed
#
# Note: you can disable saving completely by commenting out all "save" lines.
#
# It is also possible to remove all the previously configured save
# points by adding a save directive with a single empty string argument
# like in the following example:
#
# save ""
save 900 1
save 300 10
save 60 10000
# By default Redis will stop accepting writes if RDB snapshots are enabled
# (at least one save point) and the latest background save failed.
# This will make the user aware (in a hard way) that data is not persisting
# on disk properly, otherwise chances are that no one will notice and some
# disaster will happen.
#
# If the background saving process will start working again Redis will
# automatically allow writes again.
#
# However if you have setup your proper monitoring of the Redis server
# and persistence, you may want to disable this feature so that Redis will
# continue to work as usual even if there are problems with disk,
# permissions, and so forth.
stop-writes-on-bgsave-error yes
# Compress string objects using LZF when dump .rdb databases?
# For default that's set to 'yes' as it's almost always a win.
# If you want to save some CPU in the saving child set it to 'no' but
# the dataset will likely be bigger if you have compressible values or keys.
rdbcompression yes
# Since version 5 of RDB a CRC64 checksum is placed at the end of the file.
# This makes the format more resistant to corruption but there is a performance
# hit to pay (around 10%) when saving and loading RDB files, so you can disable it
# for maximum performances.
#
# RDB files created with checksum disabled have a checksum of zero that will
# tell the loading code to skip the check.
rdbchecksum yes
# The filename where to dump the DB
dbfilename dump.rdb
# The working directory.
#
# The DB will be written inside this directory, with the filename specified
# above using the 'dbfilename' configuration directive.
#
# The Append Only File will also be created inside this directory.
#
# Note that you must specify a directory here, not a file name.
dir /var/redis/6379
说明
save <指定时间间隔> <执行指定次数更新操作>
官方出厂配置默认是 900秒内有1个更改,300秒内有10个更改以及60秒内有10000个更改,则将内存中的数据快照写入磁盘。
若不想用RDB方案,可以把 save "" 的注释打开
dbfilename dump.rdb
指定本地数据库文件名,一般采用默认的 dump.rdb
dir /var/redis/6379
指定本地数据库存储目录
RDB持久化机制优缺点
优点
(1)RDB会生成多个数据文件,每个数据文件都代表了某一个时刻中redis的数据,这种多个数据文件的方式,非常适合做冷备,可以将这种完整的数据文件发送到一些远程的安全存储上去,比如说云服务上,以预定好的备份策略来定期备份redis中的数据
(2)RDB对redis对外提供的读写服务,影响非常小,可以让redis保持高性能,因为redis主进程只需要fork一个子进程,让子进程执行磁盘IO操作来进行RDB持久化即可
(3)相对于AOF持久化机制来说,直接基于RDB数据文件来重启和恢复redis进程,更加快速。
(4)适合大规模的数据恢复。
缺点
(1)数据的完整性和一致性不高,因为RDB可能在最后一次备份时宕机了。
(2)备份时占用内存,因为Redis 在备份时会独立创建一个子进程,将数据写入到一个临时文件(此时内存中的数据是原来的两倍哦),最后再将临时文件替换之前的备份文件。
1.2 AOF
AOF :Redis 默认不开启。它的出现是为了弥补RDB的不足(数据的不一致性),所以它采用日志的形式来记录每个写操作,并追加到文件中。Redis 重启的会根据日志文件的内容将写指令从前到后执行一次以完成数据的恢复工作。
AOF 核心规则配置,打开redis配置文件/etc/redis/6379.conf(默认是redis.conf)
############################## APPEND ONLY MODE ############################### # By default Redis asynchronously dumps the dataset on disk. This mode is # good enough in many applications, but an issue with the Redis process or # a power outage may result into a few minutes of writes lost (depending on # the configured save points). # # The Append Only File is an alternative persistence mode that provides # much better durability. For instance using the default data fsync policy # (see later in the config file) Redis can lose just one second of writes in a # dramatic event like a server power outage, or a single write if something # wrong with the Redis process itself happens, but the operating system is # still running correctly. # # AOF and RDB persistence can be enabled at the same time without problems. # If the AOF is enabled on startup Redis will load the AOF, that is the file # with the better durability guarantees. # # Please check http://redis.io/topics/persistence for more information. appendonly no # The name of the append only file (default: "appendonly.aof") appendfilename "appendonly.aof" # The fsync() call tells the Operating System to actually write data on disk # instead of waiting for more data in the output buffer. Some OS will really flush # data on disk, some other OS will just try to do it ASAP. # # Redis supports three different modes: # # no: don't fsync, just let the OS flush the data when it wants. Faster. # always: fsync after every write to the append only log. Slow, Safest. # everysec: fsync only one time every second. Compromise. # # The default is "everysec", as that's usually the right compromise between # speed and data safety. It's up to you to understand if you can relax this to # "no" that will let the operating system flush the output buffer when # it wants, for better performances (but if you can live with the idea of # some data loss consider the default persistence mode that's snapshotting), # or on the contrary, use "always" that's very slow but a bit safer than # everysec. # # More details please check the following article: # http://antirez.com/post/redis-persistence-demystified.html # # If unsure, use "everysec". # appendfsync always appendfsync everysec # appendfsync no # When the AOF fsync policy is set to always or everysec, and a background # saving process (a background save or AOF log background rewriting) is # performing a lot of I/O against the disk, in some Linux configurations # Redis may block too long on the fsync() call. Note that there is no fix for # this currently, as even performing fsync in a different thread will block # our synchronous write(2) call. # # In order to mitigate this problem it's possible to use the following option # that will prevent fsync() from being called in the main process while a # BGSAVE or BGREWRITEAOF is in progress. # # This means that while another child is saving, the durability of Redis is # the same as "appendfsync none". In practical terms, this means that it is # possible to lose up to 30 seconds of log in the worst scenario (with the # default Linux settings). # # If you have latency problems turn this to "yes". Otherwise leave it as # "no" that is the safest pick from the point of view of durability. no-appendfsync-on-rewrite no # Automatic rewrite of the append only file. # Redis is able to automatically rewrite the log file implicitly calling # BGREWRITEAOF when the AOF log size grows by the specified percentage. # # This is how it works: Redis remembers the size of the AOF file after the # latest rewrite (if no rewrite has happened since the restart, the size of # the AOF at startup is used). # # This base size is compared to the current size. If the current size is # bigger than the specified percentage, the rewrite is triggered. Also # you need to specify a minimal size for the AOF file to be rewritten, this # is useful to avoid rewriting the AOF file even if the percentage increase # is reached but it is still pretty small. # # Specify a percentage of zero in order to disable the automatic AOF # rewrite feature. auto-aof-rewrite-percentage 100 auto-aof-rewrite-min-size 64mb # An AOF file may be found to be truncated at the end during the Redis # startup process, when the AOF data gets loaded back into memory. # This may happen when the system where Redis is running # crashes, especially when an ext4 filesystem is mounted without the # data=ordered option (however this can't happen when Redis itself # crashes or aborts but the operating system still works correctly). # # Redis can either exit with an error when this happens, or load as much # data as possible (the default now) and start if the AOF file is found # to be truncated at the end. The following option controls this behavior. # # If aof-load-truncated is set to yes, a truncated AOF file is loaded and # the Redis server starts emitting a log to inform the user of the event. # Otherwise if the option is set to no, the server aborts with an error # and refuses to start. When the option is set to no, the user requires # to fix the AOF file using the "redis-check-aof" utility before to restart # the server. # # Note that if the AOF file will be found to be corrupted in the middle # the server will still exit with an error. This option only applies when # Redis will try to read more data from the AOF file but not enough bytes # will be found. aof-load-truncated yes
AOF机制redis 默认关闭,开启需要手动把no改为yes
appendonly yes
2 指定本地数据库文件名,默认值为 appendonly.aof
appendfilename "appendonly.aof"
3 指定更新日志条件
# appendfsync always
appendfsync everysec
# appendfsync no
说明:
always:同步持久化,每次发生数据变化会立刻写入到磁盘中。性能较差当数据完整性比较好(慢,安全)
everysec:出厂默认推荐,每秒异步记录一次(默认值)
no:不同步
4 配置重写触发机制
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
解说:当AOF文件大小是上次rewrite后大小的一倍且文件大于64M时触发。一般都设置为3G,64M太小了。
AOF持久化机制的优缺点
优点
(1)AOF可以更好的保护数据不丢失,一般AOF会每隔1秒,通过一个后台线程执行一次fsync操作,最多丢失1秒钟的数据
(2)AOF日志文件以append-only模式写入,所以没有任何磁盘寻址的开销,写入性能非常高,而且文件不容易破损,即使文件尾部破损,也很容易修复
(3)AOF日志文件即使过大的时候,出现后台重写操作,也不会影响客户端的读写。因为在rewrite log的时候,会对其中的指导进行压缩,创建出一份需要恢复数据的最小日志出来。再创建新日志文件的时候,老的日志文件还是照常写入。当新的merge后的日志文件ready的时候,再交换新老日志文件即可。
(4)AOF日志文件的命令通过非常可读的方式进行记录,这个特性非常适合做灾难性的误删除的紧急恢复。比如某人不小心用flushall命令清空了所有数据,只要这个时候后台rewrite还没有发生,那么就可以立即拷贝AOF文件,将最后一条flushall命令给删了,然后再将该AOF文件放回去,就可以通过恢复机制,自动恢复所有数据
缺点
(1)对于同一份数据来说,AOF日志文件通常比RDB数据快照文件更大
(2)AOF开启后,支持的写QPS会比RDB支持的写QPS低,因为AOF一般会配置成每秒fsync一次日志文件,当然,每秒一次fsync,性能也还是很高的
(3)以前AOF发生过bug,就是通过AOF记录的日志,进行数据恢复的时候,没有恢复一模一样的数据出来。所以说,类似AOF这种较为复杂的基于命令日志/merge/回放的方式,比基于RDB每次持久化一份完整的数据快照文件的方式,更加脆弱一些,容易有bug。不过AOF就是为了避免rewrite过程导致的bug,因此每次rewrite并不是基于旧的指令日志进行merge的,而是基于当时内存中的数据进行指令的重新构建,这样健壮性会好很多。
三、RDB和AOF到底该如何选择
(1)不要仅仅使用RDB,因为那样会导致你丢失很多数据。
(2)也不要仅仅使用AOF,因为那样有两个问题,第一,你通过AOF做冷备,没有RDB做冷备,来的恢复速度更快; 第二,RDB每次简单粗暴生成数据快照,更加健壮,可以避免AOF这种复杂的备份和恢复机制的bug。
(3)综合使用AOF和RDB两种持久化机制,用AOF来保证数据不丢失,作为数据恢复的第一选择; 用RDB来做不同程度的冷备,在AOF文件都丢失或损坏不可用的时候,还可以使用RDB来进行快速的数据恢复。
四、持久化机制的数据恢复实验
RDB
1.redis里保存两条测试数据。
停掉redis进程,重启redis后,数据还在。
原理:redis-cli SHUTDOWN这种方式去停掉redis,其实是一种安全退出的模式,redis在退出的时候会将内存中的数据立即生成一份完整的rdb快照,
2.继续保存几条新的数据,然后通过kill -9 杀死redis进程
重启redis后,步骤1的数据还在,步骤2新加的数据已丢失。
由于没触发RDB保存机制,新增加的数据没有保存到dump.rdb文件。
3.手动增加一个save机制
save 5 1
写入几条数据,等待5秒钟,会发现自动进行了一次dump rdb快照,在dump.rdb中发现了数据
重复执行步骤2,发现数据存在,没有丢失。
AOF
AOF持久化,默认是关闭的,默认是打开RDB持久化。
appendonly yes,可以打开AOF持久化机制。
在生产环境里面,一般来说AOF都是要打开的,除非你说随便丢个几分钟的数据也无所谓。
打开AOF持久化机制之后,redis每次接收到一条写命令,就会写入日志文件中,当然是先写入os cache的,然后每隔一定时间再同步一下。
而且即使AOF和RDB都开启了,redis重启的时候,也是优先通过AOF进行数据恢复的,因为aof数据比较完整
1.先仅仅打开RDB,写入一些数据,然后kill -9杀掉redis进程,接着重启redis,发现数据没了,因为RDB快照还没生成
2.打开AOF的开关,启用AOF持久化
3.写入一些数据,观察AOF文件中的日志内容
其实你在appendonly.aof文件中,可以看到刚写的日志,它们其实就是先写入os cache的,然后1秒后才fsync到磁盘中,只有fsync到磁盘中了,才是安全的,要不然光是在os cache中,机器只要重启,就什么都没了
4.kill -9杀掉redis进程,重新启动redis进程,发现数据被恢复回来了,就是从AOF文件中恢复回来的
redis进程启动的时候,直接就会从appendonly.aof中加载所有的日志,把内存中的数据恢复回来。
AOF破损文件的修复
如果redis在append数据到AOF文件时,机器宕机了,可能会导致AOF文件破损
用redis-check-aof --fix命令来修复破损的AOF文件
其他
AOF和RDB同时工作
(1)如果RDB在执行snapshotting操作,那么redis不会执行AOF rewrite; 如果redis再执行AOF rewrite,那么就不会执行RDB snapshotting
(2)如果RDB在执行snapshotting,此时用户执行BGREWRITEAOF命令,那么等RDB快照生成之后,才会去执行AOF rewrite
(3)同时有RDB snapshot文件和AOF日志文件,那么redis重启的时候,会优先使用AOF进行数据恢复,因为其中的日志更完整