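Replication removes the single point of failure, supports failure recovery and load balancing, and is the foundation for Redis Sentinel and Cluster, which are covered later.
Configuring replication mostly comes down to the SLAVEOF command, which can establish a replication relationship, break it, or switch to a different master.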
127.0.0.1:6379> help SLAVEOF
SLAVEOF host port
summary: Make the server a slave of another instance, or promote it as master
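The three uses of SLAVEOF can be sketched as a small helper that builds the command arguments. This is only an illustration: the function name slaveof_command and the second master on port 6380 are hypothetical, and nothing is sent to a server here.

```python
def slaveof_command(host=None, port=None):
    """Build the argument list for a SLAVEOF command.

    With host and port: replicate from (or switch to) that master.
    With neither: SLAVEOF NO ONE, which stops replication and
    promotes the node to master.
    """
    if host is None and port is None:
        return ["SLAVEOF", "NO", "ONE"]
    if host is None or port is None:
        raise ValueError("host and port must be given together")
    return ["SLAVEOF", host, str(port)]


# Establish replication, switch to another master, then break replication.
establish = slaveof_command("127.0.0.1", 6379)
switch = slaveof_command("127.0.0.1", 6380)  # hypothetical second master
promote = slaveof_command()

for cmd in (establish, switch, promote):
    print(" ".join(cmd))
```

Issuing SLAVEOF with a new host and port on a node that is already a slave is what switches it to another master.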
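During replication, data synchronization is the step that determines how efficient the whole process is. It comes in two forms: full resynchronization and partial resynchronization.
Full resynchronization is typically used when a slave replicates for the first time; the master sends its entire data set (an RDB snapshot) to the slave.
Partial resynchronization handles data lost during replication because of brief network interruptions and similar causes. When the slave reconnects to the master, the master resends the missing data to the slave if conditions allow.
A full resynchronization looks like this in the log: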
110153:S 19 Jul 00:44:11.946 # MASTER timeout: no data nor PING received...
110153:S 19 Jul 00:44:11.946 # Connection with master lost.
110153:S 19 Jul 00:44:11.946 * Caching the disconnected master state.
110153:S 19 Jul 00:44:11.946 * Connecting to MASTER 127.0.0.1:6379
110153:S 19 Jul 00:44:11.946 * MASTER <-> SLAVE sync started
110153:S 19 Jul 00:44:11.946 * Non blocking connect for SYNC fired the event.
110153:S 19 Jul 00:44:11.955 * Master replied to PING, replication can continue...
110153:S 19 Jul 00:44:11.956 * Trying a partial resynchronization (request 49ff7b0c8271d78a1eef1e366513e978e412b959:254183).
110153:S 19 Jul 00:44:11.958 * Full resync from master: 49ff7b0c8271d78a1eef1e366513e978e412b959:2342950
110153:S 19 Jul 00:44:11.958 * Discarding previously cached master state.
110153:S 19 Jul 00:44:12.149 * MASTER <-> SLAVE sync: receiving 6801810 bytes from master
110153:S 19 Jul 00:44:12.175 * MASTER <-> SLAVE sync: Flushing old data
110153:S 19 Jul 00:44:12.203 * MASTER <-> SLAVE sync: Loading DB in memory
110153:S 19 Jul 00:44:12.284 * MASTER <-> SLAVE sync: Finished with success
A partial resynchronization looks like this in the log:
110153:S 19 Jul 00:38:40.437 # MASTER timeout: no data nor PING received...
110153:S 19 Jul 00:38:40.438 # Connection with master lost.
110153:S 19 Jul 00:38:40.438 * Caching the disconnected master state.
110153:S 19 Jul 00:38:40.438 * Connecting to MASTER 127.0.0.1:6379
110153:S 19 Jul 00:38:40.440 * MASTER <-> SLAVE sync started
110153:S 19 Jul 00:38:40.441 * Non blocking connect for SYNC fired the event.
110153:S 19 Jul 00:38:40.442 * Master replied to PING, replication can continue...
110153:S 19 Jul 00:38:40.442 * Trying a partial resynchronization (request 49ff7b0c8271d78a1eef1e366513e978e412b959:2312).
110153:S 19 Jul 00:38:40.442 * Successful partial resynchronization with master.
110153:S 19 Jul 00:38:40.442 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.
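Partial resynchronization is what the psync command carries out (it is an optimization over early Redis versions, which supported only full resynchronization). It depends on three things:
The replication offsets of the master and the slave.
The master's replication backlog buffer.
The master's run ID.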
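1. Replication offset
After processing a write command, the master adds the command's length in bytes to a running total. The slave reports its own replication offset to the master once per second. Both values appear in the info replication statistics.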
127.0.0.1:6379> info replication
# Replication
role:master
...
slave0:ip=127.0.0.1,port=6389,state=online,offset=4703828,lag=1
master_repl_offset:4703828
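Comparing the replication offsets of the master and the slave (master_repl_offset - slave0:offset) tells you whether their data is consistent; a difference of 0 means the slave has caught up.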
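As a rough sketch, the offset comparison can be automated by parsing the text of info replication. The function name replication_gaps is made up for this example, and SAMPLE_INFO simply reuses the output shown above; in practice the text would come from a live server.

```python
import re

SAMPLE_INFO = """\
# Replication
role:master
slave0:ip=127.0.0.1,port=6389,state=online,offset=4703828,lag=1
master_repl_offset:4703828
"""


def replication_gaps(info_text):
    """Return {slave_name: master_repl_offset - slave_offset} in bytes."""
    master_offset = None
    slave_offsets = {}
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blanks, section headers, and anything that is not key:value.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key == "master_repl_offset":
            master_offset = int(value)
        elif re.fullmatch(r"slave\d+", key):
            fields = dict(item.split("=", 1) for item in value.split(","))
            slave_offsets[key] = int(fields["offset"])
    if master_offset is None:
        raise ValueError("master_repl_offset not found")
    return {name: master_offset - off for name, off in slave_offsets.items()}


gaps = replication_gaps(SAMPLE_INFO)
print(gaps)
```

A gap that stays well above 0 suggests the slave is falling behind the master.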
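2. Replication backlog buffer
This is a fixed-length queue kept on the master, 1MB by default (adjustable with repl-backlog-size), and it is created when the master has a connected slave. When the master handles a write command, it sends the command to its slaves and also writes it to the replication backlog buffer. Because the buffer is essentially a fixed-length first-in-first-out queue, it retains the most recently replicated data, which is used for partial resynchronization and for recovering replication commands that were lost. The buffer's statistics are shown below; from them you can work out the range of offsets available in the backlog: [repl_backlog_first_byte_offset, repl_backlog_first_byte_offset+repl_backlog_histlen].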
127.0.0.1:6379> info replication
# Replication
role:master
...
repl_backlog_active:1
repl_backlog_size:67108864 /* Backlog circular buffer size */
repl_backlog_first_byte_offset:2 /* Replication offset of first byte in the backlog buffer. */
repl_backlog_histlen:4704779 /* Backlog actual data length */
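Because the buffer has a limited size, data that a slave has not yet received may be pushed out of it; in that case the only option is a full resynchronization.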
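The eviction behaviour can be illustrated with a toy model of the backlog. This is a simplified sketch, not how Redis implements it internally: the class ReplBacklog and its methods are invented for this example, and the attribute names only mirror the info replication fields.

```python
class ReplBacklog:
    """Toy fixed-size FIFO backlog that tracks the offsets it can serve."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray()
        self.first_byte_offset = 1   # offset of the oldest byte still kept
        self.master_repl_offset = 0  # total bytes written so far

    def feed(self, payload):
        """Append replicated bytes, dropping the oldest ones on overflow."""
        self.data.extend(payload)
        self.master_repl_offset += len(payload)
        overflow = len(self.data) - self.size
        if overflow > 0:
            del self.data[:overflow]
            self.first_byte_offset += overflow

    @property
    def histlen(self):
        return len(self.data)

    def can_serve(self, offset):
        """True if a slave asking to resume at `offset` can be served."""
        last = self.first_byte_offset + self.histlen
        return self.first_byte_offset <= offset <= last


backlog = ReplBacklog(size=8)
backlog.feed(b"x" * 10)  # 10 bytes into an 8-byte backlog: 2 are evicted
print(backlog.first_byte_offset, backlog.histlen)
print(backlog.can_serve(3), backlog.can_serve(2))
```

In this model a slave that needs offset 2 arrives too late, because that byte has already been pushed out; a larger backlog widens the window in which a reconnecting slave can still be served.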
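3. Master run ID
When a Redis node starts, it is assigned a 40-character hexadecimal string as its run ID, whose main purpose is to uniquely identify the node. The slave stores the master's run ID to record which master it is replicating from. When a Redis node is shut down and started again, its run ID changes, and the slave will perform a full resynchronization.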
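Putting the three pieces together, the choice between partial and full resynchronization can be sketched as follows. This is a simplified illustration of the rules described above rather than the actual Redis code; the function choose_sync and its parameters are hypothetical, and the sample values are borrowed from the log and info replication output earlier in this post, which were not necessarily captured at the same moment.

```python
def choose_sync(master_run_id, backlog_first, backlog_histlen,
                requested_run_id, requested_offset):
    """Return "partial" or "full" for a slave's psync request."""
    # A different run ID means the slave was following another master
    # instance (for example, one from before a restart).
    if requested_run_id != master_run_id:
        return "full"
    # The requested offset must still be inside the backlog.
    if not (backlog_first <= requested_offset
            <= backlog_first + backlog_histlen):
        return "full"
    return "partial"


RUN_ID = "49ff7b0c8271d78a1eef1e366513e978e412b959"

same_master = choose_sync(RUN_ID, 2, 4704779, RUN_ID, 2312)
restarted = choose_sync("0" * 40, 2, 4704779, RUN_ID, 2312)
evicted = choose_sync(RUN_ID, 500000, 1000, RUN_ID, 2312)
print(same_master, restarted, evicted)
```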
Finally, here are the replication-related parameters:
masterauth abcdefg
requirepass abcdefg
slave-read-only yes
repl-disable-tcp-nodelay no
repl-backlog-size 64mb
repl-ping-slave-period 10
repl-timeout 60
repl-backlog-ttl 3600
slave-serve-stale-data yes
#min-slaves-to-write 0
#min-slaves-max-lag 10
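A small sanity check over these settings might look like the sketch below. It rests on two assumptions: that repl-timeout should be larger than repl-ping-slave-period, and that the same file is deployed on both master and slave so that masterauth is expected to equal requirepass. The helper names parse_conf and check_replication_conf are made up for this example.

```python
CONFIG = """\
masterauth abcdefg
requirepass abcdefg
repl-ping-slave-period 10
repl-timeout 60
#min-slaves-to-write 0
"""


def parse_conf(text):
    """Parse 'key value' lines, ignoring blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            settings[parts[0]] = parts[1]
    return settings


def check_replication_conf(settings):
    """Return a list of warnings for the two assumptions stated above."""
    warnings = []
    ping = int(settings.get("repl-ping-slave-period", 10))
    timeout = int(settings.get("repl-timeout", 60))
    if timeout <= ping:
        warnings.append("repl-timeout should exceed repl-ping-slave-period")
    if settings.get("masterauth") != settings.get("requirepass"):
        warnings.append("masterauth differs from requirepass")
    return warnings


settings = parse_conf(CONFIG)
warnings = check_replication_conf(settings)
print(warnings)
```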