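Replication removes the single point of failure, supports failure recovery and load balancing, and is the foundation for Redis Sentinel and Cluster, which are covered later.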

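Configuring replication mostly comes down to the SLAVEOF command, which can establish a replication relationship, break it (SLAVEOF NO ONE), and switch a slave to a different master.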

127.0.0.1:6379> help SLAVEOF

  SLAVEOF host port

  summary: Make the server a slave of another instance, or promote it as master
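The three uses of SLAVEOF can be sketched as a small helper that builds the command arguments. This is only an illustration: the function name slaveof_args and port 6380 are made up here, and the resulting arguments would still have to be sent through redis-cli or a client library.

```python
def slaveof_args(host=None, port=None):
    """Return the argument list for a SLAVEOF command.

    With host and port: start replicating from that master, or switch to it.
    With neither: SLAVEOF NO ONE, which stops replication and promotes
    the node to master.
    """
    if host is None and port is None:
        return ["SLAVEOF", "NO", "ONE"]
    if host is None or port is None:
        raise ValueError("host and port must be given together")
    return ["SLAVEOF", str(host), str(port)]


# Establish replication from 127.0.0.1:6379.
print(" ".join(slaveof_args("127.0.0.1", 6379)))
# Switch to a different master (port 6380 is a hypothetical second instance).
print(" ".join(slaveof_args("127.0.0.1", 6380)))
# Break replication and promote this node to master.
print(" ".join(slaveof_args()))
```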

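Within replication, data synchronization is the step that decides whether the whole process is efficient. It comes in two forms: full resynchronization and partial resynchronization.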

Full resynchronization is generally used for the initial sync: the master sends its entire dataset (an RDB snapshot) to the slave.

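Partial resynchronization handles data lost during replication because of brief network interruptions and similar causes: once the slave reconnects to the master, the master resends the missing data to the slave if conditions allow.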

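A full resynchronization shows up in the slave's log as follows (the slave first requests a partial resynchronization, and the master answers with a full resync):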

110153:S 19 Jul 00:44:11.946 # MASTER timeout: no data nor PING received...

110153:S 19 Jul 00:44:11.946 # Connection with master lost.

110153:S 19 Jul 00:44:11.946 * Caching the disconnected master state.

110153:S 19 Jul 00:44:11.946 * Connecting to MASTER 127.0.0.1:6379

110153:S 19 Jul 00:44:11.946 * MASTER <-> SLAVE sync started

110153:S 19 Jul 00:44:11.946 * Non blocking connect for SYNC fired the event.

110153:S 19 Jul 00:44:11.955 * Master replied to PING, replication can continue...

110153:S 19 Jul 00:44:11.956 * Trying a partial resynchronization (request 49ff7b0c8271d78a1eef1e366513e978e412b959:254183).

110153:S 19 Jul 00:44:11.958 * Full resync from master: 49ff7b0c8271d78a1eef1e366513e978e412b959:2342950

110153:S 19 Jul 00:44:11.958 * Discarding previously cached master state.

110153:S 19 Jul 00:44:12.149 * MASTER <-> SLAVE sync: receiving 6801810 bytes from master

110153:S 19 Jul 00:44:12.175 * MASTER <-> SLAVE sync: Flushing old data

110153:S 19 Jul 00:44:12.203 * MASTER <-> SLAVE sync: Loading DB in memory

110153:S 19 Jul 00:44:12.284 * MASTER <-> SLAVE sync: Finished with success

A partial resynchronization shows up in the slave's log as follows:

110153:S 19 Jul 00:38:40.437 # MASTER timeout: no data nor PING received...

110153:S 19 Jul 00:38:40.438 # Connection with master lost.

110153:S 19 Jul 00:38:40.438 * Caching the disconnected master state.

110153:S 19 Jul 00:38:40.438 * Connecting to MASTER 127.0.0.1:6379

110153:S 19 Jul 00:38:40.440 * MASTER <-> SLAVE sync started

110153:S 19 Jul 00:38:40.441 * Non blocking connect for SYNC fired the event.

110153:S 19 Jul 00:38:40.442 * Master replied to PING, replication can continue...

110153:S 19 Jul 00:38:40.442 * Trying a partial resynchronization (request 49ff7b0c8271d78a1eef1e366513e978e412b959:2312).

110153:S 19 Jul 00:38:40.442 * Successful partial resynchronization with master.

110153:S 19 Jul 00:38:40.442 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.
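The two log excerpts differ in one decisive line each, so a reconnect can be classified by scanning for it. The sketch below is a minimal illustration: the function name classify_resync is made up, and the matched phrases are simply the ones that appear in the logs above (other Redis versions may word them differently).

```python
def classify_resync(log_lines):
    """Return 'full', 'partial', or None for one reconnect attempt."""
    for line in log_lines:
        if "Full resync from master" in line:
            return "full"
        if "Successful partial resynchronization" in line:
            return "partial"
    return None


full_log = [
    "110153:S 19 Jul 00:44:11.956 * Trying a partial resynchronization "
    "(request 49ff7b0c8271d78a1eef1e366513e978e412b959:254183).",
    "110153:S 19 Jul 00:44:11.958 * Full resync from master: "
    "49ff7b0c8271d78a1eef1e366513e978e412b959:2342950",
]
partial_log = [
    "110153:S 19 Jul 00:38:40.442 * Trying a partial resynchronization "
    "(request 49ff7b0c8271d78a1eef1e366513e978e412b959:2312).",
    "110153:S 19 Jul 00:38:40.442 * Successful partial resynchronization with master.",
]

print(classify_resync(full_log))     # full
print(classify_resync(partial_log))  # partial
```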

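Partial resynchronization is what the psync command performs (an optimization over early Redis versions, which supported only full resynchronization). It depends on three things:

  • The replication offsets kept by the master and by each slave.

  • The replication backlog on the master.

  • The master's run ID.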

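1. Replication offset

After processing a write command, the master adds the command's byte length to its running replication offset. The slave reports its own replication offset to the master once per second. Both values appear in the statistics of info replication.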

127.0.0.1:6379> info replication

# Replication

role:master

...

slave0:ip=127.0.0.1,port=6389,state=online,offset=4703828,lag=1

master_repl_offset:4703828

Comparing the master's and the slave's replication offsets (master_repl_offset minus the offset field of slave0) shows whether their data is consistent.
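The offset comparison can be sketched in a few lines that parse the info replication text shown above. The helper names parse_info and slave_lag_bytes are made up for this illustration, and the sample string is just the output above with the elided lines left out.

```python
def parse_info(text):
    """Parse 'key:value' lines of INFO output into a dict, skipping comments."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key] = value
    return info


def slave_lag_bytes(info, slave="slave0"):
    """Bytes by which the given slave trails the master's replication offset."""
    fields = dict(item.split("=", 1) for item in info[slave].split(","))
    return int(info["master_repl_offset"]) - int(fields["offset"])


sample = """\
# Replication
role:master
slave0:ip=127.0.0.1,port=6389,state=online,offset=4703828,lag=1
master_repl_offset:4703828
"""

print(slave_lag_bytes(parse_info(sample)))  # 0
```

A result of 0 means the slave has acknowledged everything the master has written so far; a positive value is the number of bytes still outstanding.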

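2. Replication backlog

The replication backlog is a fixed-length queue kept on the master, 1MB by default (adjustable via repl-backlog-size), and it is created when the master has a connected slave. When the master handles a write command, it sends the command to its slaves and also writes it into the replication backlog. Because the buffer is essentially a fixed-length first-in-first-out queue, it retains the most recently replicated data, which is used for partial resynchronization and for recovering lost replication commands. The buffer's statistics are shown below; from them the range of offsets available in the backlog can be computed: [repl_backlog_first_byte_offset, repl_backlog_first_byte_offset+repl_backlog_histlen].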

127.0.0.1:6379> info replication

# Replication

role:master

...

repl_backlog_active:1

repl_backlog_size:67108864 /* Backlog circular buffer size */

repl_backlog_first_byte_offset:2 /* Replication offset of first byte in the backlog buffer. */

repl_backlog_histlen:4704779 /* Backlog actual data length */

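Because the buffer has a fixed size, data that a slave has not yet received may be pushed out of the buffer before it can be sent. When that happens, the only option left is a full resynchronization.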
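The eviction effect can be reproduced with a toy model. The class below is not the Redis implementation, only a sketch of a fixed-size FIFO that tracks the same three numbers; the class name Backlog, the 10-byte size, and the 1-based offset convention are assumptions made for the illustration.

```python
from collections import deque


class Backlog:
    """Toy fixed-size FIFO of replicated bytes (not the Redis implementation)."""

    def __init__(self, size):
        self.buf = deque(maxlen=size)  # oldest bytes fall out once it is full
        self.master_repl_offset = 0    # total bytes ever written

    def feed(self, data):
        """Append replicated bytes and advance the master offset."""
        self.buf.extend(data)
        self.master_repl_offset += len(data)

    @property
    def histlen(self):
        return len(self.buf)

    @property
    def first_byte_offset(self):
        # Assumption: offsets are 1-based, so the first byte written has offset 1.
        return self.master_repl_offset - self.histlen + 1

    def can_serve(self, offset):
        """True if a slave asking to resume at `offset` can be served from here."""
        return self.first_byte_offset <= offset <= self.first_byte_offset + self.histlen


backlog = Backlog(size=10)
backlog.feed(b"0123456789")  # fills the buffer: offsets 1..10
print(backlog.can_serve(3))  # True
backlog.feed(b"abcde")       # evicts the five oldest bytes
print(backlog.first_byte_offset, backlog.histlen)  # 6 10
print(backlog.can_serve(3))  # False
```

After the second write, a slave that still needs offset 3 can no longer be served from the backlog, which is the situation in which a full resynchronization is triggered.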

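3. Master run ID

When a Redis node starts, it is dynamically assigned a 40-character hexadecimal string as its run ID, whose main purpose is to uniquely identify the node. A slave stores its master's run ID to record which master it is replicating. When a Redis node is shut down and started again, its run ID changes, and the slave will perform a full resynchronization.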
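Putting the three conditions together, the master's choice between the two modes can be sketched as below. This is a simplified model rather than Redis source code: the function name psync_reply is made up, the run ID is taken from the logs above, the backlog numbers come from the info replication sample, and the "?" / -1 request stands for a slave that has no previous master state.

```python
def psync_reply(master_run_id, first_byte_offset, histlen, req_run_id, req_offset):
    """Decide between partial ('CONTINUE') and full ('FULLRESYNC') resync.

    Partial resync needs both: the slave names this master's run ID, and the
    requested offset still lies inside the backlog's available range.
    """
    in_range = first_byte_offset <= req_offset <= first_byte_offset + histlen
    if req_run_id == master_run_id and in_range:
        return "CONTINUE"
    return "FULLRESYNC"


RUN_ID = "49ff7b0c8271d78a1eef1e366513e978e412b959"
FIRST, HISTLEN = 2, 4704779  # backlog statistics from the sample above

# Same run ID, offset still in the backlog: partial resync.
print(psync_reply(RUN_ID, FIRST, HISTLEN, RUN_ID, 2312))  # CONTINUE
# First sync: the slave knows no run ID and no offset.
print(psync_reply(RUN_ID, FIRST, HISTLEN, "?", -1))       # FULLRESYNC
# Hypothetical request for an offset that has already left the backlog.
print(psync_reply(RUN_ID, FIRST, HISTLEN, RUN_ID, 1))     # FULLRESYNC
```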

Finally, a look at the replication-related parameters:

masterauth abcdefg

requirepass abcdefg

slave-read-only yes

repl-disable-tcp-nodelay no

repl-backlog-size 64mb

repl-ping-slave-period 10

repl-timeout 60

repl-backlog-ttl 3600

slave-serve-stale-data yes

#min-slaves-to-write 0

#min-slaves-max-lag 10
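The repl-backlog-size 64mb setting above corresponds to the repl_backlog_size:67108864 value reported by info replication earlier. A small sketch of the unit conversion, following the convention documented in redis.conf (1m is 1000000 bytes, 1mb is 1024*1024 bytes, units are case insensitive); the helper name parse_size is made up:

```python
# Size suffixes as described in the redis.conf header comments.
UNITS = {
    "": 1, "b": 1,
    "k": 1000, "kb": 1024,
    "m": 1000 ** 2, "mb": 1024 ** 2,
    "g": 1000 ** 3, "gb": 1024 ** 3,
}


def parse_size(text):
    """Convert a size such as '64mb' into a number of bytes."""
    text = text.strip().lower()
    digits = text.rstrip("abcdefghijklmnopqrstuvwxyz")
    unit = text[len(digits):]
    return int(digits) * UNITS[unit]


print(parse_size("64mb"))  # 67108864
print(parse_size("1mb"))   # 1048576 (the default backlog size)
```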

If you are interested, you can follow the subscription account "数据库最佳实践" (DBBestPractice).
