We can identify a cluster and crash it with the following command:
$ redis-cli -p 7000 cluster nodes | grep master 3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 127.0.0.1:7001 master - 0 1385482984082 0 connected 5960-10921 2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 master - 0 1385482983582 0 connected 11423-16383 97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5959 10922-11422
Ok, so 7000, 7001, and 7002 are masters. Let's crash node 7002 with the DEBUG SEGFAULT command:
$ redis-cli -p 7002 debug segfault Error: Server closed the connection
Now we can look at the output of the consistency test to see what it reported.
18849 R (0 err) | 18849 W (0 err) | 23151 R (0 err) | 23151 W (0 err) | 27302 R (0 err) | 27302 W (0 err) | ... many error warnings here ... 29659 R (578 err) | 29660 W (577 err) | 33749 R (578 err) | 33750 W (577 err) | 37918 R (578 err) | 37919 W (577 err) | 42077 R (578 err) | 42078 W (577 err) |
As you can see during the failover the system was not able to accept 578 reads and 577 writes, however no inconsistency was created in the database. This may sound unexpected as in the first part of this tutorial we stated that Redis Cluster can lost writes during the failover because it uses asynchronous replication. What we did not said is that this is not very likely to happen because Redis sends the reply to the client, and the commands to replicate to the slaves, about at the same time, so there is a very small window to lose data. However the fact that it is hard to trigger does not mean that it is impossible, so this does not change the consistency guarantees provided by Redis cluster.
We can now check what is the cluster setup after the failover (note that in the meantime I restarted the crashed instance so that it rejoins the cluster as a slave):
$ redis-cli -p 7000 cluster nodes 3fc783611028b1707fd65345e763befb36454d73 127.0.0.1:7004 slave 3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 0 1385503418521 0 connected a211e242fc6b22a9427fed61285e85892fa04e08 127.0.0.1:7003 slave 97a3a64667477371c4479320d683e4c8db5858b1 0 1385503419023 0 connected 97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5959 10922-11422 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 127.0.0.1:7005 master - 0 1385503419023 3 connected 11423-16383 3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 127.0.0.1:7001 master - 0 1385503417005 0 connected 5960-10921 2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385503418016 3 connected
Now the masters are running on ports 7000, 7001 and 7005. What was previously a master, that is the Redis instance running on port 7002, is now a slave of 7005.
The output of the CLUSTER NODES command may look intimidating, but it is actually pretty simple, and is composed of the following tokens:
Node ID ip:port flags: master, slave, myself, fail, ... if it is a slave, the Node ID of the master Time of the last pending PING still waiting for a reply. Time of the last PONG received. Configuration epoch for this node (see the Cluster specification). Status of the link to this node. Slots served...
*Manual failover
Sometimes it is useful to force a failover without actually causing any problem on a master.