Akka Clustering — Seed Need Joining Process
Akka cluster is a technique used to build scalable , resilient and distributed applications. A cluster is composed of nodes and the applications spans across these nodes.
In practise it is the ActorSystem that resides on each node and they talk to each other using the gossip protocol. There are three main concepts associated to an akka cluster :
node A logical member of a cluster. There could be multiple nodes on a physical machine. Defined by a hostname:port:uid tuple.
cluster A set of nodes joined together through the membership service.
leader A single node in the cluster that acts as the leader. Managing cluster convergence and membership state transitions.
A sample application.conf for an application using akka cluster :
akka {
actor {
provider = "cluster"
} remote {
log-remote-lifecycle-events = off
netty.tcp {
hostname = "127.0.0.1"
port = 0
}
}
cluster {
seed-nodes = [
"akka.tcp://ClusterSystem@127.0.0.1:2551",
"akka.tcp://ClusterSystem@127.0.0.1:2552"
]
seed-node-timeout = 5s
}
}
Seed nodes are point of contacts for new nodes that want to join the cluster. After booting up, a node would send messages to the seed nodes. The message is to inform them about its status and readiness to join the cluster.
Lets look in detail the messages that are used in this process.
- InitJoin : This message is sent by the new node to all the seed nodes (except itself). The new node sends this message to all the configured seed noes and then waits for a reply.
- InitJoinAck: This is the reply sent by existing nodes in the cluster to a InitJoin message from a new node.
Once the new node has received an InitJoinAck message from one of the seed nodes then the ack’s from other nodes are ignored.
Retry and failure scenarios
If the new node does not receive any InitJoinAck within seed-node-timeout duration then the InitJoin messages are re-sent.
These retries are done only when the new node is not marked as the first node in the seed node list. If the new node is the first seed node and there is no positive reply from the other seed nodes within this timeout it will join itself to bootstrap the cluster.
seed-nodes = [
"akka.tcp://ClusterSystem@node1:2551",
"akka.tcp://ClusterSystem@node2:2552",
"akka.tcp://ClusterSystem@node2:2553"
]
seed-node-timeout = 5s
min-nr-of-members = 1
So for a configuration like above here is what would happen if the cluster is started for the very first time and the machines are booted in order of node1 followed by node2 and finally node 3.
- node1 boots up and sends InitJoin messages to other seed nodes. Obviously it would not receive any reply. 5 seconds after sending the first InitJoin message , node1 would join itself and bootstrap the cluster. (no retries because this is the first node in the seed node list) .The cluster consists of node1 only for now and if all the nodes have some roles then its a fully working akka cluster application.
- node2 boots up and sends InitJoin messages to other seed nodes. node1 receives the message and replies with InitJoinAck. node2 joins the cluster. However if there are no replies received by node 2 within 5 seconds then the InitJoin messages are sent again.
- node3 boots up and sends InitJoin messages to other seed nodes. Both node1 and node2 reply with the ack’s and the ack is recieved by node 3. It also joins the cluster.
So, whenever there is a problem with your cluster like nodes not able to join or potential split brain issues ( :( you don’t want this ) then looks for akka logs (INFO level) that emulate these steps. The sending and receiving of these messages should give you some idea about what could be potentially wrong.
Happy clustering!!