Fork me on GitHub

Even simpler scalability with Akka through RegistryActor

Let's imagine, your system is deployed on one node, where you running actors that literally draining out the resources. You want to add some more nodes running actors of the same type, to balance the load across the cluster. Ideally, when you add a new node to existing infrastructure you neither have to create proxies to remote actors on the original node manually, nor change configuration of any node to let it know that an additional element was added. In other words, when any node is added to the system, all other nodes should know automatically, what actors does the new node have, and use them as if they were standard local actors.

Akka's Actor Registry

Two things you find yourself [re-]implementing, when you're tied to standard Scala Actors (some production environments are still have to use Java 5, which is unfortunately unsupported by Akka) and making a bet on concurrency through actors, are Supervision Tree (from Erlang, related to Supervisors in Akka) and Registry of Actors.

Though Actor Registry in Akka on the background has a fairly concise implementation (a kind of a smart singleton wrapper around concurrent HashMaps that keeps references to all the actors running on the node) it's a powerful abstraction that's hard to survive without, when you're using Actors in the real-world. E.g. registry significantly simplifies building of load balancers, as long as you no longer should specify explicitly the workers to share the load, but rather the balancer itself looks up for the actors by type or ID on the start-up or during the lifetime in the registry.

The only thing that Akka actor registry lacks as of now is the interface to access it remotely. Adding such an interface makes solving the problem stated above a no-brainer.

Registry Actor

Living in a world of Actors, the first idea you have, when you need to create a remote interface to something, is to create an actor accessible remotely (aka RemoteActor). To a first approximation, there should be an actor that handles messages with the links to the actors on remote nodes, creating proxies and registering them in the local actor registry:

As a prerequisite for future extension, there should also be a map of references to the actor registries running on other nodes (and the way to add and exchange links to registries in runtime). When a registry actor receives and resolves new reference to another registry actor, it sends back the link to self, and all other known registries (so that both registry actors have the same consistent sets of links):

Registering Actors on startup

The second step towards resolving the problem is automatically registering local actors at the remote registry, when when the link to it is added:

Every new node should initially know about at least one node running on the cluster (neighboring node):

Thus, when a new actor registry lets know the "neighbor" about itself, it starts a chain reaction of all other actor registries populating references of their local actors to the new registry and vice versa, so that all the registries at the end are aware of all the actors running in the cluster (and accessing them either through local interface or though a proxy (RemoteActorRef)).

Registering Actors started during the life-time

Akka's ActorRegistry has a simple notification mechanism that allows to handle events raised when an actor is registered/unregistered from the system (by default, all actors (except for RemoteActorRefs, in Akka 1.0-M1) register themselves in the registry on start/shutdown). It can be used to populate links to the new actors across the system:

Using Actor Registry

Now when we automatically get references to all the actors in the cluster, we can create a balancer that will distribute messages across actors of the same type:

Problem Solved

Assume there's a node with 3 actors of type `SimpleActor` running:

This node knows nothing about the infrastructure of cluster in future, and at the moment it only runs remote API to the registry - RegistryActor. Say, we want to use actors (3 instances of `SimpleActor`) running on the node #1, to share the load on `SimpleActor` actors running on the node #2. Node #2 has the same definition as node #1 (for the only difference that node#1 is explicitly configured as a neighboring host).

Let's see, if the messages sent to the balancer are distributed between local and remote actors:

The test runs fine. which means that more than 3 actors running locally were involved, and therefore, remote actors registered locally were used:

Reinventing the wheel

As it was once mentioned in the Akka mail lists, one day ActorRegistry will have remote interface out-of-box. Until that time, you'll have to end up with your own solution, or use experimental support of JCluster that generally targets the same problem, but uses a different approach.

The code of the RegistryActor is available at GitHub. It will change over time, when I'll be using it in production.