Automatic scaling of Cassandra clusters

Baakind, Tor Andreas

Master thesis

Year

2013

Abstract

The purpose of this thesis is to create an automatic scaling implementation for Cassandra clusters. The automatic scaler should never reduce the overall performance of the cluster to the point of causing a poor user experience. It should also be able to add and remove nodes successfully, with the cluster continuing to operate normally afterwards. Finally, the automatic scaler should ideally perform as well as, or better than, the person responsible for administering the database.

In this thesis we have developed an early version of an autoscaler that can run alongside a Cassandra instance. The autoscaler is split into two separate components: a master and an agent. The master is deployed on the same server as the application that uses the cluster, although this is not required. The agent is deployed on, and runs alongside, every node in the cluster. The agent monitors the node's resource usage and sends a message back to the master when usage rises above or falls below certain thresholds.
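The master/agent split can be illustrated with a small sketch. The abstract does not state which resources are monitored, what the threshold values are, or how messages are formatted, so everything below is an assumption: each node reports a single normalized usage figure, the thresholds are a hypothetical 80% and 20%, and the names `check_usage`, `UsageMessage` and `Master` are illustrative only, not taken from the thesis. An in-process queue stands in for the network channel between agent and master.

```python
from dataclasses import dataclass
from queue import Queue
from typing import Optional

# Hypothetical thresholds; the thesis does not specify metrics or limits.
UPPER_THRESHOLD = 0.80  # usage fraction above which an agent reports "up"
LOWER_THRESHOLD = 0.20  # usage fraction below which an agent reports "down"


@dataclass(frozen=True)
class UsageMessage:
    """Hypothetical message an agent sends to the master."""
    node: str
    usage: float
    direction: str  # "up" or "down"


def check_usage(node: str, usage: float) -> Optional[UsageMessage]:
    """Agent side: emit a message only when usage crosses a threshold."""
    if usage > UPPER_THRESHOLD:
        return UsageMessage(node, usage, "up")
    if usage < LOWER_THRESHOLD:
        return UsageMessage(node, usage, "down")
    return None  # within bounds: the agent stays silent


class Master:
    """Master side: collects agent messages and picks a cluster-wide action."""

    def __init__(self) -> None:
        self.inbox: "Queue[UsageMessage]" = Queue()  # stands in for the network

    def receive(self, msg: UsageMessage) -> None:
        self.inbox.put(msg)

    def decide(self, cluster_size: int) -> str:
        ups = downs = 0
        while not self.inbox.empty():
            if self.inbox.get().direction == "up":
                ups += 1
            else:
                downs += 1
        # Any overloaded node triggers a scale-up; scale down only when
        # every node reports low usage and more than one node remains.
        if ups > 0:
            return "add_node"
        if cluster_size > 1 and downs == cluster_size:
            return "remove_node"
        return "no_action"


if __name__ == "__main__":
    master = Master()
    for node, usage in {"node1": 0.92, "node2": 0.55, "node3": 0.40}.items():
        msg = check_usage(node, usage)
        if msg is not None:
            master.receive(msg)
    print(master.decide(cluster_size=3))  # add_node
```

Scaling down only when all nodes report low usage is a deliberately conservative choice in this sketch, in keeping with the stated goal that the autoscaler should never degrade the user experience; the actual policy used in the thesis may differ.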

We performed a set of test cases to verify that the implementation works as intended. The test cases recorded the nodes' resource usage to determine the impact of our implementation on the overall performance of the cluster.