Deploying datastores for IoT & Big Data: mongoDB on K8s. Part 1
This blog post series is intended to give an overview of how datastores capable of supporting high volumes of data from IoT devices and Big Data services can be deployed on Kubernetes. To start with, the StatefulSet primitive will be used to set up and deploy a mongoDB Replica Set (cluster). Part 2 demonstrates how other Kubernetes primitives such as Secret can be applied to secure our initial, dummy deployment. Part 3 of this series explains how to shard and further secure our mongoDB cluster.
It is assumed that you already have an up and running K8s environment, such as minikube. All the examples have been developed using minikube on macOS Catalina with VirtualBox.
First of all, a new, clean namespace named `datastores` has to be created to host everything developed in this part.
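Assuming `kubectl` is already pointing at the minikube cluster, this is a single standard command:

```
kubectl create namespace datastores
```

The `-n datastores` flag can then be appended to subsequent `kubectl` commands to target this namespace.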
📖 StatefulSet Primitive
A StatefulSet is a K8s Controller that manages the deployment and scaling of a set of Pods based on an identical container spec. However, unlike the Pods managed by a Deployment Controller, these Pods are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.
🖥️ Basic Deployment of a mongoDB Replica Set
First, we need to create a K8s headless Service intended to expose our StatefulSet, as follows:
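A minimal headless Service manifest along these lines should work; the Service name `mongo-db-replica` and the `app: mongoDB-replica` selector are taken from later in this post, while the use of mongoDB's default port `27017` is an assumption:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mongo-db-replica
  namespace: datastores
spec:
  clusterIP: None          # headless Service: no load-balanced virtual IP;
                           # each Pod gets its own stable DNS entry instead
  selector:
    app: mongoDB-replica
  ports:
    - port: 27017
      targetPort: 27017
```

Being headless (`clusterIP: None`) is what will give every Pod of the StatefulSet a stable, individually resolvable DNS name.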
📌 Our Service will be bound to Pods labelled as `app: mongoDB-replica`.
Also, it would be convenient to set up a ConfigMap to capture any configuration options needed.
📌 The ConfigMap also captures the name given to our mongoDB replica set.
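A sketch of such a ConfigMap; both the ConfigMap name and the `rs0` replica set name are assumptions, as this post does not reproduce the original values:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mongo-db-config     # hypothetical ConfigMap name
  namespace: datastores
data:
  replica-set-name: rs0     # hypothetical replica set name
```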
Afterwards, we can declare our StatefulSet as follows:
📌 We need to bind (through `serviceName`) the StatefulSet with the Service that was created initially.
📌 Our StatefulSet is composed of 3 replicas that will be realized by 3 distinct Pods.
📌 We run Pods labelled as `app: mongoDB-replica` in mongoDB's replica set mode (via mongod's `--replSet` flag).
📌 We mount a volume `mongo-volume-for-replica` that will be made available through a PVC.
📌 Under `volumeClaimTemplates` we define the template of the PVCs that will be automatically created for each Pod.
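Combining the points above, the StatefulSet manifest could look roughly like this (the `mongo` image tag, the `rs0` replica set name and the `1Gi` storage request are illustrative assumptions):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo-db-statefulset
  namespace: datastores
spec:
  serviceName: mongo-db-replica        # binds the StatefulSet to the headless Service
  replicas: 3
  selector:
    matchLabels:
      app: mongoDB-replica
  template:
    metadata:
      labels:
        app: mongoDB-replica
    spec:
      containers:
        - name: mongo
          image: mongo:4.2             # illustrative image tag
          command: ["mongod"]
          args: ["--replSet", "rs0", "--bind_ip_all"]   # "rs0" is an assumed name
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongo-volume-for-replica
              mountPath: /data/db      # mongoDB's default data directory
  volumeClaimTemplates:
    - metadata:
        name: mongo-volume-for-replica
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi               # illustrative size
```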
After applying the manifest shown above, the status of our K8s cluster will be similar to:
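For instance, listing the Pods of the namespace:

```
kubectl get pods -n datastores
```

Three Pods named `mongo-db-statefulset-0`, `mongo-db-statefulset-1` and `mongo-db-statefulset-2` should show up, created sequentially in that order.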
We can ping our Pods by name (as they are already bound to the Service named `mongo-db-replica`) as follows:
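Thanks to the headless Service, each Pod is resolvable inside the cluster as `<pod-name>.<service-name>`; for example (assuming the `ping` utility is present in the container image):

```
kubectl exec -it mongo-db-statefulset-0 -n datastores -- \
  ping mongo-db-statefulset-1.mongo-db-replica
```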
📌 Pods pertaining to a StatefulSet are distinguishable and keep their own identity. That's why we can address them by name.
📌 The identifier of a Pod pertaining to a StatefulSet is formed by concatenating the name of the StatefulSet (`mongo-db-statefulset`) with a dash (`-`) and an ordinal number (starting at `0`).
We can observe that 3 different PVCs have been created to satisfy the storage demands of the 3 Pods that compose our mongoDB cluster:
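They can be listed as follows:

```
kubectl get pvc -n datastores
```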
📌 The name of each Pod's PVC is formed by concatenating the name given to the volume claim template (`mongo-volume-for-replica`) with a dash (`-`) and the identifier of the Pod writing to the volume.
Configuring the mongoDB Replica Set
The next step would be to use our datastore, for instance, using the mongoDB shell client:
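For example, we can open a shell on the first Pod and initiate the replica set; the member host names follow the `<pod-name>.<service-name>` pattern seen above, and the `rs0` replica set name is an assumption (it must match the `--replSet` value given to mongod):

```
kubectl exec -it mongo-db-statefulset-0 -n datastores -- mongo
```

and then, inside the mongo shell:

```
rs.initiate({
  _id: "rs0",                // assumed replica set name
  members: [
    { _id: 0, host: "mongo-db-statefulset-0.mongo-db-replica:27017" },
    { _id: 1, host: "mongo-db-statefulset-1.mongo-db-replica:27017" },
    { _id: 2, host: "mongo-db-statefulset-2.mongo-db-replica:27017" }
  ]
})
rs.status()                  // reports which member is PRIMARY
```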
Afterwards, it can be observed that one of our Pods becomes the Primary while the rest remain Secondary. In my deployment, Pod `1` of the StatefulSet (`mongo-db-statefulset-1`) was elected as leader. Thus, we can connect to that Pod through the mongoDB shell and create a new DB, a collection and a document as follows:
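A sketch of such a session (the `iotdb` database, the `sensors` collection and the sample document are made-up names for illustration):

```
kubectl exec -it mongo-db-statefulset-1 -n datastores -- mongo
```

then, inside the mongo shell:

```
use iotdb
db.sensors.insertOne({ deviceId: "dev-1", temperature: 21.5 })
db.sensors.find()
```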
If we want to check that the data is also available to be read on the Secondary replicas, we can do the following (Pods `0` and `2` are my Secondary replicas):
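For instance, against Pod `0` (the `iotdb` database and `sensors` collection names are made up for illustration; `rs.slaveOk()` allows reads on a Secondary and has been renamed `rs.secondaryOk()` in recent shells):

```
kubectl exec -it mongo-db-statefulset-0 -n datastores -- \
  mongo --eval 'rs.slaveOk(); db.getSiblingDB("iotdb").sensors.find().toArray()'
```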
In this case, a statement enabling reads on a Secondary member (`rs.slaveOk()`) is executed at shell start-up.
🧱 Replica Set Management
Stopping the Replica Set cluster
We can stop our mongoDB datastore cluster by scaling it down to `0` replicas, as follows:
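A standard `kubectl scale` does the job:

```
kubectl scale statefulset mongo-db-statefulset --replicas=0 -n datastores
```

Note that a StatefulSet terminates its Pods in reverse ordinal order (`2`, `1`, `0`).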
Now we can check the status of our `datastores` namespace: the Pods are gone, but the PVCs are still there, so our data has not been lost.
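For example:

```
kubectl get pods,pvc -n datastores
```

No Pods should be listed, while the three PVCs remain `Bound`.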
Restarting the Replica Set cluster
We can restart our mongoDB replica set by scaling it back out to `3` replicas:
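```
kubectl scale statefulset mongo-db-statefulset --replicas=3 -n datastores
```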
Our Pods have come back to life. In my deployment, the new leader election after scaling back resulted in Pod `2` now being the Primary and Pods `0` and `1` being Secondary.
Killing one Pod and forcing a new leader election
We can manually delete a Pod, for instance the Primary, and check that a new leader election happens and that the controller automatically restores the deleted Pod instance of our StatefulSet.
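For instance, assuming `mongo-db-statefulset-2` is the current Primary:

```
kubectl delete pod mongo-db-statefulset-2 -n datastores
kubectl get pods -n datastores -w     # watch the Pod being recreated with the same name
```

Running `rs.status()` on a surviving member will then show which replica won the new election.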
Kubernetes provides powerful primitives to deploy a clustered mongoDB datastore service. Furthermore, we can deploy a secured and sharded mongoDB so that we can give production-grade support to IoT and Big Data applications that demand higher scalability.