Preparing for Termination of Instances

In this lesson, we will set up a new ConfigMap, create a namespace, define a ServiceAccount, and also explore a CronJob that we will later use for the experiment.

Now we have metrics stored in Prometheus, and we can visualize them using Grafana and Kiali. We should be ready to create even more mayhem than before.

What can we do?#

Let’s not introduce anything drastically new. Let’s destroy a Pod. Now you might say, “Hey, I already know how to destroy a Pod. You showed me that.” If that’s what you’re thinking, you’re right. Nevertheless, we are going to terminate a Pod again, but, this time, we are not going to target a specific app. We are going to destroy a completely random Pod in the go-demo-8 Namespace.

You will not see a significant difference between destroying a Pod of the go-demo-8 application and destroying a completely random Pod from the go-demo-8 Namespace. We are not running much in that Namespace right now. But, in a real-world situation, you would have tens, or even hundreds, of applications running in, let’s say, the production Namespace. In such a case, destroying a Pod selected randomly among many applications could have quite unpredictable effects. But, we don’t have that many. We have only three apps (repeater, go-demo-8, and MongoDB). Still, even with only those three, randomizing which Pod will be terminated might produce unexpected results.

We’ll leave speculations for some other time, and we’ll go ahead and destroy a random Pod.

Inspecting the ConfigMap defined in experiments-any-pod.yaml#

As always, we’ll start by taking a quick look at the definition we’re going to apply.
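We can print the definition with cat (the exact path depends on where experiments-any-pod.yaml lives in your copy of the course repository, so adjust it if needed).

```shell
# Output the experiment definition (adjust the path to your repository layout)
cat experiments-any-pod.yaml
```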

The output is as follows.

---

apiVersion: v1
kind: ConfigMap
metadata:
  name: chaostoolkit-experiments
data:
  health-instances.yaml: |
    version: 1.0.0
    title: What happens if we terminate an instance?
    description: Everything should continue working as if nothing happened
    tags:
    - k8s
    - pod
    - deployment
    steady-state-hypothesis:
      title: The app is healthy
      probes:
      - name: all-apps-are-healthy
        type: probe
        tolerance: true
        provider:
          type: python
          func: all_microservices_healthy
          module: chaosk8s.probes
          arguments:
            ns: go-demo-8
    method:
    - type: action
      name: terminate-app-pod
      provider:
        type: python
        module: chaosk8s.pod.actions
        func: terminate_pods
        arguments:
          rand: true
          ns: go-demo-8
      pauses:
        after: 10

That ConfigMap is very similar to those we used in the previous section. It defines a single experiment that will be available to processes inside the cluster. The hypothesis of the experiment is the same old one that validates whether all the applications in the go-demo-8 Namespace are healthy. Then, we have the method that will destroy a random Pod inside the go-demo-8 Namespace and pause for 10 seconds. What matters is that, this time, we are not limiting the Pod selector to a specific application. Any Pod in that Namespace will be eligible for termination.

Creating a Namespace#

Before we apply that definition, we’ll need to create a Namespace. Since we are going to destroy a random Pod from the go-demo-8 Namespace, it probably wouldn’t be a good idea to run our experiments there. We’d risk terminating a Pod of the experiment. Also, if we are exploring how to make the destruction more randomized, we might choose to run experiments across more than one Namespace. So, we are going to create a new Namespace that will be dedicated to chaos experiments.
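The ServiceAccount binding we’ll inspect later references a Namespace called chaos, so that’s the name we’ll use:

```shell
# Create the Namespace dedicated to chaos experiments
kubectl create namespace chaos
```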

Applying the new ConfigMap#

Now we can apply the definition with the ConfigMap that has the experiment we want to run.
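Assuming the definition is in experiments-any-pod.yaml (adjust the path to your copy of the repository), the command could be as follows:

```shell
# Apply the ConfigMap with the experiment into the chaos Namespace
kubectl --namespace chaos apply \
    --filename experiments-any-pod.yaml
```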

Remember that the experiment validates whether all the applications in the go-demo-8 Namespace are running correctly and that the method will terminate a random Pod from that Namespace. We’re not choosing which Deployment or StatefulSet that Pod should come from. Also, please note that the verification relies on Kubernetes health checks, which are not very reliable.

Inspecting the definition of ServiceAccount in sa-cluster.yaml#

We want to run an experiment that will perform actions inside a different Namespace. We want to separate the experiments from the resources manipulated through actions. For that, we’ll need to define a ServiceAccount that will be slightly different from the one we used before.
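As before, we can print the definition with cat (adjust the path to wherever sa-cluster.yaml is in your copy of the repository).

```shell
# Output the ServiceAccount and ClusterRoleBinding definition
cat sa-cluster.yaml
```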

The output is as follows.

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: chaostoolkit

---

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: chaostoolkit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: chaostoolkit
    namespace: chaos

This time, we’re using ClusterRoleBinding instead of RoleBinding. In the past, our experiments were running in the same Namespace as the applications that were targeted. As a result, we could use RoleBinding, which is namespaced. But now we want to be able to run the experiments in the chaos Namespace and allow them to execute some actions on resources in other Namespaces. By referencing the ServiceAccount from a ClusterRoleBinding, we’re granting it cluster-wide permissions. That binding uses the pre-defined ClusterRole called cluster-admin, which is available in every Kubernetes distribution (that I know of). It should be easy to guess the level of permissions a role called cluster-admin provides.

All in all, with that ServiceAccount, we’ll be able to do almost anything anywhere inside of the cluster.

Applying the ServiceAccount definition#

Let’s apply the definition so that the ServiceAccount is available for our future experiments.
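Assuming the file is sa-cluster.yaml (adjust the path to your copy of the repository), the command could be as follows:

```shell
# Create the ServiceAccount and ClusterRoleBinding in the chaos Namespace
kubectl --namespace chaos apply \
    --filename sa-cluster.yaml
```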

Inspecting the CronJob defined in periodic-fast.yaml#

The last thing we need is a definition of a CronJob that will run our experiments.
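We can output it with cat (adjust the path to wherever periodic-fast.yaml is located in your copy of the repository).

```shell
# Output the CronJob and PersistentVolumeClaim definition
cat periodic-fast.yaml
```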

---

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: health-instances-chaos
spec:
  concurrencyPolicy: Forbid
  schedule: "*/2 * * * *"
  jobTemplate:
    metadata:
      labels:
        app: health-instances-chaos
    spec:
      activeDeadlineSeconds: 600
      backoffLimit: 0
      template:
        metadata:
          labels:
            app: health-instances-chaos
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          serviceAccountName: chaostoolkit
          restartPolicy: Never
          containers:
          - name: chaostoolkit
            image: vfarcic/chaostoolkit:1.4.1
            args:
            - --verbose
            - run
            - --journal-path
            - /results/health-instances.json
            - /experiment/health-instances.yaml
            env:
            - name: CHAOSTOOLKIT_IN_POD
              value: "true"
            volumeMounts:
            - name: experiments
              mountPath: /experiment
              readOnly: true
            - name: results
              mountPath: /results
              readOnly: false
            resources:
              limits:
                cpu: 20m
                memory: 64Mi
              requests:
                cpu: 20m
                memory: 64Mi
          volumes:
          - name: experiments
            configMap:
              name: chaostoolkit-experiments
          - name: results
            persistentVolumeClaim:
              claimName: chaos

---

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: chaos
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

That CronJob is very similar to the one we used in the previous section. However, this time, it’ll be scheduled to run every two minutes so that you don’t have to wait as long for the outcomes. The only other significant difference is that it’ll run a different experiment. It’ll periodically execute health-instances.yaml, which is defined in the ConfigMap we created earlier.


In the next lesson, we will terminate a random application instance and observe the outcome.
