Preparing for Termination of Instances

In this lesson, we will set up a new ConfigMap, create a namespace, define a ServiceAccount, and also explore a CronJob that we will later use for the experiment.

Now we have metrics stored in Prometheus, and we can visualize them using Grafana and Kiali. We should be ready to create even more mayhem than before.

What can we do?#

Let’s not introduce anything drastically new. Let’s destroy a Pod. Now you might say, “Hey, I already know how to destroy a Pod. You showed me that.” If that’s what you’re thinking, you’re right. Nevertheless, we are going to terminate a Pod again, but, this time, we are not going to target a specific app. We are going to destroy a completely random Pod in the go-demo-8 Namespace.

You will not see a significant difference between destroying a Pod of the go-demo-8 application and destroying a completely random Pod from the go-demo-8 Namespace. We are not running much in that Namespace right now. But, in a real-world situation, you would have tens, or even hundreds, of applications running in, let’s say, the production Namespace. In such a case, destroying a Pod selected randomly among many applications could have quite unpredictable effects. But, we don’t have that many. We have only three apps (repeater, go-demo-8, and MongoDB). Still, even with only those three, randomizing which Pod will be terminated might produce unexpected results.

We’ll leave speculations for some other time, and we’ll go ahead and destroy a random Pod.

Inspecting the ConfigMap defined in experiments-any-pod.yaml#

As always, we’ll start by taking a quick look at the definition we’re going to apply.
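We can print the definition with cat (the exact path depends on where experiments-any-pod.yaml lives in your copy of the course repository, so adjust it if needed).

```shell
# Output the experiment definition (adjust the path to your repository layout)
cat experiments-any-pod.yaml
```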

The output is as follows.

---

apiVersion: v1
kind: ConfigMap
metadata:
  name: chaostoolkit-experiments
data:
  health-instances.yaml: |
    version: 1.0.0
    title: What happens if we terminate an instance?
    description: Everything should continue working as if nothing happened
    tags:
    - k8s
    - pod
    - deployment
    steady-state-hypothesis:
      title: The app is healthy
      probes:
      - name: all-apps-are-healthy
        type: probe
        tolerance: true
        provider:
          type: python
          func: all_microservices_healthy
          module: chaosk8s.probes
          arguments:
            ns: go-demo-8
    method:
    - type: action
      name: terminate-app-pod
      provider:
        type: python
        module: chaosk8s.pod.actions
        func: terminate_pods
        arguments:
          rand: true
          ns: go-demo-8
      pauses:
        after: 10

That ConfigMap is very similar to those we used in the previous section. It defines a single experiment that will be available to processes inside the cluster. The hypothesis of the experiment is the same old one that validates whether all the applications in the go-demo-8 Namespace are healthy. Then, we have the method that will destroy a random Pod inside the go-demo-8 Namespace and pause for 10 seconds. What matters is that, this time, we are not limiting the Pod selector to a specific application. Any Pod in that Namespace will be eligible for termination.

Creating a Namespace#

Before we apply that definition, we’ll need to create a Namespace. Since we are going to destroy a random Pod from the go-demo-8 Namespace, it probably wouldn’t be a good idea to run our experiments there. We’d risk terminating a Pod of the experiment. Also, if we are exploring how to make the destruction more randomized, we might choose to run experiments across more than one Namespace. So, we are going to create a new Namespace that will be dedicated to chaos experiments.
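The ServiceAccount binding we’ll inspect later references a Namespace called chaos, so that’s the name we’ll use:

```shell
# Create the Namespace dedicated to chaos experiments
kubectl create namespace chaos
```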

Applying the new ConfigMap#

Now we can apply the definition with the ConfigMap that has the experiment we want to run.
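Assuming the definition is in experiments-any-pod.yaml (adjust the path to your copy of the repository), the command could be as follows:

```shell
# Apply the ConfigMap with the experiment into the chaos Namespace
kubectl --namespace chaos apply \
    --filename experiments-any-pod.yaml
```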

Remember that the experiment validates whether all the applications in the go-demo-8 Namespace are running correctly and that the method will terminate a random Pod from that Namespace. We’re not choosing which Deployment or StatefulSet that Pod should come from. Also, please note that the verification relies on Kubernetes health checks, which are not very reliable.

Inspecting the definition of ServiceAccount in sa-cluster.yaml#

We want to run an experiment that will perform actions inside a different Namespace. We want to separate the experiments from the resources manipulated through actions. For that, we’ll need to define a ServiceAccount that will be slightly different from the one we used before.
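As before, we can print the definition with cat (adjust the path to wherever sa-cluster.yaml is in your copy of the repository).

```shell
# Output the ServiceAccount and ClusterRoleBinding definition
cat sa-cluster.yaml
```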

The output is as follows.

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: chaostoolkit

---

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: chaostoolkit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: chaostoolkit
    namespace: chaos

This time, we’re using ClusterRoleBinding instead of RoleBinding. In the past, our experiments were running in the same Namespace as the applications that were targeted. As a result, we could use RoleBinding, which is namespaced. But now we want to be able to run the experiments in the chaos Namespace and allow them to execute some actions on resources in other Namespaces. By referencing the ServiceAccount from a ClusterRoleBinding, we’re granting it cluster-wide permissions. That binding uses the pre-defined ClusterRole called cluster-admin, which is available in every Kubernetes distribution (that I know of). It should be easy to guess the level of permissions a role called cluster-admin provides.

All in all, with that ServiceAccount, we’ll be able to do almost anything anywhere inside of the cluster.

Applying the ServiceAccount definition#

Let’s apply the definition so that the ServiceAccount is available for our future experiments.
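Assuming the file is sa-cluster.yaml (adjust the path to your copy of the repository), the command could be as follows:

```shell
# Create the ServiceAccount and ClusterRoleBinding in the chaos Namespace
kubectl --namespace chaos apply \
    --filename sa-cluster.yaml
```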

Inspecting the CronJob defined in periodic-fast.yaml#

The last thing we need is a definition of a CronJob that will run our experiments.
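We can output it with cat (adjust the path to wherever periodic-fast.yaml is located in your copy of the repository).

```shell
# Output the CronJob and PersistentVolumeClaim definition
cat periodic-fast.yaml
```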

---

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: health-instances-chaos
spec:
  concurrencyPolicy: Forbid
  schedule: "*/2 * * * *"
  jobTemplate:
    metadata:
      labels:
        app: health-instances-chaos
    spec:
      activeDeadlineSeconds: 600
      backoffLimit: 0
      template:
        metadata:
          labels:
            app: health-instances-chaos
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          serviceAccountName: chaostoolkit
          restartPolicy: Never
          containers:
          - name: chaostoolkit
            image: vfarcic/chaostoolkit:1.4.1
            args:
            - --verbose
            - run
            - --journal-path
            - /results/health-instances.json
            - /experiment/health-instances.yaml
            env:
            - name: CHAOSTOOLKIT_IN_POD
              value: "true"
            volumeMounts:
            - name: experiments
              mountPath: /experiment
              readOnly: true
            - name: results
              mountPath: /results
              readOnly: false
            resources:
              limits:
                cpu: 20m
                memory: 64Mi
              requests:
                cpu: 20m
                memory: 64Mi
          volumes:
          - name: experiments
            configMap:
              name: chaostoolkit-experiments
          - name: results
            persistentVolumeClaim:
              claimName: chaos

---

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: chaos
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

That CronJob is very similar to the one we used in the previous section. However, this time, it’ll be scheduled to run every two minutes so that you don’t have to wait as long for the outcomes. The only other significant difference is that it’ll run a different experiment. It’ll periodically execute health-instances.yaml, which is defined in the ConfigMap we created earlier.


In the next lesson, we will terminate a random application instance and observe the outcome.
