Setting up Chaos Toolkit in Kubernetes

In this lesson, we set up Chaos Toolkit in Kubernetes by defining experiments in a ConfigMap and creating a ServiceAccount.

Defining configuration in Kubernetes using the ConfigMap

Before we start running chaos experiments inside the Kubernetes cluster, we’ll need to set up at least two things:

  1. We’ll need experiment definitions stored somewhere in the cluster. The most common and the most logical way to define a configuration in Kubernetes is to use a ConfigMap.
  2. Apart from having experiment definitions readily available, we’ll also need a ServiceAccount that provides the necessary privileges to the processes that will run the experiments.

Inspecting the definition of ConfigMap

Let’s start with the ConfigMap.

Its definition, limited to the first experiment, is as follows.

...
apiVersion: v1
kind: ConfigMap
metadata:
  name: chaostoolkit-experiments
data:
  health-http.yaml: |
    version: 1.0.0
    title: What happens if we terminate an instance of the application?
    description: If an instance of the application is terminated, the applications as a whole should still be operational.
    tags:
    - k8s
    - pod
    steady-state-hypothesis:
      title: The app is healthy
      probes:
      - name: app-responds-to-requests
        type: probe
        tolerance: 200
        provider:
          type: http
          timeout: 3
          verify_tls: false
          url: http://go-demo-8.go-demo-8/demo/person
          headers:
            Host: go-demo-8.acme.com
    method:
    - type: action
      name: terminate-app-pod
      provider:
        type: python
        module: chaosk8s.pod.actions
        func: terminate_pods
        arguments:
          label_selector: app=go-demo-8
          rand: true
          ns: go-demo-8
      pauses:
        after: 2
...

We can see that it is a “standard” Kubernetes ConfigMap that contains a few experiments. I did not include all those we explored so far since we won’t need them all in this section. Nevertheless, you should be able to add more experiments, as long as the ConfigMap stays under the 1 MiB size limit. That limit is tied to etcd, the key-value store where Kubernetes keeps its objects. If you do need more than that, you can use multiple ConfigMaps.

The vital part of that definition is the data section.

We can see that there is the health-http.yaml key, which, as you hopefully know, will be transformed into a file with the same name when we mount that ConfigMap. The value is the definition of an experiment. The steady-state-hypothesis checks whether the application is healthy. It does that through a probe that sends requests to the app and expects the response code 200. The action of the method terminates one of the Pods of the go-demo-8 application.

Further down, we can see that there are a few other experiments. We won’t go through them because all those defined in that ConfigMap are the same as the ones we used before. In this section, we are not trying to figure out new ways to create chaos, but rather how to run experiments inside a Kubernetes cluster.

All in all, our experiments will be defined as ConfigMaps, and they will be available to the processes inside the cluster.
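To make the key-to-file mapping more tangible, here is a minimal sketch of a Pod that mounts the ConfigMap and prints one of the experiments. It is not part of the lesson's files. The Pod name, the busybox image, and the /experiment mount path are assumptions made for illustration, and the Pod would have to be created in the same Namespace as the ConfigMap.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: experiments-peek        # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: peek
    image: busybox              # any image with "cat" would do
    # Each key of the ConfigMap shows up as a file in the mount path
    command: ["cat", "/experiment/health-http.yaml"]
    volumeMounts:
    - name: experiments
      mountPath: /experiment    # assumed mount path
  volumes:
  - name: experiments
    configMap:
      name: chaostoolkit-experiments
```

If the experiments were split across several ConfigMaps, a projected volume could expose them in a single directory, as long as the file names stay unique.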

Applying the definition

Let’s apply that definition to the go-demo-8 Namespace with kubectl apply and move forward.

Describing the ConfigMap

Next, to be on the safe side, we are going to describe the ConfigMap with kubectl describe and confirm that it is indeed created and available inside our cluster.

We can see that the output is, more or less, the same as the content that we already saw in the YAML file that we used to create that ConfigMap.

Creating a ServiceAccount

The next one in line is the ServiceAccount. We need it to provide sufficient permissions to chaos processes that we’ll run in the go-demo-8 Namespace.

Inspecting the definition of sa.yaml

Let’s take a look at the sa.yaml file.

Its content is as follows.

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: chaostoolkit

---

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: chaostoolkit
rules:
- apiGroups:
  - ""
  - "extensions"
  - "apps"
  resources:
  - pods
  - deployments
  - jobs
  verbs:
  - list
  - get
  - watch
  - delete
- apiGroups:
  - ""
  resources:
  - "persistentvolumeclaims"
  verbs:
  - list
  - get
  - create
  - delete
  - update
  - patch
  - watch

---

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: chaostoolkit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: chaostoolkit
subjects:
- kind: ServiceAccount
  name: chaostoolkit

We can see that there are a few things defined there. To begin with, we have the ServiceAccount called chaostoolkit. As you hopefully already know, a ServiceAccount by itself is just an identity, and it doesn’t serve much of a purpose until it is bound to a Role which, in turn, defines permissions. So, that YAML definition contains a Role with the rules that determine which actions (verbs) can be executed on which apiGroups and resources. Further on, we have a RoleBinding that binds the Role to the ServiceAccount.

The short explanation of that definition is that it will allow us to do almost anything we might ever need to do, but limited to a specific Namespace.

In a real-world situation, you might want to be more restrictive than that. Those permissions allow any process running under that ServiceAccount to read and delete the workload resources listed in the Role and to fully manage PersistentVolumeClaims in that Namespace. On the other hand, it is tough to be restrictive with the permissions we need to give to our chaos experiments. Theoretically, we might want to affect anything inside a Namespace or even the whole cluster through experiments. So, no matter how strong our desire is to be restrictive with permissions in general, we might need to be generous to chaos experiments. For them to work correctly, we most likely need to allow a wide range of permissions. As a minimum, we have to permit them to perform the actions we decided we’ll run.
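As an illustration of that minimum, here is a sketch of a narrower Role that assumes the only action we run is terminate-app-pod from the ConfigMap. The Role name is hypothetical, and the list of verbs is my assumption of what listing and deleting Pods requires rather than a verified requirement of chaosk8s. The HTTP probe talks to the application directly, so it should not need any Kubernetes permissions. The sketch uses rbac.authorization.k8s.io/v1, since the v1beta1 RBAC API was removed in Kubernetes 1.22.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: chaostoolkit-pods-only   # hypothetical name
rules:
- apiGroups:
  - ""                           # core API group, where Pods live
  resources:
  - pods
  verbs:                         # assumed minimum for terminating Pods
  - list
  - get
  - delete
```

To use it, the roleRef of the RoleBinding would need to point to this Role instead of chaostoolkit.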

From the permissions point of view, the only real restriction we’re setting up is that we are creating a RoleBinding and not a ClusterRoleBinding. Those permissions will be assigned to the ServiceAccount inside that Namespace. As a result, we’ll limit the capability of Chaos Toolkit to that Namespace, and it will not be able to affect the whole cluster.

Later on, we might dive into an even wider range of permissions. We might choose to create a subset of experiments that are operating on the level of the whole cluster. But, for now, limiting permissions to a specific Namespace should be enough. Those permissions, as I already mentioned, will allow Chaos Toolkit to do almost anything inside a particular Namespace.
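For contrast, this is roughly what a cluster-wide variant might look like. It is only a sketch: the chaostoolkit-cluster name and the read-only rule for nodes are assumptions chosen for illustration, not something this lesson applies. Unlike in a RoleBinding, the ServiceAccount subject of a ClusterRoleBinding has to state its Namespace explicitly.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: chaostoolkit-cluster     # hypothetical name
rules:
- apiGroups:
  - ""
  resources:
  - nodes                        # a cluster-scoped resource, as an example
  verbs:
  - list
  - get

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: chaostoolkit-cluster     # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: chaostoolkit-cluster
subjects:
- kind: ServiceAccount
  name: chaostoolkit
  namespace: go-demo-8           # required for ServiceAccount subjects here
```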

Applying the definition

All that is left, for now, is to apply that definition and create the ServiceAccount, the Role, and the RoleBinding.

Now we should be able to run our experiments inside the Kubernetes cluster. We have the ConfigMap that contains the definitions of the experiments, and we have the ServiceAccount bound to the Role with sufficient permissions.
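To show how the two pieces fit together, here is a rough sketch of a Pod that runs one experiment under the chaostoolkit ServiceAccount. The Pod name, the mount path, and especially the image are placeholders. The image is assumed to contain Chaos Toolkit with its Kubernetes extension installed, and the form in which experiments are really executed might differ from this.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: chaos-health-http        # hypothetical name
spec:
  serviceAccountName: chaostoolkit   # grants the permissions from the Role
  restartPolicy: Never
  containers:
  - name: chaostoolkit
    image: registry.example.com/chaostoolkit:latest  # placeholder image
    # "chaos run" executes the experiment stored in the mounted ConfigMap
    command: ["chaos", "run", "/experiment/health-http.yaml"]
    volumeMounts:
    - name: experiments
      mountPath: /experiment     # assumed mount path
  volumes:
  - name: experiments
    configMap:
      name: chaostoolkit-experiments
```

Such a Pod would need to be created in the go-demo-8 Namespace, since that is where the ConfigMap and the ServiceAccount live.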


The next lesson discusses the types of experiment executions.
