Running One-Shot Experiments
In this lesson, we will learn about Kubernetes Jobs and apply one to our cluster to run a one-shot experiment. For now, we’re going to focus on one-shot experiments that we’ll run manually, through pipelines, or by any other means we see fit.
Inspecting the once.yaml file#
Let’s take a look at yet another Kubernetes YAML file, once.yaml. Its definition is as follows.
---
apiVersion: batch/v1
kind: Job
metadata:
  name: go-demo-8-chaos
spec:
  activeDeadlineSeconds: 600
  backoffLimit: 0
  template:
    metadata:
      labels:
        app: go-demo-8-chaos
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      serviceAccountName: chaostoolkit
      restartPolicy: Never
      containers:
      - name: chaostoolkit
        image: vfarcic/chaostoolkit:1.4.1-2
        args:
        - --verbose
        - run
        - /experiment/health-http.yaml
        env:
        - name: CHAOSTOOLKIT_IN_POD
          value: "true"
        volumeMounts:
        - name: config
          mountPath: /experiment
          readOnly: true
        resources:
          limits:
            cpu: 20m
            memory: 64Mi
          requests:
            cpu: 20m
            memory: 64Mi
      volumes:
      - name: config
        configMap:
          name: chaostoolkit-experiments
Kubernetes jobs#
What matters is that we are defining a Kubernetes Job.
A Job creates one or more Pods and ensures that a specified number of them successfully terminate. As Pods successfully complete, the Job tracks the successful completions. When a specified number of successful completions is reached, the task (Job) is complete. Deleting a Job will clean up the Pods it created.
We can see from that description (taken from the Kubernetes docs) that a Job is an excellent candidate for running something once while being able to track the status and, especially, the exit codes of the processes inside it. If we limit ourselves to core Kubernetes components, a Job is probably the most appropriate type of Kubernetes resource to use when we want to run something like an experiment. The processes inside the containers of a Job start, run, and finish. Unlike most other Kubernetes resources, Jobs are not restarted or recreated when they complete successfully or when their Pods are deleted. That’s one of the main differences compared to, let’s say, Deployments and StatefulSets.
On top of that, we can configure Jobs not to restart on failure. That’s what we’re doing in this definition by setting spec.template.spec.restartPolicy to Never. An experiment can succeed or fail and, no matter the outcome, the Pod created by the Job will run only once.
Most of that specification is not very exciting. What matters is that we set the serviceAccountName to chaostoolkit. That should give the Job sufficient permissions to do whatever we need it to do.
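That ServiceAccount was created earlier. If you’re setting things up from scratch, a minimal sketch of the required commands might look like the ones below. The go-demo-8 Namespace and the binding to cluster-admin are assumptions made here for simplicity; in a real cluster you’d likely want a much narrower role.
# Create the ServiceAccount the Job will run as (go-demo-8 is an assumed Namespace)
kubectl --namespace go-demo-8 create serviceaccount chaostoolkit

# Bind it to cluster-admin so experiments can manipulate cluster resources
# (an assumption for simplicity; narrow this down in production)
kubectl create clusterrolebinding chaostoolkit \
    --clusterrole cluster-admin \
    --serviceaccount go-demo-8:chaostoolkit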
We defined only one container. We could add more if we wanted to run multiple experiments but, for our purposes, one should be more than enough to demonstrate how experiments defined as Jobs work.
I encourage you to create your own container image containing Chaos Toolkit, the required plugins, and anything else you might need. But, to simplify things, we’ll use the one I created (vfarcic/chaostoolkit). We’ll discuss that image soon.
We can see from the args that we want the output to be verbose and that it should run the experiment defined in /experiment/health-http.yaml. If you’re wondering where that file comes from, the short answer is “from the ConfigMap”. We’ll discuss it in a moment.
Then we have the env variable CHAOSTOOLKIT_IN_POD set to true. Chaos Toolkit might need to behave slightly differently when running inside a Pod, so we’re using that variable to ensure it knows where it is.
Further on, we have volumeMounts. We are mounting something called config as the directory /experiment. That “something” is defined in the volumes section, and it references the ConfigMap chaostoolkit-experiments we created earlier. That way, all the entries from that ConfigMap will be available as files inside the /experiment directory in the container.
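If you’re wondering how such a ConfigMap might be created in the first place, a sketch follows. The file path is an assumption; point it at wherever your experiment definitions live.
# Create the ConfigMap from a local experiment definition (the path is an assumption);
# the file name becomes the key and, therefore, the file name inside /experiment
kubectl --namespace go-demo-8 create configmap chaostoolkit-experiments \
    --from-file health-http.yaml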
Finally, we also defined the resources so that Kubernetes knows how much memory and CPU we’re requesting, and what the limits should be.
Inspecting the definition of the image vfarcic/chaostoolkit#
Before we move on, let’s take a quick look at the definition of the image vfarcic/chaostoolkit.
If you are a Windows user, the open command might not work. If that’s the case, please copy the address from the command that follows, and paste it into your favorite browser.
This repository contains only a Dockerfile. Open it, and you’ll see the definition of the container image. It is as follows.
FROM python:3.8.1-alpine
LABEL maintainer="Viktor Farcic <viktor@farcic.com>"
RUN apk add --no-cache --virtual build-deps gcc g++ git libffi-dev linux-headers python3-dev musl-dev && \
    pip install --no-cache-dir -q -U pip && \
    pip install --no-cache-dir chaostoolkit && \
    pip install --no-cache-dir chaostoolkit-kubernetes && \
    pip install --no-cache-dir chaostoolkit-istio && \
    pip install --no-cache-dir chaostoolkit-slack && \
    pip install --no-cache-dir slackclient==1.3.2 && \
    apk del build-deps
ENTRYPOINT ["/usr/local/bin/chaos"]
CMD ["--help"]
I will assume that you are already familiar with the Dockerfile format and that you know it is used to define the instructions that tools use to build container images. There are quite a few builders that work with such definitions, including Docker and Kaniko.
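If you do decide to build your own image, a typical Docker invocation might look like the one that follows. The tag is only a placeholder; replace it with your own registry, name, and version.
# Build the image from the Dockerfile in the current directory (the tag is a placeholder)
docker image build --tag <your-registry>/chaostoolkit:1.4.1-2 .

# Push it so that the cluster can pull it
docker image push <your-registry>/chaostoolkit:1.4.1-2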
As you can see, the definition of the image (Dockerfile) is relatively simple and straightforward. It is based on python since that’s a requirement for Chaos Toolkit, and it installs the CLI and the plugins we’ll need. The entry point is the path to the chaos binary. By default, it’ll output the help if we do not override the command (CMD).
Applying the definition#
Now that we have explored the Job that will run the experiment, we are ready to apply it and see what happens.
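Assuming the definition is stored in once.yaml and that we’re working inside a Namespace called go-demo-8 (both the file path and the Namespace are assumptions that may differ in your setup), the command would be similar to the one that follows.
kubectl --namespace go-demo-8 apply --filename once.yaml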
We can see that the Job go-demo-8-chaos was created.
Checking the pods#
Next, we’ll take a look at the Pods in that Namespace. To be more specific, we’re going to retrieve all the Pods and filter them with the selector app=go-demo-8-chaos, since that’s the label we put in the template of the Pod that the Job will create.
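The command might look like this (the go-demo-8 Namespace is, again, an assumption).
kubectl --namespace go-demo-8 get pods --selector app=go-demo-8-chaos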
The output is as follows.
NAME                  READY   STATUS    RESTARTS   AGE
go-demo-8-chaos-...   1/1     Running   0          14s
We can see that, in my case, the Pod created by the Job is Running. If that’s what you see on your screen, please repeat the command until the STATUS is Completed. That should take around a minute. When it’s finished, the output should be similar to the one that follows.
NAME                  READY   STATUS      RESTARTS   AGE
go-demo-8-chaos-...   0/1     Completed   0          75s
The execution of the Pod created through the Job finished. It is Completed, meaning that the experiment was successful.
Inspecting the logs#
Let’s output the logs and confirm that the experiment was indeed successful.
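A command along the following lines should do (adjust the Namespace if yours differs).
kubectl --namespace go-demo-8 logs --selector app=go-demo-8-chaos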
Since we wanted to output the logs of only the Pod that executed the experiment, we used the selector as a filter.
I won’t present the output here since it is too big to fit into the lesson. You should see on your screen the events that we have already explored countless times before. The only substantial difference is that we ran the experiment from a container instead of a laptop. As a result, timestamps and log levels were added to each output entry.
We managed to run an experiment from inside a Kubernetes cluster in much the same way as we would run it from a laptop.
Deleting the Job#
Before we proceed, we’ll delete the Job. We won’t need it anymore since we are about to switch to the scheduled execution of experiments.
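Assuming the same go-demo-8 Namespace as before, deleting it can be as simple as the command that follows.
kubectl --namespace go-demo-8 delete job go-demo-8-chaos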
That’s it. The Job is no more.
As you saw, running one-shot experiments inside a Kubernetes cluster is straightforward. I encourage you to hook them into your continuous delivery pipelines. Really, all you have to do is create a Job.
In the next lesson, we’re going to take a look at how we can schedule our experiments to be executed periodically instead of running one-shot executions.