Running Failed Scheduled Experiments

In this lesson, we will simulate a failed experiment by deleting the Deployment. We will also discuss how to generate a report for it.

We'll cover the following

Deleting the Deployment to simulate a failed experiment
Retrieving pods created by CronJobs
Generating the report
Confirming PersistentVolume
Reapplying definition of the app

Deleting the Deployment to simulate a failed experiment#

Let’s see what happens when an experiment fails. You probably already know the outcome, or, at least, you should be able to guess. Nevertheless, we’ll simulate a failure of the experiment by deleting the Deployment go-demo-8. As a result, the Pods of the application that is used as the target of the experiment will be terminated, and the experiment will inevitably fail.
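
A minimal sketch of that deletion, assuming kubectl is pointed at the cluster used in this course and that the application lives in the go-demo-8 Namespace (as the claims in the volume listing further down suggest). The function name delete_target_app is made up for illustration.

```shell
# Delete the Deployment targeted by the scheduled experiment.
# Assumption: both the Namespace and the Deployment are called go-demo-8.
delete_target_app() {
    kubectl --namespace go-demo-8 delete deployment go-demo-8
}
```

Calling delete_target_app removes the Deployment, and Kubernetes then terminates the Pods it was managing.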


Retrieving pods created by CronJobs#

The target of the experiment (the application) is gone. We’re going to retrieve the Pods of the experiment created by the CronJob.
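
One way to list those Pods is sketched below. The label selector app=go-demo-8-chaos is an assumption based on the Pod name prefix in the output that follows; the label the CronJob actually applies may differ, so the function accepts a different selector as its first argument. The function name is hypothetical.

```shell
# List the Pods created by the CronJob, oldest first.
# Assumption: the Pods carry the label app=go-demo-8-chaos; pass the real label as $1 if not.
list_experiment_pods() {
    local selector="${1:-app=go-demo-8-chaos}"
    kubectl --namespace go-demo-8 get pods \
        --selector "$selector" \
        --sort-by=.metadata.creationTimestamp
}
```

Sorting by creation timestamp keeps the most recent experiment Pod at the bottom of the list, which makes the newest STATUS easy to spot.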


Keep repeating that command until the output is similar to the one that follows. Remember, new experiments are being created every five minutes. On top of that, we need to wait for a minute or two until the chaos process finishes running. The end result should be a new Pod which, once the experiment inside it is finished, has its STATUS set to Error.

NAME                READY STATUS    RESTARTS AGE
go-demo-8-chaos-... 0/1   Completed 0        6m8s
go-demo-8-chaos-... 0/1   Error     0        67s

The experiment failed because we deleted the application it was targeting. The process running inside the container of that Pod returned a non-zero exit code, which is the signal for Kubernetes to treat the Pod as failed and report its STATUS as Error.

There’s probably no need to go deeper. I just wanted to show you what both successful and unsuccessful experiments look like when scheduled periodically through a CronJob.
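
For readers who do want to confirm the exit code themselves, the following is a small sketch. It assumes the experiment Pod has a single container, and it takes the full Pod name as an argument, since the names in the listing above are elided. The function name pod_exit_code is hypothetical.

```shell
# Print the exit code of the first container in a finished Pod.
# $1 is the full Pod name copied from the Pod listing.
# Assumption: the Pod runs in the go-demo-8 Namespace and has one container.
pod_exit_code() {
    kubectl --namespace go-demo-8 get pod "$1" \
        --output jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}'
}
```

For a Pod with the STATUS Completed this should print 0, and for one with the STATUS Error it should print some other value.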

Generating the report#

If we’d like to generate reports, we have the journal file with the information from the last execution. It is stored in the PersistentVolume. All we’d have to do is retrieve it from that volume. From there on, we should be able to generate the report just as we did in the past.

I will not show you how to retrieve something from persistent volumes. Instead, I will assume that you already know how to do that. If you don’t, you are likely new to Kubernetes. If that’s the case, I’m surprised that you got this far into the course. Nevertheless, if you are indeed new to Kubernetes and you’re not sure how to handle persistent volumes, I recommend you take the course A Practical Guide to Kubernetes or read the official documentation to get more familiar with Kubernetes.

On the other hand, if you already have experience with Kubernetes, retrieving data from a volume should not be a problem. In either case, I will not explain that here, since Kubernetes itself is not the subject of this course. Instead, I’ll just give you a tip: I’d create a new Pod that has the same volume attached and run the report-generation process from there. After that, I’d push the report to some external storage that is easy to access, such as a Git repository, an S3 bucket, Artifactory, or whatever else I might have at my disposal.
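
Purely as an illustration of that tip, and not as a walkthrough, the sketch below prints a Pod manifest that mounts the claim go-demo-8-chaos and runs the Chaos Toolkit report command against a journal file. Almost everything in it is an assumption: the Pod name, the /results mount path, the image (which would need Chaos Toolkit and its reporting plugin installed), and the journal path are all placeholders passed in or made up here. Pushing the report to external storage is left out.

```shell
# Print a Pod manifest that generates a report from the journal on the chaos volume.
# $1: container image with Chaos Toolkit and its reporting plugin (placeholder, not from the lesson)
# $2: path to the journal file inside the mounted volume (placeholder, not from the lesson)
report_pod_manifest() {
    local image="$1"
    local journal="$2"
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: go-demo-8-chaos-report
spec:
  restartPolicy: Never
  containers:
  - name: report
    image: ${image}
    command: ["chaos", "report", "--export-format=pdf", "${journal}", "/results/report.pdf"]
    volumeMounts:
    - name: results
      mountPath: /results
  volumes:
  - name: results
    persistentVolumeClaim:
      claimName: go-demo-8-chaos
EOF
}
```

Piping the output into kubectl --namespace go-demo-8 apply --filename - would create the Pod. Because the volume's access mode is RWO, the Pod may need to be scheduled on the same node as any other Pod that is using the claim at that moment.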

Confirming PersistentVolume#

In any case, let’s confirm that the PersistentVolume is indeed there.
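
A sketch of that check follows. PersistentVolumes are cluster-scoped, so no Namespace is needed; the function name is made up for illustration.

```shell
# List all PersistentVolumes in the cluster.
# PersistentVolumes are not namespaced, so --namespace is omitted.
list_persistent_volumes() {
    kubectl get pv
}
```

Calling list_persistent_volumes prints one row per volume, including the claim each one is bound to.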


The output is as follows.

NAME    CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM                     STORAGECLASS REASON AGE
pvc-... 1Gi      RWO          Delete         Bound  go-demo-8/go-demo-8-chaos standard            7m44s
pvc-... 8Gi      RWO          Delete         Bound  go-demo-8/go-demo-8-db    standard            19m

As we can see, one of the two volumes is claimed by go-demo-8-chaos. That’s where the journals are being stored.

Reapplying definition of the app#

There are still a couple of things we need to figure out when running chaos experiments inside a Kubernetes cluster. But, before we move on, we’re going to re-apply the definition of the demo application. We destroyed the app to ensure that the experiment would fail. Now we’re going to recreate it.
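
Re-applying could look roughly like the sketch below. The path to the definition is not stated in this lesson, so the hypothetical function apply_app_definition takes it as an argument; use whichever file or directory was applied earlier in the course.

```shell
# Re-apply the demo application's definition.
# $1: file or directory holding the definition (not given in this lesson).
# Assumption: the application belongs in the go-demo-8 Namespace.
apply_app_definition() {
    kubectl --namespace go-demo-8 apply --filename "$1"
}
```

Since apply is declarative, running it again recreates the missing Deployment and leaves resources that already match the definition unchanged.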


To be on the safe side, we’ll wait until it rolls out.
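
One way to wait is sketched below with kubectl rollout status, which blocks until the rollout finishes or fails. The function name is hypothetical, and the Namespace is assumed to be go-demo-8.

```shell
# Block until the go-demo-8 Deployment has finished rolling out.
wait_for_rollout() {
    kubectl --namespace go-demo-8 rollout status deployment go-demo-8
}
```

Once wait_for_rollout returns successfully, the application is back and later experiment runs have a target again.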


There is at least one more important thing that we should explore.

In the next lesson, we’ll try to figure out how to send notifications from the experiments.
