Running Failed Scheduled Experiments
In this lesson, we will simulate a failed experiment by deleting the Deployment. We will also discuss how to generate a report for it.
Deleting the Deployment to simulate a failed experiment
Let’s see what happens when an experiment fails. You probably already know the outcome, or, at least, you should be able to guess. Nevertheless, we’ll simulate a failure of the experiment by deleting the Deployment go-demo-8. As a result, the Pods of the application that is used as the target of the experiment will be terminated, and the experiment will inevitably fail.
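The deletion comes down to a single kubectl call. What follows is a minimal sketch wrapped in a helper function. It assumes that the Namespace and the Deployment are both named go-demo-8 (the Namespace name is inferred from the volume claims listed later in this lesson), and the function name is made up for illustration.

```shell
# Delete the Deployment that the scheduled experiment targets.
# Assumption: Namespace and Deployment are both called go-demo-8;
# pass other names as arguments if your setup differs.
delete_experiment_target() {
    local namespace="${1:-go-demo-8}"
    local deployment="${2:-go-demo-8}"
    kubectl --namespace "$namespace" delete deployment "$deployment"
}
```

Calling delete_experiment_target without arguments removes the Deployment and, with it, the Pods the experiment relies on.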
Retrieving Pods created by CronJobs
The target of the experiment (the application) is gone. We’re going to retrieve the Pods of the experiment created by the CronJob.
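Listing the Pods in the application’s Namespace is enough to see them. The sketch below assumes the go-demo-8 Namespace and that the names of the experiment Pods contain the word chaos, as in the sample output in this lesson. Both function names are hypothetical. The second helper only automates the manual polling and gives up after a fixed number of attempts.

```shell
# List the Pods in the Namespace of the demo application.
# Assumption: the Namespace is go-demo-8 unless another one is passed.
list_experiment_pods() {
    local namespace="${1:-go-demo-8}"
    kubectl --namespace "$namespace" get pods
}

# Repeat the listing until a chaos Pod reports the Error status.
# Arguments: namespace, seconds between attempts, maximum attempts.
wait_for_failed_experiment() {
    local namespace="${1:-go-demo-8}"
    local interval="${2:-30}"
    local attempts="${3:-20}"
    local pods
    while [ "$attempts" -gt 0 ]; do
        pods="$(list_experiment_pods "$namespace")"
        # Column 1 is NAME and column 3 is STATUS in the default output.
        if printf '%s\n' "$pods" |
            awk '$1 ~ /chaos/ && $3 == "Error" { found = 1 } END { exit !found }'; then
            printf '%s\n' "$pods"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$interval"
    done
    return 1
}
```

Running list_experiment_pods by hand every minute or so works just as well; wait_for_failed_experiment simply saves the retyping.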
Keep repeating that command until the output is similar to the one that follows. Remember, new experiments are being created every five minutes. On top of that, we need to wait for a minute or two until the chaos process finishes running. The end result should be the creation of a new Pod, which, when the experiment inside it is finished, should have the STATUS set to Error.
NAME                  READY   STATUS      RESTARTS   AGE
go-demo-8-chaos-...   0/1     Completed   0          6m8s
go-demo-8-chaos-...   0/1     Error       0          67s
The experiment failed because we deleted the application it was targeting. The process running inside the container of that Pod returned an exit code other than zero. That told Kubernetes that the process did not terminate successfully, so it set the STATUS of the Pod to Error.
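To see the exit code Kubernetes recorded, one option is to query the status of the Pod. This is a sketch that assumes the Pod runs a single container and lives in the go-demo-8 Namespace. The function name is illustrative, and the Pod name has to be copied from the listing since generated names differ on every run.

```shell
# Print the exit code of the first container in a finished Pod.
# The Pod name is a required argument because generated names differ per run.
experiment_exit_code() {
    local pod="$1"
    local namespace="${2:-go-demo-8}"
    kubectl --namespace "$namespace" get pod "$pod" \
        --output jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}'
}
```

For the Pod marked as Error, the printed value should be something other than 0.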
There’s probably no need to go deeper. I just wanted to show you what both successful and unsuccessful experiments look like when scheduled periodically through a CronJob.
Generating the report
If we’d like to generate reports, we have the journal file with the information from the last execution. It is stored in the PersistentVolume. All we’d have to do is retrieve it from that volume. From there on, we should be able to generate the report just as we did in the past.
I will not show you how to retrieve something from persistent volumes. Instead, I will assume that you already know how to do that. If you don’t, you are likely new to Kubernetes. If that’s the case, I’m surprised that you got this far into the course. Nevertheless, if you are indeed new to Kubernetes and you’re not sure how to handle persistent volumes, I recommend you take the course A Practical Guide to Kubernetes or read the official documentation to get more familiar with Kubernetes.
On the other hand, if you already have experience with Kubernetes, retrieving data from a volume should not be a problem. In either case, I will not explain that here. Kubernetes itself is not the subject of this course. Instead, I’ll just give you a tip: I’d create a new Pod that has the same volume attached, and I’d run the process of creating the report from there. After that, I’d push the report to some external storage that is easy to access, such as a Git repository, an S3 bucket, Artifactory, or whatever else I might have at my disposal.
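The tip above could be sketched as a helper that prints a Pod manifest. Only the claim name go-demo-8-chaos comes from this lesson. Everything else is an assumption: the image is whatever image you have with Chaos Toolkit and its reporting plugin installed, the /results mount path and the journal file name must match what the CronJob actually uses, and the Pod and function names are made up.

```shell
# Print a Pod manifest that mounts the journal volume and builds a report.
# Assumptions: the image (first argument) contains Chaos Toolkit with the
# reporting plugin, the volume is mounted at /results, and the journal
# path (second argument) matches the file written by the CronJob.
report_pod_manifest() {
    local image="$1"
    local journal="${2:-/results/journal.json}"
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: chaos-report
spec:
  restartPolicy: Never
  containers:
  - name: report
    image: ${image}
    command: ["chaos", "report", "--export-format=pdf", "${journal}", "/results/report.pdf"]
    volumeMounts:
    - name: results
      mountPath: /results
  volumes:
  - name: results
    persistentVolumeClaim:
      claimName: go-demo-8-chaos
EOF
}
```

The output can be piped to kubectl --namespace go-demo-8 apply --filename - to create the Pod. Since the access mode of the claim is RWO, the Pod can mount the volume only on the node that is already using it, so it might stay pending while an experiment Pod runs elsewhere. Pushing the resulting report to external storage is left out of the sketch.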
Confirming the PersistentVolume
In any case, let’s confirm that the PersistentVolume is indeed there.
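PersistentVolumes are cluster-scoped, so listing them does not require a Namespace. A minimal sketch, with a hypothetical function name:

```shell
# List all PersistentVolumes in the cluster.
# No --namespace argument is needed because volumes are not namespaced.
list_persistent_volumes() {
    kubectl get pv
}
```

Calling list_persistent_volumes runs the listing against the current cluster.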
The output is as follows.
NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                       STORAGECLASS   REASON   AGE
pvc-...   1Gi        RWO            Delete           Bound    go-demo-8/go-demo-8-chaos   standard                7m44s
pvc-...   8Gi        RWO            Delete           Bound    go-demo-8/go-demo-8-db      standard                19m
As we can see, one of the two volumes is claimed by go-demo-8-chaos. That’s where the journals are being stored.
Reapplying the definition of the app
There are still a couple of things we need to figure out when running chaos experiments inside a Kubernetes cluster. But, before we move on, we’re going to re-apply the definition of the demo application. We destroyed the app to ensure that the experiment would fail. Now we’re going to recreate it.
To be on the safe side, we’ll wait until it rolls out.
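Both steps, re-applying the definition and waiting for the rollout, could be combined as in the sketch below. The definition file is a required argument because its path is not given here; use the same file that was applied when the application was first deployed. The function name and the default go-demo-8 Namespace are assumptions.

```shell
# Re-create the demo application and wait until it is rolled out.
# The definition file is a required argument; use the same file that was
# applied earlier in the course.
reapply_app() {
    local definition="$1"
    local namespace="${2:-go-demo-8}"
    kubectl --namespace "$namespace" apply --filename "$definition" &&
        kubectl --namespace "$namespace" rollout status deployment go-demo-8
}
```

The rollout status command blocks until the Deployment go-demo-8 is fully rolled out, so the function returns only once the application is back.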
There is at least one more important thing that we should explore.
In the next lesson, we’ll try to figure out how to send notifications from the experiments.