Pausing After Actions

In this lesson, we will find out why our experiment did not fail and see how we can pause after or before actions to give the system enough time to perform tasks.

Why didn’t the experiment fail?

In our previous experiment, we validated the state before and after the action. We checked whether the Pod existed, terminated it, and then verified whether it still existed. The experiment should have failed, but it didn’t, because the probes and the action were executed immediately one after another.

When Chaos Toolkit sent the instruction to destroy the Pod to the Kubernetes API, it received an acknowledgment of that action. It then immediately validated whether the Pod was still there, and it was. Kubernetes did not have enough time to remove it entirely. Perhaps we were so fast that the Pod was still running, or perhaps it was already terminating. Either way, that would explain the strange outcome: the Pod was not gone right away but was still there, probably in the Terminating state. So, what we are most likely missing to make that experiment more useful is a pause.

Let’s see how we can pause after or before actions to give enough time for the system to perform whichever tasks it needs to perform before we validate the state again.

Inspecting the definition of terminate-pod-pause.yaml

We’re going to take a look at yet another YAML definition.

We will almost always output the differences between the new and the old definition, since that makes the changes easy to spot.

The output is as follows.

>   pauses: 
>     after: 10

This time, the only change is the addition of the pauses section. It is placed after the action that terminates the Pod, and its after field is set to 10. Pauses can be added before or after an action. In this case, once the action that terminates the Pod is executed, Chaos Toolkit will wait for 10 seconds before validating the state again.
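For context, the pause could sit inside the experiment's method roughly as sketched below. This is a minimal sketch, not the exact contents of terminate-pod-pause.yaml. Only the pauses section is taken from the diff. The provider module and function (assumed here to be terminate_pods from the chaostoolkit-kubernetes extension), as well as the label selector and namespace values, are hypothetical placeholders.

```yaml
method:
- type: action
  name: terminate-pod              # action name used in this lesson's experiment
  provider:                        # assumption: provider details are not shown in the diff
    type: python
    module: chaosk8s.pod.actions
    func: terminate_pods
    arguments:
      label_selector: app=my-app   # hypothetical selector
      rand: true                   # pick one matching Pod at random
      ns: my-namespace             # hypothetical namespace
  pauses:
    after: 10                      # wait 10 seconds after the action completes
    # before: 5                    # a pause can also be set before the action runs
```

Note that pauses is indented to the same level as provider, so it belongs to the action itself and not to the provider's arguments.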

Running chaos experiment and inspecting the output

So, let’s see what we get if we execute this experiment.

The output is as follows (timestamps are removed for brevity).

[... INFO] Validating the experiment's syntax
[... INFO] Experiment looks valid
[... INFO] Running experiment: What happens if we terminate a Pod?
[... INFO] Steady state hypothesis: Pod exists
[... INFO] Probe: pod-exists
[... INFO] Steady state hypothesis is met!
[... INFO] Action: terminate-pod
[... INFO] Pausing after activity for 10s...
[... INFO] Steady state hypothesis: Pod exists
[... INFO] Probe: pod-exists
[... CRITICAL] Steady state probe 'pod-exists' is not in the given tolerance so failing this experiment
[... INFO] Let's rollback...
[... INFO] No declared rollbacks, let's move on.
[... INFO] Experiment ended with status: deviated
[... INFO] The steady-state has deviated, a weakness may have been discovered

We can see that Chaos Toolkit paused for 10 seconds, and then the probe failed. It reported that "Steady state probe 'pod-exists' is not in the given tolerance so failing this experiment".

With a long enough pause, we managed to make the experiment more realistic. We gave Kubernetes enough time to remove that Pod entirely, and then we validated whether the Pod was still there. The system came back to us saying that it wasn’t, and we can confirm that by outputting the exit code of the last command.

We can see that the output is 1, meaning that the experiment indeed failed.

Recreating the failed pods

Now, before we move forward and add a few more missing pieces that would make this experiment really valid, we’re going to recreate that failed Pod.


In the next lesson, we will learn how to probe the conditions and phases of a Pod.
