Sending Selective Notifications

In this lesson, we are going to filter the notifications so that we only receive a notification whenever an experiment fails. We will do this by modifying the ConfigMap.

We'll cover the following

Inspecting the settings-failure.yaml file
Checking the difference between settings.yaml and settings-failure.yaml
Fixing the Slack token and applying the ConfigMap
Checking the pods
Observing the notifications on Slack
Deleting the Deployment to fail the experiment
Confirming the notification received on Slack

We are now receiving notifications when an experiment starts and ends, whether it is successful or not. However, this might be too much. We can easily be overwhelmed by notifications and, as a result, start ignoring them and miss those that do matter. I prefer to receive notifications from the system only when they are critical. More often than not, that means receiving only those that should result in some action. What’s the point of knowing that an experiment was successful, just as many others were before? No news can be interpreted as good news. A failed experiment should be a clear indication that the system does not behave as it should, and that we should do our best to improve it so that the same experiment is successful the next time it runs.

We’re going to modify the notifications section in Chaos Toolkit settings so that we are notified only when experiments fail. That way, we won’t hear about experiments that succeed, and we’ll be ready to act when they fail.

Inspecting the `settings-failure.yaml` file

Let’s take a look at yet another YAML.

The output is as follows.

apiVersion: v1
kind: ConfigMap
metadata:
  name: chaostoolkit-settings
data:
  settings.yaml: |
    notifications:
    - type: plugin
      module: chaosslack.notification
      token: xoxb-@102298415591@-@902246554644@-@xa4zDR3lcyvleRymXAFSifLD@
      channel: tests
      events:
      - discover-failed
      - run-failed
      - validate-failed

That ConfigMap is very similar to the one we used before. The only difference is in the additional section events. Over there, we’re specifying that we want to send notifications only if certain events happen. In our case, those are all related to failures. We’ll be sending notifications if there is an issue detected during discovery (discover-failed), during running (run-failed), or during validation (validate-failed) phases. Those three cover all failure scenarios in Chaos Toolkit, so we’ll be notified every time something goes wrong.

If you’re curious about which other events you can use, please go to the Chaos Toolkit Flow Events section in the documentation.

Checking the difference between `settings.yaml` and `settings-failure.yaml`

To be on the safe side, we’ll confirm that those are indeed the only changes to the ConfigMap definition by outputting the diff with the previous definition.

The output is as follows.

11a12,15
>       events:
>       - discover-failed
>       - run-failed
>       - validate-failed

As you can see, the events section is indeed the only addition to the ConfigMap.
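
If you don’t have both files at hand, the comparison can be reproduced with scratch copies. The sketch below is only an illustration: the temporary directory, the file locations, and the `xoxb-PLACEHOLDER` token are assumptions, not the layout of the course repository.

```shell
# Recreate both definitions in a scratch directory (hypothetical location),
# using a placeholder instead of a real Slack token.
workdir="$(mktemp -d)"

cat > "$workdir/settings.yaml" <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: chaostoolkit-settings
data:
  settings.yaml: |
    notifications:
    - type: plugin
      module: chaosslack.notification
      token: xoxb-PLACEHOLDER
      channel: tests
EOF

# The failure variant is the same file plus the events list.
cp "$workdir/settings.yaml" "$workdir/settings-failure.yaml"
cat >> "$workdir/settings-failure.yaml" <<'EOF'
      events:
      - discover-failed
      - run-failed
      - validate-failed
EOF

# diff exits with status 1 when the files differ, so ignore that status.
diff_output="$(diff "$workdir/settings.yaml" "$workdir/settings-failure.yaml" || true)"
echo "$diff_output"
```

The first line of the output, `11a12,15`, says that lines 12 to 15 of the second file were added after line 11 of the first.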

Fixing the Slack token and applying the ConfigMap

Let’s modify the token by removing the @ characters, and apply the result with kubectl, just as we did before.
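
A minimal sketch of that step, assuming `sed` does the substitution. It runs against the token line copied from the definition above rather than the file, so it works without cluster access. The commented pipeline, its file path, and the `go-demo-8` Namespace are assumptions.

```shell
# The token line as it appears in the definition, with @ separators.
raw_line='      token: xoxb-@102298415591@-@902246554644@-@xa4zDR3lcyvleRymXAFSifLD@'

# Remove every @ character; the rest of the line is untouched.
clean_line="$(printf '%s\n' "$raw_line" | sed -e 's|@||g')"
echo "$clean_line"

# With cluster access, the cleaned definition could be piped to kubectl.
# The file path and the Namespace below are assumptions:
#   sed -e 's|@||g' settings-failure.yaml \
#     | kubectl --namespace go-demo-8 apply --filename -
```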

Checking the pods

Next, we’ll start fetching the Pods and waiting until the CronJob spins up a new one.

The output, in my case, is as follows.

NAME                READY STATUS    RESTARTS AGE
go-demo-8-chaos-... 0/1   Error     0        26m
go-demo-8-chaos-... 0/1   Completed 0        11m
go-demo-8-chaos-... 0/1   Completed 0        6m21s
go-demo-8-chaos-... 0/1   Completed 0        81s

We can see from my output that the last Job was created eighty-one seconds ago. So, I’ll have to wait approximately three and a half minutes until the next Job is created, and a minute or two until it finishes executing. In your case, the remaining time will differ depending on when the last Job was created.
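
The estimate is simple arithmetic. The sketch below assumes that the CronJob runs every five minutes, which is what the roughly five-minute gaps between the Pod ages suggest. The interval is an inference, not something shown in this lesson.

```shell
# Assumption: the CronJob creates a new Job every five minutes.
interval_seconds=300

# Age of the newest Pod in the output above (81s).
last_job_age_seconds=81

# Time left until the next Job is scheduled.
remaining=$(( interval_seconds - last_job_age_seconds ))
printf 'Next Job in roughly %dm%02ds\n' $(( remaining / 60 )) $(( remaining % 60 ))
```

With these numbers, the remaining time comes to 219 seconds, which is a bit over three and a half minutes.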

Keep repeating the kubectl get command until a new Pod (created by a new Job) starts running, and it completes the execution of the experiment.
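
One way to do the polling, sketched as a shell function. The `go-demo-8` Namespace is an assumption, and `--sort-by` only orders the list so that the newest Pod appears last.

```shell
# Assumption: the experiments run in the go-demo-8 Namespace.
namespace="go-demo-8"

# List the Pods ordered by creation time, newest last.
list_chaos_pods() {
    kubectl --namespace "$namespace" get pods \
        --sort-by=.metadata.creationTimestamp
}
```

Call `list_chaos_pods` every minute or so until a new entry shows up at the bottom with the STATUS set to Completed.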

Observing the notifications on Slack

Please go back to Slack once the new Pod is created and its STATUS is set to Completed.

If you ignore potential notifications coming from other readers, you’ll see that there’s nothing new. The process didn’t create a notification because the experiment was successful, and we configured Chaos Toolkit to send notifications only if an experiment fails. From now on, there will be no notifications coming from you until we do something that makes one of the experiments fail. Other people might be running experiments while following this course, so you might still see their notifications. What matters is that yours is not there.

Deleting the Deployment to fail the experiment

Let’s change the situation. Let’s confirm not only that we don’t receive notifications when experiments are successful, but also that we do get new messages in Slack when an experiment fails. We’re going to do that in the same way as we did before. We’re going to simulate failure by removing the Deployment of the application. That will inevitably result in a failed experiment since it will try to confirm that the app that does not exist is healthy.
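
The removal could look like the sketch below. The Deployment name `go-demo-8` comes from this lesson, while the Namespace of the same name is an assumption.

```shell
# Assumption: the application lives in the go-demo-8 Namespace.
namespace="go-demo-8"

# Delete the Deployment; its ReplicaSets and Pods are removed with it.
delete_app() {
    kubectl --namespace "$namespace" delete deployment go-demo-8
}
```

Calling `delete_app` against the cluster removes the application, so the next scheduled experiment has nothing healthy to verify.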

We removed the go-demo-8 Deployment, and that, as you already know, will terminate the ReplicaSets, which, in turn, will eliminate the Pods. As a result, the experiment should fail to confirm that the application is healthy.

Confirming the notification received on Slack

Now comes the waiting time. We need to be patient until the next experiment is executed. Please go back to Slack and wait for a while.

After a while, you should see a new notification similar to the one in the image below.

Chaos Toolkit Slack notifications about a failed experiment

From now on, we will receive notifications only when experiments fail, instead of being swarmed with, more or less, useless information about successful executions. While it might be useful to see notifications of successful experiments a few times, sooner or later, we’d get tired of this, and we’d start ignoring them. So, we configured our Chaos Toolkit CronJob to send notifications only when one of the experiments fails.

In the next lesson, we will remove the resources that we have created.