Sending Selective Notifications
In this lesson, we are going to filter the notifications so that we only receive a notification whenever an experiment fails. We will do this by modifying the ConfigMap.
We'll cover the following
- Inspecting the settings-failure.yaml file
- Checking the difference between settings.yaml and settings-failure.yaml
- Fixing the Slack token and applying the ConfigMap
- Checking the pods
- Observing the notifications on Slack
- Deleting the Deployment to fail the experiment
- Confirming the notification received on Slack
We are now receiving notifications when an experiment starts and ends, no matter whether it is successful or not. However, this might be too much. We can easily be overwhelmed with too many notifications and, as a result, we will likely start ignoring them and miss those that do matter. Instead, I prefer to receive notifications from the system only when they are critical. More often than not, that means receiving only the notifications that should result in some action. What’s the point of knowing that an experiment was successful, just as many others were before? No news can be interpreted as good news. A failed experiment, on the other hand, is a clear indication that the system does not behave as it should, and that we should do our best to improve it so that the same experiment is successful the next time it runs.
We’re going to modify the notifications section in the Chaos Toolkit settings so that we are notified only when experiments fail. That way, we’ll hear nothing when they’re successful, and we’ll be ready to act when they fail.
Inspecting the settings-failure.yaml file
Let’s take a look at yet another YAML.
The output is as follows.
apiVersion: v1
kind: ConfigMap
metadata:
  name: chaostoolkit-settings
data:
  settings.yaml: |
    notifications:
    - type: plugin
      module: chaosslack.notification
      token: xoxb-@102298415591@-@902246554644@-@xa4zDR3lcyvleRymXAFSifLD@
      channel: tests
      events:
      - discover-failed
      - run-failed
      - validate-failed
That ConfigMap is very similar to the one we used before. The only difference is the additional events section. Over there, we’re specifying that we want to send notifications only if certain events happen. In our case, those are all related to failures. We’ll be sending notifications if there is an issue detected during the discovery (discover-failed), running (run-failed), or validation (validate-failed) phases. Together, those three cover failures in the discovery, run, and validation phases, so we’ll be notified whenever an experiment goes wrong.
If you’re curious about which other events you can use, please go to the Chaos Toolkit Flow Events section in the documentation.
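To make the effect of the events list concrete, here is a minimal Python sketch of the filtering rule described above. It is not the Chaos Toolkit implementation. The function should_notify and the two channel dictionaries are hypothetical names, the run-started and run-completed event names are assumed to be among the flow events, and the sketch assumes that a channel without an events list receives every event, which is what we observed with the earlier settings.

```python
# Illustrative sketch only: this is NOT the Chaos Toolkit source code.
# It mimics the filtering rule described in the lesson for one channel.

FAILURE_EVENTS = ["discover-failed", "run-failed", "validate-failed"]


def should_notify(channel: dict, event: str) -> bool:
    """Return True if `event` should be sent to `channel`.

    Assumption: a channel without an `events` list receives every event.
    """
    allowed = channel.get("events")
    if not allowed:
        return True
    return event in allowed


# Hypothetical channel definitions mirroring the two ConfigMaps
# (the token is left out on purpose).
all_events_channel = {
    "type": "plugin",
    "module": "chaosslack.notification",
    "channel": "tests",
}
failures_only_channel = {**all_events_channel, "events": FAILURE_EVENTS}

for event in ["run-started", "run-completed", "run-failed"]:
    print(
        event,
        should_notify(all_events_channel, event),
        should_notify(failures_only_channel, event),
    )
```

With the failures-only channel, only run-failed passes the filter in that loop, which is the behavior we want from the new ConfigMap.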
Checking the difference between settings.yaml and settings-failure.yaml
To be on the safe side, we’ll confirm that those are indeed the only changes to the ConfigMap definition by outputting the diff with the previous definition.
The output is as follows.
11a12,15
>       events:
>       - discover-failed
>       - run-failed
>       - validate-failed
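As you can see, the events section is indeed the only addition to the ConfigMap. The 11a12,15 header means that lines 12 to 15 of the second file were added after line 11 of the first.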
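The same check can be sketched in Python with the standard difflib module. This is only an illustration: the two lists below are reconstructions of the notifications entry from each file, with a placeholder instead of the real token, and the variable names are hypothetical.

```python
import difflib

# Reconstructed for illustration: the `notifications` entry from each file,
# with a placeholder instead of the real token.
settings = [
    "notifications:",
    "- type: plugin",
    "  module: chaosslack.notification",
    "  token: xoxb-placeholder",
    "  channel: tests",
]
settings_failure = settings + [
    "  events:",
    "  - discover-failed",
    "  - run-failed",
    "  - validate-failed",
]

# ndiff prefixes lines found only in the second sequence with "+ "
# and lines found only in the first sequence with "- ".
delta = list(difflib.ndiff(settings, settings_failure))
added = [line[2:] for line in delta if line.startswith("+ ")]
removed = [line[2:] for line in delta if line.startswith("- ")]

print("added:", added)
print("removed:", removed)
```

If the events section is the only change, the added list contains exactly those four lines and the removed list is empty.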
Fixing the Slack token and applying the ConfigMap
Let’s modify the token by removing the @ characters and applying the result with kubectl, just as we did before.
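The token fix itself is a plain character removal. The sketch below shows it in Python under stated assumptions: strip_placeholders is a hypothetical helper, and the sample token is made up for illustration. Removing every @ is safe here only because the definition contains no other @ characters.

```python
def strip_placeholders(manifest: str) -> str:
    """Remove every '@' character from the manifest text."""
    return manifest.replace("@", "")


# Made-up token for illustration; it is not a real Slack token.
sample = "token: xoxb-@111@-@222@-@abc@"

print(strip_placeholders(sample))  # token: xoxb-111-222-abc
```

The cleaned text is what gets applied to the cluster.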
Checking the pods
Next, we’ll start fetching the Pods and waiting until the CronJob spins up a new one.
The output, in my case, is as follows.
NAME                READY STATUS    RESTARTS AGE
go-demo-8-chaos-... 0/1   Error     0        26m
go-demo-8-chaos-... 0/1   Completed 0        11m
go-demo-8-chaos-... 0/1   Completed 0        6m21s
go-demo-8-chaos-... 0/1   Completed 0        81s
We can see from my output that the last Job was created eighty-one seconds ago. The most recent Jobs in that output were created roughly five minutes apart, so I’ll have to wait approximately three and a half minutes until the next Job is created, and a minute or two until it finishes executing. In your case, the remaining time will be different depending on when the last Job was created.
Keep repeating the kubectl get command until a new Pod (created by a new Job) starts running and completes the execution of the experiment.
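The waiting-time estimate above can be sketched as a small calculation. The five-minute interval is an assumption inferred from the Pod ages in the sample output (11m, 6m21s, and 81s), and seconds_until_next_job is a hypothetical helper, not part of any tool used in this lesson.

```python
def seconds_until_next_job(age_seconds: int, interval_seconds: int = 300) -> int:
    """Estimate the seconds left until the next Job is created.

    `age_seconds` is the age of the newest Pod; `interval_seconds` is the
    assumed time between Jobs (five minutes by default).
    """
    return interval_seconds - (age_seconds % interval_seconds)


remaining = seconds_until_next_job(81)
print(f"{remaining // 60}m{remaining % 60}s")  # 3m39s
```

With the newest Pod at 81 seconds, that leaves a little over three and a half minutes, plus the minute or two the experiment needs to finish.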
Observing the notifications on Slack
Please go back to Slack once the new Pod is created and its STATUS is set to Completed.
If you ignore potential notifications coming from other readers, you’ll see that there’s nothing new. The process didn’t create a notification because the experiment was successful, and we configured Chaos Toolkit to send notifications only if an experiment fails. From now on, there will be no notifications coming from you until we do something that makes one of the experiments fail. Remember that other people might be running experiments while following this course, so you might still see notifications from others. What matters is that yours is not there.
Deleting the Deployment to fail the experiment
Let’s change the situation. Let’s confirm not only that we don’t receive notifications when experiments are successful, but also that we do get new messages in Slack when an experiment fails. We’re going to do that in the same way as we did before, by removing the Deployment of the application to simulate a failure. That will inevitably result in a failed experiment, since the experiment will try to confirm the health of an app that no longer exists.
We removed the go-demo-8 Deployment, and that, as you already know, will terminate the ReplicaSets, which, in turn, will eliminate the Pods. As a result, the experiment should fail to confirm that the application is healthy.
Confirming the notification received on Slack
Now comes the waiting time. We need to be patient until the next experiment is executed. Please go back to Slack and wait for a while.
After a while, you should see a new notification similar to the one in the image below.
From now on, we will receive notifications only when experiments fail, instead of being swamped with more or less useless information about successful executions. While it might be useful to see notifications of successful experiments a few times, sooner or later we’d get tired of them and start ignoring them. So, we configured our Chaos Toolkit CronJob to send notifications only when one of the experiments fails.
In the next lesson, we will remove the resources that we have created.