Terminating Application Dependencies
This lesson will cover a chaos experiment carried out to check how the application behaves if we terminate a replica of the database.
There’s one more scenario in this area that we should try: what happens if we destroy an instance of a dependency of our application? As you already know, our demo application depends on MongoDB. We saw what happens when we destroy an instance of the application itself. Next, we’ll explore how the application behaves if we terminate a replica of the database (the dependency).
Inspecting the definition of health-db.yaml
We are going to take a look at yet another chaos experiment definition.
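If you want to follow along, you can print the definition yourself. A minimal example, assuming the file is named health-db.yaml and sits in the current directory (adjust the path to wherever you keep your experiment definitions):

cat health-db.yaml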
The output is as follows.
version: 1.0.0
title: What happens if we terminate an instance of the DB?
description: If an instance of the DB is terminated, dependant applications should still be operational.
tags:
- k8s
- pod
- http
configuration:
  ingress_host:
    type: env
    key: INGRESS_HOST
steady-state-hypothesis:
  title: The app is healthy
  probes:
  - name: app-responds-to-requests
    type: probe
    tolerance: 200
    provider:
      type: http
      timeout: 3
      verify_tls: false
      url: http://${ingress_host}/demo/person
      headers:
        Host: go-demo-8.acme.com
method:
- type: action
  name: terminate-db-pod
  provider:
    type: python
    module: chaosk8s.pod.actions
    func: terminate_pods
    arguments:
      label_selector: app=go-demo-8-db
      rand: true
      ns: go-demo-8
  pauses:
    after: 2
Checking the difference between health-http.yaml and health-db.yaml
As you’re accustomed to by now, we’ll output the differences between that definition and the one we used before. That will help us spot what really changed, given that the differences are subtle.
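One way to produce that comparison, assuming both definitions are in the current directory, is with the standard diff command:

diff health-http.yaml health-db.yaml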
The output is as follows.
2,3c2,3
< title: What happens if we terminate an instance of the application?
< description: If an instance of the application is terminated, the applications as a whole should still be operational.
---
> title: What happens if we terminate an instance of the DB?
> description: If an instance of the DB is terminated, dependant applications should still be operational.
27c27
< name: terminate-app-pod
---
> name: terminate-db-pod
33c33
< label_selector: app=go-demo-8
---
> label_selector: app=go-demo-8-db
We can see that, if we ignore the title, the description, and the name, what really changed is the label_selector. We’ll terminate one of the Pods with the label app set to go-demo-8-db. In other words, we are doing the same thing as before. Instead of terminating instances of our application, we are terminating instances of the database that is the dependency of our app.
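Before running the experiment, you might want to confirm that the new selector actually matches the database Pod. A quick check, using the namespace and label from the definition above (the exact Pod name will differ in your cluster):

kubectl --namespace go-demo-8 get pods --selector app=go-demo-8-db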
Let’s see what happens. Remember, in the previous experiment, we terminated a Pod of the application itself, and we could continue sending requests because the application is highly available.
Running the chaos experiment and inspecting the output
Let’s see what happens when we do the same to the database.
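As before, we run the experiment with the Chaos Toolkit CLI, assuming the definition is in health-db.yaml in the current directory:

chaos run health-db.yaml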
The output, without timestamps, is as follows.
[... INFO] Validating the experiment's syntax
[... INFO] Experiment looks valid
[... INFO] Running experiment: What happens if we terminate an instance of the DB?
[... INFO] Steady state hypothesis: The app is healthy
[... INFO] Probe: app-responds-to-requests
[... INFO] Steady state hypothesis is met!
[... INFO] Action: terminate-db-pod
[... INFO] Pausing after activity for 2s...
[... INFO] Steady state hypothesis: The app is healthy
[... INFO] Probe: app-responds-to-requests
[... ERROR] => failed: activity took too long to complete
[... WARNING] Probe terminated unexpectedly, so its tolerance could not be validated
[... CRITICAL] Steady state probe 'app-responds-to-requests' is not in the given tolerance so failing this experiment
[... INFO] Let's rollback...
[... INFO] No declared rollbacks, let's move on.
[... INFO] Experiment ended with status: deviated
[... INFO] The steady-state has deviated, a weakness may have been discovered
The initial probe passed, and the action was successful: it terminated an instance of the database and waited for two seconds. After that, the post-action probe failed. Our application does not respond to requests when the associated database is not there.
We know that Kubernetes will recreate the terminated Pod of the database. However, there is downtime between destroying the existing Pod and the new one being up and running. Not only is the database unavailable during that period, but our application is unavailable as well.
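If you want to observe that recovery window yourself, you can watch the database Pod being terminated and recreated while the experiment runs; kubectl’s --watch flag streams the status changes as they happen:

kubectl --namespace go-demo-8 get pods --selector app=go-demo-8-db --watch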
Assignment
Unlike the previous examples, where I showed you how to fix the problem, from now on you will get homework. The goal is for you to figure out how to solve this problem yourself. How can you make this experiment a success? You should already have a clue from how we fixed the application; it’s not really that different. Think about it, take a break from reading, and execute the experiment again once you have a solution.
If the homework is too difficult, I’ll give you a couple of tips. Do the same thing to the database that we did to the application. Just as you need to run multiple instances of the application to make it highly available, you need multiple instances of the database: our demo application depends on it and will never be highly available if the database isn’t. Since the database is stateful, you probably want a StatefulSet instead of a Deployment, and I strongly recommend that you don’t try to define all of that yourself. There is an excellent Helm chart in the stable channel. Search for “MongoDB stable channel Helm chart,” and you will find a definition of MongoDB that can be replicated and made highly available.
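To make the direction of the homework a bit more concrete, here is a rough sketch rather than the definitive fix: the release name and chart values below are assumptions, so verify them against whichever chart you end up using. You will likely also need to point the application and the experiment’s label_selector at whatever labels the chart assigns to its Pods.

# Illustration only: install MongoDB as a replica set with Helm.
# Release name and values are assumptions; check the chart's documentation.
helm repo add stable https://charts.helm.sh/stable
helm install go-demo-8-db stable/mongodb \
  --namespace go-demo-8 \
  --set replicaSet.enabled=true \
  --set replicaSet.replicas.secondary=2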
In the next lesson, we will remove the resources that we have created.