Aborting Network Requests

In this lesson, we will run a chaos experiment to see what happens if we abort some network requests.

We'll cover the following

Inspecting the definition of network.yaml
Repeating the requests
Running chaos experiment and inspecting the output

Networking issues are very common. They happen more often than many people think. We are about to explore what happens when we simulate or create those same issues ourselves.

So, what can we do?

We can do many different things, but, in our case, we’ll start with something relatively simple. We’ll see what happens if we intentionally abort some of the network requests. We’re going to terminate requests and see how our application behaves when that happens. We’re not going to abort all the requests, but only some. Terminating 50% of requests should do.

What happens if 50% of the requests coming to our applications are terminated? Is our application resilient enough to survive without negatively affecting users? As you can probably guess, we can check that through an experiment.

Inspecting the definition of `network.yaml`

Let’s take a look at yet another Chaos Toolkit definition.

The output is as follows.

version: 1.0.0
title: What happens if we abort responses
description: If responses are aborted, the dependant application should retry and/or timeout requests
tags:
- k8s
- istio
- http
configuration:
  ingress_host:
      type: env
      key: INGRESS_HOST
steady-state-hypothesis:
  title: The app is healthy
  probes:
  - type: probe
    name: app-responds-to-requests
    tolerance: 200
    provider:
      type: http
      timeout: 5
      verify_tls: false
      url: http://${ingress_host}?addr=http://go-demo-8
      headers:
        Host: repeater.acme.com
  - type: probe
    tolerance: 200
    ref: app-responds-to-requests
  - type: probe
    tolerance: 200
    ref: app-responds-to-requests
  - type: probe
    tolerance: 200
    ref: app-responds-to-requests
  - type: probe
    tolerance: 200
    ref: app-responds-to-requests
method:
- type: action
  name: abort-failure
  provider:
    type: python
    module: chaosistio.fault.actions
    func: add_abort_fault
    arguments:
      virtual_service_name: go-demo-8
      http_status: 500
      routes:
        - destination:
            host: go-demo-8
            subset: primary
      percentage: 50
      version: networking.istio.io/v1alpha3
      ns: go-demo-8
  pauses: 
    after: 1

At the top, we have general information. The title asks what happens if we abort responses, and the description states that, if responses are aborted, the dependent application should retry and/or time out requests. Those are reasonable questions and assumptions. If something bad happens with requests, we should probably retry them or time them out. We also have tags telling us that the experiment is about k8s, istio, and http.

Just as before, the configuration converts the environment variable INGRESS_HOST into the Chaos Toolkit variable ingress_host. The steady-state-hypothesis validates that the application is healthy. We measure that health by sending a request to our application and expecting the response code 200. We are, more or less, doing the same thing as before. However, this time, we are not sending the request to go-demo-8 directly but to the repeater.
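To make the probe more tangible, here is a minimal Python sketch of what a single app-responds-to-requests check amounts to. It is not how Chaos Toolkit implements its http provider; the function names build_probe_request, is_within_tolerance, and app_responds_to_requests are made up for this illustration. Only the URL, the Host header, the timeout, and the tolerance come from the definition.

```python
import urllib.error
import urllib.request


def build_probe_request(ingress_host: str) -> urllib.request.Request:
    """Build the GET request described by the probe's provider section."""
    # Same URL as in network.yaml, with ${ingress_host} already substituted.
    url = f"http://{ingress_host}?addr=http://go-demo-8"
    # The Host header routes the request to the repeater, not to go-demo-8.
    return urllib.request.Request(url, headers={"Host": "repeater.acme.com"})


def is_within_tolerance(status: int, tolerance: int = 200) -> bool:
    """An integer tolerance is treated here as the expected status code."""
    return status == tolerance


def app_responds_to_requests(ingress_host: str, timeout: float = 5.0) -> bool:
    """Send one probe request and report whether it met the tolerance."""
    request = build_probe_request(ingress_host)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        # Non-2xx responses (for example an injected 500) end up here.
        status = error.code
    except OSError:
        # Connection errors and timeouts: no status code to compare at all.
        return False
    return is_within_tolerance(status)
```

With INGRESS_HOST exported, calling app_responds_to_requests(os.environ["INGRESS_HOST"]) (after importing os) would perform one such check against the cluster.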

Repeating the requests

Since we are going to abort 50% of the requests, a single probe with a single request might not produce the result that we want. Getting the desired result would rely on luck since we couldn’t predict whether that request would fall into the 50% that are aborted. To reduce the influence of randomness on our steady-state hypothesis, we are going to repeat that request four more times. However, instead of defining the whole probe again, we use a shortcut. The second probe also has the tolerance 200, but it references the probe app-responds-to-requests through ref. So, instead of repeating everything, we are just referencing the existing probe, and we are doing that four times.

All in all, we are sending requests, and we’re expecting the 200 response code five times.
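The effect of the ref shortcut can be sketched in a few lines of Python. This is only an illustration of the idea, not Chaos Toolkit's internal code; resolve_probes is a hypothetical helper, and the probe dictionaries simply mirror the YAML shown earlier.

```python
from typing import Any, Dict, List


def resolve_probes(probes: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Expand entries that use `ref` into full probe definitions."""
    named = {probe["name"]: probe for probe in probes if "name" in probe}
    resolved = []
    for probe in probes:
        if "ref" in probe:
            # Raises KeyError if the reference points at an unknown probe.
            target = named[probe["ref"]]
            # Reuse the referenced probe, keeping this entry's own tolerance.
            resolved.append({**target, "tolerance": probe["tolerance"]})
        else:
            resolved.append(probe)
    return resolved


# One full definition followed by four references, as in network.yaml.
probes = [
    {
        "type": "probe",
        "name": "app-responds-to-requests",
        "tolerance": 200,
        "provider": {
            "type": "http",
            "timeout": 5,
            "url": "http://${ingress_host}?addr=http://go-demo-8",
        },
    }
] + [
    {"type": "probe", "tolerance": 200, "ref": "app-responds-to-requests"}
    for _ in range(4)
]

resolved = resolve_probes(probes)
```

After the expansion, resolved holds five identical probes, which is why the output later lists app-responds-to-requests five times.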

Then we have a method with the action abort-failure. It uses the module chaosistio.fault.actions and the function add_abort_fault. The name is self-explanatory: it adds an abort fault to an Istio Virtual Service. We can also see that the action is targeting the Virtual Service go-demo-8.

The add_abort_fault function will inject HTTP status 500 into the Virtual Service go-demo-8, on the route identified through the destination with the host set to go-demo-8 and the subset set to primary. Further on, we have the percentage set to 50, so fifty percent of the requests to go-demo-8 will be aborted. We also have the API version of the Virtual Service resource (version) and the Namespace (ns) where that Virtual Service resides.
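As a rough picture of the result, the sketch below applies the same arguments to a plain dictionary shaped like a Virtual Service spec. It is not the plugin's code. The helper with_abort_fault and the sample spec are hypothetical; the fault.abort.httpStatus and percentage.value field names follow Istio's HTTP fault injection API.

```python
import copy
from typing import Any, Dict


def with_abort_fault(
    spec: Dict[str, Any],
    host: str,
    subset: str,
    http_status: int,
    percentage: float,
) -> Dict[str, Any]:
    """Return a copy of the spec with an abort fault on matching routes."""
    patched = copy.deepcopy(spec)
    for http_route in patched.get("http", []):
        destinations = [
            target.get("destination", {})
            for target in http_route.get("route", [])
        ]
        matches = any(
            d.get("host") == host and d.get("subset") == subset
            for d in destinations
        )
        if matches:
            # Keep any other fault settings (such as delays) untouched.
            http_route.setdefault("fault", {})["abort"] = {
                "httpStatus": http_status,
                "percentage": {"value": float(percentage)},
            }
    return patched


# Hypothetical, heavily trimmed spec; the real go-demo-8 one has more fields.
sample_spec = {
    "hosts": ["go-demo-8"],
    "http": [
        {"route": [{"destination": {"host": "go-demo-8", "subset": "primary"}}]}
    ],
}

patched_spec = with_abort_fault(sample_spec, "go-demo-8", "primary", 500, 50)
```

The matching route in patched_spec now carries a fault section telling the proxy to answer half of the requests with status 500, while sample_spec itself is left unchanged.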

So, we will be sending requests to the repeater, but we will be aborting those requests on the go-demo-8 API. That’s why we added an additional application. Since the repeater forwards requests to go-demo-8, we will be able to see what happens when we interact with one application that interacts with another while there is a cut in that communication between the two.

After we inject the abort, just to be sure that we are not too hasty, we’re going to pause for one second so that the fault has time to propagate to the Virtual Service.

Now, let’s see what happens when we run this experiment. Can you guess? We are aborting 50% of the responses while validating whether our application is responsive. Will all five requests sent to our application return the status code 200?

Running chaos experiment and inspecting the output

Let’s run the experiment and see.

The output, without timestamps, is as follows.

[... INFO] Validating the experiment's syntax
[... INFO] Experiment looks valid
[... INFO] Running experiment: What happens if we abort responses
[... INFO] Steady state hypothesis: The app is healthy
[... INFO] Probe: app-responds-to-requests
[... INFO] Probe: app-responds-to-requests
[... INFO] Probe: app-responds-to-requests
[... INFO] Probe: app-responds-to-requests
[... INFO] Probe: app-responds-to-requests
[... INFO] Steady state hypothesis is met!
[... INFO] Action: abort-failure
[... INFO] Pausing after activity for 1s...
[... INFO] Steady state hypothesis: The app is healthy
[... INFO] Probe: app-responds-to-requests
[... CRITICAL] Steady state probe 'app-responds-to-requests' is not in the given tolerance so failing this experiment
[... INFO] Let's rollback...
[... INFO] No declared rollbacks, let's move on.
[... INFO] Experiment ended with status: deviated
[... INFO] The steady-state has deviated, a weakness may have been discovered

Please note that the output in your case could be different.

The probe was executed successfully five times. Then, the action added the abort fault to the Istio Virtual Service. We waited for one second and then started re-running the probes.
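The order of events in that output can be summarized with a small Python sketch. It only models the flow seen in the log and is not Chaos Toolkit's implementation; run_experiment and first_failing_probe are invented names, and the "failed" status for a hypothesis that is not met before the action is an assumption about how the tool reports that case.

```python
import time
from typing import Callable, List, Optional

Probe = Callable[[], bool]


def first_failing_probe(probes: List[Probe]) -> Optional[int]:
    """Run probes in order and stop at the first one that fails."""
    for position, probe in enumerate(probes, start=1):
        if not probe():
            return position
    return None


def run_experiment(
    probes: List[Probe],
    action: Callable[[], None],
    pause_after: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Hypothesis, action, pause, hypothesis again."""
    if first_failing_probe(probes) is not None:
        # Assumed status name: the system was unhealthy before any chaos.
        return "failed"
    action()
    sleep(pause_after)
    if first_failing_probe(probes) is not None:
        return "deviated"
    return "completed"
```

Because the second hypothesis check stops at the first failing probe, the output shows only one probe line after the action instead of five.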

In my case, the first of the five post-action probes failed, and that was the end of the experiment. I was unlucky. Given that approximately 50% of the requests are aborted, it could just as well have been the second, the third, or any other probe, but my luck ran out right away. A failure was to be expected. With half of the requests aborted, it was very likely that at least one of the five probes would fail; it didn’t have to be the first one, though.
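The role of luck can be put into numbers with a short calculation. The sketch assumes that each probe results in exactly one request to go-demo-8 and that each request is aborted independently with probability 0.5; both are simplifications, and the function names are made up for this example.

```python
def chance_of_deviation(abort_ratio: float, probes: int) -> float:
    """Probability that at least one of the probes hits an aborted request."""
    return 1 - (1 - abort_ratio) ** probes


def chance_first_failure_at(abort_ratio: float, position: int) -> float:
    """Probability that the first failing probe is the one at `position`."""
    return (1 - abort_ratio) ** (position - 1) * abort_ratio


# A single probe would detect the fault only half of the time.
single_probe = chance_of_deviation(0.5, 1)   # 0.5

# Five probes raise that to almost 97%.
five_probes = chance_of_deviation(0.5, 5)    # 0.96875

# Chance that the very first probe is the one that fails, as in the run above.
first_probe_fails = chance_first_failure_at(0.5, 1)  # 0.5
```

Under those assumptions, about one run in thirty-two would pass all five probes despite the fault, and the first probe is the one that fails in half of all runs.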

In the next lesson, we will find out how to roll back abort failures.
