Running Denial of Service Attacks
In this lesson, we will run a chaos experiment for DoS and observe how our application responds. We will also explore the logs.
Now that you are familiar with Siege and have seen the “trick” baked into the go-demo-8 app that allows us to limit the number of requests the application can handle, we can construct a chaos experiment that checks how the application behaves while under a Denial of Service attack.
Inspecting the definition of network-dos.yaml#
Let’s take a look at yet another chaos experiment definition.
The definition is as follows.
version: 1.0.0
title: What happens if we abort responses
description: If responses are aborted, the dependant application should retry and/or timeout requests
tags:
- k8s
- pod
- deployment
- istio
configuration:
  ingress_host:
    type: env
    key: INGRESS_HOST
steady-state-hypothesis:
  title: The app is healthy
  probes:
  - type: probe
    name: app-responds-to-requests
    tolerance: 200
    provider:
      type: http
      timeout: 5
      verify_tls: false
      url: http://${ingress_host}?addr=http://go-demo-8/limiter
      headers:
        Host: repeater.acme.com
method:
- type: action
  name: abort-failure
  provider:
    type: process
    path: kubectl
    arguments:
    - run
    - siege
    - --namespace
    - go-demo-8
    - --image
    - yokogawa/siege
    - --generator
    - run-pod/v1
    - -it
    - --rm
    - --
    - --concurrent
    - 50
    - --time
    - 20S
    - "http://go-demo-8/limiter"
  pauses:
    after: 5
We have a steady-state hypothesis, which validates that our application does respond with 200 on the /limiter endpoint.
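If you want to double-check that probe by hand, a roughly equivalent request, assuming the INGRESS_HOST environment variable holds your cluster’s ingress address (that is what the configuration section maps it from), would look like this.
# Reproduce the probe manually; we expect a 200 response from the /limiter endpoint
curl -i -H "Host: repeater.acme.com" \
    "http://$INGRESS_HOST?addr=http://go-demo-8/limiter"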
Then, we have an action with the type of the provider set to process. This is another reason why I’m showing you this definition. Besides simulating a Denial of Service attack by sending an increased number of concurrent requests to our application, I’m using this opportunity to explore yet another provider type.
The process provider allows us to execute any command. This is very useful when none of the Chaos Toolkit plugins can do what we need. Whatever is not available through a plugin can usually be accomplished with the process provider, which runs anything that is executable: a script, a shell command, or any other binary. In this case, the path is kubectl (a command) followed by a list of arguments. Those are the same arguments we just used when we ran Siege manually: fifty concurrent requests sent to the /limiter endpoint for 20 seconds.
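For reference, the command that this action assembles from path and arguments is, give or take, the same Siege invocation from the previous lesson; run by hand, it would look like this.
# The command the process provider builds from path + arguments
kubectl run siege \
    --namespace go-demo-8 \
    --image yokogawa/siege \
    --generator "run-pod/v1" \
    -it --rm \
    -- --concurrent 50 --time 20S "http://go-demo-8/limiter"
The difference is that, when the process provider executes it, the output is captured by the Chaos Toolkit, which is why we will later find Siege’s statistics in chaostoolkit.log.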
Running the chaos experiment and inspecting the output#
Let’s run this experiment and see what happens.
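Assuming the definition is saved as chaos/network-dos.yaml (the chaos/ directory is an assumption; adjust the path to wherever you keep your definitions), running it with the Chaos Toolkit CLI looks like this.
# Run the experiment; point the path at wherever you saved the definition
chaos run chaos/network-dos.yaml
The output should be similar to the following (your timestamps will differ).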
[2020-03-13 23:51:28 INFO] Validating the experiment's syntax
[2020-03-13 23:51:28 INFO] Experiment looks valid
[2020-03-13 23:51:28 INFO] Running experiment: What happens if we abort responses
[2020-03-13 23:51:28 INFO] Steady state hypothesis: The app is healthy
[2020-03-13 23:51:28 INFO] Probe: app-responds-to-requests
[2020-03-13 23:51:28 INFO] Steady state hypothesis is met!
[2020-03-13 23:51:28 INFO] Action: abort-failure
[2020-03-13 23:51:52 INFO] Pausing after activity for 5s...
[2020-03-13 23:51:57 INFO] Steady state hypothesis: The app is healthy
[2020-03-13 23:51:57 INFO] Probe: app-responds-to-requests
[2020-03-13 23:51:57 CRITICAL] Steady state probe 'app-responds-to-requests' is not in the given tolerance so failing this experiment
[2020-03-13 23:51:57 INFO] Let's rollback...
[2020-03-13 23:51:57 INFO] No declared rollbacks, let's move on.
[2020-03-13 23:51:57 INFO] Experiment ended with status: deviated
[2020-03-13 23:51:57 INFO] The steady-state has deviated, a weakness may have been discovered
We can see that, after the initial probe succeeded, we executed an action that ran the siege Pod. When the probe ran again, it failed: the application collapsed under the load and stopped responding. Admittedly, the amount of traffic was ridiculously low; we were cheating by configuring the application to handle only a very small number of concurrent requests, which is how we simulated a DoS attack with so few requests. In a “real world” situation, you would send high volumes, maybe thousands or hundreds of thousands of concurrent requests, and check whether your application remains responsive.
Either way, the conclusion is the same: the application cannot handle the load, and the experiment failed.
Exploring the logs#
The output in front of us is not very descriptive. We probably wouldn’t be able to deduce the cause of the issue just by looking at it. Fortunately, that’s only the list of events and their statuses, and more information is available. Every time we run an experiment, we get a chaostoolkit.log file that stores detailed logs of what happened, in case we need additional information. Let’s take a look at the log for this scenario.
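By default, chaostoolkit.log is created in the directory from which the chaos command was executed, so a plain cat (or your pager of choice) is enough to inspect it.
# Inspect the detailed log produced by the previous run
cat chaostoolkit.log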
The output is too big to be presented in a lesson, so I’ll let you explore it on your screen. You should see the (poorly formatted) output from siege, which gives us the same info as when we ran it manually.
All in all, if you need more information, you can always find it in chaostoolkit.log. Think of it as debug info.
What would be the fix for this situation?
If you’re waiting for me to give you the answer, you’re out of luck. Just like at the end of the previous section, I have a task for you: yet another homework assignment.
Try to figure out how to handle the situation we explored through the last experiment.
I will give you just a small tip to help you know what to look for. Istio, like almost any other service mesh, lets us define circuit breakers that limit the number of requests reaching an endpoint. In case of a Denial of Service attack, or any sudden spike in traffic, a circuit breaker can cap the maximum number of concurrent requests an application receives.
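To give you a head start without giving away the answer, the sketch below shows roughly what such a limit looks like in Istio: a DestinationRule with connection pool limits and outlier detection. The resource name, the host, and all the numbers are illustrative assumptions, and field names can vary between Istio versions, so treat it as a starting point rather than the solution.
# A hypothetical Istio circuit breaker; names, hosts, and limits are illustrative only
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: go-demo-8
spec:
  host: go-demo-8
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100            # cap on concurrent TCP connections
      http:
        http1MaxPendingRequests: 100   # cap on queued HTTP requests
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5          # eject an instance after this many errors
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 100
Whether limits like these are right for go-demo-8 is exactly what your chaos experiment should tell you.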
Assignment#
Now it’s your turn. Do the homework. Explore circuit breakers, and try to figure out how you would implement them for your applications. Use a chaos experiment to confirm that it fails before the changes are implemented and that it passes after. The goal is to figure out how to prevent a situation like this from becoming a disaster.
I know that your application has a limit. Every application does. How will you handle a sudden burst of requests far above what your app can handle at any given moment?
In the next lesson, we will remove the resources that we have created.