Rolling Back Failed Deployments

Learn how to roll back failed Deployments.

Why roll back?

Discovering a critical bug is probably the most common reason for a rollback. Still, there are others. For example, we might be in a situation where Pods cannot be created. An easy-to-reproduce case is an attempt to deploy an image with a tag that does not exist.

Let's update the image in our Deployment definition and deploy it again.

Updated definition of 'go-demo-2-api'
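
As a rough sketch, the update could be issued with `kubectl set image`, wrapped here in a small helper so it is easy to reuse. The helper name `set_api_image` is hypothetical, and `api` as the container name is an assumption that this lesson does not state; adjust it to match the container in your Deployment definition.

```shell
# Hypothetical helper: point the go-demo-2-api Deployment at a new image.
# ASSUMPTION: the container inside the Deployment is named "api".
set_api_image() {
  kubectl set image deployment/go-demo-2-api "api=$1"
}

# Usage (requires a cluster that runs the go-demo-2-api Deployment):
#   set_api_image vfarcic/go-demo-2:does-not-exist
```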

The output is as follows.

Output of the above command

After seeing such a message, you might be under the impression that everything is OK. However, that output only indicates that the definition of the image used in the Deployment was successfully updated. That does not mean that the Pods behind the ReplicaSet are indeed running. For one, I can assure you that the vfarcic/go-demo-2:does-not-exist image does not exist.

ℹ️ Please make sure that at least 60 seconds have passed since you executed the kubectl set image command. If you’re wondering why we are waiting, the answer lies in the progressDeadlineSeconds field set in the go-demo-2-api Deployment definition. That’s how long the Deployment waits before it concludes that it cannot progress because a Pod failed to run.
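
If you want to check the deadline on a live cluster instead of reading the YAML, a JSONPath query along these lines should print it. This is only a sketch; the helper name `progress_deadline` is made up for illustration.

```shell
# Hypothetical helper: print the progress deadline (in seconds) that is
# configured on the go-demo-2-api Deployment.
progress_deadline() {
  kubectl get deployment go-demo-2-api \
    -o jsonpath='{.spec.progressDeadlineSeconds}'
}

# Usage (requires a cluster that runs the go-demo-2-api Deployment):
#   progress_deadline
```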

Verification

Let’s take a look at the ReplicaSets.

Find replicaSets

The output is as follows.

Output of the above command

By now, under different circumstances, the new ReplicaSet (go-demo-2-api-dc7877dcd) should have been scaled up to 3 Pods, and the previous one (go-demo-2-api-68c75f4f5) should have been scaled down to 0. However, the Deployment noticed that there is a problem and stopped the update process.
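
To compare desired and ready replicas side by side, a custom-columns listing can help. The sketch below is one possible way to do it and is not the command used in this lesson; the helper name `list_replicasets` is hypothetical, and it lists every ReplicaSet in the current namespace.

```shell
# Hypothetical helper: show each ReplicaSet with its desired and ready
# replica counts. READY may appear as <none> when no Pod is ready yet.
list_replicasets() {
  kubectl get rs \
    -o custom-columns=NAME:.metadata.name,DESIRED:.spec.replicas,READY:.status.readyReplicas
}

# Usage (requires access to a cluster):
#   list_replicasets
```

A stalled update shows up as a new ReplicaSet whose READY count stays below DESIRED while the old ReplicaSet keeps its Pods.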

We should be able to get more detailed information with the kubectl rollout status command.

Find roll out status

The output is as follows.

Output of the above command

The Deployment realized that it shouldn’t proceed. The new Pods are not running, and the progress deadline was reached. There’s no point in continuing to try.

If you expected the Deployment to roll back after it failed, you’re wrong. It will not do such a thing, at least not without additional add-ons. That does not mean that we expect you to sit in front of your terminal, wait for timeouts, and check the rollout status before deciding whether to keep the new update or to roll back. You should deploy new releases as part of your automated CDP pipeline.

Fortunately, the status command returns 1 if the deployment failed, and we can use that information to decide what to do next. For those of you not living and breathing Linux, any exit code other than 0 is considered an error. Let’s confirm that by checking the exit code of the last command.

Status command

The output is indeed 1, thus confirming that the rollout failed.

We’ll explore the automated CDP pipeline soon. For now, just remember that we can find out whether Deployment updates were successful or not.
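
To make the idea concrete, here is a minimal sketch of how a pipeline step might act on that exit code. It is an illustration under stated assumptions, not the pipeline built later in the course: the function name `rollout_or_undo` is hypothetical, and it addresses the Deployment by name.

```shell
# Hypothetical pipeline step: wait for the rollout to finish and undo it
# if `kubectl rollout status` exits with a non-zero code.
rollout_or_undo() {
  deployment="$1"
  if kubectl rollout status "deployment/${deployment}"; then
    echo "rollout of ${deployment} succeeded"
    return 0
  fi
  echo "rollout of ${deployment} failed, rolling back" >&2
  kubectl rollout undo "deployment/${deployment}"
  # Still report failure so the surrounding pipeline marks the release as failed.
  return 1
}

# Usage (requires a cluster that runs the Deployment):
#   rollout_or_undo go-demo-2-api
```

Returning 1 after the undo is a design choice: the cluster ends up on the previous release, but the pipeline run is still flagged as failed so that someone investigates.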

Undo rollout

Now that we’ve discovered that our last rollout failed, we should undo it. You already know how to do that, but we’ll remind you just in case you’re of a forgetful nature.

Undo roll out

The output of the last command confirmed that deployment "go-demo-2-api" was successfully rolled out.

Now that we have learned how to roll back, no matter whether the problem is a critical bug or an inability to run the new release, we can take a short pause from learning new stuff and merge all the definitions we explored thus far into a single YAML file. But, before we do that, we’ll remove the objects we created.

Delete Everything

Try it yourself

A list of all the commands used in this section is given below.

Commands used in this lesson
