WEBVTT

00:00:01.040 --> 00:00:04.350
We're going to look more closely at virtualization

00:00:04.350 --> 00:00:07.330
risks and their remedies in the cloud.

00:00:07.440 --> 00:00:11.660
In the shared responsibility model, the water line is that the provider is

00:00:11.660 --> 00:00:15.130
responsible for physical security and the base infrastructure,

00:00:15.140 --> 00:00:18.790
which means they need to pay attention to the hypervisor security. Now,

00:00:18.800 --> 00:00:21.900
although hypervisors are now something that you can

00:00:21.910 --> 00:00:25.520
actually consume and configure, in most cases,

00:00:25.520 --> 00:00:27.880
this is on the side of the provider.

00:00:28.040 --> 00:00:29.480
Here are the major issues.

00:00:29.500 --> 00:00:34.030
Security flaws in the hypervisor could be exposed by

00:00:34.030 --> 00:00:37.580
threats such as Spectre and Meltdown.

00:00:37.590 --> 00:00:41.080
You have inadequate granularity of controls, which could be a

00:00:41.080 --> 00:00:43.130
lack of governance related to administrative,

00:00:43.130 --> 00:00:48.010
technical, and physical controls, and you have oversubscription.

00:00:48.020 --> 00:00:52.880
The threat that a security flaw can create is something known as

00:00:52.880 --> 00:00:59.900
hyperjacking or VM hopping. This all happens because a tenant may be

00:00:59.900 --> 00:01:05.370
capable of escaping from their guest operating system to the host. From the standpoint

00:01:05.370 --> 00:01:08.130
of inadequate granularity of controls,

00:01:08.140 --> 00:01:11.330
there may be a lack of attention from a governance

00:01:11.330 --> 00:01:16.380
perspective to keeping administrative, physical, and technical controls operational.

00:01:16.390 --> 00:01:20.430
The other element is oversubscription of services.

00:01:20.430 --> 00:01:25.700
Here, the actual threat could be a lack of controls at the provider's

00:01:25.700 --> 00:01:30.990
location, which would keep the limits, reservations, and shares

00:01:30.990 --> 00:01:33.490
that we talked about earlier from being enforced.
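
NOTE
Illustrative sketch, not part of the spoken narration: one way limits,
reservations, and shares could relate to oversubscription. The VM names,
sizes, and 16 GB host are hypothetical, and the share split is simplified
(capacity freed by a capped VM is not redistributed).
```python
from dataclasses import dataclass
@dataclass
class VM:
    name: str
    reservation_gb: float  # guaranteed minimum
    limit_gb: float        # hard cap
    shares: int            # relative priority under contention
def oversubscription_ratio(vms, host_gb):
    # A ratio above 1.0 means the summed limits exceed physical capacity.
    return sum(vm.limit_gb for vm in vms) / host_gb
def reservations_fit(vms, host_gb):
    # Reservations are guarantees, so together they must fit on the host.
    return sum(vm.reservation_gb for vm in vms) <= host_gb
def contended_allocation(vms, host_gb):
    # Each VM gets its reservation plus a share-weighted slice of the spare
    # capacity, never more than its limit.
    spare = host_gb - sum(vm.reservation_gb for vm in vms)
    total_shares = sum(vm.shares for vm in vms)
    return {
        vm.name: min(vm.limit_gb, vm.reservation_gb + spare * vm.shares / total_shares)
        for vm in vms
    }
vms = [VM("a", 2, 8, 2000), VM("b", 2, 8, 1000), VM("c", 4, 16, 1000)]
print(oversubscription_ratio(vms, 16), reservations_fit(vms, 16))
print(contended_allocation(vms, 16))
```
Without enforced limits and reservations, a single tenant could consume
the spare capacity that the other tenants are counting on.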

00:01:33.560 --> 00:01:34.190
Again,

00:01:34.200 --> 00:01:37.550
it must be stated that most of this would likely be on the

00:01:37.550 --> 00:01:41.580
cloud service provider's side of the water line.

00:01:41.590 --> 00:01:45.580
Let's look at the cloud service providers that dominate the market.

00:01:46.140 --> 00:01:52.040
Amazon's AWS service currently uses Xen and Nitro for its hypervisors,

00:01:52.040 --> 00:01:59.610
Google uses KVM, and Azure uses Windows Hyper‑V. Let's take

00:01:59.610 --> 00:02:04.290
a look at what they say about their own products and the security that

00:02:04.290 --> 00:02:06.750
they publicize.

00:02:06.920 --> 00:02:13.810
AWS's Nitro security documentation states that they now use what are called enclaves. This

00:02:13.820 --> 00:02:19.470
enclave technology uses the same hypervisor technology that provides CPU and memory

00:02:19.480 --> 00:02:25.390
isolation for their EC2 instances. Nitro also provides a

00:02:25.390 --> 00:02:31.370
Trusted Platform Module, or TPM, currently at version 2.0. This is a security

00:02:31.370 --> 00:02:35.470
and compatibility feature that makes it easier for customers to use applications

00:02:35.470 --> 00:02:40.620
and operating system capabilities that depend on TPMs in their EC2

00:02:40.620 --> 00:02:46.360
instances. It also conforms to the TPM 2.0 specification, which makes it

00:02:46.370 --> 00:02:52.300
easy to migrate existing on‑premises workloads that use TPM functionality to

00:02:53.240 --> 00:02:58.670
EC2. This, in turn, helps to provide secure cryptographic offload using

00:02:58.680 --> 00:03:03.990
AWS's Nitro System, and allows EC2 instances to generate,

00:03:03.990 --> 00:03:08.730
store, and use keys without having direct access to those keys.

00:03:08.740 --> 00:03:09.480
Additionally,

00:03:09.480 --> 00:03:14.370
they have an internal governance system that they call the lockdown

00:03:14.370 --> 00:03:17.350
security model, which prohibits all administrative access,

00:03:17.350 --> 00:03:20.330
including that of Amazon employees,

00:03:20.340 --> 00:03:23.400
eliminating the possibility of human error and tampering.

00:03:23.840 --> 00:03:28.540
On the Google side of the world, they use KVM, which stands for

00:03:28.540 --> 00:03:30.860
Kernel‑based Virtual Machine.

00:03:30.870 --> 00:03:34.660
This is an open‑source virtualization technology built into Linux.

00:03:34.850 --> 00:03:39.510
KVM lets you turn Linux into a hypervisor that allows a host

00:03:39.510 --> 00:03:43.340
machine to run multiple isolated virtual environments called

00:03:43.340 --> 00:03:45.040
guests or virtual machines.

00:03:45.050 --> 00:03:47.090
This is, in fact, their hypervisor.

00:03:47.100 --> 00:03:52.130
KVM is part of any version of Linux released after 2007, and it can be

00:03:52.130 --> 00:03:58.500
installed on x86 hardware that supports virtualization capabilities.

00:03:58.940 --> 00:04:03.840
Looking more closely at their description, one of the things that Google uses is

00:04:03.840 --> 00:04:07.830
proactive vulnerability searching. There are multiple layers of security and

00:04:07.830 --> 00:04:13.290
isolation built into Google's Kernel‑based Virtual Machine, and they're always

00:04:13.290 --> 00:04:15.130
working to strengthen them.

00:04:15.130 --> 00:04:19.899
Google's cloud security staff includes experts in the world of KVM

00:04:19.899 --> 00:04:24.650
security, and they've uncovered multiple vulnerabilities in KVM,

00:04:24.650 --> 00:04:27.460
Xen, and VMware hypervisors over the years.

00:04:27.640 --> 00:04:33.610
They reduce their attack surface by running in a non‑QEMU environment.

00:04:33.620 --> 00:04:37.550
The reason they don't use QEMU is that there is a history

00:04:37.550 --> 00:04:42.330
of security problems with it, a long track record of security

00:04:42.330 --> 00:04:46.270
bugs such as VENOM, and it's unclear what vulnerabilities may

00:04:46.270 --> 00:04:47.950
still be lurking in the code.

00:04:48.040 --> 00:04:54.030
Each KVM host sets up cryptographic

00:04:54.030 --> 00:04:58.560
key sharing with the jobs running on that host, helping to make sure that all

00:04:58.560 --> 00:05:03.600
communication between jobs is authenticated and authorized. Code provenance means

00:05:03.600 --> 00:05:08.510
that they run a custom binary and configuration verification system that was

00:05:08.510 --> 00:05:13.840
developed and integrated into their own development process to track which

00:05:13.840 --> 00:05:15.030
code is running.

00:05:15.040 --> 00:05:18.040
Imagine a monitor that only allows code that has

00:05:18.040 --> 00:05:21.160
been preapproved to be executed.
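
NOTE
Illustrative sketch, not part of the spoken narration and not Google's
actual system: a monitor that only launches preapproved code can be
approximated with an allowlist of SHA-256 digests. The names APPROVED,
is_preapproved, and run_if_approved are hypothetical.
```python
import hashlib
# Hypothetical allowlist holding the digest of one preapproved build.
APPROVED = {hashlib.sha256(b"approved build 1.0").hexdigest()}
def is_preapproved(binary: bytes, approved=APPROVED) -> bool:
    # Code may run only if its digest appears on the allowlist.
    return hashlib.sha256(binary).hexdigest() in approved
def run_if_approved(binary: bytes, launch) -> bool:
    # Refuse anything that is not on the allowlist; otherwise hand the
    # binary to the caller-supplied launch function.
    if not is_preapproved(binary):
        return False
    launch(binary)
    return True
launched = []
print(run_if_approved(b"approved build 1.0", launched.append))
print(run_if_approved(b"tampered build", launched.append))
```
Changing even one byte of a binary changes its digest, so a tampered
build falls off the allowlist and is refused.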

00:05:21.440 --> 00:05:27.380
Their rapid and graceful vulnerability response defines a strict set of SLAs,

00:05:27.380 --> 00:05:32.130
both internal and external, for handling patch management in the

00:05:32.130 --> 00:05:36.640
KVM environment in the event that a critical security vulnerability is

00:05:36.650 --> 00:05:40.510
discovered. Their internal infrastructure helps to maximize security

00:05:40.510 --> 00:05:45.100
protection and to meet all applicable compliance requirements. Externally,

00:05:45.200 --> 00:05:49.140
they notify customers of updates in line with any contractual and legal

00:05:49.150 --> 00:05:50.860
obligations they have to do so.

00:05:51.140 --> 00:05:54.530
All of their releases are based upon stringent rollout

00:05:54.530 --> 00:05:57.720
policies and processes for their KVM updates.

00:05:57.730 --> 00:06:01.800
This is also driven by compliance requirements for Google cloud security

00:06:01.800 --> 00:06:06.270
controls, and there's only a small team of Google employees that has access to

00:06:06.270 --> 00:06:09.650
the KVM build system and release management control.

00:06:10.040 --> 00:06:14.880
Azure capitalizes on host‑based isolation, with an

00:06:14.880 --> 00:06:18.540
isolated process for hosting cross‑VM components.

00:06:18.550 --> 00:06:24.210
They use Virtualization‑based security, known as VBS, for ensuring the

00:06:24.210 --> 00:06:28.910
integrity of user and kernel mode components from a secure world so the

00:06:28.910 --> 00:06:34.760
isolation continues down into the foundation of the hypervisor.

00:06:35.000 --> 00:06:40.020
They have multiple levels of exploit mitigations. Mitigations include

00:06:40.030 --> 00:06:44.690
Address Space Layout Randomization, so that if a virtual machine is

00:06:44.690 --> 00:06:47.830
somehow being monitored in a side channel,

00:06:47.840 --> 00:06:51.870
the code and data that are loaded into memory are not at

00:06:51.870 --> 00:06:54.760
addresses that could be easily figured out.

00:06:55.040 --> 00:06:59.770
There's also automatic initialization of stack variables that

00:06:59.770 --> 00:07:03.940
goes all the way down to the level of the compilation of the

00:07:03.940 --> 00:07:06.070
code, or the compiler level.

00:07:06.140 --> 00:07:08.170
They also have zero‑initialization,

00:07:08.180 --> 00:07:13.880
which could potentially help to block injections at the OS level.

00:07:13.890 --> 00:07:18.490
Here, kernel APIs automatically zero‑initialize the kernel heap

00:07:18.500 --> 00:07:24.420
allocations made by Hyper‑V. There are various other risks

00:07:24.430 --> 00:07:29.570
associated with consuming certain levels of the SPI model, including

00:07:29.570 --> 00:07:33.300
having people from the inside of your organization initiating

00:07:33.310 --> 00:07:34.990
unauthorized workloads.

00:07:35.000 --> 00:07:38.470
Starting a new workload is as

00:07:38.470 --> 00:07:40.710
simple as pulling out a credit card.

00:07:40.720 --> 00:07:45.050
East‑West movement of advanced persistent threats could be another issue.

00:07:45.140 --> 00:07:49.410
Again, depending on the level of consumption that you're at, if you've used

00:07:49.410 --> 00:07:54.940
cloud‑native architecture, it will be less likely that East‑West or lateral

00:07:54.940 --> 00:08:00.000
movement of advanced persistent threats occurs. Other risks include improperly trained staff

00:08:00.010 --> 00:08:05.730
anywhere along the SPI stack, applications that don't have security from

00:08:05.730 --> 00:08:11.110
their inception in the design process, a lack of due care and due

00:08:11.110 --> 00:08:15.230
diligence with the granularity of controls that are implemented, and just

00:08:15.240 --> 00:08:20.850
overall shadow IT, the same things that cause these unauthorized workloads.

00:08:21.030 --> 00:08:27.060
Google actually mentioned in their 2021 cloud threat

00:08:27.060 --> 00:08:32.809
intelligence report that most compromised instances are actually

00:08:32.809 --> 00:08:39.179
used to perform cryptocurrency mining. The vulnerability that allows

00:08:39.179 --> 00:08:44.780
systems to be taken over and mining software to be installed is

00:08:44.780 --> 00:08:48.590
often found in some type of third‑party software, something for us to

00:08:48.590 --> 00:08:50.660
really watch for in the supply chain.

00:08:52.340 --> 00:08:57.140
The main thing that we want to be thinking about is how do we

00:08:57.140 --> 00:09:01.150
limit the blast radius? It's not that we won't be attacked,

00:09:01.160 --> 00:09:05.840
but if an attack is successful, how well do we contain that attack?

00:09:05.850 --> 00:09:09.580
Join me over in the next clip as we look at a major

00:09:09.580 --> 00:09:12.610
element of success, Zero Trust.