1 00:00:00,000 --> 00:00:01,960 Now we're getting into content delivery 2 00:00:01,960 --> 00:00:03,810 and we'll start with CloudFront. 3 00:00:03,810 --> 00:00:07,790 So CloudFront is a content delivery network or CDN. 4 00:00:07,790 --> 00:00:10,520 And what it does, is it improves read performance, 5 00:00:10,520 --> 00:00:13,033 because the content is going to be distributed 6 00:00:13,033 --> 00:00:15,550 and cached at the edge locations 7 00:00:15,550 --> 00:00:18,520 and edge locations are all around the world. 8 00:00:18,520 --> 00:00:22,360 And there's about 216 points of presence globally, 9 00:00:22,360 --> 00:00:23,570 as I'm recording this lecture 10 00:00:23,570 --> 00:00:26,930 and there are new points of presence all the time. 11 00:00:26,930 --> 00:00:29,200 So it's much more than the 30 something regions 12 00:00:29,200 --> 00:00:30,200 that AWS has. 13 00:00:30,200 --> 00:00:32,159 This is a worldwide thing. 14 00:00:32,159 --> 00:00:33,520 And so what CloudFront gives you 15 00:00:33,520 --> 00:00:36,460 on top of this caching at the edge, 16 00:00:36,460 --> 00:00:38,300 well, it gives you DDoS protection. 17 00:00:38,300 --> 00:00:41,040 So to protect against attacks that are 18 00:00:41,040 --> 00:00:42,530 distributed denial of service, 19 00:00:42,530 --> 00:00:44,740 it gives you integration with AWS Shield 20 00:00:44,740 --> 00:00:46,310 and also the Web Application Firewall. 21 00:00:46,310 --> 00:00:49,560 We'll see those in the security section of this course. 22 00:00:49,560 --> 00:00:51,170 But the idea is that it's really protected 23 00:00:51,170 --> 00:00:53,380 and it's a good way to front your applications 24 00:00:53,380 --> 00:00:55,030 when you deploy them globally. 
25 00:00:55,030 --> 00:00:58,310 It also allows you to expose an external HTTPS endpoint 26 00:00:58,310 --> 00:00:59,690 by loading the certificates 27 00:00:59,690 --> 00:01:02,760 and also talk internally in HTTPS to your applications 28 00:01:02,760 --> 00:01:05,190 if you need to encrypt that traffic as well. 29 00:01:05,190 --> 00:01:06,330 So let's take a look at a diagram. 30 00:01:06,330 --> 00:01:08,160 So this is a map of the world 31 00:01:08,160 --> 00:01:12,800 and there are some orange regions, and they're edge. 32 00:01:12,800 --> 00:01:14,400 Everything on this graph is edge. 33 00:01:14,400 --> 00:01:16,440 But as you can see, it's all around the globe 34 00:01:16,440 --> 00:01:19,550 and so for example, say we have an S3 bucket in Australia 35 00:01:19,550 --> 00:01:22,000 and some user from America wants to access it, 36 00:01:22,000 --> 00:01:25,380 they're actually going to access an edge location close to them. 37 00:01:25,380 --> 00:01:28,770 So in America, and that traffic is going to be transmitted 38 00:01:28,770 --> 00:01:32,260 over the private AWS network, all the way to the S3 bucket, 39 00:01:32,260 --> 00:01:34,040 and the content is going to be cached. 40 00:01:34,040 --> 00:01:36,530 So the idea is that, for this American user, 41 00:01:36,530 --> 00:01:38,270 and the more users you have in America, 42 00:01:38,270 --> 00:01:40,700 the more they will want to do the same kind of reads. 43 00:01:40,700 --> 00:01:44,310 And they will all have content served directly from America, 44 00:01:44,310 --> 00:01:45,850 not necessarily from Australia, 45 00:01:45,850 --> 00:01:47,850 because it will be fetched once into America 46 00:01:47,850 --> 00:01:50,490 and then served from there, so cached locally. 
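The "fetched once, then served locally" idea above can be sketched as a toy model. This is only an illustration, not a CloudFront API: the `Origin` and `EdgeLocation` classes, the object path, and the location names are all hypothetical.

```python
# Hypothetical model of per-location edge caching; not a CloudFront API.
class Origin:
    """Stands in for the S3 bucket and counts how often it is actually read."""

    def __init__(self, objects):
        self.objects = objects
        self.fetch_count = 0

    def fetch(self, path):
        self.fetch_count += 1
        return self.objects[path]


class EdgeLocation:
    """One edge location with its own local cache."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin
        self.cache = {}

    def get(self, path):
        if path not in self.cache:
            # Cache miss: go to the origin once, then keep a local copy.
            self.cache[path] = self.origin.fetch(path)
        return self.cache[path]


bucket = Origin({"/index.html": "<h1>hello</h1>"})
america = EdgeLocation("america", bucket)
asia = EdgeLocation("asia", bucket)

for _ in range(3):
    america.get("/index.html")  # three readers in America
asia.get("/index.html")         # one reader in Asia

print(bucket.fetch_count)
```

In this sketch the bucket is read once per edge location rather than once per user, which is the load reduction the lecture describes.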
47 00:01:50,490 --> 00:01:52,950 So another user, maybe in Asia, 48 00:01:52,950 --> 00:01:56,020 will talk to an edge location closer to Asia 49 00:01:56,020 --> 00:01:57,230 and that edge location again, 50 00:01:57,230 --> 00:01:59,360 will forward traffic to the S3 bucket 51 00:01:59,360 --> 00:02:02,050 to get the content and then cache it at the edge. 52 00:02:02,050 --> 00:02:03,280 So CloudFront allows you really to 53 00:02:03,280 --> 00:02:05,400 distribute your reads all around the world 54 00:02:05,400 --> 00:02:07,200 based on these different edge locations. 55 00:02:07,200 --> 00:02:08,729 And it improves latency 56 00:02:08,729 --> 00:02:11,373 and reduces the load on your main S3 bucket. 57 00:02:12,896 --> 00:02:14,080 So I said S3 bucket. 58 00:02:14,080 --> 00:02:16,600 But what are the different CloudFront origins? 59 00:02:16,600 --> 00:02:18,280 Well, the first one is an S3 bucket 60 00:02:18,280 --> 00:02:20,320 and you would use CloudFront in front of S3 61 00:02:20,320 --> 00:02:21,870 as a very common pattern 62 00:02:21,870 --> 00:02:24,000 to distribute your files globally 63 00:02:24,000 --> 00:02:26,090 and cache them at the edge. 64 00:02:26,090 --> 00:02:29,550 You also get enhanced security, as we'll see in the hands on 65 00:02:29,550 --> 00:02:31,680 between CloudFront and your S3 bucket 66 00:02:31,680 --> 00:02:35,950 using a CloudFront OAI or origin access identity. 67 00:02:35,950 --> 00:02:38,660 And this allows your S3 bucket to only 68 00:02:38,660 --> 00:02:41,950 allow communication from CloudFront and from nowhere else. 69 00:02:41,950 --> 00:02:43,160 And then finally, 70 00:02:43,160 --> 00:02:45,280 you could also use CloudFront as an ingress, 71 00:02:45,280 --> 00:02:49,570 to upload files into S3 from anywhere in the world. 72 00:02:49,570 --> 00:02:52,830 Okay, the other option is to use a custom origin 73 00:02:52,830 --> 00:02:55,230 and it must be an HTTP endpoint. 
74 00:02:55,230 --> 00:02:57,890 So this could be anything that respects the HTTP protocol. 75 00:02:57,890 --> 00:03:00,260 So it could be an Application Load Balancer, 76 00:03:00,260 --> 00:03:03,890 it could be an EC2 instance, it can be an S3 website. 77 00:03:03,890 --> 00:03:08,080 But we first must enable the bucket as a static S3 website 78 00:03:08,080 --> 00:03:10,500 and note that it is different from an S3 bucket. 79 00:03:10,500 --> 00:03:12,030 And for this S3 website, 80 00:03:12,030 --> 00:03:14,370 we need to enable that setting as we've seen before. 81 00:03:14,370 --> 00:03:17,360 And it could be any HTTP backend you want, 82 00:03:17,360 --> 00:03:20,830 for example, if it was on your own on-premises infrastructure. 83 00:03:20,830 --> 00:03:24,180 Okay, how does CloudFront work at a high level? 84 00:03:24,180 --> 00:03:27,520 So we have a bunch of edge locations all around the globe. 85 00:03:27,520 --> 00:03:30,200 And they're connected to the origin we defined, 86 00:03:30,200 --> 00:03:31,350 it could be an S3 bucket 87 00:03:31,350 --> 00:03:34,210 or it could be any HTTP endpoint 88 00:03:34,210 --> 00:03:38,060 and our client wants to access our CloudFront distribution. 89 00:03:38,060 --> 00:03:40,510 To do this, the client will send 90 00:03:40,510 --> 00:03:43,210 an HTTP request directly into CloudFront 91 00:03:43,210 --> 00:03:45,770 and this is what an HTTP request would look like. 92 00:03:45,770 --> 00:03:49,280 There will be a URL, some query string parameters 93 00:03:49,280 --> 00:03:51,640 and there will be also some headers. 94 00:03:51,640 --> 00:03:53,350 And then the edge location 95 00:03:53,350 --> 00:03:56,270 will forward the request to your origin 96 00:03:56,270 --> 00:03:58,490 and that includes the query strings 97 00:03:58,490 --> 00:04:00,740 and that includes the headers, 98 00:04:00,740 --> 00:04:03,550 so everything gets forwarded on to your origin. 
99 00:04:03,550 --> 00:04:04,600 You can configure this 100 00:04:04,600 --> 00:04:07,420 and then your origin responds to the edge location. 101 00:04:07,420 --> 00:04:10,020 The edge location will cache the response 102 00:04:10,020 --> 00:04:12,090 based on the cache settings we've defined 103 00:04:12,090 --> 00:04:15,720 and return the response back to our client. 104 00:04:15,720 --> 00:04:19,500 And the next time another client makes a similar request, 105 00:04:19,500 --> 00:04:22,100 the edge location will first look into the cache 106 00:04:22,100 --> 00:04:24,250 before forwarding the request to the origin. 107 00:04:24,250 --> 00:04:26,530 That is the whole purpose of having a CDN. 108 00:04:26,530 --> 00:04:28,250 Okay, so very, very simple. 109 00:04:28,250 --> 00:04:30,450 This is how CloudFront works at a high level. 110 00:04:30,450 --> 00:04:33,240 So let's look at S3 as an origin in detail. 111 00:04:33,240 --> 00:04:35,800 So you have the cloud and you have your origin, 112 00:04:35,800 --> 00:04:37,280 which is your S3 bucket. 113 00:04:37,280 --> 00:04:39,660 And for example, you have an edge location in Los Angeles 114 00:04:39,660 --> 00:04:43,810 and some users want to read some data from there. 115 00:04:43,810 --> 00:04:46,630 So your edge location is going to fetch the data 116 00:04:46,630 --> 00:04:49,830 from your S3 bucket over the private AWS network 117 00:04:49,830 --> 00:04:52,830 and give you the results from that edge location. 118 00:04:52,830 --> 00:04:56,520 The idea here is that for the edge location of CloudFront 119 00:04:56,520 --> 00:04:59,920 to access your S3 bucket, it is going to use an OAI 120 00:04:59,920 --> 00:05:02,770 or an origin access identity, which acts like an IAM identity 121 00:05:02,770 --> 00:05:05,150 for your CloudFront origin. 
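As a rough illustration of how a bucket can trust only an origin access identity, here is a minimal sketch that builds such a bucket policy as a Python dictionary. The bucket name and OAI ID are made-up placeholders, and the helper `oai_bucket_policy` is hypothetical; only the overall policy shape (an `Allow` on `s3:GetObject` for the OAI principal) is the point.

```python
import json


def oai_bucket_policy(bucket_name, oai_id):
    """Build a bucket policy that lets one CloudFront OAI read objects.

    bucket_name and oai_id are supplied by the caller; the values used
    below are placeholders, not real resources.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontOAIRead",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::cloudfront:user/"
                           f"CloudFront Origin Access Identity {oai_id}"
                },
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


policy = oai_bucket_policy("example-bucket", "EXAMPLEOAIID")
print(json.dumps(policy, indent=2))
```

Because no other principal is granted access, and assuming public access to the bucket is blocked, reads have to come through CloudFront.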
122 00:05:05,150 --> 00:05:08,682 And using that identity, it is going to access your S3 bucket 123 00:05:08,682 --> 00:05:10,470 and the bucket policy is going to say yes, 124 00:05:10,470 --> 00:05:13,050 this identity is allowed and yes, 125 00:05:13,050 --> 00:05:14,820 send the file to CloudFront. 126 00:05:14,820 --> 00:05:17,670 So this works as well for other edge locations for example, 127 00:05:17,670 --> 00:05:20,670 in Sao Paulo in Brazil, or Mumbai, or Melbourne. 128 00:05:20,670 --> 00:05:22,060 And so all around the world, 129 00:05:22,060 --> 00:05:24,820 your edge locations are going to serve cached content 130 00:05:24,820 --> 00:05:27,200 from your S3 bucket and so we can see how CloudFront 131 00:05:27,200 --> 00:05:31,190 can become super helpful as a CDN. 132 00:05:31,190 --> 00:05:34,390 Now, what if you have an ALB or EC2 as an origin? 133 00:05:34,390 --> 00:05:36,330 The security changes a little bit. 134 00:05:36,330 --> 00:05:39,340 So we have our EC2 instance or instances 135 00:05:39,340 --> 00:05:40,890 and they must be public because 136 00:05:40,890 --> 00:05:43,940 they must be publicly accessible from an HTTP standpoint 137 00:05:43,940 --> 00:05:46,290 and we have our users all around the world. 138 00:05:46,290 --> 00:05:48,260 So they will access our edge location 139 00:05:48,260 --> 00:05:51,697 and our edge location will access our EC2 instance 140 00:05:51,697 --> 00:05:54,840 and as you can see, it traverses the security group. 141 00:05:54,840 --> 00:05:57,430 So the security group must allow the IPs 142 00:05:57,430 --> 00:06:01,600 of CloudFront edge locations into the EC2 instance. 143 00:06:01,600 --> 00:06:04,160 And for this, there is a list of public IPs 144 00:06:04,160 --> 00:06:06,850 for edge locations that you can get on this website. 
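The transcript does not show which website is on the slide. As an assumption, the sketch below uses the shape of the JSON document AWS publishes for its IP ranges, where each entry carries an `ip_prefix` and a `service` field, and picks out the entries whose service is `CLOUDFRONT`. The sample data is invented (documentation-only address ranges), and `cloudfront_prefixes` is a hypothetical helper.

```python
# Invented sample in the shape of AWS's published IP ranges document.
# The prefixes are reserved documentation ranges, not real CloudFront IPs.
sample_ip_ranges = {
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "region": "GLOBAL", "service": "CLOUDFRONT"},
        {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "EC2"},
        {"ip_prefix": "203.0.113.0/24", "region": "GLOBAL", "service": "CLOUDFRONT"},
    ]
}


def cloudfront_prefixes(ip_ranges):
    """Return the sorted, de-duplicated CIDR blocks tagged as CloudFront."""
    return sorted(
        {
            entry["ip_prefix"]
            for entry in ip_ranges["prefixes"]
            if entry["service"] == "CLOUDFRONT"
        }
    )


# These CIDR blocks are what the security group's inbound rules would allow.
print(cloudfront_prefixes(sample_ip_ranges))
```

With the real document loaded in place of the sample, the resulting list is what you would feed into the inbound rules of the EC2 or ALB security group.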
145 00:06:06,850 --> 00:06:09,120 And the idea is that the security group must allow 146 00:06:09,120 --> 00:06:11,140 all these public IPs of edge locations 147 00:06:11,140 --> 00:06:13,070 to allow CloudFront to fetch content 148 00:06:13,070 --> 00:06:15,210 from your EC2 instances. 149 00:06:15,210 --> 00:06:16,120 So that makes sense. 150 00:06:16,120 --> 00:06:19,120 What if we use an ALB as an origin? 151 00:06:19,120 --> 00:06:21,723 So now we have a security group for the ALB 152 00:06:21,723 --> 00:06:25,030 and the ALB must be public to be accessible by CloudFront. 153 00:06:25,030 --> 00:06:27,720 But the backend EC2 instances now can be private. 154 00:06:27,720 --> 00:06:31,110 And so in terms of security group for the EC2 instances, 155 00:06:31,110 --> 00:06:33,310 we need to allow the security group of the load balancer, 156 00:06:33,310 --> 00:06:34,950 we've seen this extensively. 157 00:06:34,950 --> 00:06:36,480 And for the edge locations, 158 00:06:36,480 --> 00:06:38,440 which are again, public locations, 159 00:06:38,440 --> 00:06:42,180 they need to access your ALB through the public network. 160 00:06:42,180 --> 00:06:44,930 And so that means that your security group for your ALB 161 00:06:44,930 --> 00:06:47,420 must allow the public IPs of the edge locations 162 00:06:47,420 --> 00:06:49,620 the same public IPs as we had before. 163 00:06:49,620 --> 00:06:51,740 So two different architectures, same concept 164 00:06:51,740 --> 00:06:55,500 but now we better understand network security for S3, for ALB, 165 00:06:55,500 --> 00:06:59,753 or EC2 in front of, or behind I must say, CloudFront. 166 00:07:00,600 --> 00:07:01,433 Now CloudFront is a CDN 167 00:07:01,433 --> 00:07:03,300 and it also has some really nice features. 168 00:07:03,300 --> 00:07:05,000 One of them is geo restriction. 169 00:07:05,000 --> 00:07:07,710 So you can restrict who can access your distribution. 170 00:07:07,710 --> 00:07:09,000 So you can provide a whitelist. 
171 00:07:09,000 --> 00:07:10,080 Where we're saying, okay, 172 00:07:10,080 --> 00:07:12,620 users from this list of approved countries 173 00:07:12,620 --> 00:07:15,610 and only this list can access our CloudFront distribution. 174 00:07:15,610 --> 00:07:17,230 Or we can use a blacklist, where we're saying, 175 00:07:17,230 --> 00:07:19,330 okay, the users from these countries 176 00:07:19,330 --> 00:07:22,460 are not allowed to access our distribution. 177 00:07:22,460 --> 00:07:24,450 And the way the country is determined, 178 00:07:24,450 --> 00:07:26,900 is using a third party Geo-IP database 179 00:07:26,900 --> 00:07:28,730 where the incoming IP is matched against it 180 00:07:28,730 --> 00:07:30,440 to figure out the country. 181 00:07:30,440 --> 00:07:32,250 So the use case for geo restriction 182 00:07:32,250 --> 00:07:33,860 will be when you have copyright laws 183 00:07:33,860 --> 00:07:35,430 to prevent access to your content. 184 00:07:35,430 --> 00:07:37,090 And you want to prove to regulators 185 00:07:37,090 --> 00:07:40,180 that you are indeed restricting content access from, 186 00:07:40,180 --> 00:07:43,230 say, France if you have content in America. 187 00:07:43,230 --> 00:07:45,000 Okay, now you may be asking yourself 188 00:07:45,000 --> 00:07:46,750 what is really the difference between CloudFront 189 00:07:46,750 --> 00:07:49,940 and something like S3 cross region replication? 190 00:07:49,940 --> 00:07:52,730 So CloudFront is using a global edge network 191 00:07:52,730 --> 00:07:55,460 and files are going to be cached for a TTL. 192 00:07:55,460 --> 00:07:57,520 So a time to live maybe for a day. 193 00:07:57,520 --> 00:07:59,780 So it's great when you have static content 194 00:07:59,780 --> 00:08:03,510 that must be available everywhere around the world, okay? 195 00:08:03,510 --> 00:08:05,120 And maybe you are okay with 196 00:08:05,120 --> 00:08:07,780 if that content is a little bit outdated. 
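To make the "cached for a TTL, so possibly a little outdated" point concrete, here is a small sketch of a TTL cache with a fake clock. It is a simplified stand-in, not how CloudFront is implemented: the `TtlCache` class, the one-day TTL, and the object path are all assumptions for illustration.

```python
class TtlCache:
    """Hypothetical edge cache: entries are reused until their TTL expires."""

    def __init__(self, ttl_seconds, fetch_from_origin, clock):
        self.ttl = ttl_seconds
        self.fetch = fetch_from_origin
        self.clock = clock
        self.entries = {}  # path -> (body, expires_at)

    def get(self, path):
        now = self.clock()
        entry = self.entries.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh enough: serve the cached copy
        body = self.fetch(path)  # missing or expired: go back to the origin
        self.entries[path] = (body, now + self.ttl)
        return body


origin = {"/logo.txt": "v1"}
now = [0]  # fake clock, in seconds, so the example needs no real waiting
cache = TtlCache(86400, origin.__getitem__, lambda: now[0])

first = cache.get("/logo.txt")   # fetched from the origin
origin["/logo.txt"] = "v2"       # the origin changes
now[0] = 3600
stale = cache.get("/logo.txt")   # one hour later: old copy is still served
now[0] = 86400
fresh = cache.get("/logo.txt")   # TTL elapsed: the new version is fetched

print(first, stale, fresh)
```

The middle read is the trade-off described above: until the TTL runs out, readers may see the older version.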
197 00:08:07,780 --> 00:08:09,950 Now for S3 cross region replication, 198 00:08:09,950 --> 00:08:11,710 it must be set up for each region 199 00:08:11,710 --> 00:08:14,480 in which you want replication to happen. 200 00:08:14,480 --> 00:08:17,410 And the files will be updated in near real time, 201 00:08:17,410 --> 00:08:18,450 it's going to be read only 202 00:08:18,450 --> 00:08:20,200 so it is going to help you with read performance. 203 00:08:20,200 --> 00:08:22,020 So S3 cross region replication 204 00:08:22,020 --> 00:08:24,410 will be great if you have dynamic content 205 00:08:24,410 --> 00:08:26,510 that needs to be available at low latency 206 00:08:26,510 --> 00:08:29,070 in a small number of regions. 207 00:08:29,070 --> 00:08:30,790 Hope that makes sense, hope that's very clear. 208 00:08:30,790 --> 00:08:32,590 CloudFront is for caching globally 209 00:08:32,590 --> 00:08:34,049 and S3 cross region replication 210 00:08:34,049 --> 00:08:36,600 for replication into select regions. 211 00:08:36,600 --> 00:08:37,990 All right, so that's it for this lecture. 212 00:08:37,990 --> 00:08:40,363 I will see you in the next lecture for some hands on.