1 00:00:00,210 --> 00:00:02,670 Okay, so now let's talk about Amazon SageMaker. 2 00:00:02,670 --> 00:00:05,480 So SageMaker is a fully managed service 3 00:00:05,480 --> 00:00:07,280 for developers and data scientists 4 00:00:07,280 --> 00:00:08,950 to build machine learning models. 5 00:00:08,950 --> 00:00:10,760 So all of the services we've seen so far 6 00:00:10,760 --> 00:00:12,130 in this section are 7 00:00:12,130 --> 00:00:14,180 managed machine learning services with 8 00:00:14,180 --> 00:00:15,770 a very specific purpose, 9 00:00:15,770 --> 00:00:18,140 for example translating some text, 10 00:00:18,140 --> 00:00:19,500 transcribing some audio, 11 00:00:19,500 --> 00:00:21,130 converting text into audio, 12 00:00:21,130 --> 00:00:22,450 or analyzing parts of a text, 13 00:00:22,450 --> 00:00:25,220 but SageMaker is a higher-level 14 00:00:25,220 --> 00:00:26,980 machine learning service where you have 15 00:00:26,980 --> 00:00:29,070 your actual developers or your data scientists 16 00:00:29,070 --> 00:00:31,720 within your organization create 17 00:00:31,720 --> 00:00:33,260 and build machine learning models. 18 00:00:33,260 --> 00:00:34,900 So it is a lot more involved 19 00:00:34,900 --> 00:00:37,180 and a lot more difficult to use. 20 00:00:37,180 --> 00:00:40,090 Now, when you want to do this kind of process 21 00:00:40,090 --> 00:00:41,920 to build a machine learning model, 22 00:00:41,920 --> 00:00:43,640 you have to do a bunch of steps that I will show 23 00:00:43,640 --> 00:00:44,630 you in a second, 24 00:00:44,630 --> 00:00:46,200 and all these are quite difficult to do 25 00:00:46,200 --> 00:00:47,080 in one place, 26 00:00:47,080 --> 00:00:49,500 plus you need to provision some servers 27 00:00:49,500 --> 00:00:52,420 to perform these computations to create these models, 28 00:00:52,420 --> 00:00:54,380 and that can be cumbersome as well. 29 00:00:54,380 --> 00:00:56,150 So this is where SageMaker comes in.
30 00:00:56,150 --> 00:00:58,860 So SageMaker will try to help you all along the way 31 00:00:58,860 --> 00:01:00,010 through the process. 32 00:01:00,010 --> 00:01:02,595 So I will show you what machine learning actually is, 33 00:01:02,595 --> 00:01:04,210 and this is a simplified version, 34 00:01:04,210 --> 00:01:05,500 if you are a machine learning expert 35 00:01:05,500 --> 00:01:07,500 don't get mad at me please, 36 00:01:07,500 --> 00:01:09,380 but let's say you wanted to build a model, 37 00:01:09,380 --> 00:01:11,170 or let's say I wanted to build a model 38 00:01:11,170 --> 00:01:13,380 to predict what score you're going to get 39 00:01:13,380 --> 00:01:15,780 at your Certified Cloud Practitioner exam. 40 00:01:15,780 --> 00:01:17,020 So how will I do it? 41 00:01:17,020 --> 00:01:18,900 Say for example that I am a developer 42 00:01:18,900 --> 00:01:19,970 or a data scientist, 43 00:01:19,970 --> 00:01:22,528 so I'm going to gather all the data from 44 00:01:22,528 --> 00:01:24,840 the actual performance of my students. 45 00:01:24,840 --> 00:01:26,707 So I will ask maybe 10,000 students 46 00:01:26,707 --> 00:01:29,680 to give me information about how many years of experience 47 00:01:29,680 --> 00:01:30,750 in IT they had, 48 00:01:30,750 --> 00:01:32,680 how many years of experience with AWS they had, 49 00:01:32,680 --> 00:01:34,100 how much time they spent on the course, 50 00:01:34,100 --> 00:01:36,500 how many practice exams they did, etc., etc., 51 00:01:36,500 --> 00:01:38,880 so I'll try to gather as much data as possible, 52 00:01:38,880 --> 00:01:40,340 and then I'm going to label the data.
53 00:01:40,340 --> 00:01:42,780 So that means you need to say which column 54 00:01:42,780 --> 00:01:43,630 corresponds to what, 55 00:01:43,630 --> 00:01:45,753 and also you need to give some kind of score, 56 00:01:45,753 --> 00:01:48,200 and the score, for me, is the actual score 57 00:01:48,200 --> 00:01:49,680 that someone obtains at the exam. 58 00:01:49,680 --> 00:01:50,910 For example someone did not pass, 59 00:01:50,910 --> 00:01:54,160 670, maybe he didn't do the course completely, 60 00:01:54,160 --> 00:01:55,320 that would be the reason. 61 00:01:55,320 --> 00:01:57,610 Maybe someone passed with a high grade, 62 00:01:57,610 --> 00:02:02,610 so 990, or maybe someone else with a high grade, 934, 63 00:02:03,000 --> 00:02:06,464 so, each student will get a specific score, 64 00:02:06,464 --> 00:02:10,080 and my guess is that based on the data that I've collected 65 00:02:10,080 --> 00:02:12,250 I can predict what the score will be. 66 00:02:12,250 --> 00:02:13,610 So I've first done the labeling, 67 00:02:13,610 --> 00:02:15,760 and that labeling process can be quite complicated to do 68 00:02:15,760 --> 00:02:16,960 in practice. 69 00:02:16,960 --> 00:02:18,820 Then I need to build a machine learning model, 70 00:02:18,820 --> 00:02:21,530 which is how I can predict these scores 71 00:02:21,530 --> 00:02:23,570 from the historical data. 72 00:02:23,570 --> 00:02:25,040 So, then you build that machine learning model 73 00:02:25,040 --> 00:02:26,480 and then you have to train it 74 00:02:26,480 --> 00:02:27,313 and tune it. 75 00:02:27,313 --> 00:02:29,610 So this is another part that's actually quite difficult 76 00:02:29,610 --> 00:02:32,270 to do, which is how to refine my model over time 77 00:02:32,270 --> 00:02:36,250 to better fit my data and my outputs.
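To make the labeling and training steps just described a bit more concrete, here is a minimal sketch in plain Python. It is not SageMaker code and not the lecturer's method: the survey rows, the `practice_exams` field, and the `fit_line` helper are all hypothetical, and it uses only one feature with ordinary least squares to keep things short.

```python
# Hypothetical labeled rows: feature values and scores are invented for illustration.
# "score" is the label, i.e. the actual exam result each student reported.
labeled_rows = [
    {"practice_exams": 0, "score": 620},
    {"practice_exams": 1, "score": 670},
    {"practice_exams": 2, "score": 730},
    {"practice_exams": 4, "score": 830},
    {"practice_exams": 5, "score": 900},
    {"practice_exams": 6, "score": 930},
]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# "Training" here just means estimating the two parameters from the labeled data.
xs = [row["practice_exams"] for row in labeled_rows]
ys = [row["score"] for row in labeled_rows]
slope, intercept = fit_line(xs, ys)
print(f"score ~ {intercept:.1f} + {slope:.1f} * practice_exams")
```

A real model would use all the surveyed columns, and tuning would mean comparing model variants on data held back from training; this sketch only shows the shape of the idea.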
78 00:02:36,250 --> 00:02:39,210 Okay, so now SageMaker in all of this will help 79 00:02:39,210 --> 00:02:40,100 you with the labeling, 80 00:02:40,100 --> 00:02:41,440 the building, the training 81 00:02:41,440 --> 00:02:43,030 and tuning, but not only that. 82 00:02:43,030 --> 00:02:44,760 Now we have a machine learning model 83 00:02:44,760 --> 00:02:45,800 and it is created, 84 00:02:45,800 --> 00:02:46,850 it's fully working. 85 00:02:46,850 --> 00:02:48,350 So now I need to use it. 86 00:02:48,350 --> 00:02:50,210 So this is called deploying machine learning models. 87 00:02:50,210 --> 00:02:52,670 So, we're going to get new data coming in. 88 00:02:52,670 --> 00:02:54,500 For example you are the new student, 89 00:02:54,500 --> 00:02:55,780 and I'm going to survey you, 90 00:02:55,780 --> 00:02:56,613 and I'm going to say okay, 91 00:02:56,613 --> 00:02:58,100 how many years of experience do you have in IT, 92 00:02:58,100 --> 00:03:00,400 with AWS, how much time have you spent on this course, 93 00:03:00,400 --> 00:03:02,830 and then, based on this data 94 00:03:02,830 --> 00:03:03,810 you've just given me, 95 00:03:03,810 --> 00:03:06,240 I will apply the machine learning model that I have created 96 00:03:06,240 --> 00:03:07,220 from before. 97 00:03:07,220 --> 00:03:08,940 And then the machine learning model will say, 98 00:03:08,940 --> 00:03:12,350 for example, hey, it seems, based on the data you have, 99 00:03:12,350 --> 00:03:14,250 I'm going to predict that this student 100 00:03:14,250 --> 00:03:17,350 will pass with a score of 906. 101 00:03:17,350 --> 00:03:18,910 And this whole process, 102 00:03:18,910 --> 00:03:20,240 of labeling, building the model, 103 00:03:20,240 --> 00:03:21,310 training it, tuning it, 104 00:03:21,310 --> 00:03:24,530 applying it can all be done within SageMaker.
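The "apply the model to a new student" step can be sketched the same way. This is a minimal illustration, not the SageMaker deployment API: the `MODEL` parameters stand in for a model that was already trained, the `predict_score` helper and the `practice_exams` field are hypothetical, and the clamp assumes scores lie on a 100 to 1000 scale.

```python
# Hypothetical parameters, standing in for a model that was trained earlier.
MODEL = {"intercept": 600.0, "per_practice_exam": 50.0}

# Assumption: exam scores are reported on a 100 to 1000 scale.
MIN_SCORE, MAX_SCORE = 100.0, 1000.0


def predict_score(practice_exams: int) -> float:
    """Apply the stored linear model to one new student's survey answer."""
    raw = MODEL["intercept"] + MODEL["per_practice_exam"] * practice_exams
    # Keep the prediction inside the assumed valid score range.
    return max(MIN_SCORE, min(MAX_SCORE, raw))


# New data coming in: a student the model has never seen.
new_student = {"practice_exams": 6}
print(predict_score(new_student["practice_exams"]))
```

In a managed service, this function would typically sit behind an endpoint that receives the survey answers and returns the prediction; here it is just a local function call.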
105 00:03:24,530 --> 00:03:26,570 So, that's it for a quick introduction, 106 00:03:26,570 --> 00:03:27,420 but I hope you liked it, 107 00:03:27,420 --> 00:03:29,370 and I will see you in the next lecture.