Okay. So first, let's talk about supervised learning. Supervised learning really is like having a teacher or a trainer there who's guiding you step by step, telling you, "That was right, that was wrong," or "This is a flower. This is a plant." It's a hand-holding way of teaching the computer what to do.

One of the most famous examples of supervised learning is training a computer to recognize a cat, right? So we would say to the computer, "This is a cat. This is a cat. This is a cat. This is a cat, and this is a cat." And, of course, as humans, it's very easy for us to look at any image and identify whether it's a cat, a dog, or any other sort of animal. But we went through this same kind of training process when we were young, right? When we were babies and looked at this strange fluffy animal, maybe one of our parents said, "That is a cat," or maybe a teacher said, "That is a dog." And over time, we learned pattern recognition: we understood how to classify different animals based on repeated exposure. This is exactly what we do for machine vision. We feed the computer loads and loads of cat images.
And if there's one thing the Internet is not short on, it's cat images. Every single time we show it a cat image, that image also comes with a label saying, "This is a cat." So every time the computer sees this particular blend of pixels, it's taught that that is a cat. This is why it's called supervised: the training data is always labeled.

What we're hoping for is that, after this repeated exposure, the machine learns the rough outline of what a cat looks like. Then, the next time we feed the computer a piece of unlabeled data and ask, "What is this?", the machine learning model should be able to identify that it has a lot of features very similar to the previous cat images. It should spit out the answer, "This is a cat," even though it looks like a burrito.

So when you're training a machine learning model, you're essentially presenting it with loads of images, and each image comes with a label. The data is clearly labeled and gets fed into the model: this is a cat, this is a dog, and this is a cow. You're going to do this for lots of different cats, dogs, and cows, so that you cover all the different breeds, sizes, lighting conditions, et cetera.
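Here is a minimal sketch of the labeled-data idea, written in Python as an illustration rather than anything from the course. Each training example pairs some features with a label. A new, unlabeled example gets the label of its closest training example; this technique is called a 1-nearest-neighbour classifier. The two numeric features are hypothetical stand-ins for what a real vision model would learn from pixels, and all the values are made up.

```python
import math

# Hypothetical toy features standing in for image pixels:
# (body weight in kg, height at the shoulder in cm). Values are invented.
training_data = [
    ((4.0, 25.0), "cat"),
    ((5.0, 28.0), "cat"),
    ((30.0, 60.0), "dog"),
    ((25.0, 55.0), "dog"),
    ((600.0, 140.0), "cow"),
    ((700.0, 150.0), "cow"),
]

def classify(features, labeled_examples):
    """1-nearest-neighbour: return the label of the closest training example."""
    _, closest_label = min(
        labeled_examples,
        key=lambda example: math.dist(features, example[0]),
    )
    return closest_label

# An unlabeled example the model has never seen: "What is this?"
new_animal = (28.0, 58.0)
print(classify(new_animal, training_data))  # dog
```

The "model" here is nothing more than the stored labeled examples plus a distance rule. That makes it easy to see that the answer depends entirely on what was in the training data.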
And the machine learns through this experience and begins to sort these images into their respective groups. In this case, the model is the thing that does the learning, and the data you feed it is called the training data.

Once you've completed the training, at a later stage you should be able to present an image of a dog that the model has never seen before. It's not part of the training data. You ask the model, "Based on all of your training, what is this? Can you classify this image?" It should use what it learned from the training data and be able to spit out an answer: "This is a dog." The new image that the model has never seen is called the testing data, and the result it spits back out is the output. The output can take various forms: it could be a word, or it could be a move on a chessboard. It really depends on how you trained the model and what you want it to do.

This is one of the most fundamental types of supervised learning, and it's known as classification. Imagine you're trying to teach a computer to differentiate between apples and pears. To us, as humans, this seems like a really simple task.
But imagine turning this task into a programming exercise where you have to tell the computer which features to look for in order to tell an apple from a pear. It's actually a really complex problem. You might say, "Computer, if you blur everything and look at all the pixels in each image, the apple pictures tend to have more red than the pear pictures." From experience, we know that pears tend to be more green and apples tend to be more red. But if you programmed the computer that way, what happens when you show it a green apple? The computer is probably going to think it's a pear, right?

Also, what about a fruit it has never seen before? You can write a lot of code specifying all the features that make an apple different from a pear. Say you tell it that apples tend to be rounder than pears and a bit redder. Then you present the computer with something it has never seen, an anomaly that belongs in neither the apple camp nor the pear camp. It's going to apply the rules you defined and try to classify it anyway, and it'll probably classify it as an apple.
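The hand-written approach can be sketched like this. The `red_fraction` and `roundness` features and the cut-off values are assumptions chosen for illustration. The point is that fixed rules misfire on a green apple and have no way to answer "neither" for an unfamiliar fruit.

```python
def rule_based_classifier(red_fraction, roundness):
    """Hand-written rules: apples are 'redder' and 'rounder' than pears.

    red_fraction: share of reddish pixels in the image, 0.0 to 1.0
    roundness: 1.0 for a perfect circle, lower for elongated shapes
    The thresholds below are illustrative guesses, not tuned values.
    """
    if red_fraction > 0.5 and roundness > 0.8:
        return "apple"
    return "pear"  # the rules can only ever answer "apple" or "pear"

print(rule_based_classifier(red_fraction=0.8, roundness=0.9))   # apple: a red apple, correct
print(rule_based_classifier(red_fraction=0.1, roundness=0.9))   # pear: a green apple, wrong
print(rule_based_classifier(red_fraction=0.9, roundness=0.95))  # apple: e.g. a tomato, wrong
```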
And a lot of research has shown that even though we humans are really good at pattern recognition, we can't always pinpoint exactly what it is that makes a thing what it is. Try to think of what makes an apple unique among all the other fruits, or indeed any other item in the world. It's difficult and time-consuming to write a program with a list of rules that classifies an apple and lets the computer tell it apart from a peach, for example.

With machine learning, though, you can feed the model a whole bunch of images of cats, dogs, pears, apples, anything you can imagine. All of those pieces of data need to be labeled, and you need to give the model enough instances of each category. As long as you do, it should be able to work out what each item is and classify it based on the features it has identified. The nice thing about a lot of these machine learning models is that they're reusable. You could create a generic classifier that looks at handwritten numbers and figures out what those numbers are, turning an image into an integer, for example.
But you can take that same generic classifier and, instead of training it on images of handwritten numbers, feed it emails labeled as spam or not spam. It can then classify new emails based on those criteria. So once you've created a good generic classifier, changing the training data gets it to do different things without recoding the entire model. This is one of the advantages of machine learning.

Now, let's look at how a machine might do this. Say we have a graph with a threshold for emails that should go into the inbox and a threshold for emails that should probably go into the spam folder. A spam filter is an artificially intelligent program that decides whether incoming emails should head to the inbox or to the spam box, and that decision is binary. You can't really have a halfway house, like a decontamination zone or a "maybe spam" folder, right? So let's say the decision is strictly binary. When this machine learning model is fed new testing data in the form of an email, it should scan through the contents and decide whether it goes to the inbox or to the spam box. So it's a 1 or a 0.
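The "same classifier, different training data" point can be made concrete with a tiny generic nearest-centroid classifier. This is a deliberately simple technique, not what a production digit or spam classifier would use. The class knows nothing about fruit or email; only the labeled examples change between the two models. All feature values are made up for illustration.

```python
import math
from collections import defaultdict

class NearestCentroidClassifier:
    """Generic classifier: averages each label's examples, predicts the nearest average."""

    def fit(self, examples):
        # Group feature vectors by label, then average each feature per label.
        grouped = defaultdict(list)
        for features, label in examples:
            grouped[label].append(features)
        self.centroids = {
            label: [sum(column) / len(rows) for column in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, features):
        # Pick the label whose centroid is closest in Euclidean distance.
        return min(self.centroids,
                   key=lambda label: math.dist(features, self.centroids[label]))

# Same class, trained on fruit: (red_fraction, roundness)
fruit_model = NearestCentroidClassifier().fit([
    ((0.8, 0.9), "apple"), ((0.7, 0.95), "apple"),
    ((0.2, 0.6), "pear"), ((0.1, 0.5), "pear"),
])

# ...and on email: (number of links, count of words like "buy" or "sale")
spam_model = NearestCentroidClassifier().fit([
    ((1, 0), "not spam"), ((2, 1), "not spam"),
    ((40, 8), "spam"), ((60, 12), "spam"),
])

print(fruit_model.predict((0.6, 0.9)))  # apple
print(spam_model.predict((45, 9)))      # spam
```

Nothing in the class was recoded between the two uses. Swapping the training data is what turned a fruit sorter into a spam filter.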
Now, let's say we train this model on a whole bunch of emails labeled as spam or not spam. Let's also say that one of the factors affecting whether an email is likely to be spam is the number of links it contains. This is a real thing, by the way: if you go on Gmail and send somebody an email with a hundred links, see which folder it lands in. It usually won't be the inbox, because it looks so spammy.

If we plot all of our training data on this graph, emails labeled spam or not spam against the number of links each one contains, it might look something like this. The machine learning model's job is to draw a line through this data that establishes a threshold for the number of links. So if the number of links is less than five, the email is more likely to belong in the inbox. If it's greater than five, it's more likely to be spam, so it should go into the spam box. This particular rule can be given a weight among many, many other features, for example, the number of images in the email or the number of words like "buy" or "sale."
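Learning that link threshold from labeled data can be sketched as a one-feature "decision stump". The sketch tries each candidate cut-off and keeps the one that misclassifies the fewest training emails. The emails below are invented so that the learned threshold comes out at five, matching the example above. The learned rule is "spam if links > threshold", so exactly five links still goes to the inbox.

```python
# Hypothetical labeled training data: (number of links, is_spam).
training_emails = [
    (0, False), (1, False), (2, False), (3, False), (4, False), (5, False),
    (6, True), (7, True), (8, True), (12, True), (25, True), (100, True),
]

def learn_link_threshold(examples):
    """Return the cut-off with the fewest training errors for 'spam if links > cut-off'."""
    candidates = sorted({links for links, _ in examples})

    def errors(threshold):
        # Count emails where the rule's prediction disagrees with the label.
        return sum((links > threshold) != is_spam for links, is_spam in examples)

    return min(candidates, key=errors)

threshold = learn_link_threshold(training_emails)
print(threshold)  # 5

def goes_to_spam(links):
    return links > threshold

print(goes_to_spam(100))  # True
print(goes_to_spam(2))    # False
```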
So the model evaluates a whole bunch of these features within the email, with a different weighting for each one. Using all of that, it decides whether an email should go to spam or into the inbox. And it learns continuously over time. Every time you mark an email as spam, you're teaching that machine learning algorithm something new. You might be confirming its existing model, giving it new features, or giving it new data to work on, so that it can make this prediction with increasing accuracy.

Now, the thing to remember is that in supervised learning, we've so far talked about classification problems. Classification tends to be applied when you have discrete data, data that fits into specific camps. Take your grades, for example, right? You might get an A+, an A, a B+, a B, et cetera. You won't get something like a B+.5 or a B+.599. Data that can take any value along a scale like that is called continuous data. Your height, for example, could be anywhere along a ruler. Depending on how accurately you measure it, it could go down to 10 decimal places if you were really that interested in your precise height.
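Going back to the spam filter: weighting several features, and nudging those weights every time a user marks an email, can be sketched with a perceptron. A perceptron is one of the simplest linear classifiers. This is an illustration, not how Gmail or any particular product works, and the three features and the tiny training set are made up. Each `learn()` call plays the role of a user marking one email as spam or not spam.

```python
class OnlinePerceptron:
    """Linear classifier: one weight per feature plus a bias term."""

    def __init__(self, n_features):
        self.weights = [0.0] * n_features
        self.bias = 0.0

    def predict(self, features):
        score = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1 if score > 0 else 0  # 1 = spam folder, 0 = inbox

    def learn(self, features, label):
        """Perceptron update from one labeled email (1 = spam, 0 = not spam)."""
        error = label - self.predict(features)  # -1, 0 or +1
        if error != 0:
            # Nudge each weight toward the correct answer; correct guesses change nothing.
            self.weights = [w + error * x for w, x in zip(self.weights, features)]
            self.bias += error

# Hypothetical features: (links, images, count of words like "buy"/"sale").
labeled_emails = [
    ((1, 0, 0), 0),
    ((2, 1, 0), 0),
    ((30, 4, 6), 1),
    ((20, 8, 5), 1),
]

model = OnlinePerceptron(n_features=3)
for _ in range(20):  # repeated passes over the labeled emails
    for features, label in labeled_emails:
        model.learn(features, label)

print(model.predict((50, 4, 7)))  # 1 (spam)
print(model.predict((1, 1, 0)))   # 0 (inbox)
```

Because updates happen one email at a time, the same `learn()` method can keep running after deployment. Each newly marked email either confirms the current weights or shifts them.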
The reason we differentiate between discrete and continuous data is that, when you're working with continuous data in machine learning, you're more likely to be using a process called regression.

A good example of where regression fits is developer salaries, right? According to indeed.com, the average salary of a developer (it doesn't say what kind of developer) is about 100,000 per year. That figure is based on data from about 1,500 employees over the past 12 months. It also shows a box breaking this down by level of experience: depending on how experienced the developer is, they might get paid less or more. This is something we'd probably tend to agree with, right? If you've worked at a particular job for more years, you're probably going to get paid more.

Now, say a friend came to you and said, "I've worked as a developer for 12 years. How much do you think I'm likely to get paid if I apply for a job right now?" You can give them a rough estimate by putting them into one of these camps, right? Maybe 12 years puts them among the average developers. Or maybe, if they'd worked for 20 years, that would put them in the senior developer camp.
But you can't really say for sure exactly what the number should be based on their exact input, i.e., 12 years. This is where a regression model comes in handy. Let's say you plot a graph of developer salaries, with salary on the Y axis and years of experience on the X axis. So you're treating years of experience as the independent variable and salary as the dependent variable. This is what the graph looks like if you give it a whole bunch of training data, i.e., clearly labeled data. Each data point has a salary and the years of experience of the developer earning that salary.

In most cases, even with zero years of experience, you're not going to start out with a salary of zero, because even interns get paid, right? So that first data point is maybe the standard entry-level salary. But as you can see with your own eyes, as the years of experience increase, you generally see an increase in salary. Now, if I asked you to draw the line that best fits this particular data set, you could probably do it quite easily.
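Drawing the line of best fit and reading off a value can be sketched with ordinary least squares. This is one common meaning of "best fit": it minimizes the squared vertical distances from the points to the line. The lesson itself doesn't say which method is used. The slope is Σ(x − mean_x)(y − mean_y) / Σ(x − mean_x)², and the intercept is mean_y − slope × mean_x. The salary figures are invented for illustration and are not the indeed.com data mentioned above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up labeled training data (not from any real salary survey).
years_experience = [0, 2, 4, 6, 8, 10]
salaries = [51_000, 59_000, 70_000, 80_000, 89_000, 101_000]

slope, intercept = fit_line(years_experience, salaries)

# Go up from x = 12 years to the line, then read across to the salary axis.
predicted = slope * 12 + intercept
print(f"{predicted:,.0f}")  # 110,000
```

Note that the intercept is not zero, which matches the point that even entry-level developers earn a salary. The model also returns an exact number for 12 years rather than a camp.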
173 174 00:13:12,620 --> 00:13:16,220 And similarly, this is what we want our regression model to do, 174 175 00:13:16,220 --> 00:13:22,120 we wanted to draw a line through the data that best fits this particular dataset. 175 176 00:13:22,250 --> 00:13:29,630 And now that we have our regression model, we're able to go and go into the X axis and look at where 176 177 00:13:29,800 --> 00:13:33,130 12 years of experience will hit the line at. 177 178 00:13:33,200 --> 00:13:41,120 And then if we extend it to the Y axis, we can see based on our training data what amount of salary our 178 179 00:13:41,120 --> 00:13:45,380 friend, who has 12 years of developer experience, should be earning. 179 180 00:13:45,380 --> 00:13:49,490 So this is a very, very simple representation of a regression model. 180 181 00:13:49,490 --> 00:13:54,440 Now, of course, there's other types of supervised learning models, but the most common ones that you will 181 182 00:13:54,440 --> 00:14:00,590 see and the most common ones are used commercially are regression and classification, both of which CoreML 182 183 00:14:00,590 --> 00:14:05,810 is able to do. Now, in the next lesson, we're going to talk about another type of machine learning 183 184 00:14:05,840 --> 00:14:08,190 which is unsupervised learning. 184 185 00:14:08,330 --> 00:14:09,230 So I'll see you there.