0
1
00:00:00,580 --> 00:00:06,840
Okay. So firstly, let's talk about Supervised Learning. And supervised learning really is like having a teacher
1

2
00:00:06,840 --> 00:00:12,240
there, having a trainer there, who's guiding you step by step, telling you, "That was right, that was wrong,"
2

3
00:00:12,570 --> 00:00:13,830
or "This is a flower.
3

4
00:00:13,830 --> 00:00:14,850
This is a plant."
4

5
00:00:14,910 --> 00:00:18,360
It's a sort of handholding way of teaching the computer what to do.
5

6
00:00:18,360 --> 00:00:24,750
So one of the most famous examples of supervised learning is training in computer to recognize a cat,
6

7
00:00:24,750 --> 00:00:25,020
right?
7

8
00:00:25,050 --> 00:00:28,480
So we would say to the computer, "This is a cat.
8

9
00:00:28,620 --> 00:00:29,450
This is a cat.
9

10
00:00:29,460 --> 00:00:30,050
This is cat.
10

11
00:00:30,060 --> 00:00:31,920
This is a cat and this is a cat."
11

12
00:00:31,980 --> 00:00:37,590
And, of course, as humans, it's very easy for us to look at any image and identify, you know, whether it's
12

13
00:00:37,590 --> 00:00:40,340
a cat or a dog, or any other sort of animal.
13

14
00:00:40,350 --> 00:00:44,400
But, essentially, we kind of went through this training process when we were young as well,
14

15
00:00:44,400 --> 00:00:44,840
right?
15

16
00:00:44,970 --> 00:00:50,430
When we were babies and we looked at this strange fluffy animal, you know, maybe one of our parents said,
16

17
00:00:50,490 --> 00:00:54,310
That is a cat," or maybe your teacher said, "That is a dog."
17

18
00:00:54,390 --> 00:00:57,810
And over time, we learned pattern recognition.
18

19
00:00:57,810 --> 00:01:04,200
We understood how to classify different animals based on repeated exposure.
19

20
00:01:04,290 --> 00:01:07,520
And this is exactly what we do for machine vision.
20

21
00:01:07,560 --> 00:01:11,390
So we feed the computer loads and loads of cat images.
21

22
00:01:11,490 --> 00:01:17,580
And if there's one thing that the Internet is not short on, it's cat images. And every single time we
22

23
00:01:17,580 --> 00:01:23,050
show it, this cat image, the cat image also has a label saying, "This is a cat."
23

24
00:01:23,070 --> 00:01:28,500
So every single time the computer sees this particular blend of pixels, it'll be taught that that is
24

25
00:01:28,650 --> 00:01:29,280
a cat.
25

26
00:01:29,550 --> 00:01:34,020
So this is why it's supervised. The training data is always labeled.
26

27
00:01:34,080 --> 00:01:39,810
So what we're hoping for is after this repeated exposure where the machine learns that this is the rough
27

28
00:01:39,870 --> 00:01:46,590
outline of what a cat looks like, then the next time when we ask the computer and feed it a piece of
28

29
00:01:46,590 --> 00:01:52,710
data that's unlabeled saying, "What is this?" The machine learning models should be able to identify that
29

30
00:01:52,710 --> 00:01:58,290
it has a lot of features that are very similar to the previous images of cat and it should spit out
30

31
00:01:58,290 --> 00:01:59,210
the answer:
31

32
00:01:59,220 --> 00:02:02,280
"This is a cat," even though it looks like a burrito.
32

33
00:02:02,580 --> 00:02:07,680
So when you're training that machine learning model, you're essentially presenting it with loads of images,
33

34
00:02:08,040 --> 00:02:10,890
and each image comes with a label.
34

35
00:02:10,890 --> 00:02:16,970
So the data is clearly labeled and gets fed into the machine learning model.
35

36
00:02:17,010 --> 00:02:18,620
So this is a cat.
36

37
00:02:18,720 --> 00:02:21,500
This is a dog and this is a cow.
37

38
00:02:21,510 --> 00:02:27,150
Now, you're going to do this for lots of different types of cats, dogs, and cows,
38

39
00:02:27,150 --> 00:02:32,730
so that you cover all the different breeds, and all the different sizes, or lighting conditions, et cetera.
39

40
00:02:32,730 --> 00:02:39,750
And the machine learns through this experience and begins to classify these images into their respective
40

41
00:02:39,750 --> 00:02:40,440
groups.
41

42
00:02:40,440 --> 00:02:46,200
So, in this case, the model is the thing that does the learning and the data that you feed it is called
42

43
00:02:46,200 --> 00:02:47,490
the training data.
43

44
00:02:47,490 --> 00:02:54,750
Now, once you've completed your training, at a later stage, you should be able to present an image of a
44

45
00:02:54,750 --> 00:02:58,290
dog and that the model has never ever seen before.
45

46
00:02:58,380 --> 00:03:06,280
So this is not a part of the training data. And you ask the model, 'What is this?" based on all of your training.
46

47
00:03:06,300 --> 00:03:14,460
"Can you classify what this image is?" And it should use what it's learned based on the training data
47

48
00:03:14,940 --> 00:03:18,990
and be able to spit out an answer and say, "This is a dog."
48

49
00:03:19,050 --> 00:03:24,600
So the new image that the model has never seen is called the testing data and the result that it spits
49

50
00:03:24,600 --> 00:03:26,560
back out is the output.
50

51
00:03:26,610 --> 00:03:28,830
Now, the output can be in various forms.
51

52
00:03:28,830 --> 00:03:32,730
It could be a word or it could be a move on a chessboard.
52

53
00:03:32,730 --> 00:03:37,050
It really depends on how you trained up the model and what you want it to do.
53

54
00:03:37,320 --> 00:03:45,200
So this is one of the most fundamental types of supervised learning and it's known as classification.
54

55
00:03:45,330 --> 00:03:50,700
So if you imagine, you're trying to teach a computer to differentiate between apples and pears.
55

56
00:03:50,700 --> 00:03:56,910
Now, to us, as humans, this seems like a really simple task. But if you imagine trying to turn this task
56

57
00:03:56,970 --> 00:04:03,600
into a programming exercise where you have to tell the computer what features it had to look for in
57

58
00:04:03,600 --> 00:04:08,910
order to tell the difference between an apple and a pear, it's actually a really, really complex problem.
58

59
00:04:09,330 --> 00:04:15,420
Because you could say, maybe, "Oh, computer, you know if you blur everything up and you look at all the pixels
59

60
00:04:15,450 --> 00:04:22,230
in each image, the apple pictures tend to have more red colors than the pear pictures.
60

61
00:04:22,230 --> 00:04:27,790
So from our experience, we know that pears tend to be more green and apples tend to be more red.
61

62
00:04:27,810 --> 00:04:30,700
But if you designed a program the computer in this way,
62

63
00:04:30,840 --> 00:04:37,320
what happens when you have a green apple, then the computer is probably going to think that's a pear, right?
63

64
00:04:37,320 --> 00:04:43,160
Now, also, what if you have some sort of fruit that it's never ever seen before even though you can write
64

65
00:04:43,170 --> 00:04:49,470
a lot of code and specify all of the unique features of apples when it's compared to a pear.
65

66
00:04:49,680 --> 00:04:55,530
So, say, if you said apples tend to be more round than pears, apples tend to be a little bit redder than
66

67
00:04:55,530 --> 00:04:56,190
pears,
67

68
00:04:56,310 --> 00:05:01,710
and then you try and present the computer with something it's never ever seen before that is an anomaly
68

69
00:05:01,740 --> 00:05:05,070
that doesn't belong in the apple or the pear camp,
69

70
00:05:05,070 --> 00:05:10,890
then it's going to try and use those rules that you defined and try and classify it, and it'll probably
70

71
00:05:10,890 --> 00:05:14,580
classify this as an apple. And through a lot of research,
71

72
00:05:14,580 --> 00:05:20,070
it's been demonstrated that even though we're really, really good at pattern recognition as humans, we
72

73
00:05:20,070 --> 00:05:26,580
can't always pinpoint exactly what it is that makes a certain thing that. I mean,
73

74
00:05:26,580 --> 00:05:32,970
try and think of what makes an apple unique amongst all the other fruits or, indeed, any other item in
74

75
00:05:32,970 --> 00:05:33,760
the world.
75

76
00:05:33,780 --> 00:05:40,050
It's pretty difficult and time consuming to come up with a program with a list of rules that classifies
76

77
00:05:40,050 --> 00:05:44,840
an apple and makes the computer differentiate it from a peach, for example.
77

78
00:05:44,940 --> 00:05:51,240
But in the case of machine learning, you can feed that machine learning model a whole bunch of images
78

79
00:05:51,240 --> 00:05:56,140
of cats, of dogs. of pears, of apples, of anything that you can imagine.
79

80
00:05:56,160 --> 00:06:03,540
And as long as all of those pieces of data are labeled and you give the model enough instances of each
80

81
00:06:03,540 --> 00:06:10,860
and every category, then it should be able to spit out what each of the items were, and be able to classify
81

82
00:06:10,860 --> 00:06:17,130
them based on the features that it's identified. And the nice thing about a lot of these machine learning
82

83
00:06:17,130 --> 00:06:23,820
models is that they're reusable. So you could probably create a generic classifier that looks at handwritten
83

84
00:06:23,820 --> 00:06:27,150
numbers and is able to figure out what those numbers are.
84

85
00:06:27,150 --> 00:06:34,530
So turning it from an image to an integer, for example. But you can use that same generic classifier, and
85

86
00:06:34,530 --> 00:06:40,980
instead of training it on images of handwritten numbers, you could feed it emails that are labeled as
86

87
00:06:40,980 --> 00:06:47,380
spam or not spam and it could classify new emails based on those criteria.
87

88
00:06:47,400 --> 00:06:54,450
So once you've created a good model or a good generic classifier, if you change the training data, you
88

89
00:06:54,450 --> 00:06:58,760
can get it to do different things without having to recode the entire model.
89

90
00:06:58,830 --> 00:07:01,570
And this is one of the advantages of machine learning.
90

91
00:07:01,570 --> 00:07:04,470
Now, let's look at how a machine might do this.
91

92
00:07:04,470 --> 00:07:11,790
So, say, if we have a graph where we have a threshold for emails that should go into the inbox and a threshold
92

93
00:07:11,880 --> 00:07:15,160
for emails that should probably go into the spam folder.
93

94
00:07:15,210 --> 00:07:22,320
When you have a spam filter or an artificially intelligent program that is able to differentiate emails
94

95
00:07:22,320 --> 00:07:26,940
that come in, whether if they should head to the inbox or whether if they should head to the spam box,
95

96
00:07:27,270 --> 00:07:29,100
then it's kind of binary.
96

97
00:07:29,100 --> 00:07:36,630
You can't really have a halfway house, like maybe a decontamination zone or a sort of spam or maybe spam
97

98
00:07:36,630 --> 00:07:37,180
folder,
98

99
00:07:37,200 --> 00:07:37,540
right?
99

100
00:07:37,560 --> 00:07:43,860
So let's say that the decision is only binary when this machine learning model gets fed new testing
100

101
00:07:43,860 --> 00:07:49,920
data in the form of an email, it should be able to scan through the contents and decide whether if it
101

102
00:07:49,920 --> 00:07:54,390
should be sent to the inbox or whether if it should be sent to the spam box.
102

103
00:07:54,390 --> 00:07:56,310
So it's a 1 or 0.
103

104
00:07:56,310 --> 00:08:03,930
Now let's say that we train this model on a whole bunch of emails which are labeled as spam or not spam.
104

105
00:08:03,930 --> 00:08:10,110
Now, let's say that one of the factors that affect whether if an email is likely to be spam is the number
105

106
00:08:10,110 --> 00:08:13,110
of links that are contained in the actual email.
106

107
00:08:13,110 --> 00:08:14,610
And this is a real thing, by the way.
107

108
00:08:14,640 --> 00:08:21,210
If you try and go on the Gmail and you send somebody an email that has a hundred links, you can see which
108

109
00:08:21,210 --> 00:08:26,820
folder it lands in. It's usually not going to be an inbox because it just looks so spammy.
109

110
00:08:26,820 --> 00:08:33,930
So if we plot all of our training data onto this graph, so emails which are labeled spam or not spam,
110

111
00:08:33,990 --> 00:08:39,770
based on the number of links that the email contains, then it might look something like this.
111

112
00:08:39,810 --> 00:08:47,010
Now, the machine learning models job is to try and draw a line that goes through this data, and figures
112

113
00:08:47,010 --> 00:08:49,760
out a threshold for the number of links.
113

114
00:08:49,770 --> 00:08:56,340
So if, say, the number of links was less than five, then it probably is more likely to go into the inbox.
114

115
00:08:56,670 --> 00:09:00,300
And if it's greater than five, it's probably more likely to be spam.
115

116
00:09:00,300 --> 00:09:02,130
So it should go into the spam box.
116

117
00:09:02,130 --> 00:09:06,960
And this particular rule can be given a weight amongst many, many other features.
117

118
00:09:07,020 --> 00:09:13,350
For example, the number of images that are contained in the email or the number of words like buy or
118

119
00:09:13,350 --> 00:09:14,230
sale.
119

120
00:09:14,250 --> 00:09:21,630
So it evaluates a whole bunch of these features within the email and it has different weightings towards
120

121
00:09:21,720 --> 00:09:28,050
each of these features. And using all of that, it's able to decide whether if an email should be going
121

122
00:09:28,050 --> 00:09:33,060
to spam or going into the inbox. And it learns continuously over time.
122

123
00:09:33,390 --> 00:09:39,030
So every single time you mark an email as spam, then you're teaching that machine learning algorithm
123

124
00:09:39,150 --> 00:09:45,570
something new, whether if it's confirming its existing model or giving it new features or giving it new
124

125
00:09:45,570 --> 00:09:50,370
data to work on to be able to predict this with increasing accuracy.
125

126
00:09:50,370 --> 00:09:57,390
Now, the thing to remember is that in supervised learning, we've already spoken about classification problems.
126

127
00:09:57,780 --> 00:10:04,880
And classification problems tend to be applied when you have discrete data, data that fit into specific
127

128
00:10:04,880 --> 00:10:05,270
camp.
128

129
00:10:05,270 --> 00:10:07,030
So for example, your grades,
129

130
00:10:07,040 --> 00:10:07,370
right?
130

131
00:10:07,400 --> 00:10:14,540
You might get an A+, you might get an A, a B, B+, et cetera. You won't get like, you know, a B+ .5, or
131

132
00:10:14,540 --> 00:10:18,610
5, or a B+  .599, instead,
132

133
00:10:18,710 --> 00:10:20,840
that would be called Continuous Data.
133

134
00:10:20,870 --> 00:10:26,420
So if, for example, your height, right? Your height could be anywhere along a ruler and depending on how
134

135
00:10:26,540 --> 00:10:31,880
accurately you decide to measure your height, you know, it could go down to 10 decimal places if you were
135

136
00:10:31,880 --> 00:10:34,790
really that interested in your precise height.
136

137
00:10:35,240 --> 00:10:40,220
So the reason why we differentiate between discrete and continuous data is that when you're working
137

138
00:10:40,220 --> 00:10:46,670
with continuous data with machine learning, you're more likely to be using a process called regression
138

139
00:10:46,940 --> 00:10:48,510
to do your machine learning.
139

140
00:10:48,560 --> 00:10:54,280
So a good example of a regression model is, for example, developer salaries,
140

141
00:10:54,290 --> 00:10:54,560
right?
141

142
00:10:54,680 --> 00:11:01,220
So according to indeed.com, the average salary of a developer, it doesn't say what kind of developer,
142

143
00:11:01,580 --> 00:11:04,480
is about 100,000 per year.
143

144
00:11:04,550 --> 00:11:11,240
And this is based on data from about 1,500 employees over the past 12 months.
144

145
00:11:11,240 --> 00:11:16,580
It also shows you this box depending on the level of experience of the developer.
145

146
00:11:16,580 --> 00:11:18,920
They might get paid less or more.
146

147
00:11:19,190 --> 00:11:22,250
So this is something that we would probably tend to agree with,
147

148
00:11:22,250 --> 00:11:22,910
right?
148

149
00:11:22,970 --> 00:11:27,710
If you've worked for a longer number of years at a particular job, you're probably going to get paid
149

150
00:11:27,710 --> 00:11:28,340
more.
150

151
00:11:28,370 --> 00:11:36,680
Now, say, if you had a friend who came to you and said, "I worked as a developer for 12 years. How much do
151

152
00:11:36,680 --> 00:11:41,530
you think that I'm likely to get paid if I apply for a job right now?"
152

153
00:11:41,540 --> 00:11:46,360
Now, you can given them a rough estimate putting him into one of these camps, right?
153

154
00:11:46,400 --> 00:11:51,980
Maybe 12 years puts you into, you know, your average developer, or maybe if he's worked for 20 years and
154

155
00:11:51,980 --> 00:11:59,420
that puts him into the senior developer camp. But you can't really say for sure exactly what number should
155

156
00:11:59,420 --> 00:12:03,520
be based on his exact input, i.e., 12 years.
156

157
00:12:03,560 --> 00:12:06,820
So this is where a regression learning model comes in handy.
157

158
00:12:07,100 --> 00:12:15,020
Let's say that you plot a graph of developer salaries and on the Y axis, you've got the salary, and on
158

159
00:12:15,020 --> 00:12:18,120
the X axis, you've got the years of experience.
159

160
00:12:18,230 --> 00:12:25,930
So you're looking at the years of experience as an independent variable to the salary of a developer.
160

161
00:12:25,940 --> 00:12:27,800
So this is what that graph looks like
161

162
00:12:27,830 --> 00:12:32,270
if you give it a whole bunch of training data, i.e., clearly label data.
162

163
00:12:32,270 --> 00:12:38,960
So each data point has a salary amount and the years of experience of that developer who is earning
163

164
00:12:38,960 --> 00:12:39,850
that salary.
164

165
00:12:39,860 --> 00:12:44,780
Now, in most cases, even if you have zero years of experience, you're not going to start out with a salary
165

166
00:12:44,780 --> 00:12:45,470
of zero,
166

167
00:12:45,470 --> 00:12:47,220
because even interns get paid,
167

168
00:12:47,220 --> 00:12:47,800
right?
168

169
00:12:47,810 --> 00:12:52,930
So that first data point is maybe the entry-level standard salary.
169

170
00:12:53,120 --> 00:13:00,680
But as you can see, using your human eye, as the years of experience increase, you generally see an increase
170

171
00:13:00,800 --> 00:13:03,340
in the amount of salary they earn.
171

172
00:13:03,350 --> 00:13:10,130
Now, if I told you to try and draw a line through all the data that best fits this particular data set,
172

173
00:13:10,430 --> 00:13:12,620
you might be able to do it quite easily.
173

174
00:13:12,620 --> 00:13:16,220
And similarly, this is what we want our regression model to do,
174

175
00:13:16,220 --> 00:13:22,120
we wanted to draw a line through the data that best fits this particular dataset.
175

176
00:13:22,250 --> 00:13:29,630
And now that we have our regression model, we're able to go and go into the X axis and look at where
176

177
00:13:29,800 --> 00:13:33,130
12 years of experience will hit the line at.
177

178
00:13:33,200 --> 00:13:41,120
And then if we extend it to the Y axis, we can see based on our training data what amount of salary our
178

179
00:13:41,120 --> 00:13:45,380
friend, who has 12 years of developer experience, should be earning.
179

180
00:13:45,380 --> 00:13:49,490
So this is a very, very simple representation of a regression model.
180

181
00:13:49,490 --> 00:13:54,440
Now, of course, there's other types of supervised learning models, but the most common ones that you will
181

182
00:13:54,440 --> 00:14:00,590
see and the most common ones are used commercially are regression and classification, both of which CoreML
182

183
00:14:00,590 --> 00:14:05,810
is able to do. Now, in the next lesson, we're going to talk about another type of machine learning
183

184
00:14:05,840 --> 00:14:08,190
which is unsupervised learning.
184

185
00:14:08,330 --> 00:14:09,230
So I'll see you there.