All right. So far, we've covered supervised learning, which is where you feed the computational model labeled training data that the computer can understand and use to classify or perform regressions. The next type we talked about was unsupervised learning. This is where you feed the computational model a whole bunch of data, and the computer tries to make sense of that data and give it structure, be it using clustering or be it using another framework. Now, we're going to talk about reinforcement learning.

You can see where I'm going with this school analogy. When I was in school, growing up in China, I had a lot of reinforcement learning. From memory, the various things that were used to reinforce learning included chalk, the chalkboard duster, as well as one teacher who decided to throw her high heels at me because I refused to memorize a poem at age 7. I said, "Well, it's in the book. Why don't you just read it in the book? Why do I have to memorize it?" So clearly, reinforcement learning works very well on me, and it also works very well on machines.

So enough about me. What exactly is reinforcement learning? Well, if you think about it, as humans, we tend to learn through reinforcement, right?
If you touch something that's hot and it burns and it hurts you, then you're going to learn not to do that in the future, right? So that's a form of negative reward. It's a way of punishing particular behaviors that would probably lead to the demise of our species. Imagine if none of us had pain receptors: we would run around hurting ourselves in all sorts of creative ways, and the human species wouldn't have lasted long enough for me to talk to you about reinforcement learning.

So that's a form of negative reward, but there are also positive rewards. If you do something well, then maybe you'll be rewarded by the teacher or by your parents. And if you look at dog training, you can see that one of the most effective ways of getting a dog to do something is by rewarding it. For example, if you tell it to sit and it does sit, then you give it a treat, and it knows, "Oh, I did the right thing. I did the requested thing. I'm on the right track."

So this is reinforcement learning. And one of the most famous applications of reinforcement learning is with games like chess. So let's say that the guy on the right is a machine that is using a reinforcement learning algorithm.
Even though he's making moves continuously, he's probably only getting that reinforcement at the end of the game, when he finds out whether he won or lost. However, he can continuously calculate his probability of winning. So if he makes a particular move and it increases his probability of winning, then that's a positive reinforcement for that move. But if the opponent then counters that move very easily and it reduces his win probability, then that is negative reinforcement. So through many, many cycles of training, through practicing many, many games, computers are able to learn on an ongoing basis which moves in which situations are more likely to lead to an increased probability of winning, i.e., an increased probability of getting that reward.

So one of the real-life applications of reinforcement learning is Google DeepMind's AlphaGo. This is a machine learning program that actually uses many different types of deep learning, not just reinforcement learning. If you haven't heard the news: recently, AlphaGo won three games out of three against the world champion in Go.
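The chess idea above (scoring each move by the change in estimated win probability, then averaging that signal over many games) can be sketched in a few lines of Python. This is a toy illustration, not an actual engine: the move names and probabilities are invented for the example.

```python
# Toy sketch of reward shaping via win probability: each move is scored by
# how much it changed the estimated chance of winning, so the learner gets
# a signal on every move instead of waiting for the win/loss at the end.

def move_reinforcement(p_before, p_after):
    # Positive if the move raised the estimated win probability,
    # negative if the opponent's easy counter lowered it.
    return p_after - p_before

# Hypothetical observations gathered over practice games:
# (move, win probability before the move, win probability after the reply)
observations = [
    ("develop knight", 0.50, 0.56),
    ("early queen raid", 0.50, 0.41),  # easily countered -> negative signal
    ("develop knight", 0.52, 0.57),
    ("early queen raid", 0.49, 0.43),
]

values = {}  # running average reinforcement per move
counts = {}
for move, before, after in observations:
    r = move_reinforcement(before, after)
    counts[move] = counts.get(move, 0) + 1
    values.setdefault(move, 0.0)
    values[move] += (r - values[move]) / counts[move]  # incremental mean

best = max(values, key=lambda m: values[m])
print(best)  # the move whose average reinforcement is highest
```

Over many such cycles, the averages converge on which moves, in which situations, tend to raise the probability of eventually collecting the reward.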
So for those of you who don't know, Go is this incredibly simple game where you have only black and white pieces, and the aim is to surround your opponent's pieces with your own; when you surround them, you're able to capture their pieces. So it's an incredibly simple game in terms of rules, but the number of possible positions on a 19 by 19 board is actually larger than the number of atoms in the observable universe. So it is one of the most complex strategy games known to and played by man. And recently, AlphaGo, which is based on machine learning algorithms developed by Google's DeepMind, managed to beat the world's number one Go player.

So I love this image, because it says, "The future of Go." But it probably should be renamed "The future of mankind," where man sits there looking puzzled as the singularity takes over and we become some sort of minor race ruled by computers. So here's another great reason for understanding machine learning: at least you might have a chance of standing up to our machine overlords if you understand at least a bit about how machine learning works.
So the point I'm making is that AlphaGo is an artificially intelligent program that uses machine learning, more specifically various forms of deep learning including reinforcement learning, to evaluate each and every move based on how likely it is to improve or worsen its chances of winning the game. One of the really interesting things about AlphaGo is that it's been trained on thousands of historical Go games, more games than a human could possibly hold in memory or play in a lifetime. And through reinforcement learning, it's able to figure out which moves, under which conditions, will confer an advantage in winning.

And the really interesting thing is that it's programmed to win under a binary condition, i.e., win or lose. It's trying to optimize for the winning condition, but it's not trying to optimize for winning by the largest margin. So when it's doing really well in a game, it won't necessarily try to beat you into the ground; it's only aiming for that final win condition.

So this is a good point to say that machine learning has so many applications in the real world, and it's going to become increasingly important in software development. We've spoken about some of the most common types of machine learning, and it's really, really awesome that Apple is bringing it into the iOS world using CoreML.
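That binary win condition can be sketched very simply. This is an assumed, simplified model of the objective described above, not AlphaGo's actual code, and the scores, win probabilities, and margins are made up for illustration:

```python
# The reward is 1 for a win and 0 for a loss, regardless of margin, so a
# learner maximizing expected reward prefers a near-certain half-point win
# over a riskier blowout.

def binary_reward(my_score, opponent_score):
    return 1.0 if my_score > opponent_score else 0.0

# Margin never enters the reward: both of these wins are worth the same.
assert binary_reward(180.5, 180.0) == binary_reward(250.0, 110.0) == 1.0

def expected_reward(win_prob):
    # Expected binary reward = P(win) * 1 + P(lose) * 0.
    return win_prob

safe = {"name": "safe", "win_prob": 0.95, "margin": 0.5}          # small, near-certain win
aggressive = {"name": "aggressive", "win_prob": 0.70, "margin": 30.0}  # big, risky win

best = max([safe, aggressive], key=lambda m: expected_reward(m["win_prob"]))
print(best["name"])  # the margin plays no part in the choice
```

This is why a program with that objective won't grind you into the ground when it's ahead: running up the score adds nothing to its expected reward.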
So without further ado, I think we should get into an example and get stuck into implementing CoreML in our very own app that we're going to build from scratch. In the next module, I'm going to introduce you to all the tools and all the things that you need to download in order to get CoreML to work, and we're gonna get started creating our image recognition Hotdog or Not Hotdog app. So I'll see you there.