All right. So far, we've covered supervised learning, which is where you feed the computational model labeled training data that the computer can understand and use to classify or perform regressions. The next type we talked about was unsupervised learning. This is where you feed the computational model a whole bunch of data, and the computer tries to make sense of that data and give it structure, be it using clustering or be it using another framework. Now, we're going to talk about reinforcement learning.

You can see where I'm going with this school analogy. When I was in school, growing up in China, I had a lot of reinforcement learning. From memory, the various things that were used to reinforce learning included chalk, the chalkboard duster, as well as one teacher who decided to throw her high heels at me because I refused to memorize a poem at age 7. I said, "Well, it's in the book. Why don't you just read it in the book? Why do I have to memorize it?" So clearly, reinforcement learning works very well on me, and it also works very well on machines.

So enough about me. What exactly is reinforcement learning? Well, if you think about it, as humans, we tend to learn through reinforcement, right?
If you touch something that's hot and it burns and it hurts you, then you're going to learn not to do that in the future, right? So that's a form of negative reward. It's a way of punishing particular behaviors that would probably lead to the demise of our species. Imagine if none of us had pain receptors: we would run around hurting ourselves in all sorts of creative ways, and the human species wouldn't have lasted long enough for me to talk to you about reinforcement learning.

So that's a form of negative reward, but there are also positive rewards. If you do something well, then maybe you'll be rewarded by the teacher or by your parents. And if you look at dog training, you can see that one of the most effective ways of getting a dog to do something is by rewarding it. For example, if you tell it to sit and it does sit, then you give it a treat, and it knows, "Oh, I did the right thing. I did the requested thing. I'm on the right track."

So this is reinforcement learning. And one of the most famous applications of reinforcement learning is with games like chess. So let's say that the guy on the right is a machine that is using a reinforcement learning algorithm.
Even though he's making moves continuously, he's probably only getting that reinforcement at the end of the game, when he finds out whether he won or lost. However, he can continuously calculate his probability of winning. So if he makes a particular move and it increases his probability of winning, then that's a positive reinforcement for that move. But if the opponent then counters that move very easily and it reduces his win probability, then that is negative reinforcement. So through many, many cycles of training, through practicing many, many games, computers are able to learn on an ongoing basis which moves in which situations are more likely to lead to an increased probability of winning, i.e., an increased probability of getting that reward.

So one of the real-life applications of reinforcement learning is Google DeepMind's AlphaGo. This is a machine learning program that actually uses many different types of deep learning, not just reinforcement learning. If you haven't heard the news: recently, AlphaGo won three games out of three against the world champion in Go.
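The chess idea above (scoring each move by the change in estimated win probability, then averaging that signal over many games) can be sketched in a few lines of Python. This is a toy illustration, not an actual engine: the move names and probabilities are invented for the example.

```python
# Toy sketch of reward shaping via win probability: each move is scored by
# how much it changed the estimated chance of winning, so the learner gets
# a signal on every move instead of waiting for the win/loss at the end.

def move_reinforcement(p_before, p_after):
    # Positive if the move raised the estimated win probability,
    # negative if the opponent's easy counter lowered it.
    return p_after - p_before

# Hypothetical observations gathered over practice games:
# (move, win probability before the move, win probability after the reply)
observations = [
    ("develop knight", 0.50, 0.56),
    ("early queen raid", 0.50, 0.41),  # easily countered -> negative signal
    ("develop knight", 0.52, 0.57),
    ("early queen raid", 0.49, 0.43),
]

values = {}  # running average reinforcement per move
counts = {}
for move, before, after in observations:
    r = move_reinforcement(before, after)
    counts[move] = counts.get(move, 0) + 1
    values.setdefault(move, 0.0)
    values[move] += (r - values[move]) / counts[move]  # incremental mean

best = max(values, key=lambda m: values[m])
print(best)  # the move whose average reinforcement is highest
```

Over many such cycles, the averages converge on which moves, in which situations, tend to raise the probability of eventually collecting the reward.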
So for those of you who don't know, Go is this incredibly simple game where you have only black and white pieces, and the aim is to surround your opponent's pieces with your own; when you surround them, you're able to capture their pieces. So it's an incredibly simple game in terms of rules, but the number of possible positions on a 19 by 19 board is actually larger than the number of atoms in the observable universe. So it is one of the most complex strategy games known to and played by man. And recently, AlphaGo, which is based on machine learning algorithms developed by Google's DeepMind, managed to beat the world's number one Go player.

So I love this image, because it says, "The future of Go." But it probably should be renamed "The future of mankind," where man sits there looking puzzled as the singularity takes over and we become some sort of minor race ruled by computers. So here's another great reason for understanding machine learning: at least you might have a chance of standing up to our machine overlords if you understand at least a bit about how machine learning works.
So the point I'm making is that AlphaGo is an artificially intelligent program that uses machine learning, more specifically various forms of deep learning including reinforcement learning, to evaluate each and every move based on how likely it is to improve or worsen its chances of winning the game. One of the really interesting things about AlphaGo is that it's been trained on thousands of historical Go games, more games than a human could possibly hold in memory or play in a lifetime. And through reinforcement learning, it's able to figure out which moves, under which conditions, will confer an advantage in winning.

And the really interesting thing is that it's programmed to win under a binary condition, i.e., win or lose. It's trying to optimize for the winning condition, but it's not trying to optimize for winning by the largest margin. So when it's doing really well in a game, it won't necessarily try to beat you into the ground; it's only aiming for that final win condition.

So this is a good point to say that machine learning has so many applications in the real world, and it's going to become increasingly important in software development. We've spoken about some of the most common types of machine learning, and it's really, really awesome that Apple is bringing it into the iOS world using CoreML.
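That binary win condition can be sketched very simply. This is an assumed, simplified model of the objective described above, not AlphaGo's actual code, and the scores, win probabilities, and margins are made up for illustration:

```python
# The reward is 1 for a win and 0 for a loss, regardless of margin, so a
# learner maximizing expected reward prefers a near-certain half-point win
# over a riskier blowout.

def binary_reward(my_score, opponent_score):
    return 1.0 if my_score > opponent_score else 0.0

# Margin never enters the reward: both of these wins are worth the same.
assert binary_reward(180.5, 180.0) == binary_reward(250.0, 110.0) == 1.0

def expected_reward(win_prob):
    # Expected binary reward = P(win) * 1 + P(lose) * 0.
    return win_prob

safe = {"name": "safe", "win_prob": 0.95, "margin": 0.5}          # small, near-certain win
aggressive = {"name": "aggressive", "win_prob": 0.70, "margin": 30.0}  # big, risky win

best = max([safe, aggressive], key=lambda m: expected_reward(m["win_prob"]))
print(best["name"])  # the margin plays no part in the choice
```

This is why a program with that objective won't grind you into the ground when it's ahead: running up the score adds nothing to its expected reward.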
So without further ado, I think we should get into an example and get stuck into implementing CoreML in our very own app that we're going to build from scratch. In the next module, I'm going to introduce you to all the tools and all the things that you need to download in order to get CoreML to work, and we're gonna get started creating our image recognition Hotdog or Not Hotdog app. So I'll see you there.