So how does machine learning actually happen in practice? I mean, it can't just be magic, right?

In this lesson, you and I are going to establish a useful framework for thinking about machine learning techniques. This is going to be our basis for thinking about the gradient descent algorithm.

At the most basic level, what we're doing is feeding a whole bunch of data into a computer, and it gives us back some solution, some answer. The thing that our computer is actually learning is the relationship in the data. So how is it that we can feed a whole bunch of data into our Python program, and our program spits out a function that describes the relationship in this data? What are the steps involved in how our machine learns this mathematical function?

In a very simple linear regression example, our machine learning program has had to learn the theta zero (in orange) and theta one (in green) parameters in this equation, and it learns them from the data points it was given. You can keep this example in mind as I describe the framework, but the framework goes well beyond regression, because many machine learning techniques follow pretty much the same three-step process to arrive at their solution.

And here it is. Step one is to make a prediction. Predict what exactly?
Well, the coefficients in our function, for example theta zero and theta one. Our machine is learning a function, so it has to start by predicting the coefficients in that function. Now, the very first time this happens, the prediction is pretty much a completely random guess.

So let's move on to step two. After making the prediction, step two is calculating the error. In other words, we need to measure how good the prediction was: we work out how far off we were from the data, which is why we calculate the size of our error.

And step three is the learning step. This is where we adjust our initial prediction, and this is the crucial part, right? First we made a prediction. Second, we compared our prediction to the data. Now it's time to learn from our mistakes. Having figured out how far off we were in the previous step, we can now make a change to the coefficients.

But we're not done just yet. This was only the first run through. At this point, we go back to step one and make a new prediction, and this new prediction uses our modified coefficients.
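For the simple linear regression example, one run through the three steps can be written out as below. This is a sketch under assumptions: the lesson has not yet named an error measure or an update rule, so the mean squared error J and the gradient descent update with a learning rate alpha are choices made here, with (x_i, y_i) standing for the n data points.

```latex
\begin{aligned}
&\text{Step 1 (predict): guess } \theta_0, \theta_1, \text{ giving } \hat{y}_i = \theta_0 + \theta_1 x_i \\
&\text{Step 2 (error): } J(\theta_0, \theta_1) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \\
&\text{Step 3 (adjust): } \theta_j \leftarrow \theta_j - \alpha \, \frac{\partial J}{\partial \theta_j}, \text{ which for this } J \text{ gives} \\
&\qquad \theta_0 \leftarrow \theta_0 + \alpha \, \frac{2}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right), \qquad
\theta_1 \leftarrow \theta_1 + \alpha \, \frac{2}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right) x_i
\end{aligned}
```

Going back to step one with the updated theta values and repeating these three lines is the loop the lesson describes.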
Using this new prediction, we once again calculate how badly we did by calculating the error. Hopefully this time round the error is smaller than the first time round. Having measured the error, we adjust our prediction once again, and then rinse and repeat.

So, in summary, there are three steps. Number one is to predict, or infer, the theta values of the function. Number two is to calculate the error: measure how far off our prediction was from the data. And step three is to make an adjustment so that the error is smaller the next time round, slowly learning the best coefficients. This is the learning process, and when we write our Python code in this module, this is how we can think about training our machine learning model.

Now, there is actually a name for the kind of step-by-step approach we just described. It's called an algorithm. An algorithm is a set of instructions for solving a problem. The Cambridge Dictionary defines an algorithm as "a set of mathematical instructions or rules that, especially if given to a computer, will help calculate an answer to a problem."

You know, the thing is, you and I are probably more familiar with a different usage of this word, from sentences like "My app uses an algorithm to predict if fans of one particular band will also like music from another band."
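Written as a set of instructions, the predict, measure, adjust loop summarized above might look like the following minimal Python sketch for simple linear regression. It is an illustration under assumptions, not the course's own code: the mean squared error as the error measure, plain gradient descent as the adjustment rule, the learning rate, the iteration count, the example data, and the helper name `fit_line` are all hypothetical choices made for this example.

```python
import random


def fit_line(xs, ys, learning_rate=0.01, iterations=5000, seed=0):
    """Hypothetical helper: learn theta0 and theta1 in y = theta0 + theta1 * x.

    Assumes mean squared error as the error measure and plain gradient
    descent as the adjustment rule; both are illustrative choices.
    Returns the learned coefficients and the error measured on each pass.
    """
    n = len(xs)
    rng = random.Random(seed)
    # The very first guess at the coefficients is essentially random.
    theta0 = rng.uniform(-1.0, 1.0)
    theta1 = rng.uniform(-1.0, 1.0)
    errors = []

    for _ in range(iterations):
        # Step 1: predict, using the current coefficients.
        predictions = [theta0 + theta1 * x for x in xs]

        # Step 2: calculate the error (mean squared error) and record it.
        residuals = [y - p for y, p in zip(ys, predictions)]
        errors.append(sum(r * r for r in residuals) / n)

        # Step 3: adjust each coefficient against its error gradient.
        grad0 = -2.0 / n * sum(residuals)
        grad1 = -2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1

    return theta0, theta1, errors


# Made-up example data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
theta0, theta1, errors = fit_line(xs, ys)
print(round(theta0, 2), round(theta1, 2))  # prints: 1.0 2.0
print(errors[0] > errors[-1])  # prints: True
```

Each pass through the `for` loop is one run through steps one to three, and the recorded errors shrink from pass to pass, which is the "hopefully this time round the error is smaller" part of the lesson. A learning rate that is too large can make the error grow instead, so the value used here is only tuned for this toy data.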
So it's perfectly understandable that most people think the word algorithm is just something programmers say when they don't want to explain what they did.

Before moving on to the next lesson, I'm going to leave you with a fun fact. The word algorithm actually gets its name from a guy: Mohammed Ibn Musa Al-Khwarizmi. Al-Khwarizmi, algorithm. Now, I probably didn't pronounce that right, but back in 825, about twelve hundred years ago, this guy wrote a best-selling book on mathematics, and the Latin translators in the Middle Ages did an even worse job than I did at pronouncing his Persian name. So that's how we got stuck with the word algorithm.

Anyhow, on that bombshell, I'll see you in the next lesson. Take care.