1 00:00:00,300 --> 00:00:00,700 All right. 2 00:00:00,700 --> 00:00:03,500 So for us, we don't have a training set or a test set here. 3 00:00:03,500 --> 00:00:06,200 So I'm going to remove this line of code. 4 00:00:06,200 --> 00:00:09,166 I'm going to remove that as well. 5 00:00:09,166 --> 00:00:13,066 All the index selections, and here as well. 6 00:00:13,066 --> 00:00:15,133 And I'm just keeping X. 7 00:00:15,133 --> 00:00:16,500 And then let's see what we have to do. 8 00:00:16,500 --> 00:00:16,766 All right. 9 00:00:16,766 --> 00:00:19,766 So first, that's already cleaner. 10 00:00:19,800 --> 00:00:21,766 Now, with what we have here, 11 00:00:21,766 --> 00:00:25,500 these three lines of code, we are indeed applying feature 12 00:00:25,500 --> 00:00:29,700 scaling to the matrix of features X, which is one of the things we have to do. 13 00:00:29,700 --> 00:00:31,033 Indeed. That's good. 14 00:00:31,033 --> 00:00:34,800 However, as we've just explained, we also have to scale 15 00:00:34,866 --> 00:00:38,700 the dependent variable vector y, the salaries, and of course, 16 00:00:38,700 --> 00:00:41,700 and that's the important thing you had to understand and figure out, 17 00:00:41,800 --> 00:00:45,633 we're not going to use the same StandardScaler object 18 00:00:45,800 --> 00:00:49,833 on both the matrix of features X and the dependent variable vector y. 19 00:00:50,066 --> 00:00:50,866 Why is that? 20 00:00:50,866 --> 00:00:54,800 That's because when you fit your object sc 21 00:00:54,966 --> 00:00:57,866 on your data, it is going to compute 22 00:00:57,866 --> 00:01:01,100 the mean and the standard deviation of that same variable. 
23 00:01:01,266 --> 00:01:04,333 And therefore, since of course we don't have the same mean and the same 24 00:01:04,333 --> 00:01:08,666 standard deviation for our levels here and our salaries, obviously 25 00:01:08,700 --> 00:01:13,766 we have to create two StandardScaler objects: one that will be fitted to X 26 00:01:13,766 --> 00:01:16,833 in order to compute the mean and standard deviation of the position levels, 27 00:01:17,033 --> 00:01:19,833 and one that will be fitted to y to indeed 28 00:01:19,833 --> 00:01:22,866 compute the mean and the standard deviation of the salaries. 29 00:01:23,033 --> 00:01:23,366 All right. 30 00:01:23,366 --> 00:01:26,400 So that was the only important thing to understand. 31 00:01:26,500 --> 00:01:31,200 And therefore here I'm actually going to call this object sc_X 32 00:01:31,533 --> 00:01:34,900 in order to say that it's the scaler of the matrix of features X. 33 00:01:35,233 --> 00:01:38,233 And here sc_X as well. 34 00:01:38,333 --> 00:01:41,766 And now I'm going to create a new StandardScaler object, 35 00:01:42,000 --> 00:01:45,900 the one that will be used on our dependent variable vector y. 36 00:01:46,166 --> 00:01:52,233 And therefore here I'm replacing sc_X by what we're going to call sc_y, 37 00:01:52,233 --> 00:01:55,500 so that it's perfectly clear that this is the scaler of X 38 00:01:55,500 --> 00:01:57,033 and this is the scaler of y. 39 00:01:57,033 --> 00:02:02,566 And now we're going to copy this and paste it right below. 40 00:02:02,833 --> 00:02:06,266 And here we'll just have three little replacements to do, 41 00:02:06,333 --> 00:02:09,333 which are first replacing X here by y, 42 00:02:09,433 --> 00:02:15,133 then sc_X here by sc_y, and X here by y. 43 00:02:15,433 --> 00:02:18,433 So that now we are ready to scale both 44 00:02:18,433 --> 00:02:22,333 our matrix of features X and our dependent variable vector y. 45 00:02:22,466 --> 00:02:25,466 Let's check it out. Let's run this cell. 
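For readers following along without the notebook, the cell described above might look roughly like the sketch below. It is not the exact code on screen. The names sc_X and sc_y follow the narration, but the data is partly assumed: the levels 1 to 10 and the two salary endpoints (45,000 and 1,000,000) come from the video, while the salaries in between are filled in only for illustration. The scaled results are kept in separate variables (X_scaled, y_scaled, both hypothetical names) rather than overwriting X and y.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Position levels 1..10 as a one-column matrix of features.
X = np.arange(1, 11, dtype=float).reshape(-1, 1)

# Salaries: only the first and last values are stated in the video;
# the values in between are assumed for this sketch.
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000], dtype=float)
y = y.reshape(-1, 1)  # StandardScaler expects a 2D array

# One scaler per variable, because each fit stores its own mean and std.
sc_X = StandardScaler()
sc_y = StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y)

# The fitted statistics differ, which is why one shared scaler would not work.
print("levels:   mean =", sc_X.mean_[0], " std =", sc_X.scale_[0])
print("salaries: mean =", sc_y.mean_[0], " std =", sc_y.scale_[0])
```

Keeping two fitted objects also means each one remembers the statistics of its own variable, so they never get mixed up.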
46 00:02:25,500 --> 00:02:28,900 And now we're going to add two new code cells 47 00:02:29,066 --> 00:02:32,866 to print the new X and the new y and see what they have become. 48 00:02:33,000 --> 00:02:35,533 All right, so two new code cells. 49 00:02:35,533 --> 00:02:39,300 And let's start here with a print of X. 50 00:02:39,800 --> 00:02:40,600 Good. 51 00:02:40,600 --> 00:02:43,600 And then a print of y. 52 00:02:43,866 --> 00:02:48,000 And now let's run these two cells here, starting with print(X). 53 00:02:48,533 --> 00:02:51,833 And indeed, we have some perfectly scaled values 54 00:02:51,833 --> 00:02:55,233 of the position levels, going from -1.5, 55 00:02:55,233 --> 00:02:57,600 and that corresponds of course to position level number 56 00:02:57,600 --> 00:03:01,933 one, to 1.56, which corresponds to position level number ten. 57 00:03:02,466 --> 00:03:02,800 All right. 58 00:03:02,800 --> 00:03:04,800 So now let's print y. 59 00:03:04,800 --> 00:03:07,033 And this will be interesting. 60 00:03:07,033 --> 00:03:11,000 Here the values go from -0.7, which corresponds 61 00:03:11,000 --> 00:03:15,433 of course to the salary of 45,000 per year, to 2.64, 62 00:03:15,433 --> 00:03:17,833 which corresponds to the $1 million salary. 63 00:03:17,833 --> 00:03:19,166 And as you can see here, this time 64 00:03:19,166 --> 00:03:23,000 the values are in the range from minus one to plus three. 65 00:03:23,000 --> 00:03:27,400 That's why in the data preprocessing part I told you that usually standardization 66 00:03:27,600 --> 00:03:30,800 transforms your values to between minus three and plus three.
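To see where the printed ranges come from, the standardization can also be reproduced by hand as (value - mean) / standard deviation. The sketch below uses NumPy only; the helper name standardize is hypothetical, and the salaries between the two endpoints mentioned in the video are again assumed for illustration.

```python
import numpy as np

# Position levels 1..10 (from the video).
levels = np.arange(1, 11, dtype=float)

# Salaries: endpoints from the video, intermediate values assumed.
salaries = np.array([45000, 50000, 60000, 80000, 110000,
                     150000, 200000, 300000, 500000, 1000000], dtype=float)

def standardize(values):
    """Return (values - mean) / std, using the population standard deviation."""
    return (values - values.mean()) / values.std()

levels_z = standardize(levels)
salaries_z = standardize(salaries)

# Evenly spread levels give a symmetric range, about -1.57 to 1.57.
print(levels_z.min(), levels_z.max())
# Salaries are skewed by the top value, giving about -0.72 to 2.64.
print(salaries_z.min(), salaries_z.max())
```

Under these assumed salaries, the lopsided range for y comes from the single very large salary pulling the mean upward, so most values sit slightly below zero and the top one sits far above it.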