1 00:00:00,133 --> 00:00:00,733 All right. 2 00:00:00,733 --> 00:00:02,733 Good. So now I'm going to give you the solution. 3 00:00:02,733 --> 00:00:07,100 So first of all we're going to fit our scaler, 4 00:00:07,100 --> 00:00:10,166 you know, our standardization tool, on the training set. 5 00:00:10,166 --> 00:00:14,300 So I'm taking the training set first, X_train. 6 00:00:14,300 --> 00:00:15,466 All right. 7 00:00:15,466 --> 00:00:18,200 And then since I've just explained that we won't 8 00:00:18,200 --> 00:00:21,133 apply feature scaling on the dummy variables, 9 00:00:21,133 --> 00:00:25,200 well, then that means that we will fit our standard scaler object 10 00:00:25,400 --> 00:00:30,066 only on these two columns here containing the ages and the salaries. 11 00:00:30,066 --> 00:00:33,866 And therefore here I'm going to take only the two columns here 12 00:00:33,866 --> 00:00:36,733 for the age and the salary. And then of course all the rows. 13 00:00:36,733 --> 00:00:39,033 And remember the trick: to take all the rows 14 00:00:39,033 --> 00:00:42,100 we just need to add a colon, which means we are taking the range 15 00:00:42,100 --> 00:00:45,100 from the lower bound to the upper bound, meaning everything. 16 00:00:45,133 --> 00:00:47,233 And then to take the columns, well, 17 00:00:47,233 --> 00:00:50,000 be careful, this is what we have to look at 18 00:00:50,000 --> 00:00:54,133 now, because indeed we want to take this column and this one. 19 00:00:54,300 --> 00:00:57,300 And so now the question is what are the indexes of these columns. 20 00:00:57,400 --> 00:01:00,633 Well, remember that indexes in Python start from zero. 21 00:01:00,666 --> 00:01:02,533 So this has index zero. 22 00:01:02,533 --> 00:01:05,333 Then the second column has index one, and the third has index two. 23 00:01:05,333 --> 00:01:08,266 And this one, which is the age column, has index three. 24 00:01:08,266 --> 00:01:11,266 And the salary column has index four.
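To make the column indexes concrete, here is a minimal sketch. It assumes a small made-up feature matrix with three one-hot encoded columns followed by age and salary; the values and the name age_and_salary are illustrative assumptions, not data from the lecture.

```python
import numpy as np

# Hypothetical training matrix (values are made up for illustration):
# columns 0-2 are one-hot dummy variables, column 3 is age, column 4 is salary.
X_train = np.array([
    [1.0, 0.0, 0.0, 44.0, 72000.0],
    [0.0, 1.0, 0.0, 27.0, 48000.0],
    [0.0, 0.0, 1.0, 30.0, 54000.0],
    [0.0, 1.0, 0.0, 38.0, 61000.0],
])

# The lone colon selects every row; the list [3, 4] selects the
# age and salary columns by their zero-based indexes.
age_and_salary = X_train[:, [3, 4]]

print(age_and_salary.shape)  # (4, 2)
```

Listing the indexes explicitly works, but it ties the code to exactly two numerical columns.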
25 00:01:11,400 --> 00:01:16,000 But since I want to make this template as generic as we can, 26 00:01:16,000 --> 00:01:19,566 and since when you one-hot encode your categorical variables, 27 00:01:19,800 --> 00:01:22,633 they always automatically go as the first columns, 28 00:01:22,633 --> 00:01:24,600 well, we're going to do something even better. 29 00:01:24,600 --> 00:01:28,200 We're going to specify the indexes we want here by three, 30 00:01:28,233 --> 00:01:30,400 which is the index of the age column, 31 00:01:30,400 --> 00:01:33,300 and then a simple colon. Right. 32 00:01:33,300 --> 00:01:37,066 Because this will take the range from the column of index three, 33 00:01:37,066 --> 00:01:40,100 which is the age, up to all the other columns. 34 00:01:40,100 --> 00:01:41,600 You know, there is not a minus one here. 35 00:01:41,600 --> 00:01:45,600 So this will take all the remaining columns from the age, meaning the age 36 00:01:45,733 --> 00:01:48,433 and the salary. Basically, this will take these two columns. 37 00:01:48,433 --> 00:01:52,033 And if you have a larger matrix of features with more numerical values 38 00:01:52,033 --> 00:01:54,700 in your features as well, this will just take all of those columns. 39 00:01:54,700 --> 00:01:56,133 All right. So that's a little trick. 40 00:01:56,133 --> 00:01:57,966 More elegant, let's say. 41 00:01:57,966 --> 00:02:02,066 And so now we are of course going to use our object, 42 00:02:02,066 --> 00:02:05,066 which we called sc, from which, 43 00:02:05,133 --> 00:02:09,266 well, we're going to use that fit method that will indeed, 44 00:02:09,266 --> 00:02:13,966 for each feature of X_train, compute the mean of the feature, 45 00:02:14,033 --> 00:02:17,300 meaning the mean of the age, and then the mean of the salary, 46 00:02:17,666 --> 00:02:21,333 and then compute the standard deviation of the feature, the age and the salary. 47 00:02:21,733 --> 00:02:23,700 And that's exactly what the fit method will do.
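The open-ended slice and the fit step described above can be sketched as follows. This assumes scikit-learn's StandardScaler and the same kind of made-up matrix (three dummy columns, then age and salary); the data values are illustrative assumptions only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training matrix (values are made up for illustration):
# columns 0-2 are one-hot dummy variables, column 3 is age, column 4 is salary.
X_train = np.array([
    [1.0, 0.0, 0.0, 44.0, 72000.0],
    [0.0, 1.0, 0.0, 27.0, 48000.0],
    [0.0, 0.0, 1.0, 30.0, 54000.0],
    [0.0, 1.0, 0.0, 38.0, 61000.0],
])

# "3:" has no upper bound, so it takes column 3 and every column after it.
numerical = X_train[:, 3:]

sc = StandardScaler()
sc.fit(numerical)  # only computes statistics; the data is not changed

print(sc.mean_)   # per-column mean (age, salary)
print(sc.scale_)  # per-column standard deviation (age, salary)
```

After fit, the learned statistics live in the mean_ and scale_ attributes; no value in X_train has been modified yet.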
48 00:02:23,700 --> 00:02:28,700 It will only compute the mean and the standard deviation of all the values. 49 00:02:28,866 --> 00:02:34,233 And then you have the transform method that will indeed apply this formula by, 50 00:02:34,233 --> 00:02:38,066 you know, transforming each of the values here of each feature 51 00:02:38,266 --> 00:02:42,000 into the value resulting from this formula. 52 00:02:42,300 --> 00:02:42,633 All right. 53 00:02:42,633 --> 00:02:46,833 So it's important to understand the difference between fit and transform. Fit 54 00:02:46,833 --> 00:02:50,066 will just get the mean and standard deviation of each of your features. 55 00:02:50,266 --> 00:02:54,133 And transform will apply this formula to indeed transform 56 00:02:54,133 --> 00:02:57,133 your values so that they can all be on the same scale. 57 00:02:57,266 --> 00:02:57,966 All right. 58 00:02:57,966 --> 00:03:01,066 And now the good news is that one of the methods 59 00:03:01,066 --> 00:03:04,933 of the StandardScaler class is actually fit_transform, 60 00:03:05,100 --> 00:03:09,400 which of course will do the two steps at the same time, meaning 61 00:03:09,400 --> 00:03:13,333 it will fit your matrix of features to get the mean and standard deviation, 62 00:03:13,500 --> 00:03:15,566 and then right after that, transform 63 00:03:15,566 --> 00:03:19,100 all the values of the features according to this formula. 64 00:03:19,366 --> 00:03:19,933 All right. 65 00:03:19,933 --> 00:03:22,700 So let's call this method right away to make it efficient. 66 00:03:22,700 --> 00:03:28,133 You know, fit underscore transform, then some parentheses. 67 00:03:28,133 --> 00:03:32,233 And now obviously you know what to input inside this fit_transform method.
68 00:03:32,566 --> 00:03:35,966 Well, that's of course exactly the same as X_train here, 69 00:03:36,533 --> 00:03:40,666 because indeed we will only apply feature scaling 70 00:03:40,700 --> 00:03:45,933 to our numerical columns here containing non-integer values. 71 00:03:45,933 --> 00:03:47,866 Right. Non-dummy variable values.
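Putting the pieces together, here is a minimal sketch of calling fit_transform on only the numerical columns. It assumes scikit-learn's StandardScaler and a made-up matrix with three dummy columns followed by age and salary; the names X_scaled and manual are illustrative, and the manual line assumes the formula referred to is the usual standardization, (x - mean) / standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training matrix (values are made up for illustration):
# columns 0-2 are one-hot dummy variables, column 3 is age, column 4 is salary.
X_train = np.array([
    [1.0, 0.0, 0.0, 44.0, 72000.0],
    [0.0, 1.0, 0.0, 27.0, 48000.0],
    [0.0, 0.0, 1.0, 30.0, 54000.0],
    [0.0, 1.0, 0.0, 38.0, 61000.0],
])

sc = StandardScaler()

# Work on a copy so the original matrix stays available for comparison.
X_scaled = X_train.copy()

# fit_transform learns the mean and standard deviation of age and salary,
# then standardizes those two columns. The dummy columns are left untouched.
X_scaled[:, 3:] = sc.fit_transform(X_train[:, 3:])

# The same result computed by hand: (x - mean) / standard deviation.
manual = (X_train[:, 3:] - X_train[:, 3:].mean(axis=0)) / X_train[:, 3:].std(axis=0)

print(np.allclose(X_scaled[:, 3:], manual))
```

Calling fit and then transform on the same slice would give the same scaled values; fit_transform simply chains the two calls.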