1 00:00:00,300 --> 00:00:01,200 So let's do this. 2 00:00:01,200 --> 00:00:04,100 You can't really guess how we're going to do this. 3 00:00:04,100 --> 00:00:06,300 So let's implement this solution together. 4 00:00:06,300 --> 00:00:10,366 The first step is to create an object of the ColumnTransformer class. 5 00:00:10,600 --> 00:00:15,133 And therefore we're going to create a new variable here which I'm going to call ct, 6 00:00:15,400 --> 00:00:19,500 which stands for, you know, the object of the ColumnTransformer class. 7 00:00:19,800 --> 00:00:21,533 So ct equals. 8 00:00:21,533 --> 00:00:25,333 And now of course remember, to create an instance or an object of a class, 9 00:00:25,566 --> 00:00:27,000 you have to call the class itself. 10 00:00:27,000 --> 00:00:29,766 So the ColumnTransformer class. 11 00:00:29,766 --> 00:00:30,900 There we go. 12 00:00:30,900 --> 00:00:33,900 Now since it is a class we have to add some parentheses 13 00:00:33,966 --> 00:00:37,200 inside which we have to enter two arguments. 14 00:00:37,200 --> 00:00:41,500 So these two arguments are first transformers, where we will specify 15 00:00:41,500 --> 00:00:45,633 what kind of transformation we want to do and the indexes of the columns 16 00:00:45,633 --> 00:00:47,000 we want to transform. 17 00:00:47,000 --> 00:00:51,300 And the second argument is remainder, which will specify that 18 00:00:51,300 --> 00:00:53,266 we actually want to keep the columns 19 00:00:53,266 --> 00:00:57,400 that won't have any transformation applied, meaning age and salary. 20 00:00:57,833 --> 00:00:58,500 So let's do this. 21 00:00:58,500 --> 00:01:01,600 Let's first enter the names of these two arguments. 22 00:01:01,833 --> 00:01:03,266 The first one is trans- 23 00:01:04,366 --> 00:01:05,400 formers. 24 00:01:05,400 --> 00:01:07,933 There we go. This one, transformers. 25 00:01:07,933 --> 00:01:09,400 We're going to add a comma here.
26 00:01:09,400 --> 00:01:11,033 And then let's just add the next one. 27 00:01:11,033 --> 00:01:12,300 And then we'll input them. 28 00:01:12,300 --> 00:01:14,900 The next one is remainder. 29 00:01:14,900 --> 00:01:16,900 This one exactly. 30 00:01:16,900 --> 00:01:18,633 And now let's input them. 31 00:01:18,633 --> 00:01:23,633 So for the first one, transformers, we actually have to specify three things. 32 00:01:23,866 --> 00:01:27,066 First, the kind of transformation, which is encoding. 33 00:01:27,466 --> 00:01:31,500 Second, what kind of encoding we want to do, which is one-hot encoding. 34 00:01:31,800 --> 00:01:34,733 And third, the indexes of the columns 35 00:01:34,733 --> 00:01:37,733 we want to encode, meaning the country column. 36 00:01:38,166 --> 00:01:40,066 So we have to actually input all that 37 00:01:40,066 --> 00:01:43,766 in a pair of square brackets and then some parentheses. 38 00:01:44,000 --> 00:01:47,533 That's just the format expected by this transformers argument. 39 00:01:47,933 --> 00:01:50,133 We're actually going to enter here a tuple 40 00:01:50,133 --> 00:01:52,933 of, you know, three elements inside these parentheses. 41 00:01:52,933 --> 00:01:55,933 The first element is, as we said, the kind of transformation. 42 00:01:56,100 --> 00:01:59,733 And to specify that we want to do some encoding transformation, 43 00:02:00,000 --> 00:02:03,266 we have to enter here, in quotes, encoder. 44 00:02:03,700 --> 00:02:05,066 So that's the first element. 45 00:02:05,066 --> 00:02:06,766 Then the second element. 46 00:02:06,766 --> 00:02:08,300 We simply have to enter 47 00:02:08,300 --> 00:02:12,100 the exact name of the class that will perform this encoding. 48 00:02:12,366 --> 00:02:15,600 So I'm going to copy this because this is going 49 00:02:15,600 --> 00:02:18,966 to be exactly the second element expected here. 50 00:02:18,966 --> 00:02:21,600 And I'm adding some parentheses because that's a class.
51 00:02:22,633 --> 00:02:24,966 And finally the third element inside 52 00:02:24,966 --> 00:02:29,433 this tuple is, in a new pair of square brackets, 53 00:02:29,666 --> 00:02:34,033 the indexes of the columns we want to apply one-hot encoding to, 54 00:02:34,033 --> 00:02:35,633 you know, that we want to transform. 55 00:02:35,633 --> 00:02:38,366 And so these indexes of course are only one index. 56 00:02:38,366 --> 00:02:40,266 It is the index of the country column. 57 00:02:40,266 --> 00:02:43,766 The country column is the first column of our matrix of features. 58 00:02:43,933 --> 00:02:46,933 And remember that in Python indexes start at zero. 59 00:02:47,033 --> 00:02:49,200 Therefore this country column has index zero. 60 00:02:49,200 --> 00:02:51,300 And that's what we have to enter here. 61 00:02:51,300 --> 00:02:52,733 Only zero. 62 00:02:52,733 --> 00:02:53,200 All right. 63 00:02:53,200 --> 00:02:56,633 So that's all good for the first argument here. 64 00:02:56,633 --> 00:03:00,900 You know, transformers, the transformers argument is equal to all this, 65 00:03:00,900 --> 00:03:02,633 you know, in square brackets. 66 00:03:02,633 --> 00:03:03,300 So great. 67 00:03:03,300 --> 00:03:06,300 And now the second argument, remainder. 68 00:03:06,300 --> 00:03:09,900 So here we want to specify in quotes the following code name, 69 00:03:09,900 --> 00:03:14,633 which is passthrough, and which is a code name that will say 70 00:03:14,633 --> 00:03:19,133 that we indeed want to keep the columns that won't have any transformation applied, 71 00:03:19,133 --> 00:03:22,833 that won't be one-hot encoded, which are of course age and salary. 72 00:03:23,133 --> 00:03:26,533 If we don't include this remainder equals passthrough here, 73 00:03:26,700 --> 00:03:30,200 then when we apply the transformation to X, we will only keep the, 74 00:03:30,200 --> 00:03:33,933 you know, the first three columns resulting from one-hot encoding.
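As a rough sketch of what has been typed up to this point, here is the ct object with its two arguments. The small matrix X (country, age, salary) and its values are made up for illustration, and the second object, ct_drop, is a hypothetical name that is not part of the lesson: it simply leaves remainder at its default so the effect of passthrough can be compared.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Made-up matrix of features: country, age, salary (illustrative values only).
X = np.array([
    ["France", 44, 72000],
    ["Spain", 27, 48000],
    ["Germany", 30, 54000],
    ["Spain", 38, 61000],
], dtype=object)

# transformers is a list of (name, transformer object, column indexes) tuples.
# remainder="passthrough" keeps the untouched columns (age and salary).
ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [0])],
    remainder="passthrough",
)

# Hypothetical comparison object: remainder is left at its default, "drop".
ct_drop = ColumnTransformer(transformers=[("encoder", OneHotEncoder(), [0])])

X_keep = ct.fit_transform(X)
X_drop = ct_drop.fit_transform(X)

print(X_keep.shape)  # 3 one-hot columns + age + salary
print(X_drop.shape)  # only the 3 one-hot columns
```

With three distinct countries in this toy data, the passthrough version should have five columns and the default version only three.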
75 00:03:34,066 --> 00:03:37,600 And of course we want to keep age and salary in our matrix of features. 76 00:03:37,600 --> 00:03:40,200 That's what it's used for. Okay. 77 00:03:40,200 --> 00:03:43,200 So good. Now we have our ct object. 78 00:03:43,366 --> 00:03:47,533 It is of course not connected yet to our matrix of features X. 79 00:03:47,766 --> 00:03:49,800 And so that's exactly what we want to do. 80 00:03:49,800 --> 00:03:52,133 But there is actually some good news this time. 81 00:03:52,133 --> 00:03:53,633 We don't have to do it in two steps. 82 00:03:53,633 --> 00:03:57,533 You know, by first calling a fit method from the object to connect our object 83 00:03:57,533 --> 00:04:00,533 to the matrix of features X, and then applying as a second step 84 00:04:00,533 --> 00:04:03,000 the transform method to apply the transformation. 85 00:04:03,000 --> 00:04:07,266 No, this time we can do it at once because our ColumnTransformer class 86 00:04:07,266 --> 00:04:11,366 actually has a method called fit_transform, which will do exactly 87 00:04:11,366 --> 00:04:15,300 the process of fitting and transforming at once, at the same time, you know? 88 00:04:15,300 --> 00:04:17,966 So that's perfect. Let's use this. 89 00:04:17,966 --> 00:04:21,233 And to use this, well, of course we have to call first our ct object, 90 00:04:21,500 --> 00:04:24,500 from which we're going to call this fit_transform 91 00:04:24,700 --> 00:04:28,566 method, which will get as input, 92 00:04:28,633 --> 00:04:32,000 well, of course X, because that's what we want to transform. 93 00:04:32,000 --> 00:04:35,700 We want to transform the matrix of features X, inside which we want to 94 00:04:35,700 --> 00:04:38,733 one-hot encode the country column. Perfect. 95 00:04:39,133 --> 00:04:41,966 So now two things to understand. 96 00:04:41,966 --> 00:04:46,733 Well first, this fit_transform method will of course return as output
97 00:04:47,000 --> 00:04:48,733 the new matrix of features 98 00:04:48,733 --> 00:04:52,266 X with three columns one-hot encoding the country column. 99 00:04:52,633 --> 00:04:55,800 And therefore, since that's exactly what we want to get 100 00:04:55,800 --> 00:04:59,766 as the new matrix of features X, well, we're just going to update 101 00:05:00,000 --> 00:05:01,433 this matrix of features X. 102 00:05:01,433 --> 00:05:05,900 That's why I'm adding here X equals the output of this fit_transform 103 00:05:05,933 --> 00:05:07,133 method. 104 00:05:07,133 --> 00:05:09,666 But then we have to do one more thing, 105 00:05:09,666 --> 00:05:12,633 which is related to the fact that the fit_transform method 106 00:05:12,633 --> 00:05:15,933 actually doesn't necessarily return the output as a NumPy array, 107 00:05:16,166 --> 00:05:20,000 and it is absolutely necessary to have X as a NumPy array, 108 00:05:20,100 --> 00:05:23,033 because this will be expected by the future machine 109 00:05:23,033 --> 00:05:24,933 learning models which we're going to build. 110 00:05:24,933 --> 00:05:27,633 You know, in order to train the future machine learning models 111 00:05:27,633 --> 00:05:30,733 we're going to use a train function which is actually called fit. 112 00:05:30,933 --> 00:05:35,866 And this train function will expect the matrix of features X as a NumPy array. 113 00:05:36,066 --> 00:05:37,833 So here we want to force 114 00:05:37,833 --> 00:05:40,933 the output of this fit_transform method to be a NumPy array. 115 00:05:41,133 --> 00:05:45,033 And to do this we simply need to call, well, numpy first, which has a shortcut 116 00:05:45,033 --> 00:05:49,366 named np, from which we're going to call this array function, 117 00:05:49,366 --> 00:05:53,866 which will take as input exactly the output of the fit_transform method.
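Putting the spoken steps together, the line being typed would look roughly like the sketch below. The toy data and the names X_raw, ct_two_step and X_two_step are assumptions added for illustration; only ct, X, fit_transform and np.array come from the lesson. The two-step fit then transform version is included just to check that it gives the same result as the single fit_transform call.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Made-up matrix of features: country, age, salary (illustrative values only).
X_raw = np.array([
    ["France", 44, 72000],
    ["Spain", 27, 48000],
    ["Germany", 30, 54000],
    ["Spain", 38, 61000],
], dtype=object)

ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [0])],
    remainder="passthrough",
)

# One call: fit ct to the matrix of features and transform it,
# then wrap the output with np.array as described in the lesson.
X = np.array(ct.fit_transform(X_raw))

# Hypothetical two-step version, for comparison only.
ct_two_step = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [0])],
    remainder="passthrough",
)
ct_two_step.fit(X_raw)
X_two_step = np.array(ct_two_step.transform(X_raw))

print(type(X))
print(X)
```

The one-hot columns come first in the result, followed by the passed-through age and salary columns.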