1 00:00:00,300 --> 00:00:00,833 All right. 2 00:00:00,833 --> 00:00:01,166 Great. 3 00:00:01,166 --> 00:00:04,400 So this will apply feature scaling to our two columns in the training set. 4 00:00:04,566 --> 00:00:08,566 And now I hope you know what the next step is going to be. 5 00:00:08,566 --> 00:00:10,566 And you won't fall into the trap. 6 00:00:10,566 --> 00:00:13,200 Now we also have to transform 7 00:00:13,200 --> 00:00:16,700 the matrix of features of the test set, meaning X_test, 8 00:00:16,700 --> 00:00:19,766 this matrix of features, but 9 00:00:20,033 --> 00:00:24,166 since this data is like new data, which we get, you know, 10 00:00:24,200 --> 00:00:25,800 later on in production, 11 00:00:25,800 --> 00:00:29,366 well, for this data we will only apply the transform method 12 00:00:29,666 --> 00:00:33,500 because indeed the features of the test set need to be scaled 13 00:00:33,533 --> 00:00:37,500 by the same scaler that was used on the training set. 14 00:00:37,800 --> 00:00:39,400 We cannot get a new scaler. 15 00:00:39,400 --> 00:00:40,833 You know, if we applied the fit_transform 16 00:00:40,833 --> 00:00:44,100 method here on X_test, we would get a new scaler. 17 00:00:44,633 --> 00:00:49,466 And that would absolutely not make sense, because X_test will actually be the input 18 00:00:49,566 --> 00:00:53,166 of the predict function that will return the predictions, 19 00:00:53,166 --> 00:00:55,500 you know, after the machine learning model is trained. 20 00:00:55,500 --> 00:00:58,433 And since this machine learning model will be trained 21 00:00:58,433 --> 00:01:02,866 with a particular scaler, you know, the scaler applied on the training set, 22 00:01:03,233 --> 00:01:07,066 well, in order to make predictions that will be congruent with the way 23 00:01:07,066 --> 00:01:08,066 the model was trained, 
24 00:01:08,066 --> 00:01:11,966 well, we need to apply the same scaler that was used on the training set 25 00:01:12,166 --> 00:01:16,366 onto the test set, so that we can indeed get the same transformation 26 00:01:16,600 --> 00:01:18,033 and therefore, in the end, 27 00:01:18,033 --> 00:01:22,400 some relevant predictions with the predict method applied to X_test. 28 00:01:22,733 --> 00:01:26,300 So here it's clearly only the transform method that must be applied. 29 00:01:26,500 --> 00:01:28,500 And therefore what we're going to do to make it 30 00:01:28,500 --> 00:01:31,800 efficient is, well, we're going to copy this line of code. 31 00:01:32,000 --> 00:01:33,800 And just below we're going to paste it. 32 00:01:33,800 --> 00:01:37,033 We're going to replace, of course, X_train with X_test. 33 00:01:37,366 --> 00:01:40,466 And then here as well, X_train with X_test. 34 00:01:40,833 --> 00:01:44,233 And then just call, of course, the transform method 35 00:01:44,233 --> 00:01:49,166 from that same scaler that was applied on the training set. 36 00:01:49,400 --> 00:01:52,100 Because indeed this is part of the training. Right. 37 00:01:52,100 --> 00:01:53,900 Even if we haven't started the training. 38 00:01:53,900 --> 00:01:57,766 Well, this operation that we apply here on our training set 39 00:01:58,100 --> 00:02:00,866 is, you know, the preparation of the training. 40 00:02:00,866 --> 00:02:01,200 All right. 41 00:02:01,200 --> 00:02:04,100 So I hope it's clear. It's very important that you understand this. 42 00:02:04,100 --> 00:02:06,900 And now, well, I have to say congratulations, 43 00:02:06,900 --> 00:02:10,733 because we're actually done implementing our final tool. 44 00:02:10,733 --> 00:02:14,766 And of course I'm going to show you the result of feature scaling here. 45 00:02:14,766 --> 00:02:18,566 So let me create two more code cells inside 46 00:02:18,566 --> 00:02:22,633 which we're going to print first X_train. 
47 00:02:22,966 --> 00:02:25,366 And then let me copy this. 48 00:02:25,366 --> 00:02:28,700 And then we're going to print X_test. 49 00:02:29,100 --> 00:02:30,233 All right. 50 00:02:30,233 --> 00:02:33,533 So let's first run this to apply feature scaling. 51 00:02:33,833 --> 00:02:36,200 Perfect. There we go. No execution error. 52 00:02:36,200 --> 00:02:37,833 Then let's print X_train. 53 00:02:38,833 --> 00:02:42,300 And of course we still get, well, the same values 54 00:02:42,300 --> 00:02:46,466 for the dummy variables, which are indeed still between minus three and plus three. 55 00:02:46,466 --> 00:02:51,166 But then our age and salary variables were transformed 56 00:02:51,266 --> 00:02:55,300 so that they take new values between minus two and plus two. 57 00:02:55,633 --> 00:02:58,166 Sometimes you will see values between minus three and plus three. 58 00:02:58,166 --> 00:02:59,666 Here it's minus two and plus two. 59 00:02:59,666 --> 00:03:02,800 But anyway, now all our variables 60 00:03:02,800 --> 00:03:06,433 are on the same scale, and this will be perfect to improve 61 00:03:06,433 --> 00:03:10,266 or optimize the training of certain machine learning models. 62 00:03:10,266 --> 00:03:13,800 And of course you will see exactly which ones they're going to be 63 00:03:13,966 --> 00:03:16,966 the further we progress in this machine learning course. 64 00:03:17,300 --> 00:03:18,400 So now you know everything. 65 00:03:18,400 --> 00:03:21,633 Let's also execute this cell to print X_test. 66 00:03:21,633 --> 00:03:22,800 And once again, well, 67 00:03:22,800 --> 00:03:27,600 you still have your dummy variables here for the two customers that are here. 68 00:03:27,733 --> 00:03:30,300 But then the age and the salary were scaled 69 00:03:30,300 --> 00:03:33,300 so that they once again take values between minus two and plus two. 70 00:03:33,633 --> 00:03:36,533 All right, okay. So great. 
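[Editor's sketch] The rule described above, fit the scaler on the training set and then only transform the test set, can be illustrated with a small hand-rolled scaler. This is a sketch under stated assumptions: `ColumnScaler` is a hypothetical stand-in written with the standard library, not the scaler object used in the video, and the age and salary numbers are made up.

```python
from statistics import mean, pstdev


class ColumnScaler:
    """Hypothetical stand-in for the scaler object used in the video."""

    def fit(self, rows):
        # Learn one mean and one standard deviation per column.
        columns = list(zip(*rows))
        self.means = [mean(col) for col in columns]
        self.stds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by 0
        return self

    def transform(self, rows):
        # Reuse the statistics learned in fit(); nothing is recomputed here.
        return [
            [(value - m) / s for value, m, s in zip(row, self.means, self.stds)]
            for row in rows
        ]

    def fit_transform(self, rows):
        return self.fit(rows).transform(rows)


# Made-up age and salary values, for illustration only.
train_age_salary = [[30.0, 50000.0], [40.0, 60000.0], [50.0, 70000.0]]
test_age_salary = [[35.0, 55000.0], [60.0, 80000.0]]

scaler = ColumnScaler()
train_scaled = scaler.fit_transform(train_age_salary)  # fit on training data only
test_scaled = scaler.transform(test_age_salary)        # same scaler, no refit

# What the "trap" would produce: a new scaler fitted on the test set itself.
refit_on_test = ColumnScaler().fit_transform(test_age_salary)

print(train_scaled)
print(test_scaled)
print(refit_on_test)
```

Comparing `test_scaled` with `refit_on_test` shows why the refit is a problem: the same test rows end up with different scaled values, so they would no longer match the scale the model was trained on.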
71 00:03:36,533 --> 00:03:38,900 I'm really happy that we are now done 72 00:03:38,900 --> 00:03:42,866 with this data preprocessing toolkit, because that means only one thing. 73 00:03:42,933 --> 00:03:47,100 That means that we are ready to start the exciting step of the journey, 74 00:03:47,300 --> 00:03:51,566 which is to build machine learning models that will perform amazing predictions. 75 00:03:51,966 --> 00:03:54,600 And we're going to start with the regression models, 76 00:03:54,600 --> 00:03:57,633 which will predict some continuous numerical values. 77 00:03:57,966 --> 00:04:00,966 And we will learn how to do that on different data sets. 78 00:04:00,966 --> 00:04:05,400 But before we move on to this next part, I just want to show you the data 79 00:04:05,400 --> 00:04:09,333 preprocessing template, which will be so useful for us 80 00:04:09,466 --> 00:04:12,866 to tackle in a flash the data preprocessing 81 00:04:12,866 --> 00:04:15,866 phase for each of our future machine learning models. 82 00:04:15,900 --> 00:04:19,166 Because indeed, you will see that this template was made 83 00:04:19,166 --> 00:04:22,166 so that we will only have each time 84 00:04:22,200 --> 00:04:26,000 one or two things to change, and most of the time one thing to change. 85 00:04:26,300 --> 00:04:31,666 Because indeed, in this template I included the three tools 86 00:04:31,666 --> 00:04:35,600 that we will always use for our machine learning models, which are importing the libraries. 87 00:04:35,600 --> 00:04:35,833 Right? 88 00:04:35,833 --> 00:04:39,066 We will always need these libraries. Then importing the data set. 89 00:04:39,300 --> 00:04:40,733 And here, appreciate that 90 00:04:40,733 --> 00:04:44,466 we will only have one thing to change, which will be the name of the data set, 91 00:04:44,766 --> 00:04:47,566 because indeed, this line of code will automatically 92 00:04:47,566 --> 00:04:51,733 take all the columns except the last one, meaning all your features. 
93 00:04:51,900 --> 00:04:55,966 And this line of code will automatically take the dependent variable. 94 00:04:55,966 --> 00:04:59,233 So here you will only have the name of the data set to change. 95 00:04:59,533 --> 00:05:03,866 And then of course I included this tool because for most of our machine learning models 96 00:05:03,866 --> 00:05:07,333 we will have to split the data set into these two separate sets. 97 00:05:07,500 --> 00:05:08,433 One training set 98 00:05:08,433 --> 00:05:12,900 to train our machine learning model and one test set to evaluate its performance. 99 00:05:12,900 --> 00:05:16,866 And here, once again, we will actually have nothing to change. 100 00:05:17,100 --> 00:05:19,166 So in the whole template, 101 00:05:19,166 --> 00:05:23,333 we will only have one thing to change, which will be the name of the data set. 102 00:05:23,333 --> 00:05:27,400 And that's why this data preprocessing template will be so useful for us, 103 00:05:27,600 --> 00:05:32,400 because we will each time tackle the data preprocessing phase in a flash. 104 00:05:32,666 --> 00:05:36,000 So make sure to have this template ready each time 105 00:05:36,000 --> 00:05:38,566 we're going to build our future machine learning models. 106 00:05:38,566 --> 00:05:40,133 And now take a good break. 107 00:05:40,133 --> 00:05:41,100 You really deserve it 108 00:05:41,100 --> 00:05:43,200 after this data preprocessing phase 109 00:05:43,200 --> 00:05:46,900 and these answers to the questions that reduce any confusion. 110 00:05:47,100 --> 00:05:48,266 So digest it well. 111 00:05:48,266 --> 00:05:49,600 And as soon as you're ready 112 00:05:49,600 --> 00:05:53,366 to tackle the first branch of machine learning models, which is regression, 113 00:05:53,566 --> 00:05:56,733 well, let's continue our journey together in this next part. 114 00:05:56,933 --> 00:05:58,866 And until then, enjoy machine learning.
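[Editor's sketch] The three template steps described above (import, load the data set with the last column as the dependent variable, split into training and test sets) might look roughly like the following. This is only a sketch under assumptions: the video's template relies on libraries that are not named in this passage, so the standard library is used here as a stand-in, and the column names, values, `split_train_test` helper, test fraction, and seed are all hypothetical.

```python
import csv
import io
import random

# The only thing to change per project: where the data comes from.
# An in-memory CSV with made-up values keeps this sketch self-contained;
# in practice this would be a file opened with open(..., newline="").
DATASET = io.StringIO(
    "Age,Salary,Purchased\n"
    "44,72000,No\n"
    "27,48000,Yes\n"
    "30,54000,No\n"
    "38,61000,No\n"
    "35,58000,Yes\n"
)

rows = list(csv.reader(DATASET))[1:]   # skip the header row
X = [row[:-1] for row in rows]         # every column except the last: features
y = [row[-1] for row in rows]          # last column: dependent variable


def split_train_test(X, y, test_fraction=0.2, seed=0):
    # test_fraction and seed are assumed defaults, not values from the video.
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx], [X[i] for i in test_idx],
        [y[i] for i in train_idx], [y[i] for i in test_idx],
    )


X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))
```

Because the feature and target selection is positional (all columns but the last, then the last), swapping in a different data set only requires changing the data source, which is the point made in the video.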