1 00:00:00,033 --> 00:00:02,300 And now let's implement this new step: 2 00:00:02,300 --> 00:00:03,466 feature scaling. 3 00:00:03,466 --> 00:00:04,400 But just before that, 4 00:00:04,400 --> 00:00:08,433 since we're going to apply a lot of transformations on both X and y, 5 00:00:08,600 --> 00:00:13,800 we're going to do some prints to see the before and after of each transformation. 6 00:00:14,033 --> 00:00:14,666 All right. 7 00:00:14,666 --> 00:00:16,833 So let's create two new code cells here. 8 00:00:16,833 --> 00:00:20,733 Let's first do a print of the matrix of features X. 9 00:00:21,133 --> 00:00:25,033 Then a print of the dependent variable vector y. 10 00:00:25,400 --> 00:00:27,966 Now we're going to run this cell to print X. 11 00:00:27,966 --> 00:00:30,700 And X contains, of course, in a 2D array, 12 00:00:30,700 --> 00:00:34,666 as you can see with the double pair of square brackets, all the position levels 13 00:00:34,666 --> 00:00:37,466 going from 1 to 10. So that was all as expected. 14 00:00:37,466 --> 00:00:41,700 Then let's print y, which this time contains, in a 15 00:00:41,700 --> 00:00:46,433 one-dimensional vector, all the salaries corresponding to these position levels. 16 00:00:46,466 --> 00:00:47,700 So all good there. 17 00:00:47,700 --> 00:00:51,366 But now we will have to do a first transformation. 18 00:00:51,366 --> 00:00:54,200 And I'm not even talking about feature scaling. 19 00:00:54,200 --> 00:00:57,400 That first transformation will be to reshape 20 00:00:57,400 --> 00:01:00,600 this y into a 2D array. 21 00:01:00,600 --> 00:01:02,433 You know, a two-dimensional array 22 00:01:02,433 --> 00:01:06,000 where you have the same salaries displayed vertically. 23 00:01:06,300 --> 00:01:09,900 And so now the question is why do we want y to have such a format, 24 00:01:09,900 --> 00:01:11,200 you know, a 2D array.
25 00:01:11,200 --> 00:01:16,100 Well, that's because, you know, the StandardScaler class that will perform 26 00:01:16,100 --> 00:01:19,533 standardization, meaning feature scaling, expects 27 00:01:19,600 --> 00:01:22,666 one unique format for its input 28 00:01:22,700 --> 00:01:26,433 when you apply the fit_transform method, which is a 2D array. 29 00:01:26,700 --> 00:01:31,200 If you input here a one-dimensional vector like what we have here, 30 00:01:31,466 --> 00:01:34,800 this will return an error simply because, well, 31 00:01:35,100 --> 00:01:39,633 this StandardScaler class expects a 2D array as its input. 32 00:01:39,933 --> 00:01:43,600 So now we just have to transform this into a 2D array. 33 00:01:43,833 --> 00:01:47,466 And you actually know exactly how to do that because we already did it. 34 00:01:47,733 --> 00:01:48,600 I'll give you a hint. 35 00:01:48,600 --> 00:01:51,566 This was in the multiple linear regression section. 36 00:01:51,566 --> 00:01:55,033 And so now, of course, I would like you to press pause on this video 37 00:01:55,233 --> 00:02:00,200 and try to transform y or, you know, reshape y into this 38 00:02:00,333 --> 00:02:04,266 2D array with the salaries displayed vertically. 39 00:02:04,933 --> 00:02:05,266 All right. 40 00:02:05,266 --> 00:02:08,633 So we're going to create a new code cell to do this. 41 00:02:09,200 --> 00:02:11,866 And so, well, first we want to update y. 42 00:02:11,866 --> 00:02:15,400 And therefore, you know, we will start with this y equals. 43 00:02:15,533 --> 00:02:19,866 And then we'll do the necessary transformation to return this new y. 44 00:02:20,166 --> 00:02:21,900 And so how do we do this? 45 00:02:21,900 --> 00:02:25,200 So the first thing to do is to take y again, from which 46 00:02:25,200 --> 00:02:30,833 we're going to call that reshape function, into which we're going to input, 47 00:02:31,200 --> 00:02:34,466 well, you know, the new shape that we would like y to have.
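The 2D-input requirement described above can be checked directly. This is a minimal sketch, assuming scikit-learn and NumPy are installed; the salary values and the names `sc`, `rejected_1d` and `y_scaled` are hypothetical and chosen only for illustration, not taken from the course dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical salary values, for illustration only (not the course dataset).
y = np.array([45000.0, 50000.0, 60000.0, 80000.0, 110000.0])

sc = StandardScaler()

# A one-dimensional vector is rejected: fit_transform raises a ValueError.
try:
    sc.fit_transform(y)
    rejected_1d = False
except ValueError:
    rejected_1d = True

# The same values arranged as a 2D column (5 rows, 1 column) are accepted.
y_scaled = sc.fit_transform(y.reshape(len(y), 1))

print(rejected_1d)     # True
print(y_scaled.shape)  # (5, 1)
```

The try/except is there only to make the failure visible without stopping the notebook; in practice you would simply reshape first.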
48 00:02:34,500 --> 00:02:36,666 And remember how we have to enter this new shape. 49 00:02:36,666 --> 00:02:38,666 Well, we have to input here two elements. 50 00:02:38,666 --> 00:02:41,966 The first one being the number of rows of this new y, 51 00:02:42,000 --> 00:02:44,033 you know, this new format of y we want to have. 52 00:02:44,033 --> 00:02:46,166 And then the number of columns. 53 00:02:46,166 --> 00:02:49,100 So that's easy, since we want to have the salaries displayed 54 00:02:49,100 --> 00:02:52,166 vertically, you know, in different rows actually. 55 00:02:52,466 --> 00:02:57,600 Well, what we want to have for the number of rows is of course len of y. 56 00:02:57,600 --> 00:02:59,000 You know, the length of y, 57 00:02:59,000 --> 00:03:02,900 meaning the number of elements in y, meaning the number of salaries. 58 00:03:03,066 --> 00:03:05,100 Okay. So that's the number of rows. 59 00:03:05,100 --> 00:03:08,600 And then the second element here is the number of columns. 60 00:03:08,733 --> 00:03:10,500 And of course we want one column 61 00:03:10,500 --> 00:03:14,600 because we want to display the salaries vertically in a 2D array. 62 00:03:14,700 --> 00:03:18,300 And therefore here we input one, as in one column. 63 00:03:18,300 --> 00:03:21,066 So actually, since we have ten salaries, 64 00:03:21,066 --> 00:03:24,066 we're going to have ten rows and one column. 65 00:03:24,300 --> 00:03:26,100 Okay. So it was good to do it again. 66 00:03:26,100 --> 00:03:30,066 Now you've become more familiar with this reshape trick. 67 00:03:30,766 --> 00:03:31,100 Okay. 68 00:03:31,100 --> 00:03:31,633 And now of course 69 00:03:31,633 --> 00:03:35,433 we're going to print this to see and to check that everything's all right, 70 00:03:35,433 --> 00:03:39,300 and mostly that we have the right format expected by this 71 00:03:39,300 --> 00:03:42,866 StandardScaler class, which we'll then use to apply feature scaling.
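The reshape step just described can be sketched as follows. This is a minimal illustration assuming NumPy; the ten salary values are hypothetical placeholders, since the transcript does not give the actual numbers.

```python
import numpy as np

# Hypothetical salaries for the ten position levels (illustrative values only).
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])
print(y.shape)  # (10,)  a one-dimensional vector

# First argument: number of rows (one per salary); second: number of columns.
y = y.reshape(len(y), 1)
print(y.shape)  # (10, 1)  a 2D array with the salaries displayed vertically
print(y)
```

Printing `y` after the reshape shows the double pair of square brackets, which is the quick visual check that the array is now two-dimensional.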
72 00:03:43,000 --> 00:03:45,766 So there we go. Print y. 73 00:03:45,766 --> 00:03:49,833 And let's first execute this to reshape y. 74 00:03:49,833 --> 00:03:54,600 And now let's print y to check two things: first, that we have a 2D array. 75 00:03:54,600 --> 00:03:57,633 We can clearly see that with the double pair of square brackets. 76 00:03:57,900 --> 00:04:00,766 And also that the salaries are displayed vertically, 77 00:04:00,766 --> 00:04:05,533 just like this matrix of features, which is also a 2D array of course. 78 00:04:06,166 --> 00:04:06,566 All right. 79 00:04:06,566 --> 00:04:08,500 So now everything's perfect. 80 00:04:08,500 --> 00:04:10,833 We're ready to apply feature scaling. 81 00:04:10,833 --> 00:04:14,466 And so we're going to do that right away, starting by 82 00:04:14,466 --> 00:04:17,466 creating a new code cell here. 83 00:04:17,533 --> 00:04:18,200 All right. 84 00:04:18,200 --> 00:04:21,033 So now how are we going to do this efficiently? 85 00:04:21,033 --> 00:04:23,366 We always want to be efficient when we code. 86 00:04:23,366 --> 00:04:27,533 So of course we're going to grab our tool in our data preprocessing toolkit. 87 00:04:27,833 --> 00:04:30,433 I'm talking of course about the feature scaling tool. 88 00:04:30,433 --> 00:04:31,900 And we'll have to adapt this a bit 89 00:04:31,900 --> 00:04:34,900 because this was applied on the training and test sets. 90 00:04:34,900 --> 00:04:39,166 But no worries, we will adapt it very quickly and efficiently. 91 00:04:39,166 --> 00:04:45,366 So I'm copying this and I'm pasting that right here. And now, 92 00:04:45,366 --> 00:04:48,233 of course, I would like you to please press pause again 93 00:04:48,233 --> 00:04:52,666 and try to figure out on your own what we have to modify here 94 00:04:52,666 --> 00:04:56,466 to make this feature scaling work for our situation.