1
00:00:00,133 --> 00:00:02,066
All right, here we go for that final step:

2
00:00:02,066 --> 00:00:05,966
predicting the salary of position level 6.5

3
00:00:05,966 --> 00:00:09,333
with both linear regression and polynomial regression.

4
00:00:09,333 --> 00:00:10,533
So we're going to start with

5
00:00:10,533 --> 00:00:14,166
linear regression first, therefore creating a new code cell here.

6
00:00:14,500 --> 00:00:15,800
Let's do it step by step.

7
00:00:15,800 --> 00:00:19,700
Of course, the first step you had to do here was to take your linear

8
00:00:19,700 --> 00:00:23,433
regressor object, you know, from the simple linear regression model.

9
00:00:23,700 --> 00:00:26,800
And this object was named lin_reg.

10
00:00:27,300 --> 00:00:28,766
This one.

11
00:00:28,766 --> 00:00:31,166
And then, very simply, from this object

12
00:00:31,166 --> 00:00:34,166
you're going to call the predict method

13
00:00:34,500 --> 00:00:38,933
to predict the salary of position level 6.5.

14
00:00:39,700 --> 00:00:40,166
All right.

15
00:00:40,166 --> 00:00:41,966
So what exactly do you need

16
00:00:41,966 --> 00:00:45,566
to input here in this predict method to get such a predicted salary?

17
00:00:45,833 --> 00:00:50,366
Well, I'm sure some of you tried to directly input 6.5,

18
00:00:50,666 --> 00:00:54,300
but unfortunately it's not that simple, because indeed

19
00:00:54,300 --> 00:00:59,700
you always have to input your observations in an array, right?

20
00:00:59,700 --> 00:01:02,666
That's what is specified here in the parameters:

21
00:01:02,666 --> 00:01:06,666
the input of the predict method should be an array-like or a sparse matrix.

22
00:01:06,900 --> 00:01:10,533
So an array-like is, for example, a NumPy array, or a simple array

23
00:01:10,533 --> 00:01:12,666
with a double pair of square brackets.

24
00:01:12,666 --> 00:01:14,266
Well, you know, I said it:

25
00:01:14,266 --> 00:01:17,133
that's exactly how you need to input an array.
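The point about calling predict with a bare 6.5 versus a two-dimensional input can be reproduced with scikit-learn. The sketch below is minimal and rests on an assumption: it fits `LinearRegression` on hypothetical toy data (salary = 50,000 × level), not on the course's salary dataset. Only the object name `lin_reg` comes from the lecture.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (NOT the course dataset): salary = 50,000 * level
X = np.arange(1, 11).reshape(-1, 1)  # 2D: 10 rows (observations), 1 column (feature)
y = 50_000 * np.arange(1, 11)        # 1D target vector

lin_reg = LinearRegression()
lin_reg.fit(X, y)

# A bare scalar is rejected: predict expects a 2D array-like (n_samples, n_features)
raised = False
try:
    lin_reg.predict(6.5)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# A double pair of square brackets gives one row and one column
prediction = lin_reg.predict([[6.5]])
print(prediction)  # a single predicted value, close to 325,000 for this toy data
```

The result is a 1D array holding one prediction per input row, so `prediction[0]` is the single salary.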
26
00:01:17,133 --> 00:01:22,333
That's because, generally in Python, an array is built with pairs of square brackets.

27
00:01:22,700 --> 00:01:25,500
If you add just a single pair

28
00:01:25,500 --> 00:01:29,166
of square brackets here, well, that actually creates a list,

29
00:01:29,400 --> 00:01:32,033
or you can also see it as a vector.

30
00:01:32,033 --> 00:01:35,200
But in order to create an array, well, you know, an array has

31
00:01:35,366 --> 00:01:38,200
several dimensions, like two dimensions in our case here,

32
00:01:38,200 --> 00:01:41,633
and therefore you need to add a double pair of square brackets.

33
00:01:42,066 --> 00:01:44,633
What does this double pair of square brackets mean?

34
00:01:44,633 --> 00:01:49,500
Well, the first (outer) pair of square brackets here corresponds to the first dimension,

35
00:01:49,733 --> 00:01:53,633
and the second (inner) pair of square brackets here corresponds to the second dimension.

36
00:01:53,633 --> 00:01:57,566
So the first dimension corresponds to the rows in your array,

37
00:01:57,766 --> 00:02:00,833
and the second dimension corresponds to the columns.

38
00:02:00,933 --> 00:02:04,800
So, just to show you, if I add, for example, five here,

39
00:02:05,066 --> 00:02:09,433
that would actually create an array of one row and two columns.

40
00:02:09,733 --> 00:02:14,133
And if I add here a comma and then another pair of square brackets,

41
00:02:14,133 --> 00:02:17,333
and then, for example, two and three, well, this would

42
00:02:17,333 --> 00:02:20,533
indeed create an array of two rows and two columns.

43
00:02:20,700 --> 00:02:23,233
In the first row you would have 6.5 and five,

44
00:02:23,233 --> 00:02:25,666
and in the second row you would have two and three.

45
00:02:25,666 --> 00:02:28,666
So you see, the first square bracket here corresponds

46
00:02:28,666 --> 00:02:31,666
to the rows and the second one corresponds to the columns.

47
00:02:31,766 --> 00:02:32,033
All right.
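The bracket rules above can be checked with NumPy's `shape` attribute. This is a minimal sketch that reuses the values from the walkthrough (6.5, 5, 2 and 3). The variable names are illustrative only.

```python
import numpy as np

# One pair of brackets: a flat list, i.e. a 1D vector
vector = np.array([6.5])
# Two pairs of brackets: outer pair = rows, inner pair = columns
single_cell = np.array([[6.5]])
one_row_two_cols = np.array([[6.5, 5]])
two_rows_two_cols = np.array([[6.5, 5], [2, 3]])

for name, arr in [("vector", vector),
                  ("single_cell", single_cell),
                  ("one_row_two_cols", one_row_two_cols),
                  ("two_rows_two_cols", two_rows_two_cols)]:
    print(name, arr.ndim, arr.shape)
# vector 1 (1,)
# single_cell 2 (1, 1)
# one_row_two_cols 2 (1, 2)
# two_rows_two_cols 2 (2, 2)

# A 1D vector can be turned into the 2D (1 row, 1 column) shape with reshape
reshaped = vector.reshape(1, -1)
print(reshaped.shape)  # (1, 1)
```

Both `np.array([[6.5]])` and `vector.reshape(1, -1)` produce the one-row, one-column layout that predict expects for a single observation.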
48
00:02:32,033 --> 00:02:35,200
So let's go back to exactly where we were.

49
00:02:35,400 --> 00:02:39,066
So this is indeed exactly the format expected by this predict

50
00:02:39,133 --> 00:02:41,433
method: an array of two dimensions,

51
00:02:41,433 --> 00:02:44,933
even if it only contains one cell, you know, one value.

52
00:02:44,933 --> 00:02:46,733
Well, it has to be in this format.

53
00:02:46,733 --> 00:02:49,800
And now, well, you're ready to get that prediction.

54
00:02:49,800 --> 00:02:53,066
So before we execute this, let's remember that

55
00:02:53,066 --> 00:02:57,733
this person asked for a $160K salary.

56
00:02:57,733 --> 00:03:01,833
And this is justified by the fact that this person earned a $160K

57
00:03:01,833 --> 00:03:05,933
salary in their previous company, which is what we have to check right now.

58
00:03:05,933 --> 00:03:09,933
And we're going to check that first with the linear regression model,

59
00:03:09,933 --> 00:03:15,300
which indeed returns a salary of about $330,000 per year.

60
00:03:15,733 --> 00:03:20,766
So if we actually used this model to negotiate with that person,

61
00:03:21,033 --> 00:03:23,233
we would find it super weird, right?

62
00:03:23,233 --> 00:03:26,333
Because this predicted salary is way above

63
00:03:26,333 --> 00:03:30,000
the real salary that this person had in their previous company.

64
00:03:30,333 --> 00:03:32,666
So clearly the prediction here is wrong.

65
00:03:32,666 --> 00:03:35,400
And, you know, we can clearly see that on the graph,

66
00:03:35,400 --> 00:03:38,266
you know, the graph above showing the results of the linear

67
00:03:38,266 --> 00:03:42,233
regression model. 6.5 is somewhere around here.

68
00:03:42,466 --> 00:03:44,100
And indeed, if we want to get the prediction,

69
00:03:44,100 --> 00:03:48,200
we have to project it onto that blue regression line and then project it

70
00:03:48,200 --> 00:03:51,266
again onto the vertical axis, which lands around here.
71
00:03:51,266 --> 00:03:52,800
And indeed, that's what we get.

72
00:03:52,800 --> 00:03:55,400
You know, the vertical axis is scaled by ten to the power of six,

73
00:03:55,400 --> 00:03:58,833
so indeed we get something around 0.33 million, that is, about 330K.
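Reading the prediction off the plot means projecting 6.5 up to the blue line and then across to the vertical axis. For a single feature, that is the same as evaluating the fitted line's equation, intercept plus slope times the level. The minimal sketch below checks scikit-learn's `intercept_` and `coef_` against `predict`. It assumes hypothetical toy data with an exact linear relationship (salary = 20,000 + 45,000 × level), not the course's salary dataset, so the numbers differ from the 330K seen in the lecture.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (NOT the course dataset): salary = 20,000 + 45,000 * level
levels = np.arange(1, 11).reshape(-1, 1)   # 2D feature matrix: 10 rows, 1 column
salaries = 20_000 + 45_000 * levels.ravel()  # 1D target vector

lin_reg = LinearRegression().fit(levels, salaries)

# "Projecting" level 6.5 onto the fitted line = evaluating b0 + b1 * x
by_hand = lin_reg.intercept_ + lin_reg.coef_[0] * 6.5
by_predict = lin_reg.predict([[6.5]])[0]

print(by_hand, by_predict)  # both approximately 312,500
```

Because the model is a straight line, predict is a deterministic evaluation of that line. The graph only helps visualize where the point lands.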