1 00:00:00,200 --> 00:00:01,066 Okay my friends. 2 00:00:01,066 --> 00:00:04,500 So we just covered the data preprocessing part and we covered it 3 00:00:04,500 --> 00:00:08,100 in a flash thanks to our data preprocessing template. 4 00:00:08,400 --> 00:00:11,600 And now we're about to really build and train 5 00:00:11,600 --> 00:00:14,800 the simple linear regression model on the training set of course. 6 00:00:14,800 --> 00:00:15,100 Right. 7 00:00:15,100 --> 00:00:19,300 So remember that we split that data set into the training set and the test set. 8 00:00:19,533 --> 00:00:23,566 The training set will be used to train our simple linear regression model. 9 00:00:23,566 --> 00:00:26,300 And the test set will be used to evaluate it. 10 00:00:26,300 --> 00:00:28,266 So now we have to start with the training. 11 00:00:28,266 --> 00:00:29,300 And so there we go. 12 00:00:29,300 --> 00:00:31,500 Let's create a new code cell. 13 00:00:31,500 --> 00:00:35,666 And let's implement the very simple linear regression model. 14 00:00:36,300 --> 00:00:36,600 All right. 15 00:00:36,600 --> 00:00:40,700 So the first thing we'll have to do is to import the right class 16 00:00:40,866 --> 00:00:43,866 with which we're going to build this simple linear regression model. 17 00:00:43,900 --> 00:00:49,033 Because indeed we could either implement it from scratch or we could use libraries. 18 00:00:49,166 --> 00:00:52,166 And of course we're going to use libraries because I want to provide 19 00:00:52,266 --> 00:00:56,333 a very clear code template which allows you to build 20 00:00:56,333 --> 00:00:59,333 any simple linear regression model in a flash. 21 00:00:59,366 --> 00:01:02,533 And this library that we're going to use is scikit-learn, 22 00:01:02,700 --> 00:01:07,466 from which we're going to get access to a certain module called linear_model. 23 00:01:07,466 --> 00:01:11,766 And from this module we're going to call a certain class called LinearRegression. 
24 00:01:12,000 --> 00:01:13,966 And our simple linear regression model 25 00:01:13,966 --> 00:01:17,733 which we're going to build will be exactly an instance of this class. 26 00:01:17,733 --> 00:01:20,633 Right. It will be an object of this class. 27 00:01:20,633 --> 00:01:21,733 All right. So let's do this. 28 00:01:21,733 --> 00:01:26,033 Let's start by importing from, you know, from the scikit- 29 00:01:26,033 --> 00:01:29,500 learn library which has the code name sklearn. 30 00:01:29,766 --> 00:01:34,200 So from sklearn then as we said we're going to get access to a certain module. 31 00:01:34,200 --> 00:01:39,366 So we have to add a dot here, followed by the module, which is linear_model. 32 00:01:39,600 --> 00:01:42,400 And from this linear_model module of the scikit- 33 00:01:42,400 --> 00:01:46,966 learn library, well, we're going to import the 34 00:01:47,266 --> 00:01:52,366 LinearRegression class, exactly this one, LinearRegression. 35 00:01:52,766 --> 00:01:57,600 And then as we said this simple linear regression model which we're going to 36 00:01:57,600 --> 00:02:03,566 build will be an instance or an object of this LinearRegression class. 37 00:02:03,800 --> 00:02:07,266 And therefore here we have to create a new variable 38 00:02:07,500 --> 00:02:11,566 which will be exactly this instance of the LinearRegression class. 39 00:02:11,800 --> 00:02:13,066 And we're going to name this object. 40 00:02:13,066 --> 00:02:16,500 We can call it by any name but we're going to call it regressor 41 00:02:16,800 --> 00:02:19,666 because indeed we are doing regression right now. 42 00:02:19,666 --> 00:02:20,100 Right. 43 00:02:20,100 --> 00:02:23,833 Let me remind you of the big difference between regression and classification. 44 00:02:24,033 --> 00:02:25,966 Regression is when you have to predict 45 00:02:25,966 --> 00:02:30,100 a continuous real value, like a salary, as we're about to do. 
46 00:02:30,400 --> 00:02:34,766 And classification is when you have to predict a category or, you know, a class, 47 00:02:34,900 --> 00:02:38,400 which we will do in part three, classification. All right. 48 00:02:38,733 --> 00:02:41,733 So regressor, that's a new variable which at the same time 49 00:02:41,733 --> 00:02:45,333 will become the object of the LinearRegression class. 50 00:02:45,333 --> 00:02:49,766 And you can exactly see this object as the linear regression model itself. 51 00:02:49,766 --> 00:02:52,900 You know, let me remind you that a class allows you to implement 52 00:02:53,100 --> 00:02:55,566 a couple of instructions to build something. 53 00:02:55,566 --> 00:02:56,633 And well, this LinearRegression 54 00:02:56,633 --> 00:03:00,666 class builds exactly this simple linear regression model. 55 00:03:00,666 --> 00:03:01,033 All right. 56 00:03:01,033 --> 00:03:04,500 So you have to see this regressor object as exactly this model. 57 00:03:04,933 --> 00:03:05,800 Right. So regressor. 58 00:03:05,800 --> 00:03:08,233 And then to create an object of a class, 59 00:03:08,233 --> 00:03:09,800 well, there's nothing simpler. 60 00:03:09,800 --> 00:03:14,500 You just have to call the class itself, LinearRegression. 61 00:03:14,800 --> 00:03:17,333 And then add some parentheses and that's it. 62 00:03:17,333 --> 00:03:20,400 Usually there are some parameters inside that we can set. 63 00:03:20,600 --> 00:03:22,900 But here you don't have to enter anything. 64 00:03:22,900 --> 00:03:25,200 This will just create a simple linear regression model. 65 00:03:25,200 --> 00:03:27,000 And it is so simple that usually 66 00:03:27,000 --> 00:03:29,800 we don't have to play too much with the parameters. 67 00:03:29,800 --> 00:03:30,533 All right. 68 00:03:30,533 --> 00:03:35,166 And that line of code directly creates the simple linear regression model. 69 00:03:35,466 --> 00:03:37,400 And that's only the building part. 
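For reference, the import and the object creation described above can be sketched as the short cell below. It assumes scikit-learn is installed; the variable name regressor is the one chosen in the narration, and leaving the parentheses empty keeps the default parameters.

```python
# Import the LinearRegression class from scikit-learn's linear_model module.
from sklearn.linear_model import LinearRegression

# Create an instance of the class with default parameters.
# This object is the (not yet trained) simple linear regression model.
regressor = LinearRegression()
```

At this point the model is only built. It has not seen any data yet.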
70 00:03:37,400 --> 00:03:39,633 You know, we now actually have a model. 71 00:03:39,633 --> 00:03:42,400 But now of course we have to train it on the training set. 72 00:03:42,400 --> 00:03:46,000 And therefore we have to connect it in some way to the training set, 73 00:03:46,166 --> 00:03:50,100 and the action or, you know, the function that connects it 74 00:03:50,100 --> 00:03:53,300 in order to train it is called the fit function.
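As a rough illustration of what calling fit looks like, here is a minimal sketch. The arrays X_train and y_train below are made-up toy values (years of experience and salary) used only for this example. They are not the course data set, and the variable names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy training set: one feature (years of experience)
# and one continuous target (salary). Not the course data.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (n_samples, 1)
y_train = np.array([40000.0, 45000.0, 50000.0, 55000.0])

# Build the model, then train it on the training set with fit.
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# After training, the learned slope and intercept are stored on the object.
slope = regressor.coef_[0]
intercept = regressor.intercept_
```

Note that fit expects the features as a two-dimensional array, which is why each value of X_train sits in its own row.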