1 00:00:00,366 --> 00:00:01,300 Hello, my friends. 2 00:00:01,300 --> 00:00:02,766 I hope you're feeling fantastic. 3 00:00:02,766 --> 00:00:03,933 Now, after this 4 00:00:03,933 --> 00:00:07,800 first data preprocessing phase, which we tackled in a flash together, 5 00:00:08,100 --> 00:00:12,033 we're ready for the exciting step, which is to train the multiple linear 6 00:00:12,033 --> 00:00:12,766 regression model 7 00:00:12,766 --> 00:00:16,600 on the training set that we obtained in the previous step. 8 00:00:16,966 --> 00:00:21,433 But before we do this, I would like to answer two important questions. 9 00:00:21,800 --> 00:00:25,466 The first question is: do we have to do something, you know, 10 00:00:25,466 --> 00:00:29,066 to avoid the dummy variable trap? 11 00:00:29,666 --> 00:00:34,233 And the answer is no, because the linear 12 00:00:34,233 --> 00:00:37,366 regression class that we're about to import, 13 00:00:37,366 --> 00:00:40,966 which will build the multiple linear regression model itself 14 00:00:40,966 --> 00:00:44,366 and also train it (remember that a class 15 00:00:44,366 --> 00:00:48,266 can perform several actions, including building a model and training it), 16 00:00:48,600 --> 00:00:52,533 handles this trap for you. 17 00:00:52,800 --> 00:00:57,000 Indeed, one of the three first columns here is redundant, 18 00:00:57,066 --> 00:01:00,066 as Kirill explained, 19 00:01:00,133 --> 00:01:04,366 but the class still fits the model without any problem, so you have nothing to worry about 20 00:01:04,500 --> 00:01:06,066 regarding the dummy variable trap. 21 00:01:06,066 --> 00:01:08,766 You don't have to remove one of the columns here. 22 00:01:08,766 --> 00:01:10,300 And that's the beauty of classes.
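To see why one of the three dummy columns is redundant, here is a minimal pure-Python sketch. The state names and rows are made up for illustration; it only shows that the dummies of each row always add up to 1 (the same value as an intercept column of ones), so any one dummy column can be rebuilt from the other two.

```python
# Minimal sketch of the redundancy behind the dummy variable trap.
# The category names and rows are illustrative assumptions, not the course data set.

def one_hot(values, categories):
    """Return one 0/1 dummy column per category, row by row."""
    return [[1 if v == c else 0 for c in categories] for v in values]

states = ["New York", "California", "Florida", "New York", "Florida"]
categories = ["California", "Florida", "New York"]
dummies = one_hot(states, categories)

# Each row's dummies add up to 1, exactly like the intercept column of ones.
row_sums = [sum(row) for row in dummies]
print(row_sums)  # [1, 1, 1, 1, 1]

# So the first dummy column equals 1 minus the sum of the other two:
# it carries no new information (perfect collinearity).
reconstructed = [1 - (row[1] + row[2]) for row in dummies]
print(reconstructed == [row[0] for row in dummies])  # True

# Dropping one column (here the first) removes the redundancy.
reduced = [row[1:] for row in dummies]
```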
23 00:01:10,300 --> 00:01:14,000 You know, these are advanced implementations that allow you to build 24 00:01:14,000 --> 00:01:17,066 a machine learning model in just a few lines of code, 25 00:01:17,100 --> 00:01:20,100 and this one takes care of the dummy variable trap for you. 26 00:01:20,600 --> 00:01:25,600 Now, the second question: do we have to work on the features 27 00:01:25,600 --> 00:01:28,666 to select the best ones with, you know, the techniques 28 00:01:28,666 --> 00:01:32,500 that Kirill introduced to you, like backward elimination? 29 00:01:32,766 --> 00:01:35,766 Do we have to apply the backward elimination technique 30 00:01:35,933 --> 00:01:39,066 in order to select the features that have the lowest p-values, 31 00:01:39,066 --> 00:01:42,066 that is, the ones that are the most statistically significant? 32 00:01:42,533 --> 00:01:45,300 And the answer is once again no. 33 00:01:45,300 --> 00:01:46,100 Why is that? 34 00:01:46,100 --> 00:01:49,900 Well, for a similar reason as with the dummy variable trap: 35 00:01:50,166 --> 00:01:54,466 the class that we're about to call to build our multiple linear regression 36 00:01:54,466 --> 00:01:59,233 model will simply use all the features and learn a coefficient for each one. 37 00:01:59,233 --> 00:02:00,000 It does not compute 38 00:02:00,000 --> 00:02:03,900 p-values or remove the less statistically significant features, 39 00:02:04,166 --> 00:02:07,066 but since our goal here is to predict the dependent variable, 40 00:02:07,066 --> 00:02:09,966 you know, the profit, as accurately as possible, we can train on all of them and check the accuracy. 41 00:02:09,966 --> 00:02:12,866 So once again, you don't have to worry about this. 42 00:02:12,866 --> 00:02:14,666 The scikit-learn class, 43 00:02:14,666 --> 00:02:16,900 from this amazing data science library, 44 00:02:16,900 --> 00:02:19,800 will take care of the training for you. Okay. 45 00:02:19,800 --> 00:02:22,300 So I'm glad to have answered these two questions.
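A quick way to check that keeping the redundant dummy column does no harm to the predictions is to fit the LinearRegression class twice: once with all three dummies and once with the first one dropped by hand. This is a sketch on made-up numbers, assuming NumPy and scikit-learn are installed; it also shows that the class learns one coefficient per column rather than selecting or removing features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up toy data: one numeric feature plus a three-level categorical feature.
spend = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
state = np.array([0, 1, 2, 0, 1, 2])               # category index of each row
dummies = np.eye(3)[state]                          # one 0/1 dummy column per category
profit = np.array([15.0, 31.0, 38.0, 52.0, 70.0, 77.0])

X_all = np.column_stack([dummies, spend])           # all three dummies (one is redundant)
X_drop = np.column_stack([dummies[:, 1:], spend])   # first dummy column removed by hand

model_all = LinearRegression().fit(X_all, profit)
model_drop = LinearRegression().fit(X_drop, profit)

# The fitted values match, so keeping the redundant column does not hurt predictions.
print(np.allclose(model_all.predict(X_all), model_drop.predict(X_drop)))  # True

# One coefficient per input column: the class did not drop or select any feature.
print(model_all.coef_.shape)  # (4,)
```

The individual coefficients of the two fits can differ, because with a redundant column they are not uniquely determined; only the fitted values are guaranteed to agree.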
46 00:02:22,300 --> 00:02:25,833 You know, the goal when building machine learning models today is to be efficient, 47 00:02:26,100 --> 00:02:29,766 because in my career, I can't tell you how many times I had 48 00:02:29,766 --> 00:02:33,600 to test several machine learning models on my data sets and select the best one. 49 00:02:33,600 --> 00:02:36,633 If I had had to apply the backward elimination 50 00:02:36,633 --> 00:02:39,800 technique to each data set, I would have lost a lot of time. 51 00:02:39,900 --> 00:02:42,500 And here we have a class that takes care of the training for us. 52 00:02:42,500 --> 00:02:43,800 You just have to apply 53 00:02:43,800 --> 00:02:46,866 your class to your data set, and then you get an accuracy, 54 00:02:46,866 --> 00:02:50,966 which you compare with the accuracies of other models. 55 00:02:51,000 --> 00:02:53,533 That process is called model selection, 56 00:02:53,533 --> 00:02:56,533 and we actually cover it in its own part of this course. 57 00:02:56,833 --> 00:02:58,800 So really, I want you to be efficient with 58 00:02:58,800 --> 00:03:01,800 your implementations and your machine learning toolkit 59 00:03:02,033 --> 00:03:05,033 so that you can optimize your model selection process. 60 00:03:05,500 --> 00:03:05,966 All right. 61 00:03:05,966 --> 00:03:07,266 So now that we've said this, 62 00:03:07,266 --> 00:03:10,800 let's build the multiple linear regression model together. 63 00:03:11,366 --> 00:03:14,066 And actually, I have some good news, 64 00:03:14,066 --> 00:03:17,466 because we're still doing linear regression. 65 00:03:17,800 --> 00:03:20,966 In the previous section we did simple linear regression, 66 00:03:21,300 --> 00:03:24,166 where we had only one feature in our data set.
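The model selection process described above boils down to scoring each candidate model on the same test data and keeping the best one. A minimal pure-Python sketch, assuming R² as the accuracy measure for regression (one common choice, not necessarily the one used later in the course); the model names and prediction lists are hypothetical placeholders.

```python
# Model selection in miniature: score every candidate on the same test set, keep the best.
# The candidate names and their predictions are hypothetical placeholders.

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_test = [100.0, 150.0, 200.0, 250.0]
candidate_predictions = {
    "model_a": [110.0, 140.0, 205.0, 245.0],
    "model_b": [90.0, 170.0, 180.0, 270.0],
}

scores = {name: r_squared(y_test, preds) for name, preds in candidate_predictions.items()}
best = max(scores, key=scores.get)   # name of the highest-scoring candidate
print(scores)
print(best)  # model_a
```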
67 00:03:24,166 --> 00:03:27,266 And now we're doing multiple linear regression, which is exactly the same 68 00:03:27,266 --> 00:03:30,266 as simple linear regression except that we have several features. 69 00:03:30,600 --> 00:03:33,600 So the good news is that the class 70 00:03:33,600 --> 00:03:37,700 that we're about to use to build and train this multiple linear regression model 71 00:03:37,900 --> 00:03:41,466 is the exact same class as for the simple linear regression model. 72 00:03:41,466 --> 00:03:43,800 It will just recognize that there are several features, 73 00:03:43,800 --> 00:03:46,600 and therefore that we are doing multiple linear regression, 74 00:03:46,600 --> 00:03:48,500 but the rest will be exactly the same. 75 00:03:48,500 --> 00:03:52,433 It will be trained to learn the relationship between all your features 76 00:03:52,433 --> 00:03:53,833 (actually, I can open the data set here), 77 00:03:53,833 --> 00:03:58,533 all your features and the profit, which is your dependent variable. 78 00:03:58,800 --> 00:04:00,900 And, as we said, you won't have to handle the dummy variable trap 79 00:04:00,900 --> 00:04:04,066 or select the most statistically significant features 80 00:04:04,066 --> 00:04:07,066 yourself before training. 81 00:04:07,133 --> 00:04:08,000 So that's the good news. 82 00:04:08,000 --> 00:04:11,166 But I still want to implement this again from scratch 83 00:04:11,166 --> 00:04:14,400 to make sure it really sticks in your head. 84 00:04:14,700 --> 00:04:15,000 All right. 85 00:04:15,000 --> 00:04:18,866 So let's close this and implement our multiple linear regression model. 86 00:04:19,433 --> 00:04:21,200 Actually, try to do it faster than me. 87 00:04:21,200 --> 00:04:22,200 Try to do it before me. 88 00:04:22,200 --> 00:04:24,966 You can pause this video and do it. 89 00:04:24,966 --> 00:04:27,766 As for me, I'm going to do it pretty efficiently.
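Because multiple linear regression is simple linear regression with more features, one least-squares routine covers both cases. The sketch below is a plain-Python illustration using the normal equations and Gauss-Jordan elimination; it is not how scikit-learn solves the problem internally, and the helper name `fit_ols` and the toy data are hypothetical.

```python
# Ordinary least squares via the normal equations (X^T X) b = X^T y.
# The same function handles one feature (simple) or several (multiple linear regression).

def fit_ols(rows, y):
    """Return [intercept, coef_1, ..., coef_k] for the least-squares fit."""
    X = [[1.0] + [float(v) for v in r] for r in rows]   # prepend the intercept column
    n_params = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n_params)]
         + [sum(x[i] * t for x, t in zip(X, y))]
         for i in range(n_params)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n_params):
        pivot = max(range(col, n_params), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n_params):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # The matrix is now diagonal: divide each right-hand side by its pivot.
    return [A[i][-1] / A[i][i] for i in range(n_params)]

# Simple linear regression: one feature, data generated by y = 1 + 2x.
print(fit_ols([[0], [1], [2], [3]], [1, 3, 5, 7]))
# Multiple linear regression: two features, data generated by y = 1 + 2*x1 + 3*x2.
print(fit_ols([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], [1, 3, 4, 6, 8]))
```

Only the number of columns changes between the two calls, which mirrors why the same class serves both simple and multiple linear regression.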
90 00:04:27,766 --> 00:04:31,800 So remember, we have to start from sklearn, 91 00:04:31,800 --> 00:04:34,933 the scikit-learn library, from which we're going to access 92 00:04:34,933 --> 00:04:38,633 this specific module, linear_model. 93 00:04:38,666 --> 00:04:41,600 There we go. Google Colab guesses it perfectly. 94 00:04:41,600 --> 00:04:44,600 And from this module we're going to import, 95 00:04:44,633 --> 00:04:48,166 well, the LinearRegression 96 00:04:48,166 --> 00:04:51,533 class. Google Colab guesses it perfectly again. All right. 97 00:04:51,533 --> 00:04:52,300 So that's the exact 98 00:04:52,300 --> 00:04:55,466 same class as in the previous section on simple linear regression. 99 00:04:55,833 --> 00:04:56,466 And there you go. 100 00:04:56,466 --> 00:04:58,133 We're going to do exactly the same thing. 101 00:04:58,133 --> 00:05:01,500 This is exactly the same code as what we did in the previous section, 102 00:05:01,766 --> 00:05:06,066 because now we're going to create a new variable, which will be our multiple linear 103 00:05:06,066 --> 00:05:07,100 regression model 104 00:05:07,100 --> 00:05:10,866 and will be created as an object of this LinearRegression class. 105 00:05:11,100 --> 00:05:15,700 So let's introduce this new variable: regressor equals... 106 00:05:15,900 --> 00:05:19,800 Well, since this will be an object of the LinearRegression class, I'm 107 00:05:19,800 --> 00:05:25,266 just copying the class name, pasting it here, and adding parentheses. 108 00:05:25,266 --> 00:05:25,633 All right. 109 00:05:25,633 --> 00:05:30,100 So that regressor is created as an instance of this LinearRegression class. 110 00:05:30,666 --> 00:05:33,666 Now the question is: do we have to enter any parameters here? 111 00:05:33,933 --> 00:05:37,633 Well, just like for simple linear regression, the answer is no.
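Put together, the steps in this part look like the sketch below. The import and the `regressor = LinearRegression()` line follow the video; the `fit` call is the training step the video is building towards, and the `X_train`/`y_train` values are made-up placeholders standing in for the training set produced by the preprocessing phase (assuming scikit-learn is installed).

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training data: each row holds several features (illustrative numbers only).
X_train = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
y_train = [3.0, 7.0, 7.0, 11.0]   # generated by 1 + 2*x1 + 2*x2 for these rows

regressor = LinearRegression()    # same class as for simple linear regression, default parameters
regressor.fit(X_train, y_train)   # training step: learns intercept_ and coef_

print(regressor.intercept_, regressor.coef_)
```

After fitting, `intercept_` holds the learned constant and `coef_` holds one coefficient per feature, here two because each row has two features.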
112 00:05:37,733 --> 00:05:38,800 We are just going to keep 113 00:05:38,800 --> 00:05:42,666 the default values of the parameters of this LinearRegression class. 114 00:05:43,066 --> 00:05:45,866 I will explain parameter tuning in a whole section, 115 00:05:45,866 --> 00:05:47,633 you know, how you can improve your model. 116 00:05:47,633 --> 00:05:49,500 But linear regression is pretty simple, 117 00:05:49,500 --> 00:05:51,900 so usually we don't have anything to input here.
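To see which defaults you keep when you leave the parentheses empty, you can ask the object for its parameters with `get_params()`. A small sketch, assuming scikit-learn is installed; the exact set of keys depends on your scikit-learn version, so the comments only point at a few long-standing ones.

```python
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()    # no arguments: every parameter keeps its default value
params = regressor.get_params()   # dict mapping parameter name -> current value
print(params)

# fit_intercept=True is why we never add a column of ones ourselves:
# the model learns the intercept (constant term) on its own.
print(params["fit_intercept"])
```

Writing `LinearRegression(fit_intercept=True)` explicitly would build the same model; the empty parentheses are simply shorter.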