1 00:00:00,166 --> 00:00:01,166 All right, my friends. 2 00:00:01,166 --> 00:00:03,733 Are you ready for the demo? 3 00:00:03,733 --> 00:00:07,200 I remind you that this demo works for any data set, you know, 4 00:00:07,200 --> 00:00:09,100 regardless of the number of features, 5 00:00:09,100 --> 00:00:10,100 as long as it has, 6 00:00:10,100 --> 00:00:12,200 you know, the features in the first columns 7 00:00:12,200 --> 00:00:14,800 and then the dependent variable in the last column. 8 00:00:14,800 --> 00:00:18,800 And also assuming that any missing data or categorical data 9 00:00:18,800 --> 00:00:23,000 was already taken care of thanks to your data preprocessing toolkit. 10 00:00:23,400 --> 00:00:23,800 All right. 11 00:00:23,800 --> 00:00:27,600 So this is going to be very exciting because it's really now that I'm going 12 00:00:27,600 --> 00:00:32,400 to show you the power of code templates and how you can quickly 13 00:00:32,400 --> 00:00:35,933 and efficiently select the best regression model. 14 00:00:36,566 --> 00:00:37,000 All right. 15 00:00:37,000 --> 00:00:37,833 So let's do this. 16 00:00:37,833 --> 00:00:40,066 Enough talking, I'm going to proceed to the demo. 17 00:00:40,066 --> 00:00:44,533 Now I'm just resetting everything because we're going to do something fun. 18 00:00:44,800 --> 00:00:47,800 We will actually use this Run 19 00:00:47,833 --> 00:00:52,233 all option of the runtime, which will run all our cells at once. 20 00:00:52,466 --> 00:00:55,066 So that, you know, we can really optimize the efficiency. 21 00:00:55,066 --> 00:01:00,533 But let's not forget to upload the data set in each of the implementations. 22 00:01:00,866 --> 00:01:04,433 Otherwise, this cell won't be able to execute. 23 00:01:04,500 --> 00:01:05,900 So we're going to upload it. 24 00:01:05,900 --> 00:01:07,933 Now it is connecting to the runtime. 25 00:01:07,933 --> 00:01:10,866 And in a second we should be able to see the upload button.
26 00:01:10,866 --> 00:01:11,766 There we go. 27 00:01:11,766 --> 00:01:13,966 So let's click this upload button. 28 00:01:13,966 --> 00:01:17,933 And now on your machine you're going to find the folder. 29 00:01:17,933 --> 00:01:19,400 You know, the model selection folder. 30 00:01:19,400 --> 00:01:21,600 That's the whole Machine Learning A-Z folder. 31 00:01:21,600 --> 00:01:22,200 And that's 32 00:01:22,200 --> 00:01:26,300 this new model selection folder containing, you know, the regression folder 33 00:01:26,300 --> 00:01:30,033 with all the good templates for regression and the classification folder with all the good 34 00:01:30,066 --> 00:01:31,700 templates for classification. 35 00:01:31,700 --> 00:01:35,233 If you missed that folder somehow, don't worry, it 36 00:01:35,233 --> 00:01:37,333 was given right before this tutorial. 37 00:01:37,333 --> 00:01:41,533 You know, in the article at the bottom you had a zip folder attached 38 00:01:41,533 --> 00:01:43,733 which you could download on your machine 39 00:01:43,733 --> 00:01:46,466 and which contains exactly the same as what I have here. 40 00:01:46,466 --> 00:01:46,833 All right. 41 00:01:46,833 --> 00:01:50,733 So now we're going to go to the regression folder which contains 42 00:01:50,733 --> 00:01:53,933 all the implementations, meaning all the code templates 43 00:01:53,933 --> 00:01:58,333 for each of your regression models, both in ipynb format, which you can open 44 00:01:58,333 --> 00:02:02,433 with either Google Colab or Jupyter Notebook, and in py format, 45 00:02:02,666 --> 00:02:06,900 which you can open with a classic Python terminal or Spyder in Anaconda. 46 00:02:07,033 --> 00:02:08,400 So you have everything 47 00:02:08,400 --> 00:02:12,666 and you also have the data set, you know, containing these four features: 48 00:02:12,666 --> 00:02:16,966 the temperature, the vacuum, the ambient pressure and the humidity. 49 00:02:17,166 --> 00:02:19,433 And we predict the energy output.
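[Editor's note] The layout the templates expect (features in the first columns, dependent variable in the last column) can be sketched as below. This is a minimal illustration using only Python's standard library, not the course's own template code. The column names and the three sample rows are made up for the example; they are not values from the demo's data set, and `split_features_target` is a hypothetical helper name.

```python
import csv
import io

# Made-up sample rows, for illustration only (NOT values from the demo's file).
# Layout assumption from the transcript: features first, target in the last column.
SAMPLE_CSV = """temperature,vacuum,pressure,humidity,energy_output
15.0,41.0,1020.0,70.0,460.0
25.0,60.0,1010.0,60.0,445.0
10.0,40.0,1015.0,90.0,475.0
"""


def split_features_target(csv_text):
    """Split CSV text into feature rows (all but last column) and target values (last column)."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    X = [[float(value) for value in row[:-1]] for row in data]
    y = [float(row[-1]) for row in data]
    return header[:-1], header[-1], X, y


feature_names, target_name, X, y = split_features_target(SAMPLE_CSV)
print(feature_names, "->", target_name)
```

Because the split is positional, the same helper works for any number of feature columns, which is the property the demo relies on.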
50 00:02:19,433 --> 00:02:19,733 All right. 51 00:02:19,733 --> 00:02:21,300 So that's a very classic data set. 52 00:02:21,300 --> 00:02:22,800 Once again very generic, 53 00:02:22,800 --> 00:02:26,800 trying to represent the other future data sets you'll be working on. 54 00:02:27,000 --> 00:02:30,466 And well, speaking of this data set, that's exactly what we have to select here. 55 00:02:30,466 --> 00:02:35,800 So we're going to click open to upload the data set in our notebook. 56 00:02:35,800 --> 00:02:36,766 And there it is. 57 00:02:36,766 --> 00:02:40,233 And now as I told you, here in the implementation you only have 58 00:02:40,466 --> 00:02:42,500 to enter the name of your data set. 59 00:02:42,500 --> 00:02:44,700 And as far as we're concerned here for our demo, 60 00:02:44,700 --> 00:02:47,700 well, this data set is called data dot CSV. 61 00:02:48,000 --> 00:02:48,700 All right. 62 00:02:48,700 --> 00:02:52,266 So now we're going to quickly do exactly the same for the other implementations. 63 00:02:52,633 --> 00:02:53,300 Upload, 64 00:02:54,300 --> 00:02:57,266 then data dot CSV, then open, 65 00:02:57,266 --> 00:03:01,033 and it's uploading it, and we'll have it in a second. 66 00:03:01,300 --> 00:03:07,000 And then we'll just replace the name of the data set here by data dot CSV. 67 00:03:07,266 --> 00:03:08,666 So that's for polynomial regression. 68 00:03:08,666 --> 00:03:12,933 Now for support vector regression, well, same: upload, data 69 00:03:12,933 --> 00:03:19,133 dot CSV, open, and it's uploading it. In a second we should have it. 70 00:03:19,133 --> 00:03:19,800 There we go. 71 00:03:19,800 --> 00:03:24,266 Now we only replace this by data dot CSV. 72 00:03:24,933 --> 00:03:27,100 Then for decision tree regression. 73 00:03:27,100 --> 00:03:27,766 There we go. 74 00:03:27,766 --> 00:03:34,200 Upload, then data dot CSV, open, and we will have it in a second. 75 00:03:34,333 --> 00:03:36,166 Uploaded in the notebook.
76 00:03:36,166 --> 00:03:39,200 And now replacing this by data dot CSV. 77 00:03:39,433 --> 00:03:42,266 And finally for random forest regression, 78 00:03:42,266 --> 00:03:44,966 upload, data dot CSV. 79 00:03:44,966 --> 00:03:45,966 Open. 80 00:03:45,966 --> 00:03:49,900 And now replacing this by data dot CSV. 81 00:03:49,933 --> 00:03:52,933 And now, my friends, we are finally ready 82 00:03:53,000 --> 00:03:55,833 to test each of our regression models 83 00:03:55,833 --> 00:03:59,333 and figure out in a flash which one is the best. 84 00:03:59,533 --> 00:04:04,100 Remember, the closer the R-squared coefficient is to one, the better 85 00:04:04,100 --> 00:04:05,500 is your regression model. 86 00:04:05,500 --> 00:04:08,366 So in order to figure out which is going to be the best 87 00:04:08,366 --> 00:04:11,600 model, we'll just take the one with the highest R-squared. 88 00:04:11,600 --> 00:04:14,600 You know, with the R-squared that is the closest to one. 89 00:04:14,833 --> 00:04:15,733 All right. 90 00:04:15,733 --> 00:04:16,533 Are you ready? 91 00:04:16,533 --> 00:04:19,533 Let's do this, starting with multiple linear regression. 92 00:04:19,566 --> 00:04:23,200 So now we're simply going to go to Runtime and then Run 93 00:04:23,200 --> 00:04:26,200 all, and all the cells are now executing. 94 00:04:26,500 --> 00:04:27,833 And there you go. 95 00:04:27,833 --> 00:04:32,700 We end up with an R-squared coefficient of 0.93. 96 00:04:32,966 --> 00:04:33,633 Very good. 97 00:04:33,633 --> 00:04:34,533 Very good one. 98 00:04:34,533 --> 00:04:36,700 As we can clearly see, you know, the predictions are amazing. 99 00:04:36,700 --> 00:04:38,766 They're very close to the real results. 100 00:04:38,766 --> 00:04:41,700 So remember, this first column is the vector of predictions. 101 00:04:41,700 --> 00:04:44,433 And this second one is the vector of real results. 102 00:04:44,433 --> 00:04:47,533 And that's why here we have an amazing R-squared coefficient.
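[Editor's note] The selection rule described above (compute R-squared from the vector of predictions and the vector of real results, then keep the model whose R-squared is highest) can be sketched as follows. This is a plain-Python illustration of the standard R-squared formula, 1 - SS_res / SS_tot, not the code from the course templates. The toy vectors, the names `model_a` and `model_b`, and their scores are placeholders, not results from the demo.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def best_model(scores):
    """Return the name of the model with the highest R-squared."""
    return max(scores, key=scores.get)


# Toy vectors (placeholders): real results and one model's predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
toy_r2 = r2_score(y_true, y_pred)

# Placeholder scores for two hypothetical models; in practice each value
# would come from running one template on the same test set.
scores = {"model_a": 0.93, "model_b": 0.88}
winner = best_model(scores)
print(round(toy_r2, 3), winner)
```

Comparing models this way is only fair when every score is computed on the same test set, which is what running each template against the same file is meant to ensure.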