1
00:00:00,166 --> 00:00:01,166
All right, my friends.

2
00:00:01,166 --> 00:00:03,733
Are you ready for the demo?

3
00:00:03,733 --> 00:00:07,200
Just a reminder that this demo works
for any data set,

4
00:00:07,200 --> 00:00:09,100
regardless of the number of features.

5
00:00:09,100 --> 00:00:10,100
And as long as it has

6
00:00:10,100 --> 00:00:12,200
the features in the first columns

7
00:00:12,200 --> 00:00:14,800
and then the dependent variable
in the last column.
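A minimal sketch of the column layout this demo assumes, using pandas: every column except the last becomes the feature matrix, and the last column becomes the dependent variable. The toy DataFrame and its column names are made up for illustration; with your own data the frame would come from `pd.read_csv` on your file.

```python
import pandas as pd

# Toy stand-in for the demo's CSV (column names and values are made up):
# every column except the last is a feature, the last is the dependent variable.
dataset = pd.DataFrame({
    "feature_1": [1.0, 2.0, 3.0],
    "feature_2": [4.0, 5.0, 6.0],
    "feature_3": [7.0, 8.0, 9.0],
    "target": [10.0, 11.0, 12.0],
})

# With a real file you would start from pd.read_csv("your_file.csv") instead.
X = dataset.iloc[:, :-1].values  # all rows, all columns but the last
y = dataset.iloc[:, -1].values   # all rows, last column only

print(X.shape, y.shape)  # (3, 3) (3,)
```

Because the split is done by position rather than by column name, the same two lines work whatever the number of features.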

8
00:00:14,800 --> 00:00:18,800
And also assuming that any missing data
or categorical data

9
00:00:18,800 --> 00:00:23,000
has already been taken care of
with your data preprocessing toolkit.

10
00:00:23,400 --> 00:00:23,800
All right.

11
00:00:23,800 --> 00:00:27,600
So this is going to be very exciting
because it's really now that I'm going

12
00:00:27,600 --> 00:00:32,400
to show you the power of code templates
and how you can quickly

13
00:00:32,400 --> 00:00:35,933
and efficiently select
the best regression model.

14
00:00:36,566 --> 00:00:37,000
All right.

15
00:00:37,000 --> 00:00:37,833
So let's do this.

16
00:00:37,833 --> 00:00:40,066
Enough talking. I'm
going to proceed to the demo.

17
00:00:40,066 --> 00:00:44,533
Now I'm just resetting everything,
because we're going to do something fun.

18
00:00:44,800 --> 00:00:47,800
We will actually use the Run

19
00:00:47,833 --> 00:00:52,233
all option of the Runtime menu,
which will run all our cells at once.

20
00:00:52,466 --> 00:00:55,066
So that we can really
optimize our efficiency.

21
00:00:55,066 --> 00:01:00,533
But let's not forget to upload the
data set in each of the implementations.

22
00:01:00,866 --> 00:01:04,433
Otherwise,
this cell won't be able to execute.

23
00:01:04,500 --> 00:01:05,900
So we're going to upload it.

24
00:01:05,900 --> 00:01:07,933
Now it is connecting to the runtime.

25
00:01:07,933 --> 00:01:10,866
And in a second we should be able
to see the upload button.

26
00:01:10,866 --> 00:01:11,766
There we go.

27
00:01:11,766 --> 00:01:13,966
So let's click this upload button.

28
00:01:13,966 --> 00:01:17,933
And now on your machine
you're going to find the folder.

29
00:01:17,933 --> 00:01:19,400
the Model Selection folder.

30
00:01:19,400 --> 00:01:21,600
That's in the main course folder.

31
00:01:21,600 --> 00:01:22,200
And there's

32
00:01:22,200 --> 00:01:26,300
this new Model Selection folder, containing
the Regression folder

33
00:01:26,300 --> 00:01:30,033
with all the code templates for regression,
and the Classification folder with all the code

34
00:01:30,066 --> 00:01:31,700
templates for classification.

35
00:01:31,700 --> 00:01:35,233
If you missed that folder
somehow, don't worry: it was

36
00:01:35,233 --> 00:01:37,333
given right before this tutorial.

37
00:01:37,333 --> 00:01:41,533
In the article, at the bottom,
there was a zip folder attached

38
00:01:41,533 --> 00:01:43,733
which you could download to your machine

39
00:01:43,733 --> 00:01:46,466
and which contains exactly
the same content as I have here.

40
00:01:46,466 --> 00:01:46,833
All right.

41
00:01:46,833 --> 00:01:50,733
So now we're going to go to the Regression
folder, which contains

42
00:01:50,733 --> 00:01:53,933
all the implementations,
meaning all the code templates

43
00:01:53,933 --> 00:01:58,333
for each of your regression models,
both in .ipynb format, which you can open

44
00:01:58,333 --> 00:02:02,433
with either Google Colab
or Jupyter Notebook, and in .py format,

45
00:02:02,666 --> 00:02:06,900
which you can open with a classic Python
terminal or Spyder in Anaconda.

46
00:02:07,033 --> 00:02:08,400
So you have everything

47
00:02:08,400 --> 00:02:12,666
and you also have the data set,
containing these four features:

48
00:02:12,666 --> 00:02:16,966
The temperature, the vacuum,
the ambient pressure and the humidity.

49
00:02:17,166 --> 00:02:19,433
And we predict the energy output.

50
00:02:19,433 --> 00:02:19,733
All right.

51
00:02:19,733 --> 00:02:21,300
So that's a very classic data set.

52
00:02:21,300 --> 00:02:22,800
Once again, very generic.

53
00:02:22,800 --> 00:02:26,800
It's meant to represent the future
data sets you'll be working on.

54
00:02:27,000 --> 00:02:30,466
And speaking of this data set, that's
exactly what we have to select here.

55
00:02:30,466 --> 00:02:35,800
So we're going to click open
to upload the data set in our notebook.

56
00:02:35,800 --> 00:02:36,766
And there it is.

57
00:02:36,766 --> 00:02:40,233
And now, as I told you,
here in the implementation you only have

58
00:02:40,466 --> 00:02:42,500
to enter the name of your data set.

59
00:02:42,500 --> 00:02:44,700
And as far as we're concerned
here for our demo,

60
00:02:44,700 --> 00:02:47,700
this data set is called Data.csv.

61
00:02:48,000 --> 00:02:48,700
All right.

62
00:02:48,700 --> 00:02:52,266
So now we're going to quickly do exactly
the same for the other implementations.

63
00:02:52,633 --> 00:02:53,300
Upload

64
00:02:54,300 --> 00:02:57,266
then Data.csv, then Open,

65
00:02:57,266 --> 00:03:01,033
and it's uploading;
we'll have it in a second.

66
00:03:01,300 --> 00:03:07,000
And then we'll just replace the name of
the data set here by Data.csv.

67
00:03:07,266 --> 00:03:08,666
So that's for polynomial regression.

68
00:03:08,666 --> 00:03:12,933
Now for support
vector regression, same thing: upload

69
00:03:12,933 --> 00:03:19,133
Data.csv, Open, and it's uploading;
in a second we should have it.

70
00:03:19,133 --> 00:03:19,800
There we go.

71
00:03:19,800 --> 00:03:24,266
Now we just replace this by Data.csv.

72
00:03:24,933 --> 00:03:27,100
Then for decision tree regression.

73
00:03:27,100 --> 00:03:27,766
There we go.

74
00:03:27,766 --> 00:03:34,200
Upload, then Data.csv, Open,
and we will have it in a second.

75
00:03:34,333 --> 00:03:36,166
It's uploading into the notebook.

76
00:03:36,166 --> 00:03:39,200
And now replacing this by Data.csv.

77
00:03:39,433 --> 00:03:42,266
And finally for random forest regression

78
00:03:42,266 --> 00:03:44,966
Upload Data.csv.

79
00:03:44,966 --> 00:03:45,966
Open.

80
00:03:45,966 --> 00:03:49,900
And now replacing this by Data.csv.

81
00:03:49,933 --> 00:03:52,933
And now, my friends, we are finally ready

82
00:03:53,000 --> 00:03:55,833
to test each of our regression models

83
00:03:55,833 --> 00:03:59,333
and figure out in a flash
which one is the best.

84
00:03:59,533 --> 00:04:04,100
Remember, the closer the R-squared
coefficient is to one, the better.

85
00:04:04,100 --> 00:04:05,500
your regression model is.
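For reference, the R-squared coefficient compares the model's squared errors with those of a baseline that always predicts the mean: R² = 1 − SS_res / SS_tot. A value of 1 means perfect predictions, and 0 means no better than predicting the mean. Below is a small NumPy sketch of that definition; the helper name `r_squared` and the sample vectors are hypothetical, and scikit-learn's `sklearn.metrics.r2_score` computes the same quantity.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)

y_true = [3.0, 5.0, 7.0, 9.0]  # made-up "real results" (mean = 6.0)
print(r_squared(y_true, y_true))                               # 1.0
print(r_squared(y_true, [6.0, 6.0, 6.0, 6.0]))                 # 0.0
print(round(r_squared(y_true, [2.5, 5.5, 6.5, 9.5]), 4))       # 0.95
```

The last case is off by 0.5 on every point, so SS_res = 1 against SS_tot = 20, which gives 1 − 1/20 = 0.95.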

86
00:04:05,500 --> 00:04:08,366
So in order to figure out
which is going to be the best

87
00:04:08,366 --> 00:04:11,600
model, we'll just take the one
with the highest R-squared.

88
00:04:11,600 --> 00:04:14,600
that is, the one with the R-squared
closest to one.
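The selection rule described here (fit every model, score each one on the test set, keep the highest R-squared) can be sketched in a single scikit-learn loop. This is an illustration under stated assumptions, not the course templates themselves: the data is synthetic from `make_regression` (four features, to echo the demo), and the 80/20 split, the polynomial degree, the scaling around SVR and the forest size are choices made for this sketch.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data with four features (assumption: swap in your CSV).
X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Multiple Linear Regression": LinearRegression(),
    "Polynomial Regression": make_pipeline(
        PolynomialFeatures(degree=4), LinearRegression()
    ),
    # SVR is sensitive to scale, so both X and y are standardized here.
    "Support Vector Regression": TransformedTargetRegressor(
        regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf")),
        transformer=StandardScaler(),
    ),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=0),
    "Random Forest Regression": RandomForestRegressor(
        n_estimators=10, random_state=0
    ),
}

# Fit each model and record its R-squared on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

# The best model is simply the one whose R-squared is closest to 1.
best = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: R^2 = {score:.4f}")
print("Best model:", best)
```

Since every model is scored on the same test split, the R-squared values can be compared directly.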

89
00:04:14,833 --> 00:04:15,733
All right.

90
00:04:15,733 --> 00:04:16,533
Are you ready?

91
00:04:16,533 --> 00:04:19,533
Let's do this,
starting with multiple linear regression.

92
00:04:19,566 --> 00:04:23,200
So now we're simply
going to go to Runtime and then Run all.

93
00:04:23,200 --> 00:04:26,200
And all the cells are now executing.

94
00:04:26,500 --> 00:04:27,833
And there you go.

95
00:04:27,833 --> 00:04:32,700
We end up with an R-squared coefficient of
0.93.

96
00:04:32,966 --> 00:04:33,633
Very good.

97
00:04:33,633 --> 00:04:34,533
A very good one.

98
00:04:34,533 --> 00:04:36,700
As we can clearly see,
the predictions are amazing.

99
00:04:36,700 --> 00:04:38,766
They're very close to the real results.

100
00:04:38,766 --> 00:04:41,700
So remember, this
first column is the vector of predictions.

101
00:04:41,700 --> 00:04:44,433
And this second one
is the vector of real results.

102
00:04:44,433 --> 00:04:47,533
And that's why here we have an amazing
R-squared coefficient.
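The two-column display described here (predictions on the left, real results on the right) can be produced by reshaping both vectors into columns and concatenating them side by side. The sketch below uses NumPy. The names `y_pred` and `y_test` follow common scikit-learn conventions, but the values are made up for illustration.

```python
import numpy as np

# Made-up example vectors: model predictions and the corresponding real results.
y_pred = np.array([1.05, 1.98, 3.10])
y_test = np.array([1.00, 2.00, 3.00])

np.set_printoptions(precision=2)  # show two decimals for easier comparison

# Turn each 1-D vector into a column, then stack the columns horizontally.
side_by_side = np.concatenate(
    (y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), axis=1
)
print(side_by_side)
```

Each row pairs one prediction with its real result, so large gaps stand out immediately.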