1
00:00:00,000 --> 00:00:00,833
Hello my friends.

2
00:00:00,833 --> 00:00:05,333
Welcome to this new practical activity
on polynomial regression.

3
00:00:05,633 --> 00:00:08,533
This time we're going to learn
how to build together

4
00:00:08,533 --> 00:00:13,333
a nonlinear regression model,
which will allow us to tackle a problem

5
00:00:13,333 --> 00:00:18,066
with a nonlinear data set, meaning a data
set with nonlinear relationships

6
00:00:18,366 --> 00:00:23,100
on which therefore a multiple linear
regression model would not be relevant.

7
00:00:23,633 --> 00:00:26,900
Now we're all going to go into Part 2,
Regression,

8
00:00:26,933 --> 00:00:31,066
and this time we're going to go to Section
6, Polynomial Regression, to learn

9
00:00:31,066 --> 00:00:34,733
how indeed to build this nonlinear
regression model.

10
00:00:35,200 --> 00:00:35,633
All right.

11
00:00:35,633 --> 00:00:40,100
And as usual we're going to start with the
Python folder, inside which you will find two

12
00:00:40,100 --> 00:00:45,366
files: polynomial regression .ipynb, which is
of course your Python implementation,

13
00:00:45,533 --> 00:00:48,566
which you can open either in Google Colab
or Jupyter Notebook.

14
00:00:48,866 --> 00:00:52,633
And the data set called position salaries.

15
00:00:53,266 --> 00:00:53,633
All right.

16
00:00:53,633 --> 00:00:56,700
So as usual we're going to start
by describing the data set.

17
00:00:56,700 --> 00:00:59,766
Once again I'd like to remind you that
this is a simple data set.

18
00:00:59,966 --> 00:01:00,933
But no worries.

19
00:01:00,933 --> 00:01:04,900
The further we progress in this course,
the more we will work with real world

20
00:01:04,900 --> 00:01:06,200
and complex data sets.

21
00:01:06,200 --> 00:01:10,500
You will see at the end we will work with
data sets with many more observations

22
00:01:10,700 --> 00:01:12,333
and more complexities.

23
00:01:12,333 --> 00:01:15,200
So what is this data set about?

24
00:01:15,200 --> 00:01:18,133
Well let's imagine the following scenario.

25
00:01:18,133 --> 00:01:21,966
Let's imagine
that we are actually an HR department

26
00:01:21,966 --> 00:01:23,900
and that we want to hire someone.

27
00:01:23,900 --> 00:01:28,033
And we actually found someone
that seems to be a great fit for the job.

28
00:01:28,033 --> 00:01:31,933
So we would like to offer this person
a position in our company.

29
00:01:32,366 --> 00:01:36,300
And so this person says yes,
but at the end of the interview process

30
00:01:36,300 --> 00:01:40,833
comes the inevitable question:
what is your salary expectation?

31
00:01:41,333 --> 00:01:45,166
And let's say that this person is, you
know, very well advanced in their career,

32
00:01:45,166 --> 00:01:51,800
and therefore that person
is asking for $160,000 per year.

33
00:01:52,500 --> 00:01:55,400
And then also as HR negotiators,

34
00:01:55,400 --> 00:01:59,266
we ask this person,
why are you expecting such a high salary?

35
00:01:59,533 --> 00:02:00,966
And this person replies,

36
00:02:00,966 --> 00:02:04,700
well, that's because that's
what I earned in my previous company.

37
00:02:04,700 --> 00:02:05,800
That was my salary.

38
00:02:05,800 --> 00:02:10,166
In my previous company,
I earned $160,000 per year.

39
00:02:10,333 --> 00:02:16,000
So I'm expecting at least $160,000
per year in your company.

40
00:02:16,900 --> 00:02:19,300
Is that the truth, or is that a bluff?

41
00:02:19,300 --> 00:02:21,900
Well, that's
exactly what we're going to figure out.

42
00:02:21,900 --> 00:02:24,833
Thanks to our polynomial regression model.

43
00:02:24,833 --> 00:02:28,200
We're going to build
a polynomial regression model

44
00:02:28,466 --> 00:02:31,866
to predict the previous salary
of this candidate.

45
00:02:32,333 --> 00:02:34,200
So how are we going to do this?

46
00:02:34,200 --> 00:02:37,500
Well, of course in order to make such
a prediction we need data.

47
00:02:37,633 --> 00:02:40,633
And that's exactly the data
we collected here.

48
00:02:40,800 --> 00:02:43,166
So what is this data,
and how did we collect it?

49
00:02:43,166 --> 00:02:47,600
This data is actually the different
salaries of the previous company

50
00:02:47,600 --> 00:02:51,366
for the different positions
from business analyst to CEO.

51
00:02:51,800 --> 00:02:53,600
And now, how did we collect such data?

52
00:02:53,600 --> 00:02:57,433
Well you know there are many websites
online which actually display

53
00:02:57,433 --> 00:03:01,000
the different salaries
of the different positions in companies.

54
00:03:01,100 --> 00:03:03,666
I can give you an example like Glassdoor.

55
00:03:03,666 --> 00:03:07,200
Well, let's say that we did this
and that's how we collected all this

56
00:03:07,500 --> 00:03:10,500
data containing all the salaries
for the different positions

57
00:03:10,733 --> 00:03:14,366
of this previous company
for which this person worked.

58
00:03:14,433 --> 00:03:15,166
Okay.

59
00:03:15,166 --> 00:03:17,766
So we have this data
and now we need to know obviously

60
00:03:17,766 --> 00:03:21,533
which position this person
had within this previous company.

61
00:03:22,100 --> 00:03:23,500
Well that's easy.

62
00:03:23,500 --> 00:03:28,566
Let's say we went to LinkedIn and
we checked out the profile of this person.

63
00:03:28,866 --> 00:03:33,433
And we actually saw that this person
was actually a region manager.

64
00:03:33,600 --> 00:03:34,466
Okay.

65
00:03:34,466 --> 00:03:37,300
However, on LinkedIn
we also see something else.

66
00:03:37,300 --> 00:03:40,933
It turns out that this person actually
has been a region manager

67
00:03:40,933 --> 00:03:44,933
for quite a while,
like let's say two years and therefore,

68
00:03:45,133 --> 00:03:49,533
you know, the salary of this person
should not exactly be $150,000,

69
00:03:49,533 --> 00:03:52,600
as we can see on this data set,
but instead it should be

70
00:03:52,600 --> 00:03:56,633
somewhere between $150,000,
the salary of position

71
00:03:56,633 --> 00:04:01,133
number six and $200,000
the salary of position number seven.

72
00:04:01,433 --> 00:04:04,300
So in order to interpolate,
we're going to suppose that

73
00:04:04,300 --> 00:04:07,966
this person has a position in between 6
and 7.

74
00:04:08,133 --> 00:04:11,033
And we'll consider this position
to be 6.5,

75
00:04:11,033 --> 00:04:14,300
so that then
we can actually deploy our model,

76
00:04:14,300 --> 00:04:18,966
you know, after training it
of course, on the position level 6.5,

77
00:04:18,966 --> 00:04:22,466
so that we can get the predicted salary
of such a position level.

78
00:04:22,733 --> 00:04:27,800
And we will compare this predicted salary
to the salary expected by this person,

79
00:04:27,933 --> 00:04:30,800
to see if it is indeed truth or bluff.

80
00:04:30,800 --> 00:04:31,666
All right.

81
00:04:31,666 --> 00:04:32,600
Are you ready?

82
00:04:32,600 --> 00:04:33,533
Let's do this.

83
00:04:33,533 --> 00:04:35,966
Let's build
our polynomial regression model.
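
To make the plan concrete, here is a minimal sketch of the idea described above: fit a polynomial to (position level, salary) pairs, then evaluate it at level 6.5 and compare the result with the $160,000 claim. It uses NumPy rather than the course notebook, and the function name `predict_salary`, the degree of 2, and every salary row except levels 6 and 7 are assumptions made up for illustration, not the actual data set.

```python
import numpy as np


def predict_salary(levels, salaries, query_level, degree=2):
    """Fit a polynomial of the given degree and evaluate it at query_level.

    Hypothetical helper for illustration only.
    """
    # Polynomial.fit rescales the inputs internally, which keeps the
    # least-squares fit numerically stable.
    poly = np.polynomial.Polynomial.fit(levels, salaries, deg=degree)
    return float(poly(query_level))


# Illustrative values only: levels 6 ($150,000) and 7 ($200,000) come from
# the lecture; the other rows are made-up placeholders.
levels = np.array([4, 5, 6, 7, 8], dtype=float)
salaries = np.array([90_000, 115_000, 150_000, 200_000, 270_000], dtype=float)

claimed = 160_000
predicted = predict_salary(levels, salaries, 6.5, degree=2)

print(f"Predicted salary at level 6.5: ${predicted:,.0f}")
if claimed <= predicted:
    print("The claim does not exceed the prediction.")
else:
    print("The claim exceeds the prediction.")
```

With the real data set the degree would need to be chosen by looking at how well the curve follows the points, so treat the degree here as a placeholder too.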
