In this lesson it's finally time to train our model. We're going to model our house prices using a technique called multivariable regression, which is also known as multiple linear regression. But what does that mean?

In a previous module we estimated movie revenue based on movie budgets. We were using a simple linear regression model that looked like this. We had one explanatory variable, the movie budget, and we were busy estimating the values of these theta parameters, theta 0 and theta 1. When fitting more than one explanatory variable, more than one feature, the equation for the regression will look like this instead. This is the format of our model for our multivariable regression.

Now, we don't have some generic n different terms. We've got 13 features in our data set, and these are the ones we're going to use. So in fact, when we adapt this generic model to our specific circumstances, our equation will actually look something like this: the estimate of the price is equal to theta 0, plus theta 1 times RM, plus theta 2 times our second feature, plus theta 3 times our third feature, and so on until we get to our last one. What this equation is telling us is that our estimate of the property price will be a linear combination of these 13 features. So it's still a linear model.

Okay. So this is the model for our regression that we will be fitting to our data. And when we run our Python code, we'll get a value for each of these theta parameters that we see here. These are going to be the coefficients we're interested in.
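To make that concrete, here's a minimal sketch of what the fitting step might look like in Python with scikit-learn. The names `features` and `prices` are hypothetical placeholders for a DataFrame holding the 13 feature columns and the matching series of property prices; the real code comes later in the lesson.

```python
# Minimal sketch, assuming `features` is a pandas DataFrame with the
# 13 explanatory variables (including RM) and `prices` is the target.
# Both names are placeholders, not the lesson's actual code.
from sklearn.linear_model import LinearRegression

regr = LinearRegression()
regr.fit(features, prices)   # estimates theta 0 through theta 13

print(regr.intercept_)       # theta 0, the constant term
print(regr.coef_)            # theta 1 to theta 13, one per feature
```

Calling fit() is what "runs the numbers": scikit-learn solves for the theta values that minimise the squared errors and hands them back as the intercept and the coefficients.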
But before we write our code and run the numbers, did you ever wonder why this technique was called regression in the first place? Where does this name regression actually come from? I mean, regression is such a strange word to be using for fitting a line to our data, right?

So this technique and the name actually go back a long, long time to an English gentleman called Francis Galton, Sir Francis Galton in fact. Sir Francis lived in Victorian England and had quite a few talents. One of the many things he's remembered for today is his invention of the Galton board. Yep, he named this invention after himself.

Anyhow, a Galton board is basically what you end up with if you were to take a Japanese pachinko machine and then extract all the fun out of it. The Galton board is a contraption that has a bunch of balls and pins in it. When you turn this board upside down, the balls go through a single opening at the top and then they start bouncing off the pins.

And here's the fascinating part: what you end up with at the bottom of the board will look something like a histogram from matplotlib. In fact, you get a very particular type of arrangement in the slots at the bottom, and if you squint a little bit you'll actually recognise our old friend, the normal distribution. Most of the balls end up in the middle, but a very few balls bounce around like crazy on the pins and end up at the sides of the board, as outliers on the very far left or the very far right edge. Needless to say, a lot of statisticians at the time were very excited about this invention.
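You can see why that bell shape appears with a quick simulation; this is my own illustrative sketch, not code from the lesson. Each ball's final slot is simply the sum of many independent left-or-right bounces, and a histogram of those sums comes out bell-shaped.

```python
# Illustrative Galton board simulation (not the lesson's code):
# 10,000 balls each bounce off 12 rows of pins, going left (-1) or
# right (+1) with equal probability at every pin.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()
bounces = rng.choice([-1, 1], size=(10_000, 12))
final_slots = bounces.sum(axis=1)              # slot where each ball lands

plt.hist(final_slots, bins=range(-13, 15, 2))  # one bin per slot
plt.show()                                     # most balls pile up in the middle
```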
But the thing is, Sir Francis Galton was not a one-trick pony. No, sir. Being quite interested in both distributions and outliers, he looked very, very carefully at the sizes of things, from the sizes of the seeds of his sweet peas to the sizes of people. In particular, he was very interested in how size changes from one generation to the next, especially with regard to outliers.

Now, one thing that Sir Francis noticed was that when you had a very, very tall father, and that father had a son, and that son grew up to be an adult, the adult son was usually shorter than his father. Sir Francis called this phenomenon regression to the mean. That's where we get that word from: regression.

Now, you can even observe this phenomenon of regression to the mean in, say, the NBA. Professional basketball players tend to be very, very tall, right? For example, Shaquille O'Neal was 7 foot 1, or 216 centimetres. At least, that's what he's listed at on Wikipedia. But what about the sons of an NBA player? Shaquille O'Neal's son Shareef O'Neal is slightly shorter than his father at 2 metres and 8 centimetres, or 6 foot 10. Another example is actually Michael Jordan, who was listed at 6 foot 6, and Michael Jordan's son Jeffrey Jordan is listed at 6 foot 1.

Now, I realise that those are just two examples, and there aren't that many sons of basketball players in the NBA whose height you can pull up on Wikipedia, but you'll find that this pattern holds: on average, NBA players' sons end up shorter than their dads. This is regression to the mean in action.
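That pattern falls out of basic statistics whenever a son's height is only partly correlated with his father's. Here's a small illustrative sketch with made-up population numbers; the 175 cm mean, 7 cm standard deviation and 0.5 correlation are assumptions of mine, not figures from the lesson.

```python
# Illustrative regression-to-the-mean simulation (not the lesson's code).
# Sons' heights are only partly correlated with their fathers', so sons
# of unusually tall fathers are, on average, closer to the mean.
import numpy as np

rng = np.random.default_rng(42)
mean, sd, r = 175.0, 7.0, 0.5                  # assumed population values, in cm
fathers = rng.normal(mean, sd, 100_000)
noise = rng.normal(0.0, sd * (1 - r**2) ** 0.5, 100_000)
sons = mean + r * (fathers - mean) + noise     # same mean and spread, correlation r

tall = fathers > 190                           # pick out the very tall fathers
print(fathers[tall].mean())                    # roughly 193 cm
print(sons[tall].mean())                       # roughly 184 cm: pulled back toward 175
```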