0
1
00:00:00,630 --> 00:00:07,530
In a previous lesson we've already written this line of code "from sklearn.linear_model import
1

2
00:00:07,530 --> 00:00:08,980
LinearRegression".
2

3
00:00:09,060 --> 00:00:14,610
In other words we've already imported our linear regression functionality into this Python intro notebook.
3

4
00:00:15,630 --> 00:00:16,590
For consistency,
4

5
00:00:16,620 --> 00:00:21,350
let's follow the same pattern that we employed when we were estimating our movie revenue.
5

6
00:00:21,450 --> 00:00:27,510
We're going to create a variable called "regr" and this variable is going to store our linear regression
6

7
00:00:27,510 --> 00:00:28,020
object.
7

8
00:00:31,280 --> 00:00:32,830
To run our regression,
8

9
00:00:32,840 --> 00:00:36,710
all we have to do is call the good old fit method.
9

10
00:00:36,710 --> 00:00:40,250
So we're gonna say regr.fit,
10

11
00:00:40,250 --> 00:00:47,840
open the parentheses and then for our explanatory or independent variable we're going to use the amount
11

12
00:00:47,840 --> 00:00:49,670
of drugs in the tissue.
12

13
00:00:49,670 --> 00:00:53,550
So we're gonna type LSD and then put a comma after it,
13

14
00:00:53,690 --> 00:01:01,070
and now we can add our dependent variable or the values that we're going to try to predict.
14

15
00:01:01,070 --> 00:01:07,040
In this case this is the score. And that's it.
15

16
00:01:07,070 --> 00:01:13,430
Now if you remember previously our explanatory variable was called capital X and our dependent variable
16

17
00:01:13,430 --> 00:01:17,270
was called lowercase y.
17

18
00:01:17,270 --> 00:01:24,570
Now remember, scikit-learn's fit method essentially computes the parameters of this equation.
18

19
00:01:24,650 --> 00:01:29,870
We've got our theta zero which is our intercept and we've got our theta one which is our coefficient
19

20
00:01:29,990 --> 00:01:33,110
in front of our explanatory variable.
20
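The equation on screen is not reproduced in this transcript. Assuming it is the usual single-variable regression line, with theta zero as the intercept and theta one as the coefficient on the explanatory variable x, it would read roughly like this (the hat on y, marking a predicted value, is notation added here):

```latex
\hat{y} = \theta_0 + \theta_1 x
```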

21
00:01:33,170 --> 00:01:37,380
Let's see what happens now when we try to run our regression.
21

22
00:01:37,910 --> 00:01:40,220
We in fact get an error.
22

23
00:01:40,370 --> 00:01:49,080
Looking down, we see that we get a value error: "Found input variables with inconsistent numbers of samples".
23

24
00:01:49,080 --> 00:01:50,520
Now this is odd, right?
24

25
00:01:50,520 --> 00:01:53,900
Why would there be an inconsistent number of samples?
25

26
00:01:53,940 --> 00:01:58,780
We've got seven rows in each of our two columns.
26

27
00:01:58,980 --> 00:02:03,660
We've got seven rows in LSD and we've got seven rows in the math score.
27

28
00:02:03,660 --> 00:02:06,810
So why would there be an inconsistent number of samples?
28

29
00:02:06,890 --> 00:02:11,030
Now, even though this error message seems a little bit like a red herring,
29

30
00:02:11,280 --> 00:02:16,350
the reason that we're getting this problem actually has to do with the fact that we are working with
30

31
00:02:16,350 --> 00:02:24,690
series and not data frames. We're getting this error because this notation here data[],
31

32
00:02:25,230 --> 00:02:33,150
and then the column name actually extracts an object of type series instead of an object of type data
32

33
00:02:33,150 --> 00:02:34,360
frame.
33
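As a rough sketch of what goes wrong here: single square brackets hand scikit-learn a one-dimensional series, and fit rejects it. The column names and the seven values below are assumptions (the transcript never lists them), and the exact wording of the error depends on the scikit-learn version installed, so this only checks that a ValueError is raised.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed column names and values; they are not given in the transcript.
data = pd.DataFrame({
    'LSD_ppm': [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41],
    'Avg_Math_Test_Score': [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97],
})

# Single brackets extract a Series, which is one-dimensional.
LSD = data['LSD_ppm']
score = data['Avg_Math_Test_Score']

regr = LinearRegression()
try:
    regr.fit(LSD, score)
    fit_failed = False
except ValueError as err:
    # The message text varies between scikit-learn versions.
    fit_failed = True
    print('ValueError:', err)
```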

34
00:02:34,350 --> 00:02:40,470
Now, if you recall, the way to get a data frame from another data frame is to double up on the square
34

35
00:02:40,500 --> 00:02:41,520
brackets.
35

36
00:02:41,550 --> 00:02:51,270
So if we add an additional pair of square brackets, this notation now will extract an object of type
36

37
00:02:51,480 --> 00:02:55,770
data frame and store it in LSD and in score.
37

38
00:02:55,800 --> 00:03:03,600
So if I press Shift+Enter now, we are now no longer working with series we are working with data frames.
38

39
00:03:04,290 --> 00:03:09,750
Going back down to where we're fitting our regression I can press Shift+Enter again to rerun this line
39

40
00:03:09,750 --> 00:03:15,270
of code and we can see that our regression runs without a problem.
40
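Here is a minimal sketch of the fix described above, using double square brackets so that both inputs are data frames. The column names and values are assumptions for illustration, not something stated in the transcript.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed column names and values; they are not given in the transcript.
data = pd.DataFrame({
    'LSD_ppm': [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41],
    'Avg_Math_Test_Score': [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97],
})

# Double brackets keep the result two-dimensional: a DataFrame with one column.
LSD = data[['LSD_ppm']]
score = data[['Avg_Math_Test_Score']]

regr = LinearRegression()
regr.fit(LSD, score)  # runs without raising, since both inputs are 2-D
```

With seven rows and one column on each side, both inputs have shape (7, 1), which is the layout fit expects for the explanatory variable.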

41
00:03:15,270 --> 00:03:17,190
Now you don't have to take my word for it
41

42
00:03:17,220 --> 00:03:23,550
regarding the change in types - you can double check this yourself. So you can always add a cell below
42

43
00:03:23,940 --> 00:03:32,820
and check the types, so if I say type(LSD), then I see it is a data frame and if I take away the square brackets,
43

44
00:03:33,060 --> 00:03:40,380
press Shift+Enter and rerun this again, I can see that it is a series in this case and this is what we've
44

45
00:03:40,710 --> 00:03:44,600
talked about when working with data frames in the previous lesson.
45
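The type check from this part of the lesson can be sketched in a single cell. The column name and values are assumed for illustration; the point is only the difference between one pair of brackets and two.

```python
import pandas as pd

# Assumed column name and values; they are not given in the transcript.
data = pd.DataFrame({
    'LSD_ppm': [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41],
})

as_frame = data[['LSD_ppm']]   # double brackets
as_series = data['LSD_ppm']    # single brackets

print(type(as_frame))
print(type(as_series))
print(as_frame.shape, as_series.shape)  # (7, 1) (7,)
```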

46
00:03:44,640 --> 00:03:50,340
So now that we've successfully fitted our regression, let's take a look at the values of this theta one
46

47
00:03:50,460 --> 00:03:52,650
and this theta zero parameter.
47

48
00:03:52,650 --> 00:03:58,890
If you remember the way we did this in the past it was looking at the attribute of our regression object -
48

49
00:03:59,370 --> 00:04:03,480
the attribute in question was called "coef" with an underscore.
49

50
00:04:03,810 --> 00:04:11,000
So let's add that here "regr.coef_" and let's hit Shift+Enter, and see what happens.
50

51
00:04:11,160 --> 00:04:15,610
Our Jupyter notebook will print out array, then parentheses,
51

52
00:04:15,630 --> 00:04:22,840
then one set of square brackets, two sets of square brackets and then the value of our theta one parameter.
52

53
00:04:24,010 --> 00:04:28,970
So we can see that our coefficient is stored inside an array.
53

54
00:04:29,110 --> 00:04:31,420
It's an array of one element.
54

55
00:04:31,480 --> 00:04:32,710
Let's pick that element out.
55

56
00:04:32,740 --> 00:04:38,500
So I'm going to say [0] to access the first element in the array.
56

57
00:04:38,620 --> 00:04:41,860
Let's see what happens when I press Shift+Enter now.
57

58
00:04:41,900 --> 00:04:42,560
Huh.
58

59
00:04:44,080 --> 00:04:50,680
In this case one of the square brackets has disappeared, but we're still left with an array.
59

60
00:04:50,680 --> 00:04:55,130
We're not yet able to access the number inside directly.
60

61
00:04:55,180 --> 00:04:59,400
We're still getting a collection containing just one element.
61

62
00:04:59,710 --> 00:05:02,350
So you might ask, did this just work?
62

63
00:05:02,350 --> 00:05:06,860
We accessed the first element of our array and we still got an array.
63

64
00:05:06,940 --> 00:05:09,210
It seems like a bug, right?
64

65
00:05:09,310 --> 00:05:13,810
Well, the answer is that we have to go one level deeper to get the raw value.
65

66
00:05:13,840 --> 00:05:17,080
You've probably noticed that there was one square bracket less, right?
66

67
00:05:17,080 --> 00:05:24,220
So if I don't have this at the end, I have two square brackets but if I do access an element inside my
67

68
00:05:24,220 --> 00:05:26,490
array I have one square bracket.
68

69
00:05:27,400 --> 00:05:30,640
And the reason we get this is that we have to go two levels deep.
69

70
00:05:30,640 --> 00:05:34,910
We actually have an array of arrays.
70

71
00:05:35,140 --> 00:05:42,480
Mind blown, right? Our coefficient is buried inside an array that's inside another array.
71

72
00:05:42,490 --> 00:05:44,380
So how do we access an array inside an array?
72

73
00:05:44,650 --> 00:05:50,980
Well, we can access the first element which is the array and then we can access the first element of
73

74
00:05:50,980 --> 00:05:53,870
that array again to get the raw value.
74

75
00:05:54,980 --> 00:05:59,860
And this is how you would access a particular value of a nested array.
75
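To make the array-inside-an-array idea concrete, here is a small sketch. It assumes the regression was fitted on one-column data frames as above, and the column names and values are illustrative assumptions rather than data taken from the transcript.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed column names and values; they are not given in the transcript.
data = pd.DataFrame({
    'LSD_ppm': [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41],
    'Avg_Math_Test_Score': [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97],
})
LSD = data[['LSD_ppm']]
score = data[['Avg_Math_Test_Score']]

regr = LinearRegression()
regr.fit(LSD, score)

print(regr.coef_.shape)      # (1, 1): one row holding one value
inner = regr.coef_[0]        # first level: still an array, shape (1,)
theta_1 = regr.coef_[0][0]   # second level: the raw number
print('Theta 1:', theta_1)
```

With these assumed values the coefficient comes out at roughly -9.0, in line with the figure quoted later in the lesson.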

76
00:05:59,920 --> 00:06:01,500
Let me add this to a print statement.
76

77
00:06:01,510 --> 00:06:15,840
So I'm going to say "print", and just say "Theta 1: ", comma, and then close our brackets here.
77

78
00:06:15,840 --> 00:06:18,720
So I'm going to add this to a print statement like this.
78

79
00:06:18,730 --> 00:06:24,340
Now let's take a look at our intercept - our intercept was the intercept_ attribute from
79

80
00:06:24,370 --> 00:06:32,440
our regression object. So I can say "regr.intercept_" and press Shift+Enter and we see that
80

81
00:06:32,890 --> 00:06:37,530
our intercept is also stored inside an array.
81

82
00:06:37,840 --> 00:06:45,430
But there's only one set of square brackets here, so we can access the raw value inside just having that
82

83
00:06:45,520 --> 00:06:50,560
[0] following the name of the attribute.
83

84
00:06:50,560 --> 00:06:59,370
I hit Shift+Enter we see the raw value printed out. And again I can wrap this inside a print statement
84

85
00:07:01,130 --> 00:07:11,560
'Intercept: ', regr.intercept_[0], and close the parentheses at the end.
85
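For comparison with the coefficient, here is a sketch of pulling out the intercept, which sits only one level deep. As before, the column names and values are assumptions made for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed column names and values; they are not given in the transcript.
data = pd.DataFrame({
    'LSD_ppm': [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41],
    'Avg_Math_Test_Score': [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97],
})
LSD = data[['LSD_ppm']]
score = data[['Avg_Math_Test_Score']]

regr = LinearRegression()
regr.fit(LSD, score)

print(regr.intercept_.shape)   # (1,): a flat array with one value
theta_0 = regr.intercept_[0]   # a single [0] is enough here
print('Intercept:', theta_0)
```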

86
00:07:11,830 --> 00:07:12,280
There we go.
86

87
00:07:12,280 --> 00:07:13,750
So here's our intercept.
87

88
00:07:13,750 --> 00:07:17,500
Now what about the goodness of fit or our R squared.
88

89
00:07:17,500 --> 00:07:23,560
To find out how much of the variation in our data is explained by the amount of drugs in the volunteers'
89

90
00:07:23,560 --> 00:07:24,550
tissue,
90

91
00:07:24,550 --> 00:07:28,050
we call the score method on our regression.
91

92
00:07:28,150 --> 00:07:33,160
So we type "regr.score" and then we have to provide two values.
92

93
00:07:33,310 --> 00:07:41,740
One is our explanatory variable and the other one is our dependent variable - which was score. So 
93

94
00:07:41,760 --> 00:07:45,200
regr.score(LSD, score)
94

95
00:07:45,480 --> 00:07:53,190
We're going to print this out, and we see that our R squared is approximately 0.88. Let's wrap this inside a print
95

96
00:07:53,190 --> 00:08:05,270
statement as well - " 'R-Square: ', " - there we go.
96
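As a sketch of what the score method reports, the cell below calls it and then recomputes R squared by hand as one minus the residual sum of squares over the total sum of squares. The manual check is an addition for illustration, and the column names and values are assumed.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed column names and values; they are not given in the transcript.
data = pd.DataFrame({
    'LSD_ppm': [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41],
    'Avg_Math_Test_Score': [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97],
})
LSD = data[['LSD_ppm']]
score = data[['Avg_Math_Test_Score']]

regr = LinearRegression()
regr.fit(LSD, score)

r_squared = regr.score(LSD, score)
print('R-Square:', r_squared)

# Manual cross-check: 1 - (unexplained variation / total variation).
actual = score.values
residuals = actual - regr.predict(LSD)
ss_res = float((residuals ** 2).sum())
ss_tot = float(((actual - actual.mean()) ** 2).sum())
manual_r_squared = 1 - ss_res / ss_tot
```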

97
00:08:05,270 --> 00:08:11,330
So in this cell we fitted our regression, so we've run our machine learning model and we're printing
97

98
00:08:11,330 --> 00:08:18,470
out a couple of stats about our regression. A couple of the statistics that describe
98

99
00:08:18,680 --> 00:08:25,130
what went on with the calculation. One of them is the coefficient, another one is the intercept of our
99

100
00:08:25,130 --> 00:08:30,120
line and another one is the R-squared or the goodness of fit.
100

101
00:08:30,320 --> 00:08:36,440
So we've got some basic information about our regression and we see that the amount of drugs in the
101

102
00:08:36,440 --> 00:08:44,120
volunteers' tissue explains close to 88% of the variation in math test performance and we also see
102

103
00:08:44,120 --> 00:08:51,650
that for every one-unit increase in LSD parts per million, our volunteers' math performance was approximately 9
103

104
00:08:51,650 --> 00:08:57,420
percent worse than the control - this is what the theta one is telling us.
104

105
00:08:58,070 --> 00:09:03,770
Now even though this is all very well and good, it'd be really nice to represent this graphically because
105

106
00:09:03,950 --> 00:09:10,880
we like pictures - pictures are very very important for making sense of data so let's create another plot.
106

107
00:09:11,610 --> 00:09:13,760
I'm going to do this in the cell below.
107

108
00:09:13,790 --> 00:09:20,150
Now one thing that you've already seen a little bit in the Python code is that when creating nice looking
108

109
00:09:20,150 --> 00:09:25,230
graphs it's a two part process. In the first part,
109

110
00:09:25,250 --> 00:09:30,290
we do all the styling and in the second part we plot the data and show it off.
110

111
00:09:30,290 --> 00:09:32,880
So what I'm going to do is I'm going to do the second part first.
111

112
00:09:32,920 --> 00:09:37,160
I'm going to plot the data and then I'm going to add my styling code later on.
112

113
00:09:37,160 --> 00:09:46,460
So plotting the data as it is I can write plt.scatter and then provide the inputs to our scatter
113

114
00:09:46,460 --> 00:09:54,010
plot and that's going to be the LSD parts per million, comma, and then the score.
114

115
00:09:54,050 --> 00:09:55,390
These are the math scores.
115

116
00:09:55,490 --> 00:09:57,950
So let's see what this plot looks like
116

117
00:09:57,950 --> 00:10:03,870
by adding plt.show() beneath.
117

118
00:10:04,040 --> 00:10:04,700
Here we go.
118

119
00:10:04,700 --> 00:10:09,800
This is what our plot looks like before we've done any styling on it.
119

120
00:10:09,890 --> 00:10:12,080
Now I think this chart looks,
120

121
00:10:12,440 --> 00:10:19,540
I think this looks super ugly actually so we're going to have to do something about this. For starters
121

122
00:10:19,630 --> 00:10:23,410
let's add some arguments by keyword to this plot.
122

123
00:10:23,410 --> 00:10:29,680
So in your scatter method you're going to put a comma at the end after score and then write
123

124
00:10:29,690 --> 00:10:33,600
color = 'blue'
124

125
00:10:33,790 --> 00:10:40,670
Let's hit Shift+Enter to see what it looks like. Now we've got our data points in blue.
125

126
00:10:40,790 --> 00:10:44,920
So this is a slight improvement to the black and white version.
126

127
00:10:44,930 --> 00:10:46,880
Now we don't have many dots on here.
127

128
00:10:46,880 --> 00:10:48,930
We don't have many, many data points.
128

129
00:10:49,100 --> 00:10:55,860
So let's increase the size of these individual dots on our chart.
129

130
00:10:56,240 --> 00:10:59,920
And I want to leave this to you as a challenge.
130

131
00:11:00,020 --> 00:11:06,640
So I've got the documentation of the scatter method up in front of you right now.
131

132
00:11:06,890 --> 00:11:13,280
And what I would like you to do is I'd like you to look at this documentation and see if you can figure
132

133
00:11:13,280 --> 00:11:17,930
out how to increase the size of these data points
133

134
00:11:18,140 --> 00:11:25,100
and also maybe add some transparency - in other words, instead of having it a solid blue color make those
134

135
00:11:25,100 --> 00:11:28,290
blue dots slightly transparent.
135

136
00:11:28,430 --> 00:11:31,330
I'll give you a few seconds to pause the video.
136

137
00:11:31,520 --> 00:11:39,400
The hint I'll give you is that it's going to be in the keyword arguments of the scatter function. And,
137

138
00:11:39,400 --> 00:11:41,020
here's the solution.
138

139
00:11:41,080 --> 00:11:45,530
So we wanted to increase the size of our dots.
139

140
00:11:45,700 --> 00:11:49,790
So the way to do this is to look at these keyword arguments.
140

141
00:11:49,810 --> 00:12:00,040
So, for example, "s" is the size in points of our dots and the transparency is this alpha value here.
141

142
00:12:00,370 --> 00:12:08,220
The alpha value will be between zero which is transparent and one which is opaque. Coming back to our
142

143
00:12:08,220 --> 00:12:09,100
Python code,
143

144
00:12:09,150 --> 00:12:13,200
we can add these keyword arguments to our scatter method.
144

145
00:12:13,260 --> 00:12:23,070
So after 'blue', I'm going to add a comma and then I'm going to say "s=" and let's experiment here
145

146
00:12:23,070 --> 00:12:23,900
a little bit.
146

147
00:12:23,940 --> 00:12:28,390
So what happens if I say "s = 500" and hit Shift+Enter?
147

148
00:12:28,640 --> 00:12:36,180
I get enormous blue dots. This actually doesn't look half bad but I think I'm going to go with something
148

149
00:12:36,180 --> 00:12:44,440
like, maybe 100 is the right value here. So that's the size of our data points covered.
149

150
00:12:44,870 --> 00:12:46,640
Let's change our transparency now.
150

151
00:12:46,820 --> 00:12:50,050
This was in the alpha parameter. So I'm going to say
151

152
00:12:50,060 --> 00:12:52,640
"alpha = ", I don't know,
152

153
00:12:52,760 --> 00:12:58,470
0.7 - it's going to be a value between 0 and 1, remember?
153

154
00:12:58,550 --> 00:13:06,680
So hitting Shift+Enter, I get a nice little bit of transparency here on my data points which are now
154

155
00:13:06,680 --> 00:13:08,960
a little bit larger so we can actually tell what's going on.
155
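Putting the scatter settings from this challenge together might look like the sketch below. The data values are assumed for illustration; color, s and alpha are the keyword arguments discussed above, with the values settled on in the lesson.

```python
import matplotlib.pyplot as plt

# Assumed values; they are not given in the transcript.
LSD = [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41]
score = [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97]

# s sets the marker size, alpha the opacity (0 = transparent, 1 = opaque).
points = plt.scatter(LSD, score, color='blue', s=100, alpha=0.7)
plt.show()
```

Trying s=500 in place of s=100 gives the enormous dots mentioned earlier, so it is worth experimenting with a few values.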

156
00:13:09,980 --> 00:13:11,570
OK, so I'm going to leave it at that.
156

157
00:13:11,660 --> 00:13:19,370
And now I'm going to add some labels to our chart and make it look a little nicer. Since we've done this
157

158
00:13:19,370 --> 00:13:19,790
before,
158

159
00:13:19,790 --> 00:13:23,870
I'm going to leave this to you as a challenge so you can return and remember the Python code that you
159

160
00:13:23,870 --> 00:13:24,580
wrote.
160

161
00:13:24,770 --> 00:13:33,260
Can you set the title of the plot as a whole as "Arithmetic vs LSD-25" and then add some labels on
161

162
00:13:33,260 --> 00:13:40,070
the side - one for the x axis that reads "Tissue LSD ppm" and one for the y axis that reads
162

163
00:13:40,070 --> 00:13:40,990
"Performance Score"?
163

164
00:13:43,740 --> 00:13:45,720
And here's the solution.
164

165
00:13:45,720 --> 00:13:53,490
So we take our plotting object, put a dot after it and write "title()", and then provide the string
165

166
00:13:54,400 --> 00:14:08,010
"Arithmetic vs LSD 25" and then we do the same for the labels on the x axis and y axis, so plt.xlabel(
166

167
00:14:08,010 --> 00:14:17,490
"Tissue LSD ppm") and plt.ylabel(
167

168
00:14:21,960 --> 00:14:26,460
"Performance Score").
168

169
00:14:26,460 --> 00:14:27,520
There we go.
169

170
00:14:27,510 --> 00:14:34,410
Let's hit Shift+Enter and take a look at what this looks like and we see that maybe the thing to do is to increase
170

171
00:14:34,590 --> 00:14:38,470
the font size a little bit on these three labels.
171

172
00:14:38,580 --> 00:14:46,140
I think in our previous chart 17 for the title and 14 for the labels worked really well. So I'm going to say 
172

173
00:14:46,140 --> 00:14:57,110
"fontsize=17" for the title, and then I'm going to add another keyword argument to our X labels and Y labels
173

174
00:14:57,380 --> 00:15:05,730
"fontsize = 14" and "fontsize = 14" again.
174

175
00:15:06,090 --> 00:15:07,650
So let's take a look.
175

176
00:15:07,800 --> 00:15:10,240
That's starting to look pretty good.
176

177
00:15:10,350 --> 00:15:13,000
Now to round things off a little bit,
177

178
00:15:13,260 --> 00:15:16,060
we can try again setting a limit on the range.
178

179
00:15:16,080 --> 00:15:17,740
So ylim
179

180
00:15:19,160 --> 00:15:34,850
is gonna be between 25 and 85, and xlim is gonna be between maybe 1 and 6.5
180

181
00:15:34,850 --> 00:15:38,710
Yeah.
181

182
00:15:38,800 --> 00:15:40,600
Doesn't need to go all the way to seven,
182

183
00:15:40,600 --> 00:15:46,710
I reckon. And for the style maybe "plt.style.use"
183

184
00:15:46,810 --> 00:15:50,890
then we can choose our good old friend
184

185
00:15:51,160 --> 00:15:59,480
'fivethirtyeight'. Let's hit Shift+Enter to apply the changes.
185

186
00:15:59,690 --> 00:16:02,130
I don't think that worked.
186

187
00:16:02,150 --> 00:16:03,150
Let's try again.
187

188
00:16:03,350 --> 00:16:04,370
Okay, so here we go.
188

189
00:16:04,400 --> 00:16:11,240
This is what it would look like with our styling as it is currently.
189

190
00:16:11,540 --> 00:16:14,420
We've got our range set.
190

191
00:16:14,670 --> 00:16:21,250
We've got our colors set and we've got the font size set as well.
191
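A sketch of the styled chart with the title, labels, font sizes, axis limits and style from this part of the lesson. The data values are assumed. One hedged observation: a style generally only affects figures created after plt.style.use is called, which may be why the first run in the video did not seem to work, so the sketch sets the style first.

```python
import matplotlib.pyplot as plt

# Assumed values; they are not given in the transcript.
LSD = [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41]
score = [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97]

# Set the style before any plotting so the new figure picks it up.
plt.style.use('fivethirtyeight')

plt.scatter(LSD, score, color='blue', s=100, alpha=0.7)

plt.title('Arithmetic vs LSD-25', fontsize=17)
plt.xlabel('Tissue LSD ppm', fontsize=14)
plt.ylabel('Performance Score', fontsize=14)

plt.ylim(25, 85)
plt.xlim(1, 6.5)

ax = plt.gca()  # handle to the current axes, kept for inspection
plt.show()
```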

192
00:16:21,270 --> 00:16:29,520
At the very top of the cell I'm going to add again this little percentage sign and write 'matplotlib
192

193
00:16:30,150 --> 00:16:31,470
inline'.
193

194
00:16:31,470 --> 00:16:41,900
And what this does is it tells Jupyter notebook to render the graph inline, so it's kept as it is when we say File > Download
194

195
00:16:41,900 --> 00:16:44,120
as > Notebook.
195

196
00:16:44,120 --> 00:16:50,790
So there's really only one thing left to do which is plotting our regression line on here.
196

197
00:16:51,200 --> 00:16:58,070
Because at the moment we've got our data points we've got our chart nicely formatted and looking good,
197

198
00:16:58,250 --> 00:17:06,350
all we have to do now is plot our predictions from our machine learning model on here. So our machine
198

199
00:17:06,350 --> 00:17:16,340
learning model will have a prediction for every level of LSD tissue concentration in the data set. To
199

200
00:17:16,340 --> 00:17:17,720
get hold of these predictions,
200

201
00:17:17,720 --> 00:17:22,990
we use a method called predict so we would write "regr.
201

202
00:17:23,240 --> 00:17:26,440
predict",
202

203
00:17:26,790 --> 00:17:31,410
And as a parameter here, as an argument here,
203

204
00:17:31,410 --> 00:17:35,790
we would supply the LSD tissue concentration.
204

205
00:17:35,790 --> 00:17:43,790
So this predicts a math score based on the amount of drugs in the tissue.
205

206
00:17:43,800 --> 00:17:46,710
Now we'll want to store that information somewhere.
206

207
00:17:46,710 --> 00:17:53,940
So I'm going to create a variable called "predicted_score" and set it equal to the output
207

208
00:17:54,540 --> 00:17:57,770
from this method right here.
208

209
00:17:57,780 --> 00:18:02,570
Now remember, you've got to press Shift+Enter to actually run this cell.
209

210
00:18:02,760 --> 00:18:07,170
Otherwise the cells below won't know about this code that we've just written.
210

211
00:18:07,370 --> 00:18:09,120
So I'm going to hit Shift+Enter now.
211

212
00:18:11,920 --> 00:18:16,430
So looking down at our chart, we see the actual scores indicated by the blue dots,
212

213
00:18:16,800 --> 00:18:25,500
and now we just have to plot the predicted scores alongside these actual ones. And all these predicted
213

214
00:18:25,500 --> 00:18:28,800
scores are gonna be connected by a line.
214

215
00:18:28,800 --> 00:18:37,710
This is the line that we want to superimpose on our graph, so we can write "plt.plot" and then provide
215

216
00:18:38,810 --> 00:18:40,160
the line that we want to draw.
216

217
00:18:40,250 --> 00:18:50,640
It's gonna be the LSD tissue concentration on the x axis and then on the y axis it's gonna be
217

218
00:18:50,640 --> 00:18:52,580
our predicted scores, right?
218

219
00:18:56,140 --> 00:18:59,370
"predicted_score"
219

220
00:19:00,040 --> 00:19:01,800
Now let's hit Shift+Enter.
220

221
00:19:01,990 --> 00:19:03,220
And here we go.
221

222
00:19:03,220 --> 00:19:10,700
We've got our predicted values connected by a line superimposed upon our scatter plot.
222

223
00:19:10,960 --> 00:19:13,750
Of course we can style this line any way we want to.
223

224
00:19:13,900 --> 00:19:29,630
So I'm going to say " color = 'red' " and " linewidth = 3 ".
224

225
00:19:29,910 --> 00:19:35,810
Now we've got a chart with even more contrast between the blue data points and our red fitted regression
225

226
00:19:35,810 --> 00:19:37,210
line.
226
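The last step, predicting a score for each LSD level and drawing the fitted line over the scatter plot, can be sketched as follows. The column names and values are assumptions for illustration; predicted_score is the variable name used in the lesson.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Assumed column names and values; they are not given in the transcript.
data = pd.DataFrame({
    'LSD_ppm': [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41],
    'Avg_Math_Test_Score': [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97],
})
LSD = data[['LSD_ppm']]
score = data[['Avg_Math_Test_Score']]

regr = LinearRegression()
regr.fit(LSD, score)

# One predicted score per LSD concentration in the data set.
predicted_score = regr.predict(LSD)

# Actual scores as dots, predictions joined up as a line.
plt.scatter(LSD, score, color='blue', s=100, alpha=0.7)
line, = plt.plot(LSD, predicted_score, color='red', linewidth=3)
plt.show()
```

Because every prediction comes from the same fitted equation, the predicted points all fall on one straight line, which is why connecting them draws the regression line.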

227
00:19:37,280 --> 00:19:43,260
So I think that concludes all the analysis that we're gonna do for our Python intro.
227

228
00:19:43,610 --> 00:19:45,870
And what have we learned from all this?
228

229
00:19:45,920 --> 00:19:48,200
Well, drugs are bad for you,
229

230
00:19:48,200 --> 00:19:52,760
boys and girls, especially if you're studying math tests.
230

231
00:19:52,760 --> 00:19:57,770
But the other thing that you'll notice is that if you look at the original paper and you look at the
231

232
00:19:57,770 --> 00:20:05,030
equation that the researchers have estimated we can actually see what their estimate was for the intercept
232

233
00:20:05,480 --> 00:20:09,470
and the coefficient that we've estimated as well.
233

234
00:20:09,530 --> 00:20:17,800
So they've got 89.7 for the intercept and -9.44 for
234

235
00:20:17,870 --> 00:20:27,350
the coefficient in their equation. In contrast, our coefficient is -9.0 and our intercept
235

236
00:20:27,470 --> 00:20:32,150
is 89.1, not 89.7.
236

237
00:20:32,180 --> 00:20:39,690
So we can see that we're not able to reproduce the researchers' output exactly.
237

238
00:20:39,780 --> 00:20:47,840
Now I suspect that's because the researchers and we are not working off exactly the same numbers. You
238

239
00:20:47,840 --> 00:20:52,310
see, they actually don't provide the information on parts per million
239

240
00:20:52,370 --> 00:20:56,660
and the math scores in the PDF that we're looking at.
240

241
00:20:56,660 --> 00:21:03,560
I actually had to hunt around the web to get these numbers and they might be slightly different from
241

242
00:21:03,740 --> 00:21:05,550
what's in the original paper.
242

243
00:21:05,840 --> 00:21:12,920
But, that said, I think that our results are so close that we can say that we've successfully reproduced
243

244
00:21:13,310 --> 00:21:16,360
the research that's in the paper there.
244

245
00:21:16,520 --> 00:21:21,850
Oh and, by the way, this is in no way relevant to the study at all.
245

246
00:21:21,950 --> 00:21:30,590
But many of the calculations that we just did the original authors ran on something called an IBM 360
246

247
00:21:30,590 --> 00:21:32,620
computer.
247

248
00:21:32,780 --> 00:21:39,020
The reason you've probably never heard of the IBM 360 is because you don't have one of these monstrosities
248

249
00:21:39,110 --> 00:21:41,420
sitting in your living room.
249

250
00:21:41,470 --> 00:21:42,040
Now,
250

251
00:21:42,470 --> 00:21:48,590
I find this so funny that the researchers actually mentioned this particular computer model in their
251

252
00:21:48,590 --> 00:21:56,870
actual paper and I can't figure out if it's maybe some 1968 humble brag about how high tech they are
252

253
00:21:57,290 --> 00:22:05,690
or if IBM actually paid them for this shout out. In any case, plugging the computer model that you've
253

254
00:22:05,690 --> 00:22:11,900
done your research on in your scientific paper has probably gone a little bit out of fashion these
254

255
00:22:11,900 --> 00:22:12,940
days.
255

256
00:22:13,770 --> 00:22:17,190
But yeah, I did find this interesting.
256

257
00:22:17,190 --> 00:22:21,870
Now, if your reaction of me telling you about this just now was "Wait a minute,
257

258
00:22:22,010 --> 00:22:25,680
IBM made computers?". Then,
258

259
00:22:25,940 --> 00:22:34,520
I highly, highly recommend watching this documentary called Silicon Cowboys. Silicon Cowboys is a really,
259

260
00:22:34,520 --> 00:22:41,840
really fascinating film about a little startup called Compaq that battled it out with Big Blue in days
260

261
00:22:41,840 --> 00:22:43,310
gone by.
261

262
00:22:43,850 --> 00:22:48,220
So yeah watch it and I'll see you in the next lessons.
262

263
00:22:48,230 --> 00:22:48,670
Take care.