0 1 00:00:00,630 --> 00:00:07,530 In a previous lesson we've already written this line of code "from sklearn.linear_model import 1 2 00:00:07,530 --> 00:00:08,980 LinearRegression". 2 3 00:00:09,060 --> 00:00:14,610 In other words we've already imported our linear regression functionality into this Python intro notebook. 3 4 00:00:15,630 --> 00:00:16,590 For consistency, 4 5 00:00:16,620 --> 00:00:21,350 let's follow the same pattern that we employed when we were estimating our movie revenue. 5 6 00:00:21,450 --> 00:00:27,510 We're going to create a variable called "regr" and this variable is going to store our linear regression 6 7 00:00:27,510 --> 00:00:28,020 object. 7 8 00:00:31,280 --> 00:00:32,830 To run our regression, 8 9 00:00:32,840 --> 00:00:36,710 all we have to do is call the good old fit method. 9 10 00:00:36,710 --> 00:00:40,250 So we're gonna say regr.fit, 10 11 00:00:40,250 --> 00:00:47,840 open the parentheses and then for our explanatory or independent variable we're going to use the amount 11 12 00:00:47,840 --> 00:00:49,670 of drugs in the tissue. 12 13 00:00:49,670 --> 00:00:53,550 So we're gonna type LSD and then put a comma after it, 13 14 00:00:53,690 --> 00:01:01,070 and now we can add our dependent variable or the values that we're going to try to predict. 14 15 00:01:01,070 --> 00:01:07,040 In this case this is the score. And that's it. 15 16 00:01:07,070 --> 00:01:13,430 Now if you remember, previously our explanatory variable was called capital X and our dependent variable 16 17 00:01:13,430 --> 00:01:17,270 was called lowercase y. 17 18 00:01:17,270 --> 00:01:24,570 Now remember, scikit-learn's fit method essentially computes the parameters of this equation. 18 19 00:01:24,650 --> 00:01:29,870 We've got our theta zero which is our intercept and we've got our theta one which is our coefficient 19 20 00:01:29,990 --> 00:01:33,110 in front of our explanatory variable.
20 21 00:01:33,170 --> 00:01:37,380 Let's see what happens now when we try to run our regression. 21 22 00:01:37,910 --> 00:01:40,220 We in fact get an error. 22 23 00:01:40,370 --> 00:01:49,080 Looking down, we see that we get a value error: "Found input variables with inconsistent numbers of samples". 23 24 00:01:49,080 --> 00:01:50,520 Now this is odd, right? 24 25 00:01:50,520 --> 00:01:53,900 Why would there be an inconsistent number of samples? 25 26 00:01:53,940 --> 00:01:58,780 We've got seven rows in each of our two columns. 26 27 00:01:58,980 --> 00:02:03,660 We've got seven rows in LSD and we've got seven rows in the math score. 27 28 00:02:03,660 --> 00:02:06,810 So why would there be an inconsistent number of samples? 28 29 00:02:06,890 --> 00:02:11,030 Now, even though this error message seems a little bit like a red herring, 29 30 00:02:11,280 --> 00:02:16,350 the reason that we're getting this problem actually has to do with the fact that we are working with 30 31 00:02:16,350 --> 00:02:24,690 series and not data frames. We're getting this error because this notation here data[], 31 32 00:02:25,230 --> 00:02:33,150 and then the column name actually extracts an object of type series instead of an object of type data 32 33 00:02:33,150 --> 00:02:34,360 frame. 33 34 00:02:34,350 --> 00:02:40,470 Now, if you recall, the way to get a data frame from another data frame is to double up on the square 34 35 00:02:40,500 --> 00:02:41,520 brackets. 35 36 00:02:41,550 --> 00:02:51,270 So if we add an additional pair of square brackets, this notation now will extract an object of type 36 37 00:02:51,480 --> 00:02:55,770 data frame and store it in LSD and in score. 37 38 00:02:55,800 --> 00:03:03,600 So if I press Shift+Enter now, we are now no longer working with series we are working with data frames. 
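The difference between single and double square brackets can be sketched in a few lines. This is only an illustration: the column names "LSD_ppm" and "Avg_Math_Test_Score" and the seven data values are assumptions (the values are the ones commonly circulated for this study), so the notebook's own data frame may look different. The exact wording of the error also depends on the scikit-learn version, so the sketch only catches the ValueError rather than matching its text.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed column names and values, for illustration only.
data = pd.DataFrame({
    "LSD_ppm": [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41],
    "Avg_Math_Test_Score": [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97],
})

regr = LinearRegression()

# Single brackets return a one-dimensional Series.
lsd_series = data["LSD_ppm"]
score_series = data["Avg_Math_Test_Score"]
print(type(lsd_series).__name__, lsd_series.shape)

try:
    regr.fit(lsd_series, score_series)
    series_fit_failed = False
except ValueError as err:
    # The message text differs between scikit-learn versions.
    series_fit_failed = True
    print("fit() rejected the Series:", err)

# Double brackets return a two-dimensional DataFrame with one column.
LSD = data[["LSD_ppm"]]
score = data[["Avg_Math_Test_Score"]]
print(type(LSD).__name__, LSD.shape)

regr.fit(LSD, score)  # runs without an error
```

The point to take away is the shape: the Series is one-dimensional with seven entries, while the data frame has seven rows and one column, which is the layout fit expects for the explanatory variable.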
38 39 00:03:04,290 --> 00:03:09,750 Going back down to where we're fitting our regression I can press Shift+Enter again to rerun this line 39 40 00:03:09,750 --> 00:03:15,270 of code and we can see that our regression runs without a problem. 40 41 00:03:15,270 --> 00:03:17,190 Now you don't have to take my word for it 41 42 00:03:17,220 --> 00:03:23,550 regarding the change in types - you can double check this yourself. So you can always add a cell below 42 43 00:03:23,940 --> 00:03:32,820 and check the types, so if I say type(LSD), then I see it is a data frame and if I take away the square brackets, 43 44 00:03:33,060 --> 00:03:40,380 press Shift+Enter and rerun this again, I can see that it is a series in this case and this is what we've 44 45 00:03:40,710 --> 00:03:44,600 talked about when working with data frames in the previous lesson. 45 46 00:03:44,640 --> 00:03:50,340 So now that we've successfully fitted our regression, let's take a look at the values of this theta one 46 47 00:03:50,460 --> 00:03:52,650 and this theta zero parameter. 47 48 00:03:52,650 --> 00:03:58,890 If you remember, the way we did this in the past was by looking at an attribute of our regression object - 48 49 00:03:59,370 --> 00:04:03,480 the attribute in question was called "coef" with an underscore. 49 50 00:04:03,810 --> 00:04:11,000 So let's add that here "regr.coef_" and let's hit Shift+Enter, and see what happens. 50 51 00:04:11,160 --> 00:04:15,610 Our Jupyter notebook will print out array, then parentheses, 51 52 00:04:15,630 --> 00:04:22,840 then one set of square brackets, two sets of square brackets and then the value of our theta one parameter. 52 53 00:04:24,010 --> 00:04:28,970 So we can see that our coefficient is stored inside an array. 53 54 00:04:29,110 --> 00:04:31,420 It's an array of one element. 54 55 00:04:31,480 --> 00:04:32,710 Let's pick that element out. 55 56 00:04:32,740 --> 00:04:38,500 So I'm going to say [0] to access the first element in the array.
56 57 00:04:38,620 --> 00:04:41,860 Let's see what happens when I press Shift+Enter now. 57 58 00:04:41,900 --> 00:04:42,560 Huh. 58 59 00:04:44,080 --> 00:04:50,680 In this case one of the square brackets has disappeared, but we're still left with an array. 59 60 00:04:50,680 --> 00:04:55,130 We're not yet able to access the number inside directly. 60 61 00:04:55,180 --> 00:04:59,400 We're still getting a collection containing just one element. 61 62 00:04:59,710 --> 00:05:02,350 So you might ask, did this just work? 62 63 00:05:02,350 --> 00:05:06,860 We accessed the first element of our array and we still got an array. 63 64 00:05:06,940 --> 00:05:09,210 It seems like a bug, right? 64 65 00:05:09,310 --> 00:05:13,810 Well, the answer is that we have to go one level deeper to get the raw value. 65 66 00:05:13,840 --> 00:05:17,080 You've probably noticed that there was one square bracket less, right? 66 67 00:05:17,080 --> 00:05:24,220 So if I don't have this at the end, I have two square brackets but if I do access an element inside my 67 68 00:05:24,220 --> 00:05:26,490 array I have one square bracket. 68 69 00:05:27,400 --> 00:05:30,640 And the reason we get this is that we have to go two levels deep. 69 70 00:05:30,640 --> 00:05:34,910 We actually have an array of arrays. 70 71 00:05:35,140 --> 00:05:42,480 Mind blown, right? Our coefficient is buried inside an array that's inside another array. 71 72 00:05:42,490 --> 00:05:44,380 So how do we access an array inside an array? 72 73 00:05:44,650 --> 00:05:50,980 Well, we can access the first element which is the array and then we can access the first element of 73 74 00:05:50,980 --> 00:05:53,870 that array again to get the raw value. 74 75 00:05:54,980 --> 00:05:59,860 And this is how you would access a particular value of a nested array. 75 76 00:05:59,920 --> 00:06:01,500 Let me add this to a print statement.
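The "array inside an array" idea can be tried out on its own, without running a regression. The sketch below uses a hand-made NumPy array as a stand-in for regr.coef_; the value -9.0 is just a placeholder, not the fitted coefficient.

```python
import numpy as np

# Stand-in for regr.coef_: one row, one column. The value is a placeholder.
coef = np.array([[-9.0]])

row = coef[0]        # first level: still an array, now with one set of brackets
value = coef[0][0]   # second level: the raw number
same = coef[0, 0]    # NumPy shorthand for the same two-level lookup

print(coef.shape, row.shape, value)
```

Indexing once removes one level of nesting, which is why one set of square brackets disappears; indexing twice reaches the number itself.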
76 77 00:06:01,510 --> 00:06:15,840 So I'm going to say "print", and just say "Theta 1 : "comma and then close our brackets here. 77 78 00:06:15,840 --> 00:06:18,720 So I'm going to add this to a print statement like this. 78 79 00:06:18,730 --> 00:06:24,340 Now let's take a look at our intercept - our intercept was the intercept_ attribute from 79 80 00:06:24,370 --> 00:06:32,440 our regression object. So I can say "regr.intercept_" and press Shift+Enter and we see that 80 81 00:06:32,890 --> 00:06:37,530 our intercept is also inside a collection, in this case also an array. 81 82 00:06:37,840 --> 00:06:45,430 But there's only one set of square brackets here, so we can access the raw value inside just by having that 82 83 00:06:45,520 --> 00:06:50,560 [0] following the name of the attribute. 83 84 00:06:50,560 --> 00:06:59,370 I hit Shift+Enter and we see the raw value printed out. And again I can wrap this inside a print statement 84 85 00:07:01,130 --> 00:07:11,560 'Intercept: ', regr.intercept_[0], and close the parentheses at the end. 85 86 00:07:11,830 --> 00:07:12,280 There we go. 86 87 00:07:12,280 --> 00:07:13,750 So here's our intercept. 87 88 00:07:13,750 --> 00:07:17,500 Now what about the goodness of fit or our R squared? 88 89 00:07:17,500 --> 00:07:23,560 To find out how much of the variation in our data is explained by the amount of drugs in the volunteers' 89 90 00:07:23,560 --> 00:07:24,550 tissue, 90 91 00:07:24,550 --> 00:07:28,050 we call the score method on our regression. 91 92 00:07:28,150 --> 00:07:33,160 So we type "regr.score" and then we have to provide two values. 92 93 00:07:33,310 --> 00:07:41,740 One is our explanatory variable and the other one is our dependent variable - which was score. So 93 94 00:07:41,760 --> 00:07:45,200 regr.score(LSD, score) 94 95 00:07:45,480 --> 00:07:53,190 We're going to print this out, and we see that our R squared is approximately 0.88.
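Put together, the cell described above might look roughly like the following. As before, the column names and the seven data values are assumptions made so the sketch runs on its own; with these values the printed numbers should come out close to the ones quoted in the lesson.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed column names and values, for illustration only.
data = pd.DataFrame({
    "LSD_ppm": [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41],
    "Avg_Math_Test_Score": [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97],
})
LSD = data[["LSD_ppm"]]
score = data[["Avg_Math_Test_Score"]]

regr = LinearRegression()
regr.fit(LSD, score)

theta_1 = regr.coef_[0][0]          # nested array: two levels of indexing
theta_0 = regr.intercept_[0]        # flat array: one level of indexing
r_squared = regr.score(LSD, score)  # goodness of fit on the same data

print("Theta 1:", theta_1)
print("Intercept:", theta_0)
print("R-Square:", r_squared)
```

The coefficient needs two indices and the intercept only one because the dependent variable was passed in as a one-column data frame.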
Let's wrap this inside a print 95 96 00:07:53,190 --> 00:08:05,270 statement as well - " 'R-Square: ', " - there we go. 96 97 00:08:05,270 --> 00:08:11,330 So in this cell we fitted our regression, so we've run our machine learning model and we're printing 97 98 00:08:11,330 --> 00:08:18,470 out a couple of stats about our regression. A couple of the statistics that describe 98 99 00:08:18,680 --> 00:08:25,130 what went on with the calculation. One of them is the coefficient, another one is the intercept of our 99 100 00:08:25,130 --> 00:08:30,120 line and another one is the R-squared or the goodness of fit. 100 101 00:08:30,320 --> 00:08:36,440 So we've got some basic information about our regression and we see that the amount of drugs in the 101 102 00:08:36,440 --> 00:08:44,120 volunteers' tissue explains close to 88% of the variation in math test performance and we also see 102 103 00:08:44,120 --> 00:08:51,650 that for every additional part per million of LSD, our volunteers' math score was approximately 9 103 104 00:08:51,650 --> 00:08:57,420 points lower relative to the control - this is what the theta one is telling us. 104 105 00:08:58,070 --> 00:09:03,770 Now even though this is all very well and good, it'd be really nice to represent this graphically because 105 106 00:09:03,950 --> 00:09:10,880 we like pictures - pictures are very very important for making sense of data so let's create another plot. 106 107 00:09:11,610 --> 00:09:13,760 I'm going to do this in the cell below. 107 108 00:09:13,790 --> 00:09:20,150 Now one thing that you've already seen a little bit in the Python code is that when creating nice looking 108 109 00:09:20,150 --> 00:09:25,230 graphs it's a two part process. In the first part, 109 110 00:09:25,250 --> 00:09:30,290 we do all the styling and in the second part we plot the data and show it off. 110 111 00:09:30,290 --> 00:09:32,880 So what I'm going to do is I'm going to do the second part first.
111 112 00:09:32,920 --> 00:09:37,160 I'm going to plot the data and then I'm going to add my styling code later on. 112 113 00:09:37,160 --> 00:09:46,460 So plotting the data as it is I can write plt.scatter and then provide the inputs to our scatter 113 114 00:09:46,460 --> 00:09:54,010 plot and that's going to be the LSD parts per million, comma, and then the score. 114 115 00:09:54,050 --> 00:09:55,390 These are the math scores. 115 116 00:09:55,490 --> 00:09:57,950 So let's see what this plot looks like 116 117 00:09:57,950 --> 00:10:03,870 by adding plt.show() beneath. 117 118 00:10:04,040 --> 00:10:04,700 Here we go. 118 119 00:10:04,700 --> 00:10:09,800 This is what our plot looks like before we've done any styling on it. 119 120 00:10:09,890 --> 00:10:12,080 Now I think this chart looks, 120 121 00:10:12,440 --> 00:10:19,540 I think this looks super ugly actually so we're going to have to do something about this. For starters 121 122 00:10:19,630 --> 00:10:23,410 let's add some arguments by keyword to this plot. 122 123 00:10:23,410 --> 00:10:29,680 So in your scatter method you're going to put a comma at the end after score and then write 123 124 00:10:29,690 --> 00:10:33,600 color = 'blue' 124 125 00:10:33,790 --> 00:10:40,670 Let's hit Shift+Enter to see what it looks like. Now we've got our data points in blue. 125 126 00:10:40,790 --> 00:10:44,920 So this is a slight improvement on the black and white version. 126 127 00:10:44,930 --> 00:10:46,880 Now we don't have many dots on here. 127 128 00:10:46,880 --> 00:10:48,930 We don't have many, many data points. 128 129 00:10:49,100 --> 00:10:55,860 So let's increase the size of these individual dots on our chart. 129 130 00:10:56,240 --> 00:10:59,920 And I want to leave this to you as a challenge. 130 131 00:11:00,020 --> 00:11:06,640 So I've got the documentation of the scatter method up in front of you right now.
131 132 00:11:06,890 --> 00:11:13,280 And what I would like you to do is I'd like you to look at this documentation and see if you can figure 132 133 00:11:13,280 --> 00:11:17,930 out how to increase the size of these data points 133 134 00:11:18,140 --> 00:11:25,100 and also maybe add some transparency - in other words, instead of having it a solid blue color make those 134 135 00:11:25,100 --> 00:11:28,290 blue dots slightly transparent. 135 136 00:11:28,430 --> 00:11:31,330 I'll give you a few seconds to pause the video. 136 137 00:11:31,520 --> 00:11:39,400 The hint I'll give you is that it's going to be in the keyword arguments of the scatter function. And, 137 138 00:11:39,400 --> 00:11:41,020 here's the solution. 138 139 00:11:41,080 --> 00:11:45,530 So we wanted to increase the size of our dots. 139 140 00:11:45,700 --> 00:11:49,790 So the way to do this is to look at these keyword arguments. 140 141 00:11:49,810 --> 00:12:00,040 So, for example, "s" is the size in points of our dots and the transparency is this alpha value here. 141 142 00:12:00,370 --> 00:12:08,220 The alpha value will be between zero which is transparent and one which is opaque. Coming back to our 142 143 00:12:08,220 --> 00:12:09,100 Python code, 143 144 00:12:09,150 --> 00:12:13,200 we can add these keyword arguments to our scatter method. 144 145 00:12:13,260 --> 00:12:23,070 So after 'blue', I'm going to add a comma and then I'm going to say "s=" and let's experiment here 145 146 00:12:23,070 --> 00:12:23,900 a little bit. 146 147 00:12:23,940 --> 00:12:28,390 So what happens if I say "s = 500" and hit Shift+Enter? 147 148 00:12:28,640 --> 00:12:36,180 I get enormous blue dots. This actually doesn't look half bad but I think I'm going to go with something 148 149 00:12:36,180 --> 00:12:44,440 like, maybe 100 is the right value here. So that's the size of our data points covered. 149 150 00:12:44,870 --> 00:12:46,640 Let's change our transparency now.
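For reference, the scatter call with the three keyword arguments discussed so far (color, s and alpha, using the 0.7 chosen a little later in the lesson) can be sketched as below. The data values and column names are again assumptions, included only so the example runs by itself.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumed column names and values, for illustration only.
data = pd.DataFrame({
    "LSD_ppm": [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41],
    "Avg_Math_Test_Score": [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97],
})
LSD = data[["LSD_ppm"]]
score = data[["Avg_Math_Test_Score"]]

# s sets the marker size, alpha the opacity (0 = transparent, 1 = opaque).
points = plt.scatter(LSD, score, color="blue", s=100, alpha=0.7)
plt.show()
```

Changing s to 500 reproduces the "enormous blue dots" experiment; the right value is a matter of taste and of how many points are on the chart.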
150 151 00:12:46,820 --> 00:12:50,050 This was in the alpha parameter. So I'm going to say 151 152 00:12:50,060 --> 00:12:52,640 "alpha = ", I don't know, 152 153 00:12:52,760 --> 00:12:58,470 0.7 - it's going to be a value between 0 and 1, remember? 153 154 00:12:58,550 --> 00:13:06,680 So hitting Shift+Enter, I get a nice little bit of transparency here on my data points which are now 154 155 00:13:06,680 --> 00:13:08,960 a little bit larger so we can actually tell what's going on. 155 156 00:13:09,980 --> 00:13:11,570 OK, so I'm going to leave it at that. 156 157 00:13:11,660 --> 00:13:19,370 And now I'm going to add some labels to our chart and make it look a little nicer. Since we've done this 157 158 00:13:19,370 --> 00:13:19,790 before, 158 159 00:13:19,790 --> 00:13:23,870 I'm going to leave this to you as a challenge so you can go back and recall the Python code that you 159 160 00:13:23,870 --> 00:13:24,580 wrote. 160 161 00:13:24,770 --> 00:13:33,260 Can you set the title of the plot as a whole as "Arithmetic vs LSD-25" and then add some labels on 161 162 00:13:33,260 --> 00:13:40,070 the side - one for the x axis that reads "Tissue LSD ppm" and one for the y axis that reads 162 163 00:13:40,070 --> 00:13:40,990 "Performance Score"? 163 164 00:13:43,740 --> 00:13:45,720 And here's the solution. 164 165 00:13:45,720 --> 00:13:53,490 So we take our plotting object, put a dot after it and write "title()", and then provide the string 165 166 00:13:54,400 --> 00:14:08,010 "Arithmetic vs LSD-25" and then we do the same for the labels on the x axis and y axis, so plt.xlabel( 166 167 00:14:08,010 --> 00:14:17,490 "Tissue LSD ppm") and plt.ylabel( 167 168 00:14:21,960 --> 00:14:26,460 "Performance Score"). 168 169 00:14:26,460 --> 00:14:27,520 There we go.
169 170 00:14:27,510 --> 00:14:34,410 Let's hit Shift+Enter and take a look at what this looks like and we see that maybe the thing to do is to increase 170 171 00:14:34,590 --> 00:14:38,470 the font size a little bit on these three labels. 171 172 00:14:38,580 --> 00:14:46,140 I think in our previous chart 17 for the title and 14 for the labels worked really well. So I'm going to say 172 173 00:14:46,140 --> 00:14:57,110 "fontsize=17" for the title, and then I'm going to add another keyword argument to our X labels and Y labels 173 174 00:14:57,380 --> 00:15:05,730 "fontsize = 14" and "fontsize = 14" again. 174 175 00:15:06,090 --> 00:15:07,650 So let's take a look. 175 176 00:15:07,800 --> 00:15:10,240 That's starting to look pretty good. 176 177 00:15:10,350 --> 00:15:13,000 Now to round things off a little bit, 177 178 00:15:13,260 --> 00:15:16,060 we can try again setting a limit on the range. 178 179 00:15:16,080 --> 00:15:17,740 So ylim 179 180 00:15:19,160 --> 00:15:34,850 is gonna be between 25 and 85, and xlim is gonna be between maybe 1 and 6.5. 180 181 00:15:34,850 --> 00:15:38,710 Yeah. 181 182 00:15:38,800 --> 00:15:40,600 Doesn't need to go all the way to seven, 182 183 00:15:40,600 --> 00:15:46,710 I reckon. And for the style maybe "plt.style.use" 183 184 00:15:46,810 --> 00:15:50,890 then we can choose our good old friend 184 185 00:15:51,160 --> 00:15:59,480 'fivethirtyeight'. Let's hit Shift+Enter to apply the changes. 185 186 00:15:59,690 --> 00:16:02,130 I don't think that worked. 186 187 00:16:02,150 --> 00:16:03,150 Let's try again. 187 188 00:16:03,350 --> 00:16:04,370 Okay, so here we go. 188 189 00:16:04,400 --> 00:16:11,240 This is what it looks like with our styling as it is currently. 189 190 00:16:11,540 --> 00:16:14,420 We've got our range set. 190 191 00:16:14,670 --> 00:16:21,250 We've got our colors set and we've got the font size set as well.
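The styling steps from this part of the lesson (title, axis labels, font sizes, axis limits and the 'fivethirtyeight' style) could be collected into one cell along these lines. The data and column names are assumed, and the order of the calls is one reasonable arrangement rather than necessarily the exact layout of the notebook cell.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumed column names and values, for illustration only.
data = pd.DataFrame({
    "LSD_ppm": [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41],
    "Avg_Math_Test_Score": [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97],
})
LSD = data[["LSD_ppm"]]
score = data[["Avg_Math_Test_Score"]]

# Part one: styling.
plt.style.use("fivethirtyeight")
plt.title("Arithmetic vs LSD-25", fontsize=17)
plt.xlabel("Tissue LSD ppm", fontsize=14)
plt.ylabel("Performance Score", fontsize=14)
plt.ylim(25, 85)
plt.xlim(1, 6.5)

# Part two: plot the data and show it.
plt.scatter(LSD, score, color="blue", s=100, alpha=0.7)
ax = plt.gca()  # handle to the current axes, handy for inspecting the result
plt.show()
```

Setting the style before creating the chart matters, because a style only affects elements drawn after it is applied; that may be why the first attempt in the video appeared not to work.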
191 192 00:16:21,270 --> 00:16:29,520 At the very top of the cell I'm going to add again this little percentage sign and write 'matplotlib 192 193 00:16:30,150 --> 00:16:31,470 inline'. 193 194 00:16:31,470 --> 00:16:41,900 And what this does is it tells Jupyter notebook to export this graph as it is when we say File > Download 194 195 00:16:41,900 --> 00:16:44,120 as > Notebook. 195 196 00:16:44,120 --> 00:16:50,790 So there's really only one thing left to do which is plotting our regression line on here. 196 197 00:16:51,200 --> 00:16:58,070 Because at the moment we've got our data points, we've got our chart nicely formatted and looking good, 197 198 00:16:58,250 --> 00:17:06,350 all we have to do now is plot our predictions from our machine learning model on here. So our machine 198 199 00:17:06,350 --> 00:17:16,340 learning model will have a prediction for every level of LSD tissue concentration in the data set. To 199 200 00:17:16,340 --> 00:17:17,720 get hold of these predictions, 200 201 00:17:17,720 --> 00:17:22,990 we use a method called predict so we would write "regr. 201 202 00:17:23,240 --> 00:17:26,440 predict", 202 203 00:17:26,790 --> 00:17:31,410 And as a parameter here, as an argument here, 203 204 00:17:31,410 --> 00:17:35,790 we would supply the LSD tissue concentration. 204 205 00:17:35,790 --> 00:17:43,790 So this predicts a math score based on the amount of drugs in the tissue. 205 206 00:17:43,800 --> 00:17:46,710 Now we'll want to store that information somewhere. 206 207 00:17:46,710 --> 00:17:53,940 So I'm going to create a variable called "predicted_score" and set it equal to the output 207 208 00:17:54,540 --> 00:17:57,770 from this method right here. 208 209 00:17:57,780 --> 00:18:02,570 Now remember, you've got to press Shift+Enter to actually run this cell. 209 210 00:18:02,760 --> 00:18:07,170 Otherwise the cells below won't know about this code that we've just written.
210 211 00:18:07,370 --> 00:18:09,120 So I'm going to hit Shift+Enter now. 211 212 00:18:11,920 --> 00:18:16,430 So looking down at our chart, we see the actual scores indicated by the blue dots, 212 213 00:18:16,800 --> 00:18:25,500 and now we just have to plot the predicted scores alongside these actual ones. And all these predicted 213 214 00:18:25,500 --> 00:18:28,800 scores are gonna be connected by a line. 214 215 00:18:28,800 --> 00:18:37,710 This is the line that we want to superimpose on our graph, so we can write "plt.plot" and then provide 215 216 00:18:38,810 --> 00:18:40,160 the line that we want to draw. 216 217 00:18:40,250 --> 00:18:50,640 It's gonna be the LSD tissue concentration on the x axis and then on the y axis it's gonna be 217 218 00:18:50,640 --> 00:18:52,580 our predicted scores, right? 218 219 00:18:56,140 --> 00:18:59,370 "predicted_score" 219 220 00:19:00,040 --> 00:19:01,800 Now let's hit Shift+Enter. 220 221 00:19:01,990 --> 00:19:03,220 And here we go. 221 222 00:19:03,220 --> 00:19:10,700 We've got our predicted values connected by a line superimposed upon our scatter plot. 222 223 00:19:10,960 --> 00:19:13,750 Of course we can style this line any way we want to. 223 224 00:19:13,900 --> 00:19:29,630 So I'm going to say " color = 'red' " and " linewidth = 3 ". 224 225 00:19:29,910 --> 00:19:35,810 Now we've got a chart with even more contrast between the blue data points and our red fitted regression 225 226 00:19:35,810 --> 00:19:37,210 line. 226 227 00:19:37,280 --> 00:19:43,260 So I think that concludes all the analysis that we're gonna do for our Python intro. 227 228 00:19:43,610 --> 00:19:45,870 And what have we learned from all this? 228 229 00:19:45,920 --> 00:19:48,200 Well, drugs are bad for you, 229 230 00:19:48,200 --> 00:19:52,760 boys and girls, especially if you're studying math tests. 
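The last step, computing the predictions and drawing them as a line over the scatter plot, can be sketched as follows. The data values and column names remain assumptions for the sake of a runnable example.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed column names and values, for illustration only.
data = pd.DataFrame({
    "LSD_ppm": [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41],
    "Avg_Math_Test_Score": [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97],
})
LSD = data[["LSD_ppm"]]
score = data[["Avg_Math_Test_Score"]]

regr = LinearRegression()
regr.fit(LSD, score)

# One predicted score for every LSD concentration in the data set.
predicted_score = regr.predict(LSD)

# Actual scores as dots, predicted scores joined up as a line.
plt.scatter(LSD, score, color="blue", s=100, alpha=0.7)
lines = plt.plot(LSD, predicted_score, color="red", linewidth=3)
plt.show()
```

Because every prediction comes from the same straight-line equation, connecting the predicted points simply draws the fitted regression line across the range of the data.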
230 231 00:19:52,760 --> 00:19:57,770 But the other thing that you'll notice is that if you look at the original paper and you look at the 231 232 00:19:57,770 --> 00:20:05,030 equation that the researchers have estimated we can actually see what their estimate was for the intercept 232 233 00:20:05,480 --> 00:20:09,470 and the coefficient that we've estimated as well. 233 234 00:20:09,530 --> 00:20:17,800 So they've got 89.7 for the intercept and -9.44 for 234 235 00:20:17,870 --> 00:20:27,350 the coefficient in their equation. In contrast, our coefficient is -9.0 and our intercept 235 236 00:20:27,470 --> 00:20:32,150 is 89.1, not 89.7. 236 237 00:20:32,180 --> 00:20:39,690 So we can see that we're not able to reproduce the researchers' output exactly. 237 238 00:20:39,780 --> 00:20:47,840 Now I suspect that's because the researchers and we are not working off exactly the same numbers. You 238 239 00:20:47,840 --> 00:20:52,310 see, they actually don't provide the information on parts per million 239 240 00:20:52,370 --> 00:20:56,660 and the math scores in the PDF that we're looking at. 240 241 00:20:56,660 --> 00:21:03,560 I actually had to hunt around the web to get these numbers and they might be slightly different from 241 242 00:21:03,740 --> 00:21:05,550 what's in the original paper. 242 243 00:21:05,840 --> 00:21:12,920 But, that said, I think that our results are so close that we can say that we've successfully reproduced 243 244 00:21:13,310 --> 00:21:16,360 the research that's in the paper there. 244 245 00:21:16,520 --> 00:21:21,850 Oh and, by the way, this is in no way relevant to the study at all. 245 246 00:21:21,950 --> 00:21:30,590 But many of the calculations that we just did the original authors ran on something called an IBM 360 246 247 00:21:30,590 --> 00:21:32,620 computer. 
247 248 00:21:32,780 --> 00:21:39,020 The reason you've probably never heard of the IBM 360 is because you don't have one of these monstrosities 248 249 00:21:39,110 --> 00:21:41,420 sitting in your living room. 249 250 00:21:41,470 --> 00:21:42,040 Now, 250 251 00:21:42,470 --> 00:21:48,590 I find it so funny that the researchers actually mentioned this particular computer model in their 251 252 00:21:48,590 --> 00:21:56,870 actual paper and I can't figure out if it's maybe some 1968 humble brag about how high tech they are 252 253 00:21:57,290 --> 00:22:05,690 or if IBM actually paid them for this shout out. In any case, plugging the computer model that you've 253 254 00:22:05,690 --> 00:22:11,900 done your research on in your scientific paper has probably gone a little bit out of fashion these 254 255 00:22:11,900 --> 00:22:12,940 days. 255 256 00:22:13,770 --> 00:22:17,190 But yeah, I did find this interesting. 256 257 00:22:17,190 --> 00:22:21,870 Now, if your reaction to me telling you about this just now was "Wait a minute, 257 258 00:22:22,010 --> 00:22:25,680 IBM made computers?". Then, 258 259 00:22:25,940 --> 00:22:34,520 I highly, highly recommend watching this documentary called Silicon Cowboys. Silicon Cowboys is a really, 259 260 00:22:34,520 --> 00:22:41,840 really fascinating film about a little startup called Compaq that battled it out with Big Blue in days 260 261 00:22:41,840 --> 00:22:43,310 gone by. 261 262 00:22:43,850 --> 00:22:48,220 So yeah, watch it and I'll see you in the next lessons. 262 263 00:22:48,230 --> 00:22:48,670 Take care.