1 00:00:00,133 --> 00:00:02,566 Hello and welcome to this R tutorial. 2 00:00:02,566 --> 00:00:04,633 So now it's time for the fun part. 3 00:00:04,633 --> 00:00:08,166 I can't wait to show you the training set results and the test set results. 4 00:00:08,633 --> 00:00:11,633 So let's start to build the graphs. 5 00:00:12,266 --> 00:00:16,166 So the first thing we're going to do is to import the ggplot2 6 00:00:16,166 --> 00:00:19,166 library, which is a very good way of plotting something in R. 7 00:00:19,500 --> 00:00:21,466 So I'm going to import it. 8 00:00:21,466 --> 00:00:24,466 But before importing this library, there might be some of you 9 00:00:24,500 --> 00:00:26,166 who are starting R for the first time. 10 00:00:26,166 --> 00:00:29,633 So if we go to our packages here, you can see that 11 00:00:29,700 --> 00:00:32,633 the ggplot2 library is here in my RStudio. 12 00:00:32,633 --> 00:00:34,633 But it might not be the case for you, 13 00:00:34,633 --> 00:00:38,633 and that would mean that the ggplot2 library is not installed on your R. 14 00:00:38,966 --> 00:00:42,666 So just for those of you for whom that's the case, I'm going to install the 15 00:00:44,000 --> 00:00:45,500 ggplot2 library. 16 00:00:45,500 --> 00:00:48,633 So we have to use this command, install.packages, here. 17 00:00:49,166 --> 00:00:53,033 And then in parentheses we add, in quotes, 18 00:00:53,600 --> 00:00:57,300 ggplot2. 19 00:00:57,600 --> 00:01:00,766 Then we select this and press Command or Control plus Enter to execute. 20 00:01:01,633 --> 00:01:04,633 And here, as you can see, it's installing the library. 21 00:01:04,833 --> 00:01:06,100 And that's done. 22 00:01:06,100 --> 00:01:10,866 So that's the message you should get when you install the package. 23 00:01:11,466 --> 00:01:13,700 Now it should appear in your library. 24 00:01:13,700 --> 00:01:17,666 So mine is here and we are ready to start plotting the graph. 
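[Editor's note] The install step described above can be sketched as follows. The helper `install_if_missing` is a hypothetical name, not something from the tutorial, which simply runs `install.packages` once by hand. This is one possible way to avoid reinstalling on every run, assuming a CRAN mirror is already configured (RStudio sets one by default).

```r
# Hypothetical helper (not from the tutorial): install a package only when it is missing.
install_if_missing <- function(pkg) {
  # requireNamespace() returns FALSE instead of raising an error when pkg is absent.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
    return(invisible(TRUE))   # an install was attempted
  }
  invisible(FALSE)            # already installed, nothing to do
}

install_if_missing("ggplot2")
```

With a guard like this the line can stay in the script instead of being commented out after the first run.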
25 00:01:18,600 --> 00:01:21,100 I'm just going to put that in a comment. 26 00:01:21,100 --> 00:01:23,200 So Command Shift plus C. 27 00:01:23,200 --> 00:01:23,866 Now it's in comment 28 00:01:23,866 --> 00:01:26,866 because once it's installed we don't need to install it again. 29 00:01:27,666 --> 00:01:27,966 Okay. 30 00:01:27,966 --> 00:01:32,100 So now let's start to plot our training set results graph. 31 00:01:32,700 --> 00:01:36,333 So okay, the first thing we need to do: as you can see, the ggplot2 32 00:01:36,333 --> 00:01:38,266 library is not selected here. 33 00:01:38,266 --> 00:01:39,933 So we will need to select it. 34 00:01:39,933 --> 00:01:43,633 But we want to automate all this in case we want to use simple linear 35 00:01:43,633 --> 00:01:45,300 regression on another data set. 36 00:01:45,300 --> 00:01:47,000 So we will just type here library. 37 00:01:48,366 --> 00:01:50,666 Then in parentheses the name of the library. 38 00:01:50,666 --> 00:01:53,666 Not in quotes, actually: ggplot2. 39 00:01:53,800 --> 00:01:55,833 And as you can see, before I select this line 40 00:01:55,833 --> 00:01:59,466 and execute, you see here that it's not selected. 41 00:01:59,466 --> 00:02:04,133 And if I execute, it's selected. Here we just have a suggestion 42 00:02:04,133 --> 00:02:08,033 to type suppressPackageStartupMessages to eliminate package startup messages. 43 00:02:08,533 --> 00:02:10,000 Well, that's not very important. 44 00:02:10,000 --> 00:02:13,000 You can execute that if you don't want the package startup messages. 45 00:02:13,400 --> 00:02:14,866 But that's okay. 46 00:02:14,866 --> 00:02:17,866 So I'm just going to enlarge this. 47 00:02:18,200 --> 00:02:18,566 Okay. 
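[Editor's note] A minimal sketch of the loading step described above. The tutorial itself just runs `library(ggplot2)`. Wrapping the call in `suppressPackageStartupMessages()` is the optional variant the console suggestion refers to, and it assumes ggplot2 is already installed.

```r
# Attach ggplot2 so that ggplot(), geom_point(), aes() and friends are available.
# The wrapper only hides the startup messages; the package is attached either way.
suppressPackageStartupMessages(library(ggplot2))
```

Putting this line in the script, rather than ticking the package in the RStudio Packages pane, means the script also works unchanged on another data set or another machine.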
48 00:02:18,566 --> 00:02:22,500 And now, for this course, the approach we will be taking to build 49 00:02:22,500 --> 00:02:27,366 this graph with ggplot2 is going to be a step-by-step approach, because we are 50 00:02:27,366 --> 00:02:31,166 first going to plot all the observation points in the training set. 51 00:02:31,633 --> 00:02:34,633 Then we're going to plot the regression line. 52 00:02:34,633 --> 00:02:38,266 And then we're going to add a title and then a label to the x axis 53 00:02:38,266 --> 00:02:39,633 and a label to the y axis. 54 00:02:39,633 --> 00:02:41,500 So you know it's going to be step by step. 55 00:02:41,500 --> 00:02:43,500 First the observation points, 56 00:02:43,500 --> 00:02:46,600 then the regression line, then the title, then the labels. 57 00:02:46,966 --> 00:02:47,300 Okay. 58 00:02:47,300 --> 00:02:50,033 So let's start with the observation points. 59 00:02:50,033 --> 00:02:51,233 So okay. 60 00:02:51,233 --> 00:02:54,966 So generally, to introduce a plot in ggplot2, we start by typing 61 00:02:54,966 --> 00:02:57,966 ggplot, parentheses. 62 00:02:58,266 --> 00:02:58,933 All right. 63 00:02:58,933 --> 00:02:59,466 And then that's 64 00:02:59,466 --> 00:03:02,966 when we start to build, step by step, the little components of our graph. 65 00:03:03,266 --> 00:03:06,000 And since we're adding each component one by one, 66 00:03:06,000 --> 00:03:09,000 we are going to separate each of these components by a plus. 67 00:03:09,266 --> 00:03:13,766 So let's go for the first component: plus here, and I will press Enter. 68 00:03:14,100 --> 00:03:16,566 And here we go for the first component. 69 00:03:16,566 --> 00:03:19,333 So the first component is the observation points. 70 00:03:19,333 --> 00:03:22,733 So very intuitively we are going to write geom 71 00:03:23,700 --> 00:03:26,400 underscore point. 72 00:03:26,400 --> 00:03:27,300 All right. 
73 00:03:27,300 --> 00:03:30,300 So we are going to use this function of ggplot2 74 00:03:30,500 --> 00:03:34,566 to scatter plot all the observation points of the training set. 75 00:03:35,066 --> 00:03:38,333 So the first thing that we need to do is to specify 76 00:03:38,333 --> 00:03:41,666 which are going to be the x axis and the y axis. 77 00:03:41,666 --> 00:03:46,766 And that's in the aesthetic function, which we call aes. 78 00:03:47,233 --> 00:03:51,933 So that's the function that will take as input the x variable and the y variable. 79 00:03:51,966 --> 00:03:54,600 So the x variable is our years experience variable. 80 00:03:54,600 --> 00:03:56,400 That means the number of years of experience. 81 00:03:56,400 --> 00:03:58,566 That's what we want to have on the x axis. 82 00:03:58,566 --> 00:04:01,766 And the y variable will be our salaries, but the real ones, 83 00:04:01,933 --> 00:04:03,866 not the predicted ones yet. 84 00:04:03,866 --> 00:04:07,466 So therefore we are going to add in parentheses x 85 00:04:08,433 --> 00:04:09,366 equals. 86 00:04:09,366 --> 00:04:12,366 And then we're going to put all our years of experience. 87 00:04:12,600 --> 00:04:13,800 But be careful. 88 00:04:13,800 --> 00:04:17,366 We want to take the years of experience of the observations in the training set. 89 00:04:17,766 --> 00:04:22,433 So here we'll just have to specify training set, dollar, 90 00:04:22,800 --> 00:04:24,700 and then years experience. 91 00:04:24,700 --> 00:04:28,500 So that means that we are taking the years experience values 92 00:04:29,100 --> 00:04:32,100 of the observations in the training set. Okay. 93 00:04:32,566 --> 00:04:35,466 And now the same for y. 94 00:04:35,466 --> 00:04:40,000 And here we do the same, only with the salaries, 95 00:04:40,866 --> 00:04:43,166 which is here. Okay. 96 00:04:43,166 --> 00:04:46,166 And that plots all your observation points, 97 00:04:46,233 --> 00:04:49,600 the real ones, in the plot. Okay. 
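[Editor's note] The scatter plot step described above might look like the sketch below. The transcript never spells out the object or column names, so `training_set`, `YearsExperience` and `Salary` are assumed here from what is spoken, and the five rows are made-up stand-in values for illustration, not the course data.

```r
library(ggplot2)

# Hypothetical stand-in for the training set; the values are invented for illustration only.
training_set <- data.frame(
  YearsExperience = c(1.1, 2.0, 3.2, 4.5, 5.9),
  Salary = c(39000, 43500, 54000, 61000, 81000)
)

# Start an empty plot with ggplot(), then add the observation points with "+".
# aes() maps years of experience to the x axis and the real salaries to the y axis.
p <- ggplot() +
  geom_point(aes(x = training_set$YearsExperience,
                 y = training_set$Salary))

print(p)  # draws the scatter plot of the observed (not predicted) salaries
```

Storing the plot in `p` is only a convenience for this sketch. Typing the `ggplot() + ...` expression directly in the console draws the same plot.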
98 00:04:49,600 --> 00:04:52,600 So we could stop here, but we would like to add a color 99 00:04:52,666 --> 00:04:56,500 because we will want to make the distinction between the observation 100 00:04:56,500 --> 00:05:00,133 points and the regression line to, you know, make a nice plot. 101 00:05:00,466 --> 00:05:03,533 So comma here, and then I'm going to press 102 00:05:03,533 --> 00:05:06,600 Enter to add a new argument of geom_point. 103 00:05:06,600 --> 00:05:09,133 Because this can be a little confusing: here 104 00:05:09,133 --> 00:05:10,966 we have a comma here and a comma here. 105 00:05:10,966 --> 00:05:14,500 This comma here is just to separate the x and the y in the aesthetic function. 106 00:05:14,733 --> 00:05:16,433 And this comma here is to separate 107 00:05:16,433 --> 00:05:19,733 the first argument and the second argument of the geom_point function. 108 00:05:20,100 --> 00:05:21,833 So the second argument is color. 109 00:05:23,133 --> 00:05:25,566 And let's pick red. 110 00:05:25,566 --> 00:05:28,200 I usually like to have my observation points in red. 111 00:05:28,200 --> 00:05:30,100 But you know that's just my taste. 112 00:05:30,100 --> 00:05:33,000 If you have other tastes, you're welcome to choose 113 00:05:33,000 --> 00:05:36,000 your favorite color for the observation points. 114 00:05:36,600 --> 00:05:38,966 Okay, so that's it for geom_point. 115 00:05:38,966 --> 00:05:42,966 With this geom_point part here, we are plotting all the observation 116 00:05:42,966 --> 00:05:45,966 points of our training set.
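[Editor's note] A sketch of the two commas discussed above, under the same assumptions as before: `training_set`, `YearsExperience` and `Salary` are assumed names, and the data values are invented for illustration. The first comma sits inside `aes()` and separates x from y. The second comes after the closing parenthesis of `aes()` and introduces the color as a separate argument of `geom_point`.

```r
library(ggplot2)

# Hypothetical stand-in for the training set; the values are invented for illustration only.
training_set <- data.frame(
  YearsExperience = c(1.1, 2.0, 3.2, 4.5, 5.9),
  Salary = c(39000, 43500, 54000, 61000, 81000)
)

p <- ggplot() +
  geom_point(aes(x = training_set$YearsExperience,  # comma inside aes(): separates x and y
                 y = training_set$Salary),          # comma after aes(): next geom_point argument
             color = 'red')                         # fixed color applied to every point

print(p)  # same scatter plot as before, now with red points
```

Because `color` is given outside `aes()`, it is a constant setting for the whole layer rather than a mapping from the data, which is why no legend appears for it.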