1 00:00:00,200 --> 00:00:02,300 Hello and welcome to this tutorial. 2 00:00:02,300 --> 00:00:05,300 In this one we're going to learn how to import the data set. 3 00:00:05,833 --> 00:00:07,733 Okay, so here I am in R. 4 00:00:07,733 --> 00:00:11,100 And as with Python, we need to set the working directory. 5 00:00:11,700 --> 00:00:14,700 So to do this we have to go to Files here. 6 00:00:14,733 --> 00:00:16,266 Here I'm on my desktop. 7 00:00:16,266 --> 00:00:19,366 So let's go to our folder, data pre-processing part one. 8 00:00:19,700 --> 00:00:21,500 Here it is, section two. 9 00:00:21,500 --> 00:00:24,900 And this is the right folder that contains the data set, data.csv. 10 00:00:25,133 --> 00:00:28,500 So this is the folder that you want to choose to set as working directory. 11 00:00:29,366 --> 00:00:33,600 And now to set this working directory in R, you need to do one more thing. 12 00:00:33,600 --> 00:00:38,666 You need to click on this More button here and click on Set As Working Directory. 13 00:00:39,166 --> 00:00:39,900 And here it is. 14 00:00:39,900 --> 00:00:42,600 Now we know that this is the right working directory. 15 00:00:42,600 --> 00:00:45,766 And we are ready to start importing the data set. 16 00:00:46,866 --> 00:00:47,233 Okay. 17 00:00:47,233 --> 00:00:50,100 So to do this in R, what we need to do is very simple. 18 00:00:50,100 --> 00:00:52,200 We will just need one line of code. 19 00:00:52,200 --> 00:00:55,200 So as in Python, we're going to call it dataset. 20 00:00:55,733 --> 00:00:58,833 So that's the variable that will be the data set itself. 21 00:00:59,666 --> 00:01:01,700 And now to import the data set is very easy. 22 00:01:01,700 --> 00:01:05,266 You need to type read.csv. 23 00:01:05,966 --> 00:01:09,600 And then in parentheses you just type the name of your data set in quotes. 24 00:01:10,266 --> 00:01:15,000 So here you have to type data.csv. 25 00:01:15,500 --> 00:01:17,866 And that's it. That's all you need to do.
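For anyone following along without the video, the two steps described so far (choosing the working directory, then reading the file) might look roughly like the sketch below when written as code. The temporary folder and the three sample rows are assumptions made only so the snippet runs on its own; in the tutorial the folder is picked through the Files pane and the real data.csv has ten observations.

```r
# Hypothetical stand-in for the course folder: a temporary directory
# holding a tiny CSV with the same four columns as the tutorial's file.
folder <- tempfile("section2_")
dir.create(folder)
writeLines(
  c("Country,Age,Salary,Purchased",
    "France,44,72000,No",
    "Spain,27,48000,Yes",
    "Germany,30,54000,No"),
  file.path(folder, "data.csv")
)

# Code equivalent of choosing the folder as working directory in the
# Files pane; setwd() returns the previous directory so it can be restored.
old_wd <- setwd(folder)

# The one line from the tutorial: read the CSV into a data frame.
dataset <- read.csv("data.csv")

# Go back to the directory we started in.
setwd(old_wd)

print(dataset)
```

Because the file name is given without a path, read.csv looks for it in the working directory, which is why that directory has to be set first.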
26 00:01:17,866 --> 00:01:21,600 So now we're going to select this line of code and execute. 27 00:01:22,366 --> 00:01:22,800 Okay. 28 00:01:22,800 --> 00:01:24,466 So let's have a look at our data set. 29 00:01:24,466 --> 00:01:27,600 To do this you just need to click here on the data set. 30 00:01:27,600 --> 00:01:29,633 And it just displays here. 31 00:01:29,633 --> 00:01:32,533 Okay, so we have our four columns, country, age, 32 00:01:32,533 --> 00:01:35,533 salary, purchased, and our ten observations. 33 00:01:35,866 --> 00:01:38,733 And what's interesting to see here is that unlike 34 00:01:38,733 --> 00:01:41,733 Python, indexes in R don't start at zero but at one. 35 00:01:41,933 --> 00:01:45,533 So that's why the first observation starts at one here and the last ends at ten. 36 00:01:45,966 --> 00:01:50,333 So that's the second distinction you have to understand between Python and R. 37 00:01:50,633 --> 00:01:53,033 Of course you don't have to program on both. 38 00:01:53,033 --> 00:01:54,666 You can choose the one you prefer, 39 00:01:54,666 --> 00:01:57,666 but if you want to program on both, as I usually do, 40 00:01:57,900 --> 00:02:00,466 it's good to have this distinction in mind. 41 00:02:00,466 --> 00:02:02,000 Okay, so we have our data set. 42 00:02:02,000 --> 00:02:02,700 It's all fine. 43 00:02:02,700 --> 00:02:07,333 And as I just said here, we don't have to make the distinction 44 00:02:07,333 --> 00:02:11,233 between a matrix of features and a dependent variable vector. 45 00:02:11,366 --> 00:02:15,566 And this will make perfect sense for you as we go along with this 46 00:02:15,733 --> 00:02:17,533 part one, data pre-processing. 47 00:02:17,533 --> 00:02:20,400 You'll perfectly understand why in the next tutorials. 48 00:02:20,400 --> 00:02:23,733 Okay, so that's it for the importing the data 49 00:02:23,733 --> 00:02:27,066 set step of the data pre-processing phase. 50 00:02:27,633 --> 00:02:28,933 So that's it for this tutorial.
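The point about indexes starting at one can be checked directly in code. The sketch below is a minimal illustration, assuming a small made-up data frame in place of the imported one; the two columns and three rows are hypothetical and stand in for the tutorial's ten observations.

```r
# Hypothetical stand-in for the imported data set (three made-up rows).
dataset <- data.frame(
  Country = c("France", "Spain", "Germany"),
  Age = c(44, 27, 30)
)

# Row labels are assigned starting from 1, not 0.
print(rownames(dataset))

# The first observation is therefore row 1.
first_row <- dataset[1, ]
print(first_row)

# The same holds for plain vectors: position 1 is the first element.
ages <- dataset$Age
first_age <- ages[1]
print(first_age)
```

In Python the first element would be reached with index 0, which is the distinction mentioned above.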
51 00:02:28,933 --> 00:02:30,966 I look forward to seeing you in the next one, 52 00:02:30,966 --> 00:02:34,433 which will be to learn how to take care of missing data, 53 00:02:34,766 --> 00:02:37,366 because sometimes your data set will contain missing data. 54 00:02:37,366 --> 00:02:38,700 And you have to take care of this. 55 00:02:38,700 --> 00:02:41,266 And that's what we're going to learn in the next tutorial. 56 00:02:41,266 --> 00:02:42,666 So I look forward to seeing you there. 57 00:02:42,666 --> 00:02:44,466 And until then, enjoy machine learning.