1
00:00:00,566 --> 00:00:01,000
All right.

2
00:00:01,000 --> 00:00:04,533
And now let's do the same for our dependent variable vector.

3
00:00:04,866 --> 00:00:06,900
And you know, this will be exactly the same.

4
00:00:06,900 --> 00:00:08,666
We'll just have to change one little thing.

5
00:00:08,666 --> 00:00:12,733
So I'm going to copy this and paste it right here.

6
00:00:13,100 --> 00:00:16,166
And now, based on what I've just explained,

7
00:00:16,666 --> 00:00:21,233
what do you think we have to change here in order to get the dependent variable vector,

8
00:00:21,433 --> 00:00:24,566
which, most of the time in our datasets, is

9
00:00:24,566 --> 00:00:27,133
indeed the last column?

10
00:00:27,133 --> 00:00:27,433
Okay.

11
00:00:27,433 --> 00:00:30,000
So let's see.

12
00:00:30,000 --> 00:00:31,200
First, we have to take our dataset,

13
00:00:31,200 --> 00:00:34,200
because we want to extract this last column from our dataset.

14
00:00:34,500 --> 00:00:38,933
Then we have to use iloc to collect the indexes of the rows and columns we want.

15
00:00:39,300 --> 00:00:42,766
Then we indeed want to take all the rows of the dataset,

16
00:00:42,766 --> 00:00:47,100
because we want to take all the purchase decisions of these customers,

17
00:00:47,100 --> 00:00:50,566
whether they decided to purchase the product, yes or no.

18
00:00:50,733 --> 00:00:52,500
So okay, we want all the rows.

19
00:00:52,500 --> 00:00:55,633
But then which columns do we want to get here?

20
00:00:55,633 --> 00:00:58,266
Well, we only want to get the last column.

21
00:00:58,266 --> 00:01:01,166
So what index do you think

22
00:01:01,166 --> 00:01:04,166
we need to input here in order to get only the last column?

23
00:01:04,700 --> 00:01:08,033
Well, this time, since we only want to get one column,

24
00:01:08,033 --> 00:01:09,900
we definitely don't want to get a range.

25
00:01:09,900 --> 00:01:12,866
And therefore I'm going to remove the range here.
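The sketch below shows the two slices side by side using pandas. It assumes that the line of code we copied builds X with `dataset.iloc[:, :-1].values`. The three-row DataFrame and its column names (`Country`, `Age`, `Salary`, `Purchased`) are made up to stand in for the tutorial's file.

```python
import pandas as pd

# Hypothetical sample rows: three feature columns, then the purchase decision.
dataset = pd.DataFrame(
    {
        "Country": ["France", "Spain", "Germany"],
        "Age": [44, 27, 30],
        "Salary": [72000, 48000, 54000],
        "Purchased": ["No", "Yes", "No"],
    }
)

# Matrix of features: all rows (:), and the range of every column except the last (:-1).
X = dataset.iloc[:, :-1].values

# Dependent variable vector: all rows (:), and only the single index -1 (the last column).
y = dataset.iloc[:, -1].values

print(X.shape)  # (3, 3)
print(y.shape)  # (3,)
print(y)
```

If you keep a range such as `iloc[:, -1:]`, you get a one-column 2-D array of shape `(3, 1)` instead of a flat vector. That difference is why the range is removed when building y.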
26
00:01:12,866 --> 00:01:14,700
And then what are we left with?

27
00:01:14,700 --> 00:01:16,600
We're left with minus one.

28
00:01:16,600 --> 00:01:20,800
And as I've told you, minus one is exactly the index of the last column.

29
00:01:21,000 --> 00:01:22,000
So there we go.

30
00:01:22,000 --> 00:01:26,166
That's exactly what we need to create this dependent variable vector.

31
00:01:26,666 --> 00:01:28,700
And so this line of code is done.

32
00:01:28,700 --> 00:01:30,000
Congratulations.

33
00:01:30,000 --> 00:01:32,433
Now you know how to import a dataset,

34
00:01:32,433 --> 00:01:36,066
create a matrix of features, and create a dependent variable vector.

35
00:01:36,300 --> 00:01:37,500
And the cherry on the cake

36
00:01:37,500 --> 00:01:40,766
is that any time you want to create these for your own datasets,

37
00:01:40,966 --> 00:01:44,500
you won't have anything to change, because this will automatically take

38
00:01:44,700 --> 00:01:47,033
all the columns except the last one for the matrix of features

39
00:01:47,033 --> 00:01:49,900
and the last column for the dependent variable vector.

40
00:01:49,900 --> 00:01:50,266
All right.

41
00:01:50,266 --> 00:01:54,066
So now I'm going to show you that X and y are indeed created correctly.

42
00:01:54,200 --> 00:01:57,700
And in order to do this, we're going to add a new code cell here

43
00:01:57,900 --> 00:02:00,433
in which we're simply going to print them.

44
00:02:00,433 --> 00:02:03,933
So that's the famous print function, which allows you to print anything,

45
00:02:03,933 --> 00:02:08,700
whether it's text, an array like X, or a vector like y.

46
00:02:08,966 --> 00:02:11,066
So we're going to first print X,

47
00:02:11,066 --> 00:02:15,633
and then I'm going to add a new code cell here where we're going to print y.

48
00:02:15,633 --> 00:02:19,000
And this is just to show you that X and y

49
00:02:19,000 --> 00:02:22,200
are indeed created by this code. Okay.
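The claim that you won't have anything to change for other datasets holds whenever the dependent variable is the last column. The sketch below checks that with pandas by wrapping the two slices in a helper, `split_features_target`, which is a hypothetical name and not part of any library. It then prints X and y for two made-up DataFrames that have different numbers of feature columns.

```python
import pandas as pd

def split_features_target(dataset: pd.DataFrame):
    """Hypothetical helper: every column but the last is a feature; the last is the target."""
    X = dataset.iloc[:, :-1].values  # matrix of features
    y = dataset.iloc[:, -1].values   # dependent variable vector
    return X, y

# Two made-up datasets with one and three feature columns respectively.
small = pd.DataFrame({"a": [1, 2], "target": [0, 1]})
wide = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "target": [1, 0]})

for df in (small, wide):
    X, y = split_features_target(df)
    print(X)  # one row per observation, one column per feature
    print(y)  # one value per observation
```

The same two `iloc` calls work unchanged for both shapes. The only assumption is the column layout, with the target placed last.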
50
00:02:22,266 --> 00:02:24,800
So let's do this. Now time for the fun part.

51
00:02:24,800 --> 00:02:27,000
We're going to execute all the cells here.

52
00:02:27,000 --> 00:02:30,000
Because so far we've only written the code.

53
00:02:30,000 --> 00:02:33,500
But we have to run the cells in order to build all this.

54
00:02:33,700 --> 00:02:37,566
So let's first run this code cell importing the libraries.

55
00:02:38,300 --> 00:02:39,600
All right, imported.

56
00:02:39,600 --> 00:02:40,800
As you can see, if I click here,

57
00:02:40,800 --> 00:02:43,800
yes, this one here means it has been executed.

58
00:02:44,133 --> 00:02:45,600
Now it's time to run the second one.

59
00:02:45,600 --> 00:02:48,666
But before running it, we have to do something very important.

60
00:02:48,900 --> 00:02:52,266
We have to upload this dataset here, in CSV

61
00:02:52,266 --> 00:02:55,400
format, into our Google Colab notebook.

62
00:02:55,400 --> 00:02:57,933
And to do this, you just need to click Files here,

63
00:02:57,933 --> 00:03:00,933
this little folder, then Upload.

64
00:03:01,333 --> 00:03:05,066
Then you're going to go to the whole Machine Learning A-Z

65
00:03:05,066 --> 00:03:09,366
folder containing all the code and datasets, which was provided to you

66
00:03:09,366 --> 00:03:10,500
in the first section,

67
00:03:10,500 --> 00:03:12,866
and which I will give you again in every section

68
00:03:12,866 --> 00:03:15,066
so that you make sure not to miss it.

69
00:03:15,066 --> 00:03:19,066
And inside this folder, you're going to go to Part 1 - Data Preprocessing

70
00:03:19,200 --> 00:03:22,900
in order to get the Data.csv

71
00:03:22,900 --> 00:03:26,700
file containing the dataset we are importing right now.

72
00:03:26,700 --> 00:03:31,733
Click Open, and now the dataset is indeed in your Google Colab notebook.

73
00:03:31,933 --> 00:03:35,200
And so now we can run this cell to import it.
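Files uploaded through Colab's Files panel normally land in the notebook's working directory. That is why a cell can read the file by name alone, for example `pd.read_csv("Data.csv")`. The sketch below cannot rely on that file being present, so it substitutes a made-up CSV string read through `io.StringIO`. The column names and values are assumptions, not the course's actual data.

```python
import io
import pandas as pd

# In Colab, after uploading, the tutorial's cell would read the file by name:
#     dataset = pd.read_csv("Data.csv")
# Here a made-up CSV string stands in for that file so the sketch runs anywhere.
csv_text = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
"""
dataset = pd.read_csv(io.StringIO(csv_text))

print(dataset)
print(dataset.shape)  # (3, 4): 3 rows, 4 columns
```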
74
00:03:35,200 --> 00:03:37,200
And there we go. It's already imported.

75
00:03:37,200 --> 00:03:38,833
You know, it has been executed.

76
00:03:38,833 --> 00:03:41,666
So now we're going to execute this cell in order

77
00:03:41,666 --> 00:03:45,333
to print the matrix of features X, just to check that indeed

78
00:03:45,333 --> 00:03:48,633
we get all the columns except the last one inside this matrix.

79
00:03:49,133 --> 00:03:52,833
And indeed, let's check the dataset once again.

80
00:03:53,100 --> 00:03:56,766
Remember, the first columns, meaning the features we wanted to get into

81
00:03:56,766 --> 00:04:02,233
this matrix X, are first Country, second Age, and third Salary.

82
00:04:02,233 --> 00:04:03,500
These are the three columns.

83
00:04:03,500 --> 00:04:07,166
And indeed, inside X we have first the Country column

84
00:04:07,166 --> 00:04:10,166
with all the countries of these customers, then their age,

85
00:04:10,366 --> 00:04:14,133
and in the third column their salary, or rather their estimated salary.

86
00:04:14,666 --> 00:04:15,666
So that's perfect.

87
00:04:15,666 --> 00:04:19,366
We indeed get the matrix of features X containing all the features,

88
00:04:19,366 --> 00:04:22,366
also called the independent variables.

89
00:04:22,466 --> 00:04:22,833
All right.

90
00:04:22,833 --> 00:04:26,300
And now let's run this cell to print y, the dependent variable vector.

91
00:04:26,300 --> 00:04:30,666
And indeed, we get the dependent variable vector containing all the decisions,

92
00:04:30,800 --> 00:04:33,966
whether or not the customers purchased the product.

93
00:04:33,966 --> 00:04:35,866
Right, we can check: No, Yes, No, No,

94
00:04:37,300 --> 00:04:38,333
No, Yes,

95
00:04:38,333 --> 00:04:40,633
No, No. Okay, so that's in the same order.

96
00:04:40,633 --> 00:04:41,433
That's perfect.

97
00:04:41,433 --> 00:04:42,866
We now have our dataset,

98
00:04:42,866 --> 00:04:46,333
our matrix of features X and our dependent variable vector y.
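Checking the printed output by eye works for a handful of rows. The same checks can also be written as assertions. This is a minimal sketch on a made-up three-row DataFrame with hypothetical column names. It confirms that X has one row per customer and one column per feature, and that y keeps the original row order of the last column.

```python
import pandas as pd

# Made-up rows in the tutorial's layout: Country, Age, Salary, then the decision.
dataset = pd.DataFrame(
    {
        "Country": ["France", "Spain", "Germany"],
        "Age": [44, 27, 30],
        "Salary": [72000, 48000, 54000],
        "Purchased": ["No", "Yes", "No"],
    }
)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# One row of X and one entry of y per customer.
assert len(X) == len(y) == len(dataset)
# X holds exactly the feature columns (everything but the last).
assert X.shape[1] == dataset.shape[1] - 1
# y lists the decisions in the same order as the dataset's last column.
assert list(y) == list(dataset["Purchased"])

# Pair each customer's features with their decision for a visual check.
for features, decision in zip(X, y):
    print(features, "->", decision)
```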
99
00:04:46,633 --> 00:04:50,300
And lastly, let me remind you why we had to create these two entities.

100
00:04:50,633 --> 00:04:53,766
That's because our future machine learning models, the way we're going to build them,

101
00:04:53,766 --> 00:04:58,900
expect exactly these two entities as their inputs.

102
00:04:59,266 --> 00:05:01,866
You know, we will use some classes to build these models.

103
00:05:01,866 --> 00:05:05,200
And these classes don't expect the dataset as a whole,

104
00:05:05,300 --> 00:05:08,233
but these two separate entities, and that's

105
00:05:08,233 --> 00:05:11,533
the only reason why we had to create these two separate entities.

106
00:05:11,533 --> 00:05:15,600
So now you know. Congratulations: not only have you improved

107
00:05:15,600 --> 00:05:18,800
your knowledge of machine learning, but you also now know how to import

108
00:05:18,800 --> 00:05:22,500
a dataset and create a matrix of features and a dependent variable vector.

109
00:05:23,033 --> 00:05:26,133
So now we're going to proceed to the next step, which is a new tool

110
00:05:26,166 --> 00:05:27,566
that I'm going to teach you,

111
00:05:27,566 --> 00:05:30,133
and that is taking care of missing data.

112
00:05:30,133 --> 00:05:33,033
Indeed, as you can see, the dataset contains some missing data.

113
00:05:33,033 --> 00:05:35,966
Right here. You can see this is an empty cell.

114
00:05:35,966 --> 00:05:39,066
So I will teach you exactly how to handle that case,

115
00:05:39,300 --> 00:05:42,100
which happens most of the time in datasets.

116
00:05:42,100 --> 00:05:43,900
So let's do this in the next tutorial.

117
00:05:43,900 --> 00:05:45,833
And until then, enjoy machine learning.
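Here is a rough illustration of why the model classes want X and y separately rather than the whole dataset. scikit-learn estimators, for instance, are trained with `fit(X, y)`. The toy class below, `MajorityClassifier`, is hypothetical and written only to mimic that `fit(X, y)` / `predict(X)` convention. It ignores the features and always predicts the most frequent label. The data values are made up.

```python
import numpy as np

class MajorityClassifier:
    """Hypothetical toy model mimicking the fit(X, y) / predict(X) convention.
    It ignores the features and always predicts the most frequent label in y."""

    def fit(self, X, y):
        # Learn from the separate target vector y.
        labels, counts = np.unique(y, return_counts=True)
        self.majority_ = labels[np.argmax(counts)]
        return self

    def predict(self, X):
        # Return one prediction per row of the matrix of features X.
        return np.full(len(X), self.majority_, dtype=object)

# Made-up matrix of features (Country, Age, Salary) and dependent variable vector.
X = np.array(
    [["France", 44, 72000], ["Spain", 27, 48000], ["Germany", 30, 54000]],
    dtype=object,
)
y = np.array(["No", "Yes", "No"], dtype=object)

model = MajorityClassifier().fit(X, y)
print(model.predict(X))
```

Because features and target arrive as two arguments, the same class works for any dataset once it has been split into X and y as shown in this tutorial.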