1 00:00:00,366 --> 00:00:00,766 All right. 2 00:00:00,766 --> 00:00:03,300 So let's do this, starting with X. 3 00:00:03,300 --> 00:00:05,433 So how are we going to create X? 4 00:00:05,433 --> 00:00:06,566 Basically, it's that simple. 5 00:00:06,566 --> 00:00:10,833 We have our dataset, you know, containing exactly 6 00:00:10,833 --> 00:00:12,100 all these columns. 7 00:00:12,100 --> 00:00:15,800 And in order to create X, well, we simply have to take the three 8 00:00:15,800 --> 00:00:17,466 first columns of this dataset. 9 00:00:17,466 --> 00:00:20,766 Because, you know, X will be exactly all these values here. 10 00:00:20,766 --> 00:00:22,800 You know, with the three first columns. 11 00:00:22,800 --> 00:00:27,066 And so what we're simply going to do is play with the indexes to collect, 12 00:00:27,066 --> 00:00:31,733 indeed, the indexes of these three first columns, basically 13 00:00:31,933 --> 00:00:35,100 all the columns of the dataset except the last one. 14 00:00:35,633 --> 00:00:36,433 So let's do this. 15 00:00:36,433 --> 00:00:39,000 Let me show you how to do this. 16 00:00:39,000 --> 00:00:42,833 First, what you're going to do is take your dataset, that exact 17 00:00:42,833 --> 00:00:46,400 same variable which you created in this first line of code here, okay? 18 00:00:46,733 --> 00:00:47,733 Dataset. 19 00:00:47,733 --> 00:00:51,733 Then from this dataset, and I'm adding a dot here because we're about to use 20 00:00:52,000 --> 00:00:56,233 a function, you know, one of the attribute functions of a pandas DataFrame. 21 00:00:56,433 --> 00:00:59,066 And that function is iloc. 22 00:00:59,066 --> 00:01:01,200 And what will it allow us to do? 23 00:01:01,200 --> 00:01:05,700 Well, as you can see, iloc here stands for locate indexes. 
24 00:01:05,966 --> 00:01:09,866 And therefore what this function will do is it will take the indexes 25 00:01:10,033 --> 00:01:13,033 of the columns we want to extract from the dataset, 26 00:01:13,300 --> 00:01:16,800 not only the indexes of the columns, but also the indexes of the rows. 27 00:01:16,800 --> 00:01:19,633 And actually we have to start here with the rows. 28 00:01:19,633 --> 00:01:23,433 We can specify the rows that we want to get and put into X. 29 00:01:23,700 --> 00:01:25,833 And of course these are all the rows. 30 00:01:25,833 --> 00:01:27,833 You know, we want to get all the rows into X. 31 00:01:27,833 --> 00:01:31,500 We only want to take the first columns, but we want to keep all the rows. 32 00:01:31,800 --> 00:01:34,833 And the trick to take all the rows, whatever dataset 33 00:01:34,833 --> 00:01:38,900 you have, with whatever number of rows, is to add here a colon. 34 00:01:39,100 --> 00:01:40,066 Why is that? 35 00:01:40,066 --> 00:01:42,833 Because a colon in Python means a range. 36 00:01:42,833 --> 00:01:47,066 And when we specify a range with neither the lower bound nor 37 00:01:47,066 --> 00:01:51,133 the upper bound, that means in Python that we're taking everything in the range. 38 00:01:51,133 --> 00:01:53,500 Therefore here all the rows. 39 00:01:53,500 --> 00:01:55,266 So that's the trick to take all the rows. 40 00:01:55,266 --> 00:01:57,000 And you will always have to take all the rows. 41 00:01:57,000 --> 00:01:59,933 So here you won't have anything to change. 42 00:01:59,933 --> 00:02:03,833 Then we have to specify which columns we want to select with the indexes, 43 00:02:04,133 --> 00:02:07,566 and to separate the rows that we just took from the columns, 44 00:02:07,566 --> 00:02:09,533 we need here to add a comma. 45 00:02:09,533 --> 00:02:12,033 And now we can take care of the columns. 46 00:02:12,033 --> 00:02:12,500 All right. 
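The colon rule just described can be tried on a plain Python list. This is only a minimal sketch: the list named rows and its contents are made up for illustration and do not come from the course's dataset. The bare colon behaves the same way when it stands in the row position of iloc.

```python
# Hypothetical rows, invented purely for illustration.
rows = ["row 0", "row 1", "row 2", "row 3"]

# A colon with no lower bound and no upper bound selects the whole range.
all_rows = rows[:]

print(len(all_rows))
print(all_rows == rows)
```

Because no bound is written down, the same expression keeps working however many rows there are.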
47 00:02:12,500 --> 00:02:16,333 So now I'm going to show you a trick in order to take all the columns 48 00:02:16,333 --> 00:02:17,500 except the last one. 49 00:02:17,500 --> 00:02:20,933 Because indeed, as I told you, most of the datasets 50 00:02:20,933 --> 00:02:24,366 you will use to train your machine learning models will have first 51 00:02:24,633 --> 00:02:27,233 the features, you know, in the first columns, 52 00:02:27,233 --> 00:02:30,866 and last the dependent variable vector in the last column. 53 00:02:31,266 --> 00:02:34,400 So now we're going to use a trick so that we can take automatically, 54 00:02:34,433 --> 00:02:36,800 you know, regardless of the number of columns 55 00:02:36,800 --> 00:02:40,433 in your dataset, all the columns except the last one, 56 00:02:40,533 --> 00:02:44,433 because all the columns except the last one are exactly the matrix of features. 57 00:02:44,933 --> 00:02:48,600 And the trick to do that is to add a new range here, 58 00:02:48,733 --> 00:02:52,000 which this time will be colon minus one. 59 00:02:52,500 --> 00:02:53,733 So what does it mean? 60 00:02:53,733 --> 00:02:57,400 Well, as we said, the colon here means a range. 61 00:02:57,400 --> 00:02:59,900 We know we're taking a range. Here on the left 62 00:02:59,900 --> 00:03:00,766 we have nothing. 63 00:03:00,766 --> 00:03:03,833 That means that we're starting at the first index, 64 00:03:03,833 --> 00:03:07,466 you know, the index zero, because indexes in Python start at zero. 65 00:03:08,033 --> 00:03:10,766 And then, you know, we're going up to minus one. 66 00:03:10,766 --> 00:03:12,600 So what does this minus one mean? 67 00:03:12,600 --> 00:03:16,266 Well, minus one means here the last column. Minus 68 00:03:16,266 --> 00:03:19,800 one in Python means the index of the last column. 69 00:03:20,100 --> 00:03:20,833 However, 70 00:03:20,833 --> 00:03:24,566 and this is a very important principle in Python which you must absolutely know: 
71 00:03:25,000 --> 00:03:28,933 A range in Python includes the lower bound, 72 00:03:28,966 --> 00:03:31,966 therefore we're including here the lower bound zero, the index 73 00:03:31,966 --> 00:03:35,000 zero, but excludes the upper bound. 74 00:03:35,100 --> 00:03:39,066 And therefore here we're excluding this index minus one, meaning 75 00:03:39,066 --> 00:03:40,433 the index of the last column. 76 00:03:40,433 --> 00:03:44,933 And therefore what this will do is it will take all the columns 77 00:03:44,933 --> 00:03:46,500 excluding the last one. 78 00:03:46,500 --> 00:03:50,166 And that's exactly what we want for our matrix of features X. 79 00:03:51,000 --> 00:03:52,600 So voila. There you go. 80 00:03:52,600 --> 00:03:57,000 Now you just collected the right indexes to create your matrix of features X. 81 00:03:57,100 --> 00:04:01,066 And the beauty of this is that you won't have anything to change 82 00:04:01,200 --> 00:04:04,066 when creating the future matrices of features X 83 00:04:04,066 --> 00:04:07,566 of your future datasets, but make sure that your future datasets 84 00:04:07,566 --> 00:04:10,566 indeed have the features in the first columns 85 00:04:10,566 --> 00:04:13,400 and the dependent variable vector in the last column. 86 00:04:13,400 --> 00:04:14,866 Okay, perfect. 87 00:04:14,866 --> 00:04:19,666 So in order to finish this line of code, we just need to add here .values. 88 00:04:20,100 --> 00:04:24,600 And this just means that we're taking indeed all the values in all the rows 89 00:04:24,600 --> 00:04:29,500 of this dataset, and in all the columns except the last one of this dataset. 90 00:04:30,300 --> 00:04:32,133 Perfect. So now you're learning a lot. 91 00:04:32,133 --> 00:04:35,266 Don't worry if this feels a bit overwhelming at the beginning. 
92 00:04:35,400 --> 00:04:38,200 I promise you that we will use this trick many, many times, 93 00:04:38,200 --> 00:04:42,200 so you will soon be very familiar with it and master it like a pro.
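Putting the pieces of this walkthrough together, the finished line can be sketched as follows. This is only an illustration under stated assumptions: the small DataFrame and its column names (feature_a, feature_b, feature_c, target) are hypothetical stand-ins for the course's dataset, which is built in code here rather than loaded from a file.

```python
import pandas as pd

# Hypothetical stand-in for the course's dataset: three feature columns
# first, then the dependent variable in the last column.
dataset = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0],
    "feature_b": [10.0, 20.0, 30.0],
    "feature_c": [100.0, 200.0, 300.0],
    "target": [0, 1, 0],
})

# Rows come first, then a comma, then columns.
# ":" keeps every row; ":-1" keeps every column from index 0 up to,
# but not including, the last one. ".values" returns the underlying values.
X = dataset.iloc[:, :-1].values

print(X.shape)
print(X)
```

Since neither the number of rows nor the number of columns is written into the expression, it should carry over unchanged to any dataset that keeps the features in the first columns and the dependent variable in the last one.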