1 00:00:00,420 --> 00:00:01,210 Welcome back. 2 00:00:01,590 --> 00:00:06,300 Well, we've had a look at a few different ways of describing data that we've imported with pandas into 3 00:00:06,360 --> 00:00:07,630 a data frame. 4 00:00:07,650 --> 00:00:12,920 Let's have a look at a couple of different ways we can view and select data. 5 00:00:13,050 --> 00:00:15,770 So come down here, we'll put another little heading in. 6 00:00:15,870 --> 00:00:25,110 So we'll go "Viewing and selecting data". We're going to hit Escape and M for Markdown, then Shift and Enter, because 7 00:00:25,110 --> 00:00:29,320 we're keeping our notebooks nice and clean and communicative right from the beginning. 8 00:00:29,640 --> 00:00:36,240 So to begin, one of the first things you'll do is have a look at the head of your data frame. 9 00:00:36,250 --> 00:00:38,360 Now remember, this has brackets after it. 10 00:00:38,390 --> 00:00:39,810 So it's a function. 11 00:00:39,810 --> 00:00:45,980 What this is going to do is return the first, or the top, five rows of your data frame. 12 00:00:46,030 --> 00:00:47,870 Now you might be thinking, why five? 13 00:00:47,910 --> 00:00:49,650 Well, that's a good question. 14 00:00:49,650 --> 00:00:54,100 I'm not entirely sure, but five seems like a pretty good number. 15 00:00:54,120 --> 00:00:59,730 And so in practice what you'll probably be doing is manipulating your data frame in some way and then 16 00:00:59,730 --> 00:01:03,340 calling head on it fairly often so you don't have to display the whole thing. 17 00:01:03,420 --> 00:01:05,850 Now our data frame's only 10 rows. 18 00:01:05,850 --> 00:01:10,880 So calling the whole thing isn't actually too bad, but imagine if you had thousands of these. 19 00:01:11,070 --> 00:01:17,490 So what head does is give you a quick snapshot, in a small little space, of what your data frame contains. 
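As a quick sketch of what calling head with no argument does: the data frame below is only a stand-in for the car_sales data frame in the video. It has 10 rows as described, but the column names and values are illustrative assumptions.

```python
import pandas as pd

# Stand-in for the car_sales DataFrame from the video: 10 rows.
# Column names and values are illustrative assumptions, not the real file.
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Toyota", "BMW", "Nissan",
             "Toyota", "Honda", "Honda", "Toyota", "Nissan"],
    "Odometer (KM)": [150043, 87899, 32549, 11179, 213095,
                      99213, 45698, 54738, 60000, 31600],
})

# With no argument, head() returns the first five rows.
top = car_sales.head()
print(top)
```

On a 10-row frame the saving is small, but the same call keeps the output to five rows however many rows the frame has.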
20 00:01:17,490 --> 00:01:22,700 So if you make a couple of quick changes here and there, you might want to look at the top five rows, and 21 00:01:22,710 --> 00:01:24,080 if five isn't enough, 22 00:01:24,150 --> 00:01:30,150 maybe you want to look at the top seven. The beautiful thing about head is that you can type in a 23 00:01:30,150 --> 00:01:33,420 number here and it will return that many rows. 24 00:01:33,720 --> 00:01:37,690 So you can have a play around and return whatever number of rows you want. 25 00:01:37,710 --> 00:01:45,370 Now if for some reason you wanted the bottom of your data frame, you can use .tail() and that will return, 26 00:01:45,510 --> 00:01:48,800 you might have guessed, the bottom five rows. 27 00:01:48,800 --> 00:01:53,750 Now this is handy if you're doing some alterations on the bottom of your data frame rather than the 28 00:01:53,750 --> 00:01:54,440 top, 29 00:01:54,500 --> 00:01:57,200 and so the changes will only appear towards the bottom. 30 00:01:57,200 --> 00:02:02,260 Something like that. And tail is much the same: you can put in any number you want here. 31 00:02:02,480 --> 00:02:07,350 Press Shift and Enter and it will return the bottom three rows there. 32 00:02:07,360 --> 00:02:12,210 Now the next two things that we're going to look at are .loc and .iloc. 33 00:02:12,210 --> 00:02:14,040 So let's write them down here. 34 00:02:14,080 --> 00:02:19,930 .loc and .iloc. Now before we go into the details, 35 00:02:20,040 --> 00:02:23,610 remember, you always want to run code first. 36 00:02:23,610 --> 00:02:25,260 If in doubt, run the code. 37 00:02:25,260 --> 00:02:26,600 That's our motto. 38 00:02:26,790 --> 00:02:32,970 So let's create a series here so we can demonstrate the difference between .loc and .iloc. So I'm going 39 00:02:32,970 --> 00:02:35,400 to make one called animals. 40 00:02:35,400 --> 00:02:36,530 What's your favorite animal? 
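The head and tail calls described above can be sketched roughly like this. Again, car_sales here is a small stand-in with assumed column names and values, only the 10-row shape comes from the video.

```python
import pandas as pd

# Stand-in for the 10-row car_sales DataFrame; values are illustrative assumptions.
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Toyota", "BMW", "Nissan",
             "Toyota", "Honda", "Honda", "Toyota", "Nissan"],
    "Doors": [4, 4, 3, 5, 4, 4, 4, 4, 4, 4],
})

top_seven = car_sales.head(7)     # first seven rows
bottom_five = car_sales.tail()    # no argument: last five rows
bottom_three = car_sales.tail(3)  # last three rows

print(bottom_three)
```

Both methods take the number of rows as their argument, so checking the end of a frame after a change is the same one-liner as checking the start.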
41 00:02:36,600 --> 00:02:43,850 Maybe since we're using pandas, maybe we'll put a panda in there, and one more, snake. 42 00:02:43,950 --> 00:02:46,130 Now we're going to set the index here. 43 00:02:46,440 --> 00:02:50,930 Now if we created a series, by default the index will be 0 to 44 00:02:50,940 --> 00:02:53,070 however many things we have here. 45 00:02:53,280 --> 00:03:00,090 But let's actually see that rather than talk about it, because remember, if in doubt, run the code. animals: 46 00:03:00,330 --> 00:03:02,390 0 to 4, in order. 47 00:03:02,430 --> 00:03:09,310 But if we wanted our own custom index, which we can do by passing the index parameter, then we put 48 00:03:09,310 --> 00:03:16,380 in a list, say 0, 3... We want this index to be out of order on purpose so we can demonstrate .loc 49 00:03:16,530 --> 00:03:18,090 and .iloc. 50 00:03:18,120 --> 00:03:18,740 There we go. 51 00:03:18,740 --> 00:03:20,670 So we have five things here, 52 00:03:20,820 --> 00:03:23,690 five things here, both Python lists. 53 00:03:23,760 --> 00:03:28,550 Now if we run this we can see that our index has changed. 54 00:03:29,160 --> 00:03:30,540 So let's have a look. 55 00:03:30,720 --> 00:03:34,140 animals.loc[3]. 56 00:03:34,260 --> 00:03:41,310 If we call this, what do you think will come back? And .loc you can consider as short for location. 57 00:03:41,670 --> 00:03:47,040 So let's try. Shift and Enter: dog and snake. 58 00:03:47,040 --> 00:03:48,070 Beautiful. 59 00:03:48,120 --> 00:03:52,860 So what .loc refers to is the index labels. 60 00:03:53,040 --> 00:03:58,920 So because we have 3 and 3 here, it returns two items. 61 00:03:58,920 --> 00:04:04,090 Let's try animals.loc[9]. 62 00:04:04,170 --> 00:04:05,610 What do you think this will come back with? 63 00:04:05,610 --> 00:04:10,070 Which animal? Shift and Enter: bird. 64 00:04:10,100 --> 00:04:15,590 The reason is that bird is at index 9, so it returned bird. 
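Here is a minimal sketch of the series being built in this part of the video. The transcript confirms dog, bird, panda and snake, the index labels 0, 3, 9 and a repeated 3; the first animal ("cat") and the label for panda (8) are assumptions filled in so the example runs.

```python
import pandas as pd

values = ["cat", "dog", "bird", "panda", "snake"]  # "cat" is an assumed first entry

# Without an index argument the labels are simply 0 to 4.
default_animals = pd.Series(values)
print(default_animals.index.tolist())

# Custom, deliberately out-of-order index; the 8 for panda is an assumption.
animals = pd.Series(values, index=[0, 3, 9, 8, 3])

threes = animals.loc[3]  # every item labelled 3: dog and snake
nine = animals.loc[9]    # the single item labelled 9: bird

print(threes)
print(nine)
```

Because the label 3 appears twice, .loc[3] gives back a two-item Series, while .loc[9] gives back a single value.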
65 00:04:15,800 --> 00:04:18,050 One, two, three; one, two, three. 66 00:04:18,050 --> 00:04:18,930 Index 9. 67 00:04:19,700 --> 00:04:24,070 Okay, now let's try our car_sales data frame. 68 00:04:24,100 --> 00:04:25,810 If you're not sure what this looks like, 69 00:04:25,820 --> 00:04:28,260 we'll run it here so we can see. 70 00:04:28,280 --> 00:04:28,550 Okay. 71 00:04:28,690 --> 00:04:31,700 0, 1, 2, 3: the index is still in order here. 72 00:04:31,700 --> 00:04:32,960 Beautiful. 73 00:04:32,960 --> 00:04:35,770 Now let's run a .loc on this: car_sales.loc[3]. 74 00:04:35,770 --> 00:04:37,640 What do you think this will come back with? 75 00:04:37,670 --> 00:04:40,920 Which car? Let's try it out. 76 00:04:42,170 --> 00:04:49,160 Oh, beautiful. It comes back with the row at index 3, a beautiful black BMW. For twenty-two thousand dollars it can be 77 00:04:49,220 --> 00:04:55,570 all yours. I could become a car salesman, what do you think? So now let's try another one. Let's try 78 00:04:55,610 --> 00:05:05,490 .iloc this time. We'll put .iloc here, so animals.iloc[3]. This time we're doing 79 00:05:05,600 --> 00:05:09,420 .iloc instead of just .loc. So we have five items here. 80 00:05:09,430 --> 00:05:11,600 What do you think this will come back with? 81 00:05:11,810 --> 00:05:13,010 Let's try it out. 82 00:05:13,010 --> 00:05:22,270 Shift and Enter: panda. Now, so we can see it, let's put a new cell here and go animals. Beautiful. 83 00:05:22,440 --> 00:05:34,800 So what .iloc refers to is position. So we can see here 0, 1, 2, 3, as if this was in order. Remember, 84 00:05:35,160 --> 00:05:41,810 Python lists, data frames and series start from zero. .iloc refers to a position whereas .loc 85 00:05:41,820 --> 00:05:49,050 refers to index. Let's try maybe with our car_sales data frame: car_sales.iloc[3]. 86 00:05:51,750 --> 00:05:59,680 It comes back with the same as .loc. That is because, if we go up... well, actually, let's just do it here. 
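To see position versus label side by side, here is a small sketch. The animals series uses the same assumed "cat" entry and assumed label 8 as a stand-in, and car_sales is a cut-down stand-in frame: only the black BMW at row 3 priced at 22,000 comes from the video, and storing the price as a plain number is an assumption.

```python
import pandas as pd

# "cat" and the label 8 are assumptions; the rest follows the video.
animals = pd.Series(["cat", "dog", "bird", "panda", "snake"],
                    index=[0, 3, 9, 8, 3])

by_position = animals.iloc[3]  # fourth item counting from zero: panda
by_label = animals.loc[3]      # everything labelled 3: dog and snake

# Stand-in car_sales with the default 0, 1, 2, ... index.
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Toyota", "BMW", "Nissan"],
    "Colour": ["White", "Red", "Blue", "Black", "White"],
    "Price": [4000, 5000, 7000, 22000, 3500],
})

# With an in-order default index, label 3 and position 3 are the same row.
same_row = car_sales.loc[3].equals(car_sales.iloc[3])

print(by_position)
print(same_row)
```

The two indexers only disagree when the index labels differ from the row positions, which is why the out-of-order animals index is the better demonstration.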
87 00:05:59,790 --> 00:06:01,040 car_sales. 88 00:06:01,230 --> 00:06:08,300 If we have a look at our data frame, .iloc position three, remember, because .iloc refers to position, so 89 00:06:08,370 --> 00:06:13,680 zero, one, two, three is the same as index 3. 90 00:06:13,740 --> 00:06:20,570 So the main two points you have to remember about .iloc and .loc are that .iloc refers to position 91 00:06:20,960 --> 00:06:31,590 and, we'll put in here, .loc refers to index. They're the main two differences there. 92 00:06:32,420 --> 00:06:32,900 OK. 93 00:06:33,330 --> 00:06:39,970 Now the beautiful thing about this is that with .iloc and .loc you can use slicing. 94 00:06:39,990 --> 00:06:44,880 So if you've ever used Python lists you might be familiar with slicing, but if not I'll show you what 95 00:06:44,880 --> 00:06:49,570 that looks like. I might type in animals.iloc[:3], 96 00:06:49,630 --> 00:07:00,300 and what this means is, rather than talk about it let's run the code, give us the items in 97 00:07:00,300 --> 00:07:03,680 animals up to position 3. 98 00:07:03,720 --> 00:07:06,030 Now .iloc doesn't include position 3. 99 00:07:06,510 --> 00:07:12,570 So remember, we count from 0: 0, 1, 2, 3. 100 00:07:12,690 --> 00:07:15,180 So it's going to give us up to, but not including, 3. 101 00:07:15,180 --> 00:07:18,950 Now let's try it with the car_sales data frame: 102 00:07:19,080 --> 00:07:20,490 car_sales.loc[:3]. 103 00:07:20,490 --> 00:07:23,690 Shift and Enter. 104 00:07:23,770 --> 00:07:24,490 There we go. 105 00:07:24,490 --> 00:07:28,590 It's given us up to and including index 3. 106 00:07:28,630 --> 00:07:40,310 Basically this is the same as calling head, but with 4: car_sales.head(4). Excellent. 107 00:07:40,370 --> 00:07:40,890 Okay. 108 00:07:41,010 --> 00:07:41,860 Let's keep going. 
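The slicing behaviour just described can be sketched as follows. As before, "cat" and the label 8 in animals are assumptions, and car_sales is a stand-in frame with illustrative values and the default index.

```python
import pandas as pd

# "cat" and the label 8 are assumptions; the rest follows the video.
animals = pd.Series(["cat", "dog", "bird", "panda", "snake"],
                    index=[0, 3, 9, 8, 3])

# .iloc slices by position and stops before the end position.
first_three = animals.iloc[:3]

# Stand-in car_sales; values are illustrative assumptions.
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Toyota", "BMW", "Nissan", "Toyota"],
    "Doors": [4, 4, 3, 5, 4, 4],
})

# .loc slices by label and includes the end label.
up_to_label_three = car_sales.loc[:3]
same_as_head = up_to_label_three.equals(car_sales.head(4))

print(first_three)
print(up_to_label_three)
```

So the same-looking `:3` gives three items with .iloc and four rows with .loc here, because one end point is exclusive and the other inclusive.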
109 00:07:42,600 --> 00:07:48,060 Now we've seen this previously, but let's say we wanted to select just this Make column and have a look 110 00:07:48,060 --> 00:07:48,870 at what's going on there. 111 00:07:49,560 --> 00:07:56,100 So the way to select a column is to type in its name in square brackets next to the name of the data 112 00:07:56,100 --> 00:08:00,740 frame, as a string. Shift and Enter: there's the Make column. 113 00:08:00,900 --> 00:08:02,700 Let's have a look at the Colour column. 114 00:08:02,910 --> 00:08:08,470 car_sales["Colour"]. Excellent. 115 00:08:08,480 --> 00:08:09,830 We've selected that. 116 00:08:09,830 --> 00:08:14,660 Now there's two ways you might see. I want you to be familiar with both of these ways of selecting a 117 00:08:14,660 --> 00:08:15,080 column. 118 00:08:15,080 --> 00:08:15,510 So if we go 119 00:08:15,510 --> 00:08:16,200 car_sales 120 00:08:16,190 --> 00:08:20,980 .Make, it's as easy as that, something like that. 121 00:08:20,990 --> 00:08:22,370 But this is the same thing. 122 00:08:22,730 --> 00:08:24,490 Let me demonstrate. 123 00:08:24,530 --> 00:08:31,310 car_sales["Make"]: these two lines of code are going to do the exact same thing. 124 00:08:31,400 --> 00:08:32,480 Let's prove it. 125 00:08:32,510 --> 00:08:38,610 I'll put a split in here by hitting Control, Shift, minus, and there's Make. 126 00:08:38,690 --> 00:08:39,860 Excellent. 127 00:08:39,870 --> 00:08:48,020 We're going to go car_sales.Make. Now the only real difference between these two is the syntax. 128 00:08:48,020 --> 00:08:49,810 So whichever one you prefer 129 00:08:49,880 --> 00:08:56,570 you might want to use. But what you should know about the dot notation is that if your column name has 130 00:08:56,570 --> 00:09:00,190 a space in it, the dot notation won't work. 131 00:09:00,230 --> 00:09:08,030 So if we tried to do this for Odometer (KM), it will come back with an error. 
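A rough sketch of the two column-selection styles compared here. The column names "Make" and "Odometer (KM)" and all the values are assumptions standing in for the real car_sales data.

```python
import pandas as pd

# Stand-in car_sales; column names and values are assumptions.
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Toyota", "BMW"],
    "Odometer (KM)": [150043, 87899, 32549, 11179],
})

make_brackets = car_sales["Make"]  # bracket notation, column name as a string
make_dot = car_sales.Make          # dot notation, same column

same = make_brackets.equals(make_dot)

# Bracket notation copes with spaces in the column name.
odometer = car_sales["Odometer (KM)"]

# Dot notation cannot spell a name containing a space; the closest
# attempt looks up an attribute called "Odometer", which does not exist.
try:
    car_sales.Odometer
    dot_failed = False
except AttributeError:
    dot_failed = True

print(same, dot_failed)
```

Bracket notation works for any column name, which is one reason to pick it as the default habit.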
132 00:09:08,630 --> 00:09:19,800 But if we put a cell in here and go car_sales["Odometer (KM)"] and run this, it'll return the odometer. 133 00:09:19,850 --> 00:09:26,260 So those are the two main ways of selecting a single column. And now one last thing before this 134 00:09:26,260 --> 00:09:31,300 video gets too long is that you might want to select a single column but put a little bit of a filter 135 00:09:31,300 --> 00:09:33,370 on it. Let's start with the Make column. 136 00:09:33,370 --> 00:09:38,600 But we only want rows where the car make is Toyota. 137 00:09:39,490 --> 00:09:42,340 So can you decipher what's going on here? 138 00:09:42,340 --> 00:09:43,980 We're saying car_sales. 139 00:09:44,110 --> 00:09:44,500 Okay. 140 00:09:44,500 --> 00:09:45,550 And then we're passing it 141 00:09:45,550 --> 00:09:47,020 this little condition here. 142 00:09:47,020 --> 00:09:49,340 This is called boolean indexing. 143 00:09:49,390 --> 00:09:51,770 Let's hit Shift and Enter to have a look. 144 00:09:51,780 --> 00:09:58,410 So this is going to say, hey pandas, give us the car_sales data frame, and this condition is I only want 145 00:09:58,410 --> 00:10:04,370 the car_sales rows where the Make column is equal to Toyota. 146 00:10:04,380 --> 00:10:12,480 Now what if we only wanted maybe those with over 100,000 kilometers on the data... on the 147 00:10:12,480 --> 00:10:13,450 odometer? 148 00:10:13,450 --> 00:10:19,980 We've got data on the brain as well. We're saying, let's get "Odometer (KM)" here, because remember we 149 00:10:19,980 --> 00:10:26,130 can't do the dot notation. So you can create a habit of which one you want to type; usually you pick one 150 00:10:26,220 --> 00:10:30,300 way of selecting a single column and stick with that throughout your entire notebook. 151 00:10:30,510 --> 00:10:35,240 So we want greater than 100,000. 
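The two filters described here can be sketched like this. The frame is a stand-in for car_sales: the column names "Make" and "Odometer (KM)" and every value are illustrative assumptions, so the rows that come back will differ from the video.

```python
import pandas as pd

# Stand-in car_sales; column names and values are assumptions.
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Toyota", "BMW", "Nissan",
             "Toyota", "Honda", "Honda", "Toyota", "Nissan"],
    "Odometer (KM)": [150043, 87899, 32549, 11179, 213095,
                      99213, 45698, 54738, 60000, 31600],
})

# The condition produces a True/False Series, one entry per row.
is_toyota = car_sales["Make"] == "Toyota"

# Passing that Series in square brackets keeps only the True rows.
toyotas = car_sales[is_toyota]

# Same pattern with a numeric comparison.
high_km = car_sales[car_sales["Odometer (KM)"] > 100000]

print(toyotas)
print(high_km)
```

Writing the condition on its own line first, as with is_toyota, makes it easier to see that the filter is just a column of booleans.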
152 00:10:35,260 --> 00:10:43,220 Now this is going to give back the rows that fulfill this condition, so you can imagine how advanced you 153 00:10:43,220 --> 00:10:47,990 could get with this. If you had, say, 100,000 different cars you might put a few more different criteria 154 00:10:47,990 --> 00:10:54,080 in here, but these are just a few simple use cases of how we can view and select data. 155 00:10:54,080 --> 00:10:57,710 Now there's a couple more we're going to go through, but to prevent this video from going for too long 156 00:10:57,710 --> 00:10:59,530 we're going to take a little break. 157 00:10:59,660 --> 00:11:04,460 Go back over what we've just gone through and practice a little bit before the next one, but otherwise 158 00:11:04,490 --> 00:11:07,700 we'll see some more ways of selecting and viewing data in the next video.