1 00:00:00,550 --> 00:00:01,220 OK. 2 00:00:01,260 --> 00:00:02,560 Welcome back. 3 00:00:02,580 --> 00:00:09,450 Now in the last video we went through different data types in pandas, Series and data frames, creating 4 00:00:09,450 --> 00:00:15,330 some Series and a data frame, importing data and then exporting data. 5 00:00:15,330 --> 00:00:20,490 Now we're going to look at different ways you can describe data in pandas, so we'll make a little heading 6 00:00:20,490 --> 00:00:21,310 here. 7 00:00:21,330 --> 00:00:25,410 Remember, to do that you can type some markdown. If you've never seen markdown before, 8 00:00:25,530 --> 00:00:30,100 you can do some research, but to create a heading you can do two little hashes. 9 00:00:30,240 --> 00:00:35,160 And then we'll press Escape, M, and then Shift+Enter. 10 00:00:35,160 --> 00:00:38,720 Now I've got "Describe data" in a nice little heading format. 11 00:00:38,730 --> 00:00:41,580 This is a great way to break up Jupyter notebooks 12 00:00:41,790 --> 00:00:42,750 so they're communicative. 13 00:00:45,370 --> 00:00:46,760 So, the way you call functions, 14 00:00:46,760 --> 00:00:55,300 and we've already seen this in an example here, to_csv, on pandas data frames is by using the dot notation. 15 00:00:55,400 --> 00:00:59,450 So if we hit Tab it might show us. 16 00:00:59,760 --> 00:01:00,520 There we go. 17 00:01:00,540 --> 00:01:03,700 Took a little while and I got impatient and pressed Tab too long. 18 00:01:03,830 --> 00:01:06,270 It'll show us a few functions here that we can run. 19 00:01:06,270 --> 00:01:09,600 But remember there is a lot here. 20 00:01:09,600 --> 00:01:15,310 So in the beginning don't focus on trying to learn all of these off by heart. 21 00:01:15,360 --> 00:01:21,480 Just focus on using the few that we're going through at the moment and then build upon your knowledge 22 00:01:21,540 --> 00:01:22,240 as you go. 
23 00:01:22,920 --> 00:01:28,710 So the first thing we're going to have a look at is how to describe the data types in our data frame. 24 00:01:28,710 --> 00:01:32,910 We can do this by typing in dtypes. 25 00:01:32,910 --> 00:01:36,300 So you'll notice some functions have brackets after 26 00:01:36,320 --> 00:01:39,610 and some don't. Like this one doesn't have brackets after it. 27 00:01:39,720 --> 00:01:43,350 The difference between these two is this is an attribute. 28 00:01:44,190 --> 00:01:57,140 So we type in here "attribute", whereas something like car_sales.to_csv() is a function. 29 00:01:57,180 --> 00:02:02,550 So the difference between an attribute and a function is this is going to perform some steps, whereas 30 00:02:02,550 --> 00:02:08,520 dtypes, which is an attribute, is just some meta information which is stored about the car sales data 31 00:02:08,520 --> 00:02:09,080 frame. 32 00:02:09,150 --> 00:02:14,520 So that's the main thing to remember here: the difference between an attribute and a function. 33 00:02:14,520 --> 00:02:19,940 We don't want to run this line of code so we'll comment that out, but we do want to see the dtypes. 34 00:02:19,950 --> 00:02:23,510 Let's do that, hitting Shift+Enter. 35 00:02:23,520 --> 00:02:27,960 Now this will tell us some information about our car sales data frame columns. 36 00:02:27,960 --> 00:02:30,270 So it's got the Make column. Let's view it up here. 37 00:02:30,270 --> 00:02:33,440 Make, Colour, Odometer, Doors, Price. 38 00:02:33,510 --> 00:02:35,310 That's our column names. 39 00:02:35,310 --> 00:02:39,610 So this is going to tell us the type of our Make column is an object. 40 00:02:39,630 --> 00:02:43,380 Colour is an object, Odometer is an int. 41 00:02:43,440 --> 00:02:45,000 Doors is an integer. 42 00:02:45,210 --> 00:02:47,250 And Price is also an object. 43 00:02:48,030 --> 00:02:50,670 Let's have a look at some other things we can do with that data frame. 
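The attribute versus function distinction above can be sketched in a few lines. This is only a minimal illustration: the rows in `car_sales` are made up as a stand-in for the data frame in the video, and the column names are assumed from what is read out on screen.

```python
import pandas as pd

# Hypothetical stand-in for the car sales data frame. Column names follow
# the video; the rows themselves are invented for illustration.
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "BMW"],
    "Colour": ["White", "Red", "Blue"],
    "Odometer": [150043, 87899, 11179],
    "Doors": [4, 4, 5],
    # Prices stored as text, so this column is not a numeric dtype
    # (it shows up as "object" in the video).
    "Price": ["$4,000.00", "$5,000.00", "$22,000.00"],
})

# Attribute: no brackets. It is stored information about the data frame.
print(car_sales.dtypes)

# Function (method): brackets. It performs some steps. With no file path
# given, to_csv() returns the CSV text instead of writing a file.
csv_text = car_sales.to_csv(index=False)
print(csv_text)
```

Leaving the brackets off a method, as in `car_sales.to_csv`, does not run it; it only refers to the function object.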
44 00:02:50,700 --> 00:02:52,530 car_sales.columns. 45 00:02:52,530 --> 00:02:55,660 Now, is this a function or an attribute? 46 00:02:55,710 --> 00:02:58,710 It doesn't have brackets so it's an attribute. 47 00:02:58,710 --> 00:03:04,380 So this is going to tell us our column names. It's going to return them as a list-like Index, so you could go car_columns 48 00:03:04,560 --> 00:03:07,450 equals car_sales.columns. 49 00:03:08,040 --> 00:03:16,160 Hit Shift+Enter and now you've got your column names stored in a list-like Index, so you can perform some operations 50 00:03:16,160 --> 00:03:20,140 on it and then use that later on to manipulate your data frame. 51 00:03:20,150 --> 00:03:24,850 This is just how you access two of the main types of attributes from your data frame. 52 00:03:24,890 --> 00:03:26,000 Let's have a look at something else. 53 00:03:26,000 --> 00:03:30,050 What about the index? Go car_columns.index 54 00:03:33,640 --> 00:03:40,910 "has no attribute index". Oh, because we typed in car_columns; we want our car_sales data frame. 55 00:03:41,050 --> 00:03:41,930 There we go. 56 00:03:43,010 --> 00:03:43,670 Beautiful. 57 00:03:43,790 --> 00:03:46,970 So this is going to say... Let's view the data frame here. 58 00:03:46,970 --> 00:03:53,060 car_sales. It's going to say the range of the index starts at zero and stops at 10. 59 00:03:53,150 --> 00:03:56,270 Step is 1: 0, 1, 2, 3. 60 00:03:56,380 --> 00:03:56,750 Okay. 61 00:03:56,840 --> 00:03:58,590 That makes sense. 62 00:03:58,670 --> 00:04:02,050 Now we've gone through a few different attributes of our data frame. 63 00:04:02,060 --> 00:04:05,120 Let's have a look at some of the functions we can call. 64 00:04:05,120 --> 00:04:10,490 Remember the main difference between a function and an attribute is one doesn't have brackets: attributes 65 00:04:10,490 --> 00:04:15,510 don't have brackets, and functions are going to perform some kind of operation. 
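A short sketch of the two attributes just covered, `columns` and `index`. The `car_sales` rows are again invented stand-ins, so the index here stops at 3 rather than the 10 seen in the video.

```python
import pandas as pd

# Hypothetical stand-in for the car sales data frame (made-up rows).
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "BMW"],
    "Colour": ["White", "Red", "Blue"],
    "Odometer": [150043, 87899, 11179],
    "Doors": [4, 4, 5],
    "Price": ["$4,000.00", "$5,000.00", "$22,000.00"],
})

# columns is an attribute: it holds the column names in a pandas Index,
# which behaves much like a list.
car_columns = car_sales.columns
print(car_columns)

# Convert to a plain Python list if ordinary list operations are needed.
column_names = list(car_columns)
print(column_names)

# index is also an attribute, but it belongs to the data frame, not to
# car_columns (which is where the error in the video came from).
print(car_sales.index)  # RangeIndex(start=0, stop=3, step=1)
```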
66 00:04:15,590 --> 00:04:19,680 So the first one we'll have a look at is describe. 67 00:04:19,700 --> 00:04:26,540 There we go. What describe does is give us some statistical information about our numeric columns. 68 00:04:26,540 --> 00:04:34,920 You might notice, why isn't Price coming up here? It's a numeric column. So we've got Odometer. 69 00:04:34,920 --> 00:04:36,570 Count means there's 10 rows. 70 00:04:36,570 --> 00:04:39,500 The mean is seventy eight thousand six hundred one. 71 00:04:39,660 --> 00:04:46,100 So the odometer is how many kilometers a car might have done, and the number of doors: the count is 10, 72 00:04:46,100 --> 00:04:53,760 meaning there's 10 rows with Doors as a column, and the mean is four. Standard deviation is zero point 73 00:04:53,760 --> 00:04:56,670 4 7 1 4 0 5. 74 00:04:56,720 --> 00:04:58,020 We've got percentiles here, 75 00:04:58,050 --> 00:05:01,510 min and max, but no Price. 76 00:05:01,810 --> 00:05:07,280 Well if we come back up here we can see that the odometer is an integer value. 77 00:05:07,390 --> 00:05:12,080 And the doors are also an integer value, but the price is an object. 78 00:05:12,100 --> 00:05:13,700 Now that's interesting. 79 00:05:13,860 --> 00:05:14,340 Mm hmm. 80 00:05:14,440 --> 00:05:22,020 Maybe when we imported it from our CSV this cell is formatted differently to these cells. 81 00:05:22,090 --> 00:05:28,650 Don't worry, we'll have a look at that in a future video, but something to keep in mind is that describe 82 00:05:28,650 --> 00:05:31,830 by default works only on numeric columns. 83 00:05:31,830 --> 00:05:33,450 So let's have a look at something else. 84 00:05:33,540 --> 00:05:40,950 Maybe we want info: car_sales.info(). And now it's important to remember that these types of description 85 00:05:40,950 --> 00:05:49,120 functions and attributes are some things that you might run through at the very start of exploring a 86 00:05:49,120 --> 00:05:50,740 new set of data. 
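To see why Price is missing from the describe output, here is a small sketch. The data is a made-up stand-in, with the prices deliberately stored as text, which is assumed to be roughly what happened in the CSV import.

```python
import pandas as pd

# Hypothetical stand-in for the car sales data frame (made-up rows).
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "BMW"],
    "Colour": ["White", "Red", "Blue"],
    "Odometer": [150043, 87899, 11179],
    "Doors": [4, 4, 5],
    # Text, not numbers, so describe() skips this column by default.
    "Price": ["$4,000.00", "$5,000.00", "$22,000.00"],
})

# describe() is a function: it computes count, mean, std, min,
# percentiles and max for the numeric columns.
summary = car_sales.describe()
print(summary)

# Only the numeric columns appear in the result.
print(list(summary.columns))
```

The `count` row is the number of non-missing values in each column, which matches the number of rows when nothing is missing.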
87 00:05:50,770 --> 00:05:55,990 So that's what we're doing: we're pretending we've got car sales here and we want to start running functions 88 00:05:55,990 --> 00:06:02,100 like info and describe to just start exploring our data and getting some information about it. 89 00:06:02,110 --> 00:06:03,160 So what's this gonna give us? 90 00:06:03,160 --> 00:06:07,150 Let's have a look. RangeIndex: 10 entries, 0 to 9. 91 00:06:07,150 --> 00:06:15,740 Now info is kind of like index combined with dtypes. 92 00:06:15,850 --> 00:06:20,700 And so remember, pandas has many different ways to do similar things. 93 00:06:20,700 --> 00:06:26,130 So as you work through and learn more about the library you'll kind of find a pattern that works 94 00:06:26,130 --> 00:06:27,170 for you best. 95 00:06:27,180 --> 00:06:32,010 I'm just showing you a broad collection of different useful functions you can use throughout any type 96 00:06:32,010 --> 00:06:33,590 of data analysis. 97 00:06:33,750 --> 00:06:40,390 What this is going to tell us is that the Make column has 10 non-null objects, 10 non-null objects for Colour. 98 00:06:40,500 --> 00:06:42,610 Same again for the rest of the columns. 99 00:06:42,720 --> 00:06:46,470 There's two columns of dtype int64. 100 00:06:46,480 --> 00:06:50,700 Yeah, that makes sense, int64, int64, and three for object. 101 00:06:50,700 --> 00:06:50,990 Okay. 102 00:06:51,000 --> 00:06:54,480 And that's not using very much memory. All right. 103 00:06:54,670 --> 00:06:55,710 Let's keep going. 104 00:06:55,720 --> 00:07:00,010 car_sales... you can call statistical analysis on your data frame. 
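A minimal sketch of `info()` on a stand-in data frame. The rows are invented, so the report shows 3 entries rather than 10, and the exact label printed for text columns may differ between pandas versions.

```python
import io

import pandas as pd

# Hypothetical stand-in for the car sales data frame (made-up rows).
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "BMW"],
    "Colour": ["White", "Red", "Blue"],
    "Odometer": [150043, 87899, 11179],
    "Doors": [4, 4, 5],
    "Price": ["$4,000.00", "$5,000.00", "$22,000.00"],
})

# info() prints the index range, each column's non-null count and dtype,
# and the memory usage. It prints the report rather than returning it.
car_sales.info()

# To keep the report as text, pass a buffer for info() to write into.
buffer = io.StringIO()
car_sales.info(buf=buffer)
info_text = buffer.getvalue()
```

Running `car_sales.index` and `car_sales.dtypes` separately gives the same two pieces of information that `info()` combines.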
105 00:07:00,090 --> 00:07:05,530 So car_sales.mean() will give you the average of your numerical columns. Remember Price doesn't 106 00:07:05,530 --> 00:07:13,730 show up here because it's an object, not a numerical column. You can even call mean on individual Series. 107 00:07:13,750 --> 00:07:24,350 Let's make a Series called car_prices equals pd.Series, three thousand... let's pretend that this 108 00:07:24,350 --> 00:07:31,130 is our car prices, that they're not in object format, they're in integer format, and then we go car_prices 109 00:07:31,220 --> 00:07:37,400 .mean(). Princes? Car princes. I wish I was a car prince. 110 00:07:37,400 --> 00:07:38,090 There we go. 111 00:07:38,390 --> 00:07:41,620 So that's gonna give us the average of these three numbers here. 112 00:07:41,840 --> 00:07:42,670 That's pretty high there. 113 00:07:42,680 --> 00:07:44,330 Thirty eight thousand five hundred eighty three. 114 00:07:44,330 --> 00:07:47,930 Because we have this big value; that's a car prince's car right there. 115 00:07:47,930 --> 00:07:51,580 Next one we can have a look at is sum, so we can go 116 00:07:51,590 --> 00:07:53,440 car_sales.sum(). 117 00:07:53,450 --> 00:07:56,330 This is going to sum up all of the different columns' values. 118 00:07:56,330 --> 00:08:01,400 So this has just really combined what's in the Make column, what's in the Colour column. 119 00:08:01,400 --> 00:08:05,750 It's totalled up how much odometer there is, how many doors there are. 120 00:08:06,170 --> 00:08:11,660 It's not really useful calling it on the whole data frame, so maybe we do it on a single column. Now to 121 00:08:11,660 --> 00:08:16,850 select a single column, we'll go through this in the selection video, to select a single column, just to 122 00:08:16,850 --> 00:08:24,560 demonstrate it here, Doors, .sum(): you type in the name of the column in square brackets as a string 123 00:08:24,890 --> 00:08:27,750 and then call .sum(), hit Shift+Enter. 
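Here is a sketch of `mean()` and `sum()` on a data frame, a single Series, and a single column. Two things are assumptions: the `car_sales` rows are made up, and the three `car_prices` values are hypothetical numbers chosen so the average lands near the figure read out in the video. In recent pandas versions, calling `mean()` on a data frame that contains text columns may raise an error, so the sketch passes `numeric_only=True` to restrict it to numeric columns.

```python
import pandas as pd

# Hypothetical stand-in for the car sales data frame (made-up rows).
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "BMW"],
    "Colour": ["White", "Red", "Blue"],
    "Odometer": [150043, 87899, 11179],
    "Doors": [4, 4, 5],
    "Price": ["$4,000.00", "$5,000.00", "$22,000.00"],
})

# Average of the numeric columns only; Price is text, so it is left out.
column_means = car_sales.mean(numeric_only=True)
print(column_means)

# mean() also works on an individual Series. These three prices are
# hypothetical; one large value pulls the average up.
car_prices = pd.Series([3000, 1500, 111250])
print(car_prices.mean())

# Select one column by name in square brackets, then total it up.
total_doors = car_sales["Doors"].sum()
print(total_doors)
```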
124 00:08:27,860 --> 00:08:31,490 Now that's just going to bring this sum down here. 125 00:08:31,490 --> 00:08:37,730 Now you can do a fair few different versions of this statistically, like not just sum, you can go mean, median 126 00:08:38,030 --> 00:08:38,860 and a bunch of others. 127 00:08:38,870 --> 00:08:45,590 So I'd suggest if you're practicing, try out a few different statistical functions, even look up the pandas 128 00:08:45,590 --> 00:08:50,810 documentation for different statistical measures you can call on different columns. 129 00:08:50,810 --> 00:08:56,750 And then finally, one last bit of information you might want about your data frame is length. 130 00:08:57,410 --> 00:09:03,350 So the reason you might want this is, say your data frame's a hundred thousand rows and you only wanted 131 00:09:03,350 --> 00:09:07,800 to work on a few thousand for an exploratory data analysis. 132 00:09:07,940 --> 00:09:13,700 You could figure out the length of the data frame you're currently working on by passing it to len, and 133 00:09:13,700 --> 00:09:18,300 we can see that there's 10 rows in our data frame. We'll have a look at it one more time. 134 00:09:18,500 --> 00:09:19,460 Beautiful. 135 00:09:19,550 --> 00:09:21,050 That's nice and correct. 136 00:09:21,140 --> 00:09:21,440 OK. 137 00:09:22,520 --> 00:09:28,520 So that's the end of this describing section. What you could probably do is go through, practice running 138 00:09:28,520 --> 00:09:34,040 the code yourself and then see if there's any other functions here that you want to try out. Maybe create 139 00:09:34,040 --> 00:09:41,390 your own Series like this and then see what statistical values you can find out about your Series. But 140 00:09:41,900 --> 00:09:46,760 without further ado, I'll see you in the next lecture and we'll look at how to differently view data 141 00:09:47,090 --> 00:09:48,080 and select data.
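A last sketch covering `len` and one more statistical function. The data is a made-up stand-in, and using `head()` to take a smaller slice is only one possible way to work on part of a large data frame; it is not shown in the video.

```python
import pandas as pd

# Hypothetical stand-in for the car sales data frame (made-up rows).
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "BMW"],
    "Colour": ["White", "Red", "Blue"],
    "Odometer": [150043, 87899, 11179],
    "Doors": [4, 4, 5],
    "Price": ["$4,000.00", "$5,000.00", "$22,000.00"],
})

# Passing a data frame to len() gives the number of rows.
row_count = len(car_sales)
print(row_count)

# Other statistical functions follow the same pattern as sum and mean.
median_odometer = car_sales["Odometer"].median()
print(median_odometer)

# Assumption: head(n) keeps the first n rows, one simple way to explore
# a smaller piece of a very large data frame.
subset = car_sales.head(2)
print(len(subset))
```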