Now that we've got a bit of an overview of what pandas is, let's start coding. To begin, we're going to load up a terminal or command prompt.

This is a workflow we've been through a couple of times now, but we're going to keep going through it with each new concept to make sure we get it down pat. That's what we're doing, right? Just practicing and practicing and practicing.

If we try to open Jupyter Notebook, which is the workspace where we want to use pandas, from the base environment, it's not going to work. So we need to activate the environment that we've created.

If I type "conda env list", it shows me my environments. In my case, I want the environment in the sample project folder that we created in the beginning, because that environment has all the tools, such as Matplotlib, pandas, Jupyter, NumPy, etc. We're focused on using Jupyter and pandas in this section. To activate the environment, we type "conda activate" followed by the path to that environment. I could copy the path from the list and paste it in, but we're going to practice typing out commands in the command line, because we want to get proficient at, and used to, typing these sorts of things. Then we hit Enter.

Now it's activated, and you can see the prompt has changed from base to showing this environment.
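If you want to double-check from inside Python itself (a Jupyter kernel included) which environment you're running in, a quick check like the one below can help. It's a minimal sketch that uses only the standard library. The package list is an assumption based on the tools named above, so adjust it to your own environment.

```python
import importlib.util
import sys


def report_environment(packages=("pandas", "numpy", "matplotlib")):
    """Print where this interpreter lives and whether each package is importable."""
    # sys.prefix is the root directory of the environment running this interpreter
    print(f"Python running from: {sys.prefix}")
    found = {}
    for name in packages:
        # find_spec returns None (instead of raising) when a top-level module is missing
        found[name] = importlib.util.find_spec(name) is not None
        status = "installed" if found[name] else "MISSING"
        print(f"{name:<12}{status}")
    return found


if __name__ == "__main__":
    report_environment()
```

If pandas shows up as MISSING, the interpreter isn't running from the environment you created. In that case, activate the environment before launching Jupyter.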
If I load up a Jupyter notebook now by typing "jupyter notebook", it's going to open up my browser. Beautiful. Because I'm in my home directory, I need to navigate into my Desktop, then the course folder, then sample project. There we are. Now we're running the sample project folder within the Jupyter dashboard. This is the project we made in a previous session, where we created the example notebook.

To begin exploring pandas, we'll click New, then Python 3 notebook. We'll give it a name, "introduction to pandas", and there we go, we're in.

The first step with pandas is to import it: import pandas as pd. We've seen this a few times already. The abbreviation pd is just a short version of pandas, and it saves us from having to type pandas over and over. We'll hit Shift+Enter. This tells Python (we're running Python 3), "Hey Python, go and get the pandas tool for us, because we want to use it."

The first thing to know about pandas is that there are two main data types. I'm writing that little hash, which makes this line a comment, and I'll hit Enter. The first one we're going to look at is Series, so we type series = pd.Series(...). Remember, the style we'll use for these concepts is to look at them first and discuss them afterwards, because if in doubt, run the code. To make a series here, pd.Series takes a Python list, so I've put a few car brands in there. We could call this car_brands if we wanted, but we'll leave it as series. Let's hit Shift+Enter.

To view our series in a Jupyter notebook, we type series and hit Shift+Enter. Now we can see we've got a single, one-dimensional column of information. That's the main thing you need to know about a Series: one-dimensional, so one column.

Let's practice making another one. Hit Escape and create a new cell. What about some colors? colors = pd.Series(["Red", "Blue", "White"]). Excellent. Underneath, we'll put colors so it displays, then Shift+Enter. Beautiful. So now we've got two 1-D series (when I say 1-D, by the way, I mean one-dimensional): one of car brands and one of different colors.

Now for the difference between a Series and a DataFrame. A DataFrame is two-dimensional, and it's far more common than a Series, because your data will often have more than one column. So let's create a DataFrame. We'll call this one car_data: car_data = pd.DataFrame(...). A DataFrame is a little bit different because it takes a Python dictionary, so my first key will be "Car make". And the beautiful thing about DataFrames is that we can create a DataFrame out of series.
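Here's a minimal sketch of the two series and a DataFrame built from them. The transcript doesn't name the car brands, so "BMW", "Toyota" and "Honda" are placeholder values. The colors and the "Car make"/"Color" keys follow the walkthrough.

```python
import pandas as pd

# One-dimensional: a Series holds a single column of values
series = pd.Series(["BMW", "Toyota", "Honda"])  # placeholder brands
colors = pd.Series(["Red", "Blue", "White"])

print(series.ndim)            # 1
print(colors.index.tolist())  # [0, 1, 2] -> default labels start at 0

# Two-dimensional: pd.DataFrame takes a dict; keys become column names
# and each Series becomes a column, lined up on its index labels
car_data = pd.DataFrame({"Car make": series, "Color": colors})

print(car_data)
print(car_data.shape)         # (3, 2) -> 3 rows, 2 columns
print(car_data.ndim)          # 2
```

Because the columns are aligned on index labels, two series with different labels would leave NaN in the gaps rather than raise an error.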
So we've got our series here. "Car make" is the key, and series is the value. Then we add "Color" and give it our colors series. Down here we type car_data to display it. Beautiful. That is our first DataFrame!

This is so exciting, right? I love DataFrames, and it's a beautiful thing to see your first ever DataFrame being made.

If you have a look here, though, making a DataFrame from scratch can be a bit tedious. So rather than creating a DataFrame from some series, what you'll usually do is import data. We'll write that down: import data.

Where would you get data? Say, for example, we wanted some car data. Where could we find that? If we go up to this beautifully premade spreadsheet (by the way, all the resources for what we're working on will be in the resources section), we can see it's a spreadsheet called car sales.

How do you think we might get this into a pandas DataFrame? That's right: if you don't know the answer, it's because we haven't been through it yet. But this is what pandas is great at. You'll have some structured data, maybe in a spreadsheet or something else, and it could be anything.
This is car sales, but you could have transactions, patient records, something like that. Rather than manipulating the data in a spreadsheet, we want to start using code, specifically pandas code. So the first thing we have to do is import it.

For this car sales data, the way we can do that is to export it from the spreadsheet first. There's a little download tab here, and we'll choose comma-separated values (.csv). This is a very common file type for storing data, and pandas works beautifully with it. If we export this spreadsheet data as comma-separated values, it's going to save down here. I've already put this file into my sample project folder as car-sales.csv.

So let's see how we would import that in pandas: car_sales = pd.read_csv(...), and then we type in the file name as a string, "car-sales.csv".

The beautiful thing about Jupyter notebooks in general is that you can use Tab autocomplete. Let's try that out. We'll press Tab, and there we go: we've got some options here that will prefill the name. We don't want the missing data one (we'll have a look at that in a future video), so let's click this one. There we go: car-sales.csv.
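The read_csv step can be sketched like this. So that the snippet runs on its own, it first writes a small stand-in file to a temporary folder. The rows and the Make/Color/Price columns are hypothetical; they are not the contents of the course's car-sales.csv.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in data; the real car-sales.csv comes from the course resources
sample_csv = """Make,Color,Price
Toyota,White,4000
Honda,Red,5000
BMW,Black,22000
"""

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "car-sales.csv"
    csv_path.write_text(sample_csv)

    # read_csv takes a file name or path and returns a DataFrame held in memory
    car_sales = pd.read_csv(csv_path)

print(car_sales)
print(car_sales.dtypes)  # column types are inferred from the text
```

In the notebook itself, where car-sales.csv sits next to the notebook, this reduces to car_sales = pd.read_csv("car-sales.csv").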
All this line is doing is saying, "Hey pandas, use pd.read_csv to read this CSV file and save it to car_sales." We'll hit Shift+Enter, and now let's have a look at that DataFrame. Beautiful. We can see this is the exact same information that's in the spreadsheet, but now it's in a pandas DataFrame.

The beautiful thing is that, because it's now in a pandas DataFrame, we can take advantage of all the functions pandas has to offer for manipulating, viewing and changing this data.

All right, let's take a quick look at the anatomy of this DataFrame, because different places refer to different things. If we go to this slide, the anatomy of a DataFrame, we'll break it down. On the left-hand side is the index column. By default, a pandas DataFrame's index starts at zero, just like Python lists. So if you had five rows in your DataFrame, the index would only go up to four. Remember, it starts at zero.

This here is a row. All these values here, including the index, make up a row, and these ones here are columns. A row is referred to as axis 0, and a column is referred to as axis 1. So if you ever see a pandas function in the future that has a little parameter called axis, axis=0 means rows and axis=1 means columns. I got confused by this in the beginning. I was like, "Just tell me if it's a row or a column."
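Here's a small sketch of the index, the column names and the axis parameter, using a hypothetical three-row frame. One nuance is worth knowing: for aggregations such as sum, axis=0 works down the rows, so it returns one result per column.

```python
import pandas as pd

# Hypothetical three-row frame for illustration
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "BMW"],
    "Doors": [4, 4, 5],
    "Price": [4000, 5000, 22000],
})

print(car_sales.index.tolist())    # row labels (the index): [0, 1, 2]
print(car_sales.columns.tolist())  # column names: ['Make', 'Doors', 'Price']

# axis=0 -> rows: drop the row labelled 0
print(car_sales.drop(0, axis=0))

# axis=1 -> columns: drop the "Doors" column
print(car_sales.drop("Doors", axis=1))

# Aggregating with axis=0 collapses the rows, giving one total per column
print(car_sales[["Doors", "Price"]].sum(axis=0))
```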
Either way, some functions take axis=0 or axis=1, so it's worth remembering which is which.

The individual values here are the data. You can have different kinds of data types, which we'll see in a future video. The headings up here in bold are the column names.

So these are the main things you need to know about a pandas DataFrame. You can take a screenshot of this, and it'll be in the resources section as well. If you ever hear me refer to these terms, you can always look back at the anatomy of a DataFrame to see what's being talked about.

Let's go back to our notebook. Once you've imported data, you might want to make some changes and then export it. So let's see how we would do that: exporting a DataFrame. Of course, we haven't made any changes to ours just yet, but we're going to export it anyway to see an example of how it's done.

To export a DataFrame, we take the DataFrame's variable and call car_sales.to_csv. To import, it was read_csv; to export, it's to_csv. We pass it a string with the name we want to give the file: "exported-car-sales.csv". You could also use car_sales.to_excel if you wanted to export it to an Excel file.
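To see exactly what to_csv will write, you can call it without a file name. pandas then returns the CSV text as a string. This sketch uses a hypothetical two-row frame. Note that the header line starts with a comma, because the first column written is the unnamed index.

```python
import pandas as pd

car_sales = pd.DataFrame({"Make": ["Toyota", "Honda"], "Color": ["White", "Red"]})

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = car_sales.to_csv()
print(csv_text)

# The first header field is empty: that leading column is the index (0, 1)
header = csv_text.splitlines()[0]
print(header)  # ,Make,Color

# Writing to disk is the same call with a file name, e.g.:
# car_sales.to_csv("exported-car-sales.csv")
```

car_sales.to_excel(...) follows the same pattern. Writing .xlsx files, though, needs an extra engine package such as openpyxl installed.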
Since we're really only working with CSVs, we're going to export to CSV. CSV is a kind of universal data format: you might not have Excel on your computer, but you can open CSVs almost anywhere.

So if we hit Shift+Enter, what do you think that's going to do? Let's go back up here to the folder. We're looking for exported-car-sales.csv, saved a few seconds ago. Beautiful.

Now, just to demonstrate, let's create a new variable: exported_car_sales = pd.read_csv("exported-car-sales.csv"). All we're doing here is re-importing this file to see what it looks like. Then we display exported_car_sales and hit Shift+Enter. Beautiful, we've imported it back in, but there's an unnamed column here ("Unnamed: 0").

This is a little quirk of to_csv, and it comes from the index column. If you don't pass index=False, pandas exports the DataFrame with the index as a column. So let's see what happens if we add index=False. All we've done is re-export the file, and now we re-import it. There we go: the unnamed column is gone, and pandas creates a fresh index as the file gets imported. So that little parameter, index=False, tells pandas to export this DataFrame without writing the index numbers.
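The round trip above, and one alternative fix, can be sketched like this. The two-row frame is hypothetical, and the files go to a temporary folder so the snippet is self-contained. Besides exporting with index=False, you can also tell read_csv to treat the first column as the index with index_col=0.

```python
import tempfile
from pathlib import Path

import pandas as pd

car_sales = pd.DataFrame({"Make": ["Toyota", "Honda"], "Color": ["White", "Red"]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "exported-car-sales.csv"

    # Default export writes the index as an extra, unnamed first column
    car_sales.to_csv(path)
    with_index = pd.read_csv(path)

    # Option 1: read that first column back in as the index
    restored = pd.read_csv(path, index_col=0)

    # Option 2: don't write the index at all
    car_sales.to_csv(path, index=False)
    without_index = pd.read_csv(path)

print(with_index.columns.tolist())     # ['Unnamed: 0', 'Make', 'Color']
print(restored.columns.tolist())       # ['Make', 'Color']
print(without_index.columns.tolist())  # ['Make', 'Color']
```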
I think that's enough for now.

The main takeaways from this video are these. pandas has two main data types. A Series is one-dimensional, so there's only one column. A DataFrame, the far more common type, is two-dimensional, so you've got rows and columns. You can build DataFrames from scratch, but the more common way of getting data into pandas is to read something like a CSV file. Then, once you've made changes or manipulated the data in some way, you can export it as a CSV file.

All right, I'll see you in the next video.