1 00:00:00,450 --> 00:00:01,500 Welcome back. 2 00:00:01,500 --> 00:00:06,960 In the last lecture we looked at a few different ways to manipulate data, especially missing data. 3 00:00:06,960 --> 00:00:08,640 We looked at inplace. 4 00:00:08,640 --> 00:00:13,630 So if you want to perform an operation on a pandas data frame in place, you can set the inplace parameter 5 00:00:13,630 --> 00:00:18,590 to True, or you can reassign it. 6 00:00:18,590 --> 00:00:25,150 We don't have a reassignment example, but you can reassign by setting it equal to something. 7 00:00:25,240 --> 00:00:28,230 Now we've had enough of working with missing data. 8 00:00:28,390 --> 00:00:34,810 How do we create data from existing data? Well, pandas has a few different ways that you can create data, 9 00:00:34,870 --> 00:00:36,800 such as building a new column. 10 00:00:36,940 --> 00:00:38,590 Let's have a look at a couple of different ways. 11 00:00:39,250 --> 00:00:42,690 Let's start with a series, so comment first. 12 00:00:42,910 --> 00:00:52,130 We'll go "column from series", then we'll create a series called seats column equals pd.Series, and 13 00:00:52,130 --> 00:00:59,250 maybe inside that series we'll put a few fives, because a few of our cars have five seats, or 14 00:00:59,360 --> 00:01:02,510 actually all of our cars have five seats, but we're not going to fill it out. 15 00:01:02,550 --> 00:01:06,770 So, car sales seats: a way to make a new column. 16 00:01:06,920 --> 00:01:15,890 We'll add a comment here, "new column called seats". Remember, our car sales data frame is the one we were working 17 00:01:15,890 --> 00:01:17,160 with above. 18 00:01:17,870 --> 00:01:19,820 So we can view it here. 19 00:01:19,890 --> 00:01:22,070 So car sales, new column. 20 00:01:22,430 --> 00:01:27,160 Now, creating a new column is very similar to how you would select a column. 21 00:01:27,170 --> 00:01:30,360 But remember, we're going to run the code first before we discuss it. 
22 00:01:30,410 --> 00:01:38,320 So we're going to set it equal to the seats column, which is that series we've just created, and now we 23 00:01:38,320 --> 00:01:43,470 want to view our car sales data frame. Excellent. 24 00:01:43,560 --> 00:01:46,610 So we'll see what this has done. 25 00:01:46,630 --> 00:01:52,590 We've got the exact same columns as before, but this time we've now got seats on the right-hand side. 26 00:01:52,680 --> 00:01:54,690 So that's created a column on the very right. 27 00:01:54,700 --> 00:01:59,860 When you create a new column in pandas, by default it appears on the very right-hand side of your data 28 00:01:59,860 --> 00:02:05,040 frame. But we've got some missing values here. 29 00:02:05,660 --> 00:02:06,560 Okay. 30 00:02:06,670 --> 00:02:14,350 That's because our series was only five by five, so it's length five with the value five. I'm getting 31 00:02:14,350 --> 00:02:16,480 confused with all these fives here. 32 00:02:16,630 --> 00:02:22,810 So this is just to say this Toyota, white car, 400 thousand dollars because we haven't fixed up that price 33 00:02:22,810 --> 00:02:24,780 column, and it has five seats. 34 00:02:24,880 --> 00:02:32,830 But how would we fill this in? Maybe we can use car sales with the seats column. We saw this before 35 00:02:32,830 --> 00:02:39,220 in the previous video: fillna, and we want five. 36 00:02:39,220 --> 00:02:39,700 There we go. 37 00:02:40,080 --> 00:02:41,140 Oh, we want inplace. 38 00:02:41,150 --> 00:02:41,370 Yeah. 39 00:02:41,380 --> 00:02:44,770 So we have to set inplace equals True. Shift+Enter. 40 00:02:44,770 --> 00:02:47,040 Now we have a look at car sales. 41 00:02:47,500 --> 00:02:49,160 Mistake. 42 00:02:49,240 --> 00:02:49,870 There we go. 43 00:02:49,900 --> 00:02:51,550 All of our cars have five seats. 44 00:02:51,550 --> 00:02:52,840 Beautiful. 45 00:02:52,840 --> 00:02:54,200 We kind of knew that already. 46 00:02:54,220 --> 00:02:56,450 Well, maybe some didn't have five seats. 
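The series steps above can be sketched as follows. This is a minimal sketch, not the lecture's notebook: the variable name `car_sales`, the column names, and every row value are assumptions made up for illustration. It reassigns the filled column rather than calling `fillna` with `inplace=True` on the selected column, because in recent pandas versions the in-place form on a selected column may not update the parent data frame.

```python
import pandas as pd

# Hypothetical stand-in for the lecture's car sales data frame (10 made-up rows).
car_sales = pd.DataFrame({
    "Make": ["toyota", "honda", "toyota", "bmw", "nissan",
             "toyota", "honda", "honda", "toyota", "nissan"],
    "Odometer (KM)": [150043, 87899, 32549, 11179, 213095,
                      99213, 45698, 54738, 60000, 31600],
})

# A Series of length five: it only carries index labels 0 to 4.
seats_column = pd.Series([5, 5, 5, 5, 5])

# Assigning a Series aligns on the index, so rows 5 to 9 get NaN.
car_sales["Seats"] = seats_column
missing_before = int(car_sales["Seats"].isna().sum())

# Fill the gaps and reassign the result back to the column.
car_sales["Seats"] = car_sales["Seats"].fillna(5)
missing_after = int(car_sales["Seats"].isna().sum())

print(missing_before, missing_after)
```

The new column lands on the far right of the data frame, and the count of missing values drops to zero once the filled column is reassigned.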
47 00:02:56,470 --> 00:03:01,120 Now there's another way to create a column, and that's from a Python list, which is similar to our 48 00:03:01,120 --> 00:03:10,330 series. "Column from Python list." So we want maybe fuel economy, that's something 49 00:03:10,330 --> 00:03:16,120 else you want to consider when you buy a car these days. You want to make sure that you're getting the 50 00:03:16,120 --> 00:03:18,610 best value, or the best bang for your buck. 51 00:03:18,610 --> 00:03:19,810 Did I spell economy? 52 00:03:19,830 --> 00:03:20,390 No. 53 00:03:20,410 --> 00:03:21,400 Mixed them up. 54 00:03:21,430 --> 00:03:22,260 There we go. 55 00:03:22,790 --> 00:03:30,600 We want maybe 7.5 litres per 100 kilometres, then 9.2, 5.0, 9.6. 56 00:03:30,600 --> 00:03:32,840 We're just making these numbers up here. 57 00:03:32,990 --> 00:03:33,940 There we go. 58 00:03:33,940 --> 00:03:35,880 It's about five long as well. 59 00:03:35,890 --> 00:03:42,390 We want car sales, and what should we call this column? Fuel per 100 km. 60 00:03:42,420 --> 00:03:44,300 That's nice and easy to remember. 61 00:03:44,470 --> 00:03:45,130 Equals. 62 00:03:45,150 --> 00:03:52,290 We're going to set it to being equal to our fuel economy list, and then we're going to have a 63 00:03:52,290 --> 00:03:56,580 look at it here. ValueError. 64 00:03:56,590 --> 00:04:04,810 What did we get wrong? "Length of values does not match length of index." Huh, that's an interesting error. 65 00:04:04,900 --> 00:04:07,360 So maybe we need more values in here. 66 00:04:07,390 --> 00:04:09,310 Let's keep filling it up. 67 00:04:09,820 --> 00:04:11,230 8.7, 3. 68 00:04:11,260 --> 00:04:12,850 This is a really fuel-efficient car. 69 00:04:12,850 --> 00:04:13,900 See what happens. 70 00:04:13,900 --> 00:04:14,800 Still not the same. 71 00:04:14,800 --> 00:04:15,430 What do we need? 72 00:04:15,430 --> 00:04:16,150 Zero. 
73 00:04:16,360 --> 00:04:23,630 One, two, three, four, five, six, seven, eight, [mumbles], should do four or five. 74 00:04:23,770 --> 00:04:24,460 There we go. 75 00:04:25,110 --> 00:04:32,600 So if you want to create it from a list (did that on purpose, wink wink), it has to be the same length as 76 00:04:32,600 --> 00:04:39,920 your existing data frame, whereas a series can be a different length to what your data frame already 77 00:04:39,920 --> 00:04:40,480 is. 78 00:04:40,490 --> 00:04:47,030 So we needed 10 values in here (indexes zero to nine, because it starts from zero), because our original car sales data 79 00:04:47,030 --> 00:04:50,900 frame was of length 10. All right. 80 00:04:50,900 --> 00:04:52,390 Let's have a look at another way. 81 00:04:52,550 --> 00:04:56,270 What if we wanted to create a column from another column? 82 00:04:56,270 --> 00:05:03,040 Say we wanted to figure out how much fuel this car has used in its lifetime. 83 00:05:03,590 --> 00:05:09,800 So if this is per 100 kilometres and it's done a hundred and fifty thousand kilometres, 84 00:05:09,810 --> 00:05:16,950 maybe if we divided the odometer column by one hundred and multiplied it by this column, we could work out 85 00:05:16,950 --> 00:05:21,970 how many litres of fuel this car has used in its whole entire life. 86 00:05:21,990 --> 00:05:23,770 Let's try that out. 87 00:05:23,790 --> 00:05:26,430 So if we go car sales. 88 00:05:26,490 --> 00:05:27,090 What's a good name? 89 00:05:27,090 --> 00:05:32,040 "Total fuel used", beautiful, equals car sales. 90 00:05:32,040 --> 00:05:39,920 We want the odometer column, car sales odometer. Maybe we press Tab here, Tab autocomplete. 91 00:05:40,370 --> 00:05:46,850 Now we want to divide this by a hundred, because this is fuel per 100 kilometres, and so we could multiply 92 00:05:46,850 --> 00:05:49,460 that by car sales. 93 00:05:49,650 --> 00:05:53,880 We want fuel per, Tab, per, Tab. 94 00:05:53,880 --> 00:05:57,180 Now I press Tab to get fuel per 100 km. 
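The list-length rule described above can be sketched like this. It is a minimal sketch under stated assumptions: the `car_sales` name, the column names, and all values (including the fuel figures beyond those spoken in the lecture) are made up for illustration.

```python
import pandas as pd

# Hypothetical 10-row stand-in for the car sales data frame.
car_sales = pd.DataFrame({
    "Odometer (KM)": [150043, 87899, 32549, 11179, 213095,
                      99213, 45698, 54738, 60000, 31600],
})

# A list that is too short: unlike a Series, a list is not aligned on the
# index, so pandas refuses it with a ValueError.
fuel_economy = [7.5, 9.2, 5.0, 9.6]
try:
    car_sales["Fuel per 100KM"] = fuel_economy
    error_message = None
except ValueError as err:
    error_message = str(err)

# With one value per row (10 values for 10 rows) the assignment works.
fuel_economy = [7.5, 9.2, 5.0, 9.6, 8.7, 4.7, 7.6, 8.7, 3.0, 4.5]
car_sales["Fuel per 100KM"] = fuel_economy

print(error_message)
print(len(car_sales), len(fuel_economy))
```

The failed assignment leaves the data frame unchanged, so the column only appears after the second, full-length list is assigned.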
95 00:05:57,260 --> 00:06:03,380 Now what do you think this will do? This is gonna create a new column, because this string doesn't exist 96 00:06:03,380 --> 00:06:08,300 in our data frame. It's going to take the odometer column (another beautiful thing about pandas numeric 97 00:06:08,300 --> 00:06:14,960 columns is that you can perform operations directly on them) and then it's going to divide the odometer 98 00:06:14,960 --> 00:06:23,350 column by one hundred and multiply it by the fuel per 100 km column. Let's see what happens. We want to view 99 00:06:23,350 --> 00:06:31,500 our data frame. Beautiful. So this is gonna be total fuel used in litres, or maybe we could have named that 100 00:06:31,500 --> 00:06:41,240 better if we created this new column by typing in litres, L. It's gonna create another column, total fuel 101 00:06:41,240 --> 00:06:49,940 used L. That's right, we don't mind that for now. So 11,283 litres: if you 102 00:06:49,940 --> 00:06:56,150 did that in one go that'd be quite expensive, depending on the price of fuel where you live. Right, now 103 00:06:56,570 --> 00:07:05,240 there's one more, easier way to create a column: we can create a column from a single value. Let's 104 00:07:05,240 --> 00:07:05,750 do that. 105 00:07:05,750 --> 00:07:08,370 Car sales, maybe we want a simple one. 106 00:07:08,480 --> 00:07:15,810 "Number of wheels" equals 4, because if your car didn't have four wheels you really wouldn't be able to 107 00:07:15,810 --> 00:07:22,910 drive. Now you can start to see how our data frame can evolve over time, right? We have manipulated 108 00:07:22,950 --> 00:07:23,880 a few things. 109 00:07:23,880 --> 00:07:28,500 Maybe some of these columns aren't very useful in the long run, but it's just an example of how you can 110 00:07:28,500 --> 00:07:30,960 quickly use pandas to change up your data frame. 
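The two steps just described, building a column from other columns and building one from a single value, can be sketched as follows. The `car_sales` name, the column names, and the three rows of numbers are assumptions chosen for illustration, not the lecture's data.

```python
import pandas as pd

# Hypothetical three-row stand-in with the two numeric columns the lecture uses.
car_sales = pd.DataFrame({
    "Odometer (KM)": [150000, 80000, 30000],
    "Fuel per 100KM": [7.5, 9.2, 5.0],
})

# Column from other columns: arithmetic on numeric columns is element-wise,
# so each row gets (odometer / 100) * fuel per 100 km, i.e. litres used.
car_sales["Total fuel used (L)"] = (
    car_sales["Odometer (KM)"] / 100 * car_sales["Fuel per 100KM"]
)

# Column from a single value: the scalar is broadcast to every row.
car_sales["Number of wheels"] = 4

print(car_sales)
```

Putting the unit in the column name, as in the hypothetical "Total fuel used (L)", avoids the ambiguity mentioned above about which measurement the column is in.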
111 00:07:30,960 --> 00:07:35,670 We've adjusted the price column, we've lowered the make column, we've added a bunch of our own columns. 112 00:07:36,080 --> 00:07:40,250 This is a step you'll take in a data science or machine learning project, right? 113 00:07:40,290 --> 00:07:43,350 Maybe we'll add one more column. 114 00:07:43,560 --> 00:07:48,480 We want to know if our cars passed road safety, and by default it equals True. 115 00:07:48,480 --> 00:07:55,260 This is just to give another example of different types of columns, dtypes. 116 00:07:55,350 --> 00:07:57,070 So yeah, there we go. 117 00:07:57,240 --> 00:08:01,440 We've got a bool column on the end now, so bool. 118 00:08:01,440 --> 00:08:02,360 True or False. 119 00:08:02,370 --> 00:08:06,630 We've got a few ints, we've got a few floats, we've got some object columns, we've got True. 120 00:08:06,630 --> 00:08:13,250 So this is how versatile pandas is, right? You can use many different data types within it. And now what 121 00:08:13,250 --> 00:08:17,670 if we wanted to remove one of our columns, like this one? Because this other one's got the L at the top. 122 00:08:17,690 --> 00:08:19,370 That's less confusing than this. 123 00:08:19,370 --> 00:08:21,740 We don't know what measurement that's in. 124 00:08:21,770 --> 00:08:30,580 You can use the function drop. So we'll go "drop a column", and we'll go "total fuel used". 125 00:08:30,580 --> 00:08:32,140 That's our column name. 126 00:08:32,140 --> 00:08:36,220 Now we need to give it, drop requires the parameter axis. 127 00:08:36,220 --> 00:08:39,780 So axis equals one, if you remember the anatomy of our data frame. 128 00:08:39,780 --> 00:08:41,440 Let's see if we have it here. 129 00:08:41,500 --> 00:08:45,420 That's because columns are on axis equals 1. 130 00:08:45,430 --> 00:08:47,680 Remember how I said some functions require axis? 
131 00:08:47,680 --> 00:08:52,420 It took me a while to remember this, but what you have to remember is, if you're talking about 132 00:08:52,420 --> 00:08:55,530 a column, you're talking about axis 1. 133 00:08:55,700 --> 00:08:57,990 Let's do that drop. 134 00:08:58,180 --> 00:09:00,410 Beautiful. Car sales. 135 00:09:00,970 --> 00:09:06,080 It's still got that. We keep forgetting to re-emphasize this point. 136 00:09:06,100 --> 00:09:10,060 We need to either reassign or use the inplace parameter. 137 00:09:10,180 --> 00:09:16,660 So we're reassigning our car sales data frame here. Drop. Car sales. 138 00:09:16,660 --> 00:09:20,460 Now we've dropped it. Beautiful. Look at that data frame. 139 00:09:20,530 --> 00:09:21,890 That's amazing. 140 00:09:21,910 --> 00:09:22,330 All right. 141 00:09:22,510 --> 00:09:25,450 So we've seen how to create a few different columns here. 142 00:09:25,450 --> 00:09:28,220 We've seen how to drop a column if we don't want it. 143 00:09:28,300 --> 00:09:31,350 There's a few more things we want to do for manipulating our data frames. 144 00:09:31,360 --> 00:09:35,220 But before this lecture gets too long again, go back. 145 00:09:35,260 --> 00:09:38,130 I want you to, if you're going to take a little break, take a little break. 146 00:09:38,140 --> 00:09:42,080 But otherwise, before the next lecture, try creating a column of your own. 147 00:09:42,130 --> 00:09:46,750 It can be as simple as you like, but just practice typing out some code like this. 148 00:09:46,750 --> 00:09:52,390 Maybe you do a little operation. It can be as fancy or as unfancy as you like, but just practice making 149 00:09:52,380 --> 00:09:53,770 a new column of your own. 150 00:09:53,890 --> 00:09:56,110 And I'll see you back in the next video.
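The boolean column and the drop step from this part of the lecture can be sketched as follows. The `car_sales` name, the column names, and the values are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in that already has both fuel columns from earlier.
car_sales = pd.DataFrame({
    "Odometer (KM)": [150000, 80000, 30000],
    "Total fuel used": [11250.0, 7360.0, 1500.0],
    "Total fuel used (L)": [11250.0, 7360.0, 1500.0],
})

# A single boolean value is broadcast to every row and gives a bool dtype.
car_sales["Passed road safety"] = True
bool_dtype = car_sales["Passed road safety"].dtype

# drop returns a new data frame; without reassignment the original keeps the column.
car_sales.drop("Total fuel used", axis=1)
still_there = "Total fuel used" in car_sales.columns

# Reassign to keep the result. axis=1 means the label refers to a column.
car_sales = car_sales.drop("Total fuel used", axis=1)

print(bool_dtype, still_there, list(car_sales.columns))
```

As an alternative to passing `axis=1`, `drop` also accepts the label through its `columns` keyword, for example `car_sales.drop(columns="Total fuel used")`, which some readers find easier to remember.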