In the last video we created this bad boy here. Now let's try to make something similar using our own data, so we can practice a little data manipulation along with plotting and put our pandas and Matplotlib skills together. First we'll format the Price column and get it into integer format. Right now it contains dollar signs, commas and decimal points, and we don't want those when we're doing numerical operations. To do that, we'll take car_sales and change the Price column. We need to access the values as strings (we saw this in the pandas section) with .str.replace, and pass it a little bit of regex. There we go. All this pattern is saying is: find the dollar sign, the comma and the dot, and replace each of them with an empty string. Let's see what this does. We need to add car_sales at the end of the cell so it displays. OK, beautiful. But we've got two extra zeros on the end, because the cents were "00" and we only removed the decimal point, so we don't want those. Next we'll remove the last two zeros: we access the Price column of car_sales again and reassign the result to car_sales["Price"]. Remember, the values are still strings.
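As a rough sketch of the regex cleanup described above, here is a small hypothetical DataFrame with prices in the "$4,000.00" style (the values are illustrative, not the course data). One assumption worth flagging: regex=True is passed explicitly because pandas 2.0 changed the default of Series.str.replace to regex=False, so older code that relied on the default may silently stop matching.

```python
import pandas as pd

# Hypothetical sample prices in the "$4,000.00" style
car_sales = pd.DataFrame({"Price": ["$4,000.00", "$5,000.00", "$7,000.00"]})

# Remove every "$", "," and "." character from each string.
# regex=True is explicit because pandas 2.0 made regex=False the default.
car_sales["Price"] = car_sales["Price"].str.replace(r"[\$\,\.]", "", regex=True)

# The cents survive as two trailing zeros: '400000' rather than '4000'
print(car_sales["Price"].tolist())  # ['400000', '500000', '700000']
```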
To show why we use .str here, let's check the type. type(car_sales["Price"]) tells us it's a Series. That's right, but the values inside it are still strings: if we check type(car_sales["Price"][0]), there we go, that's accessing the first value, and it tells us it's a str. That's why we have to access the column as strings first with .str. Now we can remove the last two characters using slicing, .str[:-2]. Let's have a look at car_sales. Oh no, we've removed too many. To fix that little issue we have to re-import our DataFrame, which resets it, and then rerun the cells. Remember, a Jupyter cell changes the notebook's state every time it runs, so if we run this slicing cell more than once, it will keep removing the last two digits of our Price column each time. We don't want that. Now that our Price column is in the format we want, we're going to add a Sale Date column using pd.date_range, the same function we used up here. We'll pass it the start date "1/1/2020" and a number of periods.
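A minimal sketch of the type check and the slicing step, again on hypothetical data that has already been through the regex cleanup. The last two lines reproduce the "removed too many" problem by applying the same slice a second time, which is what rerunning the notebook cell does.

```python
import pandas as pd

# Hypothetical prices after the regex cleanup: strings with "00" for the cents
car_sales = pd.DataFrame({"Price": ["400000", "500000", "700000"]})

print(type(car_sales["Price"]))     # the column is a Series...
print(type(car_sales["Price"][0]))  # ...but each value is a str

# .str gives element-wise string methods; [:-2] drops the last two characters
car_sales["Price"] = car_sales["Price"].str[:-2]
print(car_sales["Price"].tolist())  # ['4000', '5000', '7000']

# Slicing again (like re-running the same cell) removes two more digits
over_sliced = car_sales["Price"].str[:-2]
print(over_sliced.tolist())  # ['40', '50', '70']
```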
The periods parameter is how many dates we want to create. Up here we used a thousand periods, but we don't want that many for our car_sales DataFrame; we just want one date for each row. We can do that with periods=len(car_sales), and then check our car_sales DataFrame again. Beautiful. Now we've got a Sale Date column, which is sequential. That's OK. Maybe the cars weren't really sold on consecutive days, but we're just showing how we can start to manipulate an existing DataFrame. The next thing we want is a Total Sales column that adds up how much we've made from selling our cars: 4,000, then plus 5,000, then plus 7,000, and so on. This is where we can use something like cumsum, because it adds each of these prices cumulatively. So we'll set car_sales["Total Sales"] to car_sales["Price"].cumsum(). Let's see what happens with car_sales. Oh, what have we done?
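The two steps just described can be sketched on a hypothetical three-row frame: pd.date_range with periods=len(car_sales) produces exactly one date per row, and cumsum on a column of Python strings (object dtype) "adds" them with +, which concatenates instead of summing.

```python
import pandas as pd

# Hypothetical prices, still stored as Python strings (object dtype)
car_sales = pd.DataFrame({"Price": pd.Series(["4000", "5000", "7000"], dtype=object)})

# One date per row: the range is exactly as long as the DataFrame
car_sales["Sale Date"] = pd.date_range("1/1/2020", periods=len(car_sales))
print(car_sales["Sale Date"].dt.strftime("%Y-%m-%d").tolist())
# ['2020-01-01', '2020-01-02', '2020-01-03']

# cumsum on strings applies +, so the values are concatenated, not summed
concatenated = car_sales["Price"].cumsum()
print(concatenated.tolist())  # ['4000', '40005000', '400050007000']
```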
It ran, but because our Price column is still in string format, cumsum has just concatenated the strings to each other. Let's check the type of the first price value with type(car_sales["Price"][0]). There we go, our Price column is still a string. To fix this, we have to convert it to integers first and then call cumsum, so we write .astype(int).cumsum(). This reassigns the Total Sales column to be the Price column converted to integers, followed by the cumulative sum. It should work in theory, but as always, run the code and then check. Beautiful. So our total sales go 4,000, plus 5,000 is 9,000, plus 7,000 is 16,000, and it keeps going right up to the end, so after we've sold all our cars across these 10 days we've taken in 76,450. That's some good money there. Now let's plot Total Sales: we want to see what our total sales look like over time, replicating this graph here with our own data.
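A minimal sketch of the fix, using hypothetical string prices: convert with astype(int) first, then take the running total. The first three totals match the arithmetic walked through above (4,000, then 9,000, then 16,000).

```python
import pandas as pd

# Hypothetical cleaned prices, still strings
car_sales = pd.DataFrame({"Price": ["4000", "5000", "7000"]})

# Convert to integers first, then compute the running total
car_sales["Total Sales"] = car_sales["Price"].astype(int).cumsum()
print(car_sales["Total Sales"].tolist())  # [4000, 9000, 16000]
```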
We write car_sales.plot(x="Sale Date", ...). This way you can call .plot() directly on a pandas DataFrame; it's pandas' wrapper around the Matplotlib API. So we're calling DataFrame.plot and passing it x, which is the date, and y, which is the Total Sales column. Let's see what this looks like. We'll put a semicolon at the end so the plot object's text doesn't print out. Beautiful, there we go. This is an example of how you can take some code right from the documentation, or some example you found online, replicate it, then take some of its principles and combine them with your own data to create a visualization like this one. Now we can see it: say we sold our 10 cars over the first 10 days of January, our total sales go up and to the right, which is very pleasing for our boss, because that's the type of graph they want to see. Hopefully you're starting to get an idea of the workflow you can follow when you're working on your own projects or on problems you don't necessarily know the answer to.
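The line plot described above can be sketched as a standalone script. Assumptions: the data is a hypothetical three-row frame, and the non-interactive "Agg" backend is selected so it runs outside Jupyter (in a notebook you would skip that and end the plot call with a semicolon instead).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import pandas as pd

# Hypothetical numeric prices
car_sales = pd.DataFrame({"Price": [4000, 5000, 7000]})
car_sales["Sale Date"] = pd.date_range("1/1/2020", periods=len(car_sales))
car_sales["Total Sales"] = car_sales["Price"].cumsum()

# DataFrame.plot wraps Matplotlib: x and y are column names, default kind is "line"
ax = car_sales.plot(x="Sale Date", y="Total Sales")
```

DataFrame.plot returns a Matplotlib Axes, so you can keep customising the chart (titles, labels) with ordinary Matplotlib calls afterwards.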
It's all about searching, finding some kind of example, taking information from it and then seeing if it works with your own problem. Now we've seen a line plot; how about we try a scatter plot? We call car_sales.plot with x set to the odometer column this time, "Odometer (KM)". Is that the right column name? Yep, we've worked with this DataFrame before, so we're familiar with it. Then y="Price", and when you call the plot function there's a parameter called kind, which we can set to "scatter" to get a scatter plot. What's happening? "scatter requires y column to be numeric". Is our Price column not numeric? It's still not numeric, because where we called .astype(int) earlier, it only converted the Price column to integers for that single line of code. It didn't reassign the Price column to be integers; the values in the car_sales DataFrame are still strings. To make this work, we reassign the Price column: car_sales["Price"] = car_sales["Price"].astype(int). Since we've stripped the cents, our prices are whole numbers, so it's fine to store this column as integers.
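A sketch of why the earlier astype(int) didn't stick, on hypothetical odometer and price values: astype returns a new Series and leaves the column untouched unless you assign the result back. The sketch does not reproduce the ValueError itself, since newer pandas versions may plot a string column as categories instead of raising.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running outside Jupyter
import pandas as pd

# Hypothetical data: prices still stored as strings, plus an odometer column
car_sales = pd.DataFrame({
    "Odometer (KM)": [150000, 87000, 32000],
    "Price": pd.Series(["4000", "5000", "7000"], dtype=object),
})

# astype returns a NEW Series; without assignment the column stays as strings
car_sales["Price"].astype(int)
print(car_sales["Price"].dtype)  # object

# Reassign so the conversion is stored in the DataFrame
car_sales["Price"] = car_sales["Price"].astype(int)
print(pd.api.types.is_integer_dtype(car_sales["Price"]))  # True

# Scatter plot of odometer reading against price
ax = car_sales.plot(x="Odometer (KM)", y="Price", kind="scatter")
```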
I'll add two comments: "Reassign price column to int" and "Plot scatter plot with price column as numeric". Without the reassignment line (we'll comment it out), we get the error: ValueError: scatter requires y column to be numeric. But when we uncomment that line, we reassign Price as a numeric column. To comment and uncomment code in Jupyter you can press Command + / on a Mac (Ctrl + / on Windows and Linux); that's what I'm pressing here. So if we run this, beautiful. Now we have a scatter plot of our odometer column versus price. And remember, to get rid of this little bit of text output above the plot, you can put a semicolon right at the end. Excellent. Now we've seen a few examples of how we can plot directly from a pandas DataFrame. We'll see a few more, but we'll end this video before it gets too long. In the meantime, before the next one, try plotting a couple of these columns together with whatever kind of plot you like. You've seen a few: a bar graph, a scatter plot and a line plot. Have a practice, and I'll see you in the next video.