1 00:00:00,510 --> 00:00:01,560 Welcome back. 2 00:00:01,560 --> 00:00:05,150 So in the past few videos we've looked at how to create plots like this. 3 00:00:05,160 --> 00:00:08,220 We've made some line figures, some scatter plots, 4 00:00:08,220 --> 00:00:10,640 some bar graphs and some histograms. 5 00:00:10,890 --> 00:00:15,900 These are more than enough to get started plotting some of the most common things you'll want to plot in 6 00:00:15,900 --> 00:00:18,330 your machine learning and data science career. 7 00:00:18,330 --> 00:00:22,770 But more often than not, you're not going to be plotting directly from NumPy arrays, you're going to 8 00:00:22,770 --> 00:00:25,140 be plotting from pandas DataFrames. 9 00:00:25,290 --> 00:00:30,500 And I guess you kind of are still plotting from NumPy arrays, because pandas is built on NumPy arrays, 10 00:00:30,720 --> 00:00:35,610 but that's what we're going to focus on in these next couple of videos: plotting from pandas 11 00:00:35,610 --> 00:00:36,480 DataFrames. 12 00:00:36,510 --> 00:00:45,630 So let's do that: pandas DataFrames. Beautiful. I'm going to make that into a little heading using 13 00:00:45,630 --> 00:00:50,730 Escape and pressing M. And now, first things first, we're going to import pandas, because we need pandas to 14 00:00:50,730 --> 00:00:51,740 use DataFrames. 15 00:00:51,750 --> 00:00:53,190 We've done this at the top of the notebook. 16 00:00:53,220 --> 00:00:56,140 But we're going to do it anyway because we like that. 17 00:00:56,170 --> 00:00:59,330 And now we want to make a DataFrame. 18 00:00:59,330 --> 00:01:05,850 Now up in here we've got our trusty car sales CSV, which we've seen before, which is just like 19 00:01:05,850 --> 00:01:11,700 a spreadsheet that's been exported into a comma-separated values format, or .csv. 20 00:01:12,150 --> 00:01:18,400 So we've seen this in the pandas section: car_sales equals pd.read_csv. 
21 00:01:18,720 --> 00:01:21,740 I believe it's called car sales, car sales. 22 00:01:21,840 --> 00:01:24,140 You can always Tab autocomplete here. 23 00:01:24,170 --> 00:01:24,900 There we go. 24 00:01:24,900 --> 00:01:25,680 That's helped us out. 25 00:01:25,710 --> 00:01:26,370 Thank you. 26 00:01:27,150 --> 00:01:27,860 Put a comma. 27 00:01:27,860 --> 00:01:28,730 There we go here. 28 00:01:28,740 --> 00:01:30,810 car_sales, we'll check it out. 29 00:01:30,810 --> 00:01:31,530 Lovely. 30 00:01:31,560 --> 00:01:34,110 That's our trusty car sales DataFrame. 31 00:01:34,170 --> 00:01:41,820 Now before we plot anything from here, let's practice looking up some documentation: pandas DataFrame 32 00:01:41,820 --> 00:01:47,730 visualization. Wonderful, that already knows what I want to look up. It's almost like I've prepared this 33 00:01:47,730 --> 00:01:51,990 earlier. So we want to look at visualization in pandas. 34 00:01:51,990 --> 00:01:53,970 This is the pandas documentation. 35 00:01:53,970 --> 00:01:58,430 "We use the standard convention for referencing the matplotlib API." 36 00:01:58,470 --> 00:01:59,400 Wonderful. 37 00:01:59,400 --> 00:02:02,570 So that means that pandas builds upon Matplotlib. 38 00:02:02,760 --> 00:02:07,530 So what we want to do, rather than reading through all of this documentation, because there is a lot 39 00:02:07,530 --> 00:02:10,620 of different plots, hey, there's a histogram. 40 00:02:10,620 --> 00:02:11,870 We've done some of these. 41 00:02:12,110 --> 00:02:13,590 We'll go back up here. 42 00:02:13,590 --> 00:02:19,590 We're going to start off by just writing some code and we're gonna try and replicate this in some sort. 43 00:02:19,680 --> 00:02:23,130 So let's just copy this, and I know I'm breaking the number one rule: 44 00:02:23,220 --> 00:02:27,540 don't copy code, rewrite it. But we're gonna rewrite it anyway. 45 00:02:27,600 --> 00:02:29,130 But I'm just putting it here. 
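[Editor's sketch] The CSV-loading step from earlier in this video could look roughly like the following. The transcript never spells out the exact file name, so this sketch reads from an in-memory string instead of a file, and the column names and values are hypothetical stand-ins rather than the real car sales data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the car sales CSV file; the real column
# names and values are not given in the transcript.
csv_text = """Make,Colour,Price
Toyota,White,4000
Honda,Red,5000
BMW,Black,22000
"""

# In the notebook you would pass the path of the CSV file instead of a
# StringIO object; pd.read_csv accepts either.
car_sales = pd.read_csv(io.StringIO(csv_text))
print(car_sales)
```

Evaluating car_sales on the last line of a notebook cell displays the DataFrame, which is the "check it out" step in the video.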
46 00:02:29,190 --> 00:02:33,090 Create a new cell below, Escape B, so we can copy it out. 47 00:02:33,390 --> 00:02:36,790 We don't have to keep going back and forth. 48 00:02:37,440 --> 00:02:42,300 You might see something in a Stack Overflow question or documentation and you might just copy it straight 49 00:02:42,300 --> 00:02:48,130 from here, put it into your notebook and then go, hey, I need this part but I don't need this part, and 50 00:02:48,130 --> 00:02:50,580 just write out the parts that you actually need. 51 00:02:50,640 --> 00:02:50,880 Right. 52 00:02:50,890 --> 00:02:56,020 But you never want to be just completely copying. And pd.date_range, 53 00:02:56,080 --> 00:02:58,560 this is a bit old, 2000. 54 00:02:58,570 --> 00:03:03,910 So we're going to upgrade it to where we are now, so 1/1/2020. 55 00:03:03,910 --> 00:03:04,690 Wonderful. 56 00:03:04,690 --> 00:03:08,570 A bit closer to where we are now. Actually, periods equals a thousand. 57 00:03:08,570 --> 00:03:10,080 Now you might be going, okay, 58 00:03:10,090 --> 00:03:11,310 what does this mean, pd 59 00:03:11,330 --> 00:03:12,550 dot date_range? 60 00:03:12,550 --> 00:03:15,430 Well, remember our motto: if in doubt, run the code. 61 00:03:15,850 --> 00:03:19,100 So let's see this, ts. 62 00:03:19,120 --> 00:03:19,450 All right. 63 00:03:19,690 --> 00:03:20,740 So what's happening here? 64 00:03:20,740 --> 00:03:25,330 Well, we've got dates on the index, yep, that makes sense, index equals pd.date_range. 65 00:03:25,330 --> 00:03:30,520 It starts at this date, the first of the first, 2020, then goes up. 66 00:03:30,550 --> 00:03:31,800 So it's going up by days. 67 00:03:31,990 --> 00:03:32,600 OK. 68 00:03:32,650 --> 00:03:40,010 And these are just random numbers being filled here in this column, going up. What is cumsum? 69 00:03:40,030 --> 00:03:43,540 Well, let's type that in: ts.cumsum. 70 00:03:43,720 --> 00:03:44,570 What can we do? 
71 00:03:44,590 --> 00:03:50,490 Shift Tab: "Return cumulative sum over a DataFrame or Series axis." Cumulative sum. 72 00:03:50,500 --> 00:03:51,640 What does that mean? 73 00:03:51,640 --> 00:03:56,320 Well, let's find out. We run that. 74 00:03:56,510 --> 00:03:59,670 Yes, actually let's keep running, we don't want a negative 75 00:03:59,660 --> 00:03:59,880 there. 76 00:04:05,340 --> 00:04:06,420 What is happening here? 77 00:04:06,420 --> 00:04:08,580 This should be adding up. 78 00:04:08,760 --> 00:04:09,750 Maybe we go here. 79 00:04:10,470 --> 00:04:14,450 Yes, this is interesting. 80 00:04:14,470 --> 00:04:20,370 This should be adding. ts equals that. 81 00:04:21,110 --> 00:04:26,030 I know what cumsum does, and it doesn't seem to be doing it here. What it's supposed to be doing 82 00:04:26,030 --> 00:04:33,230 is adding this one to this one, and then adding that one to this one, and then adding this one to this one. 83 00:04:37,060 --> 00:04:44,740 Oh, that's why: ts equals ts.cumsum. Classic. Just trying to get ahead of myself and run the code. 84 00:04:44,770 --> 00:04:46,090 My own motto has let me down. 85 00:04:46,240 --> 00:04:47,750 Let's see what happens. 86 00:04:47,830 --> 00:04:48,380 Okay. 87 00:04:49,200 --> 00:04:53,910 We've got something kind of similar to this, but again, remember it's always gonna be different because 88 00:04:53,910 --> 00:05:01,920 we're using random, np.random. Well, this is just demonstrating that when we call plot from a pd.Series, 89 00:05:02,040 --> 00:05:08,340 we haven't done it from a DataFrame yet, it comes out as matplotlib.axes._subplots. 90 00:05:08,340 --> 00:05:10,910 So that's what we've seen before with this. 91 00:05:10,920 --> 00:05:16,100 So really, it's straight from a pandas Series. I might delete this cell because we don't need that anymore. 92 00:05:16,590 --> 00:05:20,800 But it's using Matplotlib, that's really handy to know. 93 00:05:20,830 --> 00:05:21,270 All right. 
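[Editor's sketch] The Series example worked through above can be sketched as follows. It is modelled on the example in the pandas visualization documentation with the start date changed to 1/1/2020 as in the video. The fixed random seed and the name raw are assumptions added here so the result is repeatable and the cumulative sum can be checked; they do not appear in the video.

```python
import numpy as np
import pandas as pd

# Assumption: seed the generator so reruns give the same numbers.
np.random.seed(42)

# 1000 random values indexed by 1000 consecutive days from 1 Jan 2020.
raw = pd.Series(np.random.randn(1000),
                index=pd.date_range("1/1/2020", periods=1000))

# cumsum() returns a new Series and leaves the original unchanged, so
# the result has to be assigned (the step that was missed at first).
ts = raw.cumsum()

# Each value is now the running total of all values up to that date.
print(raw.head(3))
print(ts.head(3))

# Plotting from a pandas Series goes through Matplotlib and returns a
# Matplotlib Axes object.
ax = ts.plot()
print(type(ax))
```

Without the seed the curve looks different on every run, which is why the plot in the video does not match the one in the documentation.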
94 00:05:21,480 --> 00:05:25,310 Now we've seen plotting straight from a Series, let's plot from our DataFrame. 95 00:05:25,470 --> 00:05:28,380 We've got car_sales, it's our DataFrame, we'll view it again. 96 00:05:28,380 --> 00:05:35,260 How about we make something similar to this but using our car sales data? 97 00:05:35,430 --> 00:05:42,230 Yeah, maybe we get some dates and the sale price over time. We might add a little column to our DataFrame. 98 00:05:42,240 --> 00:05:43,860 Yeah, that sounds like a good project. 99 00:05:43,920 --> 00:05:47,280 But before this video gets too long, we're going to do that in the next one. 100 00:05:47,310 --> 00:05:52,830 So next video, I'll see you back then, and we'll try to replicate something like this but using our data. 101 00:05:52,860 --> 00:05:57,270 So we'll get a mixture of manipulating data and plotting at the same time. 102 00:05:57,330 --> 00:05:58,040 That'll be good fun.