1 00:00:00,230 --> 00:00:01,470 Well, welcome back. 2 00:00:01,560 --> 00:00:06,470 In the last video we left off saying that this plot probably isn't ideal. 3 00:00:06,480 --> 00:00:13,320 We've got a few subplots going on here and we've been using the pyplot method, plotting directly from 4 00:00:13,320 --> 00:00:17,760 a pandas DataFrame, rather than the object-oriented method. 5 00:00:17,760 --> 00:00:19,950 And I'm sure you're wondering, you're going, 6 00:00:19,950 --> 00:00:24,720 Daniel, you've introduced so many different ways of plotting and you're telling us there's two ways of 7 00:00:24,720 --> 00:00:25,350 doing it. 8 00:00:25,500 --> 00:00:28,330 And you're saying one is less ideal than the other. 9 00:00:28,560 --> 00:00:30,050 Which one should I use? 10 00:00:30,090 --> 00:00:31,920 And that's a perfectly great question. 11 00:00:31,920 --> 00:00:45,140 So which one should you use, pyplot versus the matplotlib OO method? Now, 12 00:00:45,650 --> 00:00:48,920 there is no real definitive answer here. 13 00:00:48,920 --> 00:00:53,860 The documentation on matplotlib will push you towards the OO method always. 14 00:00:53,900 --> 00:01:04,610 However, when plotting something quickly, it's okay to use the pyplot method. 15 00:01:04,610 --> 00:01:04,850 Right. 16 00:01:04,850 --> 00:01:09,110 So what we've seen up here, when you want to get a quick visualization, maybe not something like this 17 00:01:09,140 --> 00:01:11,100 but definitely something like this, 18 00:01:11,120 --> 00:01:15,600 it's okay to use the pyplot method that we've gone through in the last couple of videos. 19 00:01:15,980 --> 00:01:25,230 But when plotting something more advanced, use the OO method. 20 00:01:25,550 --> 00:01:28,020 So we're gonna see an example of that in this video. 21 00:01:28,190 --> 00:01:31,540 Something a little bit more advanced than just something singular like this.
22 00:01:31,550 --> 00:01:35,510 But if you're just making a quick plot like this, you just wanna see the distribution of a column, it's 23 00:01:35,520 --> 00:01:38,040 perfectly okay to go .plot. 24 00:01:38,120 --> 00:01:41,320 Now we'll see the OO method in action. 25 00:01:41,330 --> 00:01:45,480 Let's create a subset of our DataFrame so we can do a little bit of data analysis. 26 00:01:45,500 --> 00:01:49,630 Let's say we wanted to explore the patients who are over the age of 50. 27 00:01:49,640 --> 00:01:52,120 We only wanted to look at those patients. 28 00:01:52,200 --> 00:01:56,900 We want to create a little bit of a plot and start to do some data analysis on those. 29 00:01:56,960 --> 00:02:03,410 So we'll create another DataFrame called over_50, which is equal to our heart disease DataFrame, which 30 00:02:03,410 --> 00:02:05,450 we'll get a little refresher of here. 31 00:02:05,450 --> 00:02:06,990 So we know what it looks like. 32 00:02:07,010 --> 00:02:11,640 This is the kind of workflow you'll be going through: looking at your DataFrame and going, okay, 33 00:02:11,720 --> 00:02:16,820 my colleagues or my teammates or my boss want to see some data analysis of our patients who 34 00:02:16,820 --> 00:02:18,450 are over 50. 35 00:02:18,450 --> 00:02:24,080 So let's do that. We can access those by using boolean indexing. 36 00:02:24,150 --> 00:02:25,460 So we want heart_disease. 37 00:02:25,500 --> 00:02:31,020 Actually, we want to use the age column, the values which are greater than 50. 38 00:02:31,020 --> 00:02:32,120 Wonderful. 39 00:02:32,490 --> 00:02:37,420 And then we want to have a look at our over_50 dataset. 40 00:02:37,550 --> 00:02:39,620 What have we got wrong here? 41 00:02:39,710 --> 00:02:43,310 Age, KeyError. I've used a capital A. 42 00:02:43,370 --> 00:02:46,220 That's right, wonderful. 43 00:02:46,750 --> 00:02:51,330 And we might check the length of this. 208. 44 00:02:51,340 --> 00:02:52,730 So we have two hundred and eight.
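A minimal sketch of the boolean indexing step described above. The small DataFrame here is a made-up stand-in for the heart disease data (the real file is not part of this transcript), and the names `heart_disease` and `over_50` are assumed from how they are spoken in the video.

```python
import pandas as pd

# Hypothetical stand-in for the heart disease DataFrame (made-up values).
heart_disease = pd.DataFrame({
    "age":    [63, 37, 41, 56, 57, 57],
    "chol":   [233, 250, 204, 236, 354, 192],
    "target": [1, 1, 0, 1, 0, 0],
})

# Column names are case-sensitive: "Age" does not exist, so this raises KeyError.
try:
    heart_disease["Age"]
except KeyError:
    print("KeyError: the column is called 'age', not 'Age'")

# Boolean indexing: keep only the rows where the age column is greater than 50.
over_50 = heart_disease[heart_disease["age"] > 50]

print(len(over_50))          # number of rows that passed the filter
print(list(over_50.index))   # original index labels are kept, so there are gaps
```

Note that the filtered DataFrame keeps the original index labels, which is why the index skips the rows that were dropped.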
45 00:02:52,730 --> 00:02:56,920 This is out of about 300 or so in the original dataset. 46 00:02:56,920 --> 00:02:57,800 So that's fine. 47 00:02:57,820 --> 00:02:58,840 Go ahead. 48 00:02:58,840 --> 00:02:59,500 Wonderful. 49 00:02:59,500 --> 00:03:01,980 And again we've dropped out these two rows. 50 00:03:01,990 --> 00:03:05,710 So 37 and 41, using our boolean indexing. 51 00:03:05,710 --> 00:03:08,430 So now the index goes 0, 3, 4, 5, 6. 52 00:03:08,500 --> 00:03:15,070 But the rest of the columns are still the same. How about we start with a scatter plot of age and 53 00:03:15,070 --> 00:03:20,030 cholesterol, but we want to color it with the flavor of the target column. 54 00:03:20,050 --> 00:03:21,220 Well, that's a lot going on there. 55 00:03:21,220 --> 00:03:24,130 Let's see the code first and then we'll figure it out from there. 56 00:03:24,160 --> 00:03:35,930 over_50.plot, kind equals scatter, and we want x equals age and we want y equals... 57 00:03:35,930 --> 00:03:43,040 We want the cholesterol column, which is this one here, chol, short for cholesterol, and then we want 58 00:03:43,040 --> 00:03:48,450 to color it using c. So c is short for color, by the target column. 59 00:03:48,560 --> 00:03:50,930 So we want plot, kind scatter. 60 00:03:50,930 --> 00:04:00,080 Yes, that makes sense. x, age. y, chol, or cholesterol, kind of a hard word to say, c-h-o-l. And the 61 00:04:00,110 --> 00:04:01,310 color is target. 62 00:04:01,310 --> 00:04:04,970 Let's see what happens. All right. 63 00:04:05,240 --> 00:04:07,010 So we can kind of see what's going on here. 64 00:04:07,010 --> 00:04:08,770 We've got chol on the y. 65 00:04:08,780 --> 00:04:09,550 That makes sense. 66 00:04:09,590 --> 00:04:11,270 x doesn't really show us much. 67 00:04:11,270 --> 00:04:19,680 And c is giving us this colorbar, because target has values of 0 or 1 for if someone does have 68 00:04:19,680 --> 00:04:21,470 heart disease or they don't.
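The call dictated above can be sketched roughly as follows. The data is a made-up stand-in for the over-50 subset, and the `Agg` backend line is only there so the sketch runs without a display; in a notebook you would leave it out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd

# Hypothetical stand-in for the over_50 DataFrame (made-up values).
over_50 = pd.DataFrame({
    "age":    [63, 56, 57, 57],
    "chol":   [233, 236, 354, 192],
    "target": [1, 1, 0, 0],
})

# Plot directly from the DataFrame:
# x and y are column names, c colors each point by the target column.
ax = over_50.plot(kind="scatter", x="age", y="chol", c="target")
```

Because `target` is numeric, pandas attaches a colorbar for it, which is the bar on the side of the plot mentioned in the video.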
69 00:04:21,480 --> 00:04:23,330 Does this look okay to you? 70 00:04:23,670 --> 00:04:24,960 To me it doesn't look very good. 71 00:04:24,990 --> 00:04:26,560 It's doing what we want it to do. 72 00:04:26,610 --> 00:04:28,050 It's got different dots on here 73 00:04:28,380 --> 00:04:35,550 for their target value of 0 or 1, but it doesn't show us anything on the x-axis and these colors aren't 74 00:04:35,550 --> 00:04:36,290 really that great. 75 00:04:36,300 --> 00:04:37,880 So let's spruce things up a little. 76 00:04:38,190 --> 00:04:41,080 And how can we do that with the OO method? 77 00:04:41,100 --> 00:04:47,160 So this is the pyplot method, a.k.a. plotting directly from the pandas DataFrame with .plot. 78 00:04:47,190 --> 00:04:49,760 Now let's do the OO method. 79 00:04:49,830 --> 00:04:55,940 We're going to recreate this but using the OO method, so let's go fig, ax. 80 00:04:55,960 --> 00:04:57,320 We've seen this before. 81 00:04:57,320 --> 00:05:05,180 plt.subplots, and we might go figsize just to have some practice adjusting the size of our figures, 82 00:05:05,630 --> 00:05:07,410 and we'll go 10, 6. 83 00:05:07,410 --> 00:05:10,070 We want a width of 10 and a height of six. 84 00:05:10,100 --> 00:05:14,680 These are in inches by the way, I believe. So over_50.plot. 85 00:05:15,200 --> 00:05:17,500 This is gonna look very similar to what we've done up here. 86 00:05:17,930 --> 00:05:22,920 But there's one slight difference. kind equals scatter. 87 00:05:23,270 --> 00:05:24,130 Wonderful. 88 00:05:24,140 --> 00:05:28,010 Now we could copy and paste but we're not going to do that because we're in the habit of writing out 89 00:05:28,010 --> 00:05:32,270 code. y equals chol. 90 00:05:33,290 --> 00:05:39,940 c, we use the target column as well, and then the last part is the ax parameter.
91 00:05:40,040 --> 00:05:47,660 So when we're using the OO method with pandas, the plot function gets this ax parameter to tell it, hey, we're 92 00:05:47,660 --> 00:05:55,790 using the OO method and we want you to plot this data on this axis, the one we've just created. 93 00:05:55,880 --> 00:06:02,880 So let's set that, ax equals ax, and put a little semicolon at the end. Beautiful. 94 00:06:02,900 --> 00:06:07,300 So what we've done is we've recreated the same plot but it's already looking a little bit better. 95 00:06:07,370 --> 00:06:12,350 It's bigger because we've put the figsize parameter in there, and now because we've passed the ax 96 00:06:12,350 --> 00:06:18,410 parameter we've now got this age column, which has been labelled. Beautiful. Actually we might give this one 97 00:06:18,410 --> 00:06:28,780 a chance and say, hey, let's up this figsize and see if it does the same thing. 10, 6. We still don't have 98 00:06:28,780 --> 00:06:31,150 the age x label there. 99 00:06:31,270 --> 00:06:34,610 Well, the OO method is already pulling out ahead. 100 00:06:34,630 --> 00:06:40,050 Now what we might do is set the x limit to be a bit wider so we can have more space on our plot, 101 00:06:40,060 --> 00:06:45,300 so let's have a look at that. Because we're using the OO method we can access the axis here. 102 00:06:45,320 --> 00:06:50,800 Remember the axis is just this space that we're plotting data on. If we go back to the figure of our 103 00:06:51,700 --> 00:06:55,080 anatomy of a matplotlib figure, or plot, axis 104 00:06:55,090 --> 00:06:57,520 is this part here where we're adding data, 105 00:06:57,640 --> 00:07:01,390 whereas figure is the entire thing. 106 00:07:01,420 --> 00:07:16,490 So let's go back. We want ax.set_xlim to be 45, 100. Can we figure out what this does before we even 107 00:07:16,490 --> 00:07:16,880 look at it? 108 00:07:17,330 --> 00:07:22,160 So set_xlim. We could do Shift+Tab but I want you to just have a think about it.
109 00:07:22,160 --> 00:07:26,940 So ax.set_xlim, 45, 100. 110 00:07:27,110 --> 00:07:27,700 So that's right. 111 00:07:27,710 --> 00:07:30,730 If you didn't get it, if you pressed Shift+Tab and read it, you cheated a little. 112 00:07:30,730 --> 00:07:31,630 That's also fine. 113 00:07:31,640 --> 00:07:36,560 That's actually a good thing to do, to start exploring what Shift+Tab does. But this is going to change 114 00:07:36,560 --> 00:07:43,990 the limits of our x-axis to be from 45 to 100 instead of what it is now, which is about 50 to 80. 115 00:07:44,030 --> 00:07:50,710 So let's see what happens. So see how we've got like a lot more whitespace over here now because we've 116 00:07:50,830 --> 00:07:56,410 upped the x limit. But if we didn't want to do that we could comment that out, and that looks a bit better. 117 00:07:56,500 --> 00:08:01,900 Just giving an example of how you could quickly adjust the limits of your axes if they don't 118 00:08:01,900 --> 00:08:02,710 look too good. 119 00:08:02,780 --> 00:08:09,210 Matplotlib is generally pretty good at creating axis limits that are suited to whatever data 120 00:08:09,250 --> 00:08:10,330 you're using. 121 00:08:10,380 --> 00:08:14,340 Now this plot is okay but it definitely could look nicer. 122 00:08:14,620 --> 00:08:19,850 So this black and white is kind of hard to get, and this colorbar is not really offering much at all. 123 00:08:19,930 --> 00:08:25,470 So let's take a little break there and in the next video we'll make this plot look a little bit nicer 124 00:08:25,720 --> 00:08:27,640 and we'll build off what we've already done here.
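Putting the OO steps from this video together, a rough sketch might look like this. The DataFrame is a made-up stand-in for the over-50 subset, and the `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the over_50 DataFrame (made-up values).
over_50 = pd.DataFrame({
    "age":    [63, 56, 57, 57],
    "chol":   [233, 236, 354, 192],
    "target": [1, 1, 0, 0],
})

# OO method: create the figure and axis first (figsize is width, height in inches).
fig, ax = plt.subplots(figsize=(10, 6))

# Same pandas call as before, but ax=ax tells pandas to draw on our axis.
over_50.plot(kind="scatter", x="age", y="chol", c="target", ax=ax)

# Because we hold the axis object, we can adjust it afterwards.
# Comment this line out to fall back to matplotlib's automatic limits.
ax.set_xlim([45, 100])
```

The design choice here is that pandas still does the plotting, while `fig` and `ax` stay available for any later adjustments.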