0 1 00:00:00,380 --> 00:00:07,380 To create a donut chart all we have to do is add a bit more code to this cell here. 1 2 00:00:07,380 --> 00:00:13,810 So I'm going to copy it, paste it below, so I've got our original pie chart that I'm going to leave as it is. 2 3 00:00:14,220 --> 00:00:18,410 And here I'm going to add my additional code. 3 4 00:00:18,750 --> 00:00:28,020 The code I'm going to add is drawing a circle onto our chart, a white circle that is. So I'll create a variable 4 5 00:00:28,020 --> 00:00:36,630 called "center_circle" which is going to hold on to my circle and use matplotlib to 5 6 00:00:36,630 --> 00:00:43,490 create a circle, "plt.Circle", with a capital C, open parentheses, 6 7 00:00:43,650 --> 00:00:47,250 and now we have to supply some arguments. 7 8 00:00:47,250 --> 00:00:52,240 Let me hit Shift+Tab on the Circle to take a look at the quick documentation. 8 9 00:00:52,590 --> 00:01:00,420 So our Circle requires one input, one input is mandatory to create it and it basically needs to know 9 10 00:01:00,630 --> 00:01:03,410 where to center the circle. 10 11 00:01:03,480 --> 00:01:08,410 It needs an x and the y coordinate in these parentheses like so. 11 12 00:01:08,430 --> 00:01:12,540 So we're gonna give it a tuple of these coordinates. 12 13 00:01:12,540 --> 00:01:13,840 I'm just gonna center it at 13 14 00:01:14,110 --> 00:01:17,420 0, 0 and that's it. 14 15 00:01:17,990 --> 00:01:20,820 Let's add the Circle to our figure. 15 16 00:01:20,840 --> 00:01:33,710 So I'll say "plt.gca().add_artist(center_circle)". 16 17 00:01:33,710 --> 00:01:40,660 Now let me hit Shift+Enter and what we get is something like this. 17 18 00:01:40,880 --> 00:01:42,440 What's going on? 18 19 00:01:42,440 --> 00:01:51,210 Well, our circle is probably too big, it's not sized correctly, so we'll have to supply a radius. 19 20 00:01:51,320 --> 00:01:53,810 I'm going to make this circle a lot smaller, 20 21 00:01:53,870 --> 00:01:58,800 make it 0.6 for the radius and hit Shift+Enter. 21 22 00:01:58,910 --> 00:02:00,160 Much better, right? 22 23 00:02:00,290 --> 00:02:04,130 We've got the circle now sitting in the middle of our pie chart. 23 24 00:02:04,130 --> 00:02:08,570 The only problem is is that this circle is blue and it looks looks very, very odd. 24 25 00:02:08,570 --> 00:02:10,520 So let's make it white. 25 26 00:02:10,520 --> 00:02:19,160 The way we can do that is by supplying this color as an argument to a parameter called "fc", "fc = 26 27 00:02:19,160 --> 00:02:20,720 'white' 27 28 00:02:20,930 --> 00:02:24,360 will set the color of the circle to white. 28 29 00:02:24,370 --> 00:02:27,590 Now we've got something that looks a bit more like this. 29 30 00:02:27,610 --> 00:02:30,540 This is starting to look a lot better, right? 30 31 00:02:30,580 --> 00:02:34,210 The only thing is our offset doesn't quite work anymore. 31 32 00:02:34,210 --> 00:02:37,420 It makes it look very, very strange. 32 33 00:02:37,510 --> 00:02:45,420 So I'm going to delete the explode argument here on our pie chart and refresh our graph. 33 34 00:02:45,430 --> 00:02:46,630 There we go. 34 35 00:02:46,630 --> 00:02:53,200 Not bad, but I do want to move these percent labels. Back up here on our pie chart, 35 36 00:02:53,200 --> 00:03:02,550 we can do that with an argument called "percent distance", "pctdistance", all in lower case and if I set 36 37 00:03:02,550 --> 00:03:11,280 that to 0.8, then my percent labels start moving out and they're more or less centered now 37 38 00:03:11,670 --> 00:03:14,890 in their segment of the donut chart. 38 39 00:03:14,910 --> 00:03:18,160 Now I'm fairly happy with this design. 39 40 00:03:18,210 --> 00:03:21,150 Let's review the code a little bit. 40 41 00:03:21,150 --> 00:03:25,170 We created a circle using matplotlib, so "plt. 41 42 00:03:25,200 --> 00:03:33,060 Circle()" is our equivalent of "pd.DataFrame()" and then we've supplied a couple of inputs. 42 43 00:03:33,060 --> 00:03:40,650 We've supplied a color, we've supplied a size of the circle and we've supplied where that circle should 43 44 00:03:40,650 --> 00:03:42,850 be drawn. At 0, 0 44 45 00:03:42,870 --> 00:03:49,980 it's completely centered. Then what we did is we had to add this circle to our figure. 45 46 00:03:50,700 --> 00:04:00,120 So with "plt.gca" we get the current axes, so we're getting the current axes and we're adding the circle 46 47 00:04:00,330 --> 00:04:03,980 with the "add_artist" method to the axes. 47 48 00:04:04,050 --> 00:04:09,910 So that adds the circle to our figure, which means it gets drawn on top of our pie chart. 48 49 00:04:10,620 --> 00:04:14,670 Okay, so we've almost mastered the donut chart. 49 50 00:04:14,730 --> 00:04:21,200 The only thing that I really want to show you is how to draw a donut chart with more than two categories, 50 51 00:04:21,420 --> 00:04:27,720 because oftentimes these donut charts really come into the fore when you have three, four or five categories, 51 52 00:04:28,080 --> 00:04:32,060 then they start looking really, really great in all sorts of reports. 52 53 00:04:32,130 --> 00:04:37,500 So just as an exercise, let's create a donut chart with four categories. 53 54 00:04:37,530 --> 00:04:45,520 Copy this cell, paste it below, so I'm keeping the original and working off this one and I'm going to add 54 55 00:04:45,670 --> 00:04:54,100 a few more category names, so say "Updates" and "Promotions". 55 56 00:04:54,100 --> 00:05:00,790 Now since we have four categories, I'm going to have to update my list of sizes as well. 56 57 00:05:00,790 --> 00:05:08,500 And I'm just gonna make up these numbers here for the amount of spam, legitimate emails, updates and promotions. 57 58 00:05:08,500 --> 00:05:19,250 So say we have 25 spam emails and I don't know 43 legitimate emails, 19 updates and 22 promotions. 58 59 00:05:19,480 --> 00:05:21,010 So we've got that. 59 60 00:05:21,010 --> 00:05:27,330 Now it's just a matter of picking the colors. We can keep the red and blue, but we have to add two more 60 61 00:05:27,330 --> 00:05:29,770 colors that fit well with this palette. 61 62 00:05:30,030 --> 00:05:37,500 Going back to our American Palette from Flat UI colors, I think I'm going to go for the light green and 62 63 00:05:37,500 --> 00:05:39,090 maybe the sour lemon here, 63 64 00:05:39,120 --> 00:05:40,060 the yellow. 64 65 00:05:40,060 --> 00:05:49,340 So I'm going to copy this, update our list here, making sure that the hex code of the colors and single 65 66 00:05:49,340 --> 00:05:56,360 quotes and I'm going to copy the sour lemon for the promotions. 66 67 00:05:56,380 --> 00:05:58,430 There we go. 67 68 00:05:58,430 --> 00:06:01,130 Now I've got my four categories. 68 69 00:06:01,210 --> 00:06:05,440 Let me hit Shift+Enter and see what this looks like at the moment. 69 70 00:06:05,440 --> 00:06:06,600 Not bad, right? 70 71 00:06:06,760 --> 00:06:11,540 But what if we wanted to have a small gap between each of these categories? 71 72 00:06:11,560 --> 00:06:19,290 What if we wanted to explode the donut chart the way we've exploded our pie chart? And the easiest 72 73 00:06:19,290 --> 00:06:27,690 way to do this is to create another list of values, say "offset = []" and then maybe 73 74 00:06:27,750 --> 00:06:34,730 0.05, 0.05, 0.05,... 74 75 00:06:34,770 --> 00:06:42,490 I'll make the offset equal between all of the, all of the categories, 0.05 as well. 75 76 00:06:42,880 --> 00:06:49,240 Then for the pie chart, we'll just say "explode = offset". 76 77 00:06:49,240 --> 00:06:54,180 Now let me hit Shift+Enter and look at our results. 77 78 00:06:54,180 --> 00:06:55,390 Brilliant. 78 79 00:06:55,650 --> 00:06:57,750 And that's really all there is to it. 79 80 00:06:57,810 --> 00:07:04,020 I think pie charts are, they're simple, they're clear, they're easy to code up, but you can also make them 80 81 00:07:04,020 --> 00:07:10,880 look very, very good with the right choice of colors and a little thought on the formatting. Having done 81 82 00:07:10,890 --> 00:07:14,490 some cute visualizations in the past lessons, 82 83 00:07:14,550 --> 00:07:21,090 it's now time to return to do some hardcore programming. In the next couple of lessons, 83 84 00:07:21,150 --> 00:07:24,810 we're going to be looking at natural language processing. 84 85 00:07:24,870 --> 00:07:30,510 We can't just feed raw e-mail bodies into our machine learning algorithm. 85 86 00:07:30,510 --> 00:07:36,240 We need to prepare these e-mails for our Bayes' classifier to work. 86 87 00:07:36,250 --> 00:07:41,580 I'd say this step probably falls a bit more into the data cleaning part of the project, but it will equip 87 88 00:07:41,610 --> 00:07:48,090 you with some very, very powerful tools and techniques to tackle all kinds of text data that you're gonna 88 89 00:07:48,090 --> 00:07:51,560 be encountering on your personal projects as well. 89 90 00:07:51,660 --> 00:07:57,300 Natural Language Processing is a huge topic and it's super lucrative and super interesting. 90 91 00:07:57,300 --> 00:08:00,580 So I hope you're as excited as I am for the next section. 91 92 00:08:00,660 --> 00:08:02,990 I'll see you in the next lessons. 92 93 00:08:03,000 --> 00:08:03,630 Take care.