1 00:00:00,480 --> 00:00:00,890 All right. 2 00:00:00,900 --> 00:00:04,890 So here we're going to be visualizing our results. 3 00:00:04,890 --> 00:00:14,280 So I'll add a quick markdown sale here visualizing the results and in the next few sales we're gonna 4 00:00:14,280 --> 00:00:18,500 be creating some charts scrolling to the top of our notebook. 5 00:00:18,600 --> 00:00:24,880 We should have already added Our input statements for Matt Gottlieb and Seabourn hand. 6 00:00:24,900 --> 00:00:26,960 We should have also added this percent. 7 00:00:27,040 --> 00:00:35,000 Matt plotted in line the charts I want to create are the joint conditional probability that an email 8 00:00:35,000 --> 00:00:41,990 is spam vs. the join conditional probability that an email is legitimate. 9 00:00:42,140 --> 00:00:48,570 So in this first cell here that's some chart styling info. 10 00:00:49,130 --> 00:00:55,910 And by that I mean things like the y axis label the x axis label stuff like that. 11 00:00:55,910 --> 00:01:09,240 So y axis underscore label will be equal to single quotes P parentheses x pipe spam and the x axis on 12 00:01:09,250 --> 00:01:20,650 a score label will be equal to single quotes p x pipe non spam the first chart we're gonna create is 13 00:01:20,650 --> 00:01:27,760 gonna be not very fancy it's gonna be your standard map popular chant and it's gonna look like this 14 00:01:27,760 --> 00:01:36,880 with peal T dot figure I'm going to set the size of my chart first so I'll see fake size is equal to 15 00:01:37,690 --> 00:01:40,910 parentheses eleven by seven. 16 00:01:41,380 --> 00:01:51,160 This will give me a nice size for the screen cast BLT dot X label parentheses it's gonna be equal to 17 00:01:51,160 --> 00:01:53,310 that string that we have at a cell above. 18 00:01:53,410 --> 00:02:06,600 So x axis underscore label comma font size is equal to 14 and peel T dot y label. 19 00:02:06,700 --> 00:02:16,290 It's gonna be you guessed it y axis underscore label comma font size equals fourteen. 20 00:02:16,330 --> 00:02:24,070 Now let me actually plot this chart so I mean he's a scatter plot here peel T dot scatter parenthesis 21 00:02:25,270 --> 00:02:29,090 joint at a school log on a school ham. 22 00:02:29,230 --> 00:02:37,570 So this is going on our x axis comma joint and a score log on a score spam is gonna go on our y axis 23 00:02:38,860 --> 00:02:48,180 and peel teed up show will generate our chart we'll see what it looks like what it looks like. 24 00:02:48,320 --> 00:02:59,930 This pretty wild huh let's zoom in a little bit on this area here and set the limits of the chart. 25 00:03:00,420 --> 00:03:11,080 Also I'm going to change the color of my dots here to navy blue coming back up I want to set the scale 26 00:03:11,290 --> 00:03:21,130 or the limit of the axis with PDT dot excellent parentheses square brackets minus fourteen thousand 27 00:03:22,390 --> 00:03:40,540 two one and PDT dot why Lim is gonna go from also minus fourteen thousand to one to change the colour 28 00:03:40,810 --> 00:03:47,730 of the dots all I need to do is apply the color argument and provide a color name here. 29 00:03:48,730 --> 00:03:53,710 And one example of a legitimate name is Navy for navy blue. 30 00:03:53,770 --> 00:04:00,250 If I refresh the cell then you can see that our chart now looks like this. 31 00:04:00,250 --> 00:04:04,920 Not bad right but I think we can make some improvements. 32 00:04:04,990 --> 00:04:09,440 Let's have a think about how our naive bayes classifier made his predictions. 33 00:04:11,090 --> 00:04:20,120 Scrolling back up to this very crucial line of code here we can see that our algorithm decided on which 34 00:04:20,120 --> 00:04:25,610 category an e-mail belonged to simply based on which probability was higher. 35 00:04:27,050 --> 00:04:34,130 And this allows us to plot something on our chart called a decision boundary that decision boundary 36 00:04:34,370 --> 00:04:39,730 is going to divide our chart into two regions one region. 37 00:04:39,750 --> 00:04:46,630 It's gonna be for spam and the other region is gonna be for our legitimate emails now. 38 00:04:46,760 --> 00:04:53,080 Any guess of how our decision boundary will look like on our chart. 39 00:04:53,120 --> 00:04:55,170 Well let's think about it. 40 00:04:55,490 --> 00:05:01,520 If the probabilities for one of these e-mails is calculated and it comes in in the top left corner should 41 00:05:01,520 --> 00:05:09,140 be classified as spam whereas non spam well in this case it has a very high probability that the email 42 00:05:09,140 --> 00:05:14,120 is spam and a very low probability that the email is non spam right. 43 00:05:14,120 --> 00:05:20,860 Therefore according to our logic and our decision rule it should be classified as a spam email. 44 00:05:20,870 --> 00:05:24,270 What about the lower left and the lower left. 45 00:05:24,780 --> 00:05:27,930 We have the opposite case here. 46 00:05:27,960 --> 00:05:32,360 An email is much more likely to be a real email than a spam email. 47 00:05:32,370 --> 00:05:35,920 Such be classified as non spam. 48 00:05:36,090 --> 00:05:45,270 So by that logic if we were to draw our decision boundary a line to separate these two regions then 49 00:05:45,360 --> 00:05:48,990 it would go diagonally up through the middle of our chart right. 50 00:05:50,260 --> 00:05:56,890 This is gonna be our decision boundary and this is how we can visualize the logic that our classifier 51 00:05:56,890 --> 00:06:04,330 uses to separate what it believes to be a spam message from a non spam message. 52 00:06:04,390 --> 00:06:12,760 So let me add a very quick section heading out it reads the decision boundary and what I'm going to 53 00:06:12,760 --> 00:06:25,180 do is I'm going to copy this cell here and I'm going to paste it below and then what I'll do is I'm 54 00:06:25,180 --> 00:06:33,790 going to create some data for a line and I'm going to do that up here actually to plot a line on a map 55 00:06:33,910 --> 00:06:34,680 flip chart. 56 00:06:34,690 --> 00:06:42,430 You need some data points and I'm just gonna call these line data and when to generate them using num 57 00:06:42,430 --> 00:06:43,280 pi. 58 00:06:43,300 --> 00:06:51,940 So with NDP dot Lin space parentheses I can generate these data points and I'm going to set a starting 59 00:06:52,060 --> 00:06:55,320 and an ending point for these data points. 60 00:06:55,360 --> 00:07:04,710 The start is gonna be equal to negative fourteen thousand which is what we're using for our axes right. 61 00:07:04,840 --> 00:07:11,980 And I'm going to stop at say 1 and then I'll decide on how many data points I actually want to generate 62 00:07:12,580 --> 00:07:15,240 I'll generate say a 1000 of them. 63 00:07:15,580 --> 00:07:18,950 I think they'll be good enough. 64 00:07:18,970 --> 00:07:26,220 Let me add the and u m equals 1000 to make this a named argument. 65 00:07:26,320 --> 00:07:35,530 Now let's shift enter on the cell and I can come down here and modify the second chart all I got to 66 00:07:35,540 --> 00:07:46,610 do is use PDT dot plot and supply an x value so it's going to plot all the X's under line data and it's 67 00:07:46,610 --> 00:07:54,590 going to plot all the y values under line data as well so our x's and our Y's are going to be equal. 68 00:07:54,590 --> 00:08:00,080 This will give us a diagonal line and to contrast that with the navy blue. 69 00:08:00,140 --> 00:08:06,330 I'm going to use the orange color colors equal to orange. 70 00:08:06,470 --> 00:08:09,710 Let's see what this looks like. 71 00:08:09,710 --> 00:08:10,440 There we go. 72 00:08:11,930 --> 00:08:17,630 Okay so we've plotted our probabilities now but this chart is still not very clear. 73 00:08:18,920 --> 00:08:19,850 Let me ask you this. 74 00:08:19,970 --> 00:08:22,180 What kind of problems do you spot with this chart. 75 00:08:24,130 --> 00:08:31,600 Well for starters there is a lot of data in this top right hand corner and we can't actually see what's 76 00:08:31,600 --> 00:08:33,640 going on. 77 00:08:33,670 --> 00:08:36,520 The second problem is looking at this chart. 78 00:08:36,640 --> 00:08:45,390 We cannot identify which emails were classified correctly and which e-mails were classified incorrectly. 79 00:08:45,400 --> 00:08:51,080 All we see is our data plotted here and our decision boundary in the middle. 80 00:08:52,270 --> 00:08:56,800 So first things first let's tackle that first problem I described. 81 00:08:56,860 --> 00:09:04,090 This was the problem of over plotting when we have many many data points and we put them all together 82 00:09:04,090 --> 00:09:05,310 on a chart like this. 83 00:09:05,320 --> 00:09:11,020 They're so close together that we're faced with this over plotting problem we can't actually visually 84 00:09:11,020 --> 00:09:16,140 distinguish the relationship on the chart with so much data. 85 00:09:17,380 --> 00:09:22,800 If you wanted to make this chart a bit more clear how do you think we can improve this visualization. 86 00:09:22,870 --> 00:09:29,530 Hit pause on this video and maybe try and track down two things that we can do to make this chart a 87 00:09:29,530 --> 00:09:30,830 bit more clear. 88 00:09:30,970 --> 00:09:35,730 I'll give it a few seconds to do just that. 89 00:09:35,800 --> 00:09:36,370 All right. 90 00:09:36,430 --> 00:09:42,070 So first we're going to use some techniques that we've already introduced in the past listen. 91 00:09:42,100 --> 00:09:46,180 We're going to introduce some transparency to our data points. 92 00:09:46,180 --> 00:09:50,890 And we're also going to reduce the DOT size on this chart. 93 00:09:51,220 --> 00:09:56,350 And this is actually really really easy to do because all we need to do is add a few more keyword arguments 94 00:09:56,350 --> 00:09:57,010 here. 95 00:09:57,100 --> 00:09:59,710 The transparency we can set with alpha. 96 00:10:00,520 --> 00:10:08,010 So if I set the alpha value to zero point five and refresh myself then I can see that the chart now 97 00:10:08,020 --> 00:10:15,760 looks like this and we can better appreciate some of the density in certain areas to make the dot size 98 00:10:15,760 --> 00:10:16,230 smaller. 99 00:10:16,230 --> 00:10:24,010 On the other hand all we need to do is specify a value for this argument called S for size. 100 00:10:24,190 --> 00:10:31,930 And if I said s is equal to say 25 then I get something like this I get smaller dots and I've improved 101 00:10:31,930 --> 00:10:39,730 my situation somewhat but you know there's still a bit of a problem here in this top right corner. 102 00:10:39,880 --> 00:10:42,100 We've got a lot of data here. 103 00:10:42,160 --> 00:10:46,550 And even though we've made a picture a little bit more clear we're not quite there yet. 104 00:10:47,920 --> 00:10:55,230 So let's do something else let's have two charts side by side instead of a single chart most of the 105 00:10:55,230 --> 00:10:57,470 action seems to be happening up here. 106 00:10:57,840 --> 00:11:05,490 So you can create a little zoomed in version of that area and place it next to our chart showing the 107 00:11:05,490 --> 00:11:13,490 whole thing to do that in my plot lib we're going to use something called a subplot so I'll take the 108 00:11:13,490 --> 00:11:21,520 cell copy and paste it below and now I'll modified of it. 109 00:11:21,620 --> 00:11:27,080 So for my figure's size I want to change it to say 16 and seven. 110 00:11:27,110 --> 00:11:32,570 So when I go a bit wider here because I want to have two plots and then I'm going to create my first 111 00:11:32,660 --> 00:11:33,530 subplot. 112 00:11:33,530 --> 00:11:43,510 So with BLT dot subplot parentheses and I'll supply three values the number of rows in my subplot there'll 113 00:11:43,520 --> 00:11:46,170 be one row the number of columns. 114 00:11:46,190 --> 00:11:47,200 So I want two charts. 115 00:11:47,210 --> 00:11:48,040 Next to each other. 116 00:11:48,080 --> 00:11:52,890 So I'll have a two there and an index for the first chart. 117 00:11:52,940 --> 00:11:56,180 So this will be index 1. 118 00:11:56,660 --> 00:12:05,550 If I copy this line and come down here and paste it in then I'll use INDEX TO HERE. 119 00:12:05,970 --> 00:12:11,360 SO THIS WILL BE chart number two. 120 00:12:11,370 --> 00:12:14,630 That's going to go here up here. 121 00:12:14,700 --> 00:12:18,540 We're going to be working with chart number 1. 122 00:12:20,160 --> 00:12:21,210 So tell you what. 123 00:12:21,210 --> 00:12:29,520 Let's keep chart number one as it is when I want to do is I'm going to copy these lines of code come 124 00:12:29,520 --> 00:12:30,960 down here. 125 00:12:30,960 --> 00:12:36,890 Paste them in and then to set the scale for that second chart. 126 00:12:36,890 --> 00:12:42,920 I'm not going to go between negative fourteen thousand one I'm going to go between minus two thousand 127 00:12:43,010 --> 00:12:51,730 and one for both my x axis and my y axis and the other thing I'm going to do on my second chart is I'm 128 00:12:51,730 --> 00:12:53,130 going to change the size of my dots. 129 00:12:53,140 --> 00:12:57,370 I want to make them even smaller so as equals 25. 130 00:12:57,490 --> 00:12:58,600 Probably too big. 131 00:12:58,750 --> 00:13:04,090 So I'm going to go with a different number here a much smaller one say three. 132 00:13:04,610 --> 00:13:09,500 And now let me hit shift enter and see what this looks like. 133 00:13:09,500 --> 00:13:10,040 There we go. 134 00:13:11,470 --> 00:13:18,200 Now I've got two subplots I've got them side by side and I've got this zoomed in version here where 135 00:13:18,200 --> 00:13:26,310 I've got many many small dots to represent each data point So yeah these are some very very easy and 136 00:13:26,310 --> 00:13:32,220 very handy techniques to apply to try and tackle this over plotting problem but let's move on. 137 00:13:32,250 --> 00:13:40,110 Let's try and make our charts even better and to do that we're going to use Seabourn instead of matte 138 00:13:40,110 --> 00:13:42,450 plot lib to create our charts. 139 00:13:42,900 --> 00:13:47,850 One of the reasons I quite like using Seabourn is because it makes creating beautiful charts very very 140 00:13:47,850 --> 00:13:48,260 simple. 141 00:13:48,270 --> 00:13:50,290 There's not a lot of code you have to write. 142 00:13:50,610 --> 00:13:55,830 And because we want to do a better job with our data visualization see what makes it very very easy 143 00:13:55,860 --> 00:14:03,460 to add colour and make our charts a bit more clear so let me come down here and I want to use a cell 144 00:14:03,460 --> 00:14:12,190 here to add some information about chant styling and the first thing I'll do is I'll set the style of 145 00:14:12,190 --> 00:14:13,400 our seaborne charts. 146 00:14:13,450 --> 00:14:23,920 So I'll see S.A. dot set up a school style single quotes white grid and the other thing I'll do is I'll 147 00:14:23,980 --> 00:14:31,810 create a variable called labels which will hold onto a string called actual category. 148 00:14:31,810 --> 00:14:37,090 The thing about some of these seaborne charts is that they really like working with data frames so we 149 00:14:37,090 --> 00:14:44,500 can create one here call it summary and a score def that will hold on to all our data that we're interested 150 00:14:44,500 --> 00:14:45,310 in plotting. 151 00:14:45,310 --> 00:14:59,710 So PD dot data frame capital the capital F parenthesis curly brace y axis label colon joint underscore 152 00:14:59,800 --> 00:15:12,370 log on to scroll spam comma x axis label colon joint on a squat log on a school hem you can see here 153 00:15:12,430 --> 00:15:20,400 what I'm doing is I'm creating a dictionary and this dictionary is what's going to provide the columns 154 00:15:20,460 --> 00:15:29,700 for our data frame y axis label it's gonna be the column name of all our spam probabilities x axis label 155 00:15:29,790 --> 00:15:39,240 it's gonna be the column name for our ham probabilities and labels it's gonna be the column name for 156 00:15:39,450 --> 00:15:48,030 y on a school test so it's gonna be the column name for the true values the actual values in the testing 157 00:15:48,060 --> 00:15:58,590 dataset now we're ready to go let's create our first seaborne chart to do that we're going to use C 158 00:15:58,590 --> 00:16:10,890 bonds l m plot so S.A. stopped and then plot parenthesis X equals X axis label. 159 00:16:11,250 --> 00:16:27,060 So this is all the x values in our data frame Y is equal to y axis label common data is equal to summary 160 00:16:27,420 --> 00:16:30,020 underscore D F. 161 00:16:30,240 --> 00:16:31,170 So there you have it. 162 00:16:31,320 --> 00:16:38,460 We're using our data frame and we're specifying which columns of our data frame make up the y values 163 00:16:38,820 --> 00:16:41,340 and the x values. 164 00:16:41,340 --> 00:16:44,030 Now let me specify the size of the chart that I want. 165 00:16:44,050 --> 00:16:48,240 Size is equal to say six point five. 166 00:16:48,240 --> 00:16:52,920 Then I'll just say as an S start PDT dot show. 167 00:16:53,130 --> 00:16:55,000 Let's see what we get. 168 00:16:55,000 --> 00:16:57,310 So we get something like this. 169 00:16:57,450 --> 00:17:01,310 What you see here is a regression right. 170 00:17:01,350 --> 00:17:07,930 We've got a regression and we've got a warning here so let's modify our arguments here. 171 00:17:08,090 --> 00:17:12,040 And the first thing I'll do is I'll get rid of the regression line. 172 00:17:12,290 --> 00:17:24,650 So say fit and school R E G is equal to False and that will remove our regression line and also removes 173 00:17:24,650 --> 00:17:26,570 this warning. 174 00:17:26,570 --> 00:17:32,110 So this is how our chart looks like without the regression line using l m plot. 175 00:17:32,120 --> 00:17:36,550 Now I know what you're thinking you're thinking man this looks a lot worse than my plot led. 176 00:17:36,740 --> 00:17:38,320 Why are we doing this. 177 00:17:38,330 --> 00:17:40,190 Well you'll see in a minute. 178 00:17:40,280 --> 00:17:50,110 So um let's add some keyword arguments for our scatter plot and we can do this by providing a dictionary. 179 00:17:50,240 --> 00:17:57,980 So with the scatter underscore K W S is equal to curly braces. 180 00:17:58,190 --> 00:18:12,590 Single quotes Alpha colon zero point five comma s and single quotes colon twenty five. 181 00:18:12,880 --> 00:18:19,660 You'll recognize these because these argument names are what we've used with map plot lib right. 182 00:18:19,780 --> 00:18:24,260 And uh similarly we can also scale this chart differently right. 183 00:18:24,400 --> 00:18:35,530 With peel T dot X limb parentheses square brackets C minus two thousand and one and peel T dot while 184 00:18:35,530 --> 00:18:43,330 in parentheses square brackets minus two thousand coma 1. 185 00:18:44,040 --> 00:18:45,850 Let's take a look now. 186 00:18:46,030 --> 00:18:48,620 I've actually got an error here a syntax error. 187 00:18:48,810 --> 00:18:55,740 You might already be spotting it but uh I came down here on a new line without inserting a comma to 188 00:18:55,740 --> 00:18:57,380 separate my arguments. 189 00:18:57,420 --> 00:19:04,990 So adding the comma and refreshing the cell allows me to generate my chart as I intended. 190 00:19:05,000 --> 00:19:06,530 There we go. 191 00:19:06,550 --> 00:19:12,040 So this is looking a little better but I think it's still missing our decision boundary. 192 00:19:12,040 --> 00:19:16,130 So let's add that as well so I'll add this line here. 193 00:19:16,240 --> 00:19:28,510 After X limb and while I'm all that PDT dot plot parentheses line data comma line data comma color equal 194 00:19:28,510 --> 00:19:34,750 to single quotes black I think will look nicely. 195 00:19:34,760 --> 00:19:35,170 Yeah. 196 00:19:36,620 --> 00:19:39,700 This makes our chart look very clear. 197 00:19:40,040 --> 00:19:47,210 So right about now we've kind of caught up with that plot live in terms of looks but has promised we 198 00:19:47,300 --> 00:19:49,480 want to add some color to this chart right. 199 00:19:50,460 --> 00:19:58,660 The way we're going to color this chart is using a keyword argument called Hugh so I'll just copy the 200 00:19:58,660 --> 00:20:00,460 same pasted below. 201 00:20:00,460 --> 00:20:03,750 So we have the comparison between the two. 202 00:20:04,180 --> 00:20:06,960 And I'll add this Hugh argument here. 203 00:20:07,900 --> 00:20:15,100 And for the Hugh we're going to specify the true values in our summary unescorted. 204 00:20:15,110 --> 00:20:20,960 Yes the true values of our data in our data frame are y underscore. 205 00:20:20,980 --> 00:20:21,790 Test values. 206 00:20:21,790 --> 00:20:22,800 That is. 207 00:20:22,960 --> 00:20:32,620 So that's gonna be Hugh is equal to labels and heading shift into this is what we get we get some green 208 00:20:32,620 --> 00:20:40,180 values up here we get some blue values done here and we even get a little legend here which has the 209 00:20:40,210 --> 00:20:48,850 column name of our data from actual space category and 0 the blue ones are the non spam messages and 210 00:20:48,850 --> 00:20:51,620 one are the spam messages. 211 00:20:51,700 --> 00:20:57,220 The nice thing about this alleged plot function is that we can actually change these markers as well. 212 00:20:57,220 --> 00:21:03,150 So you know not only can we get two different colors we can actually get different markers. 213 00:21:03,460 --> 00:21:09,430 If you were to head to the mad plot lib documentation you can see that you have a whole menu of markers 214 00:21:09,430 --> 00:21:10,390 to choose from. 215 00:21:10,420 --> 00:21:17,350 So if we want to go with circles or triangles or squares or whatever stars you've got a whole lot of 216 00:21:17,350 --> 00:21:18,360 options here. 217 00:21:19,200 --> 00:21:20,980 So I have to set some markers as well. 218 00:21:21,130 --> 00:21:24,250 So we've got color and shape as a differentiator. 219 00:21:24,250 --> 00:21:35,860 And for the markers I'll go for Marcus is equal to square brackets single quotes 0 comma single quotes 220 00:21:36,400 --> 00:21:45,480 X so now my spam messages will be marked with all X's little crosses and my non spam messages will be 221 00:21:45,480 --> 00:21:48,200 marked with little circles like this. 222 00:21:48,250 --> 00:21:53,760 Now there's actually two things I still don't quite like about this chart. 223 00:21:53,980 --> 00:21:59,300 One thing is that I'd like to have a little bit more control over my legend here on the right. 224 00:21:59,320 --> 00:22:01,900 I'd like to customize the legend for this chart. 225 00:22:03,140 --> 00:22:09,290 The other thing that bugs me are actually the default colors that Seabourn has chosen for me. 226 00:22:09,560 --> 00:22:15,800 At the time of recording the colors that were chosen here are blue and green. 227 00:22:15,800 --> 00:22:22,460 Now that might not seem like a big issue but it could be a problem for approximately one and 12 men 228 00:22:22,550 --> 00:22:25,120 or one in 200 women. 229 00:22:25,130 --> 00:22:31,740 And the reason is is that certain color combinations actually not ideal for people with color blindness 230 00:22:32,330 --> 00:22:37,670 and you can take this into account when you're creating your own graphics or your own charts for projects 231 00:22:37,670 --> 00:22:41,450 that you have to show people or that you want to make look beautiful. 232 00:22:41,450 --> 00:22:48,740 And one rule of thumb that I follow is I tend to avoid the red green combo the green blue combo and 233 00:22:48,770 --> 00:22:50,540 the green brown combo. 234 00:22:50,720 --> 00:22:57,200 If I if I remember because these are some of the common combinations of colors to avoid in your charts 235 00:22:57,290 --> 00:22:58,560 if possible. 236 00:22:59,120 --> 00:23:07,010 So if on the slide you have trouble seeing this like w the 42 and the 71 I've uh I've got bad news for 237 00:23:07,010 --> 00:23:09,280 you you might actually be colorblind. 238 00:23:09,500 --> 00:23:14,450 So I hope you're not finding this out on the course but you know you're not you're you're not alone 239 00:23:14,540 --> 00:23:18,750 and very very famous and influential people are also colorblind. 240 00:23:18,770 --> 00:23:24,890 So you know if you're creating your charts and you're presenting to the likes of Keanu Reeves or Mark 241 00:23:24,890 --> 00:23:29,720 Zuckerberg or Bill Clinton they're gonna have difficulty differentiating between the reds and greens 242 00:23:29,720 --> 00:23:30,260 in your chance. 243 00:23:30,290 --> 00:23:37,040 So it's probably not what you want to pick and that's why I suspect it's not a coincidence that the 244 00:23:37,040 --> 00:23:45,260 predominant color of Facebook is actually blue now Seabourn actually has a pretty nice choice of color 245 00:23:45,260 --> 00:23:52,610 palettes that we can use in our projects and some of the default color palettes are found on this part 246 00:23:52,700 --> 00:23:59,250 of their documentation if you scroll through here you'll see that these color palettes have names like 247 00:23:59,460 --> 00:24:08,160 HLS or such two or paired or what have you and we can try them out in our notebook here by adding an 248 00:24:08,160 --> 00:24:14,280 argument here to our little plot function called palette. 249 00:24:14,370 --> 00:24:22,020 And if we set palette equal to say HLS and refresh ourselves we get this look right here. 250 00:24:22,530 --> 00:24:31,680 If we set our palette to set capital S too then we get this look here orange and green. 251 00:24:32,940 --> 00:24:37,920 And if we set it to see head then we get this look here. 252 00:24:38,190 --> 00:24:45,730 Dark blue and light blue I actually quite liked HLS so I'm going to stick with us. 253 00:24:46,490 --> 00:24:54,820 And now I'm going to add my own legend the way you do this is with Petey dot legend. 254 00:24:55,130 --> 00:24:57,940 And there we can specify what should be inside the legend. 255 00:24:57,950 --> 00:25:03,950 So I wanted to have three things I wanted to explain the decision boundary the non spam messages and 256 00:25:03,950 --> 00:25:05,510 the spam messages. 257 00:25:05,510 --> 00:25:10,940 So within the parentheses I'm going to add another set of parentheses and I'll add three strings right 258 00:25:11,600 --> 00:25:24,320 decision boundary column single quotes non spam comma single quotes spam and after this first closing 259 00:25:24,320 --> 00:25:30,320 parentheses I'll add a comma and I'll specify the location where I want this legend to be. 260 00:25:30,650 --> 00:25:34,610 So LLC c is equal to lower space. 261 00:25:34,670 --> 00:25:36,320 Right. 262 00:25:36,320 --> 00:25:39,290 And now I'll have to specify as a font size. 263 00:25:39,380 --> 00:25:47,230 So I'll go with font size is equal to 14 and I'll refresh myself. 264 00:25:47,230 --> 00:25:52,780 Now what I get is I have the legend that I've created right here. 265 00:25:52,990 --> 00:25:57,400 I have got the default legend a little bit to the right. 266 00:25:57,610 --> 00:26:06,350 So we need to remove this default legend here and we can do that simply by specifying an argument under 267 00:26:06,370 --> 00:26:17,340 Ellen plot called Legend set that equal to big capital F and A S E false. 268 00:26:17,400 --> 00:26:18,940 Don't forget the comma afterwards. 269 00:26:19,050 --> 00:26:21,210 Otherwise you get the same error I had earlier. 270 00:26:21,420 --> 00:26:30,910 And if you refresh your chart should look like this now coming back to the colors were of course not 271 00:26:30,910 --> 00:26:35,140 limited to using the built in pallets from Siebel. 272 00:26:35,140 --> 00:26:39,180 Let me show you how you can create your own custom palette. 273 00:26:39,190 --> 00:26:40,980 The process is actually quite simple. 274 00:26:41,050 --> 00:26:46,360 You just have to stick the hex codes that correspond to the colors that you want into a Python list 275 00:26:46,780 --> 00:26:53,930 and then feed that list into this Elm plot function as an argument now for this example. 276 00:26:54,100 --> 00:26:57,780 I want to take some inspiration from some famous artists. 277 00:26:57,880 --> 00:27:04,240 So I want to go to a website called Color list dot com and the lady behind this Web site basically took 278 00:27:04,360 --> 00:27:09,640 a bunch of artists looked at their paintings and now the hex codes for us to use. 279 00:27:09,730 --> 00:27:17,380 So I'm gonna go to r and I'm going to look for the troubled American painter by the name of Mark Rothko 280 00:27:18,910 --> 00:27:24,390 and scrolling through here I can see the color palette of two of his paintings. 281 00:27:24,670 --> 00:27:28,890 But looking what's available here I might actually change my mind. 282 00:27:28,960 --> 00:27:33,670 I'm going to go with somebody else instead maybe see Mondrian. 283 00:27:33,850 --> 00:27:35,580 See if she's got Mondrian. 284 00:27:35,590 --> 00:27:36,110 There we go. 285 00:27:36,190 --> 00:27:37,620 Mondrian. 286 00:27:37,840 --> 00:27:40,650 This is a whole lot more promising. 287 00:27:40,660 --> 00:27:46,760 Let me take the red and the blue from the Broadway boogie woogie from Mondrian. 288 00:27:46,930 --> 00:27:54,790 I'll just copy this hex code here come back into my Python notebook what I'll quickly do is I'll copy 289 00:27:54,910 --> 00:27:56,470 the cell pasted below. 290 00:27:56,560 --> 00:28:06,790 So I have my own copy and then I'll create a variable here called My of colors and I'll set that equal 291 00:28:06,790 --> 00:28:13,240 to two empty square brackets with some single quotes inside and this is where I'm going to paste my 292 00:28:13,240 --> 00:28:23,260 hex code for my red turn and then I'm going to come back take the maybe the light blue and I'll come 293 00:28:23,260 --> 00:28:35,290 back here and I'll put this one here as perhaps the second hex code in my list so I've got my red and 294 00:28:35,290 --> 00:28:43,360 my blue and now all I have to do is change the value here of my palette argument from HLS which was 295 00:28:43,360 --> 00:28:51,030 the inbuilt seaborne color palette to my underscore colors and then I'll refresh the cell. 296 00:28:52,270 --> 00:28:52,760 Brilliant. 297 00:28:53,140 --> 00:28:59,980 So now we can see that the red is showing up here and the bottom triangle and the blue is showing up 298 00:28:59,980 --> 00:29:08,770 here on the top triangle and you know what the thing is I reckon red actually suits spam emails better 299 00:29:08,980 --> 00:29:16,990 than does Blue so I'll have to do to change which color corresponds to which category is change the 300 00:29:16,990 --> 00:29:18,220 order in my list. 301 00:29:18,240 --> 00:29:28,960 So I take my second hex code and put it as the first text code and then I can refresh myself to see 302 00:29:28,960 --> 00:29:29,830 how we're doing now. 303 00:29:30,610 --> 00:29:31,030 Brilliant. 304 00:29:31,180 --> 00:29:38,320 So this is a lot more interesting tell me thing I'll do now is I want to zoom in a little bit on this 305 00:29:38,320 --> 00:29:40,300 top right corner. 306 00:29:40,660 --> 00:29:45,340 And the reason is is that there seems to be a lot of this sort of red fringing going on there seems 307 00:29:45,340 --> 00:29:46,600 to be some crossover. 308 00:29:46,720 --> 00:29:52,540 So this area looks very very interesting to me and I'm going to change this value here on my axis from 309 00:29:52,540 --> 00:29:58,680 minus two thousand to 500 and we'll see how that looks. 310 00:29:58,680 --> 00:30:06,370 Now not bad not bad at all but perhaps maybe a little bit too faint. 311 00:30:06,370 --> 00:30:15,860 So when I take my alpha value we change it to say seven and I think that is a lot more clear. 312 00:30:16,090 --> 00:30:26,510 We can see the red Xs in this blue area much much better now how I know that a lot of these modern artists 313 00:30:26,600 --> 00:30:34,340 on exactly known for their data visualization and I don't really get modern art but I do think that 314 00:30:34,340 --> 00:30:41,960 sometimes these guys do have some pretty cool color combinations and I do think we can draw some inspiration 315 00:30:41,960 --> 00:30:43,400 from that. 316 00:30:43,880 --> 00:30:50,210 So yeah check out color Lisa and let me know if you have any other favorite resources for your projects 317 00:30:50,300 --> 00:30:52,040 in the comments on this video.