1 00:00:00,420 --> 00:00:04,260 In the last video, we learned how to create our logistic regression model. 2 00:00:04,560 --> 00:00:10,440 In this lecture, we will see how to predict the values of y and how to create a confusion matrix 3 00:00:10,830 --> 00:00:17,220 from those predicted values. To get the predicted values from our model, 4 00:00:17,970 --> 00:00:24,870 we just have to write .predict. To get the probability output from our model, 5 00:00:26,040 --> 00:00:26,820 we have to write 6 00:00:27,570 --> 00:00:29,850 predict_proba. 7 00:00:30,990 --> 00:00:35,130 So my model variable is clf_lr. 8 00:00:36,750 --> 00:00:43,770 Therefore, I will write clf_lr.predict_proba, and then in brackets, 9 00:00:43,800 --> 00:00:49,320 I have to mention the independent variables on which I want to predict the probabilities. 10 00:00:51,790 --> 00:00:52,660 If I run this, 11 00:00:55,190 --> 00:00:57,480 the output is in the form of an array. 12 00:00:58,520 --> 00:01:01,520 The first column here is the probability of zero, 13 00:01:02,390 --> 00:01:05,750 that is, the probability of not sold. The second column 14 00:01:05,750 --> 00:01:09,370 here is the probability of one, that is, the probability of sold. 15 00:01:12,180 --> 00:01:15,390 If you add these two columns, the result is one. 16 00:01:18,330 --> 00:01:25,140 So for the first row, the probability of the home being sold is 0.87. For the second 17 00:01:25,140 --> 00:01:25,590 record, 18 00:01:25,980 --> 00:01:29,400 the probability is 0.63, and so on. 19 00:01:30,510 --> 00:01:38,460 Now, as we discussed earlier, we want to set a boundary condition for classifying a record as not sold or 20 00:01:38,460 --> 00:01:42,930 sold. By default, our boundary condition is 0.5.
21 00:01:44,030 --> 00:01:50,840 This means that if the probability is less than 0.5, we are saying that the house cannot be sold 22 00:01:50,840 --> 00:01:54,880 within three months. If the probability is greater than 0.5, 23 00:01:55,160 --> 00:01:58,420 we are saying that we will be able to sell the house in three months. 24 00:02:00,620 --> 00:02:09,620 Now, as we discussed earlier, the cost associated with a false positive and a false negative is different, 25 00:02:10,130 --> 00:02:14,270 and therefore we may want to choose a different boundary condition. 26 00:02:14,840 --> 00:02:18,740 We will use these probability values to choose that boundary condition. 27 00:02:19,070 --> 00:02:26,780 But before that, we will see how to predict the zero or one classes using 0.5 as our boundary 28 00:02:26,780 --> 00:02:27,350 condition. 29 00:02:28,550 --> 00:02:37,220 So by default, the predict function, that is, clf_lr.predict, takes the boundary 30 00:02:37,220 --> 00:02:39,500 condition as 0.5. 31 00:02:40,130 --> 00:02:41,480 So if I run this code, 32 00:02:45,120 --> 00:02:52,980 you can see we have zeros and ones. The output is not in the form of probabilities, 33 00:02:53,190 --> 00:02:58,020 but it is in the form of classes, that is, sold or not sold. 34 00:03:00,070 --> 00:03:06,700 If you compare with our probability values, for the first record, our probability was 35 00:03:06,790 --> 00:03:10,690 0.87. Since 0.87 is greater than 0.5, 36 00:03:10,840 --> 00:03:15,190 we are getting one. For the second record, 37 00:03:15,220 --> 00:03:17,500 the probability is 0.63. 38 00:03:18,340 --> 00:03:21,250 And since this is more than 0.5, we are getting one 39 00:03:21,520 --> 00:03:22,810 for the second record as well.
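The relationship between predict_proba and predict described above can be sketched as follows. This is a minimal illustration, not the lecture's notebook: the data is synthetic (generated with make_classification as a stand-in for the house data), and only the variable name clf_lr is taken from the lecture.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption: this is NOT the lecture's house data).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf_lr = LogisticRegression(max_iter=1000)
clf_lr.fit(X, y)

# Shape (n_samples, 2): column 0 is P(class 0), column 1 is P(class 1).
proba = clf_lr.predict_proba(X)

# Hard class labels, 0 or 1.
y_pred = clf_lr.predict(X)

# Each row of probabilities adds up to one.
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)

# For a binary model, predict() agrees with cutting P(class 1) at 0.5.
manual_pred = (proba[:, 1] > 0.5).astype(int)
matches_predict = np.array_equal(manual_pred, y_pred)

print(proba[:3])
print(y_pred[:3])
print(rows_sum_to_one, matches_predict)
```

The comparison at the end is only a sanity check that the default class labels correspond to a 0.5 cut on the second probability column.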
40 00:03:24,550 --> 00:03:30,690 If you look at the third record, the probability here is 0.02. 41 00:03:31,480 --> 00:03:33,790 Since this is less than 0.5, 42 00:03:35,420 --> 00:03:37,570 we are getting zero for the third record. 43 00:03:40,800 --> 00:03:44,330 Now, to set a custom boundary condition, 44 00:03:44,890 --> 00:03:49,700 we will use these predict_proba probability values. 45 00:03:50,630 --> 00:03:56,150 So if you see here, I am using this predict_proba. 46 00:03:58,310 --> 00:04:01,490 Then I am selecting the second column from this. 47 00:04:01,760 --> 00:04:03,320 That's why I have written one here. 48 00:04:03,740 --> 00:04:10,540 So all the rows and only the second column. Then I'm comparing these second-column values with 49 00:04:10,550 --> 00:04:12,020 0.3. 0.3 50 00:04:12,050 --> 00:04:13,790 here is my boundary condition. 51 00:04:15,890 --> 00:04:21,720 So if the value is greater than 0.3, the output will be True. 52 00:04:22,190 --> 00:04:26,420 And if the probability value here is less than 0.3, 53 00:04:27,560 --> 00:04:29,050 the output will be False. 54 00:04:30,470 --> 00:04:32,240 And I am saving this value 55 00:04:32,510 --> 00:04:37,550 in y_pred_03. If I run this code 56 00:04:40,110 --> 00:04:44,040 and get a sample of my y_pred_03 values, 57 00:04:54,230 --> 00:04:59,600 you can see I'm getting these True and False values, depending on the condition I have 58 00:04:59,600 --> 00:05:00,170 mentioned. 59 00:05:01,440 --> 00:05:07,840 True means one and False means zero. One and zero stand for sold and not sold. 60 00:05:11,340 --> 00:05:16,040 Now we have the actual values of y and the predicted values of y.
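The custom boundary condition described above can be sketched with plain NumPy. The array below is a hand-written stand-in for the output of clf_lr.predict_proba: its first three rows reuse the probabilities quoted in the lecture (0.87, 0.63, 0.02), and the fourth row (0.40) is a hypothetical record added to show where the two thresholds disagree.

```python
import numpy as np

# Stand-in for clf_lr.predict_proba(X): columns are P(not sold), P(sold).
# Rows 1-3 use values quoted in the lecture; row 4 is hypothetical.
proba = np.array([
    [0.13, 0.87],
    [0.37, 0.63],
    [0.98, 0.02],
    [0.60, 0.40],
])

# All rows, second column only (index 1), compared with the 0.3 boundary.
y_pred_03 = proba[:, 1] > 0.3

# Same rule at the default 0.5 boundary, for comparison.
y_pred_05 = proba[:, 1] > 0.5

print(y_pred_03)              # boolean array of True/False
print(y_pred_03.astype(int))  # the same values as 1/0 class labels
print(y_pred_05)
```

Only the hypothetical fourth record changes class between the two thresholds, because its probability lies between 0.3 and 0.5.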
61 00:05:17,540 --> 00:05:25,580 Now we want to check the accuracy of our model, which means how many times we are actually predicting 62 00:05:25,580 --> 00:05:31,070 the correct outcome and how many times we are predicting the wrong outcome. 63 00:05:32,780 --> 00:05:34,250 Correct outcome means 64 00:05:35,920 --> 00:05:39,660 true positives, which means the actual value is true, 65 00:05:39,790 --> 00:05:42,800 that is, one, and the predicted value is also true, 66 00:05:42,940 --> 00:05:46,660 that is, one; or true negatives, 67 00:05:46,990 --> 00:05:50,860 that is, the actual value is zero and the predicted value is also zero. 68 00:05:52,100 --> 00:05:56,470 The wrong outcomes are false positives and false negatives. 69 00:05:56,650 --> 00:05:59,200 We have already covered this in our theory lecture. 70 00:06:02,220 --> 00:06:09,780 We also covered the confusion matrix, which organizes these four categories, and we will draw that confusion 71 00:06:09,780 --> 00:06:13,050 matrix from this output of our model. 72 00:06:16,110 --> 00:06:20,490 To create a confusion matrix, we first have to import confusion_matrix. 73 00:06:21,840 --> 00:06:25,020 We will import it from sklearn.metrics. 74 00:06:26,780 --> 00:06:30,560 And then we'll use the confusion_matrix function. 75 00:06:31,280 --> 00:06:32,690 There are two arguments here. 76 00:06:32,990 --> 00:06:35,060 The first one is the actual values of y, 77 00:06:35,570 --> 00:06:38,510 and the second one is the predicted values of y. 78 00:06:40,010 --> 00:06:43,070 Remember, we created y_pred with 0.5, 79 00:06:43,130 --> 00:06:45,340 that is, the default, as our boundary condition. 80 00:06:46,190 --> 00:06:47,540 So let's run this. 81 00:06:52,060 --> 00:06:57,160 Here, these rows are the actual classes. 82 00:06:57,310 --> 00:06:59,600 So the first row is for zero, 83 00:06:59,980 --> 00:07:01,780 and the second row is for one. 84 00:07:02,200 --> 00:07:03,630 These are the actual outcomes.
85 00:07:03,640 --> 00:07:05,200 Rows are for actual outcomes, 86 00:07:05,830 --> 00:07:09,190 and these columns are for predicted classes. 87 00:07:09,640 --> 00:07:13,390 So the first column is for zero and the second column is for one. 88 00:07:15,250 --> 00:07:19,120 This 195 stands for zero and zero. 89 00:07:19,210 --> 00:07:22,180 That means the actual value was zero, 90 00:07:22,270 --> 00:07:23,160 that is, not sold, 91 00:07:23,330 --> 00:07:25,390 and the predicted value was also zero, 92 00:07:25,480 --> 00:07:26,380 that is, not sold. 93 00:07:27,160 --> 00:07:29,260 These are also known as true negatives. 94 00:07:32,220 --> 00:07:41,970 This 81 is in the row for zero. The actual value of these 81 was zero, but the predicted value is one, 95 00:07:42,030 --> 00:07:46,850 since these are in the second column. These are known as false positives. 96 00:07:49,470 --> 00:07:52,200 These 77 are in the second row. 97 00:07:52,350 --> 00:07:57,050 That is, they actually belong to the second class, that is, sold. 98 00:07:59,040 --> 00:08:01,650 But we predicted them as not sold. 99 00:08:01,800 --> 00:08:03,690 So these are false negatives. 100 00:08:05,400 --> 00:08:06,670 This 153 101 00:08:08,170 --> 00:08:14,330 is for one and one, that is, the actual value is sold and the predicted value is also sold. 102 00:08:15,860 --> 00:08:23,690 Now let's create the confusion matrix for our second set of predicted values, where we used 0.3 as 103 00:08:23,690 --> 00:08:24,750 our boundary condition. 104 00:08:34,630 --> 00:08:35,520 Let's run this. 105 00:08:39,910 --> 00:08:44,510 Here is the confusion matrix for 0.3 as our boundary condition.
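The row and column layout described above can be checked on a tiny example. The ten labels below are made up for illustration and are not the lecture's data; the import and the argument order (actual values first, predicted values second) follow what the lecture describes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 0 = not sold, 1 = sold.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 1, 1, 1])

# Rows are actual classes (0, then 1); columns are predicted classes (0, then 1).
cm = confusion_matrix(y_true, y_pred)

# Flattening row by row gives TN, FP, FN, TP for a binary problem.
tn, fp, fn, tp = cm.ravel()

print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

Reading the matrix this way, the top-right cell holds the false positives and the bottom-left cell holds the false negatives, which matches the walk-through of 81 and 77 in the lecture.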
106 00:08:47,540 --> 00:08:54,530 Now, since we are using 0.3 as the boundary condition as compared to 0.5 here, 107 00:08:54,740 --> 00:09:02,510 that means we are categorizing more values in the one category, since we are lowering our probability 108 00:09:02,510 --> 00:09:03,080 threshold. 109 00:09:03,980 --> 00:09:10,340 So, for example, if for some record the probability is 0.4, and we are using the 0.5 110 00:09:10,340 --> 00:09:14,030 threshold, we are categorizing it as not sold. 111 00:09:14,570 --> 00:09:17,960 But in this case we are categorizing it as sold. 112 00:09:18,740 --> 00:09:23,330 That's why the numbers here are inflated in the second column. 113 00:09:23,810 --> 00:09:26,310 The numbers of predicted ones are inflated. 114 00:09:27,740 --> 00:09:38,000 So you can see our false positive values have increased from 81 to 154 and the false negative values 115 00:09:38,390 --> 00:09:40,880 have decreased from 77 to 17. 116 00:09:42,380 --> 00:09:49,880 So if you have different costs associated with your false positives and false negatives, you can change 117 00:09:49,910 --> 00:09:55,910 this threshold level to change the distribution of these values in these two categories.
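The trade-off described above can be worked through with the counts read out in the lecture. This is a sketch under stated assumptions: the 0.5 matrix uses the four quoted numbers; for 0.3 the lecture quotes only the false positives (154) and false negatives (17), so the other two cells are inferred by assuming both matrices come from the same records and therefore share row totals. The per-error costs are hypothetical and chosen only to illustrate the idea.

```python
import numpy as np

# Confusion matrix at the 0.5 boundary, as read out in the lecture
# (rows = actual 0/1, columns = predicted 0/1).
cm_05 = np.array([[195, 81],
                  [77, 153]])

# At the 0.3 boundary only FP = 154 and FN = 17 are quoted.
# TN and TP are inferred, assuming the row totals are unchanged.
actual_totals = cm_05.sum(axis=1)  # actual not sold, actual sold
fp_03, fn_03 = 154, 17
cm_03 = np.array([[actual_totals[0] - fp_03, fp_03],
                  [fn_03, actual_totals[1] - fn_03]])


def accuracy(cm):
    """Share of records on the diagonal (correct predictions)."""
    return np.trace(cm) / cm.sum()


def total_cost(cm, cost_fp, cost_fn):
    """Weighted error count for hypothetical per-error costs."""
    return cost_fp * cm[0, 1] + cost_fn * cm[1, 0]


# Hypothetical costs: a false negative is five times as costly as a false positive.
cost_05 = total_cost(cm_05, cost_fp=1, cost_fn=5)
cost_03 = total_cost(cm_03, cost_fp=1, cost_fn=5)

print(cm_03)
print(round(accuracy(cm_05), 4), round(accuracy(cm_03), 4))
print(cost_05, cost_03)
```

Under these assumed costs the 0.3 threshold gives the lower total cost even though its accuracy is slightly lower, which is the situation in which moving the threshold is worthwhile.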