1
00:00:02,210 --> 00:00:08,130
In the previous session, we learned about machine learning concepts.

2
00:00:08,570 --> 00:00:14,480
We also saw the different types of machine learning, classification and regression.

3
00:00:15,510 --> 00:00:24,660
In this session, we will see how to measure the accuracy of a classification algorithm and a regression

4
00:00:24,660 --> 00:00:25,170
algorithm.

5
00:00:25,650 --> 00:00:27,670
OK, what is accuracy?

6
00:00:28,410 --> 00:00:31,470
I am predicting someone will get a disease.

7
00:00:32,040 --> 00:00:36,600
Whether that person actually gets the disease or not is a measure of accuracy.

8
00:00:36,600 --> 00:00:36,940
Right.

9
00:00:37,440 --> 00:00:42,760
So I compare what I predicted with what actually happened, right?

10
00:00:43,500 --> 00:00:46,890
That tells me about the level of accuracy.

11
00:00:47,490 --> 00:00:50,990
We are going to measure that as we are developing the model itself.

12
00:00:51,300 --> 00:00:57,720
So we get an idea of the kind of accuracy we can expect in our machine learning model based

13
00:00:57,720 --> 00:00:58,740
on historical data.

14
00:00:59,970 --> 00:01:04,440
OK, so how will you measure accuracy in a regression problem?

15
00:01:04,950 --> 00:01:07,530
We use what is called R-squared.

16
00:01:07,890 --> 00:01:11,430
OK, R-squared is the coefficient of determination.

17
00:01:12,370 --> 00:01:16,020
It is actually a ratio; we are going to see that very shortly.

18
00:01:16,540 --> 00:01:19,740
We also use one more metric.

19
00:01:19,750 --> 00:01:21,670
It is known as mean absolute error.

20
00:01:22,300 --> 00:01:31,270
While R-squared is a ratio built from the difference between actual versus predicted values, mean absolute error

21
00:01:31,630 --> 00:01:36,010
tells the quantum of difference between actual versus predicted.

22
00:01:37,060 --> 00:01:43,210
So we will use both the coefficient of determination and the mean absolute error to assess how good the model is.
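The two regression metrics introduced above, R-squared and mean absolute error, can be sketched from first principles in Python; the actual and predicted values below are invented purely for illustration:

```python
# R-squared and mean absolute error, computed from first principles.
# The data points below are made up for illustration only.

actual    = [3.0, 5.0, 7.0, 9.0]
predicted = [2.8, 5.3, 6.9, 9.4]

mean_actual = sum(actual) / len(actual)

# Residual sum of squares: actual vs. predicted
rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
# Total sum of squares: actual vs. the mean of the actuals
tss = sum((a - mean_actual) ** 2 for a in actual)

r_squared = 1 - rss / tss

# Mean absolute error: average size of the prediction errors,
# in the same units as the target itself
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

print(round(r_squared, 3), round(mae, 3))
```

Here R-squared is the share of the variation in the actuals that the predictions account for (a ratio), while mean absolute error reports the average magnitude of the errors (a quantum of difference), matching the distinction drawn in the lecture.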
23
00:01:43,600 --> 00:01:48,130
How good the fitment is, OK; that is in a regression problem.

24
00:01:48,430 --> 00:01:55,550
In a classification problem, we look at accuracy from the perspective of true positives and true negatives.

25
00:01:55,930 --> 00:02:02,140
And we compare that with all the possibilities that can happen, that is, true positive, true negative, false

26
00:02:02,140 --> 00:02:03,550
positive and false negative.

27
00:02:04,150 --> 00:02:04,600
Right.

28
00:02:05,470 --> 00:02:12,190
Another metric that we use in a classification problem is what is known as AUC, the area under the

29
00:02:12,190 --> 00:02:12,500
curve.

30
00:02:13,180 --> 00:02:17,850
This provides a range of values between zero and one, OK?

31
00:02:18,250 --> 00:02:24,160
It also uses all the four possibilities and provides a value between zero and one.

32
00:02:24,620 --> 00:02:28,480
If the AUC is closer to one, it means the model has a higher accuracy.

33
00:02:28,600 --> 00:02:31,890
If it is closer to zero, it means the accuracy is low.

34
00:02:32,710 --> 00:02:33,000
Right.

35
00:02:33,340 --> 00:02:34,270
So now let's see

36
00:02:34,270 --> 00:02:37,600
this R-squared and AUC a bit more in detail.

37
00:02:38,080 --> 00:02:42,650
OK, R-squared, as I mentioned, is a ratio, OK.

38
00:02:42,880 --> 00:02:47,340
It compares the residual sum of squares and the total sum of squares.

39
00:02:47,920 --> 00:02:51,560
So the y-hat that you see here is actually the predicted value.

40
00:02:51,910 --> 00:02:58,790
OK, the residual sum of squares actually is the difference between actual versus predicted values.

41
00:02:59,410 --> 00:03:04,570
The explained sum of squares is predicted versus the average.

42
00:03:04,840 --> 00:03:09,520
OK, so both are considered in the ratio using this formula.

43
00:03:10,060 --> 00:03:13,000
OK, I will have what is known as R-squared.

44
00:03:14,070 --> 00:03:20,160
If R-squared is closer to one hundred percent, or one, that means, you know, it's a great fitment.
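The formula the lecture points to on the slide is not visible in the transcript; a standard reconstruction of the coefficient of determination, using y for the actuals, y-hat for the predictions, and y-bar for the mean, is:

```latex
% R-squared as one minus the ratio of residual to total sum of squares
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}},
\qquad
SS_{\mathrm{res}} = \sum_i (y_i - \hat{y}_i)^2,
\qquad
SS_{\mathrm{tot}} = \sum_i (y_i - \bar{y})^2
```

The explained sum of squares the lecture mentions is SS_exp = Σᵢ(ŷᵢ − ȳ)², predicted versus the average; for an ordinary least-squares fit, R² also equals SS_exp / SS_tot. Note that when SS_res exceeds SS_tot, meaning the model predicts worse than simply using the average, this ratio pushes R-squared below zero.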
45
00:03:20,730 --> 00:03:24,780
If it is 80 percent, the fitment is good; at 40 percent,

46
00:03:24,960 --> 00:03:27,420
OK, I will say it is just below average.

47
00:03:28,290 --> 00:03:31,840
And a value of zero means there is no correlation, OK.

48
00:03:32,670 --> 00:03:35,400
Please note that R-squared can be negative also.

49
00:03:35,760 --> 00:03:36,570
That means

50
00:03:37,760 --> 00:03:44,870
the model is doing worse than simply predicting the average; you can get a negative R-squared

51
00:03:44,870 --> 00:03:50,800
when the data has a negative relationship the model did not expect, like somebody studies for more hours

52
00:03:51,320 --> 00:03:54,820
and, unfortunately, that student scores fewer marks.

53
00:03:54,890 --> 00:03:59,870
That is, the more hours a student studies, the fewer marks the student gets.

54
00:04:00,800 --> 00:04:03,510
That is a case of negative correlation, right?

55
00:04:03,890 --> 00:04:07,220
We don't want that, but we do have such scenarios in real life.

56
00:04:08,330 --> 00:04:08,720
OK.

57
00:04:09,970 --> 00:04:16,960
Now, let's see this AUC, OK, area under the curve. We use what is known as a confusion matrix; this is

58
00:04:16,960 --> 00:04:24,130
nothing but the matrix of predicted versus actual, that is, a tabulation of true positive, true negative, false

59
00:04:24,130 --> 00:04:25,450
positive and false negative.

60
00:04:25,720 --> 00:04:33,160
OK, we have already seen the explanation of false positive and false negative in the hypothesis

61
00:04:33,760 --> 00:04:34,220
session.

62
00:04:34,690 --> 00:04:40,600
OK, so the confusion matrix is used to construct the area under the curve.

63
00:04:40,950 --> 00:04:46,450
OK, as I mentioned earlier, an AUC closer to one means a high accuracy rate.
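The confusion matrix described above is just a tally of the four possible outcomes; a minimal sketch in Python, with made-up labels (1 = positive, 0 = negative):

```python
# Tallying a confusion matrix from predicted vs. actual labels.
# Labels are 1 (positive) and 0 (negative); the data is invented.

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

# Accuracy: correct predictions over all four possibilities combined
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn, accuracy)
```

The two diagonal cells (true positives and true negatives) are the correct predictions; the other two cells are the mistakes, which is why accuracy compares the diagonal against the full tally.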
64
00:04:47,050 --> 00:04:55,990
And the graph that you see here graphically shows how the area under the curve corresponds to a comparison

65
00:04:55,990 --> 00:04:58,600
between true positives and true negatives, right?

66
00:04:58,760 --> 00:05:01,600
As you can see here, the red is true negative.

67
00:05:01,630 --> 00:05:02,790
Green is true positive.

68
00:05:03,430 --> 00:05:09,280
If the area under the curve is closer to one, the overlap between the true positives and negatives is

69
00:05:09,280 --> 00:05:09,610
low.

70
00:05:09,970 --> 00:05:17,770
OK, as the area under the curve comes down, the overlap increases, which means the accuracy also comes

71
00:05:17,770 --> 00:05:17,980
down.

72
00:05:19,010 --> 00:05:19,410
Right.

73
00:05:20,360 --> 00:05:26,780
So, are you understanding it? If you have a regression problem, you will use R-

74
00:05:28,500 --> 00:05:34,770
squared; if it is a classification problem, you will use the area under the curve, right?

75
00:05:35,660 --> 00:05:39,530
And you will use these values, right?

76
00:05:39,560 --> 00:05:41,890
Is it closer to one or closer to zero?

77
00:05:42,140 --> 00:05:47,030
And with that you will determine how good the fit is, how good the accuracy is.
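One way to see why AUC lies between zero and one: it equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A sketch with invented scores follows; this rank-comparison method is one standard way to compute AUC, not necessarily what the lecture's slides used:

```python
# Area under the ROC curve, computed as the probability that a randomly
# chosen positive scores higher than a randomly chosen negative.
# Scores and labels below are made up for illustration.

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]

wins = 0.0
for p in pos:
    for n in neg:
        if p > n:
            wins += 1      # positive correctly ranked above negative
        elif p == n:
            wins += 0.5    # ties count half

auc = wins / (len(pos) * len(neg))
print(auc)
```

When the score distributions of the two classes overlap heavily (the red and green regions in the lecture's graph), more of these pairwise comparisons go the wrong way and the AUC drops toward 0.5; with no overlap at all, every comparison is correct and the AUC is 1.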