1
00:00:02,210 --> 00:00:08,130
In the previous session, we learned about machine learning concepts.

2
00:00:08,570 --> 00:00:14,480
We also saw the different types of machine learning, classification and regression.

3
00:00:15,510 --> 00:00:24,660
In this session, we will see how to measure the accuracy of a classification algorithm and a regression

4
00:00:24,660 --> 00:00:25,170
algorithm.

5
00:00:25,650 --> 00:00:27,670
OK, what is accuracy?

6
00:00:28,410 --> 00:00:31,470
I am predicting someone will get a disease.

7
00:00:32,040 --> 00:00:36,600
Whether that person actually gets the disease or not is a measure of accuracy.

8
00:00:36,600 --> 00:00:36,940
Right.

9
00:00:37,440 --> 00:00:42,760
So I compare what I predicted with what actually happened, right?

10
00:00:43,500 --> 00:00:46,890
That tells me about the level of accuracy.

11
00:00:47,490 --> 00:00:50,990
We are going to measure that as we are developing the model itself.

12
00:00:51,300 --> 00:00:57,720
So we get an idea of the kind of accuracy we can expect in our machine learning model based

13
00:00:57,720 --> 00:00:58,740
on historical data.

14
00:00:59,970 --> 00:01:04,440
OK, so how will you measure accuracy in a regression problem?

15
00:01:04,950 --> 00:01:07,530
We use what is called R-squared.

16
00:01:07,890 --> 00:01:11,430
OK, R-squared is the coefficient of determination.

17
00:01:12,370 --> 00:01:16,020
It is actually a ratio; we are going to see that very shortly.

18
00:01:16,540 --> 00:01:19,740
We also use one more metric.

19
00:01:19,750 --> 00:01:21,670
It is known as mean absolute error.

20
00:01:22,300 --> 00:01:31,270
While R-squared is a ratio built from the difference between actual versus predicted values, mean absolute error

21
00:01:31,630 --> 00:01:36,010
tells the quantum of difference between actual versus predicted.

22
00:01:37,060 --> 00:01:43,210
So we will use both the coefficient of determination and the mean absolute error to assess how good the model is.
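The two regression metrics introduced above, R-squared and mean absolute error, can be sketched from first principles in Python; the actual and predicted values below are invented purely for illustration:

```python
# R-squared and mean absolute error, computed from first principles.
# The data points below are made up for illustration only.

actual    = [3.0, 5.0, 7.0, 9.0]
predicted = [2.8, 5.3, 6.9, 9.4]

mean_actual = sum(actual) / len(actual)

# Residual sum of squares: actual vs. predicted
rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
# Total sum of squares: actual vs. the mean of the actuals
tss = sum((a - mean_actual) ** 2 for a in actual)

r_squared = 1 - rss / tss

# Mean absolute error: average size of the prediction errors,
# in the same units as the target itself
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

print(round(r_squared, 3), round(mae, 3))
```

Here R-squared is the share of the variation in the actuals that the predictions account for (a ratio), while mean absolute error reports the average magnitude of the errors (a quantum of difference), matching the distinction drawn in the lecture.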
23
00:01:43,600 --> 00:01:48,130
How good the fitment is, OK; that is in a regression problem.

24
00:01:48,430 --> 00:01:55,550
In a classification problem, we look at accuracy from the perspective of true positives and true negatives.

25
00:01:55,930 --> 00:02:02,140
And we compare that with all the possibilities that can happen, that is, true positive, true negative, false

26
00:02:02,140 --> 00:02:03,550
positive and false negative.

27
00:02:04,150 --> 00:02:04,600
Right.

28
00:02:05,470 --> 00:02:12,190
Another metric that we use in a classification problem is what is known as AUC, the area under the

29
00:02:12,190 --> 00:02:12,500
curve.

30
00:02:13,180 --> 00:02:17,850
This provides a range of values between zero and one, OK?

31
00:02:18,250 --> 00:02:24,160
It also uses all the four possibilities and provides a value between zero and one.

32
00:02:24,620 --> 00:02:28,480
If the AUC is closer to one, it means the model has a higher accuracy.

33
00:02:28,600 --> 00:02:31,890
If it is closer to zero, it means the accuracy is low.

34
00:02:32,710 --> 00:02:33,000
Right.

35
00:02:33,340 --> 00:02:34,270
So now let's see

36
00:02:34,270 --> 00:02:37,600
this R-squared and AUC a bit more in detail.

37
00:02:38,080 --> 00:02:42,650
OK, R-squared, as I mentioned, is a ratio, OK.

38
00:02:42,880 --> 00:02:47,340
It compares the residual sum of squares and the total sum of squares.

39
00:02:47,920 --> 00:02:51,560
So the y-hat that you see here is actually the predicted value.

40
00:02:51,910 --> 00:02:58,790
OK, the residual sum of squares actually is the difference between actual versus predicted values.

41
00:02:59,410 --> 00:03:04,570
The explained sum of squares is predicted versus the average.

42
00:03:04,840 --> 00:03:09,520
OK, so both are considered in the ratio using this formula.

43
00:03:10,060 --> 00:03:13,000
OK, I will have what is known as R-squared.

44
00:03:14,070 --> 00:03:20,160
If R-squared is closer to one hundred percent, or one, that means, you know, it's a great fitment.
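The formula the lecture points to on the slide is not visible in the transcript; a standard reconstruction of the coefficient of determination, using y for the actuals, y-hat for the predictions, and y-bar for the mean, is:

```latex
% R-squared as one minus the ratio of residual to total sum of squares
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}},
\qquad
SS_{\mathrm{res}} = \sum_i (y_i - \hat{y}_i)^2,
\qquad
SS_{\mathrm{tot}} = \sum_i (y_i - \bar{y})^2
```

The explained sum of squares the lecture mentions is SS_exp = Σᵢ(ŷᵢ − ȳ)², predicted versus the average; for an ordinary least-squares fit, R² also equals SS_exp / SS_tot. Note that when SS_res exceeds SS_tot, meaning the model predicts worse than simply using the average, this ratio pushes R-squared below zero.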
45
00:03:20,730 --> 00:03:24,780
If it is 80 percent, the fitment is good; at 40 percent,

46
00:03:24,960 --> 00:03:27,420
OK, I will say it is just below average.

47
00:03:28,290 --> 00:03:31,840
And a value of zero means there is no correlation, OK.

48
00:03:32,670 --> 00:03:35,400
Please note that R-squared can be negative also.

49
00:03:35,760 --> 00:03:36,570
That means

50
00:03:37,760 --> 00:03:44,870
the model is doing worse than simply predicting the average; you can get a negative R-squared

51
00:03:44,870 --> 00:03:50,800
when the data has a negative relationship the model did not expect, like somebody studies for more hours

52
00:03:51,320 --> 00:03:54,820
and, unfortunately, that student scores fewer marks.

53
00:03:54,890 --> 00:03:59,870
That is, the more hours a student studies, the fewer marks the student gets.

54
00:04:00,800 --> 00:04:03,510
That is a case of negative correlation, right?

55
00:04:03,890 --> 00:04:07,220
We don't want that, but we do have such scenarios in real life.

56
00:04:08,330 --> 00:04:08,720
OK.

57
00:04:09,970 --> 00:04:16,960
Now, let's see this AUC, OK, area under the curve. We use what is known as a confusion matrix; this is

58
00:04:16,960 --> 00:04:24,130
nothing but the matrix of predicted versus actual, that is, a tabulation of true positive, true negative, false

59
00:04:24,130 --> 00:04:25,450
positive and false negative.

60
00:04:25,720 --> 00:04:33,160
OK, we have already seen the explanation of false positive and false negative in the hypothesis

61
00:04:33,760 --> 00:04:34,220
session.

62
00:04:34,690 --> 00:04:40,600
OK, so the confusion matrix is used to construct the area under the curve.

63
00:04:40,950 --> 00:04:46,450
OK, as I mentioned earlier, an AUC closer to one means a high accuracy rate.
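The confusion matrix described above is just a tally of the four possible outcomes; a minimal sketch in Python, with made-up labels (1 = positive, 0 = negative):

```python
# Tallying a confusion matrix from predicted vs. actual labels.
# Labels are 1 (positive) and 0 (negative); the data is invented.

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

# Accuracy: correct predictions over all four possibilities combined
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn, accuracy)
```

The two diagonal cells (true positives and true negatives) are the correct predictions; the other two cells are the mistakes, which is why accuracy compares the diagonal against the full tally.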
64
00:04:47,050 --> 00:04:55,990
And the graph that you see here graphically shows how the area under the curve corresponds to a comparison

65
00:04:55,990 --> 00:04:58,600
between true positives and true negatives, right?

66
00:04:58,760 --> 00:05:01,600
As you can see here, the red is true negative.

67
00:05:01,630 --> 00:05:02,790
Green is true positive.

68
00:05:03,430 --> 00:05:09,280
If the area under the curve is closer to one, the overlap between the true positives and negatives is

69
00:05:09,280 --> 00:05:09,610
low.

70
00:05:09,970 --> 00:05:17,770
OK, as the area under the curve comes down, the overlap increases, which means the accuracy also comes

71
00:05:17,770 --> 00:05:17,980
down.

72
00:05:19,010 --> 00:05:19,410
Right.

73
00:05:20,360 --> 00:05:26,780
So, are you understanding it? If you have a regression problem, you will use R-

74
00:05:28,500 --> 00:05:34,770
squared; if it is a classification problem, you will use the area under the curve, right?

75
00:05:35,660 --> 00:05:39,530
And you will use these values, right?

76
00:05:39,560 --> 00:05:41,890
Is it closer to one or closer to zero?

77
00:05:42,140 --> 00:05:47,030
And with that you will determine how good the fit is, how good the accuracy is.
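One way to see why AUC lies between zero and one: it equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A sketch with invented scores follows; this rank-comparison method is one standard way to compute AUC, not necessarily what the lecture's slides used:

```python
# Area under the ROC curve, computed as the probability that a randomly
# chosen positive scores higher than a randomly chosen negative.
# Scores and labels below are made up for illustration.

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]

wins = 0.0
for p in pos:
    for n in neg:
        if p > n:
            wins += 1      # positive correctly ranked above negative
        elif p == n:
            wins += 0.5    # ties count half

auc = wins / (len(pos) * len(neg))
print(auc)
```

When the score distributions of the two classes overlap heavily (the red and green regions in the lecture's graph), more of these pairwise comparisons go the wrong way and the AUC drops toward 0.5; with no overlap at all, every comparison is correct and the AUC is 1.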