1 00:00:02,610 --> 00:00:08,970 In this session on bivariate analysis, we are particularly interested in finding the relationship between 2 00:00:09,510 --> 00:00:14,280 two independent variables. In the previous sessions I talked about 3 00:00:16,570 --> 00:00:21,100 a dependent variable being dependent on many independent variables. 4 00:00:22,320 --> 00:00:28,160 Please note what I just said: I'm interested in finding the relationship between two independent variables 5 00:00:28,710 --> 00:00:29,430 themselves. 6 00:00:30,900 --> 00:00:38,850 If there is a relationship between two independent variables, I can predict one independent variable 7 00:00:38,850 --> 00:00:39,930 from another independent 8 00:00:39,930 --> 00:00:40,220 variable. 9 00:00:41,310 --> 00:00:46,110 Ideally, I don't want this kind of relationship among my variables 10 00:00:48,160 --> 00:00:50,660 as far as developing machine learning models is concerned. 11 00:00:52,930 --> 00:00:59,290 This is what is known as multicollinearity, that is, one independent variable can be predicted from another 12 00:00:59,290 --> 00:01:00,850 independent variable itself. 13 00:01:01,860 --> 00:01:08,040 OK, so let's see all the techniques that are available in bivariate analysis. 14 00:01:09,910 --> 00:01:15,760 So bivariate analysis helps find the relationship between two variables, right? 15 00:01:16,680 --> 00:01:22,360 I know I said we are more interested in finding the relation between two independent variables, but 16 00:01:22,360 --> 00:01:23,710 bivariate analysis, 17 00:01:24,010 --> 00:01:29,170 broadly speaking, can be used to find out the relationship between any two variables. 18 00:01:30,070 --> 00:01:30,400 Right. 19 00:01:30,940 --> 00:01:35,920 So we are looking at both association and disassociation, meaning there is a relationship or there 20 00:01:35,920 --> 00:01:40,840 is no relationship. There can be a positive relationship, there can be a negative relationship.
21 00:01:41,860 --> 00:01:44,190 The possibilities are endless. 22 00:01:45,100 --> 00:01:51,990 OK, and bivariate analysis can be performed for any kind of combination of variables. 23 00:01:52,000 --> 00:01:59,320 That is, one variable could be continuous and another variable could be categorical, or both the variables 24 00:01:59,320 --> 00:02:00,370 could be categorical. 25 00:02:00,370 --> 00:02:02,250 Both variables could be continuous. 26 00:02:02,320 --> 00:02:06,640 There are different types of techniques for different types of combinations. 27 00:02:06,670 --> 00:02:08,920 We are going to see all of them in this session. 28 00:02:09,670 --> 00:02:14,710 OK, we will first start with the relationship between two continuous variables. 29 00:02:14,710 --> 00:02:16,840 That is, two numbers, so to speak. 30 00:02:17,320 --> 00:02:19,000 Remember, I gave the example, right? 31 00:02:19,000 --> 00:02:22,510 Number of marks versus the number of hours I study. 32 00:02:22,780 --> 00:02:25,690 That is an example, right? Here, 33 00:02:25,700 --> 00:02:31,170 the primary technique that we'll be looking at is what is known as a scatterplot, OK? 34 00:02:31,510 --> 00:02:38,290 It graphically shows whether there is a relationship between two variables. You can 35 00:02:38,300 --> 00:02:41,030 visually see it. As I mentioned, it can be a positive 36 00:02:41,050 --> 00:02:43,060 relationship, it can be a negative relationship. 37 00:02:43,300 --> 00:02:49,030 There can be a relationship, there may not be a relationship, and the relationship can be linear, in which case 38 00:02:49,360 --> 00:02:51,400 a straight line can be fitted, 39 00:02:51,950 --> 00:02:56,730 OK, what is called the line of best fit, OK? 40 00:02:58,510 --> 00:03:05,920 Or it can be a non-linear relationship, like a polynomial kind of relationship, OK, or a logistic 41 00:03:05,920 --> 00:03:06,550 relationship.
42 00:03:07,510 --> 00:03:15,790 So the scatterplot shows the relationship between two variables, but it doesn't tell you the strength of 43 00:03:15,790 --> 00:03:16,540 the relationship. 44 00:03:17,370 --> 00:03:21,930 OK, so for this, we use what is known as the coefficient of correlation. 45 00:03:23,740 --> 00:03:32,500 It derives from the scatterplot, because to the scatterplot we fit what is known as the line of best fit, OK? 46 00:03:33,070 --> 00:03:39,550 Using the line of best fit, you can find out the extent of correlation between two continuous 47 00:03:39,550 --> 00:03:41,310 variables. For this, 48 00:03:41,320 --> 00:03:44,140 as I said, we use the coefficient of correlation. 49 00:03:44,530 --> 00:03:47,200 The correlation varies between minus one and plus one. 50 00:03:47,510 --> 00:03:49,180 OK, so: 51 00:03:51,420 --> 00:03:58,540 minus one indicates perfect negative linear correlation, plus one indicates perfect positive linear correlation, 52 00:03:58,560 --> 00:04:00,270 and zero indicates no correlation. 53 00:04:00,480 --> 00:04:01,480 This is minus one. 54 00:04:01,500 --> 00:04:01,890 OK. 55 00:04:03,390 --> 00:04:09,270 The point is, many times, you know, you won't get exactly minus one or plus one; you will get values 56 00:04:09,270 --> 00:04:10,200 that are closer to one. 57 00:04:10,530 --> 00:04:15,390 So if it is, say, point nine six, or if it is closer to minus one, 58 00:04:15,390 --> 00:04:19,930 it means it's nearly a perfect linear correlation. 59 00:04:19,950 --> 00:04:24,200 You can even get a value of point six. 60 00:04:24,210 --> 00:04:27,210 That means positive correlation to some extent. 61 00:04:27,930 --> 00:04:33,150 So those are the judgments, you know, you need to make based on the coefficient of correlation. 62 00:04:34,410 --> 00:04:34,740 Right. 63 00:04:35,130 --> 00:04:37,160 So this is the scatterplot, right?
64 00:04:37,440 --> 00:04:43,590 In this example, I have drawn this kind of plot for the relationship between applicant income and loan 65 00:04:43,590 --> 00:04:44,010 amount. 66 00:04:44,670 --> 00:04:47,910 Remember, these are two different variables. 67 00:04:50,350 --> 00:04:57,460 We can find that there is some relation between the two, right? In the other scenario, applicant 68 00:04:57,460 --> 00:05:00,440 income and coapplicant income are being compared. 69 00:05:00,880 --> 00:05:07,830 But see, the scatterplot there is scattered all along; quite literally, it is scattered all around. 70 00:05:07,840 --> 00:05:13,690 That means there is no relationship. You can plot this: whenever you have two variables and you 71 00:05:13,690 --> 00:05:15,990 want to find the relationship, immediately 72 00:05:16,000 --> 00:05:17,810 plot a scatterplot, right? 73 00:05:17,860 --> 00:05:22,660 It is a very, very simple tool to ascertain the relationship between any two variables. 74 00:05:23,410 --> 00:05:26,420 And you can visually see if there is any relationship or not. 75 00:05:27,360 --> 00:05:34,030 You can also see whether it is a case of a linear relationship or a case of a polynomial or other type of 76 00:05:34,030 --> 00:05:34,650 relationship. 77 00:05:34,870 --> 00:05:40,870 All those things can be seen here, you know, using this scatterplot. 78 00:05:41,650 --> 00:05:42,030 Right. 79 00:05:42,280 --> 00:05:47,290 And as I discussed earlier, I use the coefficient of correlation. 80 00:05:47,530 --> 00:05:49,660 That is the extent of correlation. 81 00:05:49,660 --> 00:05:56,620 That is, the numerical measure of correlation is the correlation coefficient.
82 00:05:56,680 --> 00:06:02,230 In this case, we are taking applicant income and loan amount, and we are getting a correlation 83 00:06:02,230 --> 00:06:09,430 coefficient of around point five, which means there is a moderate relationship, a moderate positive relationship. 84 00:06:09,970 --> 00:06:12,590 Between applicant income and coapplicant income, 85 00:06:12,610 --> 00:06:16,840 it is a negative correlation, but it is a weak correlation. 86 00:06:16,870 --> 00:06:21,510 Please understand this is closer to zero, right, than to minus one or plus one. 87 00:06:22,330 --> 00:06:28,670 If you look at the relationship between applicant income and loan term, it is in fact closer to zero: 88 00:06:28,680 --> 00:06:30,780 minus point zero four. 89 00:06:30,810 --> 00:06:34,450 That is a very weak relationship between the two variables. 90 00:06:35,490 --> 00:06:37,260 Are you getting what I'm trying to say? 91 00:06:37,270 --> 00:06:37,650 Right. 92 00:06:37,960 --> 00:06:40,920 The scatterplot helps us to visually see the relationship. 93 00:06:40,930 --> 00:06:41,300 Right. 94 00:06:41,710 --> 00:06:46,870 Whether there is a relationship or not. The coefficient of correlation gives us a numerical measure 95 00:06:47,140 --> 00:06:49,540 to find out the extent of correlation 96 00:06:49,540 --> 00:06:55,010 that is there between two variables. I can use this concept to determine whether there is any relationship 97 00:06:55,030 --> 00:06:57,000 and decide my course of action. 98 00:06:58,430 --> 00:07:02,010 OK, what is the course of action? 99 00:07:02,020 --> 00:07:07,270 Ideally, I don't want correlation between two independent variables. 100 00:07:07,270 --> 00:07:09,010 I will remove such variables. 101 00:07:09,850 --> 00:07:11,500 Right, so that is the primary action.
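The coefficient of correlation described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code used in the session; the function name `pearson_r` and the hours/marks numbers are made up for the example, and it assumes neither column is constant.

```python
import math

def pearson_r(x, y):
    """Pearson coefficient of correlation for two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied versus marks obtained.
hours_studied = [1, 2, 3, 4, 5, 6]
marks = [35, 42, 50, 61, 68, 80]

r = pearson_r(hours_studied, marks)
print(f"coefficient of correlation: {r:.3f}")
```

A value near plus one or minus one suggests a strong linear relationship and a value near zero suggests little or none. Library routines such as `scipy.stats.pearsonr` or the pandas `DataFrame.corr` method compute the same quantity on real data.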
102 00:07:12,690 --> 00:07:21,540 Now, the other type is categorical and categorical. If the two variables, both of them, are categorical 103 00:07:21,540 --> 00:07:25,340 in nature, we use what is known as a chi-square test, right? 104 00:07:25,770 --> 00:07:33,090 It is used to derive the statistical significance of the relationship between two categorical variables, 105 00:07:33,710 --> 00:07:33,900 right. 106 00:07:34,320 --> 00:07:41,870 It is based on the difference between expected and observed frequencies in one or more categories. 107 00:07:41,880 --> 00:07:42,080 Right. 108 00:07:42,220 --> 00:07:45,660 So a categorical variable can have multiple levels, or categories. 109 00:07:45,660 --> 00:07:45,930 Right. 110 00:07:46,140 --> 00:07:51,400 We create what is known as a two-way table of expected and observed frequencies. 111 00:07:51,570 --> 00:07:51,880 Right. 112 00:07:52,170 --> 00:07:57,120 It basically returns the probability for the chi-square distribution. 113 00:07:59,620 --> 00:08:03,420 It basically returns the probability. If the probability is zero, 114 00:08:03,940 --> 00:08:11,790 it means that both the categorical variables are dependent. A probability of one shows that 115 00:08:12,380 --> 00:08:14,900 both are independent, right? 116 00:08:15,370 --> 00:08:17,470 So it is either closer to zero or 117 00:08:17,470 --> 00:08:18,240 closer to one. 118 00:08:18,580 --> 00:08:26,990 But we are looking at a 95 percent confidence level, which means that we will take a p-value, 119 00:08:27,040 --> 00:08:31,880 a probability value, of point zero five. If it is less than point zero five, 120 00:08:32,410 --> 00:08:35,830 it means that there is a relationship between the variables. 121 00:08:36,870 --> 00:08:37,280 Right. 122 00:08:39,200 --> 00:08:41,750 Please note, this is what we will be using for
123 00:08:42,870 --> 00:08:50,850 finding out the relationship between two categorical variables. The chi-square test is what you're going 124 00:08:50,850 --> 00:09:00,600 to use. To understand this, OK, we'll be using the concept of hypothesis testing. 125 00:09:01,350 --> 00:09:08,580 I frequently talked about a dependent variable and an independent variable, right, the high school mathematics 126 00:09:08,580 --> 00:09:09,000 concept. 127 00:09:09,660 --> 00:09:15,590 This is also something that we learned in our high school and probably college. 128 00:09:15,780 --> 00:09:16,160 Right. 129 00:09:16,590 --> 00:09:28,410 So hypothesis testing is the concept where I'm trying to derive an inference about a population based 130 00:09:28,410 --> 00:09:31,280 on the information I have about a sample. 131 00:09:31,590 --> 00:09:37,520 Remember the population versus sample concept that we discussed in the earlier session. 132 00:09:37,720 --> 00:09:45,630 Say I am doing an opinion poll by taking feedback from some of the groups. 133 00:09:46,020 --> 00:09:46,580 Right. 134 00:09:46,890 --> 00:09:49,250 From some of the people who are eligible to vote. 135 00:09:49,680 --> 00:09:50,030 Right. 136 00:09:50,520 --> 00:09:55,590 I'm not going and taking feedback from all of the people who are eligible to vote. 137 00:09:55,590 --> 00:10:00,660 I'm only taking feedback from some of the people, and from that I'm inferring that a particular 138 00:10:00,660 --> 00:10:02,960 party is expected to win the elections. 139 00:10:02,980 --> 00:10:03,290 Right. 140 00:10:03,720 --> 00:10:10,320 So how can I conclude that the responses of a few people are 141 00:10:11,690 --> 00:10:16,790 good enough to determine the outcome of an election in which more people are involved?
142 00:10:17,150 --> 00:10:24,470 So that is where the concept of hypothesis testing comes in. I am trying to draw inferences 143 00:10:24,470 --> 00:10:31,790 about the population, based on the results of 144 00:10:32,150 --> 00:10:34,400 a sample. Based on the sample, 145 00:10:34,400 --> 00:10:37,420 I talk about the entire population. For this, 146 00:10:37,430 --> 00:10:40,310 I'm making use of the concept of hypothesis testing. 147 00:10:40,450 --> 00:10:40,700 Right. 148 00:10:41,270 --> 00:10:45,860 So if you see the photo that is there, I apply fertiliser and the plant grows. 149 00:10:46,280 --> 00:10:46,610 Right. 150 00:10:46,880 --> 00:10:48,590 So what is the hypothesis? 151 00:10:48,590 --> 00:10:52,670 The hypothesis is: application of fertiliser increases plant growth. 152 00:10:53,010 --> 00:10:53,250 Right. 153 00:10:53,630 --> 00:10:56,060 So that is stated as the alternate hypothesis. 154 00:10:56,060 --> 00:10:59,050 The alternate hypothesis is what we want to prove. 155 00:10:59,240 --> 00:10:59,630 Right. 156 00:10:59,990 --> 00:11:02,000 What we don't want to prove is stated as 157 00:11:02,000 --> 00:11:06,820 the null hypothesis: the application of fertiliser does not increase plant growth. 158 00:11:06,830 --> 00:11:07,040 Right. 159 00:11:07,060 --> 00:11:07,940 There is no impact. 160 00:11:09,130 --> 00:11:09,580 Right. 161 00:11:09,970 --> 00:11:15,350 So the alternate hypothesis is what we are trying to prove. 162 00:11:15,820 --> 00:11:16,910 The null hypothesis 163 00:11:16,910 --> 00:11:19,360 is the opposite of that, right? 164 00:11:19,780 --> 00:11:22,300 We actually don't try to prove the alternate hypothesis. 165 00:11:22,330 --> 00:11:27,440 Rather, we try to either accept or reject the null hypothesis. 166 00:11:27,550 --> 00:11:31,060 So that is the way the concept of hypothesis testing works. 167 00:11:31,640 --> 00:11:32,080 Right.
168 00:11:32,530 --> 00:11:39,460 So in our case, the alternate hypothesis would be that there is a relationship between the two variables. 169 00:11:39,490 --> 00:11:41,760 The null is that there is no relationship between the two. 170 00:11:41,770 --> 00:11:47,830 If the p-value is less than point zero five, 171 00:11:48,400 --> 00:11:50,120 we accept that, 172 00:11:50,980 --> 00:11:56,380 or rather, we conclude that there is a relationship between the two variables. 173 00:11:57,950 --> 00:11:58,640 Is this clear? 174 00:12:00,180 --> 00:12:01,340 Is this clear to everyone? 175 00:12:02,620 --> 00:12:04,910 Let's see some examples here. 176 00:12:04,930 --> 00:12:11,900 I'm taking two variables, right, the education level and whether the individual is self-employed, 177 00:12:12,530 --> 00:12:13,920 a yes-or-no kind of scenario. 178 00:12:14,440 --> 00:12:14,780 Right. 179 00:12:15,040 --> 00:12:23,290 So please note that if there are more than two levels, you can still use chi-square, 180 00:12:23,290 --> 00:12:25,080 not just when I have two levels, 181 00:12:25,090 --> 00:12:29,830 like education, graduate or not graduate, or self-employed, yes or no. 182 00:12:30,000 --> 00:12:38,260 If I have a variable in which more than two options are there, then also the chi-square test can be used. 183 00:12:39,370 --> 00:12:45,980 Although the example that I showed you has only two possible outcomes for each of these variables, right? 184 00:12:46,990 --> 00:12:49,420 Two possible levels here. 185 00:12:49,870 --> 00:12:56,460 The alternate hypothesis is: education and self-employed are not independent. 186 00:12:56,530 --> 00:12:59,210 Look at the way I am structuring this. 187 00:12:59,230 --> 00:13:00,850 I'm saying they are not independent. 188 00:13:01,550 --> 00:13:04,020 So I'm essentially saying they are dependent. 189 00:13:04,270 --> 00:13:08,850 The null hypothesis is: they are independent of one another. 190 00:13:08,860 --> 00:13:10,050 There is no relationship.
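As a rough sketch of the test just set up, the snippet below computes the chi-square statistic and p-value for a two-by-two table of observed counts using only the Python standard library. The function name `chi_square_2x2` and the counts are hypothetical, no continuity correction is applied, and the closed-form p-value used here is only valid for one degree of freedom (two levels per variable).

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts. Rows: graduate / not graduate.
# Columns: self-employed yes / no.
observed = [[20, 80],
            [10, 40]]

stat, p = chi_square_2x2(observed)
if p < 0.05:
    print(f"chi-square={stat:.4f}, p={p:.4f}: the variables appear related")
else:
    print(f"chi-square={stat:.4f}, p={p:.4f}: no evidence of a relationship")
```

For tables with more than two levels per variable, a library routine such as `scipy.stats.chi2_contingency` is the usual choice. Note that it applies a continuity correction to two-by-two tables by default, so its numbers can differ slightly from this sketch.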
191 00:13:10,660 --> 00:13:17,020 And the chi-square value that I'm getting here, OK, is point zero two three six. 192 00:13:17,350 --> 00:13:20,620 But I am more interested in the p-value. 193 00:13:21,940 --> 00:13:23,630 Right, that is what I'm interested in. 194 00:13:24,160 --> 00:13:27,680 So the p-value is point eight seven seven eight, right? 195 00:13:28,420 --> 00:13:29,200 That means 196 00:13:30,660 --> 00:13:32,910 the value is greater than point zero five. 197 00:13:33,450 --> 00:13:37,860 That means we are accepting the null hypothesis. 198 00:13:38,040 --> 00:13:39,690 That means there is no relationship. 199 00:13:42,600 --> 00:13:47,070 Let's see one more example, education and property. 200 00:13:48,060 --> 00:13:55,230 Right, so here also the p-value is turning out to be greater than point zero five. That means there is 201 00:13:55,230 --> 00:13:58,020 no relationship between these two variables. 202 00:13:58,530 --> 00:14:06,690 So whenever we have categorical variables, we use the concept of chi-square. 203 00:14:06,930 --> 00:14:11,640 The primary metric that we are looking at is the p-value. 204 00:14:12,030 --> 00:14:19,500 If it is greater than point zero five, we can conclude that there is no relationship between the two 205 00:14:19,500 --> 00:14:20,250 variables. 206 00:14:21,090 --> 00:14:28,440 So for this, we are using the concept of hypothesis testing, which I explained a little earlier in the session. 207 00:14:29,320 --> 00:14:36,240 Yes, now we are going to look at the relationship between categorical and continuous, right? 208 00:14:36,490 --> 00:14:40,140 So for this, we are going to use the Z-test, the t-test or ANOVA. 209 00:14:40,930 --> 00:14:48,730 Again, these are coming from statistical concepts, right? ANOVA is used whenever we are looking at 210 00:14:48,730 --> 00:14:52,910 more than two groups, when there are more than two groups. 211 00:14:52,920 --> 00:14:53,140 Right.
212 00:14:53,530 --> 00:15:00,020 Whereas if we are only looking at comparing two groups, we use the Z-test or the t-test. 213 00:15:00,070 --> 00:15:03,190 Both are, in fact, similar. 214 00:15:03,400 --> 00:15:10,540 OK, but the t-test is used whenever the number of observations is less than 30. 215 00:15:11,050 --> 00:15:11,330 Right. 216 00:15:11,900 --> 00:15:18,280 Obviously, in most of the cases, the number of observations, the number of data points, the amount of historical 217 00:15:18,280 --> 00:15:23,350 data that we are taking up for building the machine learning model, will be more than 30, so invariably 218 00:15:23,350 --> 00:15:25,150 we will be going for a Z-test. 219 00:15:25,850 --> 00:15:26,260 Right. 220 00:15:26,590 --> 00:15:32,860 So what do I mean by two groups? That is the levels of the categorical variable. 221 00:15:33,910 --> 00:15:34,270 Right. 222 00:15:34,660 --> 00:15:40,160 I have two levels. If I want to compare more than two, 223 00:15:40,750 --> 00:15:42,130 we are looking at ANOVA. 224 00:15:43,240 --> 00:15:48,400 The examples that you're going to see have only two, right, but if you have more than two, you can 225 00:15:48,400 --> 00:15:49,480 definitely use ANOVA. 226 00:15:50,720 --> 00:15:51,070 Right. 227 00:15:52,250 --> 00:15:58,430 So here also we are going to primarily look at the p-value. 228 00:16:00,320 --> 00:16:07,190 It is the p-value that is of importance to us. In this case, we are looking at whether there 229 00:16:07,190 --> 00:16:10,980 is any relationship between applicant income and loan status. 230 00:16:11,510 --> 00:16:11,800 Right. 231 00:16:12,410 --> 00:16:14,330 If you look at this, 232 00:16:14,330 --> 00:16:17,450 we are computing the Z, right? 233 00:16:17,510 --> 00:16:20,150 And we are using the Z-test. 234 00:16:20,570 --> 00:16:20,890 Right. 235 00:16:21,110 --> 00:16:25,700 And we are computing the probability.
236 00:16:25,790 --> 00:16:31,340 Here also, we are stating the null hypothesis and the alternate hypothesis. The alternate hypothesis: 237 00:16:31,340 --> 00:16:34,760 there is a relationship. The null: there is no relationship. 238 00:16:35,000 --> 00:16:38,050 The p-value comes out to be greater than point zero five. 239 00:16:38,090 --> 00:16:39,470 In fact, it is closer to one. 240 00:16:39,740 --> 00:16:40,120 Right. 241 00:16:40,430 --> 00:16:44,390 We can very well not reject the null hypothesis. 242 00:16:44,400 --> 00:16:48,710 Remember the way I am stating this; this is the way the world of hypothesis testing works. 243 00:16:49,340 --> 00:16:52,790 I either accept or reject the null hypothesis, 244 00:16:53,540 --> 00:16:56,780 right, which is that there is no relationship. Here, 245 00:16:56,780 --> 00:16:57,800 I do not reject, 246 00:16:59,630 --> 00:17:06,890 OK, which means I accept whatever my null hypothesis states. But if you want to understand it simply, 247 00:17:07,010 --> 00:17:11,850 don't bother about the null and alternate hypotheses, right? 248 00:17:11,870 --> 00:17:14,870 If this is confusing to you, just look at the p-value. 249 00:17:15,020 --> 00:17:21,030 Just see whether the p-value is less than or greater than point zero five. If it is greater than point zero five, 250 00:17:21,080 --> 00:17:23,480 there is no relationship. 251 00:17:23,700 --> 00:17:27,290 OK, nothing to worry about. If it is less than point zero five, there is a relationship, 252 00:17:27,860 --> 00:17:28,780 and you need to worry. 253 00:17:29,090 --> 00:17:33,110 And the course of action is you remove the variable, 254 00:17:33,110 --> 00:17:33,530 right, 255 00:17:34,670 --> 00:17:36,950 that is involved in the 256 00:17:39,000 --> 00:17:44,240 relationship. When there is a relationship between two independent variables, you would ideally 257 00:17:44,240 --> 00:17:49,710 remove one of them. So that's the action you take, right?
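The two-group comparison described above can be sketched as a two-sample Z-test in plain Python. This is an illustrative sketch under stated assumptions, not the code shown in the session: the function name `two_sample_z_test` and the income figures are made up, and the normal approximation assumes each group has more than about 30 observations.

```python
import math
from statistics import mean, variance

def two_sample_z_test(group_a, group_b):
    """Two-sided Z-test for a difference in means between two groups.

    Returns (z, p_value). Relies on the large-sample normal approximation.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Standard error of the difference between the two sample means.
    standard_error = math.sqrt(variance(group_a) / n_a + variance(group_b) / n_b)
    z = (mean(group_a) - mean(group_b)) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical applicant incomes split by a two-level categorical variable.
# The two lists are deliberately built from the same set of values, so the
# group means are identical and the p-value comes out at one.
income_group_yes = [3000 + 100 * (i % 10) for i in range(40)]
income_group_no = [3000 + 100 * ((i + 5) % 10) for i in range(40)]

z, p = two_sample_z_test(income_group_yes, income_group_no)
if p < 0.05:
    print(f"z={z:.3f}, p={p:.3f}: the group means differ")
else:
    print(f"z={z:.3f}, p={p:.3f}: no evidence of a relationship")
```

When the categorical variable has more than two levels, a one-way ANOVA routine such as `scipy.stats.f_oneway` plays the same role.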
258 00:17:50,550 --> 00:17:55,520 So a concept that is related to this is what is known as multicollinearity. 259 00:17:55,540 --> 00:17:55,870 Right. 260 00:17:56,130 --> 00:18:00,080 Can I predict an independent variable from another independent variable? 261 00:18:02,480 --> 00:18:09,200 So it is like, for example, can I predict the loan amount based on the applicant's income? 262 00:18:10,900 --> 00:18:15,800 If such a relationship exists, it is definitely a big cause for concern. 263 00:18:17,140 --> 00:18:22,560 In the previous section we saw whether there is a relationship, but here I'm going one step further, right? 264 00:18:22,620 --> 00:18:28,700 I'm checking if an independent variable can be predicted from another independent variable. 265 00:18:28,900 --> 00:18:35,980 This is known as multicollinearity, and for this purpose we will be using the concept of the variance 266 00:18:35,980 --> 00:18:37,750 inflation factor, or VIF. 267 00:18:38,290 --> 00:18:40,120 And if you use this 268 00:18:41,650 --> 00:18:48,970 preloaded library that comes with Python, right, you can find out the variance inflation factor in a 269 00:18:48,980 --> 00:18:49,340 jiffy. 270 00:18:49,610 --> 00:18:50,040 Right. 271 00:18:50,350 --> 00:18:56,220 So see the variance inflation factor for each of these independent variables. 272 00:18:56,500 --> 00:19:01,240 There are some variables where it is higher, like six or 10. 273 00:19:01,240 --> 00:19:01,520 Right. 274 00:19:01,840 --> 00:19:02,590 For many variables, 275 00:19:02,590 --> 00:19:05,270 many of the variables, it is not so high. 276 00:19:05,590 --> 00:19:10,390 So obviously, our interest is in the variables where the VIF is 277 00:19:11,290 --> 00:19:11,720 higher. 278 00:19:12,160 --> 00:19:12,480 Right. 279 00:19:13,840 --> 00:19:17,440 Anything beyond five is a cause for concern. 280 00:19:17,440 --> 00:19:24,120 But if it is greater than 10, you must remove the multicollinearity, right?
281 00:19:24,980 --> 00:19:31,770 Remove some of the highly correlated independent variables, such as the loan amount, to deal with 282 00:19:31,780 --> 00:19:33,550 multicollinearity, right? 283 00:19:33,910 --> 00:19:37,810 You can also combine independent variables, that is, you can add them together. 284 00:19:37,820 --> 00:19:41,670 You can add, let's say, for example, applicant income and coapplicant income. 285 00:19:42,130 --> 00:19:42,410 Right. 286 00:19:42,820 --> 00:19:44,380 That is one way of looking at it. 287 00:19:45,230 --> 00:19:51,850 Or you can perform an analysis that is meant for highly correlated variables, such as principal component 288 00:19:51,850 --> 00:19:55,120 analysis or partial least squares regression. 289 00:19:55,990 --> 00:19:57,410 Those are techniques you can adopt. 290 00:19:57,430 --> 00:20:02,860 But the one that I have found works best is to simply remove them. 291 00:20:05,760 --> 00:20:10,150 Take a call and remove it, or you can add them together, right? 292 00:20:10,350 --> 00:20:11,930 These two have worked very well. 293 00:20:13,260 --> 00:20:19,620 So what is the primary action that you will be taking if there is multicollinearity, that is, if VIF is greater 294 00:20:19,620 --> 00:20:21,390 than 10? Remove it. 295 00:20:24,250 --> 00:20:29,880 So this is key, and I also don't want two highly correlated variables. 296 00:20:29,950 --> 00:20:39,220 I mean, we saw the chi-square test, right, and also the scatterplot and the Z-test. 297 00:20:40,330 --> 00:20:44,140 Those also indicate whether there is a relationship between the two. 298 00:20:44,290 --> 00:20:54,160 I don't want any relationship, especially a positive relationship, 299 00:20:54,160 --> 00:20:56,470 between two independent variables. 300 00:20:58,070 --> 00:20:58,410 Right. 301 00:20:58,850 --> 00:21:06,440 So I don't want such variables in my data set because they will definitely pull down my forecast accuracy.
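To make the variance inflation factor concrete, here is a minimal standard-library sketch. It relies on the identity that the VIF of each independent variable equals the matching diagonal entry of the inverse of the correlation matrix of the independent variables. The function names and the loan figures are hypothetical, and the session itself appears to use a ready-made library routine rather than this hand-rolled version. The sketch assumes no column is constant and no column is an exact linear combination of the others.

```python
import math

def correlation_matrix(columns):
    """Pairwise Pearson correlations between equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("columns are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def variance_inflation_factors(columns):
    """VIF per column: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical independent variables (eight made-up applicants).
applicant_income = [4000, 5200, 3100, 7000, 5600, 3900, 6100, 4800]
coapplicant_income = [1500, 0, 2200, 1000, 0, 1800, 2500, 1200]
loan_amount = [120, 150, 90, 200, 160, 110, 180, 140]

names = ["applicant_income", "coapplicant_income", "loan_amount"]
vifs = variance_inflation_factors([applicant_income, coapplicant_income, loan_amount])
for name, vif in zip(names, vifs):
    flag = "remove or combine" if vif > 10 else ("watch" if vif > 5 else "ok")
    print(f"{name}: VIF={vif:.2f} ({flag})")
```

On real data, a library function such as `variance_inflation_factor` in `statsmodels.stats.outliers_influence` is the more usual route. The thresholds of five and 10 in the loop simply mirror the rule of thumb stated in the session.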
302 00:21:09,450 --> 00:21:18,940 So that's what I wanted to cover in this session. We will be seeing how to handle missing values, 303 00:21:19,010 --> 00:21:22,590 what actions you need to take, in the next session. 304 00:21:23,140 --> 00:21:23,690 OK.