1 00:00:00,210 --> 00:00:06,780 Hello, all. In all our previous sessions, we have done some interactive analysis, so in this session 2 00:00:06,780 --> 00:00:12,330 I'm going to take this assignment, in which I have four statements that are related to 3 00:00:12,330 --> 00:00:14,390 our hypothesis analysis. 4 00:00:14,700 --> 00:00:21,310 For the very first one, I have to consider two hypotheses, depending upon what statement I have. 5 00:00:21,570 --> 00:00:29,670 So here I have: is the age of a person going to be a factor in availing a loan or not? 6 00:00:29,910 --> 00:00:33,420 And then I have to automate all the stuff. 7 00:00:33,660 --> 00:00:37,540 So these are basically my advanced kind of analysis. 8 00:00:37,830 --> 00:00:46,010 So here I'm going to show you how exactly you can perform all this stuff in the simplest way. 9 00:00:46,170 --> 00:00:53,820 So very first, I'm just going to call a scatterplot on my data points, on my features. 10 00:00:53,910 --> 00:01:02,400 Here I'm going to say I have to plot a scatterplot between age and, of course, personal loan. 11 00:01:02,800 --> 00:01:07,340 You will see this exactly is my personal loan. 12 00:01:07,590 --> 00:01:09,290 So just start over here. 13 00:01:09,300 --> 00:01:11,280 And this is exactly personal loan. 14 00:01:11,520 --> 00:01:16,710 And let's say you want to split the data points on the basis of, let's say, family. 15 00:01:16,730 --> 00:01:22,650 So here I'm going to say split it on the basis of family. 16 00:01:23,010 --> 00:01:29,580 So if you execute it now, you will see your beautiful scatterplot, and here you have family 17 00:01:29,580 --> 00:01:31,620 status as well. 18 00:01:31,890 --> 00:01:35,270 So now you have to simply perform your hypothesis test. 
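The scatterplot step described above can be sketched roughly as follows. The plotting library is not named in the session, so this sketch uses plain matplotlib and draws one scatter series per family size to mimic the split by family. The column names `Age`, `Personal Loan` and `Family`, and the random stand-in data, are assumptions made only so the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the course dataset (column names are assumptions).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": rng.integers(23, 68, size=200),
    "Personal Loan": rng.integers(0, 2, size=200),
    "Family": rng.integers(1, 5, size=200),
})

fig, ax = plt.subplots()
# One scatter series per family size, so each group gets its own colour.
for family_size, group in df.groupby("Family"):
    ax.scatter(group["Age"], group["Personal Loan"],
               label="Family = {}".format(family_size), alpha=0.6)
ax.set_xlabel("Age")
ax.set_ylabel("Personal Loan")
ax.legend(title="Family")
```

In a notebook you would drop the `matplotlib.use("Agg")` line and let the figure render inline.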
19 00:01:35,640 --> 00:01:42,390 So to perform your hypothesis test, you have to import a library, because this is exactly part 20 00:01:42,390 --> 00:01:47,940 of scientific Python, so for this you have to import your SciPy module. 21 00:01:48,240 --> 00:01:53,460 So here I'm going to say just import scipy dot stats. 22 00:01:53,910 --> 00:02:03,070 And let's say I'm going to create an alias as, let's say, stats. Just execute it, and using this stats... 23 00:02:03,120 --> 00:02:07,080 So it is saying, yeah, the module name is wrong; it's scipy. 24 00:02:08,160 --> 00:02:15,570 So again, execute it, and using this stats you have to perform some kind of test over here, depending 25 00:02:15,570 --> 00:02:18,090 upon your number of samples. 26 00:02:18,330 --> 00:02:19,170 So very first, 27 00:02:19,170 --> 00:02:22,510 I will define my two hypotheses over here. 28 00:02:23,100 --> 00:02:28,410 So if you're not aware of the basics of hypotheses and the different types of t-test, I would recommend 29 00:02:28,410 --> 00:02:33,940 you to go back from this video and just clear your basics first; I have uploaded lectures 30 00:02:33,960 --> 00:02:37,950 on hypotheses as well as on the t-test. 31 00:02:38,160 --> 00:02:41,080 So very first I will define my two hypotheses. 32 00:02:41,460 --> 00:02:41,850 Let's see. 33 00:02:41,850 --> 00:02:44,900 The first one is my null hypothesis. 34 00:02:44,930 --> 00:02:55,350 So here I'm going to say my null hypothesis is basically, let's say, age doesn't have an impact on availing 35 00:02:58,410 --> 00:03:00,050 a personal loan. 36 00:03:00,360 --> 00:03:03,500 So my alternate will be the reverse of this. 37 00:03:03,510 --> 00:03:06,370 And at any point, only one can be true. 38 00:03:06,630 --> 00:03:11,880 So let's say my alternate is Ha; you can represent it that way as well. 39 00:03:12,300 --> 00:03:16,380 So this is exactly my alternate hypothesis. 40 00:03:16,380 --> 00:03:20,730 You have to represent it, and you have to just remove this 
41 00:03:20,760 --> 00:03:21,540 "not". So, 42 00:03:21,540 --> 00:03:25,040 this is exactly my alternate hypothesis. 43 00:03:25,050 --> 00:03:29,100 And now, for that, you need some data, you need some samples. 44 00:03:29,320 --> 00:03:34,430 So here I'm going to say, very first, I have to access my personal loan. 45 00:03:34,440 --> 00:03:35,040 This one. 46 00:03:35,400 --> 00:03:40,940 And the very first condition is I have to access all the data where my personal loan is equal to zero. 47 00:03:41,250 --> 00:03:43,580 So this is exactly my entire data. 48 00:03:43,800 --> 00:03:51,690 And on this data frame, basically I have to access the age data and, let's say, to convert it into an array, 49 00:03:51,750 --> 00:03:57,510 I have to just call np dot array to convert all this stuff into an array. 50 00:03:58,530 --> 00:04:00,900 So this is exactly my very first array. 51 00:04:01,140 --> 00:04:04,270 You can copy it, you can paste it over here, and that's it. 52 00:04:04,270 --> 00:04:09,620 But this time I want the age data having personal loan equal to one. 53 00:04:09,900 --> 00:04:15,000 So here I am going to say this is my, let's say, age underscore no. 54 00:04:15,720 --> 00:04:20,790 And for this, let's say I'm going to say age underscore 55 00:04:21,000 --> 00:04:21,530 yes. 56 00:04:22,110 --> 00:04:26,220 So just execute it, and now your data is ready for your test. 57 00:04:26,490 --> 00:04:33,140 So it's showing me an error here; I have to use an underscore in the name. 58 00:04:33,690 --> 00:04:39,240 So now my data is ready, my samples are ready to perform a two-sample t-test. 59 00:04:39,260 --> 00:04:40,590 So why a two-sample t-test? 60 00:04:40,590 --> 00:04:43,380 Because here I have two samples. 61 00:04:43,390 --> 00:04:49,240 The very first one is my age underscore no, and the second one is, of course, my age underscore 62 00:04:49,260 --> 00:04:49,760 yes. 63 00:04:49,890 --> 00:04:54,390 So a two-sample t-test will basically give me a quick overview of, 
64 00:04:54,390 --> 00:04:54,660 yeah, 65 00:04:54,720 --> 00:04:59,490 whether there is a statistically significant difference between the means of both the samples or not. 66 00:05:00,000 --> 00:05:07,560 For this, I have to call one of the functions from my stats, so for this I'm going to say stats dot 67 00:05:08,460 --> 00:05:11,960 ttest, and press Tab over here. 68 00:05:11,980 --> 00:05:19,160 You can get your ttest underscore ind, and then just press Shift plus Tab. 69 00:05:19,170 --> 00:05:25,470 And here you have to pass a and b, which are exactly my data, which are exactly my samples. So the 70 00:05:25,470 --> 00:05:29,980 very first one is exactly my age underscore no. 71 00:05:30,420 --> 00:05:33,620 And the second one is exactly my age underscore 72 00:05:33,890 --> 00:05:34,410 yes. 73 00:05:34,720 --> 00:05:38,860 And now you have to pass your axis parameter as well. 74 00:05:39,360 --> 00:05:46,680 And so it will basically return me two values: the first one is my t and the second one is, of course, 75 00:05:46,680 --> 00:05:53,850 my p-value, so I'm going to say this is my t-stat value; the first one is my t-stat value. 76 00:05:53,850 --> 00:05:55,440 The second one is my p-value. 77 00:05:55,680 --> 00:06:00,360 And the p-value is the most important value in this case. 78 00:06:00,630 --> 00:06:06,120 So now I'm going to simply put a condition over here, and here 79 00:06:06,120 --> 00:06:12,850 I'm going to say if p-value is basically less than zero point zero five. 80 00:06:13,050 --> 00:06:14,730 So this is my condition. 81 00:06:14,940 --> 00:06:23,160 So in such a case, I can say I have to print my alternate hypothesis, because when my p-value is 82 00:06:23,160 --> 00:06:24,650 less than zero point zero five, 83 00:06:24,810 --> 00:06:32,060 in such a case, I have to reject my null hypothesis, H zero, which is exactly my null hypothesis. 84 00:06:32,430 --> 00:06:34,360 That's why I have to reject this. 
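The steps so far (split the age column by loan status, run a two-sample t-test, compare the p-value with 0.05) can be sketched as below. `scipy.stats.ttest_ind` is the real SciPy function for an independent two-sample t-test. The column names and the random stand-in data are assumptions, since the actual dataset is not part of this transcript, so the printed numbers mean nothing beyond showing the mechanics.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the course dataset (column names are assumptions).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": rng.integers(23, 68, size=500),
    "Personal Loan": rng.integers(0, 2, size=500),
})

# Two samples: ages of people without a loan and ages of people with one.
age_no = np.array(df[df["Personal Loan"] == 0]["Age"])
age_yes = np.array(df[df["Personal Loan"] == 1]["Age"])

# ttest_ind returns the t-statistic first and the p-value second.
t_stat, p_value = stats.ttest_ind(age_no, age_yes, axis=0)

# Reject the null hypothesis only when the p-value is below 0.05.
reject_null = p_value < 0.05
print("t =", t_stat, "p =", p_value, "reject H0:", reject_null)
```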
85 00:06:34,620 --> 00:06:43,230 So now in the print statement, I'm going to say I have to reject this H zero and accept this Ha. 86 00:06:43,920 --> 00:06:51,930 So very first, let's say I'm going to print this Ha, and after it, I'm going to say, 87 00:06:52,440 --> 00:07:06,720 "as the p-value is basically less than zero point zero five, with a value of" whatever value of p I 88 00:07:06,720 --> 00:07:13,360 will get over here. So here, I'm going to add a placeholder, and this time I'm going to add my format 89 00:07:13,530 --> 00:07:14,280 function. 90 00:07:14,280 --> 00:07:20,250 And whatever placeholder is here, it will basically receive its value from this format. 91 00:07:20,460 --> 00:07:27,130 So here I'm going to say p underscore value, whatever p-value this will return. 92 00:07:27,390 --> 00:07:28,800 So this will get placed over here. 93 00:07:29,070 --> 00:07:33,930 And now in the else block, basically, I have to add a condition. 94 00:07:34,050 --> 00:07:37,630 So in the else block, my p-value will be greater than zero point zero five. 95 00:07:37,950 --> 00:07:43,270 And in such a case, I have to print, let's say, H zero. 96 00:07:43,380 --> 00:07:49,280 So this time I'm going to say print, just print H zero. 97 00:07:49,290 --> 00:07:52,990 And now I have to print all this stuff together. 98 00:07:53,010 --> 00:08:01,260 Then here I have to say "p-value is basically greater than zero point zero five, with a value of" whatever 99 00:08:01,530 --> 00:08:04,890 value will be there. So just execute it. 100 00:08:04,890 --> 00:08:07,380 And you will see it will show me: 101 00:08:07,840 --> 00:08:11,150 age doesn't have an impact on availing 102 00:08:11,340 --> 00:08:16,750 a personal loan, as the p-value is greater than zero point zero five. 103 00:08:16,800 --> 00:08:21,390 It means our p-value is basically greater than zero point zero five. 104 00:08:21,660 --> 00:08:24,840 It means this condition will go into this block, 
105 00:08:24,870 --> 00:08:25,810 this else block. 106 00:08:26,010 --> 00:08:33,390 And if this comes to the else block, my H zero, which is this one, will get printed, and then 107 00:08:33,690 --> 00:08:41,790 this block of code will get printed with the p-value, whatever p-value I receive from here. 108 00:08:42,030 --> 00:08:47,040 So this is how exactly your if-else logic will work over here. 109 00:08:47,160 --> 00:08:52,860 So let's move to the next problem statement, in which I have to automate this hypothesis test. 110 00:08:52,860 --> 00:08:54,470 I have to automate this stuff. 111 00:08:54,840 --> 00:08:58,950 So for this, let's say I'm going to define my function. 112 00:08:59,350 --> 00:09:03,030 Let's say the function name is hypothesis. 113 00:09:03,360 --> 00:09:05,380 So this function name is hypothesis. 114 00:09:05,820 --> 00:09:11,430 And now this time, whatever code I have written over here, I have to just copy it in. 115 00:09:11,700 --> 00:09:16,900 So very first, let's say I'm going to say I have to just copy all this. 116 00:09:17,130 --> 00:09:19,010 So here I have to paste it. 117 00:09:19,440 --> 00:09:22,990 And after that I have to also copy from here. 118 00:09:23,340 --> 00:09:26,370 So now this time I have to paste it as well. 119 00:09:26,700 --> 00:09:34,950 So provide the right indentation over here, here, here and here and here as well. 120 00:09:35,310 --> 00:09:38,000 So now, inside this, very first, 121 00:09:38,040 --> 00:09:42,140 I'm going to say column one, and the second one is, of course, my column two. 122 00:09:42,450 --> 00:09:47,430 And then, let's say, you have to assign the hypotheses as well. 123 00:09:47,910 --> 00:09:57,090 So now I'm going to say my hypothesis is, let's say, H zero, and the other, let's say, Ha; the 124 00:09:57,090 --> 00:09:59,660 first one is my null, and the second one, which is 
125 00:09:59,760 --> 00:10:07,350 my alternate hypothesis, I will represent as Ha. And personal loan will get replaced by, basically, 126 00:10:07,350 --> 00:10:13,420 my column one. Similarly, over here, my personal loan will get replaced by column one. 127 00:10:13,560 --> 00:10:17,710 Similarly, this will get replaced by column two. 128 00:10:18,090 --> 00:10:21,230 This will also get replaced by column two. 129 00:10:21,420 --> 00:10:29,550 And now here you have to pass your array one and array two. Let's say this is exactly my array one, 130 00:10:29,940 --> 00:10:37,910 and let's say this is exactly my array two, and here I have to pass both the arrays over here. 131 00:10:38,160 --> 00:10:43,510 So the very first one is exactly my array one and the second one is exactly my array two. 132 00:10:43,800 --> 00:10:46,980 So it will return me my t-stat and my p-value. 133 00:10:47,190 --> 00:10:51,280 And this is exactly my condition. Or what you can do, guys: 134 00:10:51,390 --> 00:10:58,290 you can simply put all these blocks of code in print, and it will basically receive its value from, let's 135 00:10:58,290 --> 00:11:04,870 say, the format function. Similarly for H zero as well. 136 00:11:05,190 --> 00:11:15,060 So just remove this, and now you have to write over here H zero, or you can write here a placeholder, 137 00:11:15,270 --> 00:11:17,310 and whatever value it will receive. 138 00:11:17,550 --> 00:11:24,720 So here I'm going to say it is exactly my H zero comma p-value, and here is my Ha. 139 00:11:24,990 --> 00:11:30,910 So similarly, here you have to also assign your placeholder as well. 140 00:11:31,110 --> 00:11:32,640 So just execute it. 141 00:11:32,640 --> 00:11:38,400 And now it is showing me IndentationError, because here I have some indentation 142 00:11:38,400 --> 00:11:42,190 issue. So you can provide the right indentation. 
143 00:11:42,210 --> 00:11:49,200 Similarly, here, provide the right indentation, because an IndentationError will of course happen in Python 144 00:11:49,200 --> 00:11:50,880 if you don't provide the right 145 00:11:50,880 --> 00:11:51,580 indentation. 146 00:11:51,990 --> 00:11:54,920 So now you have to just call this function. 147 00:11:54,930 --> 00:12:01,020 So here I'm going to say just call this function using hypothesis, and just press Shift plus Tab. 148 00:12:01,380 --> 00:12:04,860 So that column one is basically my personal loan. 149 00:12:05,050 --> 00:12:10,410 So here I'm going to say my personal loan is exactly my first column. 150 00:12:10,830 --> 00:12:18,450 And for column two, I have to assign this age column, and now I have to assign my alternate 151 00:12:18,450 --> 00:12:21,220 hypothesis as well as my null hypothesis. 152 00:12:21,570 --> 00:12:30,540 Let's say the first one is exactly my null hypothesis, and this is exactly your null hypothesis. 153 00:12:30,960 --> 00:12:33,240 Or you can copy it from here. 154 00:12:33,450 --> 00:12:39,960 And let's say I'm just going to paste it over here, and now it's time for your alternate hypothesis. 155 00:12:40,410 --> 00:12:49,200 So the alternate will be this one; just copy it from here, and you can paste over here all this stuff. 156 00:12:49,620 --> 00:12:54,360 So just execute it, and it will give you the conclusion that age 157 00:12:54,360 --> 00:13:02,010 doesn't have an impact on availing a personal loan, because here you will see your p-value is greater than 158 00:13:02,010 --> 00:13:03,310 zero point zero five. 159 00:13:03,360 --> 00:13:07,120 That's why you have this type of conclusion over here. 160 00:13:07,320 --> 00:13:11,070 So now you will see this is the power of automation. 161 00:13:11,070 --> 00:13:14,730 You have to just call this function; you don't have to write this block of code. 162 00:13:14,730 --> 00:13:17,250 You don't have to write these blocks of code again and again. 
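A minimal sketch of the function described above might look like this. The name `hypothesis` and the arguments column one, column two, H0 and Ha follow the walkthrough; passing the data frame in as `df`, returning the p-value, the column names, and the random stand-in data are assumptions added so the example is self-contained.

```python
import numpy as np
import pandas as pd
from scipy import stats


def hypothesis(df, col1, col2, H0, Ha):
    """Two-sample t-test of col2 between the col1 == 0 and col1 == 1 groups."""
    arr1 = np.array(df[df[col1] == 0][col2])
    arr2 = np.array(df[df[col1] == 1][col2])
    t_stat, p_value = stats.ttest_ind(arr1, arr2, axis=0)
    if p_value < 0.05:
        # Small p-value: reject H0 and report the alternate hypothesis.
        print("{}, as the p-value is less than 0.05 with a value of {}".format(Ha, p_value))
    else:
        # Otherwise keep H0.
        print("{}, as the p-value is greater than 0.05 with a value of {}".format(H0, p_value))
    return p_value  # returned for convenience; the walkthrough only prints


# Synthetic stand-in for the course dataset (column names are assumptions).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": rng.integers(23, 68, size=500),
    "Personal Loan": rng.integers(0, 2, size=500),
})

H0 = "Age doesn't have an impact on availing a personal loan"
Ha = "Age has an impact on availing a personal loan"
p = hypothesis(df, "Personal Loan", "Age", H0, Ha)
```

Because the stand-in data is random, the message printed here says nothing about the real dataset; only the call pattern carries over.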
163 00:13:17,580 --> 00:13:20,860 Let's say you have to perform this hypothesis test for, 164 00:13:20,890 --> 00:13:21,270 yeah, 165 00:13:21,510 --> 00:13:26,090 does the income of a person have an impact on availing a loan or not? 166 00:13:26,400 --> 00:13:28,890 So what you have to do, you have to just copy this. 167 00:13:28,890 --> 00:13:34,860 Let's say I'm just going to copy this entire code and I'm just going to paste it over here. 168 00:13:35,040 --> 00:13:39,810 And now I have to just play with some of the parameters. 169 00:13:40,110 --> 00:13:47,280 So here I'm going to say the first column is exactly my personal loan, or you can assign it as 170 00:13:47,280 --> 00:13:47,630 well. 171 00:13:47,910 --> 00:13:49,680 So this is exactly my column one. 172 00:13:50,010 --> 00:13:54,240 And now in case of column two, I have, of course, my income. 173 00:13:54,570 --> 00:13:56,520 So I'm going to say income. 174 00:13:56,880 --> 00:14:06,900 And now for my null hypothesis, I'm going to say income doesn't have an impact on availing a personal loan. 175 00:14:07,200 --> 00:14:11,730 And in the alternate, of course, I have income will have an impact. 176 00:14:11,760 --> 00:14:14,290 So just execute it and it will tell me, 177 00:14:14,300 --> 00:14:14,730 yeah, 178 00:14:14,760 --> 00:14:22,530 income has an impact on availing a personal loan, because here you will see your p-value is somewhere close 179 00:14:22,530 --> 00:14:23,790 to zero. 180 00:14:24,060 --> 00:14:29,720 So if you plot a scatterplot between personal loan and income, you can simply realize 181 00:14:29,760 --> 00:14:32,280 this effect of, yeah, 182 00:14:32,280 --> 00:14:37,800 why income has an impact on availing a personal loan. In a similar way, 183 00:14:37,800 --> 00:14:41,700 you can also do a hypothesis test for this one as well: 184 00:14:41,970 --> 00:14:46,920 whether the family size makes them avail a loan or not. 185 00:14:47,220 --> 00:14:48,930 So just copy and paste here. 
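Reusing the same test for another column only means changing the parameters. As a rough sketch of that idea, the snippet below loops over two columns with a compact helper (`p_value_for` is a hypothetical name, not something from the session). The data is synthetic, and the income values are deliberately generated higher for loan holders so that the test has a difference to detect; none of this reflects the real dataset.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (column names and distributions are assumptions).
rng = np.random.default_rng(1)
loan = np.array([0] * 400 + [1] * 100)
df = pd.DataFrame({
    "Personal Loan": loan,
    # Income is made higher for loan == 1 on purpose.
    "Income": np.where(loan == 1, rng.normal(140, 30, 500), rng.normal(60, 20, 500)),
    "Age": rng.integers(23, 68, size=500),
})


def p_value_for(df, group_col, value_col):
    """p-value of a two-sample t-test between the 0 and 1 groups of group_col."""
    no = df.loc[df[group_col] == 0, value_col].to_numpy()
    yes = df.loc[df[group_col] == 1, value_col].to_numpy()
    return stats.ttest_ind(no, yes).pvalue


results = {}
for column in ["Age", "Income"]:
    p = p_value_for(df, "Personal Loan", column)
    results[column] = p
    verdict = "has an impact" if p < 0.05 else "doesn't have an impact"
    print("{} {} on availing a personal loan (p-value = {})".format(column, verdict, p))
```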
186 00:14:49,290 --> 00:14:57,840 And in column one, I have, of course, my personal loan, but in case of column two, I have basically my 187 00:14:57,840 --> 00:14:58,440 family. 188 00:14:58,730 --> 00:14:59,310 So here 189 00:14:59,500 --> 00:15:09,080 I'm going to say my family, and now here I'm going to say family doesn't have an impact on availing a personal 190 00:15:09,080 --> 00:15:10,210 loan, whereas 191 00:15:10,330 --> 00:15:16,500 in case of the alternate hypothesis, I will say your family will definitely have an impact. 192 00:15:16,750 --> 00:15:19,180 So just execute it, and you will see it 193 00:15:19,180 --> 00:15:25,510 will tell me family will, of course, have an impact on availing a loan, because you will see your p-value is, 194 00:15:26,200 --> 00:15:28,600 of course, approximately zero. 195 00:15:28,610 --> 00:15:33,760 You will see one point four e minus five, that is, one point four times ten to the power minus five. 196 00:15:33,760 --> 00:15:35,850 So it will be a very small value. 197 00:15:35,860 --> 00:15:39,570 That's why it is showing me that family has an impact on availing a personal loan. 198 00:15:39,940 --> 00:15:45,220 So I hope you will love all this analysis, how I'm going to automate all of this stuff rather 199 00:15:45,220 --> 00:15:47,770 than writing all this code again and again. 200 00:15:47,770 --> 00:15:52,990 I'm just going to call this function. Or if you want to play with it, what you guys can do: 201 00:15:53,320 --> 00:15:58,900 you guys can simply add blocks of code in your hypothesis function. 202 00:15:59,170 --> 00:16:04,650 And once you have this print statement, you can also get your visual as well. 203 00:16:04,690 --> 00:16:06,010 So that's a pretty cool way 204 00:16:06,220 --> 00:16:08,550 of how exactly you can analyze your data. 205 00:16:09,160 --> 00:16:11,260 So hope you will love this analysis. 206 00:16:11,650 --> 00:16:12,320 Thank you. 207 00:16:12,390 --> 00:16:13,270 Have a nice day. 208 00:16:13,570 --> 00:16:14,320 Keep learning. 
209 00:16:14,620 --> 00:16:15,310 Keep going. 210 00:16:15,520 --> 00:16:16,480 Keep practicing.