So in the previous session, we did some very interactive visualizations of the various preprocessing steps on our data, and we removed two features, because both of those features don't make sense at all for our analysis.

So in this session, I'm going to take up this one assignment, in which I have all these problem statements. The very first one is to visualize the distribution of the data, for which I'm going to consider some particular columns. For this visualization, I'm going to use the pandas library for the plotting purposes as well.

But very first, we have already checked that the data has some kind of skewness. So let's say I'm going to check right away what type of skewness I have in each and every feature. For this, I just have to call the skew function on my DataFrame, and it will return the skewness as positive and, in some cases, negative values. In the case of Income, you will see it has the highest skewness of all these five features, and it shows positive skewness; it means it contains high positive outliers in the data. Whereas in the case of Age and in the case of Experience, you will see they contain a very small number of low outliers in the data. That is basically the type of conclusion you can draw just by calling the skew function on your data. Next, let's say I'm going to check the data types of each and every column, and you will see all of them are basically of int or float type. Both checks are sketched in code below.

So the very first task is to visualize the distribution of the data. For this, I just have to call the hist function. Let's say I'm also going to set my own window size; for this, you just have to set the figsize parameter, and in this case let's set a window of 20 by 20. Just execute it, and you will see the distribution, or I can say a histogram, of each and every feature available in your data. And if you want to change the colors, so that your histogram looks a little more engaging, you can play with the color parameter as well.

Now, if you want to draw inferences from this histogram: you will see Age and Experience are, to an extent, equally distributed, whereas Income and CCAvg, which is my credit card average spending, are skewed to the right.
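Here is a rough sketch of those two checks. The DataFrame name df and the CSV file name are my assumptions; the transcript only shows the calls on an already-loaded frame.

    import pandas as pd

    # Hypothetical file name; the transcript never shows the actual path.
    df = pd.read_csv("bank_personal_loan.csv")

    # Skewness of every numeric column: clearly positive values mean a long
    # right tail (high outliers, e.g. Income), small negative values mean a
    # few low outliers (e.g. Age, Experience).
    print(df.skew())

    # Data types of every column -- all int or float in this dataset.
    print(df.dtypes)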
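And a minimal sketch of the histogram step, with the figsize window and the optional color styling mentioned above; the specific color is my own choice.

    import matplotlib.pyplot as plt

    # One histogram per feature, drawn on a 20 x 20 inch canvas; color is
    # purely cosmetic and can be any value matplotlib accepts.
    df.hist(figsize=(20, 20), color="teal")
    plt.show()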
So it means those two features contain some high positive outliers in their entries, or I can say in their values.

Then, from this Education histogram, you will see we have more undergraduates than graduates: the value 1 basically refers to undergraduate, the value 2 basically refers to graduate, and the value 3 basically refers to advanced or professional, or I can say those persons who have done a masters or some other higher studies. And in the case of this Online feature, you will see the value 1 tells us that somewhere around 50 to 60 percent of customers have enabled online banking. That is the type of inference you can draw from your data.

So next, let's say I'm going to visualize the distribution of my Experience feature. For this, I'm basically going to use this distribution plot; just execute it. But here you will see an error: sns is not defined. So what you can do is import the seaborn library as sns, which I hadn't imported earlier. Yeah, that was the issue. So if you execute it again now, you will see a beautiful distribution of your Experience, and you will see it contains some negative values. That's our main concern: we have to deal with this feature so that it is usable for better analysis purposes.

So let's say I'm going to check what exactly the mean of this Experience is. Call mean on it, and you will see the mean is somewhere close to 20. Now, whatever negative values are in Experience, I'm just going to store them in a separate DataFrame. For this, I'm going to write a condition, df of Experience less than zero, and this is exactly my filter. Then I have to pass this filter into my DataFrame, so that I get my filtered frame; let's name it, say, negative_exp. So this is exactly my new DataFrame. Let's say I'm going to call head on this data, and you will see it contains all the negative values of Experience.

Then, if I want to check the distribution of Age in this negative_exp, I just have to call the distribution plot again. Just execute it, and you will see this is basically the distribution of Age for the customers having negative experience. Both of these steps are sketched below.
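A sketch of the distribution plot, including the seaborn import whose absence caused the "sns is not defined" error. Note that sns.distplot is deprecated (and removed in the newest seaborn releases), where sns.histplot with kde=True is the closest replacement.

    import seaborn as sns
    import matplotlib.pyplot as plt

    # Distribution of Experience; the negative values sit to the left of zero.
    sns.distplot(df["Experience"])
    plt.show()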
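And a sketch of the filtering step, under the same naming assumptions; negative_exp is the name the transcript gives the filtered frame.

    # Boolean filter: True wherever Experience is negative.
    exp_filter = df["Experience"] < 0

    # Keep only the rows with negative experience.
    negative_exp = df[exp_filter]
    print(negative_exp.head())

    # Age distribution within that subset; most of these customers
    # turn out to fall between 20 and 30.
    sns.distplot(negative_exp["Age"])
    plt.show()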
So you will see over here that most of them are customers aged between 20 and 30 who have negative experience. If you think logically, how is it possible for a person to have negative experience? So that's an error in your data, that's dirtiness in your data, and that's why you have to deal with those records; you have to preprocess that data.

So let's say I'm going to check what exactly the mean is right now in this negative_exp frame, so negative_exp of Experience, dot mean. Now you will see how much fluctuation we have in this mean, because right now the mean here is somewhere close to a negative value, whereas the mean of the full column, as we just saw, is positive. So you can see how much variation we have in our data.

Next, let's say I'm going to check the total number of entries in this negative_exp, and you will see the result is 624. So now you can visualize it: yeah, among five thousand rows, I have somewhere around 624 entries coming from rows with negative experience. Next, I have to print this stuff, so I'm going to say: there are this-many records which basically have negative values for Experience, approx this-much percent. Here I'm going to add a placeholder, and once I have added my placeholders, I will fill in their values using the format function, so dot format. Very first, I have to pass the size of this negative_exp, so the first placeholder gets replaced by that count. And the second placeholder will get replaced by whatever I write next: negative_exp dot size divided by df dot size, whatever the size of my full DataFrame is, and then I have to convert it into a percentage by basically multiplying it by one hundred. So I multiply by 100, and you have to add parentheses over here, and one more parenthesis, because the whole ratio has to be computed inside the parentheses. A sketch of this follows below.
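Roughly, the mean check and the formatted message could look like the sketch below. One caveat, which is my inference rather than something stated in the transcript: DataFrame.size counts rows times columns, not rows, so if the frame has 12 columns after the two earlier drops, the 624 counted entries correspond to 52 affected rows; the 1.04 percent figure still comes out right because the column count cancels in the ratio of the two sizes.

    # Mean of Experience inside the negative subset: it is itself negative,
    # far from the roughly 20 seen on the full column.
    print(negative_exp["Experience"].mean())

    # Formatted summary; .size is rows * columns, but the ratio of the two
    # sizes still equals the ratio of row counts.
    print("There are {} records which have negative values for Experience, "
          "approx {:.2f}%".format(negative_exp.size,
                                  (negative_exp.size / df.size) * 100))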
So just execute it, and you will see: there are 624 records which have negative values for Experience, approx 1.04 percent. You can add a percent sign over here as well. So you will see roughly 1.04 percent of the data is dirty right now, and one way or another, you have to deal with that.

So very first, whatever I have in my df, I'm just going to copy into data. For this, I have to call copy; just execute it, and if you call head on your data, you will see this is exactly what your df was. Now, for whatever negative values your Experience has, I'm just going to use the numpy where function to change the negative values to a mean value derived from the data, here the mean of the Experience column. For this, I just have to call a simple function, np dot where. Here I have to say: data of Experience, wherever I have Experience as negative, just fill it with the mean of the Experience; and wherever I don't have negative experience, keep it as whatever it actually is, so for that I pass data of Experience itself. Then I have to assign the result back, so data of Experience equals this whole np.where expression, as sketched at the end of this section.

So now just execute it, and instead of calling head on your data, let's say I'm going to check whether I have any negative value in my DataFrame or not. For this, I have to add the same filter, and now I just have to pass this filter to my data. You will see it returns an empty frame, because you don't have any entry as negative right now.

So that's all about this session. In the upcoming session, we are basically going to be analyzing the Education feature and the Personal Loan feature, as well as some more add-on visuals, and after that I'm going to automate all this stuff using some basic code, using some functions in Python. So that's all; I hope you loved it. Thank you, guys. Have a nice day. Keep learning, keep growing, and keep practicing.
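For reference, here is a sketch of that np.where cleanup and the verification filter, under the same naming assumptions as before. The transcript fills the negatives with the overall column mean; a per-age-group mean would be a straightforward variation.

    import numpy as np

    # Work on a copy so the original frame stays intact.
    data = df.copy()

    # Wherever Experience is negative, substitute the column mean;
    # everywhere else, keep the value exactly as it is.
    data["Experience"] = np.where(data["Experience"] < 0,
                                  data["Experience"].mean(),
                                  data["Experience"])

    # Verify: filtering for negatives should now return an empty frame.
    print(data[data["Experience"] < 0])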