So in the previous session, we did some very interactive visualizations of the various preprocessing steps on our data, and we removed two features, because both of those features don't make sense at all for our analysis.

So in this session, I'm going to take up this one assignment, in which I have all these problem statements. The very first one is to visualize the distribution of the data, for which I'm going to consider some particular columns. For this visualization, I'm going to use the pandas library for the plotting purposes as well.

But very first, we have already checked that the data has some kind of skewness. So let's say I'm going to check right away what type of skewness I have in each and every feature. For this, I just have to call the skew function on my DataFrame, and it will return the skewness as positive and, in some cases, negative values. In the case of Income, you will see it has the highest skewness of all these five features, and it shows positive skewness; it means it contains high positive outliers in the data. Whereas in the case of Age and in the case of Experience, you will see they contain a very small number of low outliers in the data. That is basically the type of conclusion you can draw just by calling the skew function on your data. Next, let's say I'm going to check the data types of each and every column, and you will see all of them are basically of int or float type. Both checks are sketched in code below.

So the very first task is to visualize the distribution of the data. For this, I just have to call the hist function. Let's say I'm also going to set my own window size; for this, you just have to set the figsize parameter, and in this case let's set a window of 20 by 20. Just execute it, and you will see the distribution, or I can say a histogram, of each and every feature available in your data. And if you want to change the colors, so that your histogram looks a little more engaging, you can play with the color parameter as well.

Now, if you want to draw inferences from this histogram: you will see Age and Experience are, to an extent, equally distributed, whereas Income and CCAvg, which is my credit card average spending, are skewed to the right.
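Here is a rough sketch of those two checks. The DataFrame name df and the CSV file name are my assumptions; the transcript only shows the calls on an already-loaded frame.

    import pandas as pd

    # Hypothetical file name; the transcript never shows the actual path.
    df = pd.read_csv("bank_personal_loan.csv")

    # Skewness of every numeric column: clearly positive values mean a long
    # right tail (high outliers, e.g. Income), small negative values mean a
    # few low outliers (e.g. Age, Experience).
    print(df.skew())

    # Data types of every column -- all int or float in this dataset.
    print(df.dtypes)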
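And a minimal sketch of the histogram step, with the figsize window and the optional color styling mentioned above; the specific color is my own choice.

    import matplotlib.pyplot as plt

    # One histogram per feature, drawn on a 20 x 20 inch canvas; color is
    # purely cosmetic and can be any value matplotlib accepts.
    df.hist(figsize=(20, 20), color="teal")
    plt.show()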
So it means those two features contain some high positive outliers in their entries, or I can say in their values.

Then, from this Education histogram, you will see we have more undergraduates than graduates: the value 1 basically refers to undergraduate, the value 2 basically refers to graduate, and the value 3 basically refers to advanced or professional, or I can say those persons who have done a masters or some other higher studies. And in the case of this Online feature, you will see the value 1 tells us that somewhere around 50 to 60 percent of customers have enabled online banking. That is the type of inference you can draw from your data.

So next, let's say I'm going to visualize the distribution of my Experience feature. For this, I'm basically going to use this distribution plot; just execute it. But here you will see an error: sns is not defined. So what you can do is import the seaborn library as sns, which I hadn't imported earlier. Yeah, that was the issue. So if you execute it again now, you will see a beautiful distribution of your Experience, and you will see it contains some negative values. That's our main concern: we have to deal with this feature so that it is usable for better analysis purposes.

So let's say I'm going to check what exactly the mean of this Experience is. Call mean on it, and you will see the mean is somewhere close to 20. Now, whatever negative values are in Experience, I'm just going to store them in a separate DataFrame. For this, I'm going to write a condition, df of Experience less than zero, and this is exactly my filter. Then I have to pass this filter into my DataFrame, so that I get my filtered frame; let's name it, say, negative_exp. So this is exactly my new DataFrame. Let's say I'm going to call head on this data, and you will see it contains all the negative values of Experience.

Then, if I want to check the distribution of Age in this negative_exp, I just have to call the distribution plot again. Just execute it, and you will see this is basically the distribution of Age for the customers having negative experience. Both of these steps are sketched below.
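A sketch of the distribution plot, including the seaborn import whose absence caused the "sns is not defined" error. Note that sns.distplot is deprecated (and removed in the newest seaborn releases), where sns.histplot with kde=True is the closest replacement.

    import seaborn as sns
    import matplotlib.pyplot as plt

    # Distribution of Experience; the negative values sit to the left of zero.
    sns.distplot(df["Experience"])
    plt.show()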
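And a sketch of the filtering step, under the same naming assumptions; negative_exp is the name the transcript gives the filtered frame.

    # Boolean filter: True wherever Experience is negative.
    exp_filter = df["Experience"] < 0

    # Keep only the rows with negative experience.
    negative_exp = df[exp_filter]
    print(negative_exp.head())

    # Age distribution within that subset; most of these customers
    # turn out to fall between 20 and 30.
    sns.distplot(negative_exp["Age"])
    plt.show()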
So you will see over here that most of them are customers aged between 20 and 30 who have negative experience. If you think logically, how is it possible for a person to have negative experience? So that's an error in your data, that's dirtiness in your data, and that's why you have to deal with those records; you have to preprocess that data.

So let's say I'm going to check what exactly the mean is right now in this negative_exp frame, so negative_exp of Experience, dot mean. Now you will see how much fluctuation we have in this mean, because right now the mean here is somewhere close to a negative value, whereas the mean of the full column, as we just saw, is positive. So you can see how much variation we have in our data.

Next, let's say I'm going to check the total number of entries in this negative_exp, and you will see the result is 624. So now you can visualize it: yeah, among five thousand rows, I have somewhere around 624 entries coming from rows with negative experience. Next, I have to print this stuff, so I'm going to say: there are this-many records which basically have negative values for Experience, approx this-much percent. Here I'm going to add a placeholder, and once I have added my placeholders, I will fill in their values using the format function, so dot format. Very first, I have to pass the size of this negative_exp, so the first placeholder gets replaced by that count. And the second placeholder will get replaced by whatever I write next: negative_exp dot size divided by df dot size, whatever the size of my full DataFrame is, and then I have to convert it into a percentage by basically multiplying it by one hundred. So I multiply by 100, and you have to add parentheses over here, and one more parenthesis, because the whole ratio has to be computed inside the parentheses. A sketch of this follows below.
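Roughly, the mean check and the formatted message could look like the sketch below. One caveat, which is my inference rather than something stated in the transcript: DataFrame.size counts rows times columns, not rows, so if the frame has 12 columns after the two earlier drops, the 624 counted entries correspond to 52 affected rows; the 1.04 percent figure still comes out right because the column count cancels in the ratio of the two sizes.

    # Mean of Experience inside the negative subset: it is itself negative,
    # far from the roughly 20 seen on the full column.
    print(negative_exp["Experience"].mean())

    # Formatted summary; .size is rows * columns, but the ratio of the two
    # sizes still equals the ratio of row counts.
    print("There are {} records which have negative values for Experience, "
          "approx {:.2f}%".format(negative_exp.size,
                                  (negative_exp.size / df.size) * 100))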
So just execute it, and you will see: there are 624 records which have negative values for Experience, approx 1.04 percent. You can add a percent sign over here as well. So you will see roughly 1.04 percent of the data is dirty right now, and one way or another, you have to deal with that.

So very first, whatever I have in my df, I'm just going to copy into data. For this, I have to call copy; just execute it, and if you call head on your data, you will see this is exactly what your df was. Now, for whatever negative values your Experience has, I'm just going to use the numpy where function to change the negative values to a mean value derived from the data, here the mean of the Experience column. For this, I just have to call a simple function, np dot where. Here I have to say: data of Experience, wherever I have Experience as negative, just fill it with the mean of the Experience; and wherever I don't have negative experience, keep it as whatever it actually is, so for that I pass data of Experience itself. Then I have to assign the result back, so data of Experience equals this whole np.where expression, as sketched at the end of this section.

So now just execute it, and instead of calling head on your data, let's say I'm going to check whether I have any negative value in my DataFrame or not. For this, I have to add the same filter, and now I just have to pass this filter to my data. You will see it returns an empty frame, because you don't have any entry as negative right now.

So that's all about this session. In the upcoming session, we are basically going to be analyzing the Education feature and the Personal Loan feature, as well as some more add-on visuals, and after that I'm going to automate all this stuff using some basic code, using some functions in Python. So that's all; I hope you loved it. Thank you, guys. Have a nice day. Keep learning, keep growing, and keep practicing.
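For reference, here is a sketch of that np.where cleanup and the verification filter, under the same naming assumptions as before. The transcript fills the negatives with the overall column mean; a per-age-group mean would be a straightforward variation.

    import numpy as np

    # Work on a copy so the original frame stays intact.
    data = df.copy()

    # Wherever Experience is negative, substitute the column mean;
    # everywhere else, keep the value exactly as it is.
    data["Experience"] = np.where(data["Experience"] < 0,
                                  data["Experience"].mean(),
                                  data["Experience"])

    # Verify: filtering for negatives should now return an empty frame.
    print(data[data["Experience"] < 0])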