Our project. Let's first import the important libraries.

So first, we are importing pandas as pd. Pandas is a machine learning library in Python that provides high-level data structures and a wide variety of tools for analysis. One of the great features of this library is its ability to express complex operations on data using just one or two commands; pandas makes the entire process of manipulating data much easier.

Next, very important: numpy as np. NumPy is considered one of the most popular machine learning libraries; TensorFlow and many other libraries use NumPy internally to perform multiple operations. NumPy makes complex mathematical implementations very simple.

Then we are importing matplotlib.pyplot as plt. Matplotlib's pyplot is a Python library used for 2D graphics.

Then we are importing seaborn as sns. Seaborn is also a visualisation library for Python; it is built on top of matplotlib. If matplotlib tries to make easy things easy and hard things possible, seaborn tries to make a well-defined set of hard things easy. Seaborn has many features compared to matplotlib.

Then we are importing accuracy_score: from sklearn.metrics import accuracy_score.
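Putting the narration above into code, the import cell would look roughly like this (a minimal sketch; the aliases pd, np, plt and sns are the conventional ones, and the exact cell contents are inferred from the audio):

```python
# Standard data-analysis imports used throughout the project.
import pandas as pd                          # high-level data structures (DataFrame)
import numpy as np                           # fast numerical operations
import matplotlib.pyplot as plt              # 2D plotting
import seaborn as sns                        # statistical plots on top of matplotlib
from sklearn.metrics import accuracy_score   # to check the model's accuracy later
```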
We use accuracy_score to check our model's accuracy. Then we import warnings and write warnings.filterwarnings('ignore'). So if you have any outdated library, then when we run the cells it will show a warning telling you to update it. If you use warnings.filterwarnings('ignore'), it will ignore all the warnings. Note that it ignores only warnings, not errors. Run the cell with Alt+Enter and shift to the next one.

Now let's load our dataset. Our dataset is a CSV file; the file name is Financial-Data.csv. In order to load this CSV file we are using pandas: we write pd.read_csv, and in brackets we pass the file name Financial-Data.csv, and we store the result in df.

And to see the top rows in our DataFrame, we are writing df.head(). The df.head() function gives us the top five rows and information about those top five rows. Now let's run this cell with Alt+Enter.

So here we have column names such as entry_id, age, pay_schedule (which is weekly or bi-weekly), home_owner (whether they own the house or not), their income, months employed, total years employed, and their current address year, i.e. for how long they have been staying at that address, and their personal account details. So there are many columns here.
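The loading step above can be sketched as follows. The real notebook reads the course's Financial-Data.csv from disk; here we first write a tiny stand-in CSV purely so the snippet is self-contained, and the file name and column names are assumptions based on the narration:

```python
import warnings

import pandas as pd

warnings.filterwarnings("ignore")  # silence warnings (errors are still raised)

# Stand-in for the course's Financial-Data.csv so the snippet runs on its own.
csv_text = (
    "entry_id,age,pay_schedule,home_owner,income,e_signed\n"
    "1,40,weekly,1,3000,1\n"
    "2,55,bi-weekly,0,2400,0\n"
    "3,32,weekly,1,3100,1\n"
)
with open("Financial-Data.csv", "w") as f:
    f.write(csv_text)

df = pd.read_csv("Financial-Data.csv")  # load the CSV into a DataFrame
print(df.head())                        # top five rows (only three exist here)
```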
And then we have risk scores; there are many risk score columns, of course. And then we have e_signed, which records whether they have signed or not. So this is our label. All of these other columns are our features, and e_signed is our label. So using all these features, we need to predict e_signed.

So in our label we have 1 or 0, which means signed or not signed. Let's get the count of signed and not signed; let's proceed further.

So here in this segment, we are printing the total number of signed and not signed. From the DataFrame df we are taking the column e_signed, and on that column the condition is == 1, which means it will print the count of all rows whose e_signed column is 1. And here it will print all rows which have 0. Run Alt+Enter.

So here we have a total of nine thousand six hundred and ten people who have signed, i.e. e_signed equal to 1, and around eight thousand people who have not signed.

Now, let's check the percentage of missing data in each column. So for that, we are taking our DataFrame: df.isnull().sum() gives the sum of null values in each column, we multiply by 100 and divide by len(df), which is the length of our DataFrame, and we are storing this in percent_missing.
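The counting and missing-data steps can be sketched like this (a synthetic df is built in place of the course data so the snippet is self-contained; the column name e_signed is taken from the narration):

```python
import pandas as pd

df = pd.DataFrame({
    "age": [40, 55, 32, 61],
    "income": [3000, 2400, 3100, 2800],
    "e_signed": [1, 0, 1, 1],
})

# Count rows where the label is 1 (signed) and 0 (not signed).
n_signed = df[df["e_signed"] == 1].shape[0]
n_not_signed = df[df["e_signed"] == 0].shape[0]
print("signed:", n_signed, "not signed:", n_not_signed)

# Percentage of missing values per column: null count * 100 / number of rows.
percent_missing = df.isnull().sum() * 100 / len(df)
print(percent_missing)
```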
Then we are creating another DataFrame with the name missing_value_df using pd.DataFrame. And here we are passing a dictionary: 'percent_missing' is the key name and percent_missing holds the values. So all the values which are in percent_missing we are using here, and we are storing this in missing_value_df. And to visualize missing_value_df, in the next cell we are simply writing missing_value_df. Run Alt+Enter.

So we have zero percent missing values in all the columns. OK, let's proceed further.

Now, let's create another dataset with the name dataset2, and let's remove unnecessary columns from it. So to remove unnecessary columns, we write df.drop with columns equal to the list of column names. So we are removing these columns from our df and storing the result in dataset2.

And also, let's create a histogram for each of our columns. First, let's configure our figure size: fig = plt.figure with the figsize we want. Then we will use plt.subplot for all the columns, and let's give the suptitle as 'Histograms of Numerical Columns' with fontsize equal to 20.

So here we are creating a for loop: for i in range(dataset2.shape[1]).
Inside the loop we write plt.subplot(6, 3, i + 1), i.e. six rows and three columns, and f = plt.gca() to get the current axes. And then f.set_title(dataset2.columns.values[i]): this loop will run over all the column names, so i here indexes the column name, and it will print that column name as the subplot's title.

Then vals = np.size(dataset2.iloc[:, i].unique()): we take all rows of the i-th column and count its unique values. So this will repeat for all columns. If vals is greater than or equal to 100, then vals is capped at 100, and we plot the histogram: plt.hist of dataset2.iloc[:, i] (all rows, the i-th column), with bins equal to vals and color equal to a colour code, which is blue.

So finally, we are calling plt.tight_layout with rect equal to [0, 0.03, 1, 0.95].

Now run Alt+Enter and let's shift to the next cell.

So this is the histogram for each of our columns. First we have age. In this plot, on this axis we have the value and on this axis we have the count. Mostly the value is between 20 and 70, which means the maximum number of people are from 20 to 70 years of age.
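The histogram loop described above can be sketched as follows. Synthetic columns stand in for the course data, and a 2x2 grid is used instead of the video's 6x3 because this toy dataset2 has only four columns; everything else mirrors the narration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dataset2 = pd.DataFrame({
    "age": rng.integers(20, 70, 500),
    "income": rng.integers(1000, 5000, 500),
    "risk_score": rng.random(500),
    "amount_requested": rng.integers(100, 2000, 500),
})

fig = plt.figure(figsize=(12, 8))
plt.suptitle("Histograms of Numerical Columns", fontsize=20)

for i in range(dataset2.shape[1]):
    plt.subplot(2, 2, i + 1)                 # the video uses subplot(6, 3, i + 1)
    f = plt.gca()
    f.set_title(dataset2.columns.values[i])  # column name as subplot title

    vals = np.size(dataset2.iloc[:, i].unique())
    if vals >= 100:                          # cap the number of bins at 100
        vals = 100
    plt.hist(dataset2.iloc[:, i], bins=vals, color="#3F5D7D")

plt.tight_layout(rect=[0, 0.03, 1, 0.95])
```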
So there are more than a thousand people who are around forty-five years of age. The same reading applies to the next plot: this axis is the value and this axis is the count. This one is income: more than 750 people have an income of approximately three thousand. Similarly, for all columns we have different values with different counts. So this is the risk score, and this is the amount requested; more than 2,000 people requested around one thousand.

Now, let's check the correlation between our features and the label. Generally we would create a heatmap, but today we're going to create a bar plot of the correlations. Let's proceed further.

So here we are creating the correlation bar plot: we are taking dataset2.corrwith(df['e_signed']). So here we are correlating dataset2's features with the e_signed column, the e_signed column from df. Then .plot.bar: so we're creating a bar plot, and figsize is used to set the figure size. And we're giving the title as 'Correlation with E Signed' (E Signed is our column name), the fontsize is equal to 20, grid is set to True, and the colors are green, blue and magenta.
So these are the colors. To check all the parameters and all the values which can be used in this plot function, just click in between the parentheses and press Shift+Tab. In the pop-up it will show all the parameters which can be used in this plot.

Now run Alt+Enter and let's shift to the next cell.

So we have created a bar plot. These are the column names, which are correlated with e_signed.

So age and current address year are mostly negatively correlated. And these two, the amount requested and the personal account, have a more positive correlation with the e_signed column. And the risk score is like 0.010, not much correlated. And household income is also negatively correlated, and several others are also mostly negatively correlated.

So from this plot, we can see that e_signed has a high negative correlation with the variables age and current address year, and has a positive correlation with the amount requested.
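The correlation bar plot above can be sketched like this. A small synthetic df stands in for the course data, and the label is deliberately tied to amount_requested so the bars are non-trivial; the corrwith-then-plot.bar pattern and the title are taken from the narration, while the figure size and fontsize here are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "age": rng.integers(20, 70, n),
    "amount_requested": rng.integers(100, 2000, n),
    "risk_score": rng.random(n),
})
# Toy label loosely tied to one feature, just so correlations are non-trivial.
df["e_signed"] = (df["amount_requested"] > df["amount_requested"].median()).astype(int)

dataset2 = df.drop(columns=["e_signed"])

# Correlation of every feature with the label, drawn as a bar plot.
correlations = dataset2.corrwith(df["e_signed"])
ax = correlations.plot.bar(
    figsize=(10, 6),
    title="Correlation with E Signed",
    fontsize=15,
    grid=True,
)
print(correlations)
```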