Our project. Let's first import the important libraries.

So first, we are importing pandas as pd. Pandas is a machine learning library in Python that provides high-level data structures and a wide variety of tools for analysis. One of the great features of this library is its ability to express complex operations on data using just one or two commands; pandas makes the entire process of manipulating data much easier.

Next, very important: numpy as np. NumPy is considered one of the most popular machine learning libraries; TensorFlow and many other libraries use NumPy internally to perform multiple operations. NumPy makes complex mathematical implementations very simple.

Then we are importing matplotlib.pyplot as plt. Matplotlib's pyplot is a Python library used for 2D graphics.

Then we are importing seaborn as sns. Seaborn is also a visualisation library for Python; it is built on top of matplotlib. If matplotlib tries to make easy things easy and hard things possible, seaborn tries to make a well-defined set of hard things easy. Seaborn has many features compared to matplotlib.

Then we are importing accuracy_score: from sklearn.metrics import accuracy_score.
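Putting the narration above into code, the import cell would look roughly like this (a minimal sketch; the aliases pd, np, plt and sns are the conventional ones, and the exact cell contents are inferred from the audio):

```python
# Standard data-analysis imports used throughout the project.
import pandas as pd                          # high-level data structures (DataFrame)
import numpy as np                           # fast numerical operations
import matplotlib.pyplot as plt              # 2D plotting
import seaborn as sns                        # statistical plots on top of matplotlib
from sklearn.metrics import accuracy_score   # to check the model's accuracy later
```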
We use accuracy_score to check our model's accuracy. Then we import warnings and write warnings.filterwarnings('ignore'). So if you have any outdated library, then when we run the cells it will show a warning telling you to update it. If you use warnings.filterwarnings('ignore'), it will ignore all the warnings. Note that it ignores only warnings, not errors. Run the cell with Alt+Enter and shift to the next one.

Now let's load our dataset. Our dataset is a CSV file; the file name is Financial-Data.csv. In order to load this CSV file we are using pandas: we write pd.read_csv, and in brackets we pass the file name Financial-Data.csv, and we store the result in df.

And to see the top rows in our DataFrame, we are writing df.head(). The df.head() function gives us the top five rows and information about those top five rows. Now let's run this cell with Alt+Enter.

So here we have column names such as entry_id, age, pay_schedule (which is weekly or bi-weekly), home_owner (whether they own the house or not), their income, months employed, total years employed, and their current address year, i.e. for how long they have been staying at that address, and their personal account details. So there are many columns here.
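The loading step above can be sketched as follows. The real notebook reads the course's Financial-Data.csv from disk; here we first write a tiny stand-in CSV purely so the snippet is self-contained, and the file name and column names are assumptions based on the narration:

```python
import warnings

import pandas as pd

warnings.filterwarnings("ignore")  # silence warnings (errors are still raised)

# Stand-in for the course's Financial-Data.csv so the snippet runs on its own.
csv_text = (
    "entry_id,age,pay_schedule,home_owner,income,e_signed\n"
    "1,40,weekly,1,3000,1\n"
    "2,55,bi-weekly,0,2400,0\n"
    "3,32,weekly,1,3100,1\n"
)
with open("Financial-Data.csv", "w") as f:
    f.write(csv_text)

df = pd.read_csv("Financial-Data.csv")  # load the CSV into a DataFrame
print(df.head())                        # top five rows (only three exist here)
```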
And then we have risk scores; there are many risk score columns, of course. And then we have e_signed, which records whether they have signed or not. So this is our label. All of these other columns are our features, and e_signed is our label. So using all these features, we need to predict e_signed.

So in our label we have 1 or 0, which means signed or not signed. Let's get the count of signed and not signed; let's proceed further.

So here in this segment, we are printing the total number of signed and not signed. From the DataFrame df we are taking the column e_signed, and on that column the condition is == 1, which means it will print the count of all rows whose e_signed column is 1. And here it will print all rows which have 0. Run Alt+Enter.

So here we have a total of nine thousand six hundred and ten people who have signed, i.e. e_signed equal to 1, and around eight thousand people who have not signed.

Now, let's check the percentage of missing data in each column. So for that, we are taking our DataFrame: df.isnull().sum() gives the sum of null values in each column, we multiply by 100 and divide by len(df), which is the length of our DataFrame, and we are storing this in percent_missing.
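The counting and missing-data steps can be sketched like this (a synthetic df is built in place of the course data so the snippet is self-contained; the column name e_signed is taken from the narration):

```python
import pandas as pd

df = pd.DataFrame({
    "age": [40, 55, 32, 61],
    "income": [3000, 2400, 3100, 2800],
    "e_signed": [1, 0, 1, 1],
})

# Count rows where the label is 1 (signed) and 0 (not signed).
n_signed = df[df["e_signed"] == 1].shape[0]
n_not_signed = df[df["e_signed"] == 0].shape[0]
print("signed:", n_signed, "not signed:", n_not_signed)

# Percentage of missing values per column: null count * 100 / number of rows.
percent_missing = df.isnull().sum() * 100 / len(df)
print(percent_missing)
```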
Then we are creating another DataFrame with the name missing_value_df using pd.DataFrame. And here we are passing a dictionary: 'percent_missing' is the key name and percent_missing holds the values. So all the values which are in percent_missing we are using here, and we are storing this in missing_value_df. And to visualize missing_value_df, in the next cell we are simply writing missing_value_df. Run Alt+Enter.

So we have zero percent missing values in all the columns. OK, let's proceed further.

Now, let's create another dataset with the name dataset2, and let's remove unnecessary columns from it. So to remove unnecessary columns, we write df.drop with columns equal to the list of column names. So we are removing these columns from our df and storing the result in dataset2.

And also, let's create a histogram for each of our columns. First, let's configure our figure size: fig = plt.figure with the figsize we want. Then we will use plt.subplot for all the columns, and let's give the suptitle as 'Histograms of Numerical Columns' with fontsize equal to 20.

So here we are creating a for loop: for i in range(dataset2.shape[1]).
Inside the loop we write plt.subplot(6, 3, i + 1), i.e. six rows and three columns, and f = plt.gca() to get the current axes. And then f.set_title(dataset2.columns.values[i]): this loop will run over all the column names, so i here indexes the column name, and it will print that column name as the subplot's title.

Then vals = np.size(dataset2.iloc[:, i].unique()): we take all rows of the i-th column and count its unique values. So this will repeat for all columns. If vals is greater than or equal to 100, then vals is capped at 100, and we plot the histogram: plt.hist of dataset2.iloc[:, i] (all rows, the i-th column), with bins equal to vals and color equal to a colour code, which is blue.

So finally, we are calling plt.tight_layout with rect equal to [0, 0.03, 1, 0.95].

Now run Alt+Enter and let's shift to the next cell.

So this is the histogram for each of our columns. First we have age. In this plot, on this axis we have the value and on this axis we have the count. Mostly the value is between 20 and 70, which means the maximum number of people are from 20 to 70 years of age.
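The histogram loop described above can be sketched as follows. Synthetic columns stand in for the course data, and a 2x2 grid is used instead of the video's 6x3 because this toy dataset2 has only four columns; everything else mirrors the narration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dataset2 = pd.DataFrame({
    "age": rng.integers(20, 70, 500),
    "income": rng.integers(1000, 5000, 500),
    "risk_score": rng.random(500),
    "amount_requested": rng.integers(100, 2000, 500),
})

fig = plt.figure(figsize=(12, 8))
plt.suptitle("Histograms of Numerical Columns", fontsize=20)

for i in range(dataset2.shape[1]):
    plt.subplot(2, 2, i + 1)                 # the video uses subplot(6, 3, i + 1)
    f = plt.gca()
    f.set_title(dataset2.columns.values[i])  # column name as subplot title

    vals = np.size(dataset2.iloc[:, i].unique())
    if vals >= 100:                          # cap the number of bins at 100
        vals = 100
    plt.hist(dataset2.iloc[:, i], bins=vals, color="#3F5D7D")

plt.tight_layout(rect=[0, 0.03, 1, 0.95])
```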
So there are more than a thousand people who are around forty-five years of age. The same reading applies to the next plot: this axis is the value and this axis is the count. This one is income: more than 750 people have an income of approximately three thousand. Similarly, for all columns we have different values with different counts. So this is the risk score, and this is the amount requested; more than 2,000 people requested around one thousand.

Now, let's check the correlation between our features and the label. Generally we would create a heatmap, but today we're going to create a bar plot of the correlations. Let's proceed further.

So here we are creating the correlation bar plot: we are taking dataset2.corrwith(df['e_signed']). So here we are correlating dataset2's features with the e_signed column, the e_signed column from df. Then .plot.bar: so we're creating a bar plot, and figsize is used to set the figure size. And we're giving the title as 'Correlation with E Signed' (E Signed is our column name), the fontsize is equal to 20, grid is set to True, and the colors are green, blue and magenta.
So these are the colors. To check all the parameters and all the values which can be used in this plot function, just click in between the parentheses and press Shift+Tab. In the pop-up it will show all the parameters which can be used in this plot.

Now run Alt+Enter and let's shift to the next cell.

So we have created a bar plot. These are the column names, which are correlated with e_signed.

So age and current address year are mostly negatively correlated. And these two, the amount requested and the personal account, have a more positive correlation with the e_signed column. And the risk score is like 0.010, not much correlated. And household income is also negatively correlated, and several others are also mostly negatively correlated.

So from this plot, we can see that e_signed has a high negative correlation with the variables age and current address year, and has a positive correlation with the amount requested.
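The correlation bar plot above can be sketched like this. A small synthetic df stands in for the course data, and the label is deliberately tied to amount_requested so the bars are non-trivial; the corrwith-then-plot.bar pattern and the title are taken from the narration, while the figure size and fontsize here are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "age": rng.integers(20, 70, n),
    "amount_requested": rng.integers(100, 2000, n),
    "risk_score": rng.random(n),
})
# Toy label loosely tied to one feature, just so correlations are non-trivial.
df["e_signed"] = (df["amount_requested"] > df["amount_requested"].median()).astype(int)

dataset2 = df.drop(columns=["e_signed"])

# Correlation of every feature with the label, drawn as a bar plot.
correlations = dataset2.corrwith(df["e_signed"])
ax = correlations.plot.bar(
    figsize=(10, 6),
    title="Correlation with E Signed",
    fontsize=15,
    grid=True,
)
print(correlations)
```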