[00:00:05] Hi everyone. In this video we are going to do some basic analyses. First: which 10 states have the most schools that opened projects to gather donations? We will plot that data using a bar plot. Then: what are the top 10 states in which schools get the highest average donations for their projects?

[00:00:24] Some of you may wonder why we need to do this. These are the kinds of analyses you run on real business data, such as finding the best-selling product over the last seven days or the product that generated the most revenue.

[00:00:41] We will begin with the first one: the states with the most schools. How do we find which 10 states have the most schools that opened projects to gather donations?

[00:00:53] First, we need the states, that is, which states are involved. Then we need to count how many times each state is involved. In other words, we want the states with the largest number of schools taking part in these donation projects.
[00:01:29] So the first task is to find all the states; then we count how many schools each state has. After that, because we want the 10 states with the most schools, we have to arrange the counts in descending order so that we can pick the top 10.

[00:01:53] Let's move to the notebook and see how to do it. We have several data frames here, including the merged data, as you can see, and we also have the data files. So how do we find the data we need?

[00:02:07] Go through these files, keeping in mind that our problem is about the states of the schools that receive donations. If you look at the head() output of each data frame, we have projects, schools, donations, and so on.

[00:02:28] The teachers data will probably not give us the states, so it is not the one we need. Next we have schools.head(); maybe this one provides the data. Its columns include School ID, School Name, School Metro Type, a school percentage column, and then School State. So we will go with that one. This is the main point.
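The plan just described (collect each school's state, count schools per state, sort the counts from largest to smallest, keep the first ten) can be sketched in plain Python before bringing in pandas. This is a minimal illustration on a made-up list of states, not the real schools data; the function name top_states is hypothetical.

```python
from collections import Counter

# Made-up example: one entry per school, holding that school's state.
school_states = ["California", "Texas", "California", "Maryland", "Texas",
                 "California", "New York", "Maryland", "Texas", "California"]


def top_states(states, n=10):
    """Return the n states with the most schools as (state, count) pairs."""
    counts = Counter(states)                      # count schools per state
    ranked = sorted(counts.items(),               # arrange largest count first
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:n]                             # keep only the top n


print(top_states(school_states, n=3))
```

Counter also offers most_common(n), which combines the sorting and slicing steps; they are spelled out separately here to mirror the plan.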
[00:02:59] Once you find the particular column you need to analyze, go for it. If you look at the other data frames, School State is not in resources.head() or projects.head(). It is not in donors, and it is not in donations either. So the schools data frame is the key one here.

[00:03:24] The merged data we have built so far also contains this column, so we could use either that one or this one. If the question can be answered with a small data set, use the small one; if you need to combine columns from several files, use the full merged data. This time the schools data frame alone solves the problem, so we do not need the larger data. Here we have the column School State.

[00:04:03] There is one more thing I skipped. Let me show you the data again. Here we have the states, and next to them other school details. The five rows shown by head() all have different states, but if you look at more rows, you notice that some states repeat: Texas appears here and again here, and California appears here and again here.
[00:04:35] Maryland appears twice here as well, and the other states also appear many times. It is not as if every state has just one school; states have many schools. So we will count the schools in each state and take the top 10.

[00:04:57] Now, are we going to inspect every row by hand? If you check its shape, the schools data frame has 72,993 rows, which means 72,993 schools. How do we separate them? By counting the rows for each state. There are only a few dozen states, far fewer than the number of schools, so we do not need to go through all the data manually. Instead we will use pandas, which can do this in a couple of lines.

[00:05:38] We begin by storing the column in a variable, taken from schools: here we have schools, and the column I am using is School State. Now I have access to that column. What do we do next? First, I will count the values in it.
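A small sketch of the shape check and column selection, assuming the schools data frame has a 'School State' column as shown in the recording. The six-row frame below is a made-up stand-in (on the real data, shape reported 72,993 rows), and the variable name s is illustrative. nunique() counts the distinct states, which shows how few groups those rows collapse into.

```python
import pandas as pd

# Made-up stand-in for the schools data frame; column names follow the recording.
schools = pd.DataFrame({
    "School ID": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "School State": ["Texas", "California", "Texas",
                     "Maryland", "California", "Maryland"],
})

n_rows, n_cols = schools.shape                # one row per school
s = schools["School State"]                   # select the column we will count
n_states = s.nunique()                        # number of distinct states
print(n_rows, n_states)
```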
[00:06:07] So we call value_counts() with empty parentheses. It counts how many times each value appears: Texas twice, another state three times, Washington some number of times, and so on. This is the method that counts the values for us.

[00:06:30] Once all the values are counted, for example Washington with one count, California with another, and some other state XYZ with four, our second problem is: how do we know which state has the maximum and which the minimum, and how do we find the top ten among them?

[00:07:01] For that we need a sorted list, that is, a list with the values in either ascending or descending order. So we need to sort the values, with the largest first.

[00:07:25] For that we use a method called sort_values(), which sorts all the values. For the order, we set ascending=False, so the values come out in descending order, with the maximum on top.
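Chaining the two methods described above might look like the following on a toy Series of states (made-up values; the variable name s is illustrative). Note that value_counts() already returns the counts in descending order by default, so sort_values(ascending=False) is redundant here, although it makes the intent explicit.

```python
import pandas as pd

# Made-up column of school states.
s = pd.Series(["Texas", "California", "Washington", "Texas", "California",
               "California", "Florida", "Texas", "California", "Florida"],
              name="School State")

counts = s.value_counts()                          # occurrences of each state
top = counts.sort_values(ascending=False).head(3)  # largest first, keep three
print(top)
```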
[00:07:54] How many values do we need? Ten, so we add head(10).

[00:08:00] When I press Shift+Enter, we get an AttributeError: the Series object has no attribute value_count. I left off the s; the method is value_counts(). After fixing that and pressing Shift+Enter again, we get the top ten.

[00:08:15] California has 8,457 schools, a number we could never get by counting the list by eye. Then come Texas, New York, and Florida. Notice how the counts fall: by about 2,000 from first to second place, then by about 3,000, and then more gradually toward tenth place.

[00:08:43] If you do not include head(), you get all the available states, with every value counted.

[00:09:08] Let me go back to only the 10 values, so here we have head(10) again. Our next step is to plot the data using a bar plot.

[00:09:22] This time I am going to use iplot; you can use any plotting tool you like. We pass kind='bar', and, if you remember, we also have xTitle and yTitle. Let me put the states on the x axis.
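The AttributeError in the recording comes from a misspelled method name. A minimal reproduction on a toy Series (not the recording's exact code) shows the error and the corrected call:

```python
import pandas as pd

s = pd.Series(["Texas", "California", "Texas"], name="School State")

message = None
try:
    s.value_count()                 # typo: the method name is missing its final "s"
except AttributeError as err:
    message = str(err)              # the message names the missing attribute
print(message)

top = s.value_counts().head(10)     # correct name: value_counts
print(top)
```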
[00:09:36] So xTitle is 'States'. Then yTitle, with a capital T, is something like 'Number of Schools'.

[00:09:55] Last comes the title of the plot, something like 'Number of schools involved in projects by state'.

[00:10:27] Press Shift+Enter and we get the bar plot. Notice the values: first 8,457, then 6,485, and then they keep decreasing, at a roughly steady rate here.

[00:10:46] So that's how you get the answer to the question of which states have the most schools. Here we have the 10 states with the most schools, and we have visualized the data with a plot. If you remove head(10) and plot the bar chart again, you get all the states. On the x axis we have the states, on the y axis the number of schools, and the plot has a title.

[00:11:23] I hope you now see how this works; it is quite interesting. Let me put head(10) back. We are done with this one, so let's move to the next question.
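The recording draws the chart with iplot (from the cufflinks plotting extension, as far as we can tell), passing kind='bar', xTitle, yTitle, and title. As a swapped-in alternative that needs only pandas and matplotlib, the same kind of chart can be drawn with Series.plot.bar and the axis setters below. The three counts are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")                     # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts standing in for the top-10 Series.
top = pd.Series({"California": 4, "Texas": 3, "Florida": 2})

ax = top.plot.bar(color="tab:blue")       # one bar per state
ax.set_xlabel("States")
ax.set_ylabel("Number of Schools")
ax.set_title("Number of schools involved in projects by state")
plt.tight_layout()
# In a notebook, plt.show() would display the figure.
```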
[00:11:40] What are the top 10 states in which schools get the highest average donations for their projects? This second question is a little trickier than the first, so here is a small task for you: pause the video and try it yourself, to check whether you are following along.

[00:12:05] I hope you tried it, and that you paused the video as I told you. Now let's discuss it. For the top 10 states we again need the states, this time those whose schools get the highest average donations for their projects. So we need School State, and we also need the donation amount.

[00:12:31] If you look at the data, the donation amount is in donations, but the states are not there; they are in the schools data frame, where we have School State. So how do we handle this? One column is in schools and the other is in donations. This time we need the complete merged data, which has both the donation amount and School State. Let me show it: here we have School State.
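The recording uses an already-merged data frame. As a hedged sketch of how such a frame could be built, assuming donations link to projects through 'Project ID' and projects link to schools through 'School ID' (our assumption about the dataset's keys, not something shown in the video), two inner merges bring School State and Donation Amount into one table. The tables are made-up miniatures.

```python
import pandas as pd

# Made-up miniature tables; the key columns are assumed.
donations = pd.DataFrame({
    "Project ID": ["p1", "p2", "p3", "p1"],
    "Donation Amount": [25.0, 100.0, 10.0, 50.0],
})
projects = pd.DataFrame({
    "Project ID": ["p1", "p2", "p3"],
    "School ID": ["s1", "s2", "s1"],
})
schools = pd.DataFrame({
    "School ID": ["s1", "s2"],
    "School State": ["Texas", "Ohio"],
})

# Attach each donation's school, then that school's state.
data = (donations
        .merge(projects, on="Project ID", how="inner")
        .merge(schools, on="School ID", how="inner"))
print(data[["School State", "Donation Amount"]])
```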
[00:13:11] After that we have Donation Amount, so both columns are available here. Now, back to the question: we have the states and the donations. What are the top 10 states in which schools get the highest average donations? Earlier we found the states with the most schools involved; now we want the states with the highest average donation, and we can check whether the two lists match.

[00:13:47] Again it is very simple. We use the merged data, and we need two columns. First we group the rows by School State. Once they are grouped by state, we access Donation Amount. What we want is the average amount each state received, so we use mean().

[00:14:38] But we are not done yet. So far we have grouped the data by School State.
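The group-then-average step described above, followed by the sorting and head(10) that come next in the video, can be sketched as one chain. The merged frame here is a tiny made-up example; the column names follow the ones shown in the recording.

```python
import pandas as pd

# Made-up merged data, one row per donation.
data = pd.DataFrame({
    "School State": ["Texas", "Texas", "Maine", "Ohio", "Ohio", "Ohio"],
    "Donation Amount": [10.0, 50.0, 100.0, 20.0, 30.0, 70.0],
})

avg = (data.groupby("School State")["Donation Amount"]
           .mean()                          # average donation per state
           .sort_values(ascending=False)    # largest average first
           .head(10))                       # keep at most ten states
print(avg)
```

Replacing .mean() with .max() would rank states by their single largest donation instead, which answers a different question; in this toy data it happens to give the same order (Maine 100.0, Ohio 70.0, Texas 50.0), but on real data the two rankings can differ.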
[00:14:48] Within each state we take the donation amounts. You can picture it simply: we arrange all the rows by state, access the donation amounts in each group, and take their mean. We use the mean because each state has many schools and donations (California alone has 8,457 schools), so the mean gives a basic idea of the typical amount.

[00:15:13] So we used mean() there. Now, as before, we need to sort the values: sort_values() with ascending=False. Make sure you follow how this works: group the rows by the first column, take the values of the second column, compute the mean, sort with ascending=False, and then choose how many we need. Again we need only 10.

[00:15:53] Let's run it. It takes some time to process because the file is very large, so don't worry about it. While we wait, let me write the syntax for our plot: kind='bar' for a bar plot, xTitle set to 'States', and then the yTitle.
[00:16:27] For yTitle we use 'Average donation per project'. Finally, the title: something describing the states with the maximum donation.

[00:17:02] This plot would look the same as the previous one, so let me change the color. I will provide a colorscale so that it looks a little different from the first plot; the blue one will do.

[00:17:23] Now it is processing; let me print the result. Notice that California has the most schools, but it is not at the top of this list. The state with the highest average donation is [inaudible], followed by states including North Dakota and New Jersey.

[00:17:53] Now let's plot it. The maximum is about 130, and from there the values decrease, more gradually toward the end. So these are the top 10 states by average donation. We are done with this one too.
[00:18:14] The next problem is to analyze the maximum, minimum, mean, median, mode, and the 25th and 75th percentiles of the donations. This problem is for you: give it a try, and we will continue with the remaining problems in the next video. I hope you understood how I did these. This is the basic idea, and it will become clearer as we solve the remaining problems. Thanks for watching, and see you in the next video.
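For checking your answer to the exercise, here is one possible sketch using standard pandas Series methods on a made-up set of donation amounts (on the real data you would use the Donation Amount column instead). mode() returns a Series because there can be ties, and describe() reports several of these statistics in one call.

```python
import pandas as pd

amounts = pd.Series([5.0, 10.0, 10.0, 25.0, 50.0, 100.0])  # made-up donations

summary = {
    "max": amounts.max(),
    "min": amounts.min(),
    "mean": amounts.mean(),
    "median": amounts.median(),
    "mode": amounts.mode().tolist(),   # may contain several values on ties
    "25%": amounts.quantile(0.25),     # linear interpolation by default
    "75%": amounts.quantile(0.75),
}
print(summary)
print(amounts.describe())   # count, mean, std, min, 25%, 50%, 75%, max
```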