1 00:00:06,090 --> 00:00:06,750 Hey everyone. 2 00:00:07,440 --> 00:00:11,400 So this video is about a little concept, that is, missing data. 3 00:00:11,430 --> 00:00:16,800 Now, you have worked on the tables that are DataFrames in pandas. 4 00:00:16,950 --> 00:00:20,310 And I have told you that pandas is used for data analysis. 5 00:00:20,670 --> 00:00:26,250 So what if you are analyzing data and something is missing, like you are getting the marks of students 6 00:00:26,280 --> 00:00:28,590 and one student is absent? 7 00:00:28,590 --> 00:00:29,980 So how do you deal with that? 8 00:00:30,450 --> 00:00:36,480 For that, pandas has some features for handling missing data, and those are what we are going to cover 9 00:00:36,480 --> 00:00:36,780 here. 10 00:00:37,770 --> 00:00:48,510 So for that, let us have an example dictionary, that is d. First we have A, and the values in A will be something 11 00:00:48,510 --> 00:00:59,910 like 1, 2, 3, 4 and 5. Then we have B, and the values in B will be 6, 7, 8, 12 00:00:59,940 --> 00:01:10,380 9, and the last will be missing. For missing values we use np.nan, which denotes 13 00:01:10,380 --> 00:01:12,570 the missing values. 14 00:01:13,470 --> 00:01:14,730 So copy 15 00:01:14,730 --> 00:01:18,580 this one, because we are going to use it again and again here. 16 00:01:18,780 --> 00:01:28,420 Now after B we have C, which will be, let it start from 0: 0, 1, 2. 17 00:01:28,440 --> 00:01:40,040 Now this time two values are missing. Then we have D, which will be 3, 4. 18 00:01:40,050 --> 00:01:45,150 Now three values are missing, so we have np.nan here three times. 19 00:01:45,160 --> 00:01:51,510 After that we have E, and in E we have four values missing. 20 00:01:51,520 --> 00:02:03,010 So here just 5, and then four times np.nan, and run it. 21 00:02:03,170 --> 00:02:07,470 So now we have d defined.
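The dictionary described so far could be written roughly as below. This is only a sketch: the values of C are hard to make out in the recording, so the block assumes C is 0, 1, 2 followed by two missing values, which matches the count of missing values stated for each column. The name `missing_counts` is introduced here just for illustration.

```python
import numpy as np

# Example dictionary from the walkthrough; each key later becomes a column.
# Assumption: C is taken to be 0, 1, 2 and then two missing values.
d = {
    "A": [1, 2, 3, 4, 5],
    "B": [6, 7, 8, 9, np.nan],
    "C": [0, 1, 2, np.nan, np.nan],
    "D": [3, 4, np.nan, np.nan, np.nan],
    "E": [5, np.nan, np.nan, np.nan, np.nan],
}

# Count how many entries are missing under each key.
missing_counts = {key: int(np.isnan(values).sum()) for key, values in d.items()}
print(missing_counts)
```

Each key has one more missing entry than the previous one, which makes the later dropna and thresh examples easy to follow.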
22 00:02:07,800 --> 00:02:10,630 Now if you print this one, that is d, first, 23 00:02:10,680 --> 00:02:20,100 let me pass it to print, and d, here we go. 24 00:02:20,110 --> 00:02:20,830 Here we have d. 25 00:02:20,950 --> 00:02:27,530 And if you print it you will get this dictionary of lists. Now to convert this 26 00:02:27,530 --> 00:02:37,350 into a table, just do pd.DataFrame and then pass d. 27 00:02:37,480 --> 00:02:43,310 You will get this one, and give it to a variable, that is df. 28 00:02:43,890 --> 00:02:48,640 Now we have df equal to this thing here. 29 00:02:48,650 --> 00:02:54,160 Here NaN denotes the missing values, and if you notice, the first column has all the values, the second 30 00:02:54,160 --> 00:02:55,780 one has one missing, 31 00:02:55,900 --> 00:02:57,550 then the third one has two missing, 32 00:02:57,550 --> 00:03:01,970 then the fourth has three missing, then the fifth one has four missing. 33 00:03:02,020 --> 00:03:07,750 Now to deal with them we have a few methods in pandas. 34 00:03:07,750 --> 00:03:17,020 The first one is for when you want all the rows or columns in which any value is missing 35 00:03:17,320 --> 00:03:18,470 to get out of this table. 36 00:03:18,580 --> 00:03:24,010 Then we use something known as df.dropna, 37 00:03:24,100 --> 00:03:35,220 and then just pass the axis. If you press Shift+Tab you will see that it takes an 38 00:03:35,240 --> 00:03:42,030 axis, and there are some other things, like thresh, which we are going to discuss later, after 39 00:03:42,030 --> 00:03:42,800 this one. 40 00:03:42,870 --> 00:03:50,700 So for rows you just need to pass axis equal to 0, and all the rows with missing values will be eliminated 41 00:03:50,700 --> 00:03:51,220 from here. 42 00:03:51,750 --> 00:03:57,650 If you pass 1, then you'll get only the column which has all the values, like this one.
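As a minimal sketch of the steps above, the following builds the DataFrame and drops rows or columns that contain any missing value. The values of C are an assumption (0, 1, 2 and two missing values), and the names `rows_complete` and `cols_complete` are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: C is 0, 1, 2 followed by two missing values.
d = {
    "A": [1, 2, 3, 4, 5],
    "B": [6, 7, 8, 9, np.nan],
    "C": [0, 1, 2, np.nan, np.nan],
    "D": [3, 4, np.nan, np.nan, np.nan],
    "E": [5, np.nan, np.nan, np.nan, np.nan],
}
df = pd.DataFrame(d)
print(df)

# axis=0 drops every row that has at least one NaN.
rows_complete = df.dropna(axis=0)
print(rows_complete)  # only the first row is complete

# axis=1 drops every column that has at least one NaN.
cols_complete = df.dropna(axis=1)
print(cols_complete)  # only column A is complete
```

Note that `dropna` returns a new DataFrame here; `df` itself still holds all the NaN values.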
43 00:03:58,740 --> 00:04:05,490 So that's how you can remove the rows and columns in which there is any missing value, and by default 44 00:04:06,090 --> 00:04:07,440 the axis value is 0. 45 00:04:07,440 --> 00:04:11,460 If you do this one, you will get the row in which all the values are available. 46 00:04:13,090 --> 00:04:16,860 Now what if your case is something like: there's no problem 47 00:04:16,900 --> 00:04:22,130 if two values are missing or three values are missing, but if more values are missing, 48 00:04:22,210 --> 00:04:23,920 I will not work on them. 49 00:04:23,920 --> 00:04:26,480 So what if you want to work with something like this? 50 00:04:26,860 --> 00:04:35,340 Then we have something known as thresh. It works like this: if you need all the values, then 51 00:04:35,350 --> 00:04:42,070 if you give 5, only the rows in which all five values are available will be kept. 52 00:04:42,070 --> 00:04:48,850 If you need something like, I can work with three values available, then just pass 3 and you will get all 53 00:04:48,850 --> 00:04:54,640 the rows in which at least three values are available. 54 00:04:54,890 --> 00:04:59,740 And if you say something like, just one value, I can work with that, 55 00:05:00,110 --> 00:05:05,450 you will get the complete table here, because every row has at least one value: 1, 2, 3, 4, 5. 56 00:05:05,450 --> 00:05:12,380 These are not indexes, these are the values of A. If you have noticed, A is equal to 1, 2, 3, 4, 5. The columns 57 00:05:12,380 --> 00:05:16,970 in which you have any NaN are converted into float, but those which do not have any NaN 58 00:05:17,450 --> 00:05:20,150 are still integers. 59 00:05:20,370 --> 00:05:25,740 So thresh and axis are done. Now, after that,
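The thresh behaviour described above can be sketched as follows: `thresh=n` keeps the rows that have at least n non-missing values. The DataFrame is rebuilt so the block runs on its own, with the same assumed values for C; the names `all_five`, `at_least_three` and `at_least_one` are illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: C is 0, 1, 2 followed by two missing values.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [6, 7, 8, 9, np.nan],
    "C": [0, 1, 2, np.nan, np.nan],
    "D": [3, 4, np.nan, np.nan, np.nan],
    "E": [5, np.nan, np.nan, np.nan, np.nan],
})

# With no arguments, dropna works on rows (axis=0).
default_drop = df.dropna()

# Keep rows with at least 5, 3, or 1 available values.
all_five = df.dropna(thresh=5)
at_least_three = df.dropna(thresh=3)
at_least_one = df.dropna(thresh=1)
print(at_least_three)

# Columns containing NaN are stored as float; A has no NaN and stays integer.
print(df.dtypes)
```

With this data, `thresh=3` keeps the first three rows and `thresh=1` keeps the whole table, since column A is never missing.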
60 00:05:25,860 --> 00:05:33,570 what if you want to fill all these values, like I need a particular value, say 1, in 61 00:05:33,570 --> 00:05:39,210 all these places? For example, if you want to do some analysis based on multiplication, then you want to put 1 62 00:05:39,390 --> 00:05:44,430 in all the missing values, because multiplication by 1 cannot affect the values. 63 00:05:44,430 --> 00:05:52,600 So for that we have another method, known as fillna. 64 00:05:52,610 --> 00:05:56,480 dropna and fillna: you may have noticed both have na, and NaN is 65 00:05:56,490 --> 00:05:59,050 the value that we have used there. 66 00:05:59,070 --> 00:06:08,730 So here you use value for what you want to fill, like I want to put 1, and then you will get something like this: 67 00:06:09,260 --> 00:06:18,070 all the NaNs are replaced by 1. So that's how you can replace the values that are missing in your 68 00:06:18,100 --> 00:06:18,550 DataFrame. 69 00:06:19,090 --> 00:06:25,630 And if you want to fill these values by using some particular operation, you can do that 70 00:06:25,640 --> 00:06:34,870 also. Say I need to replace all the values in, let me say, B that are missing with the average value. 71 00:06:34,990 --> 00:06:40,090 That is the most common thing: whenever you are working on something like sales, or you are working on 72 00:06:40,090 --> 00:06:45,970 margins, you replace the missing values by the average value, because the average of all the values 73 00:06:45,970 --> 00:06:49,070 is usually a reasonable stand-in for the true value, so that's approximately 74 00:06:49,090 --> 00:06:52,000 the output expected. 75 00:06:52,120 --> 00:06:56,620 So that's why filling all the gaps with the average value is much more effective. 76 00:06:57,100 --> 00:07:03,490 So for that, if I need to fill all the values missing in B with the average value of B's elements, then 77 00:07:03,490 --> 00:07:08,770 I will use something like this: first pass the B in quotes.
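Filling every missing value with one constant, as described earlier in this part, might look like the sketch below. The DataFrame is rebuilt so the block is self-contained, the values of C are assumed, and `filled` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Assumption: C is 0, 1, 2 followed by two missing values.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [6, 7, 8, 9, np.nan],
    "C": [0, 1, 2, np.nan, np.nan],
    "D": [3, 4, np.nan, np.nan, np.nan],
    "E": [5, np.nan, np.nan, np.nan, np.nan],
})

# Replace every NaN in the table with 1; existing values are left alone.
filled = df.fillna(value=1)
print(filled)

# fillna returns a new DataFrame, so df still contains its NaN values.
print(df.isna().sum())
```

Choosing 1 suits the multiplication case mentioned above; for sums, 0 would play the same neutral role.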
78 00:07:08,770 --> 00:07:13,280 Make sure you are passing the quotes here, because these are labels. 79 00:07:13,420 --> 00:07:19,980 These are some particular small things in pandas where you may get confused, like where the quotes 80 00:07:21,090 --> 00:07:29,550 come in. Even when you are a professional, and also if you are a beginner, these things are always 81 00:07:29,550 --> 00:07:31,380 confusing. 82 00:07:31,380 --> 00:07:37,320 So make sure to write them down on a page, like where you have to put quotes and where you do not have to. 83 00:07:38,070 --> 00:07:45,840 So these, A, B, C, D and E, are just the keys of the dictionary, so they are characters, 84 00:07:45,840 --> 00:07:52,860 and you need to pass them in quotes here. After passing them, here in fillna 85 00:07:53,040 --> 00:07:59,140 you need to have value. That value denotes the value you are filling in, like 86 00:07:59,260 --> 00:08:08,130 we have done with 1. Here, if you put 1 in this one, then you will get B, just B, with its last 87 00:08:08,130 --> 00:08:11,410 value replaced with 1, that is, this NaN replaced by 1. 88 00:08:11,640 --> 00:08:21,090 And here you can make that the mean value. I can do something like df, then here pass B again, then dot 89 00:08:22,440 --> 00:08:31,350 mean. The mean, and functions like the mean, are functions included with pandas that are 90 00:08:31,350 --> 00:08:36,150 also used with groupby, which we are going to learn in the next video. And when you do something like 91 00:08:36,150 --> 00:08:40,020 this one, notice here that the value is replaced by seven point five. 92 00:08:40,200 --> 00:08:42,240 That is the average of these four.
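A short sketch of the column-level fill just described: select the column with its label in quotes, then pass either a constant or the column's mean as `value`. The values of C are assumed, and the names `b_with_one` and `b_with_mean` are illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: C is 0, 1, 2 followed by two missing values.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [6, 7, 8, 9, np.nan],
    "C": [0, 1, 2, np.nan, np.nan],
    "D": [3, 4, np.nan, np.nan, np.nan],
    "E": [5, np.nan, np.nan, np.nan, np.nan],
})

# Column labels are strings, so they go in quotes.
b_with_one = df["B"].fillna(value=1)

# mean() skips NaN by default, so this averages 6, 7, 8 and 9.
b_mean = df["B"].mean()
b_with_mean = df["B"].fillna(value=b_mean)
print(b_mean)       # 7.5
print(b_with_mean)
```

Only the last entry of B changes in either case, because it is the only missing one.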
93 00:08:42,360 --> 00:08:49,110 If you try the average by hand: if I take out the difference, that is 5, which is common to all of these, 94 00:08:49,470 --> 00:08:57,080 after taking out 5 I have 1 here; 2 plus 5 gives 7, so 2 here; then 3 plus 5 gives 8; and here 4 plus 95 00:08:57,240 --> 00:08:58,950 5 gives 9. 96 00:08:58,950 --> 00:09:01,290 So here I have 4, 3, 2, 1. 97 00:09:01,530 --> 00:09:06,710 Add 4, 3, 2, 1, divide by 4, add the 5 back, and you will get seven point five. 98 00:09:06,720 --> 00:09:14,490 That is the average value of this one. How? 4 plus 3 is 7, plus 2 is 9, and then 1 makes 10, 99 00:09:14,670 --> 00:09:17,110 and then divided by 4 that is two point five. 100 00:09:17,130 --> 00:09:18,570 And then we add the difference, 5. 101 00:09:18,570 --> 00:09:20,190 We get seven point five. 102 00:09:20,260 --> 00:09:24,590 That's how the mean is calculated here, and the missing value is filled with that mean. 103 00:09:24,750 --> 00:09:33,500 You can also do this thing with any column. If you do it with C, just put C here, and you will get the average 104 00:09:33,500 --> 00:09:39,670 of these, 0, 1 and 2. The average will be 1, and because the values are floating point, the output 105 00:09:39,770 --> 00:09:42,460 will be shown as 1.0. 106 00:09:42,860 --> 00:09:49,780 If you do something like this, here you are filling C, and here you take B, then the average of 107 00:09:49,790 --> 00:09:52,040 B will be filled into these values. 108 00:09:52,310 --> 00:09:58,100 So this thing here doesn't depend on this thing here. You can fill in the average of any column. You can even 109 00:09:58,100 --> 00:10:06,990 take a variable, store the average or something like that in it, and do the same thing after 110 00:10:06,990 --> 00:10:07,610 that. 111 00:10:07,770 --> 00:10:12,970 Here we have the mean, and you can do it with E as well.
112 00:10:13,000 --> 00:10:20,490 In E we have just 5, so if you pass E there, all the missing values will be replaced by that one. And if you try to do something 113 00:10:20,490 --> 00:10:33,170 like this one, here I have 2, then here I have 2, you will get an error, because that thing doesn't work the 114 00:10:33,170 --> 00:10:40,670 same way with the rows. So make sure you are passing just the characters that label the columns. 115 00:10:40,730 --> 00:10:47,840 So this one is again homework for you: try to do the replacement with the rows, and see how you can formulate this problem 116 00:10:47,930 --> 00:10:53,870 of adding the average, or adding some value, to the missing values in rows. 117 00:10:54,350 --> 00:10:57,720 So here we are done with the missing values also. 118 00:10:57,890 --> 00:11:05,180 I hope you understand this one. And one thing: if you do something like this with A, you will not get any 119 00:11:05,180 --> 00:11:11,300 value replaced, because already filled values will not get replaced. Just the missing values, which are not 120 00:11:11,360 --> 00:11:13,610 available, will be replaced. 121 00:11:14,720 --> 00:11:16,450 So we are done with the missing values. 122 00:11:16,550 --> 00:11:17,750 And thanks for watching. 123 00:11:17,750 --> 00:11:18,920 I will see you in the next video.
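The last few cases from the video can be sketched together: filling C with its own mean, filling C with the mean of B, filling E, checking that A is left untouched, and seeing that an integer such as 2 is not a valid column label. The values of C are an assumption (0, 1, 2 and two missing values), and all variable names here are illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: C is 0, 1, 2 followed by two missing values.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [6, 7, 8, 9, np.nan],
    "C": [0, 1, 2, np.nan, np.nan],
    "D": [3, 4, np.nan, np.nan, np.nan],
    "E": [5, np.nan, np.nan, np.nan, np.nan],
})

# The fill value does not have to come from the same column.
c_own_mean = df["C"].fillna(value=df["C"].mean())  # gaps become 1.0
c_b_mean = df["C"].fillna(value=df["B"].mean())    # gaps become 7.5

# E has a single known value, so its mean is that value.
e_filled = df["E"].fillna(value=df["E"].mean())

# A has nothing missing, so fillna leaves it unchanged.
a_filled = df["A"].fillna(value=df["A"].mean())

# df[2] looks for a column labelled 2, which does not exist here.
try:
    df[2]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(c_own_mean.tolist(), c_b_mean.tolist(), lookup_failed)
```

The row-wise version is left as the homework suggested in the video.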