1 00:00:01,750 --> 00:00:06,110 So let's identify and treat outliers in Python. 2 00:00:07,170 --> 00:00:11,160 So from our EDD, we identified three variables 3 00:00:12,390 --> 00:00:19,980 that we want to look at, since we definitely know that rainfall and n_hot_rooms contain 4 00:00:19,990 --> 00:00:20,160 outliers. 5 00:00:22,110 --> 00:00:24,800 We will go straight to writing the code to treat them. 6 00:00:25,950 --> 00:00:31,680 Before that, if you remember, in our theory lecture we discussed capping and flooring. 7 00:00:34,060 --> 00:00:36,080 So first, we need to identify the 8 00:00:36,250 --> 00:00:42,670 99th and the 1st percentile values of these two variables. To do that, 9 00:00:43,480 --> 00:00:47,290 there is a function in NumPy called percentile. 10 00:00:49,110 --> 00:00:54,190 We'll first look at the function. We'll write np.percentile. 11 00:00:56,750 --> 00:01:00,860 As the first argument, we have to pass the column name. 12 00:01:01,010 --> 00:01:04,820 So we write df.n_hot_rooms. 13 00:01:10,150 --> 00:01:14,770 And then, in square brackets, we will write the percentile value 14 00:01:14,890 --> 00:01:15,700 we want to see. 15 00:01:15,850 --> 00:01:17,260 So we want the 99th. 16 00:01:17,870 --> 00:01:18,450 We'll write 17 00:01:18,920 --> 00:01:19,650 99. 18 00:01:20,430 --> 00:01:21,250 Then run this code. 19 00:01:24,080 --> 00:01:33,620 You can see this is an array, and the 99th percentile of n_hot_rooms is 15.399 20 00:01:33,890 --> 00:01:34,550 52. 21 00:01:36,170 --> 00:01:38,660 But the output here is an array. 22 00:01:39,170 --> 00:01:45,770 So remember, if we want to fetch the first number of an array, we have to specify the location of 23 00:01:45,770 --> 00:01:48,120 that value in square brackets. 24 00:01:48,200 --> 00:01:50,440 So we'll write np.percentile, 25 00:01:55,380 --> 00:01:55,750 df 26 00:01:56,180 --> 00:01:57,110 dot n_hot_rooms, 
27 00:02:00,800 --> 00:02:02,850 99 for the 99th percentile. 28 00:02:05,320 --> 00:02:07,930 And then after that, in square brackets, we'll write zero. 29 00:02:08,170 --> 00:02:14,310 So we are actually getting the first element of this array. 30 00:02:17,350 --> 00:02:24,610 We will save the value of this 99th percentile in another variable, which we call the upper limit. 31 00:02:24,760 --> 00:02:25,970 So we will write 32 00:02:26,050 --> 00:02:26,500 uv. 33 00:02:32,280 --> 00:02:40,230 So we are just saving this 15.399 value in another variable, uv. 34 00:02:47,420 --> 00:02:54,530 Now, how do we identify the rows where the n_hot_rooms value is more than this number? To do that, 35 00:02:54,800 --> 00:02:56,630 we'll write df, 36 00:02:58,180 --> 00:03:02,350 and in square brackets we'll write the condition where 37 00:03:06,670 --> 00:03:08,700 df.n_hot_rooms 38 00:03:15,490 --> 00:03:16,940 is more than uv. 39 00:03:18,460 --> 00:03:19,480 So we'll write greater 40 00:03:19,570 --> 00:03:20,320 than uv. 41 00:03:22,500 --> 00:03:26,850 Greater than uv. When we run this code, 42 00:03:32,700 --> 00:03:42,390 you can see we are getting all the rows where the n_hot_rooms value is greater than this 99th 43 00:03:42,390 --> 00:03:44,040 percentile value of n_hot_rooms. 44 00:03:48,460 --> 00:03:50,210 You remember the uv value is 45 00:03:50,300 --> 00:03:51,840 15.399. 46 00:03:53,200 --> 00:03:59,550 So, as you can see, we are getting all the rows where this value is greater than this uv. 47 00:04:00,140 --> 00:04:04,120 Now we want to limit these values. 48 00:04:04,370 --> 00:04:10,210 If you remember, in capping and flooring, we discussed that we can multiply this value by any 49 00:04:10,350 --> 00:04:14,230 integer or any other value, and use the result to replace those values. 50 00:04:15,830 --> 00:04:19,260 Now we know how to identify the outliers in our data. 
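The percentile lookup and row filter described above can be sketched as follows. This is a minimal illustration, not the lecture's notebook: the DataFrame is built from made-up numbers (200 ordinary values plus 81 and 101), and only the names df, n_hot_rooms and uv are taken from the lecture.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: the lecture's real dataset is not reproduced here.
values = np.concatenate([np.linspace(5.0, 15.0, 200), [81.0, 101.0]])
df = pd.DataFrame({"n_hot_rooms": values})

# Passing the percentile as a list ([99]) makes NumPy return an array,
# so [0] is needed to pull out the single number.
uv = np.percentile(df.n_hot_rooms, [99])[0]

# Boolean condition inside square brackets keeps only the rows above uv.
rows_above_uv = df[df.n_hot_rooms > uv]

print(uv)
print(rows_above_uv)
```

Passing a plain 99 instead of [99] would return a scalar directly, so the trailing [0] is only needed because the percentile is given as a list.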
51 00:04:20,220 --> 00:04:23,010 Now let's just cap these values. 52 00:04:24,350 --> 00:04:29,730 Now, for our case, we are picking an integer for the multiplication equal to three, 53 00:04:30,300 --> 00:04:35,230 since we only want to treat the genuine outliers, 54 00:04:35,280 --> 00:04:35,610 which are 55 00:04:35,640 --> 00:04:37,140 101 and 81. 56 00:04:37,170 --> 00:04:40,570 And we don't want to touch these three outliers, 57 00:04:40,650 --> 00:04:44,220 that is, the ones whose values are very close to 58 00:04:44,230 --> 00:04:45,050 our uv value. 59 00:04:45,660 --> 00:04:48,420 That's why we are taking an integer equal to three. 60 00:04:49,250 --> 00:04:49,770 We'll write 61 00:04:50,780 --> 00:04:52,710 df.n_hot_rooms. 62 00:04:53,770 --> 00:04:56,540 We want to change the values of this variable. 63 00:04:56,680 --> 00:05:04,470 That's why we are selecting this variable only, and in brackets we'll specify the condition where 64 00:05:04,480 --> 00:05:05,830 df.n_hot_rooms 65 00:05:09,370 --> 00:05:11,460 is greater than 3 times uv. 66 00:05:15,910 --> 00:05:18,880 So 3 times uv is approximately 46. 67 00:05:19,270 --> 00:05:23,810 So you can see, for these two, 101 and 81, 68 00:05:24,280 --> 00:05:27,220 we want to limit and cap these values. 69 00:05:28,090 --> 00:05:31,580 And we want to limit them to a value equal to 3 times 70 00:05:31,580 --> 00:05:31,870 uv. 71 00:05:37,260 --> 00:05:38,070 Let's run this. 72 00:05:41,190 --> 00:05:43,860 This is just a warning, not an error. 73 00:05:44,220 --> 00:05:48,270 So we can continue. Now, if we rerun this statement, 74 00:05:49,230 --> 00:05:49,890 we'll see 75 00:05:52,200 --> 00:05:55,740 we have limited the value to 46. 76 00:05:56,670 --> 00:05:59,710 This was around 100 and this was around 80. 77 00:06:00,090 --> 00:06:05,250 Now we are getting a constant value of 46. 
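The capping step can be sketched like this, again on made-up data rather than the lecture's dataset. One assumption to flag: the warning mentioned above is probably pandas' chained-assignment warning, which appears when a value is assigned through df.n_hot_rooms[...]. The sketch uses .loc instead, which assigns on the DataFrame directly; that is a swapped-in idiom, not what the lecture types.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 200 ordinary values plus two extreme ones.
values = np.concatenate([np.linspace(5.0, 15.0, 200), [81.0, 101.0]])
df = pd.DataFrame({"n_hot_rooms": values})

# 99th percentile, as computed earlier in the lecture.
uv = np.percentile(df.n_hot_rooms, [99])[0]

# Cap only the genuine outliers: anything above 3 * uv is set to 3 * uv.
# Values just above uv are left untouched.
df.loc[df.n_hot_rooms > 3 * uv, "n_hot_rooms"] = 3 * uv

print(df.n_hot_rooms.max())
```

With .loc the row condition and the column name go in one indexer, so the assignment does not depend on whether an intermediate selection is a view or a copy.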
78 00:06:06,900 --> 00:06:09,810 This is how we treat outliers using Python. 79 00:06:10,930 --> 00:06:16,200 Similarly, in rainfall, there are values which are outliers on the lower side. 80 00:06:18,470 --> 00:06:22,290 Let's identify the outliers in rainfall and treat them as well. 81 00:06:25,830 --> 00:06:28,110 We will write np.percentile, 82 00:06:31,910 --> 00:06:35,120 df.rainfall, 83 00:06:42,140 --> 00:06:44,810 and we want the 1st percentile value, 84 00:06:48,660 --> 00:06:54,840 since the outliers are on the lower end. And we want the first value of this array; that's why we have put zero. 85 00:06:55,880 --> 00:06:57,140 The value is 20. 86 00:06:58,660 --> 00:07:01,930 Now, we will save this value in another variable called lv. 87 00:07:02,480 --> 00:07:03,700 That is the lower value. 88 00:07:07,170 --> 00:07:08,850 Let me quickly run this. 89 00:07:11,450 --> 00:07:16,090 Our lv is a variable which contains this 1st percentile value. 90 00:07:17,010 --> 00:07:20,690 Now we'll compare this lv with our array, 91 00:07:20,870 --> 00:07:26,180 and we'll try to identify all the values which are lower than this lv value. 92 00:07:28,740 --> 00:07:28,980 We'll write 93 00:07:29,030 --> 00:07:29,520 df, 94 00:07:34,890 --> 00:07:37,080 where df.rainfall 95 00:07:39,510 --> 00:07:40,990 is less than lv. 96 00:07:45,150 --> 00:07:45,840 Run this. 97 00:07:46,740 --> 00:07:49,560 You can see we are only getting one single row, 98 00:07:49,670 --> 00:07:52,020 where the rainfall value is 3. 99 00:07:52,740 --> 00:07:56,430 So this is definitely an outlier and we should treat it. 100 00:07:58,940 --> 00:08:04,640 As mentioned in the theory lecture, for the lower values we'll multiply by a decimal number. 101 00:08:05,480 --> 00:08:08,590 So in our case, we will write 0.3. 102 00:08:18,030 --> 00:08:19,700 We will select all the values 
103 00:08:23,720 --> 00:08:30,470 where rainfall is less than 0.3 times lv. 104 00:08:31,800 --> 00:08:37,100 And we will set them equal to 0.3 times lv. 105 00:08:41,580 --> 00:08:43,080 We'll run this, and we'll 106 00:08:45,400 --> 00:08:47,960 run this statement again. 107 00:08:47,990 --> 00:08:49,580 You can see that the 108 00:08:51,300 --> 00:08:54,480 rainfall value is now 6 instead of 3. 109 00:08:56,930 --> 00:08:59,900 That is how we treat outliers using Python. 110 00:09:02,400 --> 00:09:05,340 So now, since we have treated our outliers, 111 00:09:05,490 --> 00:09:09,140 let's take a look at our EDD once more. 112 00:09:09,700 --> 00:09:10,040 We'll write 113 00:09:10,250 --> 00:09:12,760 df.describe(). 114 00:09:19,960 --> 00:09:23,040 Let's look at n_hot_rooms and rainfall. 115 00:09:27,440 --> 00:09:30,290 As you can see now, the maximum value is 46. 116 00:09:31,990 --> 00:09:35,830 And the mean and median values are a lot closer than before. 117 00:09:37,150 --> 00:09:41,700 Similarly, for rainfall, the lower value is now 6 instead of 3. 118 00:09:42,760 --> 00:09:46,120 And again, the median value is closer to the mean value. 119 00:09:48,960 --> 00:09:52,530 That's all for outlier identification and treatment in Python.
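The lower-side treatment of rainfall and the final check with describe can be sketched in the same way. The data below is invented for illustration (200 ordinary values plus a single low reading of 3), so the printed numbers will not match the lecture's; the names df, rainfall and lv and the 0.3 multiplier come from the lecture, and .loc is used as an assumption in place of the lecture's bracket assignment.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 200 ordinary rainfall values plus one very low one.
values = np.concatenate([np.linspace(20.0, 60.0, 200), [3.0]])
df = pd.DataFrame({"rainfall": values})

# 1st percentile, since the outlier sits on the lower side.
lv = np.percentile(df.rainfall, [1])[0]

# Rows below the 1st percentile (only the low reading in this toy data).
rows_below_lv = df[df.rainfall < lv]

# Floor the genuine outliers: anything below 0.3 * lv is raised to 0.3 * lv.
df.loc[df.rainfall < 0.3 * lv, "rainfall"] = 0.3 * lv

# Summary statistics to confirm the new minimum and compare mean with median.
print(df.rainfall.describe())
```

In the lecture lv is 20, so the floor is 0.3 * 20 = 6, which is why the rainfall minimum changes from 3 to 6.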