1 00:00:00,920 --> 00:00:05,480 In this video, we will learn how to handle the outlying values in our dataset.
2 00:00:06,560 --> 00:00:15,560 We have identified from univariate analysis that n_hot_rooms contains exceptionally large values.
3 00:00:17,280 --> 00:00:20,610 These values are outliers and we have to treat them.
4 00:00:22,640 --> 00:00:29,560 In the lecture, we told you that when we have an outlier, we cap the values in the variable
5 00:00:29,990 --> 00:00:37,490 to an upper limit. To find the upper limit, we either take a non-outlying value and apply a multiplication
6 00:00:37,490 --> 00:00:37,930 factor,
7 00:00:39,390 --> 00:00:44,500 or we take the 99th percentile value and use a multiplication factor to get the upper value.
8 00:00:45,420 --> 00:00:49,680 We then change all the outlying values to this upper value.
9 00:00:52,480 --> 00:01:00,550 To do the same thing for n_hot_rooms, we need to first identify those cells which contain the outlying
10 00:01:00,560 --> 00:01:01,030 values.
11 00:01:02,530 --> 00:01:11,800 To identify the cells, we are going to apply a filter first. To apply filters on the table, we select
12 00:01:11,800 --> 00:01:16,510 the top row of our table, then we go to Data,
13 00:01:18,020 --> 00:01:20,000 and we click on this Filter option.
14 00:01:23,770 --> 00:01:30,210 You can see that these additional small boxes containing a small triangle now appear with each column.
15 00:01:31,660 --> 00:01:37,940 These boxes give us the filtering and sorting options for the column. Here,
16 00:01:37,990 --> 00:01:44,500 you can see that all the values of this column are listed in this small box.
17 00:01:45,800 --> 00:01:50,970 So we now go to n_hot_rooms and open this small box.
18 00:01:53,690 --> 00:02:00,410 You can see the values start from 10.05 and go on, and suddenly the values
19 00:02:00,410 --> 00:02:06,320 jump from 15.4 to 81.12 and then 101.12.
20 00:02:07,790 --> 00:02:11,570 So we have to treat these two exceptionally large values.
21 00:02:13,590 --> 00:02:21,270 And to get the value of the upper limit, we are going to use this largest normal value,
22 00:02:23,190 --> 00:02:27,300 which is 15.4, and we will use a multiplication factor of three.
23 00:02:28,140 --> 00:02:34,140 So the value with which we are going to replace these outlying values is 3 times 15.4, which
24 00:02:34,140 --> 00:02:35,190 is 46.2.
25 00:02:37,270 --> 00:02:45,310 So to select these two cells, we will first uncheck all of them and then check these last three values.
26 00:02:49,820 --> 00:02:54,950 You can see we now have only those observations which have these three values.
27 00:02:56,260 --> 00:03:03,340 Now, to change the value in this particular cell to 3 times 15.4, we will write
28 00:03:03,340 --> 00:03:08,320 =3*15.4.
29 00:03:12,930 --> 00:03:16,230 For this one also, we will write 3 times 15.4.
30 00:03:21,220 --> 00:03:28,540 So this method of treating outliers, in which we create an upper limit and change the outlying values
31 00:03:28,540 --> 00:03:30,710 to that upper limit, is very subjective.
32 00:03:31,360 --> 00:03:35,560 You can choose any non-outlying value.
33 00:03:35,590 --> 00:03:42,280 You can choose any multiplication factor to establish the upper value and change these outlying values according
34 00:03:42,280 --> 00:03:43,150 to your business need.
35 00:03:44,290 --> 00:03:49,270 So with these actions, we have treated the outlying values in our dataset.
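For readers who would rather script this than edit cells by hand, the capping rule just described can be sketched in plain Python. This is only an illustration under assumptions: the function name `cap_outliers` is hypothetical, the value 12.3 in the sample list is made up, and only 10.05, 15.4, 81.12 and 101.12 come from the video.

```python
def cap_outliers(values, largest_normal, factor=3.0):
    """Cap every value at factor * largest_normal.

    largest_normal is the largest value judged to be non-outlying (15.4 in
    the video); factor is the multiplication factor (3 in the video).
    """
    upper_limit = factor * largest_normal
    # Values at or below the limit are kept; anything larger becomes the limit.
    capped = [min(v, upper_limit) for v in values]
    return capped, upper_limit


# Illustrative sample only: 12.3 is invented, the other values are quoted in the video.
sample = [10.05, 12.3, 15.4, 81.12, 101.12]
capped, upper_limit = cap_outliers(sample, largest_normal=15.4)

print(round(upper_limit, 2))  # roughly 46.2, the upper value used in the video
print(capped)
```

Both the reference value and the factor are parameters because, as the video says, the choice is subjective and depends on the business need.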
36 00:03:50,110 --> 00:03:54,280 Now we will tick this Select All checkbox to get the whole table back.
37 00:03:55,450 --> 00:03:59,920 Now you can see that we do not have those two exceptionally large values.
38 00:04:00,340 --> 00:04:01,440 We have this value.
39 00:04:01,990 --> 00:04:05,930 If you think this is still too big, you can treat it further.
40 00:04:05,980 --> 00:04:07,640 You can set a smaller value here.
41 00:04:08,710 --> 00:04:14,530 For now, we are going to keep this value as the upper limit for our n_hot_rooms variable.
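The video also mentions a second way to pick the upper limit: take the 99th percentile and apply a multiplication factor. A rough Python sketch of that alternative follows. The function names, the nearest-rank percentile method, the factor of 3 and the synthetic column are all assumptions made for illustration; the video does not specify them for this variant.

```python
def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile (integer pct from 1 to 100), nearest-rank method."""
    ordered = sorted(values)
    # Ceiling of pct * n / 100, computed with integers to avoid float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def cap_at_percentile(values, pct=99, factor=3.0):
    """Cap values at factor * (pct-th percentile). Both defaults are assumptions."""
    upper_limit = factor * percentile_nearest_rank(values, pct)
    return [min(v, upper_limit) for v in values], upper_limit


# Synthetic column: 198 ordinary values plus the two large values seen in the video.
column = [10 + 0.025 * i for i in range(198)] + [81.12, 101.12]
capped, upper_limit = cap_at_percentile(column)

print(round(upper_limit, 3))  # upper limit derived from the 99th percentile
print(max(capped) == upper_limit)
```

With a very small column the 99th percentile is simply the maximum, so this rule only helps when the outliers make up well under one percent of the rows.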