1 00:00:01,050 --> 00:00:05,730 In this video, we will learn the intuition behind the K-nearest neighbors classifier.
2 00:00:07,930 --> 00:00:13,640 KNN is another approach which attempts to classify based on the Bayes classifier.
3 00:00:13,900 --> 00:00:20,440 That is, for a given observation, based on the conditional probability of belonging to each class,
4 00:00:21,070 --> 00:00:24,100 it will classify the observation into the most probable class,
5 00:00:24,220 --> 00:00:25,680 as we saw in LDA.
6 00:00:27,600 --> 00:00:31,590 But unlike LDA, KNN is a non-parametric method.
7 00:00:32,180 --> 00:00:33,320 That is, in KNN,
8 00:00:33,470 --> 00:00:36,520 we do not assume any functional form for the relationship.
9 00:00:37,680 --> 00:00:41,520 Therefore, the final curve can have a very complex shape.
10 00:00:42,940 --> 00:00:48,280 Potentially, the final curve can also have very high accuracy.
11 00:00:49,810 --> 00:00:54,250 Let me explain the concept behind KNN using this simple diagram.
12 00:00:56,580 --> 00:01:03,130 For simplicity, we are assuming that we have only two predictor variables so that I can show them on a
13 00:01:03,130 --> 00:01:03,810 2D plot,
14 00:01:05,170 --> 00:01:08,920 although the same concept can be extended to any number of predictors.
15 00:01:10,790 --> 00:01:18,350 So suppose I have one predictor on the X axis and the other one on the Y axis, and the color of these
16 00:01:18,440 --> 00:01:24,380 small circles, which represent the data points, tells us the class of the response variable.
17 00:01:24,500 --> 00:01:27,840 That is, some circles are orange,
18 00:01:28,280 --> 00:01:30,860 so that is one class, and some are blue.
19 00:01:33,050 --> 00:01:41,620 Now, I have this point, which is marked as X, and I want to classify it into either the blue class or the orange
20 00:01:41,620 --> 00:01:44,780 class. In K-nearest neighbors,
21 00:01:45,640 --> 00:01:47,860 we decide the value of K.
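The classification rule just described can be written as a formula. This is a sketch in notation the lecture does not itself use: $\mathcal{N}_0$ stands for the set of the $K$ training observations nearest to the test point $x_0$, and $I(\cdot)$ is an indicator that equals 1 when its condition holds and 0 otherwise.

```latex
\Pr(Y = j \mid X = x_0) \approx \frac{1}{K} \sum_{i \in \mathcal{N}_0} I(y_i = j),
\qquad
\hat{y}_0 = \arg\max_{j} \, \Pr(Y = j \mid X = x_0)
```

For example, with $K = 3$ and two of the three neighbors blue, the estimate is $2/3$ for blue and $1/3$ for orange, so $x_0$ is assigned to blue.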
22 00:01:48,310 --> 00:01:54,330 That is, how many points near that particular point we want to consider.
23 00:01:56,950 --> 00:01:59,440 So suppose I take K equal to three.
24 00:01:59,680 --> 00:02:06,790 That is, I will take the three points nearest to that point, and out of those three,
25 00:02:07,210 --> 00:02:10,840 I find the conditional probability of each class, blue or orange.
26 00:02:12,160 --> 00:02:18,310 So if you look at this point, I have drawn a small circle which encompasses three points,
27 00:02:18,430 --> 00:02:26,470 so that I have the three nearest neighbors. Out of these three, two belong to the blue category.
28 00:02:27,040 --> 00:02:33,070 Therefore, the conditional probability of blue is two-thirds, and one belongs to orange,
29 00:02:33,190 --> 00:02:34,520 so its conditional probability is
30 00:02:34,540 --> 00:02:35,250 one-third.
31 00:02:36,260 --> 00:02:42,230 Since the conditional probability of the blue category is higher, I will assign this point,
32 00:02:42,430 --> 00:02:44,840 marked as a cross, to the blue category.
33 00:02:48,500 --> 00:02:54,880 If I had K equal to one, that is, if I decide based on only one nearest neighbor,
34 00:02:55,610 --> 00:03:03,020 then I will find the point nearest to this cross, which is probably this orange circle.
35 00:03:04,160 --> 00:03:09,200 In that case, I will assign the orange category to this cross.
36 00:03:11,130 --> 00:03:17,720 If we take K equal to two, then these two points will be the closest to
37 00:03:17,850 --> 00:03:24,750 this cross. In that case, the orange category will have a conditional probability of
38 00:03:24,750 --> 00:03:25,090 0.5,
39 00:03:25,170 --> 00:03:26,880 and the blue will also have a conditional
40 00:03:26,880 --> 00:03:29,280 probability of 0.5. In such a case,
41 00:03:29,340 --> 00:03:34,020 our software package will assign the class randomly.
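The K = 1, 2 and 3 cases above can be reproduced with a small from-scratch sketch. It assumes Euclidean distance (the lecture does not name a distance measure), and the function `knn_predict` and the six training points are hypothetical, chosen so that the point nearest the query is orange and the next two are blue. Ties are broken with a seeded `random.Random`, mimicking the random assignment described for K = 2.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k, rng):
    """Classify `query` by the class proportions among its k nearest training points.

    Returns (predicted_class, proportions). If several classes share the highest
    proportion, one of them is picked at random with `rng`.
    """
    # Indices of the training points, ordered by Euclidean distance to the query.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    neighbors = [train_y[i] for i in order[:k]]
    # Estimated conditional probability of each class = its share of the k neighbors.
    proportions = {label: n / k for label, n in Counter(neighbors).items()}
    best = max(proportions.values())
    tied = sorted(label for label, p in proportions.items() if p == best)
    return rng.choice(tied), proportions


# Hypothetical data: the nearest point to the query is orange, the next two are blue.
train_x = [(0.5, 0.0), (0.0, 0.7), (-0.8, 0.0), (2.0, 2.0), (3.0, 0.0), (0.0, -3.0)]
train_y = ["orange", "blue", "blue", "orange", "blue", "orange"]
query = (0.0, 0.0)  # the point marked with a cross

rng = random.Random(1)  # fixed seed, so the random tie-break at K = 2 is reproducible
for k in (1, 2, 3):
    label, proportions = knn_predict(train_x, train_y, query, k, rng)
    print(f"K = {k}: {label} {proportions}")
```

With these points, K = 1 gives orange, K = 3 gives blue (2/3 against 1/3), and K = 2 produces a 1/2 versus 1/2 tie that is resolved by the seeded random choice.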
42 00:03:36,810 --> 00:03:44,610 So when I'm running KNN in my software package, I will be setting a seed. By setting the seed,
43 00:03:44,730 --> 00:03:47,550 we will both get the same random solutions.
44 00:03:47,850 --> 00:03:53,460 So whenever the conditional probabilities are the same for two classes and the software package randomly assigns
45 00:03:53,490 --> 00:03:56,340 the class, we will both get the same answers.
46 00:03:56,850 --> 00:04:01,230 So setting a seed will help us get the same answers, that is, reproduce the results.
47 00:04:01,230 --> 00:04:03,320 Now, to plot
48 00:04:03,390 --> 00:04:09,690 this graph, that is, to identify the boundaries of this nearest neighbor classifier,
49 00:04:11,520 --> 00:04:13,940 we have created a grid of points.
50 00:04:14,820 --> 00:04:20,730 So for all the different values of X and Y, we have created a grid of points
51 00:04:21,000 --> 00:04:24,180 and assigned the class to each of these points.
52 00:04:24,960 --> 00:04:27,640 So you see, all these points are blue,
53 00:04:28,110 --> 00:04:33,900 all these points are also blue, and these points are orange.
54 00:04:33,900 --> 00:04:41,110 We assigned categories to all these grid points based on this same concept.
55 00:04:41,850 --> 00:04:43,980 Wherever the color of the grid points changes,
56 00:04:44,430 --> 00:04:48,310 I have drawn this boundary, and this is how we get
57 00:04:48,330 --> 00:04:51,050 the boundary of the classifier,
58 00:04:51,570 --> 00:04:52,680 the K-nearest neighbors classifier.
59 00:04:57,480 --> 00:05:03,670 One of the most important parameters in K-nearest neighbors is the value of K. K is often called
60 00:05:03,680 --> 00:05:12,450 the hyperparameter of the KNN classifier. K controls the flexibility of this boundary.
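The grid approach described above can be sketched as follows: classify every point of a regular grid and look for where the predicted label changes. This is an illustrative sketch, not the software package's implementation; `knn_label`, `classify_grid` and the training points are hypothetical, distances are Euclidean, and K is kept odd so that two classes can never tie.

```python
import math
from collections import Counter


def knn_label(train, query, k):
    """Majority label among the k training points nearest to `query`.

    `train` is a list of ((x, y), label) pairs; with two classes an odd k
    guarantees a strict majority, so no tie-breaking is needed.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def classify_grid(train, k, x_range, y_range, steps):
    """Predict a label for every point of a steps-by-steps grid over the ranges."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    xs = [x_lo + (x_hi - x_lo) * i / (steps - 1) for i in range(steps)]
    ys = [y_lo + (y_hi - y_lo) * j / (steps - 1) for j in range(steps)]
    # Rows run from the largest y to the smallest, so the printout reads like a plot.
    return [[knn_label(train, (x, y), k) for x in xs] for y in reversed(ys)]


# Hypothetical training data: a blue cluster lower left, an orange cluster upper right.
train = [((1, 1), "blue"), ((1, 2), "blue"), ((2, 1), "blue"),
         ((4, 4), "orange"), ((4, 5), "orange"), ((5, 4), "orange")]

grid = classify_grid(train, k=3, x_range=(0, 6), y_range=(0, 6), steps=13)
for row in grid:
    # B = predicted blue, o = predicted orange; the boundary is where they switch.
    print("".join("B" if label == "blue" else "o" for label in row))
```

A plotting library would color the same grid instead of printing characters; the boundary drawn in the lecture corresponds to the places where adjacent cells change label.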
61 00:05:13,380 --> 00:05:22,230 So if you look at this curve, with K equal to one, my classifier will closely follow each individual
62 00:05:22,230 --> 00:05:22,680 point.
63 00:05:23,010 --> 00:05:30,270 So if I have a blue point here, the boundary will closely follow it, and the boundary will be very complicated
64 00:05:31,360 --> 00:05:33,600 and will have a lot of twists and turns.
65 00:05:35,610 --> 00:05:42,650 Whereas if I use a very high value of K, the boundary will not be very sensitive to individual data
66 00:05:42,650 --> 00:05:43,130 points.
67 00:05:45,270 --> 00:05:47,430 It has very few turns.
68 00:05:49,690 --> 00:05:54,710 So the flexibility of this boundary is governed by the value of K.
69 00:05:55,240 --> 00:06:01,900 It is very important that we choose the optimal value of K so that we get close to the boundary represented
70 00:06:01,900 --> 00:06:04,000 by this dotted line, which
71 00:06:05,610 --> 00:06:12,240 has the minimum error rate. When we have K equal to 100, a lot of points were getting
72 00:06:12,240 --> 00:06:15,420 misclassified. When we have K equal to one,
73 00:06:16,350 --> 00:06:17,100 a lot of points
74 00:06:17,160 --> 00:06:18,780 again get misclassified.
75 00:06:22,760 --> 00:06:27,230 Although the training error rate will be very low when K is equal to one,
76 00:06:27,810 --> 00:06:35,450 my test error rate will be very high, because this curve is too dependent on these individual values
77 00:06:36,380 --> 00:06:42,140 and may not actually be following the true functional relationship between the predictors and the response variable.
78 00:06:42,200 --> 00:06:45,740 So both of these will have
79 00:06:46,360 --> 00:06:47,950 a high error rate on the test set.
80 00:06:49,310 --> 00:06:55,070 It is very important that we get the optimum value of K, so that our test error rate is minimum.
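To see the trade-off between training and test error, the following sketch compares the two error rates for several values of K on synthetic data. The data generator (`make_data`, two overlapping Gaussian clouds), the sample sizes and the K values are assumptions for illustration, not the lecture's dataset. With K = 1 every training point is its own nearest neighbor (assuming no duplicate points), so the training error is exactly zero whatever the test error turns out to be.

```python
import math
import random
from collections import Counter


def make_data(n, rng):
    """Hypothetical two-class data: two overlapping Gaussian clouds in 2-D."""
    data = []
    for _ in range(n):
        label = rng.choice(["blue", "orange"])
        center = 0.0 if label == "blue" else 1.5
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data


def knn_label(train, query, k):
    """Majority label among the k training points nearest to `query` (odd k, two classes)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def error_rate(train, evaluate_on, k):
    """Fraction of points in `evaluate_on` misclassified by k-NN fitted on `train`."""
    wrong = sum(knn_label(train, point, k) != label for point, label in evaluate_on)
    return wrong / len(evaluate_on)


rng = random.Random(2024)  # fixed seed: the experiment is reproducible
train = make_data(200, rng)
test = make_data(200, rng)

results = {}
for k in (1, 5, 15, 51, 151):
    results[k] = (error_rate(train, train, k), error_rate(train, test, k))
    print(f"K = {k:3d}   training error = {results[k][0]:.3f}   test error = {results[k][1]:.3f}")

# Following the lecture, keep the K whose test error is lowest.
best_k = min(results, key=lambda k: results[k][1])
print("K with the lowest test error:", best_k)
```

Very small K tends to fit the training points too closely and very large K tends to smooth away real structure, so the lowest test error is typically found somewhere in between; the exact numbers depend on the data and the seed.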
81 00:06:58,150 --> 00:07:04,850 Another important thing to notice: because the KNN classifier predicts the class of a given test observation
82 00:07:05,330 --> 00:07:08,060 by identifying the observations that are nearest to it,
83 00:07:08,900 --> 00:07:11,030 the scale of the variables matters.
84 00:07:12,590 --> 00:07:18,500 Any variables that are on a large scale will have a much larger effect on the distance between the observations,
85 00:07:19,070 --> 00:07:24,000 and hence on the KNN classifier, than the variables that are on a smaller scale.
86 00:07:24,890 --> 00:07:30,260 For instance, imagine a dataset that contains two variables, salary and age.
87 00:07:32,030 --> 00:07:38,300 As far as KNN is concerned, a difference of a thousand dollars in salary is enormous compared to a
88 00:07:38,300 --> 00:07:40,400 difference of 50 years in age.
89 00:07:42,080 --> 00:07:48,380 Consequently, salary will drive the KNN classification results, and age will have almost no effect.
90 00:07:49,910 --> 00:07:55,880 This is contrary to our intuition that a salary difference of a thousand dollars is quite small compared
91 00:07:55,880 --> 00:07:58,100 to an age difference of 50 years.
92 00:07:59,210 --> 00:08:05,150 A good way to handle this problem is to standardize the data so that all variables are given a mean
93 00:08:05,150 --> 00:08:07,670 of zero and a standard deviation of one.
94 00:08:09,350 --> 00:08:16,490 Then all variables will be on a comparable scale. To standardize the data, in the background,
95 00:08:16,580 --> 00:08:19,370 our software package will be using a formula like this.
96 00:08:20,480 --> 00:08:22,020 We do not need to bother about it.
97 00:08:23,510 --> 00:08:27,920 I am just showing it for those of you who would like to know what happens in the background.
98 00:08:29,630 --> 00:08:34,070 We just follow this formula to standardize all the variables in our dataset.
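The salary-and-age example can be checked numerically. The standardization described above (mean zero, standard deviation one) is the usual z-score, $z = (x - \bar{x})/s$, applied column by column; this sketch uses the population standard deviation from Python's `statistics` module, and the six training rows and the query are hypothetical. On the raw scale the query's nearest neighbor is the row whose salary is only 1,000 dollars away even though its age is 50 years off; after standardizing, the nearest neighbor becomes the row that is close in both salary and age.

```python
import math
from statistics import mean, pstdev

# Hypothetical training data: (salary in dollars, age in years).
train = [(40000, 22), (51000, 80), (53000, 31), (60000, 45), (70000, 60), (45000, 35)]
query = (50000, 30)


def column_stats(rows):
    """Mean and population standard deviation of each column."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) for c in cols]


def standardize(row, means, sds):
    """z = (x - mean) / sd, applied column by column."""
    return tuple((x - m) / s for x, m, s in zip(row, means, sds))


def nearest_index(rows, q):
    """Index of the row closest to q in Euclidean distance."""
    return min(range(len(rows)), key=lambda i: math.dist(rows[i], q))


# Statistics come from the training rows only and are reused for the query.
means, sds = column_stats(train)
train_z = [standardize(r, means, sds) for r in train]
query_z = standardize(query, means, sds)

print("nearest on the raw scale:   ", train[nearest_index(train, query)])
print("nearest after standardizing:", train[nearest_index(train_z, query_z)])
```

Subtracting the mean does not change any distance; it is the division by each column's standard deviation that puts salary and age on a comparable scale.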
99 00:08:35,540 --> 00:08:39,620 We will learn how to standardize variables in our software package
100 00:08:40,090 --> 00:08:40,880 in the upcoming video.