1 00:00:02,450 --> 00:00:10,880 In this session, we will understand the maths behind KNN, that is, K-nearest neighbors. 2 00:00:12,530 --> 00:00:19,870 The K-nearest neighbor concept is used both in classification, a supervised learning algorithm, and also 3 00:00:19,880 --> 00:00:21,560 in unsupervised learning. 4 00:00:21,980 --> 00:00:29,390 OK, so let's explain this using an example. I am trying to find, in this case, whether this green ball 5 00:00:29,390 --> 00:00:34,830 belongs to the group of blue balls or the group of red balls. 6 00:00:35,510 --> 00:00:36,870 So how will I find that out? 7 00:00:37,340 --> 00:00:43,450 I will find the distance between this green ball and the nearest red and blue balls. 8 00:00:44,390 --> 00:00:47,070 OK, so that is all I do. 9 00:00:47,810 --> 00:00:48,950 And what is this K? 10 00:00:51,390 --> 00:00:58,320 K is the number of data points that I'm considering for the purpose of determining whether this green 11 00:00:58,320 --> 00:01:03,370 ball belongs to the blue group or to the red group. 12 00:01:03,930 --> 00:01:08,580 So what I do is I'm taking four points, so K is four. 13 00:01:09,000 --> 00:01:13,950 So I compute the distance between this point and this red point. 14 00:01:14,100 --> 00:01:20,850 Similarly, I compute the distance between this point and this point, and with each of the two blue points. 15 00:01:20,910 --> 00:01:25,080 So I compute the distance for four scenarios. 16 00:01:25,080 --> 00:01:31,020 Right, because I'm considering four nearest neighbors. How will I compute the distance? 17 00:01:31,230 --> 00:01:33,440 I use the Euclidean distance formula. 18 00:01:33,840 --> 00:01:34,150 Right. 19 00:01:34,440 --> 00:01:38,070 Suppose I have to find the distance between this and this point. 20 00:01:38,340 --> 00:01:43,680 OK, this point is (x1, y1) and this point is (x2, y2).
21 00:01:44,280 --> 00:01:51,190 So the formula is the square root of (x2 minus x1) the whole squared plus (y2 minus y1) the whole squared. 22 00:01:51,780 --> 00:01:54,780 So I compute four such distances. 23 00:01:56,200 --> 00:01:56,530 Clear? 24 00:01:58,510 --> 00:01:58,950 OK. 25 00:02:00,900 --> 00:02:08,610 The formula that I use is Euclidean. I can also use what is known as Manhattan distance, that is, 26 00:02:09,060 --> 00:02:15,960 instead of finding the distance through a straight line like this, I compute the distance from here 27 00:02:16,140 --> 00:02:16,840 to here. 28 00:02:16,860 --> 00:02:19,020 That is the base and then the height. 29 00:02:19,930 --> 00:02:28,000 OK, that's another way of computing the distance. You can use any one of the two methods for computing 30 00:02:28,000 --> 00:02:28,720 the distance. 31 00:02:29,530 --> 00:02:34,420 OK, now let's explain this further using an example. 32 00:02:34,900 --> 00:02:40,000 I have seven data points for height, age and the corresponding weight. 33 00:02:40,980 --> 00:02:48,330 I'm being asked to predict the weight for the eighth data point, for which only height and age are provided. 34 00:02:49,210 --> 00:02:51,230 OK, so how will I compute? 35 00:02:52,270 --> 00:03:02,110 I compute the distance between this point, height 5.8 and age 37, with respect to each of 36 00:03:02,110 --> 00:03:03,370 these seven data points. 37 00:03:03,850 --> 00:03:07,120 For that, I use the Euclidean formula, right? 38 00:03:07,630 --> 00:03:12,410 In the case of 6 and 40, how will I compute the Euclidean distance? 39 00:03:12,610 --> 00:03:15,910 It will be (6 minus 5.8) 40 00:03:15,910 --> 00:03:20,900 the whole squared plus (40 minus 37) the whole squared. 41 00:03:21,610 --> 00:03:24,310 I add the two and then I take the square root. 42 00:03:24,910 --> 00:03:25,260 Right. 43 00:03:25,480 --> 00:03:27,940 I get a value of 3.01.
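The two distance measures described above can be sketched in Python as follows. This is a minimal illustration, not the lecturer's code: the function names `euclidean_distance` and `manhattan_distance` are chosen here for readability, and each point is assumed to be a (height, age) pair. The values 5.8, 37, 6 and 40 are the ones used in the worked example.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two 2-D points."""
    (x1, y1), (x2, y2) = p, q
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

def manhattan_distance(p, q):
    """Base plus height: sum of the absolute differences per axis."""
    (x1, y1), (x2, y2) = p, q
    return abs(x2 - x1) + abs(y2 - y1)

# The new data point (height 5.8, age 37) against the point (height 6, age 40).
query = (5.8, 37)
neighbor = (6.0, 40)

print(round(euclidean_distance(query, neighbor), 2))  # 3.01
print(round(manhattan_distance(query, neighbor), 2))  # 3.2
```

The Euclidean result matches the 3.01 computed by hand: the square root of 0.04 plus 9.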
44 00:03:28,450 --> 00:03:31,930 Similarly, I take 5.8 and 37, 45 00:03:31,930 --> 00:03:34,670 and I compare against 6.1 and 26. 46 00:03:35,110 --> 00:03:36,250 So on and so forth. 47 00:03:36,490 --> 00:03:39,970 I compute this Euclidean distance, OK. 48 00:03:40,900 --> 00:03:45,530 However, I am considering three nearest neighbors only. 49 00:03:46,450 --> 00:03:49,120 OK, so I take 50 00:03:50,240 --> 00:03:57,470 these three points as the nearest neighbors, right? Their distance is lower, which means they are closer 51 00:03:57,470 --> 00:03:57,890 to the 52 00:03:58,960 --> 00:04:01,280 5.8 and 37 data point, right? 53 00:04:01,480 --> 00:04:09,910 So I consider the three lowest distances and the corresponding weights. I add the three weights, 54 00:04:09,940 --> 00:04:14,400 that is 80, 78 and 60, and I divide by three. 55 00:04:14,740 --> 00:04:19,440 I'm taking an average, and the weight I get is 72.67. 56 00:04:20,050 --> 00:04:27,220 So the weight corresponding to a height of 5.8 and an age of 37 57 00:04:27,340 --> 00:04:28,060 is 72.67. 58 00:04:28,630 --> 00:04:29,830 That is my prediction. 59 00:04:30,870 --> 00:04:42,300 This is how I compute the predicted value using KNN. Just like in the case of regression and decision 60 00:04:42,300 --> 00:04:49,650 tree, the entire process of computing the nearest neighbors is done using a prebuilt library that is 61 00:04:49,650 --> 00:04:51,990 available both in Python as well as in R. 62 00:04:53,220 --> 00:04:55,020 OK, is this clear?
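The prediction steps above (compute every distance, keep the three lowest, average their weights) can be sketched in plain Python. This is a rough illustration under stated assumptions: the lecture's full table is not reproduced in the transcript, so the seven rows below are invented placeholders, arranged so that the three nearest weights are 80, 78 and 60 as in the walkthrough. The helper names `euclidean` and `knn_predict` are also hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(rows, query, k=3):
    """Average the target values of the k rows closest to the query."""
    ranked = sorted(rows, key=lambda row: euclidean(row[0], query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / k

# ((height, age), weight) -- placeholder rows, NOT the lecture's actual table.
rows = [
    ((6.0, 40), 80),
    ((5.7, 35), 78),
    ((5.9, 38), 60),
    ((6.1, 26), 65),
    ((5.5, 30), 58),
    ((5.2, 45), 70),
    ((5.6, 50), 75),
]

prediction = knn_predict(rows, (5.8, 37), k=3)
print(round(prediction, 2))  # 72.67
```

The transcript does not name the prebuilt library. In Python, one option is scikit-learn's `KNeighborsRegressor(n_neighbors=3)`, which with its default settings also averages the targets of the nearest neighbors using Euclidean distance.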