1 00:00:00,180 --> 00:00:00,660 Hello. 2 00:00:00,960 --> 00:00:07,530 So today, in this session, we are going to learn what exactly KNN is, which is nothing but 3 00:00:07,530 --> 00:00:16,620 the k-nearest neighbors algorithm, which is used in both cases, whether we have a classification 4 00:00:16,620 --> 00:00:20,550 use case or a regression use case. 5 00:00:20,820 --> 00:00:23,580 So it is used in both scenarios. 6 00:00:23,760 --> 00:00:27,180 But yes, there is a limitation to KNN. 7 00:00:27,540 --> 00:00:29,240 So what is that limitation? 8 00:00:29,520 --> 00:00:38,040 Number one: whenever you have huge data, avoid using KNN, 9 00:00:38,220 --> 00:00:44,880 because when data scientists work on a real-world 10 00:00:44,880 --> 00:00:48,070 project, they are generally not advised, 11 00:00:48,090 --> 00:00:50,040 "Yes, you have to use KNN." 12 00:00:50,460 --> 00:00:51,870 Definitely not, because, 13 00:00:51,870 --> 00:00:53,730 computation-wise, 14 00:00:53,880 --> 00:00:58,350 KNN takes a lot of time. 15 00:00:58,530 --> 00:00:59,990 It takes a lot of time. 16 00:01:00,000 --> 00:01:01,500 So computationally, 17 00:01:01,710 --> 00:01:03,520 it is very costly. 18 00:01:03,810 --> 00:01:08,140 So there is little benefit in using KNN in such a real-world scenario. 19 00:01:08,160 --> 00:01:13,980 So whenever you work on a real-world project, whenever you have data in terms of, say, 20 00:01:14,250 --> 00:01:17,850 gigabytes, or let's say terabytes or petabytes, 21 00:01:18,040 --> 00:01:23,330 whenever you have such a huge amount of data, always try to avoid KNN. 22 00:01:23,370 --> 00:01:27,060 And if you don't have that huge an amount of data, 23 00:01:27,450 --> 00:01:29,150 then yes, go ahead with it. 
24 00:01:29,430 --> 00:01:31,140 It is not a major issue in that case. 25 00:01:31,320 --> 00:01:36,240 So let's talk about what exactly KNN is and what the basics are, 26 00:01:36,240 --> 00:01:41,940 what exactly the mathematics behind it is. First of all, let's understand: what is this k? 27 00:01:42,210 --> 00:01:43,980 What exactly is it? 28 00:01:44,460 --> 00:01:52,140 So k is nothing but the number of nearest neighbors 29 00:01:52,410 --> 00:01:58,260 to a particular data point that we have to 30 00:01:58,500 --> 00:01:59,820 determine. 31 00:02:00,120 --> 00:02:07,260 So the question is how we have to calculate this k. Basically, 32 00:02:07,260 --> 00:02:08,850 there are two major approaches. 33 00:02:09,330 --> 00:02:15,870 The very first one is to use cross-validation, 34 00:02:15,870 --> 00:02:22,770 in which you can go ahead with randomized search CV, which you have often heard about, and the second is 35 00:02:22,770 --> 00:02:30,180 grid search CV. You can just go ahead with either of these and try some values of k, let's say, 36 00:02:30,180 --> 00:02:32,940 two, three, ten, and so on. 37 00:02:33,180 --> 00:02:39,480 And you will get the best value of k, among those candidates, for your use case. That's what 38 00:02:39,690 --> 00:02:42,390 randomized search and grid search will do. 39 00:02:43,410 --> 00:02:48,540 So let me consider a very basic use case to explain it in the easiest way. 40 00:02:48,600 --> 00:02:51,300 Let's say we have some data. 41 00:02:51,690 --> 00:02:55,370 This is data that is plotted over here in a scatterplot. 42 00:02:55,620 --> 00:02:57,450 Let's say this is the entire data. 
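[Editor's sketch] To make the cross-validation idea above concrete, here is a minimal pure-Python illustration of trying several candidate values of k and keeping the one that scores best. It is not the session's own code: it uses leave-one-out cross-validation as a stand-in for the grid or randomized search the speaker names, and the toy data, the candidate list, and the function names `knn_predict` and `loo_accuracy` are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: hold out each point once, predict it from the rest."""
    correct = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, point, k) == label:
            correct += 1
    return correct / len(data)


# Hypothetical toy data: two well-separated groups of four points each.
data = [
    ((1.0, 1.0), "class 1"), ((1.5, 2.0), "class 1"),
    ((2.0, 1.5), "class 1"), ((1.0, 2.5), "class 1"),
    ((6.0, 6.0), "class 2"), ((6.5, 7.0), "class 2"),
    ((7.0, 6.5), "class 2"), ((6.0, 7.5), "class 2"),
]

# A small "grid" of candidate k values (an assumption for illustration).
candidate_ks = [1, 3, 5, 7]
scores = {k: loo_accuracy(data, k) for k in candidate_ks}
best_k = max(candidate_ks, key=lambda k: scores[k])

print(scores)
print("best k:", best_k)
```

On this toy data the small values of k all score perfectly, while k = 7 fails because the held-out point's own class can supply at most three of the seven votes. That is the kind of difference a search over k is meant to reveal.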
43 00:02:57,480 --> 00:03:04,800 Let's say the circles in the data are exactly 44 00:03:04,800 --> 00:03:08,070 my class one, or my label one. 45 00:03:08,510 --> 00:03:09,400 And let's say 46 00:03:09,420 --> 00:03:14,390 this other shape is exactly my class two, or I can say it is my label 47 00:03:14,400 --> 00:03:20,390 two. Whatever you call it is all up to you. Suppose in the future, whenever you are going to work on 48 00:03:20,400 --> 00:03:24,930 some project, let's say some new data will come. 49 00:03:24,930 --> 00:03:31,350 You can call it your test data, or you can say it is my unseen data. 50 00:03:31,380 --> 00:03:36,480 Let's say this is my unseen data, on which I have to make some prediction depending upon 51 00:03:36,480 --> 00:03:36,960 this data. 52 00:03:37,440 --> 00:03:39,750 Let's say I'm going to plot that data point. 53 00:03:39,750 --> 00:03:45,380 Let's say this is exactly that data point. 54 00:03:45,720 --> 00:03:52,620 And let's say we have to predict whether this data point that you have 55 00:03:52,620 --> 00:03:53,610 plotted over here 56 00:03:53,940 --> 00:03:57,320 will belong to class one or class two. 57 00:03:57,870 --> 00:04:06,570 So just think how you can predict which class this data point belongs to, whether 58 00:04:06,570 --> 00:04:10,440 it belongs to class one or to class two. 59 00:04:11,510 --> 00:04:18,050 So, yes, if you are going to use KNN over here, this use case is just like a piece 60 00:04:18,050 --> 00:04:20,540 of cake. 61 00:04:20,990 --> 00:04:21,400 Yes. 
62 00:04:21,410 --> 00:04:28,970 So let's say you are going to use KNN over here to find which class this data point belongs 63 00:04:28,970 --> 00:04:29,810 to. 64 00:04:30,150 --> 00:04:33,680 So let's say I'm going to assume k equal to three. 65 00:04:33,680 --> 00:04:36,910 Just hypothetically, I'm going to assume my k value is three. 66 00:04:37,140 --> 00:04:45,050 It means I'm going to consider the three nearest neighbors to this data 67 00:04:45,050 --> 00:04:47,360 point. Now a question will arise: 68 00:04:47,360 --> 00:04:53,460 how will I compute what exactly my three nearest neighbors are? 69 00:04:53,690 --> 00:04:59,990 It means you have to use the concept of distance that you have often heard in your school days, because 70 00:05:00,020 --> 00:05:04,940 if I know the distance between this data point and each of the other data 71 00:05:04,940 --> 00:05:11,020 points that are scattered over here, I can definitely get to know what my three nearest neighbors are. 72 00:05:11,210 --> 00:05:13,520 It means I have to use the concept of distance. 73 00:05:14,390 --> 00:05:18,890 So you may have heard about this formula, which is 74 00:05:18,890 --> 00:05:25,490 nothing but the square root of (x2 minus x1) squared plus (y2 minus y1) squared. 75 00:05:25,520 --> 00:05:32,020 This is nothing but the distance between two points having coordinates 76 00:05:32,060 --> 00:05:35,840 (x1, y1) and (x2, y2) in some 77 00:05:35,840 --> 00:05:36,650 plane, 78 00:05:37,190 --> 00:05:37,940 in some plane, 79 00:05:37,940 --> 00:05:39,440 in a two-dimensional plane. 
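[Editor's note] Written out, the formula being read aloud here is presumably the standard distance between two points in a plane. The symbol d is an assumption for illustration; the speaker does not name it.

```latex
d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
```

For example, for the points (1, 2) and (4, 6) this gives the square root of 9 + 16, which is 5.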
80 00:05:39,920 --> 00:05:46,790 So basically this is a formula that you have often used in your school days, or you 81 00:05:46,790 --> 00:05:50,050 will learn it in your school days. Now a question arises: 82 00:05:50,210 --> 00:05:53,550 what if I don't have two dimensions? 83 00:05:53,590 --> 00:05:54,700 What if I don't have that? 84 00:05:54,790 --> 00:06:01,700 What if my data points are scattered in, let's say, n dimensions, 85 00:06:01,700 --> 00:06:03,320 or in three dimensions? 86 00:06:03,590 --> 00:06:07,580 So let me open a new page. 87 00:06:07,600 --> 00:06:07,790 Yes. 88 00:06:08,150 --> 00:06:13,540 Let's say this is one data point and this is another data point, and we have to compute 89 00:06:13,540 --> 00:06:19,010 the distance between them. Let's say the coordinates of the first in three dimensions are (x1, y1, z1), 90 00:06:19,280 --> 00:06:22,820 and the second is nothing but (x2, y2, z2). 91 00:06:23,040 --> 00:06:29,350 Let's say you have to compute the distance between them. So what do you have to do? You have to 92 00:06:29,360 --> 00:06:35,150 just use the formula, which is nothing but (x2 minus x1) squared for your x coordinate; 93 00:06:35,510 --> 00:06:36,670 similarly, 94 00:06:36,680 --> 00:06:36,920 (y2 95 00:06:36,920 --> 00:06:39,020 minus y1) squared for the y coordinate; 96 00:06:39,350 --> 00:06:44,130 and similarly (z2 minus z1) squared for your z coordinate, all added up under the square root. 97 00:06:44,360 --> 00:06:48,140 Similarly, if you have n dimensions, it will get expanded. 98 00:06:48,950 --> 00:06:50,990 Simple, it is just like an expansion. 99 00:06:51,230 --> 00:06:56,000 So this is exactly my Euclidean distance. 100 00:06:56,840 --> 00:06:59,180 You have definitely heard about this. 101 00:06:59,210 --> 00:07:02,120 This is exactly my Euclidean distance. 
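[Editor's sketch] The Euclidean distance described above, and the way it could be used to pick the three nearest neighbors of a new point, can be sketched as follows. This is a minimal illustration under assumptions, not the session's own code: the function names `euclidean` and `predict` and the toy training points are hypothetical.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared difference per coordinate, then take the square root.
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))


def predict(train, query, k=3):
    """Majority label among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Three-dimensional example: sqrt(3^2 + 4^2 + 0^2) = 5.0
d3 = euclidean((1, 2, 3), (4, 6, 3))

# Hypothetical two-class toy data in two dimensions.
train = [
    ((1, 1), "class 1"), ((2, 1), "class 1"), ((1, 2), "class 1"),
    ((5, 5), "class 2"), ((6, 5), "class 2"), ((5, 6), "class 2"),
]
label = predict(train, (2, 2), k=3)

print(d3, label)  # 5.0 class 1
```

The same `euclidean` function works for any number of dimensions, which is the "expansion" the speaker refers to.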
102 00:07:02,120 --> 00:07:11,780 So basically KNN uses this distance internally to find the k nearest neighbors for that particular 103 00:07:11,780 --> 00:07:20,090 data point for which I have to make a prediction. KNN can also use, or 104 00:07:20,090 --> 00:07:26,150 you can modify it to use, something known as Manhattan distance. 105 00:07:26,330 --> 00:07:34,010 So instead of using this Euclidean distance, in some particular places you can use this Manhattan 106 00:07:34,010 --> 00:07:34,520 distance. 107 00:07:34,700 --> 00:07:38,390 So what exactly is this Manhattan distance? 108 00:07:38,570 --> 00:07:40,100 Manhattan distance is nothing 109 00:07:40,100 --> 00:07:47,090 but, you can say, my absolute distance, if you have heard of it. It is just my absolute distance, 110 00:07:47,090 --> 00:07:49,460 and it has many names. 111 00:07:49,460 --> 00:07:50,090 Let's say 112 00:07:50,540 --> 00:07:52,550 city block distance. 113 00:07:52,850 --> 00:07:53,990 That is city block distance. 114 00:07:54,410 --> 00:07:56,240 That is also my Manhattan distance. 115 00:07:56,240 --> 00:07:59,360 Let's say you have heard of the L1 norm. 116 00:08:00,140 --> 00:08:01,970 That is exactly Manhattan distance. 117 00:08:02,420 --> 00:08:02,870 Let's say, 118 00:08:02,990 --> 00:08:04,310 let's say I ask you, 119 00:08:04,320 --> 00:08:04,670 yes, 120 00:08:04,700 --> 00:08:05,150 tell me. 121 00:08:05,390 --> 00:08:05,660 Yes. 122 00:08:05,660 --> 00:08:14,390 Let's say I am traveling from point A to point B, say, or from point X to point Y. 123 00:08:14,720 --> 00:08:20,750 Let's say this is my path, and if I have to compute the distance, I can easily 124 00:08:20,750 --> 00:08:23,810 go ahead with this Euclidean distance. 
125 00:08:23,810 --> 00:08:29,330 So one point has some coordinates like (x1, y1) and the other has some coordinates (x2, 126 00:08:29,330 --> 00:08:29,660 y2). 127 00:08:29,960 --> 00:08:31,640 So I can definitely compute the distance. 128 00:08:32,030 --> 00:08:34,540 But what if... let me open a new page. 129 00:08:34,640 --> 00:08:42,050 Yes, but what if I have to travel from point X to point Y, 130 00:08:42,440 --> 00:08:45,680 but via x1, x2, x3? 131 00:08:45,890 --> 00:08:46,670 Let's say 132 00:08:46,670 --> 00:08:48,950 this is my path. 133 00:08:49,370 --> 00:08:50,490 Let's say this is my path. 134 00:08:50,610 --> 00:08:52,820 Let's say this is my x1. 135 00:08:52,880 --> 00:08:56,050 Let's say this is my x2 and this is my x3. 136 00:08:56,210 --> 00:08:58,220 And if I ask you, 137 00:08:58,790 --> 00:08:59,150 yes, 138 00:08:59,180 --> 00:09:03,950 how can you compute the distance between 139 00:09:03,950 --> 00:09:07,190 this X and Y? How can you compute it? 140 00:09:07,370 --> 00:09:10,950 So let's say I'm traveling from point X 141 00:09:11,260 --> 00:09:18,520 to point Y, and I have to compute the distance. So in such 142 00:09:18,520 --> 00:09:28,780 use cases, I have to use Manhattan distance, 143 00:09:28,780 --> 00:09:36,250 like Google Maps, which you have heard of, and other such applications. Internally, 144 00:09:36,550 --> 00:09:42,970 they use this Manhattan-style, segment-by-segment distance to calculate the distance between two places. 145 00:09:43,030 --> 00:09:44,080 That's how they use it. 146 00:09:44,410 --> 00:09:47,790 So basically, using this approach, how will they compute it? 147 00:09:48,790 --> 00:09:53,860 They will compute it like this: from this one to this, then again this, 
148 00:09:54,280 --> 00:09:54,940 then this, 149 00:09:55,390 --> 00:09:56,080 then this, 150 00:09:56,080 --> 00:09:56,680 then this, 151 00:09:57,460 --> 00:09:58,150 then this, 152 00:09:59,050 --> 00:10:04,750 and this, and again this. Again I move from this, then again this, then 153 00:10:04,750 --> 00:10:08,740 again and again, and I am going to reach my place. 154 00:10:09,130 --> 00:10:13,690 So in this way they can come up with some distance. 155 00:10:15,040 --> 00:10:21,130 So this is exactly my Manhattan distance. 156 00:10:21,340 --> 00:10:28,720 Or if I have to explain it to you in layman's terms, I can say this Manhattan 157 00:10:28,720 --> 00:10:36,580 distance is nothing but calculating the distance between real vectors using the sum of their absolute differences. 158 00:10:37,420 --> 00:10:37,640 Yes. 159 00:10:37,840 --> 00:10:39,480 A similar problem statement: 160 00:10:39,550 --> 00:10:44,270 suppose that this is my point X and this is my point Y. 161 00:10:44,740 --> 00:10:52,300 So basically this straight line is exactly my Euclidean distance, but 162 00:10:53,270 --> 00:11:00,400 going this way and then again going this way, whatever it will be, 163 00:11:01,040 --> 00:11:04,580 it is exactly my Manhattan distance. 164 00:11:06,020 --> 00:11:14,810 Whereas this is exactly my Euclidean distance. And let's say this is my vector A 165 00:11:14,810 --> 00:11:15,990 and this is my vector B. 166 00:11:16,160 --> 00:11:22,490 So the sum of the lengths of vectors A and B is exactly my Manhattan distance. 167 00:11:23,120 --> 00:11:30,860 That's the very basic mathematics behind Euclidean as well as Manhattan distance. We also have something 168 00:11:30,860 --> 00:11:32,480 known as Hamming distance. 169 00:11:32,480 --> 00:11:34,370 Yes, Hamming distance. 
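[Editor's sketch] The Manhattan distance described above, the sum of absolute coordinate differences, can be compared with the straight-line Euclidean distance in a few lines. This is a minimal illustration, not the session's own code: the function name `manhattan` and the two example points are hypothetical.

```python
import math


def manhattan(p, q):
    """Manhattan (city block, L1) distance: sum of absolute coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(b - a) for a, b in zip(p, q))


x = (1, 2)
y = (4, 6)

# Walk 3 units along one axis, then 4 along the other: 3 + 4 = 7.
city_block = manhattan(x, y)

# The straight line between the same points: sqrt(3^2 + 4^2) = 5.0.
straight_line = math.dist(x, y)

print(city_block, straight_line)
```

The Manhattan distance is never shorter than the Euclidean distance between the same two points, which matches the picture of the two legs versus the straight line.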
170 00:11:34,790 --> 00:11:40,420 We have something known as Hamming distance, which we are going to cover in our later sessions, 171 00:11:40,880 --> 00:11:45,270 and which we will basically use for categorical data. 172 00:11:45,270 --> 00:11:51,020 So whenever we have to compute the distance between some categorical data, 173 00:11:51,020 --> 00:11:55,990 in that type of use case we have to use something known as Hamming distance. 174 00:11:56,000 --> 00:12:02,320 So that is all about the basics of KNN: what Euclidean distance is and what Manhattan distance is. 175 00:12:02,630 --> 00:12:05,320 We have also learned the basics behind this: 176 00:12:05,510 --> 00:12:06,650 what this use case is, 177 00:12:06,650 --> 00:12:13,130 what the value of k is, and how to compute the value of k, basically using our cross-validation approaches. And 178 00:12:13,130 --> 00:12:17,940 here I'm going to assume a value of k as three, and then we have to solve this use case as well. 179 00:12:18,230 --> 00:12:23,060 So in the upcoming session we are going to solve this use case with this value of k, and see 180 00:12:23,060 --> 00:12:26,220 how you can find your nearest neighbors. 181 00:12:26,480 --> 00:12:27,860 So that's all about the session. 182 00:12:27,890 --> 00:12:29,120 Hope you love it very much. 183 00:12:29,480 --> 00:12:30,140 Thank you. 184 00:12:30,200 --> 00:12:31,190 Have a nice day. 185 00:12:31,220 --> 00:12:32,180 Keep learning. 186 00:12:32,180 --> 00:12:33,020 Keep growing. 187 00:12:33,230 --> 00:12:34,040 Keep practicing.