1 00:00:00,090 --> 00:00:00,530 Hello. 2 00:00:00,750 --> 00:00:05,880 So before we dive deep into this session, let's have a walkthrough of all the previous sessions 3 00:00:05,880 --> 00:00:08,110 that we have done with respect to KNN. 4 00:00:08,610 --> 00:00:15,990 So basically, we had this use case and saw how to solve it using the concept of Euclidean distance. 5 00:00:16,000 --> 00:00:21,720 You can conclude that this new data point basically belongs to this class one. 6 00:00:21,720 --> 00:00:23,430 This is the circle class. 7 00:00:23,790 --> 00:00:28,800 And then we also covered some basics regarding KNN: how to compute 8 00:00:28,800 --> 00:00:31,410 Manhattan distance, then what exactly Euclidean distance is. 9 00:00:31,590 --> 00:00:37,200 And then we also learned why KNN is a lazy learning algorithm. 10 00:00:37,200 --> 00:00:44,700 And then we also covered why KNN is not recommended if you have a huge amount of data, let's 11 00:00:44,700 --> 00:00:50,940 say if your data is in some big data framework, in some Apache framework or, let's say, in some Hadoop 12 00:00:50,940 --> 00:00:51,500 clusters. 13 00:00:51,870 --> 00:00:55,380 So at that time, KNN is not recommended. 14 00:00:55,380 --> 00:01:00,440 Because at that time, the entire data is considered as the model itself. 15 00:01:00,690 --> 00:01:05,300 So it makes no sense at all having such a complex model. 16 00:01:05,910 --> 00:01:08,450 So we have covered all of this. 17 00:01:09,210 --> 00:01:15,720 So in this session, what we have to learn is basically how to compute, 18 00:01:15,720 --> 00:01:17,640 basically, distance 19 00:01:18,620 --> 00:01:25,730 between, let's say, categorical data. What if we have categorical data in our use case? How to compute the distance between 20 00:01:25,880 --> 00:01:27,830 categorical data, how to compute it? 
21 00:01:28,400 --> 00:01:31,810 Basically we have something known as Hamming distance. 22 00:01:32,450 --> 00:01:35,390 So using this, we can work out what exactly 23 00:01:35,390 --> 00:01:37,670 the distance between our categorical data is. 24 00:01:37,970 --> 00:01:44,780 Let's say I have some categorical data, say 25 00:01:44,780 --> 00:01:46,060 A, B, C, D. And a second one: 26 00:01:46,070 --> 00:01:47,300 let me write down A, 27 00:01:47,740 --> 00:01:52,110 and then E, D, F. So this is the entire thing. 28 00:01:52,310 --> 00:01:59,690 So let's see how you can compute the distance between 29 00:01:59,690 --> 00:02:01,250 this and this, all these things. 30 00:02:01,820 --> 00:02:10,260 So in the case of categorical data, we check character by character. 31 00:02:10,640 --> 00:02:14,330 So you will see A to A: it is the same. 32 00:02:14,510 --> 00:02:16,320 It means it is zero. 33 00:02:16,700 --> 00:02:23,720 Similarly, B to E: they are different, so its value is one. If similar then zero, if not similar 34 00:02:23,720 --> 00:02:24,600 then one. Simple. 35 00:02:24,930 --> 00:02:31,550 So one and one over here. Then, using this Hamming distance, what I'm going to 36 00:02:31,550 --> 00:02:34,940 do: it is the summation of 37 00:02:36,270 --> 00:02:39,510 all ones, the summation of all ones 38 00:02:40,530 --> 00:02:46,890 where your x is not equal to y. That's what this Hamming distance says. 39 00:02:47,310 --> 00:02:55,110 So if I have to write it mathematically, I can say it is nothing but the summation over i equals one 40 00:02:55,110 --> 00:03:01,290 to n, where n is the number of characters I have, the summation of 41 00:03:01,290 --> 00:03:06,780 all the ones where my x is not equal to y. 
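Written out, the sum described in the cues above can be sketched as follows. This assumes two strings x and y of equal length n; the subscript i for the character position and the indicator notation are notational choices made here, not something shown in the session.

```latex
\[
d_H(x, y) = \sum_{i=1}^{n} \mathbf{1}\,[\,x_i \neq y_i\,]
\]
```

Each term is 1 where the characters at position i differ and 0 where they match, so the distance is simply the count of mismatched positions.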
42 00:03:07,140 --> 00:03:15,120 That's what this Hamming distance is, a simple distance, and here it is nothing but one plus one plus one: three. 43 00:03:15,390 --> 00:03:15,880 Simple. 44 00:03:16,170 --> 00:03:18,570 So I can say the distance between this and 45 00:03:18,570 --> 00:03:20,110 this is nothing but three. 46 00:03:20,590 --> 00:03:26,970 So let's talk about a very basic use case where we can use this Hamming distance concept. 47 00:03:27,300 --> 00:03:30,130 So let me open a new page first. 48 00:03:30,540 --> 00:03:35,220 Suppose I have here some data of class one. Suppose 49 00:03:35,250 --> 00:03:36,690 this is the data of class one. 50 00:03:37,640 --> 00:03:44,180 And suppose I have somewhere some data points of class two. Suppose here I have some new data point, and 51 00:03:44,180 --> 00:03:49,920 on the basis of that, I have to predict whether it belongs to class two or class one. That's what we have to do. Some 52 00:03:50,270 --> 00:03:57,430 of the data of class one is, let's say, A, B, C, D, let's say A, B, C. 53 00:03:57,770 --> 00:04:01,520 So this is my other data point, and let's say A, B, C. 54 00:04:03,060 --> 00:04:11,310 And let's say A, B, C. This is the entire class one. Similarly, I have some data with respect to X, Y, 55 00:04:11,310 --> 00:04:17,000 Z, A. Similarly over here, I have X, Y, B. Similarly over here, 56 00:04:17,010 --> 00:04:24,090 I have something X, Y, A. Similarly over here I have something X, Y, A. Yeah. 57 00:04:24,900 --> 00:04:29,090 So in class two I have all this data, whereas this is all of class one. 58 00:04:29,640 --> 00:04:31,770 So let's say I have to predict for, 59 00:04:32,130 --> 00:04:42,300 I have to predict for this X, S, C, D: which class does it belong to? Does it belong to class one or does it belong 60 00:04:42,300 --> 00:04:43,170 to class two? 61 00:04:43,440 --> 00:04:46,320 We have to predict this. 
62 00:04:46,440 --> 00:04:55,230 Then, when my K is three, it means I have to search for the three nearest neighbors. 63 00:04:55,830 --> 00:05:02,870 Then, once we have the three nearest neighbors, we will consider the higher count. 64 00:05:02,880 --> 00:05:05,460 What I can say is we will basically go with the majority. 65 00:05:05,460 --> 00:05:08,120 That's the exact meaning of K equals three. 66 00:05:08,940 --> 00:05:16,710 So in such cases, whenever I have this use case, I can't consider my Euclidean 67 00:05:16,710 --> 00:05:19,410 distance because this is my categorical data. 68 00:05:19,560 --> 00:05:23,280 In such a case, I have to go with my Hamming distance. 69 00:05:23,670 --> 00:05:25,480 I have to go with this Hamming distance. 70 00:05:25,820 --> 00:05:27,060 So it is this. 71 00:05:27,450 --> 00:05:27,740 Yeah. 72 00:05:28,950 --> 00:05:29,280 So what 73 00:05:29,280 --> 00:05:30,040 we will do here: 74 00:05:30,330 --> 00:05:35,260 we will compute the distance between this and this, the distance between 75 00:05:35,260 --> 00:05:36,810 this and this, the distance between 76 00:05:36,810 --> 00:05:43,470 this and this. Similarly, here we have to compute the distance between this and this, between this 77 00:05:43,470 --> 00:05:46,110 and this, and similarly we have to compute the distance between this and this. 78 00:05:46,440 --> 00:05:47,820 So let me compute the distances. 79 00:05:48,120 --> 00:05:52,620 So basically over here you have two same terms. 80 00:05:52,890 --> 00:05:55,350 So the distance would be nothing but two. 81 00:05:55,620 --> 00:06:00,930 Similarly, here I have just a single common thing, which is exactly, 82 00:06:00,930 --> 00:06:01,640 which is C. 
83 00:06:01,660 --> 00:06:08,220 So here it is three. Similarly, here I have the distance as four because there is no common character, I can 84 00:06:08,220 --> 00:06:13,350 say. Similarly, here I have the distance as three. Similarly, over here I have three. 85 00:06:13,350 --> 00:06:17,940 You will see I have just a single common term, which is exactly X. 86 00:06:18,360 --> 00:06:19,260 Similarly, over here 87 00:06:19,260 --> 00:06:20,820 I have three. Here 88 00:06:20,820 --> 00:06:23,220 I have two. Here I have again two. 89 00:06:23,790 --> 00:06:24,660 Here I have again. 90 00:06:25,170 --> 00:06:30,260 So you will see two nearest neighbors. 91 00:06:30,270 --> 00:06:32,430 Both are basically of class two. 92 00:06:33,420 --> 00:06:43,680 So what we can say here is that this X, S, C, D has two nearest neighbours 93 00:06:43,680 --> 00:06:47,970 of class two, because you will see the distance is two and the distance is two. 94 00:06:48,060 --> 00:06:50,370 It means these are my two nearest neighbours. 95 00:06:50,880 --> 00:06:58,320 So it means this X belongs to class two. 96 00:07:00,050 --> 00:07:07,340 That's it. Your problem statement gets solved; you just have to compute the distance, that's 97 00:07:07,340 --> 00:07:07,620 it. 98 00:07:07,640 --> 00:07:09,410 So that's all about this session. 99 00:07:09,410 --> 00:07:15,890 In the upcoming session, we are going to learn what exactly the real-life use cases of KNN are and 100 00:07:16,040 --> 00:07:18,230 what the pros and cons of KNN are. 101 00:07:18,230 --> 00:07:20,000 And we are also going to learn about this. 102 00:07:20,240 --> 00:07:21,350 So that's all about it. 103 00:07:21,400 --> 00:07:22,780 Hope you loved it very much. 104 00:07:22,820 --> 00:07:23,510 Thank you. 105 00:07:23,570 --> 00:07:24,460 How should I stay? 106 00:07:24,470 --> 00:07:25,460 Keep learning. 107 00:07:25,460 --> 00:07:26,270 Keep growing. 
108 00:07:26,630 --> 00:07:27,430 Keep dancing.
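The procedure walked through in this session (compute the Hamming distance from the new point to every labelled point, take the K nearest, go with the majority) can be sketched in Python roughly as below. This is a minimal illustration, not the instructor's code: the function names `hamming_distance` and `knn_predict` are hypothetical, and the training strings are made-up four-character values chosen only so that the distances resemble the ones read out in the session, since the exact data on the whiteboard is not recoverable from the transcript.

```python
from collections import Counter


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    # Each mismatch contributes 1 (True), each match contributes 0 (False).
    return sum(x != y for x, y in zip(a, b))


def knn_predict(query: str, samples: list[tuple[str, str]], k: int = 3) -> str:
    """Return the majority label among the k samples nearest to `query`."""
    # Sort all labelled samples by their Hamming distance to the query.
    nearest = sorted(samples, key=lambda s: hamming_distance(query, s[0]))[:k]
    # Majority vote over the labels of the k nearest neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Illustrative data only (assumed, not taken from the session).
TRAINING = [
    ("ABCD", "class one"),
    ("ABCE", "class one"),
    ("ABEF", "class one"),
    ("ABCF", "class one"),
    ("XYZA", "class two"),
    ("XYZB", "class two"),
    ("XYCA", "class two"),
    ("XYZD", "class two"),
]

print(hamming_distance("ABCD", "AEDF"))       # 3
print(knn_predict("XSCD", TRAINING, k=3))     # class two
```

With this made-up data, the three nearest neighbours of "XSCD" all sit at distance two: one from class one and two from class two, so the majority vote gives class two. Note that the whole training list is scanned for every prediction, which is the lazy-learning cost mentioned at the start of the session.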