1 00:00:00,120 --> 00:00:06,390 Hello all, before diving deep into the session, let's have a quick recap of what we have done 2 00:00:06,390 --> 00:00:07,990 in all our previous sessions. 3 00:00:08,430 --> 00:00:14,170 So basically we have learned some basics behind KNN, why to use it and when not to use it. 4 00:00:14,190 --> 00:00:18,790 Yeah, we have definitely learned not to use it whenever you have huge data. 5 00:00:18,990 --> 00:00:21,140 In that case it would definitely take lots of time. 6 00:00:21,420 --> 00:00:25,590 So it is always suggested to data scientists not 7 00:00:25,590 --> 00:00:32,730 to use KNN whenever you have huge data. And we have also seen the use case that 8 00:00:32,730 --> 00:00:34,260 we have to solve in this session. 9 00:00:34,560 --> 00:00:37,290 And then we have learned the basics behind Euclidean distance. 10 00:00:37,680 --> 00:00:40,740 This is nothing but the distance between two points. 11 00:00:41,040 --> 00:00:47,760 And then we have learned, through this use case, how Google Maps and similar apps use the Manhattan 12 00:00:47,760 --> 00:00:50,100 distance to compute the distance between two places. 13 00:00:50,790 --> 00:00:52,220 Then we have learned something else. 14 00:00:52,230 --> 00:00:57,330 The very basic idea behind what the Manhattan distance computes is nothing but the summation of 15 00:00:57,330 --> 00:00:58,240 absolute differences. 16 00:00:58,710 --> 00:01:04,420 So this is exactly the use case that we have to solve over here. 17 00:01:04,560 --> 00:01:13,230 So basically, we have learned that we have to use the concept of Euclidean distance to compute the distance 18 00:01:13,710 --> 00:01:17,600 of our new data point with respect to all other data points. 19 00:01:17,610 --> 00:01:24,450 And here, let's say my value of K is 3; it means we have to find the three nearest neighbors with respect 20 00:01:24,450 --> 00:01:25,330 to this data point.
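The two distance measures recapped above can be sketched in a few lines of Python. This is an illustrative addition to the lecture notes, not code from the session itself; the function names are my own:

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan_distance(p, q):
    """Sum of absolute coordinate differences (city-block distance),
    like street routes on a grid in a maps application."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
print(manhattan_distance((0, 0), (3, 4)))  # 7
```

The same point pair gives different values under the two measures, which is why the choice of distance matters for KNN.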
21 00:01:25,740 --> 00:01:31,650 Let's say I am going to compute the distance between each and every data point, let's say, starting from this one. 22 00:01:31,690 --> 00:01:38,730 Similarly, I compute the distances to all the data points available over 23 00:01:38,730 --> 00:01:39,120 here. 24 00:01:39,690 --> 00:01:42,990 For all the data points, all other data points, 25 00:01:42,990 --> 00:01:45,500 I'm going to calculate the distance here. 26 00:01:45,840 --> 00:01:49,620 Let's say we move on with these things. 27 00:01:49,650 --> 00:01:50,000 Yeah. 28 00:01:50,430 --> 00:01:55,610 So let's say these are the three closest data points I can see. 29 00:01:55,620 --> 00:01:57,780 Let's say this is my first one, 30 00:01:57,870 --> 00:02:00,900 which is very close, and this is my second, 31 00:02:01,810 --> 00:02:05,610 and let's say this is my third point. 32 00:02:05,920 --> 00:02:12,770 Let's say after calculating these distances, we will try to select 33 00:02:12,810 --> 00:02:13,040 the three closest. 34 00:02:13,530 --> 00:02:19,680 We will try to select the points, which are nothing but my two circles, 35 00:02:19,800 --> 00:02:21,030 my two circles, 36 00:02:21,270 --> 00:02:25,790 and one of the other shape. Now at this level, what do we have to do? 37 00:02:26,190 --> 00:02:33,740 We have to basically compute the probability of the new data point with respect to this circle shape 38 00:02:33,870 --> 00:02:36,840 and with respect to the other shape, 39 00:02:36,870 --> 00:02:38,360 that is, the other class. 40 00:02:39,120 --> 00:02:45,960 So what we will do: we have to see what is the probability of the new data point, which is exactly this 41 00:02:45,960 --> 00:02:46,740 diamond shape, 42 00:02:48,480 --> 00:02:51,570 with respect to this class and with respect to
43 00:02:53,670 --> 00:02:59,340 that class. With respect to this circle class, we have to compute it, so the probability will be nothing but, I can say, 44 00:02:59,370 --> 00:03:07,260 two by three, because you will see two circles are among the neighbors, and the total 45 00:03:07,260 --> 00:03:08,130 selected points are three. 46 00:03:08,460 --> 00:03:12,720 So it means the probability will be nothing but two by three. 47 00:03:13,110 --> 00:03:21,870 Similarly, with respect to this other class, I have something like one by three. We will observe 48 00:03:22,170 --> 00:03:29,310 that my new data point, which is the diamond shape, has the highest probability with respect to the circle 49 00:03:29,320 --> 00:03:29,750 shape. 50 00:03:30,450 --> 00:03:34,470 So it means it will belong to this class. 51 00:03:34,470 --> 00:03:38,210 It will belong to this circle class, I can say. 52 00:03:38,520 --> 00:03:40,660 Let me open the previous page. 53 00:03:40,720 --> 00:03:42,120 Yeah, I can see. 54 00:03:42,570 --> 00:03:43,470 I can see. 55 00:03:43,740 --> 00:03:45,660 This data point, 56 00:03:45,660 --> 00:03:51,270 this data point, belongs to this circle shape class. So 57 00:03:51,270 --> 00:03:52,110 definitely, 58 00:03:52,740 --> 00:03:58,550 KNN is highly used whenever we are looking for similar items. 59 00:03:58,830 --> 00:04:04,230 Let me open a new page and let's discuss some characteristics of KNN. 60 00:04:04,350 --> 00:04:05,610 Let's discuss that. 61 00:04:05,610 --> 00:04:06,060 Let's discuss. 62 00:04:06,310 --> 00:04:13,370 So basically, you may have heard that KNN is a lazy learning algorithm. 63 00:04:13,650 --> 00:04:18,090 So what exactly is the meaning of this lazy learner concept? 64 00:04:18,510 --> 00:04:25,080 We all have learned some basics of machine learning algorithms, like what is linear 65 00:04:25,080 --> 00:04:25,820 regression,
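The majority-vote step described above (two circles and one other shape among three neighbors, giving probabilities 2/3 and 1/3) can be sketched as follows. The training points and labels here are hypothetical examples, not the lecture's dataset:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (point, label) pairs. Return (label, probability)
    by majority vote among the k nearest neighbours (Euclidean distance)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    neighbours = sorted(train, key=lambda pl: dist(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    label, count = votes.most_common(1)[0]
    return label, count / k

# Hypothetical 2-D points: two "circle" neighbours and one "square" near the query
train = [((1, 1), "circle"), ((1, 2), "circle"), ((2, 1), "square"),
         ((8, 8), "square"), ((9, 9), "square")]
# Two of the three nearest are circles, so the vote is ('circle', 2/3)
print(knn_predict(train, (1.2, 1.2), k=3))
```

The returned fraction is exactly the "two by three" probability from the lecture: the share of the k neighbors that carry the winning label.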
66 00:04:25,830 --> 00:04:32,300 what is a decision tree, what is a random forest, and what is logistic regression. 67 00:04:32,490 --> 00:04:39,150 In all these algorithms, what will we try to learn? We will basically try to learn some relationships 68 00:04:39,150 --> 00:04:40,100 in our data. 69 00:04:40,890 --> 00:04:47,220 But with respect to this KNN, that is not what happens. 70 00:04:47,220 --> 00:04:57,510 So basically, in this KNN algorithm, unless and until you are looking 71 00:04:57,510 --> 00:05:04,140 for a prediction, unless and until you have 72 00:05:04,140 --> 00:05:12,450 some testing data, or you have some unseen data, KNN will never try to 73 00:05:12,450 --> 00:05:14,880 learn relations from the data. 74 00:05:15,210 --> 00:05:24,210 What I can say is that this KNN algorithm will never try to build some kind of structure from the data unless 75 00:05:24,210 --> 00:05:24,520 and until 76 00:05:24,570 --> 00:05:29,690 you have some test data, unless and until you have some unseen data. 77 00:05:30,660 --> 00:05:37,680 So basically, until and unless you have unseen data, KNN will not do anything; that's why 78 00:05:37,680 --> 00:05:38,150 it is known 79 00:05:38,160 --> 00:05:48,180 as a lazy learning algorithm. In contrast, the algorithms whose basics you 80 00:05:48,180 --> 00:05:52,410 have learned, like linear regression or decision trees, 81 00:05:52,410 --> 00:05:54,660 are trying to build some kind of model.
82 00:05:54,840 --> 00:06:01,650 But if I talk about this KNN, it never tries to build a model. 83 00:06:01,650 --> 00:06:08,730 So what will it do? It just calculates, or I can say it just tries to check, 84 00:06:08,730 --> 00:06:14,760 the distance between your new data, whatever new data you have, or you can say whatever unseen 85 00:06:14,760 --> 00:06:15,780 data you have, 86 00:06:16,050 --> 00:06:23,460 and basically, based on these distances, it is able to give you some kind of prediction. You 87 00:06:23,460 --> 00:06:29,310 will see in our use case we have used something known as Euclidean distance, and basically using this 88 00:06:29,310 --> 00:06:30,420 Euclidean distance 89 00:06:30,420 --> 00:06:32,400 it is going to give you some kind of prediction: 90 00:06:32,400 --> 00:06:34,260 yeah, it belongs to this class. 91 00:06:35,190 --> 00:06:44,100 So basically, I can say, in this KNN algorithm, my entire dataset, 92 00:06:44,940 --> 00:06:53,850 my entire dataset itself, is the model. 93 00:06:54,090 --> 00:07:01,710 So if my data is very huge, then the model will be huge and the complexity would be huge, and it will 94 00:07:01,710 --> 00:07:03,930 take more time to do the prediction. 95 00:07:04,260 --> 00:07:09,660 That's why this KNN is not advisable 96 00:07:09,660 --> 00:07:17,700 if you have a huge amount of data, let's say in terms of 97 00:07:17,700 --> 00:07:23,690 terabytes or in terms of petabytes, or you have a huge amount of data collected in some databases, or 98 00:07:23,730 --> 00:07:28,110 your data is stored in some of the big data frameworks, 99 00:07:28,140 --> 00:07:30,320 let's say something like Apache Hadoop.
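The "lazy learner" idea above, where the entire dataset itself is the model, can be made concrete with a minimal sketch. This class is an illustrative assumption of mine, not the lecture's code; note that fit() does no learning at all, while predict() pays the full cost:

```python
class LazyKNN:
    """Sketch of why KNN is 'lazy': fit() only stores the data, so the
    training set itself is the model; all work happens at predict time."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No learning step: just memorise the data (the dataset IS the model).
        self.X, self.y = X, y
        return self

    def predict(self, q):
        # One distance computation per stored point at every query:
        # with terabytes of data this is exactly why KNN becomes impractical.
        order = sorted(range(len(self.X)),
                       key=lambda i: sum((a - b) ** 2
                                         for a, b in zip(self.X[i], q)))
        top = [self.y[i] for i in order[:self.k]]
        return max(set(top), key=top.count)  # majority label

model = LazyKNN(k=3).fit([(0, 0), (0, 1), (5, 5), (6, 5)], ["a", "a", "b", "b"])
print(model.predict((0.5, 0.5)))  # 'a'
```

Eager algorithms like linear regression spend effort in fit() so that predict() is cheap; here the trade-off is reversed, which is the whole lazy-versus-eager distinction.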
100 00:07:30,660 --> 00:07:38,040 So whenever you have such a huge amount of data, in that type of scenario this KNN isn't recommended, 101 00:07:38,040 --> 00:07:42,470 because this entire data will be considered as the model. 102 00:07:43,170 --> 00:07:49,200 And if you have that much of a complex model, it makes no sense at all to use this model. 103 00:07:49,380 --> 00:07:53,070 That is why it is not recommended whenever you have a huge amount of data. 104 00:07:53,700 --> 00:07:59,870 So that's all about the session. In the upcoming session, we are going to learn how exactly we can compute 105 00:07:59,910 --> 00:08:06,870 the distance between categorical data, and we are also going to learn how this KNN is used in the case of 106 00:08:06,870 --> 00:08:09,960 regression and different aspects as well. 107 00:08:10,320 --> 00:08:11,440 So that's all about it. 108 00:08:11,460 --> 00:08:13,200 Hopefully you'll love the session very much. 109 00:08:13,470 --> 00:08:14,160 Thank you. 110 00:08:14,230 --> 00:08:15,150 Hope you all stay well. 111 00:08:15,420 --> 00:08:16,320 Keep learning. 112 00:08:16,320 --> 00:08:17,100 Keep growing. 113 00:08:17,190 --> 00:08:18,120 Keep practicing.