1 00:00:00,120 --> 00:00:06,390 Hello all, before diving deep into the session, let's have a quick recap of what we have done 2 00:00:06,390 --> 00:00:07,990 in all our previous sessions. 3 00:00:08,430 --> 00:00:14,170 So basically we have learned some basics behind KNN, why to use it and when not to use it. 4 00:00:14,190 --> 00:00:18,790 Yeah, we have definitely learned not to use it whenever you have huge data. 5 00:00:18,990 --> 00:00:21,140 In that case it would definitely take lots of time. 6 00:00:21,420 --> 00:00:25,590 So it is always suggested to data scientists not 7 00:00:25,590 --> 00:00:32,730 to use KNN whenever you have huge data. And we have also seen the use case that 8 00:00:32,730 --> 00:00:34,260 we have to solve in this session. 9 00:00:34,560 --> 00:00:37,290 And then we have learned the basics behind Euclidean distance. 10 00:00:37,680 --> 00:00:40,740 This is nothing but the distance between two points. 11 00:00:41,040 --> 00:00:47,760 And then we have learned, through this use case, how Google Maps and similar apps use the Manhattan 12 00:00:47,760 --> 00:00:50,100 distance to compute the distance between two places. 13 00:00:50,790 --> 00:00:52,220 Then we have learned something else. 14 00:00:52,230 --> 00:00:57,330 The very basic idea behind what the Manhattan distance computes is nothing but the summation of 15 00:00:57,330 --> 00:00:58,240 absolute differences. 16 00:00:58,710 --> 00:01:04,420 So this is exactly the use case that we have to solve over here. 17 00:01:04,560 --> 00:01:13,230 So basically, we have learned that we have to use the concept of Euclidean distance to compute the distance 18 00:01:13,710 --> 00:01:17,600 of our new data point with respect to all other data points. 19 00:01:17,610 --> 00:01:24,450 And here, let's say my value of K is 3; it means we have to find the three nearest neighbors with respect 20 00:01:24,450 --> 00:01:25,330 to this data point.
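The two distance measures recapped above can be sketched in a few lines of Python. This is an illustrative addition to the lecture notes, not code from the session itself; the function names are my own:

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan_distance(p, q):
    """Sum of absolute coordinate differences (city-block distance),
    like street routes on a grid in a maps application."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
print(manhattan_distance((0, 0), (3, 4)))  # 7
```

The same point pair gives different values under the two measures, which is why the choice of distance matters for KNN.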
21 00:01:25,740 --> 00:01:31,650 Let's say I am going to compute the distance between each and every data point, let's say, starting from this one. 22 00:01:31,690 --> 00:01:38,730 Similarly, I compute the distances to all the data points available over 23 00:01:38,730 --> 00:01:39,120 here. 24 00:01:39,690 --> 00:01:42,990 For all the data points, all other data points, 25 00:01:42,990 --> 00:01:45,500 I'm going to calculate the distance here. 26 00:01:45,840 --> 00:01:49,620 Let's say we move on with these things. 27 00:01:49,650 --> 00:01:50,000 Yeah. 28 00:01:50,430 --> 00:01:55,610 So let's say these are the three closest data points I can see. 29 00:01:55,620 --> 00:01:57,780 Let's say this is my first one, 30 00:01:57,870 --> 00:02:00,900 which is very close, and this is my second, 31 00:02:01,810 --> 00:02:05,610 and let's say this is my third point. 32 00:02:05,920 --> 00:02:12,770 Let's say after calculating these distances, we will try to select 33 00:02:12,810 --> 00:02:13,040 the three closest. 34 00:02:13,530 --> 00:02:19,680 We will try to select the points, which are nothing but my two circles, 35 00:02:19,800 --> 00:02:21,030 my two circles, 36 00:02:21,270 --> 00:02:25,790 and one of the other shape. Now at this level, what do we have to do? 37 00:02:26,190 --> 00:02:33,740 We have to basically compute the probability of the new data point with respect to this circle shape 38 00:02:33,870 --> 00:02:36,840 and with respect to the other shape, 39 00:02:36,870 --> 00:02:38,360 that is, the other class. 40 00:02:39,120 --> 00:02:45,960 So what we will do: we have to see what is the probability of the new data point, which is exactly this 41 00:02:45,960 --> 00:02:46,740 diamond shape, 42 00:02:48,480 --> 00:02:51,570 with respect to this class and with respect to
43 00:02:53,670 --> 00:02:59,340 that class. With respect to this circle class, we have to compute it, so the probability will be nothing but, I can say, 44 00:02:59,370 --> 00:03:07,260 two by three, because you will see two circles are among the neighbors, and the total 45 00:03:07,260 --> 00:03:08,130 selected points are three. 46 00:03:08,460 --> 00:03:12,720 So it means the probability will be nothing but two by three. 47 00:03:13,110 --> 00:03:21,870 Similarly, with respect to this other class, I have something like one by three. We will observe 48 00:03:22,170 --> 00:03:29,310 that my new data point, which is the diamond shape, has the highest probability with respect to the circle 49 00:03:29,320 --> 00:03:29,750 shape. 50 00:03:30,450 --> 00:03:34,470 So it means it will belong to this class. 51 00:03:34,470 --> 00:03:38,210 It will belong to this circle class, I can say. 52 00:03:38,520 --> 00:03:40,660 Let me open the previous page. 53 00:03:40,720 --> 00:03:42,120 Yeah, I can see. 54 00:03:42,570 --> 00:03:43,470 I can see. 55 00:03:43,740 --> 00:03:45,660 This data point, 56 00:03:45,660 --> 00:03:51,270 this data point, belongs to this circle shape class. So 57 00:03:51,270 --> 00:03:52,110 definitely, 58 00:03:52,740 --> 00:03:58,550 KNN is highly used whenever we are looking for similar items. 59 00:03:58,830 --> 00:04:04,230 Let me open a new page and let's discuss some characteristics of KNN. 60 00:04:04,350 --> 00:04:05,610 Let's discuss that. 61 00:04:05,610 --> 00:04:06,060 Let's discuss. 62 00:04:06,310 --> 00:04:13,370 So basically, you may have heard that KNN is a lazy learning algorithm. 63 00:04:13,650 --> 00:04:18,090 So what exactly is the meaning of this lazy learner concept? 64 00:04:18,510 --> 00:04:25,080 We all have learned some basics of machine learning algorithms, like what is linear 65 00:04:25,080 --> 00:04:25,820 regression,
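The majority-vote step described above (two circles and one other shape among three neighbors, giving probabilities 2/3 and 1/3) can be sketched as follows. The training points and labels here are hypothetical examples, not the lecture's dataset:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (point, label) pairs. Return (label, probability)
    by majority vote among the k nearest neighbours (Euclidean distance)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    neighbours = sorted(train, key=lambda pl: dist(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    label, count = votes.most_common(1)[0]
    return label, count / k

# Hypothetical 2-D points: two "circle" neighbours and one "square" near the query
train = [((1, 1), "circle"), ((1, 2), "circle"), ((2, 1), "square"),
         ((8, 8), "square"), ((9, 9), "square")]
# Two of the three nearest are circles, so the vote is ('circle', 2/3)
print(knn_predict(train, (1.2, 1.2), k=3))
```

The returned fraction is exactly the "two by three" probability from the lecture: the share of the k neighbors that carry the winning label.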
66 00:04:25,830 --> 00:04:32,300 what is a decision tree, what is a random forest, and what is logistic regression. 67 00:04:32,490 --> 00:04:39,150 In all these algorithms, what will we try to learn? We will basically try to learn some relationships 68 00:04:39,150 --> 00:04:40,100 in our data. 69 00:04:40,890 --> 00:04:47,220 But with respect to this KNN, that is not what happens. 70 00:04:47,220 --> 00:04:57,510 So basically, in this KNN algorithm, unless and until you are looking 71 00:04:57,510 --> 00:05:04,140 for a prediction, unless and until you have 72 00:05:04,140 --> 00:05:12,450 some testing data, or you have some unseen data, KNN will never try to 73 00:05:12,450 --> 00:05:14,880 learn relations from the data. 74 00:05:15,210 --> 00:05:24,210 What I can say is that this KNN algorithm will never try to build some kind of structure from the data unless 75 00:05:24,210 --> 00:05:24,520 and until 76 00:05:24,570 --> 00:05:29,690 you have some test data, unless and until you have some unseen data. 77 00:05:30,660 --> 00:05:37,680 So basically, until and unless you have unseen data, KNN will not do anything; that's why 78 00:05:37,680 --> 00:05:38,150 it is known 79 00:05:38,160 --> 00:05:48,180 as a lazy learning algorithm. In contrast, the algorithms whose basics you 80 00:05:48,180 --> 00:05:52,410 have learned, like linear regression or decision trees, 81 00:05:52,410 --> 00:05:54,660 are trying to build some kind of model.
82 00:05:54,840 --> 00:06:01,650 But if I talk about this KNN, it never tries to build a model. 83 00:06:01,650 --> 00:06:08,730 So what will it do? It just calculates, or I can say it just tries to check, 84 00:06:08,730 --> 00:06:14,760 the distance between your new data, whatever new data you have, or you can say whatever unseen 85 00:06:14,760 --> 00:06:15,780 data you have, 86 00:06:16,050 --> 00:06:23,460 and basically, based on these distances, it is able to give you some kind of prediction. You 87 00:06:23,460 --> 00:06:29,310 will see in our use case we have used something known as Euclidean distance, and basically using this 88 00:06:29,310 --> 00:06:30,420 Euclidean distance 89 00:06:30,420 --> 00:06:32,400 it is going to give you some kind of prediction: 90 00:06:32,400 --> 00:06:34,260 yeah, it belongs to this class. 91 00:06:35,190 --> 00:06:44,100 So basically, I can say, in this KNN algorithm, my entire dataset, 92 00:06:44,940 --> 00:06:53,850 my entire dataset itself, is the model. 93 00:06:54,090 --> 00:07:01,710 So if my data is very huge, then the model will be huge and the complexity would be huge, and it will 94 00:07:01,710 --> 00:07:03,930 take more time to do the prediction. 95 00:07:04,260 --> 00:07:09,660 That's why this KNN is not advisable 96 00:07:09,660 --> 00:07:17,700 if you have a huge amount of data, let's say in terms of 97 00:07:17,700 --> 00:07:23,690 terabytes or in terms of petabytes, or you have a huge amount of data collected in some databases, or 98 00:07:23,730 --> 00:07:28,110 your data is stored in some of the big data frameworks, 99 00:07:28,140 --> 00:07:30,320 let's say something like Apache Hadoop.
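The "lazy learner" idea above, where the entire dataset itself is the model, can be made concrete with a minimal sketch. This class is an illustrative assumption of mine, not the lecture's code; note that fit() does no learning at all, while predict() pays the full cost:

```python
class LazyKNN:
    """Sketch of why KNN is 'lazy': fit() only stores the data, so the
    training set itself is the model; all work happens at predict time."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No learning step: just memorise the data (the dataset IS the model).
        self.X, self.y = X, y
        return self

    def predict(self, q):
        # One distance computation per stored point at every query:
        # with terabytes of data this is exactly why KNN becomes impractical.
        order = sorted(range(len(self.X)),
                       key=lambda i: sum((a - b) ** 2
                                         for a, b in zip(self.X[i], q)))
        top = [self.y[i] for i in order[:self.k]]
        return max(set(top), key=top.count)  # majority label

model = LazyKNN(k=3).fit([(0, 0), (0, 1), (5, 5), (6, 5)], ["a", "a", "b", "b"])
print(model.predict((0.5, 0.5)))  # 'a'
```

Eager algorithms like linear regression spend effort in fit() so that predict() is cheap; here the trade-off is reversed, which is the whole lazy-versus-eager distinction.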
100 00:07:30,660 --> 00:07:38,040 So whenever you have such a huge amount of data, in that type of scenario this KNN isn't recommended, 101 00:07:38,040 --> 00:07:42,470 because this entire data will be considered as the model. 102 00:07:43,170 --> 00:07:49,200 And if you have that much of a complex model, it makes no sense at all to use this model. 103 00:07:49,380 --> 00:07:53,070 That is why it is not recommended whenever you have a huge amount of data. 104 00:07:53,700 --> 00:07:59,870 So that's all about the session. In the upcoming session, we are going to learn how exactly we can compute 105 00:07:59,910 --> 00:08:06,870 the distance between categorical data, and we are also going to learn how this KNN is used in the case of 106 00:08:06,870 --> 00:08:09,960 regression and different aspects as well. 107 00:08:10,320 --> 00:08:11,440 So that's all about it. 108 00:08:11,460 --> 00:08:13,200 Hopefully you'll love the session very much. 109 00:08:13,470 --> 00:08:14,160 Thank you. 110 00:08:14,230 --> 00:08:15,150 Hope you all stay well. 111 00:08:15,420 --> 00:08:16,320 Keep learning. 112 00:08:16,320 --> 00:08:17,100 Keep growing. 113 00:08:17,190 --> 00:08:18,120 Keep practicing.