1
00:00:02,050 --> 00:00:06,920
In this session, we will understand gradient descent.

2
00:00:08,390 --> 00:00:12,260
If you really look at it, every machine learning or deep learning algorithm

3
00:00:13,330 --> 00:00:15,370
is an optimization exercise.

4
00:00:16,320 --> 00:00:16,950
That is,

5
00:00:18,640 --> 00:00:25,780
the algorithm tries to reduce the difference between predicted and actual, which is nothing but the

6
00:00:25,780 --> 00:00:31,720
loss: the loss corresponds to the difference between predicted and actual.

7
00:00:32,950 --> 00:00:36,820
In other words, if I take the average of

8
00:00:37,770 --> 00:00:41,110
the loss over multiple observations, I have what is known as a cost function.

9
00:00:42,000 --> 00:00:48,510
So the algorithm tries to minimize this cost, because I want my algorithm to

10
00:00:50,430 --> 00:00:57,420
have a result that is as close to the actual as possible. I want my forecast accuracy to be higher,

11
00:00:57,990 --> 00:01:05,010
which means the cost function should be minimal; the loss should be minimized.

12
00:01:06,330 --> 00:01:11,760
That means the difference between predicted and actual should be as low as possible.

13
00:01:12,540 --> 00:01:15,030
That is the optimization exercise.

14
00:01:15,570 --> 00:01:17,850
So how do you do this optimization exercise?

15
00:01:18,300 --> 00:01:19,470
Let's take an example.

16
00:01:21,320 --> 00:01:28,280
An insurance company is trying to predict whether an insurance policy should be issued to an individual

17
00:01:28,280 --> 00:01:31,850
or not. For that, six factors are considered.

18
00:01:33,350 --> 00:01:35,870
How will the insurance company determine

19
00:01:36,950 --> 00:01:41,310
which individual should get insurance or not? How is it going to go about its job?
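The loss and cost described above can be sketched in plain Python. The predicted and actual numbers below are made-up values for illustration; squared error is used here so positive and negative differences do not cancel out in the average:

```python
# Loss per observation: difference between predicted and actual.
# Cost: the average of those losses over all observations.

predicted = [2.5, 0.0, 2.1, 7.8]   # hypothetical model outputs
actual    = [3.0, -0.5, 2.0, 8.0]  # hypothetical historical values

losses = [(p - a) ** 2 for p, a in zip(predicted, actual)]
cost = sum(losses) / len(losses)   # the cost function: mean of the losses

print(round(cost, 4))
```

Minimizing this single cost number over all observations is exactly the optimization exercise the lecture describes.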
20
00:01:42,110 --> 00:01:51,620
It goes about its job by identifying the relative importance, or weights, of the different factors. Once the

21
00:01:51,620 --> 00:01:55,370
relative importance, or weights, are known,

22
00:01:55,430 --> 00:01:58,610
it is easier to make the prediction, right? Because

23
00:01:59,510 --> 00:02:02,410
you want to know which factor is more important, right?

24
00:02:03,170 --> 00:02:08,760
That is nothing but the weights W1, W2, W3, and so on.

25
00:02:09,440 --> 00:02:14,950
If I multiply the two, I have an output that will correspond to the predicted value.

26
00:02:14,960 --> 00:02:15,250
Right.

27
00:02:16,680 --> 00:02:23,790
If I multiply the two, I will have a value that would correspond to the output, which is nothing but

28
00:02:23,790 --> 00:02:24,640
the predicted value.

29
00:02:25,290 --> 00:02:29,460
What I do is I multiply the factors and the weights,

30
00:02:29,580 --> 00:02:30,480
and then I sum.

31
00:02:31,430 --> 00:02:38,580
Right. I will then compare against the actual; the actual is taken from the historical data. Once

32
00:02:38,580 --> 00:02:41,290
I make the comparison, I know the error or the loss, right?

33
00:02:42,140 --> 00:02:45,470
And that feedback is fed into the algorithm.

34
00:02:45,770 --> 00:02:49,670
So the algorithm will try a new set of weights like this.

35
00:02:49,880 --> 00:02:58,490
Multiple iterations will happen, and the iteration where the loss or error is minimal

36
00:02:59,620 --> 00:03:02,120
is taken as the final set of weights.

37
00:03:03,610 --> 00:03:05,290
Are you getting it?

38
00:03:06,990 --> 00:03:14,490
So this is how the world of algorithms operates: the algorithm wants to find the right weights.

39
00:03:15,710 --> 00:03:22,710
It takes feedback after assigning a random set of weights, by comparing actual versus predicted.

40
00:03:23,600 --> 00:03:27,200
This is what happens in an optimization exercise.
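The weighted-sum prediction above can be sketched as follows. The six factor values, the weights W1 to W6, and the actual value are all hypothetical numbers chosen just for the sketch:

```python
# Six factors for one applicant (hypothetical values)
factors = [0.4, 0.9, 0.2, 0.5, 0.7, 0.1]

# Relative importance of each factor: the weights W1..W6 (assumed values)
weights = [0.3, 0.5, 0.1, 0.2, 0.6, 0.05]

# Multiply each factor by its weight, then sum: this is the predicted value
predicted = sum(f * w for f, w in zip(factors, weights))

# Compare against the actual (taken from historical data) to get the loss
actual = 1.0
loss = actual - predicted

print(predicted, loss)
```

In a real run, this loss would be fed back, the weights adjusted, and the loop repeated; the iteration with minimal loss gives the final set of weights.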
41
00:03:29,310 --> 00:03:33,300
So what is this gradient descent then? Let's say

42
00:03:34,300 --> 00:03:37,660
you are at this point on this mountain and you have to come down.

43
00:03:38,580 --> 00:03:44,160
Let's also assume that you are like a computer robot, because if you are an individual, you obviously

44
00:03:44,160 --> 00:03:46,820
know that you have to come down from this point.

45
00:03:47,370 --> 00:03:52,370
You obviously have to come down. But because you are a computer, right,

46
00:03:52,710 --> 00:03:56,840
you don't know whether to come down like this, or whether you have to go up and then come down.

47
00:03:59,220 --> 00:04:03,810
So the gradient descent algorithm helps us to make those decisions.

48
00:04:04,210 --> 00:04:12,180
OK, because what is important is whether you should come down, or go up and then come down, and how big

49
00:04:12,180 --> 00:04:13,260
a step you need to take.

50
00:04:14,200 --> 00:04:14,590
Right.

51
00:04:15,760 --> 00:04:23,520
Are you getting it? For this, the gradient descent algorithm uses the concept of derivatives. Because

52
00:04:23,520 --> 00:04:30,270
what is a derivative? A derivative is nothing but change in y divided by change in x.

53
00:04:33,530 --> 00:04:39,650
It is nothing but slope, and this is nothing but the derivative that we use in calculus.

54
00:04:40,880 --> 00:04:49,280
Right. The distance reduces when we move against the gradient, so we actually compute the negative

55
00:04:49,280 --> 00:04:56,390
of the gradient, which is nothing but the opposite of the gradient value; the negative of dy by dx

56
00:04:56,390 --> 00:04:58,070
is what we consider.

57
00:04:59,730 --> 00:05:02,580
Let us go back to the mountain example, right?

58
00:05:02,610 --> 00:05:04,870
So we are here; we need to come down or go up, right?

59
00:05:05,550 --> 00:05:10,530
This is similar to the goal: right here is the goal.
60
00:05:11,130 --> 00:05:17,370
And let's say you are here; you don't know whether to take small steps or bigger steps.

61
00:05:18,090 --> 00:05:21,120
If you are closer to the goal, you don't have to take bigger steps.

62
00:05:21,120 --> 00:05:21,410
Right.

63
00:05:22,080 --> 00:05:23,190
How will you decide that?

64
00:05:23,190 --> 00:05:27,720
Or rather, how will the computer know? The computer uses the concept of the slope.

65
00:05:28,740 --> 00:05:33,250
Because when you're closer to your goal, the slope will be much flatter, right?

66
00:05:34,870 --> 00:05:42,190
Are you getting it? When you are closer to the goal, the slope will be much flatter, and hence you know that

67
00:06:42,190 --> 00:05:43,870
you need to take only small steps.

68
00:05:45,200 --> 00:05:49,610
When you are away from the goal, the slope is steeper, and you need to take larger steps.

69
00:05:50,550 --> 00:05:56,940
So that's how the gradient descent algorithm helps to make the decision, right?

70
00:05:57,980 --> 00:06:05,030
The individual, or the computer robot, or the computer program, will use the concept of gradient

71
00:06:05,030 --> 00:06:12,620
descent so that it knows whether it needs to go up or go down, and whether it needs to take bigger steps

72
00:06:12,620 --> 00:06:13,610
or small steps.

73
00:06:14,600 --> 00:06:20,140
And these bigger steps and smaller steps correspond to the different weight updates that we are talking about.

74
00:06:21,730 --> 00:06:24,210
Are you able to appreciate the correlation?

75
00:06:25,450 --> 00:06:25,790
Right.

76
00:06:26,620 --> 00:06:33,700
So this process of giving feedback, where we assign different weights, compute the sum, and compute

77
00:06:33,700 --> 00:06:41,290
the difference between actual versus predicted, and we pass this feedback so that another round of

78
00:06:41,290 --> 00:06:42,370
iteration can be done,

79
00:06:42,940 --> 00:06:46,210
this entire process is known as backpropagation.
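The step-size behaviour above can be sketched with one-dimensional gradient descent on a made-up bowl-shaped cost, f(w) = (w - 3)^2, whose minimum (the goal) is at w = 3. Each step is the negative of the gradient times a learning rate, so the steps shrink automatically as the slope flattens near the goal:

```python
def grad(w):
    # Derivative (slope) of the cost f(w) = (w - 3) ** 2
    return 2 * (w - 3)

w = 10.0             # starting point on the "mountain"
learning_rate = 0.1  # scales how big each step is

for step in range(50):
    g = grad(w)
    w = w - learning_rate * g  # move in the negative gradient direction

print(round(w, 4))  # ends up close to the minimum at w = 3
```

Far from w = 3 the gradient is large and the steps are big; near w = 3 the gradient is small and the steps become tiny, which is exactly the decision-making the mountain analogy describes.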
80
00:06:47,210 --> 00:06:55,220
The activities that comprise these steps are assigning weights, computing the sum, computing

81
00:06:55,220 --> 00:06:57,860
the loss, and providing feedback.

82
00:06:57,890 --> 00:07:03,830
This part is known as the hidden layer, because it is hidden from view: we see the input and we

83
00:07:03,830 --> 00:07:04,400
see the output.

84
00:07:04,400 --> 00:07:04,690
Right.

85
00:07:05,630 --> 00:07:07,760
We don't get to see what is going on inside,

86
00:07:07,760 --> 00:07:09,890
and hence it is known as the hidden layer.

87
00:07:10,670 --> 00:07:15,470
So this is what happens in a deep learning neural network also.

88
00:07:16,820 --> 00:07:17,190
Right.

89
00:07:17,720 --> 00:07:26,060
So every machine learning or deep learning algorithm involves an optimization exercise where you minimize

90
00:07:26,840 --> 00:07:30,200
the loss, which is nothing but the difference between actual versus predicted.

91
00:07:32,940 --> 00:07:37,170
So if you are wondering whether you have to do all these steps yourself, whether you have to go through multiple

92
00:07:37,170 --> 00:07:43,260
iterations, whether you have to write the programs: do not worry, there are pre-built libraries available and

93
00:07:43,260 --> 00:07:44,180
you can make use of them.

94
00:07:44,820 --> 00:07:45,110
Right.

95
00:07:45,300 --> 00:07:46,860
But you need to understand the concept.

96
00:07:47,550 --> 00:07:51,180
Only then can you appreciate the output.

97
00:07:51,730 --> 00:07:56,340
Only then can you add value in the entire algorithm development process.

98
00:07:57,680 --> 00:07:58,070
Clear?