1
00:00:02,050 --> 00:00:06,920
In this session, we will understand gradient descent.

2
00:00:08,390 --> 00:00:12,260
If you really look at it, every machine learning or deep learning algorithm

3
00:00:13,330 --> 00:00:15,370
is an optimization exercise.

4
00:00:16,320 --> 00:00:16,950
That is,

5
00:00:18,640 --> 00:00:25,780
the algorithm tries to reduce the difference between predicted and actual, which is nothing but the

6
00:00:25,780 --> 00:00:31,720
loss: the loss corresponds to the difference between predicted and actual.

7
00:00:32,950 --> 00:00:36,820
In other words, if I take the average of

8
00:00:37,770 --> 00:00:41,110
the loss over multiple observations, I have what is known as a cost function.

9
00:00:42,000 --> 00:00:48,510
So the algorithm tries to minimize this cost, because I want my algorithm to

10
00:00:50,430 --> 00:00:57,420
have a result that is as close to the actual as possible. I want my forecast accuracy to be higher,

11
00:00:57,990 --> 00:01:05,010
which means the cost function should be minimal; the loss should be minimized.

12
00:01:06,330 --> 00:01:11,760
That means the difference between predicted and actual should be as low as possible.

13
00:01:12,540 --> 00:01:15,030
That is the optimization exercise.

14
00:01:15,570 --> 00:01:17,850
So how do you do this optimization exercise?

15
00:01:18,300 --> 00:01:19,470
Let's take an example.

16
00:01:21,320 --> 00:01:28,280
An insurance company is trying to predict whether an insurance policy should be issued to an individual

17
00:01:28,280 --> 00:01:31,850
or not. For that, six factors are considered.

18
00:01:33,350 --> 00:01:35,870
How will the insurance company determine

19
00:01:36,950 --> 00:01:41,310
which individual should get insurance or not? How is it going to go about its job?
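The loss and cost described above can be sketched in plain Python. The predicted and actual numbers below are made-up values for illustration; squared error is used here so positive and negative differences do not cancel out in the average:

```python
# Loss per observation: difference between predicted and actual.
# Cost: the average of those losses over all observations.

predicted = [2.5, 0.0, 2.1, 7.8]   # hypothetical model outputs
actual    = [3.0, -0.5, 2.0, 8.0]  # hypothetical historical values

losses = [(p - a) ** 2 for p, a in zip(predicted, actual)]
cost = sum(losses) / len(losses)   # the cost function: mean of the losses

print(round(cost, 4))
```

Minimizing this single cost number over all observations is exactly the optimization exercise the lecture describes.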
20
00:01:42,110 --> 00:01:51,620
It goes about its job by identifying the relative importance, or weights, of the different factors. Once the

21
00:01:51,620 --> 00:01:55,370
relative importance, or weights, are known,

22
00:01:55,430 --> 00:01:58,610
it is easier to make the prediction, right? Because

23
00:01:59,510 --> 00:02:02,410
you want to know which factor is more important, right?

24
00:02:03,170 --> 00:02:08,760
That is nothing but the weights W1, W2, W3, and so on.

25
00:02:09,440 --> 00:02:14,950
If I multiply the two, I have an output that will correspond to the predicted value.

26
00:02:14,960 --> 00:02:15,250
Right.

27
00:02:16,680 --> 00:02:23,790
If I multiply the two, I will have a value that would correspond to the output, which is nothing but

28
00:02:23,790 --> 00:02:24,640
the predicted value.

29
00:02:25,290 --> 00:02:29,460
What I do is I multiply the factors and the weights,

30
00:02:29,580 --> 00:02:30,480
and then I sum.

31
00:02:31,430 --> 00:02:38,580
Right. I will then compare against the actual; the actual is taken from the historical data. Once

32
00:02:38,580 --> 00:02:41,290
I make the comparison, I know the error or the loss, right?

33
00:02:42,140 --> 00:02:45,470
And that feedback is fed into the algorithm.

34
00:02:45,770 --> 00:02:49,670
So the algorithm will try a new set of weights like this.

35
00:02:49,880 --> 00:02:58,490
Multiple iterations will happen, and the iteration where the loss or error is minimal

36
00:02:59,620 --> 00:03:02,120
is taken as the final set of weights.

37
00:03:03,610 --> 00:03:05,290
Are you getting it?

38
00:03:06,990 --> 00:03:14,490
So this is how the world of algorithms operates: the algorithm wants to find the right weights.

39
00:03:15,710 --> 00:03:22,710
It takes feedback after assigning a random set of weights, by comparing actual versus predicted.

40
00:03:23,600 --> 00:03:27,200
This is what happens in an optimization exercise.
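The weighted-sum prediction above can be sketched as follows. The six factor values, the weights W1 to W6, and the actual value are all hypothetical numbers chosen just for the sketch:

```python
# Six factors for one applicant (hypothetical values)
factors = [0.4, 0.9, 0.2, 0.5, 0.7, 0.1]

# Relative importance of each factor: the weights W1..W6 (assumed values)
weights = [0.3, 0.5, 0.1, 0.2, 0.6, 0.05]

# Multiply each factor by its weight, then sum: this is the predicted value
predicted = sum(f * w for f, w in zip(factors, weights))

# Compare against the actual (taken from historical data) to get the loss
actual = 1.0
loss = actual - predicted

print(predicted, loss)
```

In a real run, this loss would be fed back, the weights adjusted, and the loop repeated; the iteration with minimal loss gives the final set of weights.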
41
00:03:29,310 --> 00:03:33,300
So what is this gradient descent then? Let's say

42
00:03:34,300 --> 00:03:37,660
you are at this point on this mountain and you have to come down.

43
00:03:38,580 --> 00:03:44,160
Let's also assume that you are like a computer robot, because if you are an individual, you obviously

44
00:03:44,160 --> 00:03:46,820
know that you have to come down from this point.

45
00:03:47,370 --> 00:03:52,370
You obviously have to come down. But because you are a computer, right,

46
00:03:52,710 --> 00:03:56,840
you don't know whether to come down like this, or whether you have to go up and then come down.

47
00:03:59,220 --> 00:04:03,810
So the gradient descent algorithm helps us to make those decisions.

48
00:04:04,210 --> 00:04:12,180
OK, because what is important is whether you should come down, or go up and then come down, and how big

49
00:04:12,180 --> 00:04:13,260
a step you need to take.

50
00:04:14,200 --> 00:04:14,590
Right.

51
00:04:15,760 --> 00:04:23,520
Are you getting it? For this, the gradient descent algorithm uses the concept of derivatives. Because

52
00:04:23,520 --> 00:04:30,270
what is a derivative? A derivative is nothing but change in y divided by change in x.

53
00:04:33,530 --> 00:04:39,650
It is nothing but slope, and this is nothing but the derivative that we use in calculus.

54
00:04:40,880 --> 00:04:49,280
Right. The distance reduces when we move against the gradient, so we actually compute the negative

55
00:04:49,280 --> 00:04:56,390
of the gradient, which is nothing but the opposite of the gradient value; the negative of dy by dx

56
00:04:56,390 --> 00:04:58,070
is what we consider.

57
00:04:59,730 --> 00:05:02,580
Let us go back to the mountain example, right?

58
00:05:02,610 --> 00:05:04,870
So we are here; we need to come down or go up, right?

59
00:05:05,550 --> 00:05:10,530
This is similar to the goal: right here is the goal.
60
00:05:11,130 --> 00:05:17,370
And let's say you are here; you don't know whether to take small steps or bigger steps.

61
00:05:18,090 --> 00:05:21,120
If you are closer to the goal, you don't have to take bigger steps.

62
00:05:21,120 --> 00:05:21,410
Right.

63
00:05:22,080 --> 00:05:23,190
How will you decide that?

64
00:05:23,190 --> 00:05:27,720
Or rather, how will the computer know? The computer uses the concept of the slope.

65
00:05:28,740 --> 00:05:33,250
Because when you're closer to your goal, the slope will be much flatter, right?

66
00:05:34,870 --> 00:05:42,190
Are you getting it? When you are closer to the goal, the slope will be much flatter, and hence you know that

67
00:06:42,190 --> 00:05:43,870
you need to take only small steps.

68
00:05:45,200 --> 00:05:49,610
When you are away from the goal, the slope is steeper, and you need to take larger steps.

69
00:05:50,550 --> 00:05:56,940
So that's how the gradient descent algorithm helps to make the decision, right?

70
00:05:57,980 --> 00:06:05,030
The individual, or the computer robot, or the computer program, will use the concept of gradient

71
00:06:05,030 --> 00:06:12,620
descent so that it knows whether it needs to go up or go down, and whether it needs to take bigger steps

72
00:06:12,620 --> 00:06:13,610
or small steps.

73
00:06:14,600 --> 00:06:20,140
And these bigger steps and smaller steps correspond to the different weight updates that we are talking about.

74
00:06:21,730 --> 00:06:24,210
Are you able to appreciate the correlation?

75
00:06:25,450 --> 00:06:25,790
Right.

76
00:06:26,620 --> 00:06:33,700
So this process of giving feedback, where we assign different weights, compute the sum, and compute

77
00:06:33,700 --> 00:06:41,290
the difference between actual versus predicted, and we pass this feedback so that another round of

78
00:06:41,290 --> 00:06:42,370
iteration can be done,

79
00:06:42,940 --> 00:06:46,210
this entire process is known as backpropagation.
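The step-size behaviour above can be sketched with one-dimensional gradient descent on a made-up bowl-shaped cost, f(w) = (w - 3)^2, whose minimum (the goal) is at w = 3. Each step is the negative of the gradient times a learning rate, so the steps shrink automatically as the slope flattens near the goal:

```python
def grad(w):
    # Derivative (slope) of the cost f(w) = (w - 3) ** 2
    return 2 * (w - 3)

w = 10.0             # starting point on the "mountain"
learning_rate = 0.1  # scales how big each step is

for step in range(50):
    g = grad(w)
    w = w - learning_rate * g  # move in the negative gradient direction

print(round(w, 4))  # ends up close to the minimum at w = 3
```

Far from w = 3 the gradient is large and the steps are big; near w = 3 the gradient is small and the steps become tiny, which is exactly the decision-making the mountain analogy describes.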
80
00:06:47,210 --> 00:06:55,220
The activities that comprise these steps are assigning weights, computing the sum, computing

81
00:06:55,220 --> 00:06:57,860
the loss, and providing feedback.

82
00:06:57,890 --> 00:07:03,830
This part is known as the hidden layer, because it is hidden from view: we see the input and we

83
00:07:03,830 --> 00:07:04,400
see the output.

84
00:07:04,400 --> 00:07:04,690
Right.

85
00:07:05,630 --> 00:07:07,760
We don't get to see what is going on inside,

86
00:07:07,760 --> 00:07:09,890
and hence it is known as the hidden layer.

87
00:07:10,670 --> 00:07:15,470
So this is what happens in a deep learning neural network also.

88
00:07:16,820 --> 00:07:17,190
Right.

89
00:07:17,720 --> 00:07:26,060
So every machine learning or deep learning algorithm involves an optimization exercise where you minimize

90
00:07:26,840 --> 00:07:30,200
the loss, which is nothing but the difference between actual versus predicted.

91
00:07:32,940 --> 00:07:37,170
So if you are wondering whether you have to do all these steps yourself, whether you have to go through multiple

92
00:07:37,170 --> 00:07:43,260
iterations, whether you have to write the programs: do not worry, there are pre-built libraries available and

93
00:07:43,260 --> 00:07:44,180
you can make use of them.

94
00:07:44,820 --> 00:07:45,110
Right.

95
00:07:45,300 --> 00:07:46,860
But you need to understand the concept.

96
00:07:47,550 --> 00:07:51,180
Only then can you appreciate the output.

97
00:07:51,730 --> 00:07:56,340
Only then can you add value in the entire algorithm development process.

98
00:07:57,680 --> 00:07:58,070
Clear?