1 00:00:00,590 --> 00:00:04,250 In this section, I will introduce you to machine learning. 2 00:00:05,510 --> 00:00:07,570 You must have heard of it or read about it. 3 00:00:08,750 --> 00:00:16,310 But if you are not clear about what it really is, or you know about it only in bits and pieces, this section 4 00:00:16,430 --> 00:00:20,180 should help you put things in perspective and build some intuition around it. 5 00:00:22,460 --> 00:00:24,050 So let us first answer the question: 6 00:00:24,140 --> 00:00:25,280 What is machine learning? 7 00:00:26,610 --> 00:00:33,930 Machine learning is programming the computer to optimize, that is, to maximize or minimize, a performance 8 00:00:33,930 --> 00:00:42,630 criterion based on past data. Even more simply, based on past data, your machine will improve its 9 00:00:42,630 --> 00:00:44,790 effectiveness at performing a task. 10 00:00:46,700 --> 00:00:51,590 Below is an analogy of how machine learning is similar to human learning. 11 00:00:52,280 --> 00:00:56,210 Humans observe actions and outcomes and keep them in memory. 12 00:00:57,110 --> 00:00:59,420 This learning turns into a skill for a human. 13 00:01:00,620 --> 00:01:07,520 Similarly, when we give past data to our machine, our computer will learn from that data and 14 00:01:07,520 --> 00:01:13,400 start performing better, and if its job is to predict, it will start predicting better. 15 00:01:14,740 --> 00:01:17,730 You can probably imagine why everyone is talking about it. 16 00:01:18,820 --> 00:01:24,370 Two things have given it a further boost. One is the ability of today's machines to handle huge datasets. 17 00:01:25,360 --> 00:01:28,470 And the second is the ability to collect so much data. 18 00:01:29,890 --> 00:01:36,280 Big companies like Google and Facebook have so much individual data with them. Even political parties 19 00:01:36,280 --> 00:01:40,290 are collecting data to create effective campaigns and win elections.
20 00:01:42,180 --> 00:01:47,250 Good governments are collecting and using data to create effective public schemes and meaningful policies. 21 00:01:49,380 --> 00:01:55,080 Within organizations also, if you are a working professional, you will know that at lower levels, 22 00:01:55,230 --> 00:01:59,610 only basic data analysis is being done, and that too is for reporting. 23 00:02:00,000 --> 00:02:05,730 But at higher levels, predictive and prescriptive models are being built to help in decision making 24 00:02:05,730 --> 00:02:07,710 and business strategy formulation. 25 00:02:09,660 --> 00:02:14,310 Machine learning is a tool which is helping in creating these effective prediction models. 26 00:02:16,190 --> 00:02:20,190 There are a few terms which are being used interchangeably these days. 27 00:02:20,880 --> 00:02:23,190 However, there are minor differences between them. 28 00:02:24,820 --> 00:02:31,270 The difference between machine learning and statistics is that statistics estimates and provides 29 00:02:31,270 --> 00:02:34,620 us with some mathematical models to analyze the data. 30 00:02:35,990 --> 00:02:39,250 But machine learning may not be based on a statistical model. 31 00:02:40,200 --> 00:02:47,280 Most of the time it is, but it is not necessary. And secondly, one major part of machine learning 32 00:02:47,280 --> 00:02:48,030 is programming. 33 00:02:49,870 --> 00:02:56,010 Which is why, in this course, we will pick up one statistical model and then learn how to do it in popular 34 00:02:56,020 --> 00:02:57,360 data programming software. 35 00:02:58,530 --> 00:03:04,770 When we compare artificial intelligence and machine learning, you can think of it like this: machine learning 36 00:03:04,770 --> 00:03:12,140 is a part of artificial intelligence, the part where intelligence is being gathered by learning from 37 00:03:12,150 --> 00:03:13,650 the experience of past data.
38 00:03:14,690 --> 00:03:20,330 Comparing machine learning and data mining is difficult. You can think of data mining as a tool to 39 00:03:20,330 --> 00:03:26,290 be used by people to gather and process data. Machine learning can also be part of this process. 40 00:03:27,620 --> 00:03:32,840 But there will be some person doing data mining to achieve a particular goal, whereas machine learning 41 00:03:32,840 --> 00:03:39,390 can be a standalone algorithm performing a particular task effectively. Let us see some examples 42 00:03:39,410 --> 00:03:46,070 now. It is said that PayPal uses machine learning to predict fraudulent transactions and prevent them 43 00:03:46,070 --> 00:03:46,730 before they happen. 44 00:03:47,270 --> 00:03:50,150 Similarly, this is something every one of us would have seen. 45 00:03:50,870 --> 00:03:57,170 Google is able to predict what we want to search based on what people around us are searching, our 46 00:03:57,170 --> 00:04:00,110 own search history, and numerous other predictors. 47 00:04:02,420 --> 00:04:08,420 There are several other use cases of machine learning, which I have categorized by industry. In 48 00:04:08,420 --> 00:04:14,870 the banking, telecom and retail sectors, companies are using machine learning to identify prospective 49 00:04:15,020 --> 00:04:19,980 customers, good or bad customers, and the dissatisfaction level of each customer. 50 00:04:20,800 --> 00:04:25,460 They are also using machine learning to effectively use their advertising budget so that within the 51 00:04:25,460 --> 00:04:28,760 same budget they can get the maximum number of sales. 52 00:04:29,480 --> 00:04:36,620 They are using it to decrease their credit risk by identifying the good customers and then assigning an appropriate 53 00:04:36,620 --> 00:04:38,040 credit limit to each customer. 54 00:04:38,450 --> 00:04:44,450 As I told you, PayPal is using it to identify fraud, so that will lead to fewer frauds.
55 00:04:45,940 --> 00:04:52,270 Machine learning can be used to identify potential churners, the customers who are going to stop using 56 00:04:52,270 --> 00:04:53,470 your product or service. 57 00:04:56,100 --> 00:05:02,520 In biomedical and biometrics, machine learning is being used to diagnose a particular disease more effectively 58 00:05:03,360 --> 00:05:09,690 or for drug discovery, that is, to identify which drug works more effectively on which particular 59 00:05:09,690 --> 00:05:10,230 disease. 60 00:05:11,480 --> 00:05:18,200 In biometrics, you must have realized that the facial recognition software in your mobile phone or the 61 00:05:18,200 --> 00:05:22,720 fingerprint sensor in your mobile phone is getting better day by day. 62 00:05:23,820 --> 00:05:27,240 That is due to some machine learning algorithm running behind it. 63 00:05:29,140 --> 00:05:35,230 Computers and the Internet have been at the forefront of machine learning. You may identify these devices, 64 00:05:35,260 --> 00:05:43,070 Google Home and Amazon Echo. They all have machine learning algorithms in them, and these bots such as Siri and 65 00:05:43,540 --> 00:05:47,200 Alexa all have machine learning running behind them. 66 00:05:49,080 --> 00:05:57,030 Your Gmail can automatically filter spam mails. Google can automatically identify the language of every 67 00:05:57,030 --> 00:06:03,090 website and suggest to translate it. All these are examples of machine learning. 68 00:06:05,240 --> 00:06:10,650 Now, let us enter the world of machine learning based on statistics. Here, 69 00:06:10,670 --> 00:06:17,270 our agenda will be to find the relationship between input and output variables based on historic data. 70 00:06:18,690 --> 00:06:24,720 For example, consider a real estate agent who wants to put a price on a particular property. 71 00:06:26,410 --> 00:06:30,610 So the output variable for this person is going to be the price of the property.
72 00:06:32,070 --> 00:06:38,750 Next, this agent has to immediately decide which factors may be affecting the price of the property. 73 00:06:39,780 --> 00:06:46,530 Some of them, which I could think of, are the area covered, number of bedrooms, proximity to landmarks, 74 00:06:46,530 --> 00:06:53,820 proximity to market and so on. Mathematically, the agent wants to establish price as a function of 75 00:06:54,030 --> 00:06:54,960 these variables. 76 00:06:57,400 --> 00:07:01,300 There can be two motivations behind estimating this function. 77 00:07:03,760 --> 00:07:10,660 One is prediction, as we saw for the real estate agent who just wants to predict the value of the property. 78 00:07:11,260 --> 00:07:17,290 Prediction means we are just interested in getting the value of Y and not in the relationship of Y with 79 00:07:17,290 --> 00:07:18,550 each individual variable. 80 00:07:20,450 --> 00:07:27,950 The second is inference. Here, we want to establish the relationship between each input variable and 81 00:07:27,950 --> 00:07:32,830 the output so that we know how the output will change when we manipulate an input. 82 00:07:34,590 --> 00:07:41,340 Suppose instead of a real estate agent, there was a builder. That person would probably like to know 83 00:07:41,340 --> 00:07:46,830 whether building near a market is a better place or building near a school is a better 84 00:07:46,830 --> 00:07:47,230 place. 85 00:07:48,390 --> 00:07:52,450 So that person's motivation would be inference and not just prediction. 86 00:07:55,050 --> 00:07:56,010 Based on the motivation, 87 00:07:56,730 --> 00:08:01,530 whether we want to infer or whether we want to predict, we will choose the model for analysis. 88 00:08:02,190 --> 00:08:08,310 Because when we want to predict, we want our prediction to be accurate and we do not care if we understand 89 00:08:08,310 --> 00:08:09,200 the model or not.
90 00:08:10,980 --> 00:08:17,100 There are some models which will predict better. However, they are difficult to interpret. Such models 91 00:08:17,190 --> 00:08:18,150 suit prediction. 92 00:08:20,490 --> 00:08:24,900 But there are some models which may not predict as accurately, 93 00:08:25,290 --> 00:08:29,210 but we can easily make out the relationship of variables from these models. 94 00:08:29,670 --> 00:08:33,900 That is, they have more interpretability, and such models 95 00:08:33,930 --> 00:08:34,950 suit inference. 96 00:08:36,170 --> 00:08:38,520 So based on the motivation, we will choose the model. 97 00:08:41,120 --> 00:08:47,720 Now there are two major classifications of models. Some models are parametric in nature and some are 98 00:08:47,750 --> 00:08:55,610 non-parametric. In a parametric approach, we assume a functional form of the relationship between input 99 00:08:55,610 --> 00:08:56,760 and output variables. 100 00:08:58,280 --> 00:09:06,140 For example, if you think that there is a linear relationship, we have already assigned this functional 101 00:09:06,140 --> 00:09:09,110 form to the relationship. 102 00:09:10,400 --> 00:09:17,420 Now we will find the best values of beta zero, beta one and beta two so that the predicted values of price are as 103 00:09:17,420 --> 00:09:18,830 close to the actual values as possible. 104 00:09:20,710 --> 00:09:27,110 But in a non-parametric method, we do not assign a functional form to this relationship. 105 00:09:28,780 --> 00:09:34,100 The functional form itself is estimated by the model and it could be quite complex. 106 00:09:36,040 --> 00:09:43,780 For example, in this image, a three-dimensional surface is first drawn by the model, which is as close 107 00:09:43,780 --> 00:09:51,790 to the points as possible and is smooth. If we later identify the function at each of these points, it may be 108 00:09:51,790 --> 00:09:55,570 very complex, and it may be very difficult to interpret.
109 00:09:55,570 --> 00:10:00,580 It becomes hard to say what the relationship is between each input variable and the predicted output. 110 00:10:02,110 --> 00:10:07,990 So as you can imagine, it is going to be easy to interpret the result of a parametric model. 111 00:10:08,910 --> 00:10:11,930 However, it may or may not fit the data as well. 112 00:10:13,460 --> 00:10:18,110 But a non-parametric model will potentially be able to fit the data better. 113 00:10:18,980 --> 00:10:24,440 However, it will be very difficult to interpret the relationship among the variables. Also, 114 00:10:24,440 --> 00:10:28,040 non-parametric approaches usually require a larger amount of data to train. 115 00:10:29,050 --> 00:10:31,210 So once you know what your motivation is, 116 00:10:32,120 --> 00:10:39,020 you know the type of model to use. Now, there are two types of learning based on the data we have. 117 00:10:40,350 --> 00:10:48,180 When your data has a particular output variable and one or more input variables, the model will learn 118 00:10:48,180 --> 00:10:56,670 from this data what values of input give what output. We have seen examples of this type 119 00:10:56,670 --> 00:10:57,380 of data. 120 00:10:58,740 --> 00:11:01,500 And this type of learning is called supervised learning. 121 00:11:02,570 --> 00:11:09,050 However, if you do not have any output variable, but you just have a set of variables and your model 122 00:11:09,050 --> 00:11:14,300 is learning the relationship between these variables, it is called unsupervised learning. 123 00:11:15,610 --> 00:11:21,270 This basically helps us find the underlying structures and patterns in our data. 124 00:11:23,010 --> 00:11:29,940 Let us look at these two types of learning through an illustration. Suppose we have this data: we have 125 00:11:29,940 --> 00:11:33,210 four images and corresponding labels against each.
126 00:11:35,050 --> 00:11:41,260 Since these images will be our input data and these labels will be the output data, this is supervised 127 00:11:41,260 --> 00:11:41,680 learning. 128 00:11:44,790 --> 00:11:48,030 Now, let's say that the labels are categories and we 129 00:11:49,350 --> 00:11:50,940 give this data to a learning model. 130 00:11:52,920 --> 00:12:01,350 Now, if I present this image to our model and ask what category it belongs to, this becomes a classification 131 00:12:01,350 --> 00:12:01,860 problem. 132 00:12:02,430 --> 00:12:08,880 The model has to learn from previous data and classify the newly presented data into a particular category. 133 00:12:09,720 --> 00:12:14,310 So this is supervised learning and the problem is a classification problem. 134 00:12:16,430 --> 00:12:19,730 Another example of a classification setting is credit scoring. 135 00:12:21,120 --> 00:12:28,740 Suppose you want to classify a customer as high risk or low risk based on their income and savings. 136 00:12:30,020 --> 00:12:32,510 I can train my model on past data 137 00:12:33,350 --> 00:12:40,460 and maybe get a result like this, where I have a particular value of income and savings above which I 138 00:12:40,460 --> 00:12:47,110 can categorize the customer as a low-risk customer, or else the customer belongs to the high-risk category. 139 00:12:48,020 --> 00:12:53,540 So I am able to classify the customer into these two categories. On a graph, 140 00:12:53,550 --> 00:12:55,040 this will look something like this. 141 00:12:56,150 --> 00:13:04,220 The customers belonging to this area will be low-risk customers, since they will have income more than 142 00:13:04,220 --> 00:13:08,030 a particular value and savings more than some other particular value. 143 00:13:09,170 --> 00:13:13,690 The customers belonging to this region will be high risk.
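To make the credit scoring idea concrete, here is a minimal sketch in Python. The past data, the function names `classify` and `fit_cutoffs`, and the simple search over cut-off values are all assumptions made for illustration; they are not the method used in the course. The sketch tries each observed income and savings value as a cut-off and keeps the pair that misclassifies the fewest past customers.

```python
from itertools import product

# Hypothetical past data: (income, savings, risk label). Values are made up.
past_data = [
    (20, 5, "high"),
    (35, 8, "high"),
    (30, 40, "high"),
    (80, 10, "high"),
    (60, 30, "low"),
    (75, 45, "low"),
    (90, 35, "low"),
    (65, 50, "low"),
]


def classify(income, savings, income_cut, savings_cut):
    """Low risk only if both income and savings exceed their cut-offs."""
    if income > income_cut and savings > savings_cut:
        return "low"
    return "high"


def fit_cutoffs(data):
    """Try every observed income/savings pair as cut-offs.

    Returns (errors, income_cut, savings_cut) for the first pair
    with the fewest misclassified past customers.
    """
    income_values = sorted({row[0] for row in data})
    savings_values = sorted({row[1] for row in data})
    best = None
    for income_cut, savings_cut in product(income_values, savings_values):
        errors = sum(
            classify(income, savings, income_cut, savings_cut) != label
            for income, savings, label in data
        )
        if best is None or errors < best[0]:
            best = (errors, income_cut, savings_cut)
    return best


errors, income_cut, savings_cut = fit_cutoffs(past_data)

# Classify a new customer using the cut-offs learned from past data.
new_customer_risk = classify(70, 40, income_cut, savings_cut)
```

On a graph, the two learned cut-offs correspond to one vertical and one horizontal line, and the low-risk customers sit in the upper-right region.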
144 00:13:14,420 --> 00:13:21,140 Other applications of classification are pattern recognition, face recognition, character 145 00:13:21,140 --> 00:13:24,950 recognition, medical diagnosis, and web advertising. 146 00:13:26,540 --> 00:13:33,350 Now, instead of classifying, if we are estimating the weight of a product, which is a quantitative variable, 147 00:13:34,380 --> 00:13:36,330 this becomes an example of regression. 148 00:13:37,340 --> 00:13:39,980 Here also, the model is based on supervised learning. 149 00:13:43,540 --> 00:13:49,390 Another example of a regression problem would be estimating the price of a used car. 150 00:13:50,550 --> 00:13:59,010 Here we will identify the attributes of the car which may affect the final price of that used car, and 151 00:13:59,010 --> 00:14:06,240 if we assume that there is a linear relationship between the price of the car and the car attributes, we will 152 00:14:06,240 --> 00:14:10,890 have something like this: Y equal to some function of X plus some constant. 153 00:14:12,950 --> 00:14:19,220 We will use the past data to identify this relationship. Graphically, it will be represented like a straight 154 00:14:19,220 --> 00:14:25,190 line, and these are the different data points that we have from our past data. 155 00:14:26,630 --> 00:14:33,680 And using this line and putting in the car attributes, we can predict the price of some other used car. 156 00:14:34,370 --> 00:14:41,210 Other applications of regression analysis are weather forecasting, sales forecasting, advertising 157 00:14:41,210 --> 00:14:43,370 budget allocation, and product pricing. 158 00:14:46,360 --> 00:14:49,510 There are several algorithms under supervised learning. 159 00:14:51,340 --> 00:14:56,890 The most popular are linear regression, logistic regression, artificial neural networks, 160 00:14:57,960 --> 00:14:59,930 k-nearest neighbors, and so on.
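The used car example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is made up, a single attribute (the car's age) stands in for all the car attributes, and the names `fit_line` and `predict_price` are hypothetical. The sketch fits a straight line, price = intercept + slope × age, by ordinary least squares, which is one common way to estimate such a line.

```python
# Hypothetical past data: car age in years and sale price in thousands.
ages = [1, 2, 3, 4, 5]
prices = [9.0, 8.1, 6.9, 6.0, 5.0]


def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (intercept, slope)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(ages, prices)


def predict_price(age):
    """Read the predicted price off the fitted straight line."""
    return intercept + slope * age


# Predict the price of some other used car that is 6 years old.
predicted = predict_price(6)
```

Because the straight-line form is assumed up front and only two numbers are estimated, this is a parametric model, and the slope can be read directly as the change in price per extra year of age.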
161 00:15:03,590 --> 00:15:11,360 In unsupervised learning, we do not have any output variable. We will just provide these five images 162 00:15:11,360 --> 00:15:19,240 and ask the model to group them by similar properties. Maybe based on the type, 163 00:15:19,250 --> 00:15:20,780 it will make two categories 164 00:15:22,100 --> 00:15:28,700 of apples and bananas. Maybe based on the color, it will make three categories of red, green and 165 00:15:28,700 --> 00:15:33,200 yellow, or it may find some other important characteristic which I am missing. 166 00:15:35,430 --> 00:15:39,780 Here is a list of some popular unsupervised learning algorithms. 167 00:15:41,130 --> 00:15:48,220 In clustering, one of the popular ones is k-means. Another popular unsupervised learning algorithm is 168 00:15:48,580 --> 00:15:53,800 factor analysis or principal component analysis, which is used in dimension reduction. 169 00:15:55,070 --> 00:16:01,180 And there are some other algorithms also. These algorithms will not be part of this course.
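Although clustering is outside this course, a small sketch may help build intuition for what grouping without an output variable looks like. The Python below is a minimal k-means under stated assumptions: the six two-dimensional points are made up, the function name `k_means` is hypothetical, and the starting centroids are passed in by hand rather than chosen randomly as is common in practice.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points and moving centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so we have converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # Keep the old centroid if a cluster is empty.
                centroids[j] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return labels, centroids


# Hypothetical data: two made-up features per item, no labels given.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
]

# Assumption: start from the first and last points as centroids.
labels, centroids = k_means(points, [points[0], points[-1]])
```

The model is never told which group each point belongs to; the two groups emerge only from how close the points are to each other.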