1 00:00:00,166 --> 00:00:01,000 Hello my friends. 2 00:00:01,000 --> 00:00:03,400 Welcome to this new practical activity 3 00:00:03,400 --> 00:00:08,033 and the first practical activity actually of part three, classification. 4 00:00:08,266 --> 00:00:11,266 I'm super excited, as you might hear by the sound of my voice, 5 00:00:11,266 --> 00:00:13,633 because this is one of my favorite parts. 6 00:00:13,633 --> 00:00:15,000 And actually you will see that 7 00:00:15,000 --> 00:00:19,466 the case study we will do together is pretty fun, so I can't wait to start. 8 00:00:19,466 --> 00:00:20,400 Welcome back. 9 00:00:20,400 --> 00:00:21,600 And so now here we go. 10 00:00:21,600 --> 00:00:25,400 We are about to enter a new branch of machine learning where 11 00:00:25,433 --> 00:00:29,166 this time we won't predict a continuous numerical value 12 00:00:29,166 --> 00:00:30,933 like in part two, regression. 13 00:00:30,933 --> 00:00:33,666 This time we will predict a category. 14 00:00:33,666 --> 00:00:38,866 You know, a class, like for example, you know, a binary variable, 0 or 1. 15 00:00:38,866 --> 00:00:41,933 The classic example is actually to do classification to predict 16 00:00:42,033 --> 00:00:45,466 a tumor, you know, whether a tumor is benign or malignant. 17 00:00:45,466 --> 00:00:48,666 And this is actually the case that we will do at the end, you know, 18 00:00:48,666 --> 00:00:52,700 when I teach you how to select the best classification model, 19 00:00:52,866 --> 00:00:54,333 but that will come at the end. 20 00:00:54,333 --> 00:00:56,400 First, we will do a fun case study 21 00:00:56,400 --> 00:01:00,233 with which we will learn how to build each of our classification models. 22 00:01:00,466 --> 00:01:01,800 And there you go my friends. 23 00:01:01,800 --> 00:01:06,000 It is in this tutorial that I will explain the problem of this case study.
24 00:01:06,666 --> 00:01:09,866 All right, so before we start, let's just make sure once again 25 00:01:09,866 --> 00:01:12,100 everyone here is on the same page. 26 00:01:12,100 --> 00:01:16,100 So right before this tutorial I gave you the link to this folder 27 00:01:16,100 --> 00:01:19,133 containing all the codes and data sets in the ten parts. 28 00:01:19,266 --> 00:01:21,533 So make sure to connect to that link. 29 00:01:21,533 --> 00:01:24,300 And now we should all be on the same page. 30 00:01:24,300 --> 00:01:25,200 And there you go. 31 00:01:25,200 --> 00:01:28,200 We're going to go into part three, classification, 32 00:01:28,300 --> 00:01:32,966 to tackle our first model, logistic regression. 33 00:01:32,966 --> 00:01:37,533 I hope you enjoyed the intuition lectures and mostly that you are now ready 34 00:01:37,666 --> 00:01:40,966 to put your well learned theory into practice. 35 00:01:40,966 --> 00:01:45,166 And we're going to put it into practice first with Python, with which 36 00:01:45,166 --> 00:01:49,300 we're going to re-implement from scratch and step by step 37 00:01:49,300 --> 00:01:52,400 the whole logistic regression implementation. 38 00:01:52,800 --> 00:01:55,800 So as you can see in this Python folder, you have two files. 39 00:01:55,900 --> 00:02:00,633 First, well, the logistic regression implementation in ipynb format, 40 00:02:00,633 --> 00:02:04,600 which you can open with either Google Colaboratory or Jupyter Notebook, 41 00:02:04,966 --> 00:02:10,166 and you have the data set, which is called social network ads. 42 00:02:10,366 --> 00:02:11,533 So let's open it. 43 00:02:11,533 --> 00:02:16,100 And now let me explain what the problem is about. 44 00:02:16,800 --> 00:02:17,333 All right. 45 00:02:17,333 --> 00:02:20,500 So let's imagine our favorite car company. 46 00:02:20,500 --> 00:02:24,266 I won't mention a name here because I don't want to do any kind of advertising.
47 00:02:24,433 --> 00:02:27,633 But let's imagine your favorite car company. 48 00:02:27,800 --> 00:02:31,600 And let's imagine that you are a data scientist for that company. 49 00:02:32,133 --> 00:02:36,066 And your mission, should you choose to accept it, is to predict 50 00:02:36,166 --> 00:02:39,700 which of your previous customers will buy 51 00:02:39,933 --> 00:02:42,933 a brand new, beautiful SUV 52 00:02:42,933 --> 00:02:46,266 just created by your favorite car company. 53 00:02:46,700 --> 00:02:49,600 All right, so your favorite car company has just released 54 00:02:49,600 --> 00:02:53,000 this brand new, beautiful, irresistible SUV. 55 00:02:53,333 --> 00:02:56,766 And the general manager of this car company has asked you, 56 00:02:56,800 --> 00:02:59,733 you know, the most talented data scientist of the company, 57 00:02:59,733 --> 00:03:03,600 to predict which customers will buy that new SUV. 58 00:03:03,600 --> 00:03:06,133 You know, with the highest conversion rate. 59 00:03:06,133 --> 00:03:07,800 And to help you, because, you know, 60 00:03:07,800 --> 00:03:11,566 this general manager has some minimum data science skills and knows 61 00:03:11,733 --> 00:03:15,700 and understands that in order to predict this, you need data, right? 62 00:03:15,700 --> 00:03:19,500 You need data on which to train your classification model 63 00:03:19,666 --> 00:03:22,566 to predict what needs to be predicted, meaning 64 00:03:22,566 --> 00:03:25,566 which customers will buy that brand new SUV. 65 00:03:25,966 --> 00:03:26,733 And so there you go. 66 00:03:26,733 --> 00:03:30,833 That's exactly the data that your general manager gave you. 67 00:03:30,966 --> 00:03:32,266 And in this data set, 68 00:03:32,266 --> 00:03:36,600 well, each row corresponds to a different customer. 69 00:03:36,933 --> 00:03:39,300 And for each of these customers, there you go.
70 00:03:39,300 --> 00:03:42,200 I'm about to reveal the features and the dependent variable 71 00:03:42,200 --> 00:03:44,066 for each of these customers. 72 00:03:44,066 --> 00:03:47,066 Well, this general manager collected the age 73 00:03:47,333 --> 00:03:49,866 and also collected the estimated salary. 74 00:03:49,866 --> 00:03:53,566 Because, you know, when a customer buys a new car with some kind of credit 75 00:03:53,566 --> 00:03:57,266 or whatever, well, they have to provide their estimated salary in the form. 76 00:03:57,400 --> 00:03:59,800 So that's how the company got the estimated salary. 77 00:03:59,800 --> 00:04:02,533 And finally, there is your dependent variable, 78 00:04:02,533 --> 00:04:06,500 of course, the purchased variable, telling whether or not 79 00:04:06,666 --> 00:04:09,666 these customers have previously bought 80 00:04:09,700 --> 00:04:13,200 some older SUVs of this car company. Right. 81 00:04:13,200 --> 00:04:18,200 So this car company has basically many makes and models of SUVs, and all the 82 00:04:18,200 --> 00:04:18,733 zeros 83 00:04:18,733 --> 00:04:23,066 and the ones that you see here for each customer say whether or not 84 00:04:23,066 --> 00:04:26,700 these customers have bought one of these previous SUVs, 85 00:04:26,933 --> 00:04:30,266 so that your model will be trained on this data set. 86 00:04:30,500 --> 00:04:34,200 And for new customers, you know, having a different age 87 00:04:34,200 --> 00:04:36,066 and a different estimated salary, 88 00:04:36,066 --> 00:04:40,633 well, we will predict whether, yes or no, they will buy that new SUV. 89 00:04:41,166 --> 00:04:41,900 Okay. 90 00:04:41,900 --> 00:04:44,900 So of course, in this dependent variable, purchased, 91 00:04:45,000 --> 00:04:48,366 zero means that the customer didn't buy any previous SUV. 92 00:04:48,600 --> 00:04:52,966 And one means that the customer bought some previous SUVs. All right.
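To make the layout of the data set just described a bit more concrete, here is a minimal sketch in Python of how rows made of an age, an estimated salary, and a purchased flag could be split into features and a dependent variable. The column names (Age, EstimatedSalary, Purchased), the helper load_customers, and every value in the sample rows are assumptions made up for illustration; they are not taken from the actual file.

```python
import csv
import io

# Made-up rows for illustration only; these are NOT values from the real data set.
# The column names are assumptions based on the features described in the tutorial.
SAMPLE_CSV = """Age,EstimatedSalary,Purchased
22,30000,0
31,45000,0
48,90000,1
52,120000,1
"""


def load_customers(text):
    """Parse CSV text into a feature matrix X and a dependent variable vector y."""
    reader = csv.DictReader(io.StringIO(text))
    X, y = [], []
    for row in reader:
        # Features: the customer's age and estimated salary.
        X.append([int(row["Age"]), int(row["EstimatedSalary"])])
        # Dependent variable: 1 if the customer bought a previous SUV, else 0.
        y.append(int(row["Purchased"]))
    return X, y


X, y = load_customers(SAMPLE_CSV)
print(X)
print(y)
```

Each inner list in X is one customer, and the entry at the same position in y is the 0 or 1 that a classification model would be trained to predict.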
93 00:04:53,100 --> 00:04:57,466 And therefore all the future predictions that will be equal to one 94 00:04:57,600 --> 00:05:02,600 will probably mean that the customer has a high chance of buying the new SUV, 95 00:05:02,800 --> 00:05:04,966 if, of course, we offer a great deal. 96 00:05:04,966 --> 00:05:09,600 And finally, once we predict the customers that are going to buy the SUV, 97 00:05:09,733 --> 00:05:13,833 well, the final step of the strategy will be for the advertising team 98 00:05:13,833 --> 00:05:17,700 to post ads of this brand new SUV on social networks, 99 00:05:17,933 --> 00:05:22,166 and these ads will be targeted to the customers for whom we predict one, 100 00:05:22,166 --> 00:05:25,433 you know, where we predict that they're going to buy that new SUV. 101 00:05:25,600 --> 00:05:25,933 All right. 102 00:05:25,933 --> 00:05:29,866 So you see the idea: the predictive model will target your customers. 103 00:05:30,033 --> 00:05:33,800 And then the advertising team will use the results of this predictive model 104 00:05:33,800 --> 00:05:37,333 to optimize the targeting of future customers. 105 00:05:37,333 --> 00:05:38,366 And that is why, 106 00:05:38,366 --> 00:05:42,733 you know, the data set is called Social Network Ads dot CSV.
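The targeting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer records, their ids, and the list of predictions are hypothetical placeholders standing in for the output of a trained classification model, and customers_to_target is a helper name invented for this sketch.

```python
# Hypothetical new customers; ids and values are made up for illustration.
new_customers = [
    {"id": "c1", "age": 46, "estimated_salary": 110000},
    {"id": "c2", "age": 23, "estimated_salary": 28000},
    {"id": "c3", "age": 51, "estimated_salary": 95000},
]

# Hypothetical model output, one prediction per customer:
# 1 = predicted to buy the new SUV, 0 = predicted not to buy.
predictions = [1, 0, 1]


def customers_to_target(customers, predicted_labels):
    """Return only the customers whose predicted label is 1."""
    return [c for c, label in zip(customers, predicted_labels) if label == 1]


targeted = customers_to_target(new_customers, predictions)
for customer in targeted:
    print(customer["id"])
```

The advertising team would then show the social network ads only to the customers in targeted, rather than to everyone.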