1 00:00:00,100 --> 00:00:00,966 Hello, my friends. 2 00:00:00,966 --> 00:00:03,066 All right, let's see if you got this right. 3 00:00:03,066 --> 00:00:04,366 So what did we have to do? 4 00:00:04,366 --> 00:00:08,300 Well, the first natural step was to go to our data preprocessing 5 00:00:08,300 --> 00:00:09,966 template to grab all the 6 00:00:09,966 --> 00:00:11,766 essential tools, which are indeed, 7 00:00:11,766 --> 00:00:14,900 well, the first steps of the data preprocessing phase. 8 00:00:14,900 --> 00:00:15,900 So let's do this. 9 00:00:15,900 --> 00:00:18,900 That's the first thing you were supposed to do. 10 00:00:19,133 --> 00:00:21,500 So I'm copying this cell, 11 00:00:21,500 --> 00:00:24,400 and then I'm pasting it right here 12 00:00:24,400 --> 00:00:24,900 in a new 13 00:00:24,900 --> 00:00:27,633 code cell for importing the libraries. 14 00:00:27,633 --> 00:00:29,600 There we go. First step done. 15 00:00:29,600 --> 00:00:31,800 Now, to import the data set. 16 00:00:31,800 --> 00:00:34,466 Well, same. Let's go to our data preprocessing 17 00:00:34,466 --> 00:00:36,766 template, which indeed is meant 18 00:00:36,766 --> 00:00:38,566 for us to be efficient. 19 00:00:38,566 --> 00:00:39,800 So I'm 20 00:00:39,800 --> 00:00:41,033 copying this, and then 21 00:00:41,033 --> 00:00:43,266 we'll see what we have to replace. 22 00:00:43,266 --> 00:00:45,433 But once again, it is supposed to help you so 23 00:00:45,433 --> 00:00:47,266 that you don't have to do much. 24 00:00:47,266 --> 00:00:49,233 All right. So I just pasted it. 25 00:00:49,233 --> 00:00:50,433 And let's do, you know, 26 00:00:50,433 --> 00:00:54,000 just this for the last step of the data preprocessing phase. 27 00:00:54,000 --> 00:00:55,366 You know, in any situation, 28 00:00:55,366 --> 00:00:57,000 regardless of whether or not 29 00:00:57,000 --> 00:00:58,566 feature scaling is needed. 30 00:00:58,566 --> 00:00:58,933 All right.
31 00:00:58,933 --> 00:01:02,400 So let's take that cell here, and 32 00:01:02,766 --> 00:01:06,500 let's copy that right here 33 00:01:06,500 --> 00:01:08,533 in a new code cell. 34 00:01:08,533 --> 00:01:10,633 All right. So those are the first essential steps, 35 00:01:10,633 --> 00:01:14,200 you know, which happen most of the time when building a machine learning model. 36 00:01:14,300 --> 00:01:15,733 All right. So now, next step. 37 00:01:15,733 --> 00:01:17,700 What do we have to change inside 38 00:01:17,700 --> 00:01:19,066 this data preprocessing phase? 39 00:01:19,066 --> 00:01:20,866 Well, let's take it step by step. 40 00:01:20,866 --> 00:01:21,600 First step. 41 00:01:21,600 --> 00:01:23,833 Of course, we don't have anything to change. 42 00:01:23,833 --> 00:01:24,566 Now let's see. 43 00:01:24,566 --> 00:01:26,533 Second step. Well, there you go. 44 00:01:26,533 --> 00:01:28,533 Of course, here we have to change the name of the 45 00:01:28,533 --> 00:01:29,400 data set. 46 00:01:29,400 --> 00:01:33,033 It is not Data.csv, but it is now 47 00:01:33,033 --> 00:01:36,200 Social_Network_Ads.csv. 48 00:01:36,200 --> 00:01:36,966 So let's do this. 49 00:01:36,966 --> 00:01:38,500 Let's just replace this. 50 00:01:38,500 --> 00:01:40,900 That was what you had to do next. 51 00:01:40,900 --> 00:01:43,833 Social_Network_ 52 00:01:43,833 --> 00:01:45,900 Ads.csv. 53 00:01:45,900 --> 00:01:47,533 And now, next question. 54 00:01:47,533 --> 00:01:49,666 Do we have to change anything here? 55 00:01:49,666 --> 00:01:52,566 Well, of course, the answer is no. 56 00:01:52,566 --> 00:01:54,000 No, because this data 57 00:01:54,000 --> 00:01:56,000 preprocessing template was meant 58 00:01:56,000 --> 00:02:00,233 for you to not have to change anything in most situations, as 59 00:02:00,233 --> 00:02:00,966 long as, of course, 60 00:02:00,966 --> 00:02:05,200 you make sure that in your data set you have the features in the
61 00:02:05,200 --> 00:02:07,866 first columns, you know, these are the two features here, 62 00:02:07,866 --> 00:02:10,466 and the dependent variable in the last column. 63 00:02:10,466 --> 00:02:11,400 And since 64 00:02:11,400 --> 00:02:13,766 this automatically selects all 65 00:02:13,766 --> 00:02:14,966 the columns except 66 00:02:14,966 --> 00:02:17,666 the last one, and this automatically selects the 67 00:02:17,666 --> 00:02:20,266 last column, and this, of course, regardless of 68 00:02:20,266 --> 00:02:22,933 the number of features you have in your data set, 69 00:02:22,933 --> 00:02:24,033 well, there you go. 70 00:02:24,033 --> 00:02:26,366 This will simply select 71 00:02:26,366 --> 00:02:27,300 the age and 72 00:02:27,300 --> 00:02:29,866 salary in the matrix of features. 73 00:02:29,866 --> 00:02:32,966 And this line will simply select, 74 00:02:33,300 --> 00:02:33,833 well, 75 00:02:33,833 --> 00:02:36,700 the dependent variable in a nice 1D vector. 76 00:02:36,700 --> 00:02:38,100 All right, so there you go. 77 00:02:38,100 --> 00:02:41,100 That was the reason why I said it was so easy. 78 00:02:41,200 --> 00:02:44,266 You only had to change the name of the dataset here: 79 00:02:44,266 --> 00:02:46,033 Social_Network_ 80 00:02:46,033 --> 00:02:47,733 Ads.csv. Okay. 81 00:02:47,733 --> 00:02:49,900 And actually, speaking of the dataset, now that 82 00:02:49,900 --> 00:02:51,566 we have the code to 83 00:02:51,566 --> 00:02:53,700 import it, well, let's not forget to 84 00:02:53,700 --> 00:02:56,866 upload it in our notebook, because in the same tutorial 85 00:02:56,866 --> 00:02:59,866 we will actually run these four cells. 86 00:02:59,933 --> 00:03:01,900 But before, let's make sure to 87 00:03:01,900 --> 00:03:03,900 finish implementing them. So here 88 00:03:03,900 --> 00:03:06,533 we are connecting to a runtime to enable 89 00:03:06,533 --> 00:03:08,866 file browsing. And in a second, there you go.
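As a rough illustration of the column selection described above (every column except the last as the matrix of features, the last column as the dependent variable), here is a minimal sketch. It assumes the template relies on pandas and `iloc`, which the transcript does not show on screen. The three inline rows and the column names are made up so the snippet runs without the real CSV file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file: three invented rows,
# with assumed column names (two features first, dependent variable last).
csv_text = """Age,EstimatedSalary,Purchased
19,19000,0
35,20000,0
47,25000,1
"""

dataset = pd.read_csv(io.StringIO(csv_text))

# All columns except the last one: the matrix of features.
X = dataset.iloc[:, :-1].values
# Only the last column: the dependent variable as a 1D vector.
y = dataset.iloc[:, -1].values

print(X.shape, y.shape)
```

Because both selections are expressed relative to the last column, the same two lines keep working if more feature columns are added, as long as the dependent variable stays last.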
90 00:03:08,866 --> 00:03:11,033 We should have the upload button. 91 00:03:11,033 --> 00:03:13,800 So now we're going to click it to find our 92 00:03:13,800 --> 00:03:16,233 whole machine learning dataset folder. 93 00:03:16,233 --> 00:03:17,800 And we're going to go inside. 94 00:03:17,800 --> 00:03:20,333 Then we're going to go to Part 3, Classification. 95 00:03:20,333 --> 00:03:22,466 Then Section 14, Logistic 96 00:03:22,466 --> 00:03:24,266 Regression, Python. 97 00:03:24,266 --> 00:03:27,333 And we're going to select our Social 98 00:03:27,333 --> 00:03:29,033 Network Ads data set. 99 00:03:29,033 --> 00:03:32,433 It is exactly the same with, of course, the age here, the estimated 100 00:03:32,433 --> 00:03:34,833 salary, and the purchased columns. There we go. 101 00:03:34,833 --> 00:03:36,300 Let's click Open. 102 00:03:36,300 --> 00:03:39,633 It is going to upload it in our notebook, 103 00:03:39,900 --> 00:03:40,900 as we can see here. 104 00:03:40,900 --> 00:03:43,100 There you go. Perfect. We have it. 105 00:03:43,100 --> 00:03:45,433 So now we're ready to import it. But first, 106 00:03:45,433 --> 00:03:47,166 I want to finish the implementation 107 00:03:47,166 --> 00:03:48,800 of this data preprocessing phase. 108 00:03:48,800 --> 00:03:50,933 So let's move on here to the next step: 109 00:03:50,933 --> 00:03:53,933 splitting the data set into the training set and test set. 110 00:03:54,000 --> 00:03:55,766 And here, do we have anything to change? 111 00:03:55,766 --> 00:03:57,733 Well, there you go again. Of course, 112 00:03:57,733 --> 00:03:58,800 no. It is not 113 00:03:58,800 --> 00:04:00,000 compulsory to 114 00:04:00,000 --> 00:04:01,800 change anything here. This will 115 00:04:01,800 --> 00:04:03,466 automatically create your 116 00:04:03,466 --> 00:04:04,666 training set and test 117 00:04:04,666 --> 00:04:06,333 set out of the whole 118 00:04:06,333 --> 00:04:06,966 data set.
119 00:04:06,966 --> 00:04:08,833 You know, the whole data set composed of the 120 00:04:08,833 --> 00:04:10,233 matrix of features X 121 00:04:10,233 --> 00:04:12,133 and the dependent variable vector y. 122 00:04:12,133 --> 00:04:14,500 We're going to do some prints, right, to show you everything, 123 00:04:14,500 --> 00:04:16,433 because then, you know, the next 124 00:04:16,433 --> 00:04:19,133 step is to apply feature scaling. 125 00:04:19,133 --> 00:04:19,633 And before 126 00:04:19,633 --> 00:04:22,566 we get to this, well, I just would like to change something here. 127 00:04:22,566 --> 00:04:24,400 But again, that was absolutely 128 00:04:24,400 --> 00:04:25,233 not necessary. 129 00:04:25,233 --> 00:04:27,266 You could totally leave it like that. 130 00:04:27,266 --> 00:04:28,700 But I just want to change the 131 00:04:28,700 --> 00:04:30,133 test size because, 132 00:04:30,133 --> 00:04:30,533 since 133 00:04:30,533 --> 00:04:34,066 actually we have 400 observations 134 00:04:34,066 --> 00:04:34,800 in total, you know, we 135 00:04:34,800 --> 00:04:37,800 have 400 customers in this data set, 136 00:04:38,000 --> 00:04:40,833 well, you know, if we choose a test size of 137 00:04:40,833 --> 00:04:42,933 0.25, it will be nice because 138 00:04:42,933 --> 00:04:44,533 we will actually have 300 139 00:04:44,533 --> 00:04:48,766 customers in the training set and 100 customers in the test 140 00:04:48,766 --> 00:04:51,133 set, right? We'll have nice round numbers. 141 00:04:51,133 --> 00:04:53,166 So just exceptionally here: I 142 00:04:53,166 --> 00:04:55,300 usually recommend, indeed, 0.2 143 00:04:55,300 --> 00:04:57,733 as a test size, but 0.25 is 144 00:04:57,733 --> 00:04:59,133 actually totally fine. 145 00:04:59,133 --> 00:05:03,433 We will have a nice training set of 300 customers and one nice test 146 00:05:03,433 --> 00:05:05,366 set of 100 customers. 147 00:05:05,366 --> 00:05:06,200 So there
You go. 148 00:05:06,200 --> 00:05:08,466 Now we can move on to the next step: 149 00:05:08,466 --> 00:05:11,366 feature scaling. Okay. 150 00:05:11,366 --> 00:05:11,766 All right. 151 00:05:11,766 --> 00:05:12,466 So how are 152 00:05:12,466 --> 00:05:15,000 we going to implement feature scaling here? 153 00:05:15,000 --> 00:05:16,866 Well, there is nothing simpler, 154 00:05:16,866 --> 00:05:20,333 thanks to our data preprocessing toolkit, which I've put here. 155 00:05:20,500 --> 00:05:21,466 And if you haven't, 156 00:05:21,466 --> 00:05:22,400 well, grab it 157 00:05:22,400 --> 00:05:24,733 in Part 1, Data Preprocessing. 158 00:05:24,733 --> 00:05:27,566 And now, in our toolkit, we're going to scroll down 159 00:05:27,566 --> 00:05:31,033 to the bottom to find that feature scaling tool, which 160 00:05:31,033 --> 00:05:34,033 is actually the last one, right here. 161 00:05:34,033 --> 00:05:35,733 It is that feature scaling. 162 00:05:35,733 --> 00:05:38,266 So here we go. Let's grab it. 163 00:05:38,266 --> 00:05:40,066 You know, you just have to grab the first cell 164 00:05:40,066 --> 00:05:41,366 implementing the tool. 165 00:05:41,366 --> 00:05:43,500 Now let's go back to our logistic regression 166 00:05:43,500 --> 00:05:44,700 implementation. 167 00:05:44,700 --> 00:05:47,766 Let's create a new code cell here for feature scaling, 168 00:05:47,933 --> 00:05:50,866 and let's paste it right inside.
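To make the arithmetic of the split concrete, here is a minimal sketch of the last two steps: a test size of 0.25 on 400 observations, followed by feature scaling. Several things here are assumptions rather than what is on screen. The data is random, not the real customers. `random_state=0` is chosen only for reproducibility. The transcript never names the scaling tool, so scikit-learn's `StandardScaler` stands in for the toolkit's.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 400-customer data set (random values, not real data).
rng = np.random.default_rng(0)
age = rng.integers(18, 60, size=400)
salary = rng.integers(15000, 150000, size=400)
X = np.column_stack([age, salary]).astype(float)  # matrix of features
y = rng.integers(0, 2, size=400)                  # dependent variable vector

# test_size=0.25 on 400 rows gives 300 training rows and 100 test rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the scaler on the training set only, then reuse it on the test set,
# so no information from the test set leaks into the scaling parameters.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

print(X_train.shape, X_test.shape)
```

With the recommended default of 0.2 instead, the same 400 rows would split into 320 for training and 80 for testing.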