1 00:00:00,300 --> 00:00:01,700 So here we are in R. 2 00:00:01,700 --> 00:00:04,366 And now that you understand feature scaling, 3 00:00:04,366 --> 00:00:07,166 let's apply it to the training set 4 00:00:07,166 --> 00:00:12,300 and the test set. Here we will just write two lines of code. 5 00:00:12,300 --> 00:00:15,300 So the first line is for the training set. 6 00:00:15,766 --> 00:00:17,833 So the training set already exists, right? 7 00:00:17,833 --> 00:00:19,800 It's this one. The training set. 8 00:00:19,800 --> 00:00:21,633 As you can see, it's not scaled. 9 00:00:21,633 --> 00:00:23,200 These are, 10 00:00:23,200 --> 00:00:25,733 it contains the raw values. 11 00:00:25,733 --> 00:00:29,066 So training set equals, and simply, 12 00:00:29,066 --> 00:00:32,066 now we're going to write scale, 13 00:00:35,400 --> 00:00:37,733 and then training set. 14 00:00:37,733 --> 00:00:39,333 That's all. That scales 15 00:00:39,333 --> 00:00:41,600 your training set. 16 00:00:41,600 --> 00:00:42,766 And same for the test set. 17 00:00:42,766 --> 00:00:45,766 We will copy this line, 18 00:00:46,000 --> 00:00:47,233 paste it here, 19 00:00:47,233 --> 00:00:50,233 and we will change training into test here, 20 00:00:50,900 --> 00:00:53,900 and here as well. 21 00:00:54,866 --> 00:00:56,533 Okay, so here I just wrote this. 22 00:00:56,533 --> 00:00:58,766 There is something important to understand. 23 00:00:58,766 --> 00:01:00,000 This will be the feature 24 00:01:00,000 --> 00:01:03,000 scaling block of code that we will be using in our template. 25 00:01:03,166 --> 00:01:07,933 However, let's see what happens when I select this code and execute it. 26 00:01:08,800 --> 00:01:11,000 So let's select this, execute. 27 00:01:12,000 --> 00:01:12,833 Okay. 28 00:01:12,833 --> 00:01:15,833 As you can see here, I obtained two errors: "Error in 29 00:01:15,833 --> 00:01:19,666 colMeans: x must be numeric", for the training set and the test set. 
30 00:01:20,566 --> 00:01:22,800 Can you guess what the problem is? 31 00:01:22,800 --> 00:01:24,000 The problem is that, 32 00:01:24,000 --> 00:01:26,400 okay, well, it tells us what the problem is. 33 00:01:26,400 --> 00:01:28,766 It tells us that x must be numeric. 34 00:01:28,766 --> 00:01:30,700 But what does that mean? 35 00:01:30,700 --> 00:01:32,066 Well, first, what is this x? 36 00:01:32,066 --> 00:01:34,833 x is, for this line of code, the training set, 37 00:01:34,833 --> 00:01:36,700 and for this line of code, the test set. 38 00:01:36,700 --> 00:01:38,433 So let's forget the test set for a second 39 00:01:38,433 --> 00:01:40,266 and let's focus on the training set. 40 00:01:40,266 --> 00:01:41,700 So x is the training set. 41 00:01:41,700 --> 00:01:44,700 And it says that the training set must be numeric. 42 00:01:44,700 --> 00:01:47,533 So let's look at our training set. Okay. 43 00:01:47,533 --> 00:01:49,300 Well, the training set looks numeric. 44 00:01:49,300 --> 00:01:53,000 Right? Here we have numeric values, numeric values, numeric values, numeric values. 45 00:01:53,533 --> 00:01:57,433 But no, actually there are two columns that don't have numeric values. 46 00:01:58,100 --> 00:02:01,633 It's this one, Country, and this one, Purchased. 47 00:02:02,200 --> 00:02:03,933 And do you remember why? 48 00:02:03,933 --> 00:02:06,966 Well, it's because before we had the country written in text 49 00:02:07,233 --> 00:02:10,233 and the Purchased column written in text with yes or no, 50 00:02:10,233 --> 00:02:14,566 and we changed that by encoding the categories as factors. 51 00:02:15,000 --> 00:02:16,600 That's what we did here. 52 00:02:16,600 --> 00:02:21,000 As you remember, dataset$Country equals factor 53 00:02:21,300 --> 00:02:24,933 of the different levels and labels. And a factor 54 00:02:24,933 --> 00:02:27,866 in R is not a numeric value. 55 00:02:27,866 --> 00:02:31,300 And when you apply scale here, x must be numeric. 
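For reference, the error described here can be reproduced with a small sketch. The toy data frame below is hypothetical (the values are made up and only mimic the course's columns), and the `failed` flag is a helper introduced here for illustration.

```r
# Hypothetical toy data standing in for the course's training set
training_set <- data.frame(
  Country   = factor(c("France", "Spain", "Germany"),
                     levels = c("France", "Spain", "Germany"),
                     labels = c(1, 2, 3)),
  Age       = c(44, 27, 30),
  Salary    = c(72000, 48000, 54000),
  Purchased = factor(c("No", "Yes", "No"),
                     levels = c("No", "Yes"),
                     labels = c(0, 1))
)

# A factor is not numeric, even when its labels look like numbers
country_is_numeric <- is.numeric(training_set$Country)
age_is_numeric     <- is.numeric(training_set$Age)
print(country_is_numeric)
print(age_is_numeric)

# Calling scale() on the whole data frame fails because of the factor columns;
# tryCatch turns the error into a flag so the script keeps running
failed <- tryCatch({
  scale(training_set)
  FALSE
}, error = function(e) {
  message("scale() failed: ", conditionMessage(e))
  TRUE
})
print(failed)
```

The numeric columns on their own (Age and Salary) can be passed to `scale` without this error.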
56 00:02:31,300 --> 00:02:36,300 That means that all the columns in x, that is, the training set, must be numeric. 57 00:02:37,633 --> 00:02:39,133 So this time we are going to 58 00:02:39,133 --> 00:02:42,733 exclude the categories from the feature scaling. 59 00:02:42,966 --> 00:02:45,966 We're not going to apply feature scaling on those columns. 60 00:02:46,133 --> 00:02:47,366 So that's very simple. 61 00:02:47,366 --> 00:02:51,800 All we need to do is to take the columns we're interested in, 62 00:02:52,500 --> 00:02:56,800 which are, well, the indexes of the columns we want to scale. 63 00:02:57,200 --> 00:02:59,066 So indexes in R start at one. 64 00:02:59,066 --> 00:03:01,566 So that's 1, 2, 3. 65 00:03:01,566 --> 00:03:04,233 So that's the second and third index 66 00:03:04,233 --> 00:03:07,633 we want to take to scale the Age and the Salary columns. 67 00:03:08,066 --> 00:03:10,966 So 2 and 3. Let's input it. 68 00:03:10,966 --> 00:03:14,200 We need to specify here 2 colon 3 (2:3), 69 00:03:14,466 --> 00:03:16,266 and that gets what we want. 70 00:03:16,266 --> 00:03:19,266 So now I'm going to copy this. Copy, 71 00:03:20,100 --> 00:03:22,833 paste it here, 72 00:03:22,833 --> 00:03:25,366 here, 73 00:03:25,366 --> 00:03:26,400 and here. 74 00:03:26,400 --> 00:03:27,900 All right. And now it's ready. 75 00:03:27,900 --> 00:03:30,400 Let's have a look at the training set and the test set. 76 00:03:30,400 --> 00:03:33,000 Not scaled, not scaled. 77 00:03:33,000 --> 00:03:34,466 Let's go back to our code. 78 00:03:34,466 --> 00:03:36,066 Let's select this. 79 00:03:36,066 --> 00:03:39,066 And now we shouldn't get an error. 80 00:03:39,866 --> 00:03:41,466 Command or Control plus Enter to execute. 81 00:03:42,600 --> 00:03:43,366 Here we go. 82 00:03:43,366 --> 00:03:44,933 It executed properly. 83 00:03:44,933 --> 00:03:47,800 And now let's look at the training set. 84 00:03:47,800 --> 00:03:49,800 All scaled properly. Perfect. 
85 00:03:49,800 --> 00:03:51,866 And the test set, 86 00:03:51,866 --> 00:03:54,033 all scaled properly. Perfect. 87 00:03:54,033 --> 00:03:57,300 And now our data is ready to offer good precision 88 00:03:57,300 --> 00:04:00,833 and good accuracy, and fast running of the machine learning models. 89 00:04:00,833 --> 00:04:05,100 And by that I mean that the machine learning models will converge rapidly. 90 00:04:06,866 --> 00:04:08,600 Okay, so that's it for feature scaling. 91 00:04:08,600 --> 00:04:10,733 Now you know how to apply feature 92 00:04:10,733 --> 00:04:13,833 scaling to your data in Python and in R. Congratulations. 93 00:04:13,833 --> 00:04:17,066 And mostly congratulations because we did 94 00:04:17,066 --> 00:04:20,366 all the required steps to preprocess our data. 95 00:04:20,366 --> 00:04:21,733 Feature scaling was the last one, 96 00:04:21,733 --> 00:04:25,766 because the next tutorial will be about this data pre-processing template, 97 00:04:25,833 --> 00:04:29,666 and I will just explain how we are going to use it in our machine learning models. 98 00:04:29,666 --> 00:04:33,700 It's going to be very fast and practical. So we are done with the data 99 00:04:33,700 --> 00:04:34,366 pre-processing. 100 00:04:34,366 --> 00:04:37,000 Congratulations, you did the most difficult part, 101 00:04:37,000 --> 00:04:38,700 and now it's time to have fun. 102 00:04:38,700 --> 00:04:42,733 It's time to start making the models, and I can't wait to start them with you. 103 00:04:43,233 --> 00:04:45,300 So thank you for watching this tutorial. 104 00:04:45,300 --> 00:04:47,100 I look forward to seeing you on the next one. 105 00:04:47,100 --> 00:04:48,966 And until then, enjoy machine learning.
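As a recap, the corrected feature scaling block could look roughly like this. The two data frames are hypothetical stand-ins with made-up values, and the `[, 2:3]` indexing is an assumption about how the 2:3 column selection is written, since the transcript only names the indexes.

```r
# Hypothetical stand-ins for the course's training and test sets
training_set <- data.frame(
  Country   = factor(c(1, 2, 3, 1)),
  Age       = c(44, 27, 30, 38),
  Salary    = c(72000, 48000, 54000, 61000),
  Purchased = factor(c(0, 1, 0, 0))
)
test_set <- data.frame(
  Country   = factor(c(2, 3)),
  Age       = c(35, 50),
  Salary    = c(58000, 83000),
  Purchased = factor(c(1, 0))
)

# Feature scaling: only columns 2 and 3 (Age, Salary) are numeric,
# so the factor columns are left out of scale()
training_set[, 2:3] <- scale(training_set[, 2:3])
test_set[, 2:3]     <- scale(test_set[, 2:3])

print(training_set)
print(test_set)
```

As in the video, each set is passed to `scale` separately, so each one is centered and scaled with its own column means and standard deviations. The Country and Purchased columns stay untouched as factors.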