1 00:00:00,500 --> 00:00:03,800 So now we have imported our data. 2 00:00:04,430 --> 00:00:06,980 We have imputed the missing values. 3 00:00:07,460 --> 00:00:15,170 We have created dummy variables for our categorical variables, and we have also divided our data 4 00:00:15,170 --> 00:00:17,860 into X and Y, and into test and train sets. 5 00:00:19,520 --> 00:00:23,540 Now, the next step is to standardize our data. 6 00:00:25,210 --> 00:00:27,160 What do we mean by standardizing? 7 00:00:27,640 --> 00:00:37,420 It means that we will convert the mean and variance of each variable to zero and one, respectively. 8 00:00:38,080 --> 00:00:46,180 So we will convert our variables in such a way that the mean of every variable is zero 9 00:00:47,190 --> 00:00:51,940 and the variance of every variable is one. 10 00:00:52,820 --> 00:00:56,860 So we want to transform each variable by multiplying by 11 00:00:57,880 --> 00:01:04,420 or adding some value to get zero as the mean and one as the variance. 12 00:01:06,050 --> 00:01:10,340 Now, there is no need to manually compute these numbers, 13 00:01:11,000 --> 00:01:13,880 the multiplier or the addition value. 14 00:01:15,000 --> 00:01:22,200 There is a separate function in sklearn that will do the same thing for all the variables. 15 00:01:24,270 --> 00:01:33,570 Now, standardizing is very important because SVM can only give us correct results when we standardize 16 00:01:33,570 --> 00:01:38,860 our data, since in SVM we are calculating distances, and distance 17 00:01:38,880 --> 00:01:41,430 depends on the scale of each variable. 18 00:01:42,150 --> 00:01:47,200 So suppose you are using one variable whose scale is in thousands. 19 00:01:47,550 --> 00:01:55,920 So the values are like 2000, 3000, 5000, etc., and there is another variable whose scale 20 00:01:55,970 --> 00:01:56,950 is in decimals. 21 00:01:57,000 --> 00:02:01,420 So suppose 0.1, 0.2, 0.3, 0.4, etc.
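To make this example concrete, here is a minimal sketch in plain NumPy. The four data rows and the two helper function names are made up for illustration and are not from the course data. It standardizes each column by subtracting its mean and dividing by its standard deviation, then checks how much each variable contributes to the squared distance between two rows, before and after standardizing.

```python
import numpy as np

# Hypothetical data: first column in thousands, second column in decimals.
X = np.array([
    [2000.0, 0.1],
    [3000.0, 0.4],
    [5000.0, 0.2],
    [4000.0, 0.3],
])


def standardize(data):
    """Subtract each column's mean and divide by its standard deviation."""
    return (data - data.mean(axis=0)) / data.std(axis=0)


def squared_distance_share(data, i, j):
    """Fraction of the squared Euclidean distance between rows i and j
    that comes from each column."""
    diff_sq = (data[i] - data[j]) ** 2
    return diff_sq / diff_sq.sum()


X_std = standardize(X)

# Per-column share of the distance between the first two rows.
raw_share = squared_distance_share(X, 0, 1)
std_share = squared_distance_share(X_std, 0, 1)

print("share per column, raw data:         ", raw_share)
print("share per column, standardized data:", std_share)
```

On the raw data, nearly all of the distance comes from the column in thousands. After standardizing, both columns have mean zero and variance one, so the decimal column is no longer drowned out.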
22 00:02:02,760 --> 00:02:09,270 So when we are calculating distances, we should use the same scale for these two variables. 23 00:02:09,720 --> 00:02:16,950 If we do not standardize our data, the impact of the variable with the larger scale will be much 24 00:02:16,950 --> 00:02:21,570 greater than the impact of the variable with the smaller scale. 25 00:02:22,800 --> 00:02:28,170 So it is very important to standardize our data before running an SVM model. 26 00:02:29,900 --> 00:02:33,020 So there are several ways to standardize our data. 27 00:02:33,380 --> 00:02:41,120 The first one is to use StandardScaler, in which we will transform the mean of all the variables to zero 28 00:02:41,510 --> 00:02:43,520 and the variance to one. 29 00:02:44,330 --> 00:02:52,430 Another way is to use MinMaxScaler, in which we will transform each variable so that the lowest value 30 00:02:52,430 --> 00:02:56,990 of each variable is zero and the highest value of each variable is one. 31 00:02:58,490 --> 00:03:05,000 You can use either of these two, but in this course we will discuss StandardScaler. 32 00:03:07,110 --> 00:03:12,090 I have provided the link to the StandardScaler function of sklearn. 33 00:03:13,400 --> 00:03:17,450 You can use this link to learn more about this. 34 00:03:17,690 --> 00:03:22,190 So this is the official sklearn documentation for StandardScaler. 35 00:03:23,290 --> 00:03:25,660 We are not discussing this in detail. 36 00:03:26,470 --> 00:03:32,440 But here you can find all the parameters and all the attributes of 37 00:03:33,830 --> 00:03:35,660 this StandardScaler function. 38 00:03:38,470 --> 00:03:43,350 So standardizing using sklearn is very easy for us. 39 00:03:43,600 --> 00:03:46,560 We will import the StandardScaler function. 40 00:03:47,020 --> 00:03:51,370 Then we create our scaler object using our X data. 41 00:03:53,220 --> 00:03:57,280 So here we are using the StandardScaler function.
42 00:03:57,690 --> 00:04:00,570 And then we are fitting our X data 43 00:04:00,750 --> 00:04:03,060 in this StandardScaler function. 44 00:04:03,810 --> 00:04:07,770 And we are assigning this object to a variable named sc. 45 00:04:09,120 --> 00:04:16,590 So now this sc will contain the information for the transformation of each variable. 46 00:04:17,740 --> 00:04:25,380 Now, for our X data, we can use this sc to transform our X_train data. 47 00:04:26,640 --> 00:04:34,720 So we are creating another variable, that is X_train_std, and we are using sc. 48 00:04:34,890 --> 00:04:40,530 And then we are using the transform method of sc to transform this X_train data. 49 00:04:42,120 --> 00:04:47,120 So my sc contains all the information that we need to transform our data. 50 00:04:48,340 --> 00:04:51,490 And we are using the .transform method to do this. 51 00:04:52,110 --> 00:04:55,050 We will use a similar method for X_test also. 52 00:04:55,600 --> 00:05:04,030 We will create X_test_std, and we will use sc and the transform method 53 00:05:04,690 --> 00:05:06,610 to transform the X_test data. 54 00:05:09,180 --> 00:05:13,350 So now we have our standardized data in 55 00:05:13,710 --> 00:05:15,030 these two variables. 56 00:05:16,290 --> 00:05:21,390 There is no need to standardize the Y variable, since we are predicting the value of Y. 57 00:05:21,540 --> 00:05:24,750 We only need to standardize our X data. 58 00:05:26,470 --> 00:05:26,950 Now. 59 00:05:28,020 --> 00:05:35,820 If you look at the values of X_test_std, you can see all of the values are in decimal points. 60 00:05:36,420 --> 00:05:43,650 Why are we getting decimal points? Because we have converted our mean to zero and variance to one. 61 00:05:45,240 --> 00:05:52,430 If I compare these values with our X_test... 62 00:05:58,700 --> 00:06:02,990 So here you can see that the scales of the values are different.
63 00:06:03,750 --> 00:06:07,940 The marketing expense values are in tens and hundreds. 64 00:06:08,570 --> 00:06:11,570 The multiplex coverage is in the form of decimal points. 65 00:06:11,720 --> 00:06:15,890 But here, if you see, our data is of 66 00:06:16,840 --> 00:06:21,040 uniform scale; all the values are in decimals. 67 00:06:23,190 --> 00:06:27,580 We will use our standardized data to train our model.
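The steps described in this video can be sketched roughly as follows. This is only an illustration under some assumptions: the random arrays stand in for the course's real X_train and X_test, the variable names sc, X_train_std and X_test_std are a best reading of the names spoken in the video, and fitting the scaler on the training data only is an assumption, since the video just says "our X data".

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for the course data: one column in tens and
# hundreds, one column in decimals.
rng = np.random.default_rng(0)
X_train = np.column_stack([rng.uniform(20, 500, 80), rng.uniform(0.1, 0.6, 80)])
X_test = np.column_stack([rng.uniform(20, 500, 20), rng.uniform(0.1, 0.6, 20)])

# Fit the scaler: it learns each column's mean and standard deviation.
# Assumption: fitted on the training split only.
sc = StandardScaler().fit(X_train)

# Apply the same learned transformation to both splits.
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

print("learned means:", sc.mean_)
print("learned scales:", sc.scale_)
print("train mean after scaling:", X_train_std.mean(axis=0))
print("train std after scaling:", X_train_std.std(axis=0))
```

The training columns come out with mean zero and variance one. The test columns are only close to that, because they are transformed with the statistics learned from the training data.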