1
00:00:00,500 --> 00:00:03,800
So now we have imported our data.

2
00:00:04,430 --> 00:00:06,980
We have imputed the missing values.

3
00:00:07,460 --> 00:00:15,170
We have created dummy variables for our categorical variables, and we have also divided our data

4
00:00:15,170 --> 00:00:17,860
into X and Y, and into test and train.

5
00:00:19,520 --> 00:00:23,540
Now, the next step is to standardize our data.

6
00:00:25,210 --> 00:00:27,160
What do we mean by standardizing?

7
00:00:27,640 --> 00:00:37,420
It means that we will transform each variable so that its mean is zero and its variance is one.

8
00:00:38,080 --> 00:00:46,180
So we will try to convert our variables in such a way that the mean of every variable is zero

9
00:00:47,190 --> 00:00:51,940
and the variance of every variable is one.

10
00:00:52,820 --> 00:00:56,860
So we want to transform each variable by multiplying

11
00:00:57,880 --> 00:01:04,420
or adding some value to get zero as the mean and one as the variance.

12
00:01:06,050 --> 00:01:10,340
Now, there is no need to manually compute these numbers,

13
00:01:11,000 --> 00:01:13,880
the multiplier or the added value.
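
The multiplier and the added value mentioned above can be computed by hand for a single variable. This is a minimal sketch with made-up numbers (the array `x` and the names `add_value` and `multiplier` are illustrative assumptions, not from the course data):

```python
import numpy as np

# Hypothetical values for one variable (not from the course data).
x = np.array([2000.0, 3000.0, 5000.0, 6000.0])

add_value = -x.mean()        # added to every value so the mean becomes zero
multiplier = 1.0 / x.std()   # multiplied afterwards so the variance becomes one

z = (x + add_value) * multiplier

print(z.mean(), z.var())
```

The printed mean should be zero and the variance one, up to floating-point rounding.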

14
00:01:15,000 --> 00:01:22,200
There is a separate function in sklearn that will do the same thing for all the variables.

15
00:01:24,270 --> 00:01:33,570
Now, standardizing is very important because SVM can only give us correct results when we standardize

16
00:01:33,570 --> 00:01:38,860
our data, since in SVM we are calculating distance, and distance

17
00:01:38,880 --> 00:01:41,430
depends on the scale of each variable.

18
00:01:42,150 --> 00:01:47,200
So suppose you are using one variable whose scale is in thousands.

19
00:01:47,550 --> 00:01:55,920
So the values are like 2000, 3000, 5000, etc., and there is another variable in which the scale

20
00:01:55,970 --> 00:01:56,950
is in decimals.

21
00:01:57,000 --> 00:02:01,420
So suppose 0.1, 0.2, 0.3, 0.4, etc.

22
00:02:02,760 --> 00:02:09,270
So when we are calculating the distance, we should use the same scale for these two variables.

23
00:02:09,720 --> 00:02:16,950
If we do not standardize our data, the impact of the variable with the larger scale will be much

24
00:02:16,950 --> 00:02:21,570
more than the impact of the variable with the smaller scale.
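
The effect described above can be checked numerically. This is a small sketch with made-up rows (one column in thousands, one in decimals); it measures how much of the squared Euclidean distance between two rows comes from each column, before and after standardizing:

```python
import numpy as np

# Hypothetical data: column 0 is in thousands, column 1 is in decimals.
X = np.array([
    [2000.0, 0.1],
    [3000.0, 0.2],
    [5000.0, 0.3],
    [6000.0, 0.4],
])

# Share of the squared distance between the first and last row, per column.
diff_raw = X[0] - X[3]
share_raw = diff_raw**2 / np.sum(diff_raw**2)

# Standardize each column by hand: subtract its mean, divide by its standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
diff_std = X_std[0] - X_std[3]
share_std = diff_std**2 / np.sum(diff_std**2)

print("raw shares:", share_raw)
print("standardized shares:", share_std)
```

On the raw data almost all of the distance comes from the large-scale column; after standardizing, both columns contribute on a comparable footing.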

25
00:02:22,800 --> 00:02:28,170
So it is very important to standardize our data before running the SVM model.

26
00:02:29,900 --> 00:02:33,020
So there are several ways to standardize our data.

27
00:02:33,380 --> 00:02:41,120
The first one is to use the StandardScaler, in which we transform each variable so that its mean is zero

28
00:02:41,510 --> 00:02:43,520
and its variance is one.

29
00:02:44,330 --> 00:02:52,430
Another way is to use the MinMaxScaler, in which we transform each variable so that the lowest value

30
00:02:52,430 --> 00:02:56,990
of each variable is zero and the highest value of each variable is one.
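
For comparison, here is a minimal sketch of the min-max approach using sklearn's `MinMaxScaler`. The array `X` holds made-up values and is an assumption, not the course data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: column 0 is in thousands, column 1 is in decimals.
X = np.array([
    [2000.0, 0.1],
    [3000.0, 0.2],
    [5000.0, 0.4],
])

# By default MinMaxScaler maps each column onto the range [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)

print(X_minmax.min(axis=0), X_minmax.max(axis=0))
```

Each column's lowest value becomes zero and its highest value becomes one.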

31
00:02:58,490 --> 00:03:05,000
You can use either of these two, but in this course we will discuss the StandardScaler.

32
00:03:07,110 --> 00:03:12,090
I have provided the link to the StandardScaler function of sklearn.

33
00:03:13,400 --> 00:03:17,450
You can use this link to learn more about this.

34
00:03:17,690 --> 00:03:22,190
So this is the official sklearn documentation of the StandardScaler.

35
00:03:23,290 --> 00:03:25,660
We are not discussing this in detail.

36
00:03:26,470 --> 00:03:32,440
But here you can find all the parameters and all the attributes of

37
00:03:33,830 --> 00:03:35,660
this StandardScaler function.

38
00:03:38,470 --> 00:03:43,350
So standardizing using sklearn is very easy for us.

39
00:03:43,600 --> 00:03:46,560
We will import the StandardScaler function.

40
00:03:47,020 --> 00:03:51,370
Then we create our scaler object using our X data.

41
00:03:53,220 --> 00:03:57,280
So here we are using the StandardScaler function.

42
00:03:57,690 --> 00:04:00,570
And then we are fitting our X data.

43
00:04:00,750 --> 00:04:03,060
And this is the StandardScaler function.

44
00:04:03,810 --> 00:04:07,770
And we are assigning this object to a variable named sc.

45
00:04:09,120 --> 00:04:16,590
So now this sc will contain the information on the transformation for each of the variables.

46
00:04:17,740 --> 00:04:25,380
Now, we can use this sc to transform our X_train data.

47
00:04:26,640 --> 00:04:34,720
So we are creating another variable, that is X_train_std, and we are using sc.

48
00:04:34,890 --> 00:04:40,530
And then we are using the transform method of sc to transform this X_train data.

49
00:04:42,120 --> 00:04:47,120
So my sc contains all the information that we need to transform our data.

50
00:04:48,340 --> 00:04:51,490
And we are using the .transform method to do this.

51
00:04:52,110 --> 00:04:55,050
We will use a similar method for X_test also.

52
00:04:55,600 --> 00:05:04,030
We will create X_test_std, and we will use sc and the transform method

53
00:05:04,690 --> 00:05:06,610
to transform the X_test data.
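
The steps described above can be sketched as follows. The small arrays stand in for the course data and are made up; the names `sc`, `X_train_std`, and `X_test_std` are reconstructed from the narration, and fitting the scaler on the training split only is an assumption here:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for the course's train and test features.
X_train = np.array([
    [2000.0, 0.1],
    [3000.0, 0.2],
    [5000.0, 0.3],
    [6000.0, 0.4],
])
X_test = np.array([
    [4000.0, 0.25],
    [2500.0, 0.35],
])

# Fit the scaler: it learns the mean and standard deviation of each column.
sc = StandardScaler().fit(X_train)

# Apply the same learned transformation to both splits.
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

print(X_train_std.mean(axis=0), X_train_std.std(axis=0))
```

Because the scaler is fitted once and then reused, the test data is transformed with the same mean and scale as the training data. Its own mean and variance will therefore be close to, but not exactly, zero and one.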

54
00:05:09,180 --> 00:05:13,350
So now my standardized data is in

55
00:05:13,710 --> 00:05:15,030
these two variables.

56
00:05:16,290 --> 00:05:21,390
There is no need to standardize the Y variable, since we are predicting the value of Y.

57
00:05:21,540 --> 00:05:24,750
We only need to standardize our X data.

58
00:05:26,470 --> 00:05:26,950
Now,

59
00:05:28,020 --> 00:05:35,820
if you look at the values of X_test_std, you can see all of my values are in decimals.

60
00:05:36,420 --> 00:05:43,650
Why are we getting decimals? Because we have converted our mean to zero and variance to one.

61
00:05:45,240 --> 00:05:52,430
If I compare these values with our X_test,

62
00:05:58,700 --> 00:06:02,990
So here you can see that the scales of values are different.

63
00:06:03,750 --> 00:06:07,940
The marketing expense is in tens and hundreds.

64
00:06:08,570 --> 00:06:11,570
The multiplex coverage is in the form of decimals.

65
00:06:11,720 --> 00:06:15,890
But here, if you see, our data is of

66
00:06:16,840 --> 00:06:21,040
uniform scale; all the values are in decimals.

67
00:06:23,190 --> 00:06:27,580
We will use our standardized data to train our model.