1 00:00:00,800 --> 00:00:07,800 OK, so now let's see how to transform this crime rate variable so that it has a more linear relationship 2 00:00:07,890 --> 00:00:08,570 with price. 3 00:00:10,200 --> 00:00:14,460 Let's plot the scatterplot once again between crime rate and price. 4 00:00:15,690 --> 00:00:17,370 We can do this in two ways. 5 00:00:17,970 --> 00:00:21,270 One, which we saw earlier, is pairs, 6 00:00:24,120 --> 00:00:29,190 and within brackets we'll write price plus crime rate: 7 00:00:32,760 --> 00:00:34,410 price plus crime rate. 8 00:00:38,350 --> 00:00:39,610 Comma, data is equal to df. 9 00:00:45,670 --> 00:00:51,340 So you can see it gives us the plot of the two variables that we have taken. 10 00:00:52,520 --> 00:00:56,030 This we used to plot multiple scatter plots at the same time. 11 00:00:57,560 --> 00:01:02,990 But if you have only two variables, you can plot a scatterplot using another command, which is 12 00:01:03,110 --> 00:01:11,330 plot, and within brackets we'll write df$price, comma, df$crime_rate. 13 00:01:18,960 --> 00:01:19,710 We've run this. 14 00:01:21,690 --> 00:01:25,380 So this is a plot of two variables, and it is a normal scatterplot. 15 00:01:26,280 --> 00:01:32,640 Now, if you look at this plot, the relationship seems to be a curve like this. 16 00:01:35,970 --> 00:01:41,220 This curve looks like a logarithmic function, which is a little bit translated and rotated. 17 00:01:42,300 --> 00:01:47,940 Therefore, we just need to take a log of crime rate to have a more linear relationship with price. 18 00:01:50,350 --> 00:01:58,450 Also note that a lot of values of crime rate are nearly zero, and log of zero is negative infinity. 19 00:01:59,320 --> 00:02:02,200 And we know we do not want such values in our data. 20 00:02:02,890 --> 00:02:08,260 So we'll simply add one to crime rate and then we'll take a log of it. 21 00:02:08,680 --> 00:02:10,810 So let's make that change in the crime rate variable.
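The two plotting commands narrated above can be sketched as below; the data frame `df` and the column names `price` and `crime_rate` are assumptions based on the audio, and the values are made up stand-ins for the course dataset.

```r
# Toy data standing in for the course dataset (values are invented).
df <- data.frame(price      = c(24, 21.6, 34.7, 33.4, 36.2),
                 crime_rate = c(0.006, 0.027, 0.027, 0.032, 0.069))

# Way 1: pairs() draws a scatterplot matrix; useful for several variables.
pairs(~ price + crime_rate, data = df)

# Way 2: plot() draws a single scatterplot of exactly two variables.
plot(df$price, df$crime_rate)
```

Both calls produce the same crime-rate-versus-price view; pairs() simply scales up when more variables are added to the formula.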
22 00:02:13,000 --> 00:02:15,410 We'll simply write df$crime_rate 23 00:02:19,780 --> 00:02:28,730 gets log of one plus df$crime_rate: log, one plus df$crime_rate. 24 00:02:33,800 --> 00:02:38,210 So the values of the crime rate variable have been changed. 25 00:02:39,660 --> 00:02:41,550 Now, let us plot this graph again. 26 00:02:42,170 --> 00:02:46,930 We'll just go to the plot command and run it again. 27 00:02:48,710 --> 00:02:55,220 Now you can see that the variable has been transformed and now there is some linear relationship that we 28 00:02:55,220 --> 00:02:57,890 can identify between these two variables. 29 00:02:58,490 --> 00:03:02,900 That is all we needed to do to have a more linear relationship between the two variables. 30 00:03:04,880 --> 00:03:07,040 So now this is ready to be put into a model. 31 00:03:08,120 --> 00:03:14,720 But remember, once we run the model, that we have taken a log of a variable and we'll have to handle 32 00:03:14,720 --> 00:03:15,260 it later, 33 00:03:15,530 --> 00:03:16,700 when we do the analysis. 34 00:03:18,530 --> 00:03:25,190 The next thing we are going to do is the transformation of the four distance variables into one average distance 35 00:03:25,190 --> 00:03:25,640 variable. 36 00:03:26,840 --> 00:03:33,380 For that, we need to create a new variable, and that variable will take the value: the sum of the four 37 00:03:33,380 --> 00:03:35,060 distances divided by four. 38 00:03:36,870 --> 00:03:45,320 So we'll write df$avg_dist, that is, avg underscore dist, which 39 00:03:48,350 --> 00:03:49,020 gets. 40 00:03:51,340 --> 00:03:54,690 So this variable is not yet part of the df dataset, 41 00:03:55,470 --> 00:03:56,700 so it will be created. 42 00:03:57,660 --> 00:04:05,220 And what value will it get? The aggregate, the sum of all the distance variables, which are dist1, 43 00:04:05,220 --> 00:04:06,290 dist2, dist3, 44 00:04:06,290 --> 00:04:06,800 dist4.
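The log transform just narrated is a one-liner; `df$crime_rate` is the assumed column name, and adding one before the log avoids `log(0) = -Inf`, a minimal sketch:

```r
# Toy data: crime rates, several of them at or near zero.
df <- data.frame(crime_rate = c(0, 0.1, 1, 9))

# log(0) is -Inf, so shift by 1 first (same idea as log1p()).
df$crime_rate <- log(1 + df$crime_rate)
```

After this assignment the column holds finite values only, and a zero crime rate maps to exactly zero.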
45 00:04:07,120 --> 00:04:09,790 So within brackets we'll write the first one, df$ 46 00:04:09,830 --> 00:04:10,290 dist1: 47 00:04:13,560 --> 00:04:17,700 df$dist1 plus 48 00:04:20,840 --> 00:04:22,020 df$dist2, 49 00:04:35,420 --> 00:04:44,720 and so on, and we divide it all by four. When we've done this, a new variable is created. 50 00:04:44,870 --> 00:04:49,280 If you remember, earlier our data had 19 variables. 51 00:04:49,310 --> 00:04:50,780 Now it has 20 variables. 52 00:04:52,460 --> 00:04:54,950 And this variable is holding the average value. 53 00:04:55,010 --> 00:04:59,120 If we want to look at the new variable created, you can just click on this. 54 00:05:00,890 --> 00:05:02,730 And if you scroll to the right, 55 00:05:03,560 --> 00:05:04,970 this is the new variable created. 56 00:05:05,630 --> 00:05:12,470 And this value, 4.08, is actually the average of these four values. 57 00:05:16,990 --> 00:05:23,710 So we will be using this average distance variable in our analysis and will not be using the four distance 58 00:05:23,710 --> 00:05:24,250 variables. 59 00:05:25,150 --> 00:05:33,190 So let us see how to delete these four variables from my data. To delete a variable, we need to get 60 00:05:33,540 --> 00:05:35,160 its position in the dataset. 61 00:05:35,730 --> 00:05:41,260 So we will go back to the data and we'll look at where this distance variable is. 62 00:05:42,550 --> 00:05:48,010 So this is the first column, second, third, fourth, fifth, sixth, seventh. 63 00:05:48,820 --> 00:05:54,160 So we do not want the seventh, eighth, ninth and tenth columns in our data. 64 00:05:56,190 --> 00:05:58,690 So we'll write df gets 65 00:06:04,150 --> 00:06:11,340 df, and within square brackets, the first parameter is for the number of rows. 66 00:06:11,520 --> 00:06:14,100 We want all the rows, so we'll not mention anything, 67 00:06:14,400 --> 00:06:15,930 and we'll simply put a comma here.
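The averaging step can be sketched as follows, assuming the four columns are named `dist1` through `dist4` (the audio garbles the exact names) and using invented values:

```r
# Toy data: four distance columns per row.
df <- data.frame(dist1 = c(4.0, 2.0), dist2 = c(4.1, 2.2),
                 dist3 = c(4.1, 1.8), dist4 = c(4.1, 2.0))

# New column: row-wise average of the four distances.
df$avg_dist <- (df$dist1 + df$dist2 + df$dist3 + df$dist4) / 4
```

The assignment to a column that does not yet exist creates it, so the frame gains one variable, matching the 19-to-20 jump described in the lecture.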
68 00:06:18,030 --> 00:06:24,510 After this comma, the second parameter is about the number of columns that you want. If you want all 69 00:06:24,510 --> 00:06:25,080 the columns, 70 00:06:25,410 --> 00:06:26,670 do not mention anything here. 71 00:06:26,970 --> 00:06:34,380 If you want some of the columns, you can mention those column numbers. If you want to remove some columns, 72 00:06:34,470 --> 00:06:36,060 we will use a minus sign here. 73 00:06:36,390 --> 00:06:40,470 So we'll write minus seven to minus 10. 74 00:06:40,770 --> 00:06:46,260 So we'll use this colon operator to say that 75 00:06:46,380 --> 00:06:49,590 it is a series, a series from minus seven to minus 10. 76 00:06:50,400 --> 00:06:57,380 If you are not sure whether this will remove the correct columns, you can also create a new variable: 77 00:06:57,570 --> 00:07:01,650 instead of assigning this value to df, you can assign it to df2. 78 00:07:02,250 --> 00:07:03,940 So let us change it to df2. 79 00:07:09,080 --> 00:07:11,450 So a df2 variable is created. 80 00:07:11,540 --> 00:07:17,800 Now you can see that it has 16 variables: the four variables have been deleted. 81 00:07:18,710 --> 00:07:20,360 To look at that, we'll click on it. 82 00:07:22,780 --> 00:07:28,220 Let's see: before, it had dist1, dist2, dist3, and they have been deleted. 83 00:07:29,210 --> 00:07:30,830 You can see we cannot find them. 84 00:07:30,980 --> 00:07:32,740 They are deleted from our dataset. 85 00:07:34,460 --> 00:07:40,760 So we can assign the value of df2 to df. We'll just write df is equal to df2. 86 00:07:49,800 --> 00:07:58,260 So df is now df2, and to delete df2 we'll write rm and, within brackets, df2. 87 00:08:01,730 --> 00:08:02,830 df2 is gone. 88 00:08:02,920 --> 00:08:10,840 We have df. If you remember, there was one categorical variable called bus terminal, which had only 89 00:08:10,840 --> 00:08:11,140 "yes"
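The column-dropping workflow above, sketched on a toy 10-column frame; the positions 7 to 10 match the lecture, but the frame itself and its `X1`..`X10` names are invented here:

```r
# Toy data frame with 10 columns, auto-named X1..X10.
df <- data.frame(matrix(1:20, nrow = 2))

# Drop columns 7-10 with negative indices; the empty slot before the
# comma keeps every row.  Assign to df2 first to verify the result.
df2 <- df[, -7:-10]          # equivalent to df[, -(7:10)]

# Once satisfied, overwrite df and remove the temporary copy with rm().
df <- df2
rm(df2)
```

Assigning to `df2` first is the safety net the lecturer describes: inspect the smaller frame, then commit the change to `df` and clean up.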
90 00:08:11,140 --> 00:08:14,280 values, and we decided that we would be deleting that variable also. 91 00:08:15,590 --> 00:08:20,170 And if you look at the dataset, this is the variable I'm talking about. 92 00:08:20,350 --> 00:08:21,670 It has only "yes" 93 00:08:21,780 --> 00:08:22,120 in it. 94 00:08:22,960 --> 00:08:25,390 And we decided that we'll be deleting it. 95 00:08:26,020 --> 00:08:29,790 So let us go and delete this variable also from our dataset. 96 00:08:30,250 --> 00:08:33,280 So let us find the position of this column. 97 00:08:34,150 --> 00:08:40,660 The position we have identified is the 14th column, so we'll write that same statement again: df gets 98 00:08:41,240 --> 00:08:47,950 df, and within square brackets we will not take the 14th column, 99 00:08:48,580 --> 00:08:49,750 so minus 14. 100 00:08:51,920 --> 00:08:56,380 Let's run this, and let's go back to the data and check. 101 00:09:00,650 --> 00:09:04,160 You can see that the bus terminal variable is also deleted. 102 00:09:05,190 --> 00:09:11,310 So after doing all these operations, if you remember, if we go back to the four observations that 103 00:09:11,310 --> 00:09:16,690 we identified when we did univariate analysis, I think we have covered all four. 104 00:09:17,340 --> 00:09:19,290 We have handled the outliers. 105 00:09:19,980 --> 00:09:22,110 We have imputed the missing values. 106 00:09:22,180 --> 00:09:26,200 And we have deleted the bus terminal variable. 107 00:09:26,550 --> 00:09:33,090 And we have transformed the crime rate variable so that it has a more linear relationship with the price 108 00:09:33,090 --> 00:09:33,490 variable. 109 00:09:33,990 --> 00:09:37,440 All the things we identified from univariate analysis, we have handled.
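Dropping a single constant column works the same way; a minimal sketch, in which the name `bus_terminal` is an assumption from the audio, and the column sits at position 2 of a toy frame rather than position 14 of the lecture's dataset:

```r
# Toy frame where the constant categorical column sits at position 2.
df <- data.frame(price        = c(24, 21.6, 34.7),
                 bus_terminal = c("YES", "YES", "YES"),
                 crime_rate   = c(0.006, 0.027, 0.027))

# Remove the column by its position (14 in the lecture's dataset).
df <- df[, -2]
```

A constant column carries no information for a model, which is why it can be deleted outright rather than transformed.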