1 00:00:00,800 --> 00:00:07,800 OK, so now let's see how to transform this crime rate variable so that it has a more linear relationship 2 00:00:07,890 --> 00:00:08,570 with price. 3 00:00:10,200 --> 00:00:14,460 Let's plot the scatterplot once again between crime rate and price. 4 00:00:15,690 --> 00:00:17,370 We can do this in two ways. 5 00:00:17,970 --> 00:00:21,270 One, which we saw earlier, is pairs, 6 00:00:24,120 --> 00:00:29,190 and within brackets we'll write price plus crime rate: 7 00:00:32,760 --> 00:00:34,410 price plus crime rate. 8 00:00:38,350 --> 00:00:39,610 Comma, data is equal to df. 9 00:00:45,670 --> 00:00:51,340 So you can see it gives us the plot of the two variables that we have taken. 10 00:00:52,520 --> 00:00:56,030 This we used to plot multiple scatter plots at the same time. 11 00:00:57,560 --> 00:01:02,990 But if you have only two variables, you can plot a scatterplot using another command, which is 12 00:01:03,110 --> 00:01:11,330 plot, and within brackets we'll write df$price, comma, df$crime_rate. 13 00:01:18,960 --> 00:01:19,710 We've run this. 14 00:01:21,690 --> 00:01:25,380 So this is a plot of two variables, and it is a normal scatterplot. 15 00:01:26,280 --> 00:01:32,640 Now, if you look at this plot, the relationship seems to be a curve like this. 16 00:01:35,970 --> 00:01:41,220 This curve looks like a logarithmic function, which is a little bit translated and rotated. 17 00:01:42,300 --> 00:01:47,940 Therefore, we just need to take a log of crime rate to have a more linear relationship with price. 18 00:01:50,350 --> 00:01:58,450 Also note that a lot of values of crime rate are nearly zero, and log of zero is negative infinity. 19 00:01:59,320 --> 00:02:02,200 And we know we do not want such values in our data. 20 00:02:02,890 --> 00:02:08,260 So we'll simply add one to crime rate and then we'll take a log of it. 21 00:02:08,680 --> 00:02:10,810 So let's make that change in the crime rate variable.
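The two plotting commands narrated above can be sketched as below; the data frame `df` and the column names `price` and `crime_rate` are assumptions based on the audio, and the values are made up stand-ins for the course dataset.

```r
# Toy data standing in for the course dataset (values are invented).
df <- data.frame(price      = c(24, 21.6, 34.7, 33.4, 36.2),
                 crime_rate = c(0.006, 0.027, 0.027, 0.032, 0.069))

# Way 1: pairs() draws a scatterplot matrix; useful for several variables.
pairs(~ price + crime_rate, data = df)

# Way 2: plot() draws a single scatterplot of exactly two variables.
plot(df$price, df$crime_rate)
```

Both calls produce the same crime-rate-versus-price view; pairs() simply scales up when more variables are added to the formula.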
22 00:02:13,000 --> 00:02:15,410 We'll simply write df$crime_rate 23 00:02:19,780 --> 00:02:28,730 gets log of one plus df$crime_rate: log, one plus df$crime_rate. 24 00:02:33,800 --> 00:02:38,210 So the values of the crime rate variable have been changed. 25 00:02:39,660 --> 00:02:41,550 Now, let us plot this graph again. 26 00:02:42,170 --> 00:02:46,930 We'll just go to the plot command and run it again. 27 00:02:48,710 --> 00:02:55,220 Now you can see that the variable has been transformed and now there is some linear relationship that we 28 00:02:55,220 --> 00:02:57,890 can identify between these two variables. 29 00:02:58,490 --> 00:03:02,900 That is all we needed to do to have a more linear relationship between the two variables. 30 00:03:04,880 --> 00:03:07,040 So now this is ready to be put into a model. 31 00:03:08,120 --> 00:03:14,720 But remember, once we run the model, that we have taken a log of a variable and we'll have to handle 32 00:03:14,720 --> 00:03:15,260 it later, 33 00:03:15,530 --> 00:03:16,700 when we do the analysis. 34 00:03:18,530 --> 00:03:25,190 The next thing we are going to do is the transformation of the four distance variables into one average distance 35 00:03:25,190 --> 00:03:25,640 variable. 36 00:03:26,840 --> 00:03:33,380 For that, we need to create a new variable, and that variable will take the value: the sum of the four 37 00:03:33,380 --> 00:03:35,060 distances divided by four. 38 00:03:36,870 --> 00:03:45,320 So we'll write df$avg_dist, that is, avg underscore dist, which 39 00:03:48,350 --> 00:03:49,020 gets. 40 00:03:51,340 --> 00:03:54,690 So this variable is not yet part of the df dataset, 41 00:03:55,470 --> 00:03:56,700 so it will be created. 42 00:03:57,660 --> 00:04:05,220 And what value will it get? The aggregate, the sum of all the distance variables, which are dist1, 43 00:04:05,220 --> 00:04:06,290 dist2, dist3, 44 00:04:06,290 --> 00:04:06,800 dist4.
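The log transform just narrated is a one-liner; `df$crime_rate` is the assumed column name, and adding one before the log avoids `log(0) = -Inf`, a minimal sketch:

```r
# Toy data: crime rates, several of them at or near zero.
df <- data.frame(crime_rate = c(0, 0.1, 1, 9))

# log(0) is -Inf, so shift by 1 first (same idea as log1p()).
df$crime_rate <- log(1 + df$crime_rate)
```

After this assignment the column holds finite values only, and a zero crime rate maps to exactly zero.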
45 00:04:07,120 --> 00:04:09,790 So within brackets we'll write the first one, df$ 46 00:04:09,830 --> 00:04:10,290 dist1: 47 00:04:13,560 --> 00:04:17,700 df$dist1 plus 48 00:04:20,840 --> 00:04:22,020 df$dist2, 49 00:04:35,420 --> 00:04:44,720 and so on, and we divide it all by four. When we've done this, a new variable is created. 50 00:04:44,870 --> 00:04:49,280 If you remember, earlier our data had 19 variables. 51 00:04:49,310 --> 00:04:50,780 Now it has 20 variables. 52 00:04:52,460 --> 00:04:54,950 And this variable is holding the average value. 53 00:04:55,010 --> 00:04:59,120 If we want to look at the new variable created, you can just click on this. 54 00:05:00,890 --> 00:05:02,730 And if you scroll to the right, 55 00:05:03,560 --> 00:05:04,970 this is the new variable created. 56 00:05:05,630 --> 00:05:12,470 And this value, 4.08, is actually the average of these four values. 57 00:05:16,990 --> 00:05:23,710 So we will be using this average distance variable in our analysis and will not be using the four distance 58 00:05:23,710 --> 00:05:24,250 variables. 59 00:05:25,150 --> 00:05:33,190 So let us see how to delete these four variables from my data. To delete a variable, we need to get 60 00:05:33,540 --> 00:05:35,160 its position in the dataset. 61 00:05:35,730 --> 00:05:41,260 So we will go back to the data and we'll look at where this distance variable is. 62 00:05:42,550 --> 00:05:48,010 So this is the first column, second, third, fourth, fifth, sixth, seventh. 63 00:05:48,820 --> 00:05:54,160 So we do not want the seventh, eighth, ninth and tenth columns in our data. 64 00:05:56,190 --> 00:05:58,690 So we'll write df gets 65 00:06:04,150 --> 00:06:11,340 df, and within square brackets, the first parameter is for the number of rows. 66 00:06:11,520 --> 00:06:14,100 We want all the rows, so we'll not mention anything, 67 00:06:14,400 --> 00:06:15,930 and we'll simply put a comma here.
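The averaging step can be sketched as follows, assuming the four columns are named `dist1` through `dist4` (the audio garbles the exact names) and using invented values:

```r
# Toy data: four distance columns per row.
df <- data.frame(dist1 = c(4.0, 2.0), dist2 = c(4.1, 2.2),
                 dist3 = c(4.1, 1.8), dist4 = c(4.1, 2.0))

# New column: row-wise average of the four distances.
df$avg_dist <- (df$dist1 + df$dist2 + df$dist3 + df$dist4) / 4
```

The assignment to a column that does not yet exist creates it, so the frame gains one variable, matching the 19-to-20 jump described in the lecture.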
68 00:06:18,030 --> 00:06:24,510 After this comma, the second parameter is about the number of columns that you want. If you want all 69 00:06:24,510 --> 00:06:25,080 the columns, 70 00:06:25,410 --> 00:06:26,670 do not mention anything here. 71 00:06:26,970 --> 00:06:34,380 If you want some of the columns, you can mention those column numbers. If you want to remove some columns, 72 00:06:34,470 --> 00:06:36,060 we will use a minus sign here. 73 00:06:36,390 --> 00:06:40,470 So we'll write minus seven to minus 10. 74 00:06:40,770 --> 00:06:46,260 So we'll use this colon operator to say that 75 00:06:46,380 --> 00:06:49,590 it is a series, a series from minus seven to minus 10. 76 00:06:50,400 --> 00:06:57,380 If you are not sure whether this will remove the correct columns, you can also create a new variable: 77 00:06:57,570 --> 00:07:01,650 instead of assigning this value to df, you can assign it to df2. 78 00:07:02,250 --> 00:07:03,940 So let us change it to df2. 79 00:07:09,080 --> 00:07:11,450 So a df2 variable is created. 80 00:07:11,540 --> 00:07:17,800 Now you can see that it has 16 variables: the four variables have been deleted. 81 00:07:18,710 --> 00:07:20,360 To look at that, we'll click on it. 82 00:07:22,780 --> 00:07:28,220 Let's see: before, it had dist1, dist2, dist3, and they have been deleted. 83 00:07:29,210 --> 00:07:30,830 You can see we cannot find them. 84 00:07:30,980 --> 00:07:32,740 They are deleted from our dataset. 85 00:07:34,460 --> 00:07:40,760 So we can assign the value of df2 to df. We'll just write df is equal to df2. 86 00:07:49,800 --> 00:07:58,260 So df is now df2, and to delete df2 we'll write rm and, within brackets, df2. 87 00:08:01,730 --> 00:08:02,830 df2 is gone. 88 00:08:02,920 --> 00:08:10,840 We have df. If you remember, there was one categorical variable called bus terminal, which had only 89 00:08:10,840 --> 00:08:11,140 "yes"
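The column-dropping workflow above, sketched on a toy 10-column frame; the positions 7 to 10 match the lecture, but the frame itself and its `X1`..`X10` names are invented here:

```r
# Toy data frame with 10 columns, auto-named X1..X10.
df <- data.frame(matrix(1:20, nrow = 2))

# Drop columns 7-10 with negative indices; the empty slot before the
# comma keeps every row.  Assign to df2 first to verify the result.
df2 <- df[, -7:-10]          # equivalent to df[, -(7:10)]

# Once satisfied, overwrite df and remove the temporary copy with rm().
df <- df2
rm(df2)
```

Assigning to `df2` first is the safety net the lecturer describes: inspect the smaller frame, then commit the change to `df` and clean up.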
90 00:08:11,140 --> 00:08:14,280 values, and we decided that we would be deleting that variable also. 91 00:08:15,590 --> 00:08:20,170 And if you look at the dataset, this is the variable I'm talking about. 92 00:08:20,350 --> 00:08:21,670 It has only "yes" 93 00:08:21,780 --> 00:08:22,120 in it. 94 00:08:22,960 --> 00:08:25,390 And we decided that we'll be deleting it. 95 00:08:26,020 --> 00:08:29,790 So let us go and delete this variable also from our dataset. 96 00:08:30,250 --> 00:08:33,280 So let us find the position of this column. 97 00:08:34,150 --> 00:08:40,660 The position we have identified is the 14th column, so we'll write that same statement again: df gets 98 00:08:41,240 --> 00:08:47,950 df, and within square brackets we will not take the 14th column, 99 00:08:48,580 --> 00:08:49,750 so minus 14. 100 00:08:51,920 --> 00:08:56,380 Let's run this, and let's go back to the data and check. 101 00:09:00,650 --> 00:09:04,160 You can see that the bus terminal variable is also deleted. 102 00:09:05,190 --> 00:09:11,310 So after doing all these operations, if you remember, if we go back to the four observations that 103 00:09:11,310 --> 00:09:16,690 we identified when we did univariate analysis, I think we have covered all four. 104 00:09:17,340 --> 00:09:19,290 We have handled the outliers. 105 00:09:19,980 --> 00:09:22,110 We have imputed the missing values. 106 00:09:22,180 --> 00:09:26,200 And we have deleted the bus terminal variable. 107 00:09:26,550 --> 00:09:33,090 And we have transformed the crime rate variable so that it has a more linear relationship with the price 108 00:09:33,090 --> 00:09:33,490 variable. 109 00:09:33,990 --> 00:09:37,440 All the things we identified from univariate analysis, we have handled.
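Dropping a single constant column works the same way; a minimal sketch, in which the name `bus_terminal` is an assumption from the audio, and the column sits at position 2 of a toy frame rather than position 14 of the lecture's dataset:

```r
# Toy frame where the constant categorical column sits at position 2.
df <- data.frame(price        = c(24, 21.6, 34.7),
                 bus_terminal = c("YES", "YES", "YES"),
                 crime_rate   = c(0.006, 0.027, 0.027))

# Remove the column by its position (14 in the lecture's dataset).
df <- df[, -2]
```

A constant column carries no information for a model, which is why it can be deleted outright rather than transformed.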