1
00:00:00,830 --> 00:00:02,990
So this is the data that we have.

2
00:00:04,610 --> 00:00:11,600
This is the dependent variable, that is, the valuation of the business. We want to calculate the value generated

3
00:00:11,600 --> 00:00:12,470
by our store.

4
00:00:15,020 --> 00:00:23,510
So using these independent variables, we want to predict this dependent variable. We have around seven

5
00:00:23,510 --> 00:00:24,710
independent variables.

6
00:00:26,320 --> 00:00:33,700
The first one is the type of the store, the second one is the type of the location, the next is the city type,

7
00:00:34,300 --> 00:00:39,390
then we have the investment, that is, the dollar value of the investment.

8
00:00:39,610 --> 00:00:47,290
Then we have the number of competitors, then the estimated population in the vicinity of our store,

9
00:00:47,620 --> 00:00:52,470
and then we have the average household income in the vicinity of that store.

10
00:00:54,380 --> 00:01:01,590
So this is the past data of our stores. We have the data of our own thirteen hundred and thirty

11
00:01:01,700 --> 00:01:02,330
stores.

12
00:01:03,470 --> 00:01:09,230
Now, let's look at these independent variables. The first one is the type of the store.

13
00:01:10,770 --> 00:01:19,440
Now, let's apply a filter to our data. Select all the data points by holding Ctrl+Shift and

14
00:01:19,440 --> 00:01:20,550
then the arrow keys.

15
00:01:21,570 --> 00:01:22,170
And then

16
00:01:23,570 --> 00:01:32,420
go to this Data tab and click on this Filter icon, or else you can also use a keyboard shortcut.

17
00:01:35,840 --> 00:01:43,220
So once you have applied the filters, you will be able to see these drop-down buttons, and inside this

18
00:01:43,220 --> 00:01:48,060
drop-down menu you will see all the values our variables are taking.

19
00:01:48,620 --> 00:01:51,210
So let's first look at the type of the store.

20
00:01:51,590 --> 00:01:53,440
This is the size of the store.
21
00:01:53,930 --> 00:02:00,000
That is, whether we have opened a large store, a medium store, or a small store.

22
00:02:00,530 --> 00:02:04,910
As you can see, there are only three categories in this variable.

23
00:02:05,930 --> 00:02:08,550
The second variable is the type of the location.

24
00:02:08,930 --> 00:02:15,020
So when we are opening our stores, whether we are opening in a commercial space or a residential space.

25
00:02:15,770 --> 00:02:19,460
You can see there are only two values, commercial and residential.

26
00:02:21,170 --> 00:02:25,670
And this value is telling us what the type of location of the store is.

27
00:02:27,040 --> 00:02:34,030
Then we have the city type. City type means whether the city in which we have opened a store is a metro

28
00:02:34,030 --> 00:02:35,790
city or a non-metro city.

29
00:02:36,160 --> 00:02:39,130
So there are only two values, metro and non-metro.

30
00:02:39,610 --> 00:02:44,920
And this variable signifies the city type of our store.

31
00:02:46,640 --> 00:02:53,660
So these three are the only categorical variables that we have. The next four variables are continuous

32
00:02:53,660 --> 00:02:54,130
variables.

33
00:02:54,650 --> 00:02:56,580
The first variable is investment.

34
00:02:57,050 --> 00:03:04,280
This is the amount of money we invested in, say, building the furniture or the infrastructure for

35
00:03:04,280 --> 00:03:05,030
our store.

36
00:03:06,300 --> 00:03:11,130
And as you can see, this is a continuous variable; that's why we are getting multiple values in our

37
00:03:11,130 --> 00:03:19,290
filter option. Then we have the number of competitors, that is, the number of competitors located in the vicinity

38
00:03:19,290 --> 00:03:20,670
of our store.

39
00:03:22,960 --> 00:03:25,300
You can see that there are four values.

40
00:03:26,320 --> 00:03:28,810
So at maximum, we have 40 stores.

41
00:03:30,930 --> 00:03:34,150
Then we have the estimated population in the vicinity.
42
00:03:34,920 --> 00:03:39,410
So this is the number of people that are living near that store.

43
00:03:41,070 --> 00:03:48,030
Then we have the average household income near the store. So for all these people, what is the average

44
00:03:48,030 --> 00:03:50,580
household income of families?

45
00:03:52,400 --> 00:03:56,840
And at last, we have the dependent variable, that is, the valuation of the business.

46
00:03:58,700 --> 00:04:05,270
Now, using our linear regression model, we want to use these independent variables to predict the

47
00:04:05,270 --> 00:04:06,650
valuation of the business.

48
00:04:08,040 --> 00:04:15,360
After building such a model, we will be able to identify the valuation of a store from these independent

49
00:04:15,360 --> 00:04:15,900
variables.

50
00:04:17,220 --> 00:04:25,190
So in a way, before investing in a property, we will be able to identify its value, and if that store's

51
00:04:25,200 --> 00:04:30,900
valuation is higher than the investment needed, then we can go ahead and open that store.

52
00:04:32,660 --> 00:04:37,460
So this is our data. We have three categorical independent variables,

53
00:04:37,910 --> 00:04:42,800
we have four continuous independent variables, and then we have one dependent variable.

54
00:04:43,370 --> 00:04:47,660
In the next lecture we will apply preprocessing on this data.

55
00:04:48,900 --> 00:04:56,330
There are no missing values or outliers in this data; we just have to convert these categorical variables.

56
00:04:57,030 --> 00:05:03,570
We have already processed this data for outliers and missing value treatment, since we have already discussed

57
00:05:03,570 --> 00:05:06,270
those concepts in our previous case study.
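
The conversion of categorical variables mentioned above can be sketched in Python. This is only an illustration under assumptions: the column names (`store_type`, `location_type`, `city_type`, `investment`), the category labels, and the three sample rows are hypothetical and are not taken from the course spreadsheet. Dropping the first category as a baseline is one common convention, not necessarily the approach the next lecture uses.

```python
# Hypothetical sample rows; the real spreadsheet's headers and values may differ.
rows = [
    {"store_type": "Large", "location_type": "Commercial",
     "city_type": "Metro", "investment": 120000.0},
    {"store_type": "Small", "location_type": "Residential",
     "city_type": "Non-metro", "investment": 45000.0},
    {"store_type": "Medium", "location_type": "Commercial",
     "city_type": "Non-metro", "investment": 80000.0},
]


def distinct_values(rows, column):
    """Return the sorted distinct values of a column, much like a filter drop-down lists them."""
    return sorted({row[column] for row in rows})


def dummy_encode(rows, column):
    """Replace one categorical column with 0/1 indicator columns.

    The alphabetically first category is dropped and acts as the baseline,
    so the indicators are not perfectly collinear with a regression intercept.
    """
    categories = distinct_values(rows, column)
    encoded = []
    for row in rows:
        # Copy every column except the one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories[1:]:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


categorical_columns = ["store_type", "location_type", "city_type"]
encoded_rows = rows
for column in categorical_columns:
    encoded_rows = dummy_encode(encoded_rows, column)

for row in encoded_rows:
    print(row)
```

With three store sizes, this produces two indicator columns for the store type and one each for the location type and city type, while the continuous `investment` column passes through unchanged.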
58
00:05:09,060 --> 00:05:14,880
One more important thing to note here is that this is fictitious data. We have created this data

59
00:05:14,880 --> 00:05:21,630
only for practice, and you should not use this data for any professional work, and you should also not

60
00:05:21,630 --> 00:05:25,050
use the results of this data in your professional life.
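
The idea described in the lecture, predicting a store's valuation with linear regression and opening the store only if the predicted valuation exceeds the investment needed, can be sketched as follows. Everything numeric here is a made-up assumption: the three features (a commercial-location indicator, the number of competitors, and the investment in thousands of dollars), the six training rows, and the candidate store are hypothetical. The fit uses ordinary least squares via the normal equations, written in plain Python so the sketch has no dependencies; it is not the tool or method used in the course.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; check for collinear columns")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_linear_regression(features, targets):
    """Ordinary least squares via (X^T X) beta = X^T y; the intercept comes first."""
    x = [[1.0] + list(row) for row in features]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, feature_row):
    """Intercept plus the weighted sum of the feature values."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], feature_row))


# Hypothetical training data: [is_commercial, competitors, investment_in_thousands].
features = [
    [1, 2, 100.0],
    [0, 5, 80.0],
    [1, 8, 150.0],
    [0, 1, 60.0],
    [1, 4, 120.0],
    [0, 7, 90.0],
]
# Hypothetical valuations in thousands of dollars.
targets = [222.0, 150.0, 273.0, 136.0, 244.0, 157.0]

coefficients = fit_linear_regression(features, targets)

# Hypothetical candidate store that needs an investment of 110 (thousand dollars).
candidate = [1, 3, 110.0]
investment_needed = 110.0
predicted_valuation = predict(coefficients, candidate)

# Decision rule from the lecture: open only if valuation exceeds the investment.
should_open = predicted_valuation > investment_needed
print(predicted_valuation, should_open)
```

In practice the categorical columns would first be converted to indicator columns, and the model would be fitted on the full set of independent variables rather than the three used here.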