1
00:00:01,290 --> 00:00:05,850
So here is the data which we are going to use to build our regression tree.

2
00:00:08,010 --> 00:00:09,480
Let me tell you something about this data.

3
00:00:10,750 --> 00:00:13,510
In the columns, we have all the variables.

4
00:00:14,860 --> 00:00:20,870
So for this data, we have 18 columns, which means we have 18 different variables.

5
00:00:22,920 --> 00:00:25,410
The last column of this data is collection.

6
00:00:26,700 --> 00:00:33,360
This is the dependent variable, that is, this is the variable that we want to predict values for.

7
00:00:35,210 --> 00:00:40,850
And we'll be using the other 17 variables, which we will also call predictor variables,

8
00:00:41,870 --> 00:00:44,030
to predict the value of collection.

9
00:00:46,330 --> 00:00:52,630
So as you can guess from looking at the variable names, this is data of movies.

10
00:00:54,430 --> 00:00:55,540
This is simulated data.

11
00:00:55,600 --> 00:00:57,280
That is, this is not the true

12
00:00:57,500 --> 00:00:58,540
data of movies.

13
00:01:00,600 --> 00:01:04,360
In this dataset, we have variables like how much

14
00:01:05,390 --> 00:01:08,480
was the marketing expense during the making of the movie,

15
00:01:08,930 --> 00:01:10,790
how much were the production expenses,

16
00:01:12,590 --> 00:01:14,540
how many multiplexes were covered,

17
00:01:15,710 --> 00:01:17,810
what was the budget of the movie, and so on.

18
00:01:21,540 --> 00:01:26,290
And we have this data for 506 different movies.

19
00:01:26,970 --> 00:01:30,810
So in this data table, we have 506 observations.

20
00:01:31,410 --> 00:01:34,040
The observations are in the rows.

21
00:01:34,680 --> 00:01:40,650
So if you look at the number of rows, we have 507, which includes the header row.
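To make the table layout concrete, here is a minimal Python sketch (standard library only) of reading a CSV whose first row is a header and splitting it into the dependent variable and the predictors. The sample contents, the column names such as `Marketing_expense`, and the helper `split_table` are all hypothetical. The sample has 3 data rows plus 1 header row, mirroring how the real table's 507 rows are 506 observations plus the header.

```python
import csv
import io

# Hypothetical miniature version of the movie table: the column names and
# values below are made up for illustration, not taken from the real dataset.
SAMPLE_CSV = """Marketing_expense,Production_expense,Genre,3D_available,Collection
20.1,59.6,Thriller,YES,48000
20.5,69.1,Drama,NO,43200
21.0,62.4,Comedy,NO,45600
"""


def split_table(text, target="Collection"):
    """Return (predictor_names, X, y) from CSV text whose first row is a header."""
    rows = list(csv.reader(io.StringIO(text)))
    header, records = rows[0], rows[1:]  # first row holds the variable names
    t = header.index(target)             # column position of the dependent variable
    predictor_names = [name for i, name in enumerate(header) if i != t]
    # Every remaining column is a predictor; values are left as strings here.
    X = [[value for i, value in enumerate(r) if i != t] for r in records]
    y = [float(r[t]) for r in records]
    return predictor_names, X, y


names, X, y = split_table(SAMPLE_CSV)
print(len(X), "observations,", len(names), "predictors")
```

With the real table, the same split would give 506 observations, 17 predictors and one dependent variable.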
22
00:01:44,410 --> 00:01:52,940
So using this data of 506 movies, in which we already have the values of these 17 predictor variables

23
00:01:53,540 --> 00:02:00,920
and the data of how much those movies actually collected, we will be creating a model that will help

24
00:02:00,920 --> 00:02:06,530
us predict the value of collection, given the values of the other 17 variables.

25
00:02:07,400 --> 00:02:14,150
That is, if you are creating a new movie and you have the values of all these 17 variables, you can

26
00:02:14,150 --> 00:02:17,840
predict how much your movie will collect at the box office.

27
00:02:19,620 --> 00:02:26,220
Most of the variables in the dataset are quantitative, but there are two variables which have qualitative

28
00:02:26,240 --> 00:02:26,850
data.

29
00:02:28,830 --> 00:02:32,490
This 3D available column has only

30
00:02:32,550 --> 00:02:33,810
yes/no type values.

31
00:02:35,490 --> 00:02:40,000
And this Genre column has four categories,

32
00:02:40,020 --> 00:02:43,680
that is, thriller, drama, comedy and action.

33
00:02:45,300 --> 00:02:48,060
So these two are categorical variables, and

34
00:02:48,180 --> 00:02:49,980
all the others are quantitative variables.

35
00:02:51,390 --> 00:02:57,760
In the next video, we will see how to import this data into our software so that we can use it to create

36
00:02:57,760 --> 00:02:58,200
our model.
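The split between categorical and quantitative variables can be checked mechanically. Below is a minimal Python sketch, assuming a rule of our own choosing: a column counts as quantitative if every value parses as a number, and otherwise it is treated as categorical and its distinct categories are listed. The sample rows, the column names `Budget`, `3D_available`, `Genre` and `Collection`, and the helpers `is_number` and `classify_columns` are hypothetical illustrations, not the course's own code.

```python
import csv
import io

# Hypothetical sample rows; column names and values are illustrative only.
SAMPLE_CSV = """Budget,3D_available,Genre,Collection
36524.1,YES,Thriller,48000
35668.7,NO,Drama,43200
39912.7,NO,Comedy,69400
38873.9,YES,Action,66800
"""


def is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def classify_columns(text):
    """Map each column to ('quantitative', None) or ('categorical', sorted categories)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)  # the header row becomes the dict keys
    kinds = {}
    for name in reader.fieldnames:
        values = [row[name] for row in rows]
        if all(is_number(v) for v in values):
            kinds[name] = ("quantitative", None)
        else:
            kinds[name] = ("categorical", sorted(set(values)))
    return kinds


for name, (kind, levels) in classify_columns(SAMPLE_CSV).items():
    print(name, kind, levels or "")
```

The numeric-parse test is only a heuristic: a categorical variable stored as numeric codes would be reported as quantitative, so the result should still be checked against what each column means.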