1
00:00:01,290 --> 00:00:05,850
So here is the data which we are going to use to build our regression tree.

2
00:00:08,010 --> 00:00:09,480
Let me tell you something about this data.

3
00:00:10,750 --> 00:00:13,510
In the columns, we have all the variables.

4
00:00:14,860 --> 00:00:20,870
So for this data, we have 18 columns, which means we have 18 different variables.

5
00:00:22,920 --> 00:00:25,410
The last column of this data is collection.

6
00:00:26,700 --> 00:00:33,360
This is the dependent variable; that is, this is the variable that we want to predict values for.

7
00:00:35,210 --> 00:00:40,850
And we'll be using the other 17 variables, which we will also call predictor variables,

8
00:00:41,870 --> 00:00:44,030
to predict the value of collection.

9
00:00:46,330 --> 00:00:52,630
So as you can guess from looking at the variable names, this is data on movies.

10
00:00:54,430 --> 00:00:55,540
This is simulated data.

11
00:00:55,600 --> 00:00:57,280
That is, this is not the true

12
00:00:57,500 --> 00:00:58,540
data of movies.

13
00:01:00,600 --> 00:01:04,360
In this dataset, we have variables like: how much

14
00:01:05,390 --> 00:01:08,480
was the marketing expense during the making of the movie?

15
00:01:08,930 --> 00:01:10,790
How much were the production expenses?

16
00:01:12,590 --> 00:01:14,540
How many multiplexes were covered?

17
00:01:15,710 --> 00:01:17,810
What was the budget of the movie? And so on.

18
00:01:21,540 --> 00:01:26,290
And we have this data for 506 different movies.

19
00:01:26,970 --> 00:01:30,810
So in this data table, we have 506 observations.

20
00:01:31,410 --> 00:01:34,040
The observations are in the rows.

21
00:01:34,680 --> 00:01:40,650
So if you look at the number of rows, we have 507, which includes the header row.

22
00:01:44,410 --> 00:01:52,940
So using this data of 506 movies, in which we already have the data of these 17 predictor variables

23
00:01:53,540 --> 00:02:00,920
and the data of how much those movies actually collected, we will be creating a model that will help

24
00:02:00,920 --> 00:02:06,530
us predict the value of collection, given the values of the other 17 variables.

25
00:02:07,400 --> 00:02:14,150
That is, if you are creating a new movie and you have the values of all these 17 variables, you can

26
00:02:14,150 --> 00:02:17,840
predict how much your movie will collect at the box office.

27
00:02:19,620 --> 00:02:26,220
Most of the variables in the dataset are quantitative, but there are two variables which have qualitative

28
00:02:26,240 --> 00:02:26,850
data.

29
00:02:28,830 --> 00:02:32,490
This 3D available column has only

30
00:02:32,550 --> 00:02:33,810
yes/no type values.

31
00:02:35,490 --> 00:02:40,000
And this genre column has four categories.

32
00:02:40,020 --> 00:02:43,680
That is, thriller, drama, comedy, and action.

33
00:02:45,300 --> 00:02:48,060
So these two are categorical variables, and

34
00:02:48,180 --> 00:02:49,980
all the others are quantitative variables.

35
00:02:51,390 --> 00:02:57,760
In the next video, we will see how to import this data into our software so that we can use it to create

36
00:02:57,760 --> 00:02:58,200
our model.