1
00:00:00,750 --> 00:00:06,840
Now, usually we do not train our model on all the available data.

2
00:00:07,710 --> 00:00:12,650
We take a small portion of our available data to test

3
00:00:12,680 --> 00:00:13,410
our model.

4
00:00:15,500 --> 00:00:22,850
This small dataset will help us estimate how our model will perform on real-life data,

5
00:00:23,690 --> 00:00:27,010
that is, data which was not used to train our model.

6
00:00:29,160 --> 00:00:37,940
So in practice, we generally use 80 percent of our available data for training, and we take 20 percent of our available

7
00:00:37,950 --> 00:00:39,900
data as testing data.

8
00:00:41,060 --> 00:00:48,470
We then test our models on this data and compare the performance of our different models on that

9
00:00:48,510 --> 00:00:53,100
test data to evaluate how they will perform on real-life data.
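The hold-out idea above can be sketched in a few lines of plain Python. This is a hand-rolled illustration, not scikit-learn's implementation; the function name `holdout_split` and the 506-element stand-in list are assumptions made for the example. The test count is rounded up, which matches what scikit-learn does for a float `test_size`.

```python
import math
import random


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle row positions and hold out a fraction of them for testing.

    Hypothetical helper for illustration; not scikit-learn's implementation.
    """
    n_test = math.ceil(test_fraction * len(rows))  # round the test count up
    positions = list(range(len(rows)))
    random.Random(seed).shuffle(positions)  # seeded shuffle, so it is repeatable
    test_pos = set(positions[:n_test])
    train = [r for i, r in enumerate(rows) if i not in test_pos]
    test = [r for i, r in enumerate(rows) if i in test_pos]
    return train, test


rows = list(range(506))  # stand-in for the 506 rows used in the lecture
train, test = holdout_split(rows)
print(len(train), len(test))  # 404 102
```

Every row ends up in exactly one of the two lists, and the model would only ever be fitted on `train`.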

10
00:00:54,600 --> 00:00:57,820
So we have five hundred and six rows.

11
00:00:58,440 --> 00:01:00,030
That is our available data.

12
00:01:01,300 --> 00:01:09,250
We will keep 20 percent of this data as our test data and we will train our model on the 80 percent

13
00:01:09,260 --> 00:01:10,040
of this data.

14
00:01:11,030 --> 00:01:17,210
This separation of data into test and train parts is known as the train-test split.

15
00:01:19,050 --> 00:01:27,480
And it is very easy to perform the train-test split using scikit-learn. First we import

16
00:01:27,480 --> 00:01:31,260
train_test_split from sklearn.model_selection.

17
00:01:35,870 --> 00:01:40,240
Now, this train_test_split method

18
00:01:41,270 --> 00:01:50,000
takes these parameters: our X values, our y values, and the size of our test data.

19
00:01:50,660 --> 00:01:57,230
Since, as I told you, we usually keep 20 percent of the data as test data, you can provide

20
00:01:57,230 --> 00:01:58,310
zero point two here.
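A minimal call matching that description might look like this. The tiny NumPy arrays are hypothetical placeholders for the lecture's X and y; only `train_test_split` and its `test_size` argument come from the lecture.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 rows, 2 feature columns, one target value per row.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# X and y are passed positionally; test_size=0.2 holds out 20% of the rows.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

Without a `random_state`, which rows land in the test set can change from run to run; only the sizes are fixed.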

21
00:01:59,450 --> 00:02:03,020
Then there is another parameter, random_state.

22
00:02:04,570 --> 00:02:12,730
Since we are randomly assigning our data into test and train sets, to get the same test data every time

23
00:02:12,940 --> 00:02:18,940
so that we can compare the performance of our models, we can use this random_state variable.

24
00:02:19,600 --> 00:02:21,190
This is just a random number.

25
00:02:21,310 --> 00:02:26,320
You can take 0, 1, or any other value you want.

26
00:02:28,460 --> 00:02:31,310
The advantage of using this random number seed is this:

27
00:02:32,950 --> 00:02:40,270
If I keep this random state constant throughout my analysis, I will get the exact same split of

28
00:02:40,350 --> 00:02:41,050
test and train.

29
00:02:42,290 --> 00:02:49,130
So even if you are running your train-test split with random_state equal to zero, you will get the same

30
00:02:49,130 --> 00:02:50,750
train-test split as me.

31
00:02:51,290 --> 00:02:59,130
For example, suppose you have 10 rows, and your third and fourth rows are going into the test set.

32
00:02:59,840 --> 00:03:03,500
And the remaining eight rows are going into the train dataset.

33
00:03:04,600 --> 00:03:13,490
If you keep that random_state constant, you will always get your third and fourth values in the test set and

34
00:03:13,490 --> 00:03:15,700
the rest of the values in your train set.
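The 10-row example can be checked directly: splitting the same hypothetical 10-row array three times with the same `random_state` should put the same two rows in the test set every time. The arrays and the seed value 0 are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape(10, 1)  # 10 hypothetical rows with values 0..9
y = np.arange(10)

# Split the same data three times with the same fixed random_state.
test_sets = []
for _ in range(3):
    _, X_test, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)
    test_sets.append(sorted(X_test.ravel().tolist()))

# With random_state fixed, the same two rows land in the test set on every run.
print(test_sets)
```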

35
00:03:17,560 --> 00:03:25,600
This will help us compare the performance of our models across different methods and keep the

36
00:03:25,750 --> 00:03:28,420
output of our model consistent.
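As a sketch of comparing methods on one fixed split, the snippet below fits two scikit-learn regressors on the same seeded train set and scores both on the same test set. The synthetic data from `make_regression` and the choice of `LinearRegression` and `DecisionTreeRegressor` are assumptions for illustration, not necessarily the models used in this course.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the lecture's dataset (an assumption).
X, y = make_regression(n_samples=506, n_features=5, noise=10.0, random_state=1)

# One fixed split, reused for every model being compared.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

scores = {}
for name, model in [
    ("linear", LinearRegression()),
    ("tree", DecisionTreeRegressor(random_state=0)),
]:
    model.fit(X_train, y_train)
    # For regressors, score() returns R^2 on the held-out test rows.
    scores[name] = model.score(X_test, y_test)

print(scores)
```

Because both models see identical train and test rows, a difference in their scores reflects the models rather than a lucky or unlucky split.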

37
00:03:30,760 --> 00:03:34,590
So always stick to a single value of random_state.

38
00:03:34,950 --> 00:03:36,540
Don't change this value.

39
00:03:37,390 --> 00:03:42,610
Stick to the number of your choice. If you want to get the same split as me,

40
00:03:43,480 --> 00:03:45,700
set random_state equal to zero.

41
00:03:47,660 --> 00:03:51,590
Now we get four outputs from this function.

42
00:03:52,940 --> 00:03:55,700
The first output is our X train.

43
00:03:56,450 --> 00:03:59,960
So I have named my variable X_train.

44
00:04:00,740 --> 00:04:04,190
The second output is your test X data.

45
00:04:05,000 --> 00:04:07,200
I have named it X_test.

46
00:04:07,970 --> 00:04:11,690
Then we have y_train and y_test.
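With pandas inputs, the four outputs come back as data frames and series in the order X_train, X_test, y_train, y_test. The small data frame and series below are hypothetical stand-ins for the lecture's X and y.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the lecture's X data frame and y target.
X = pd.DataFrame({"feature_a": np.arange(10), "feature_b": np.arange(10) * 2.0})
y = pd.Series(np.arange(10) * 3.0, name="target")

# The outputs come back in this order: X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Each y split lines up row-for-row with its X split (same index labels).
print(X_train.index.equals(y_train.index), X_test.index.equals(y_test.index))  # True True
```

Unpacking in the wrong order is an easy mistake, since nothing stops you from naming the second output `y_train`; checking that the indexes line up is a quick sanity test.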

47
00:04:12,560 --> 00:04:14,210
So if I run this,

48
00:04:17,880 --> 00:04:19,590
I have four more variables.

49
00:04:20,790 --> 00:04:22,050
I can check the head

50
00:04:22,220 --> 00:04:23,360
of X_train

51
00:04:23,850 --> 00:04:25,890
and the shape of X_train too.

52
00:04:35,980 --> 00:04:41,960
You can see this looks exactly the same as a sample from our X data frame.

53
00:04:42,880 --> 00:04:44,530
We don't have any y variable here.

54
00:04:44,560 --> 00:04:47,140
We only have X.

55
00:04:47,920 --> 00:04:51,850
And one thing to notice here is the index.

56
00:04:52,750 --> 00:04:59,680
You can see our index values are shuffled, since some of the rows are going into the test data

57
00:04:59,830 --> 00:05:08,140
and the rest of the observations are going into our X_train data. We can check the shape of our X_train.
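The shuffled index can be reproduced on a hypothetical 506-row data frame with a default 0 to 505 index (the column names and values are made up): the rows in `X_train` keep their original labels, so the index comes out of order.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 506-row data frame with a default 0..505 index.
X = pd.DataFrame({"feature_a": np.arange(506), "feature_b": np.arange(506) % 7})
y = pd.Series(np.arange(506, dtype=float))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# head() shows the original row labels, now in shuffled order.
print(X_train.head())
print(X_train.index.is_monotonic_increasing)  # False
```

If a clean 0-based index is needed later, `X_train.reset_index(drop=True)` renumbers the rows, but the original labels are what let you trace a row back to the full data frame.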

58
00:05:11,060 --> 00:05:19,250
This should contain 80 percent of all the values, that is, 506 times 0.8, which is 404.8; since the test count

59
00:05:19,250 --> 00:05:20,770
of 101.2 is rounded up to 102, that comes out to be four hundred and four.

60
00:05:20,930 --> 00:05:27,860
So we have four hundred and four values in our X_train, and we have the rest of the values in our X_test.

61
00:05:35,110 --> 00:05:40,620
So one hundred and two values in test and four hundred and four values in train. In total,

62
00:05:40,810 --> 00:05:47,950
we have 506 observations. We will use this X_train to create our model, and we will use X_test

63
00:05:49,220 --> 00:05:51,170
to evaluate the performance of our model.

64
00:05:53,200 --> 00:05:57,250
Similarly, you can check the shape of your y_train and y_test.

65
00:05:58,550 --> 00:06:04,040
The number of observations should be the same as in X_train and X_test, respectively.
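The shape checks can be sketched on random stand-in data with 506 rows; the three feature columns and the target are hypothetical, so only the row counts carry over to the lecture's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 506-row data: 3 feature columns and one target column.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(506, 3)), columns=["f1", "f2", "f3"])
y = pd.Series(rng.normal(size=506), name="target")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 0.2 * 506 = 101.2 is rounded up to 102 test rows; the other 404 are for training.
print(X_train.shape, X_test.shape)  # (404, 3) (102, 3)
print(y_train.shape, y_test.shape)  # (404,) (102,)
```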