1
00:00:02,560 --> 00:00:03,280
Hello, everyone.

2
00:00:03,820 --> 00:00:10,600
In this video, we will learn how to split our available data into test and train sets.

3
00:00:13,680 --> 00:00:23,590
Then we will train our model on the training set and we'll find the R-squared on our test set. To create

4
00:00:23,590 --> 00:00:24,660
a train-test split,

5
00:00:24,950 --> 00:00:27,360
first, we need to import the function

6
00:00:27,440 --> 00:00:29,580
train_test_split from sklearn.

7
00:00:30,290 --> 00:00:34,080
So we'll write from sklearn.model_selection

8
00:00:43,660 --> 00:00:44,440
import

9
00:00:48,050 --> 00:00:49,610
train_test_split.

10
00:00:58,490 --> 00:00:59,900
Now, to use this function,

11
00:01:00,740 --> 00:01:04,220
we first need to define four variables.

12
00:01:04,490 --> 00:01:07,040
That is, our independent train variable,

13
00:01:07,490 --> 00:01:15,900
our independent test variable, then our dependent train variable, and then the dependent test variable.

14
00:01:17,030 --> 00:01:19,760
So we'll create these four variables.

15
00:01:20,300 --> 00:01:24,220
We'll write X_train,

16
00:01:28,110 --> 00:01:30,060
we'll write X_test,

17
00:01:34,530 --> 00:01:35,730
y_train,

18
00:01:38,900 --> 00:01:40,460
and y_test.

19
00:01:41,970 --> 00:01:51,330
These are our four dataframes, and these will get the values of the output of our train_test_split

20
00:01:51,330 --> 00:01:55,410
function. So we'll write train_test_split.

21
00:02:03,080 --> 00:02:09,230
And in the brackets, we'll first mention our independent variable, which is X_multi.

22
00:02:13,460 --> 00:02:20,420
Then our dependent variable, which is y_multi. If you remember, we created these variables in our last lecture.

23
00:02:22,190 --> 00:02:25,070
Then the next parameter is test_size.

24
00:02:27,330 --> 00:02:34,680
As mentioned in our theory lecture, we are splitting our data in an eighty-twenty ratio.

25
00:02:34,980 --> 00:02:41,620
So eighty percent of the data will go into the training set and twenty percent of the data will go into the

26
00:02:41,620 --> 00:02:42,100
test set.

27
00:02:42,900 --> 00:02:45,960
That's why here we will mention zero point two.

28
00:02:47,830 --> 00:02:53,020
This zero point two means twenty percent of the data will go into the test set.

29
00:02:55,470 --> 00:02:58,360
And the last parameter is random_state.

30
00:02:58,800 --> 00:03:01,950
This is a random number. You can give any integer value.

31
00:03:03,140 --> 00:03:08,100
We are providing this number just to get the same sample every time.

32
00:03:08,620 --> 00:03:14,580
So if I mention random_state equal to one every time while running this train-test split,

33
00:03:15,980 --> 00:03:20,020
every time my training and test sets will remain the same.

34
00:03:20,830 --> 00:03:28,180
So if you are using the same random_state as we are using, you will also get the same training

35
00:03:28,180 --> 00:03:28,530
set.

36
00:03:29,530 --> 00:03:34,180
So we'll just mention random_state equals zero.

37
00:03:35,880 --> 00:03:41,100
If you want to get the same training set, you should also mention random_state as zero.

38
00:03:44,080 --> 00:03:49,420
Now, just to check the number of rows and columns in our training and test sets, we'll write print

39
00:03:51,700 --> 00:03:52,120
X

40
00:03:53,280 --> 00:03:57,120
_train.shape. Dot shape

41
00:03:57,350 --> 00:04:00,480
will give me the number of rows and columns.

42
00:04:20,520 --> 00:04:24,830
Now you can see eighty percent of our data is in the training set.

43
00:04:25,770 --> 00:04:33,210
So four hundred and four observations are in our training set and the remaining one hundred and two observations

44
00:04:33,480 --> 00:04:34,490
are in the test set.
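
The split described above can be sketched roughly as follows. This is only an illustration: X_multi and y_multi here are randomly generated stand-ins with 506 rows and 3 columns (an assumption, since the real variables come from the previous lecture), so only the shapes are meant to match what the video shows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 506 rows, 3 predictors.
# This is NOT the dataset used in the lecture.
rng = np.random.default_rng(0)
X_multi = rng.normal(size=(506, 3))
y_multi = X_multi @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=506)

# test_size=0.2 sends 20 percent of the rows to the test set;
# random_state=0 makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X_multi, y_multi, test_size=0.2, random_state=0
)

# Number of rows and columns in each piece.
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

# Running the split again with the same random_state gives the same rows.
X_train_again, _, _, _ = train_test_split(
    X_multi, y_multi, test_size=0.2, random_state=0
)
same_split = np.array_equal(X_train, X_train_again)
```

With 506 rows, this gives 404 training rows and 102 test rows, the same counts mentioned in the video.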

45
00:04:39,860 --> 00:04:46,660
Now, we will follow the standard process of creating a linear regression model. We'll first create an

46
00:04:46,660 --> 00:04:47,170
object.

47
00:04:48,220 --> 00:04:51,160
This time we will name it lm_a.

48
00:04:56,660 --> 00:04:58,670
And we'll equate it to LinearRegression.

49
00:05:10,050 --> 00:05:17,950
Now we will train our model on our training set, that is X_train and y_train. We'll write

50
00:05:21,030 --> 00:05:29,660
lm_a.fit, then we will mention our training set, which is X_train and y

51
00:05:29,660 --> 00:05:30,330
_train.

52
00:05:40,060 --> 00:05:43,900
This statement will fit our model on our training set.

53
00:05:46,620 --> 00:05:51,810
Now, let's create the predicted values of y using this model.

54
00:05:53,220 --> 00:05:53,950
So we will write

55
00:05:54,240 --> 00:05:55,710
y_test

56
00:06:00,000 --> 00:06:01,050
_a

57
00:06:03,560 --> 00:06:07,240
equal to lm_a dot

58
00:06:07,280 --> 00:06:07,760
predict.

59
00:06:12,730 --> 00:06:19,360
Here I am predicting y, that is, the dependent variable, so I will give my test independent variable.

60
00:06:20,140 --> 00:06:20,860
So I will write

61
00:06:23,840 --> 00:06:25,050
X_test.

62
00:06:29,000 --> 00:06:29,800
By running this,

63
00:06:33,460 --> 00:06:40,270
I have my predicted values for the test set, y_test_a. Similarly, we will create y

64
00:06:40,300 --> 00:06:44,260
_train_a to get the predicted values

65
00:06:45,370 --> 00:06:46,450
of our training set.

66
00:06:55,850 --> 00:06:58,850
This time, we will use X_train.
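
Putting the fitting and prediction steps together, a minimal sketch might look like this. The names lm_a, y_test_a and y_train_a follow the video. The data is again a randomly generated stand-in (an assumption), so the fitted coefficients will not match the lecture's.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data, not the lecture's dataset.
rng = np.random.default_rng(0)
X_multi = rng.normal(size=(506, 3))
y_multi = X_multi @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=506)

X_train, X_test, y_train, y_test = train_test_split(
    X_multi, y_multi, test_size=0.2, random_state=0
)

# Create the model object and fit it on the training set only.
lm_a = LinearRegression()
lm_a.fit(X_train, y_train)

# Predicted y values for the test set and for the training set.
y_test_a = lm_a.predict(X_test)
y_train_a = lm_a.predict(X_train)

print(y_test_a.shape, y_train_a.shape)
```

The test rows are never passed to fit, which is what lets the test predictions say something about unseen data.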

67
00:07:06,080 --> 00:07:11,240
Now, to check the R-squared value for our training and test data,

68
00:07:12,080 --> 00:07:13,850
We will import another function.

69
00:07:17,260 --> 00:07:21,010
We'll write from sklearn.metrics import,

70
00:07:31,030 --> 00:07:32,500
import r2

71
00:07:32,600 --> 00:07:33,640
_score.

72
00:07:37,800 --> 00:07:44,850
You don't have to learn all this syntax. You can just save a copy of this notebook or you can search

73
00:07:45,450 --> 00:07:45,990
online.

74
00:07:46,090 --> 00:07:48,420
The syntax is readily available online.

75
00:07:51,140 --> 00:07:57,470
Now, to get the R-squared value, we just need to mention r2_score.

76
00:07:58,130 --> 00:07:59,950
This is the function we imported.

77
00:08:02,640 --> 00:08:11,080
If we want to get more detail about this function, we can just search it in help by using the question mark

78
00:08:11,080 --> 00:08:11,520
operator.

79
00:08:12,020 --> 00:08:13,670
You can just write a question mark,

80
00:08:14,050 --> 00:08:14,820
and if we run it,

81
00:08:17,540 --> 00:08:19,210
We will get all the details.

82
00:08:19,690 --> 00:08:21,730
So here, if you see.

83
00:08:23,230 --> 00:08:24,250
in this syntax,

84
00:08:25,660 --> 00:08:32,770
we need to mention our y_true, which is the original y values, then we need to mention

85
00:08:32,890 --> 00:08:34,390
the y predicted values.
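
To make the roles of the two arguments concrete, here is a small plain-Python sketch of the usual R-squared definition, one minus the residual sum of squares over the total sum of squares. The helper name r_squared and the numbers are made up for illustration. It is not scikit-learn's own code, only the same formula for a single target.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for two equal-length sequences."""
    mean_true = sum(y_true) / len(y_true)
    # Squared errors between the actual and the predicted values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared deviations of the actual values from their own mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]  # made-up actual values
y_pred = [2.5, 5.5, 7.0, 9.5]  # made-up predictions

score = r_squared(y_true, y_pred)
print(score)
```

The order matters: the total sum of squares is computed from the first argument, so the original values go first and the predictions second.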

86
00:08:35,360 --> 00:08:38,200
We'll do the same and just close this.

87
00:08:40,350 --> 00:08:42,740
Write r2_score.

88
00:08:46,080 --> 00:08:47,720
And in the brackets, we'll first write

89
00:08:48,840 --> 00:08:50,160
y underscore

90
00:08:54,410 --> 00:08:54,690
test.

91
00:09:00,110 --> 00:09:06,440
So y_test is our original value and y_test_a is the predicted

92
00:09:06,440 --> 00:09:07,420
value of y test.

93
00:09:07,970 --> 00:09:09,290
So we'll just run this.

94
00:09:12,690 --> 00:09:15,810
So the R-squared value is zero point five four.

95
00:09:17,240 --> 00:09:22,070
Now, let's get the R-squared value for our training set.

96
00:09:35,120 --> 00:09:39,980
We run this. The R-squared value for our training set is zero point seven five.

97
00:09:41,720 --> 00:09:48,950
You can also see that the R-squared value for our test data is less than the R-squared value

98
00:09:48,950 --> 00:09:49,850
for our training set.

99
00:09:52,010 --> 00:09:59,690
As we discussed in the theory lectures, the test R-squared value is of more importance as compared

100
00:09:59,690 --> 00:10:00,560
to the training set.

101
00:10:01,930 --> 00:10:09,550
And we should always look at our test R-squared instead of training R-squared values to evaluate the performance

102
00:10:09,610 --> 00:10:10,330
of our model.
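
The comparison of training and test R-squared can be sketched end to end as below. The data is a randomly generated stand-in (an assumption), so the printed scores will differ from the values seen in the video. The point is only the pattern of scoring both sets with r2_score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data, not the lecture's dataset.
rng = np.random.default_rng(0)
X_multi = rng.normal(size=(506, 3))
y_multi = X_multi @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=506)

X_train, X_test, y_train, y_test = train_test_split(
    X_multi, y_multi, test_size=0.2, random_state=0
)

lm_a = LinearRegression()
lm_a.fit(X_train, y_train)

# r2_score takes the original values first, then the predictions.
r2_test = r2_score(y_test, lm_a.predict(X_test))
r2_train = r2_score(y_train, lm_a.predict(X_train))

print("test R-squared:", r2_test)
print("train R-squared:", r2_train)
```

On real data the test score is often lower than the training score, as in the video. On a simple synthetic example like this one the two can be very close, and the test score may even come out slightly higher.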

103
00:10:13,020 --> 00:10:17,850
That's how you split your data into test and train sets in Python.