1 00:00:02,560 --> 00:00:03,280 Hello, everyone. 2 00:00:03,820 --> 00:00:10,600 In this video, we will learn how to split our available data into test and train sets. 3 00:00:13,680 --> 00:00:23,590 Then we will train our model on the training set and find the R-squared on our test set. To create 4 00:00:23,590 --> 00:00:24,660 a train-test split, 5 00:00:24,950 --> 00:00:27,360 first we need to import the function 6 00:00:27,440 --> 00:00:29,580 train_test_split from sklearn. 7 00:00:30,290 --> 00:00:34,080 So we'll write from sklearn.model_selection 8 00:00:43,660 --> 00:00:44,440 import 9 00:00:48,050 --> 00:00:49,610 train_test_split. 10 00:00:58,490 --> 00:00:59,900 Now, to use this function, 11 00:01:00,740 --> 00:01:04,220 we first need to define our four variables. 12 00:01:04,490 --> 00:01:07,040 That is, our independent train variable, 13 00:01:07,490 --> 00:01:15,900 our independent test variable, then our dependent train variable, and then our dependent test variable. 14 00:01:17,030 --> 00:01:19,760 So to create these four variables, 15 00:01:20,300 --> 00:01:24,220 we'll write X_train, 16 00:01:28,110 --> 00:01:30,060 we'll write X_test, 17 00:01:34,530 --> 00:01:35,730 y_train, 18 00:01:38,900 --> 00:01:40,460 and y_test. 19 00:01:41,970 --> 00:01:51,330 These are our four dataframes, and they will get the values of the output of our train_test_split 20 00:01:51,330 --> 00:01:55,410 function. So we'll write train_test_split. 21 00:02:03,080 --> 00:02:09,230 And in the bracket, we'll first mention our independent variable, which is x_multi. 22 00:02:13,460 --> 00:02:20,420 Then our dependent variable, which is y_multi. If you remember, we created these variables in our last lecture. 23 00:02:22,190 --> 00:02:25,070 Then the next parameter is test_size. 24 00:02:27,330 --> 00:02:34,680 As mentioned in our theory lecture, we are splitting our data in an 80:20 ratio.
25 00:02:34,980 --> 00:02:41,620 So our 80 percent of data will go into the training set and our 20 percent of data will go into the 26 00:02:41,620 --> 00:02:42,100 test set. 27 00:02:42,900 --> 00:02:45,960 That's why here we will mention 0.2. 28 00:02:47,830 --> 00:02:53,020 This 0.2 means 20 percent of data will go into the test set. 29 00:02:55,470 --> 00:02:58,360 And the last parameter is random_state. 30 00:02:58,800 --> 00:03:01,950 This is the random number. You can give any integer value. 31 00:03:03,140 --> 00:03:08,100 We are providing this number just to get the same sample every time. 32 00:03:08,620 --> 00:03:14,580 So if I mention random_state equal to one every time while running this train_test_split, 33 00:03:15,980 --> 00:03:20,020 every time my training and test sets will remain the same. 34 00:03:20,830 --> 00:03:28,180 So if you are using the same random_state as we are using, you will also get the same training 35 00:03:28,180 --> 00:03:28,530 set. 36 00:03:29,530 --> 00:03:34,180 So we'll just mention random_state equal to zero. 37 00:03:35,880 --> 00:03:41,100 If you want to get the same training set, you should also mention random_state as zero. 38 00:03:44,080 --> 00:03:49,420 Now, just to check the number of rows and columns in our training and test sets, we'll write print, 39 00:03:51,700 --> 00:03:52,120 X 40 00:03:53,280 --> 00:03:57,120 underscore train dot shape. Dot shape 41 00:03:57,350 --> 00:04:00,480 will give me the number of rows and columns. 42 00:04:20,520 --> 00:04:24,830 Now you can see 80 percent of our data is in the training set. 43 00:04:25,770 --> 00:04:33,210 So 404 observations are in our training set and the remaining 102 observations 44 00:04:33,480 --> 00:04:34,490 are in the test set. 45 00:04:39,860 --> 00:04:46,660 Now we will follow the standard process of creating a linear regression model. We will first create an 46 00:04:46,660 --> 00:04:47,170 object.
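To make the roles of the test-size fraction and the fixed random seed concrete, here is a minimal plain-Python sketch of a seeded split. It is not scikit-learn's implementation: `simple_split` is a hypothetical helper, the 506-row list is stand-in data, and rounding the test count up is an assumption chosen so that the counts match the 404 and 102 observations reported in the video.

```python
import math
import random


def simple_split(rows, test_size=0.2, seed=0):
    """Hypothetical helper: shuffle row positions with a fixed seed, then cut."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)        # same seed -> same shuffle order
    n_test = math.ceil(len(rows) * test_size)   # assumption: round the test count up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(506))   # stand-in for 506 observations (404 + 102)
train_rows, test_rows = simple_split(rows, test_size=0.2, seed=0)
print(len(train_rows), len(test_rows))   # 404 102

# Re-running with the same seed reproduces exactly the same split.
print(simple_split(rows, 0.2, seed=0) == (train_rows, test_rows))   # True
```

Changing the seed gives a different, but still reproducible, 404/102 split, which is why sharing the same seed value lets two people work with identical training sets.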
47 00:04:48,220 --> 00:04:51,160 This time we will name it lm_a. 48 00:04:56,660 --> 00:04:58,670 And we'll equate it to LinearRegression. 49 00:05:10,050 --> 00:05:17,950 Now we will train our model on our training set, that is X_train and y_train. We'll write 50 00:05:21,030 --> 00:05:29,660 lm_a dot fit, then we will mention our training set, which is X_train and y 51 00:05:29,660 --> 00:05:30,330 underscore train. 52 00:05:40,060 --> 00:05:43,900 This statement will fit our model on our training set. 53 00:05:46,620 --> 00:05:51,810 Now, let's create the predicted values of y using this model. 54 00:05:53,220 --> 00:05:53,950 So we will write 55 00:05:54,240 --> 00:05:55,710 y underscore test 56 00:06:00,000 --> 00:06:01,050 underscore a 57 00:06:03,560 --> 00:06:07,240 equal to lm_a dot 58 00:06:07,280 --> 00:06:07,760 predict. 59 00:06:12,730 --> 00:06:19,360 Here I am predicting y, that is the dependent variable, so I will give my test independent variable. 60 00:06:20,140 --> 00:06:20,860 So I will write 61 00:06:23,840 --> 00:06:25,050 X underscore test. 62 00:06:29,000 --> 00:06:29,800 Let's run this. 63 00:06:33,460 --> 00:06:40,270 I have my predicted values of the test set, y_test_a. Similarly, we will create y 64 00:06:40,300 --> 00:06:44,260 underscore train underscore a to get the predicted values 65 00:06:45,370 --> 00:06:46,450 of our training set. 66 00:06:55,850 --> 00:06:58,850 This time, we will use X_train. 67 00:07:06,080 --> 00:07:11,240 Now, to check the R-squared value for our training and test data, 68 00:07:12,080 --> 00:07:13,850 we will import another function. 69 00:07:17,260 --> 00:07:21,010 We'll write from sklearn.metrics import 70 00:07:31,030 --> 00:07:32,500 import r2 71 00:07:32,600 --> 00:07:33,640 underscore score.
72 00:07:37,800 --> 00:07:44,850 You don't have to learn all this syntax. You can just save a copy of this notebook or you can search 73 00:07:45,450 --> 00:07:45,990 online. 74 00:07:46,090 --> 00:07:48,420 The syntax is readily available online. 75 00:07:51,140 --> 00:07:57,470 Now, to get the R-squared value, we just need to mention r2_score. 76 00:07:58,130 --> 00:07:59,950 This is the function we imported. 77 00:08:02,640 --> 00:08:11,080 If we want to get more detail about this function, we can just search it in help by using the question mark 78 00:08:11,080 --> 00:08:11,520 operator. 79 00:08:12,020 --> 00:08:13,670 You can just write a question mark, 80 00:08:14,050 --> 00:08:14,820 and if we run it, 81 00:08:17,540 --> 00:08:19,210 we will get all the details. 82 00:08:19,690 --> 00:08:21,730 So here, if you see 83 00:08:23,230 --> 00:08:24,250 in this syntax, 84 00:08:25,660 --> 00:08:32,770 we need to mention our y_true, which is the original y values, then we need to mention 85 00:08:32,890 --> 00:08:34,390 the y predicted values. 86 00:08:35,360 --> 00:08:38,200 We'll do the same, and just close this. 87 00:08:40,350 --> 00:08:42,740 Write r2 underscore score. 88 00:08:46,080 --> 00:08:47,720 And in the bracket, we'll first write 89 00:08:48,840 --> 00:08:50,160 y underscore 90 00:08:54,410 --> 00:08:54,690 test. 91 00:09:00,110 --> 00:09:06,440 So y_test is our original value and y_test_a is the predicted 92 00:09:06,440 --> 00:09:07,420 value of y_test. 93 00:09:07,970 --> 00:09:09,290 So we'll just run this. 94 00:09:12,690 --> 00:09:15,810 So the R-squared value is 0.54. 95 00:09:17,240 --> 00:09:22,070 Now, let's get the R-squared value for our training set. 96 00:09:35,120 --> 00:09:39,980 We'll run this. The R-squared value for our training set is 0.75.
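For intuition about what the score measures, R-squared can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is plain Python, not the library function: `r_squared` is a hypothetical helper and the numbers are made up for illustration, not taken from the video's data.

```python
def r_squared(y_true, y_pred):
    """R-squared = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]   # original y values
y_pred = [2.5, 5.5, 7.5, 8.5]   # predicted y values

# Residual sum of squares is 1.0 and total sum of squares is 20.0,
# so the score is 1 - 1/20, roughly 0.95.
print(round(r_squared(y_true, y_pred), 4))
```

The argument order matters: the original values come first and the predictions second, the same order used in the video. Swapping them changes the total sum of squares and therefore the score.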
97 00:09:41,720 --> 00:09:48,950 You can also see that the R-squared value for our test data is less than the R-squared value 98 00:09:48,950 --> 00:09:49,850 for our training set. 99 00:09:52,010 --> 00:09:59,690 As we discussed in our theory lectures, the test R-squared value is of more importance as compared 100 00:09:59,690 --> 00:10:00,560 to the training set's. 101 00:10:01,930 --> 00:10:09,550 And we should always look at our test R-squared instead of training R-squared values to evaluate the performance 102 00:10:09,610 --> 00:10:10,330 of our model. 103 00:10:13,020 --> 00:10:17,850 That's how you split your data into test and train in Python.
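The steps dictated in this video can be collected into one short script. This is a sketch under stated assumptions: the course dataset is not part of this transcript, so `x_multi` and `y_multi` are replaced here by synthetic stand-in data with 506 rows, and the `LinearRegression` import is assumed to come from an earlier lecture. The R-squared values it prints will therefore differ from the 0.54 and 0.75 seen in the video.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 506 rows, 3 features, linear signal plus noise.
rng = np.random.default_rng(0)
x_multi = rng.normal(size=(506, 3))
y_multi = x_multi @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=506)

# 80:20 split; random_state=0 makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    x_multi, y_multi, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)   # (404, 3) (102, 3)

# Fit the model on the training set only.
lm_a = LinearRegression()
lm_a.fit(X_train, y_train)

# Predicted values for the test set and the training set.
y_test_a = lm_a.predict(X_test)
y_train_a = lm_a.predict(X_train)

# Original values first, predicted values second.
print("test R-squared: ", r2_score(y_test, y_test_a))
print("train R-squared:", r2_score(y_train, y_train_a))
```

The test score is the one to report when judging the model, since it is computed on observations the model never saw during fitting.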