In this lecture, we will learn how to split our dataset into two parts: a training dataset and a test dataset. Then we will train our logistic regression model on the training dataset, and we will evaluate the performance of that model using the test dataset.

I have already written all of the code, but I would encourage you to write all of this code on your own while you are practicing.

First, to split our data into train and test sets, we need the function train_test_split, and you can import this function from sklearn.model_selection.

The output of this function is in the form of four variables. The first variable should be your independent train data, the second variable should be your independent test data, and then your dependent train data and your dependent test data. There are four arguments for this function. The first one should be the independent variables from your original dataframe. We have already created the X variable; therefore, we are using X. The second argument here is the dependent variable. We have already created y; that's why we are using y. The next argument is for the test size. As we discussed in our theory lecture, the ideal split for train and test is 80 percent for train and 20 percent for test. Here you can mention the portion of your whole dataset you want in your test data.
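The split described above can be sketched as follows. This is a minimal sketch, not the lecture's own notebook: since the original DataFrame is not shown, a synthetic stand-in with the same 506 rows is generated with make_classification.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the lecture's data: 506 rows, like the original dataset
X, y = make_classification(n_samples=506, random_state=0)

# Four outputs: independent train, independent test, dependent train, dependent test.
# test_size=0.2 keeps 20 percent for testing; random_state=0 makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# In Python 3, print is a function, so parentheses are required
print(X_train.shape)  # (404, 20)
print(X_test.shape)   # (102, 20)
```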
Since we want 20 percent of the data as test data, we will write 0.2. If you want 30 percent as test data, you can write 0.3 instead of 0.2. The next argument is for the random state.

Here you can provide any value, but if you write zero, you will get the same result as I am getting from this random split. If you write one or any other value, this function will work exactly the same, but the only difference will be that you will not get the exact same result as me, because your random numbers will be different from my random numbers. So if you want to get the same split every time, you will have to stick to only one random state. Right now I am using zero, and for all future train-test splits I will use zero.

Now, initially in our X and y variables there were five hundred and six rows. Let's look at the shapes of X_train, X_test, y_train and y_test. If you are using an updated version of Python, you need to put parentheses with the print command.

If we run this, you can see that 80 percent of the data is in X_train and the remaining 20 percent is in X_test: 404 observations are in our X_train data, and 102 observations are in our X_test data.

Now, let's fit a logistic regression model on our training dataset.
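Fitting the model on the training data only can be sketched like this; the object name clf_lr follows the lecture, while the synthetic stand-in dataset is an assumption since the original data is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (506 rows, as in the lecture)
X, y = make_classification(n_samples=506, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Create the classifier object and fit it on the training data only;
# max_iter is raised so the lbfgs solver converges cleanly
clf_lr = LogisticRegression(max_iter=1000)
clf_lr.fit(X_train, y_train)
```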
We'll follow the same steps. First, we will create the object clf_lr, and then we will fit our X_train and y_train in this object.

We have fitted our model. Now let's predict the values on our X_test dataset. You can notice here that I am using X_test instead of X_train, because I want to predict the values on my test dataset.

Now let's look at the accuracy and the confusion matrix of this model. To get the accuracy score and the confusion matrix, we will import these two from sklearn.metrics. We will run it, then we will create the confusion matrix for our y_test and y_test_pred. These are the actual values of our y_test, and these are the predicted values of our y_test.

Here, the rows represent the actual class and the columns represent the predicted class. So for these 36 records, we are seeing that the actual value was not sold and the predicted value was also not sold. If you remember, these are known as true negatives. Similarly, here our actual value is one, but the predicted value is zero; these are known as false negatives. And for these 22, the actual value was zero and the predicted value was one; these are known as false positives.
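The predict and confusion-matrix steps above can be sketched as follows (again on a synthetic stand-in dataset, so the cell counts will differ from the 36/22 figures in the lecture):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=506, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
clf_lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predict on the test data, not the training data
y_test_pred = clf_lr.predict(X_test)

# Rows are the actual classes, columns the predicted classes:
# [[true negatives,  false positives],
#  [false negatives, true positives ]]
cm = confusion_matrix(y_test, y_test_pred)
print(cm)
```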
And these remaining records, where both the actual and the predicted values are one, are the true positives.

Now, the accuracy score is the percentage of observations my model is able to predict correctly, so the accuracy score will be the sum of 36 plus 31 divided by the total number of observations. You can do that manually also, but there is a small function to calculate it automatically.

The accuracy of our model on our test dataset is 0.65. It's not great, but it is good enough. And it is always recommended to use your test dataset to create the confusion matrix and to calculate the accuracy score.

Now, I have calculated this accuracy score and confusion matrix for logistic regression; you can try it on your own, and you can also compare the accuracy scores between the two models. In the next video, we will learn how to create a KNN model in Python.
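The accuracy calculation can be sketched both ways, manually from the confusion matrix and with the built-in helper; on the synthetic stand-in data the resulting number will differ from the lecture's 0.65.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=506, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
clf_lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_test_pred = clf_lr.predict(X_test)

# Manual accuracy: correct predictions (the diagonal) divided by all observations,
# exactly like (36 + 31) / 102 in the lecture
cm = confusion_matrix(y_test, y_test_pred)
manual_acc = (cm[0, 0] + cm[1, 1]) / cm.sum()

# The same number via the built-in helper
acc = accuracy_score(y_test, y_test_pred)
print(acc)
```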