1 00:00:01,520 --> 00:00:08,300 The next important thing we need to know is how to handle categorical variables in a set of independent 2 00:00:08,300 --> 00:00:08,840 variables. 3 00:00:10,450 --> 00:00:16,240 If you remember, in our dataset we have variables like airport, which has the values 4 00:00:16,420 --> 00:00:17,170 yes and no. 5 00:00:18,070 --> 00:00:22,540 And there is a variable called water body, which has values such as lake. 6 00:00:22,720 --> 00:00:23,800 There were several. 7 00:00:25,410 --> 00:00:27,780 Now, a regression model works on real numbers. 8 00:00:28,980 --> 00:00:33,110 It does not understand a non-numerical variable such as airport. 9 00:00:35,260 --> 00:00:36,550 How can we handle this problem? 10 00:00:38,860 --> 00:00:44,770 It is simple: we just need to assign a numerical value to each of the categories of that variable. 11 00:00:46,820 --> 00:00:49,960 If the category names are numbers, the regression model can run. 12 00:00:51,540 --> 00:00:55,980 But we cannot assign just any number to the categories, as this will impact the model. 13 00:00:57,210 --> 00:00:59,130 We need to follow a proper process. 14 00:01:00,310 --> 00:01:08,200 So this process of properly creating a dummy variable, which will represent an actual variable containing 15 00:01:08,200 --> 00:01:11,680 non-numerical values, is called dummy variable creation. 16 00:01:13,890 --> 00:01:18,420 Why we do it: because regression analysis cannot handle non-numeric data. 17 00:01:19,720 --> 00:01:20,520 And how do we do it? 18 00:01:21,250 --> 00:01:24,630 We first create a new variable or a set of variables. 19 00:01:24,900 --> 00:01:26,250 These are the dummy variables. 20 00:01:27,570 --> 00:01:33,790 Then these new variables are given values of either zero or one, and not anything else. 21 00:01:36,060 --> 00:01:36,600 But why? 22 00:01:37,140 --> 00:01:40,860 Why do we not assign values like one, two and three, et cetera?
23 00:01:44,250 --> 00:01:47,410 Let's just look at the categories in this example for clarification. 24 00:01:49,340 --> 00:01:54,170 On the left, I have students and their favorite subject on the right. 25 00:01:54,260 --> 00:01:55,580 I'm calling this subject. 26 00:01:56,640 --> 00:02:02,040 The first step of dummy variable creation is creating dummy variables. 27 00:02:04,230 --> 00:02:05,270 How many dummy variables 28 00:02:05,290 --> 00:02:12,130 do we need? Remember this: the number of dummy variables required to replace a variable is the number 29 00:02:12,130 --> 00:02:13,340 of categories in that variable 30 00:02:13,390 --> 00:02:14,500 minus one. 31 00:02:16,060 --> 00:02:20,470 That is, if there are two categories, we need two minus one dummy variables. 32 00:02:20,500 --> 00:02:22,150 That is, only one dummy variable. 33 00:02:22,840 --> 00:02:27,010 If we have three categories, like in our example, we need three minus one, 34 00:02:27,190 --> 00:02:28,540 that is, two dummy variables. 35 00:02:30,250 --> 00:02:33,610 So I have three subjects: science, English and math. 36 00:02:34,990 --> 00:02:40,120 So I have created two dummy variables, one for science and one for math. 37 00:02:41,140 --> 00:02:42,920 Science has value one 38 00:02:43,300 --> 00:02:46,390 if the subject is science; else it has value zero. 39 00:02:47,750 --> 00:02:51,210 So if you look at the first two rows, science is one. 40 00:02:52,300 --> 00:02:54,820 And in the other two, where the subject is English and math, 41 00:02:55,060 --> 00:02:56,100 it has value zero. 42 00:02:57,750 --> 00:02:59,010 Math has value one 43 00:02:59,760 --> 00:03:03,130 if the subject is math; else it has value zero. So 44 00:03:03,220 --> 00:03:09,240 the fourth row has one in the math column and all the other rows have zero. 45 00:03:11,370 --> 00:03:14,730 This means if both are zero, then the subject is English.
46 00:03:16,380 --> 00:03:21,480 So the third row has both zero, which represents the third subject. 47 00:03:24,150 --> 00:03:28,110 Really, these zeros and ones are representing each category 48 00:03:29,170 --> 00:03:29,850 of the favorite 49 00:03:29,960 --> 00:03:30,570 subject. 50 00:03:33,990 --> 00:03:39,810 Now, to answer the question of why we created two dummy variables when we could have created a single 51 00:03:39,810 --> 00:03:44,490 one with values such as one for science, two for math and three for English. 52 00:03:46,140 --> 00:03:49,140 This is because one, two, three carry more meaning. 53 00:03:49,740 --> 00:03:50,220 That is, 54 00:03:51,990 --> 00:03:54,620 two is actually greater than one. 55 00:03:56,100 --> 00:03:58,560 But math is not higher or lower than science. 56 00:04:00,800 --> 00:04:03,470 Math, science and English are nominal data. 57 00:04:04,250 --> 00:04:07,160 It just says the name of the favorite subject. 58 00:04:08,090 --> 00:04:11,200 So we cannot assign numerical values which can be ordered. 59 00:04:12,490 --> 00:04:16,740 That is why we are assigning zero and one; they only represent true or false. 60 00:04:16,960 --> 00:04:17,830 [unclear] 61 00:04:19,450 --> 00:04:22,810 Therefore, we create n minus one dummy variables. 62 00:04:24,720 --> 00:04:31,770 The value of each dummy variable is one for one single category, and the nth category has 63 00:04:33,470 --> 00:04:37,040 a zero value for all the n minus one dummy variables. 64 00:04:39,910 --> 00:04:45,570 How to interpret the results of a regression analysis containing dummy variables created like this 65 00:04:45,960 --> 00:04:48,810 will be discussed after we run the regression on our data.
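
The n minus one rule described in the lecture can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper name `make_dummies`, the `baseline` parameter, and the four-row subject list are made up here to mirror the spoken example, and are not taken from the course material.

```python
def make_dummies(values, baseline):
    """Encode a nominal variable as zero/one dummy columns.

    `baseline` (a hypothetical parameter name) is the category that gets
    no column of its own; rows in that category are zero in every column.
    If `baseline` occurs in `values`, this yields n - 1 columns.
    """
    # Distinct categories in first-seen order, excluding the baseline.
    categories = [c for c in dict.fromkeys(values) if c != baseline]
    # One dict per row: 1 if the row belongs to that category, else 0.
    return [{c: int(v == c) for c in categories} for v in values]


# Illustrative data matching the four-row example from the lecture.
subjects = ["science", "science", "english", "math"]
dummies = make_dummies(subjects, baseline="english")

for subject, row in zip(subjects, dummies):
    print(subject, row)
```

Three categories give two columns, and the English row is zero in both, which is how the left-out category is represented. Libraries such as pandas provide `get_dummies` with a `drop_first` option for the same purpose; which category ends up as the baseline there depends on the library's ordering rather than on a parameter like the one assumed above.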