1 00:00:01,430 --> 00:00:08,330 The next important thing we need to know is how to handle categorical variables in a set of independent 2 00:00:08,330 --> 00:00:08,840 variables. 3 00:00:10,360 --> 00:00:18,250 If you remember, in our dataset, we have variables like airport, which has values yes and no, and 4 00:00:18,250 --> 00:00:22,580 there is a variable called waterbody, which has values such as lake. 5 00:00:22,640 --> 00:00:23,760 There were several. 6 00:00:25,290 --> 00:00:27,740 Now, a regression model works on real numbers. 7 00:00:28,860 --> 00:00:33,120 It does not understand a non-numerical variable such as airport. 8 00:00:35,140 --> 00:00:36,570 How can we handle this problem? 9 00:00:38,770 --> 00:00:44,770 It is simple: we just need to assign a numerical value to each of the categories of that variable. 10 00:00:46,720 --> 00:00:49,960 If the category names are numbers, the models can run. 11 00:00:51,490 --> 00:00:55,990 But we cannot assign just any number to the categories, as this will impact the model. 12 00:00:57,150 --> 00:00:59,100 We need to follow a proper process. 13 00:01:00,280 --> 00:01:08,200 So this process of properly creating a dummy variable, which will represent an actual variable containing 14 00:01:08,200 --> 00:01:11,710 non-numerical values, is called dummy variable creation. 15 00:01:13,830 --> 00:01:18,440 So why do we do it? Because regression analysis cannot handle non-numeric data. 16 00:01:19,590 --> 00:01:26,250 And how do we do it? We first create a new variable or set of variables, that is, the dummy variables. 17 00:01:27,490 --> 00:01:33,750 And these new variables are given values of either zero or one and not anything else. 18 00:01:36,000 --> 00:01:36,600 But why? 19 00:01:37,050 --> 00:01:40,880 Why not assign values like one, two, three, etc.? 20 00:01:44,230 --> 00:01:47,440 Let us look at the categories in this example for clarification. 
21 00:01:49,220 --> 00:01:55,550 On the left, I have students and their favorite subject; on the right, I'm coding this subject. 22 00:01:56,610 --> 00:02:02,010 The first step of dummy variable creation is creating the dummy variables. 23 00:02:04,120 --> 00:02:10,960 How many variables do we need? Remember this: the number of dummy variables required to replace a 24 00:02:10,960 --> 00:02:14,470 variable is the number of categories in that variable minus one. 25 00:02:15,970 --> 00:02:22,120 That is, if there are two categories, we need two minus one dummy variables, that is, only one variable. 26 00:02:22,750 --> 00:02:27,070 If you have three categories, like in our example, we need three minus one. 27 00:02:27,100 --> 00:02:28,520 That is, two dummy variables. 28 00:02:30,190 --> 00:02:33,580 So I have three subjects here: science, English and maths. 29 00:02:34,930 --> 00:02:40,090 So I have created two dummy variables, one for science and one for maths. 30 00:02:41,110 --> 00:02:46,360 Science has a value of one if the subject is science; otherwise it has value zero. 31 00:02:47,700 --> 00:02:51,180 So if you look at the first two rows, science is one. 32 00:02:52,180 --> 00:02:54,840 And in the other two, where the subject is English and maths, 33 00:02:54,940 --> 00:02:55,960 it has value zero. 34 00:02:57,690 --> 00:03:06,750 Maths has value one if the subject is maths; otherwise it has value zero. So the fourth row has one in the maths 35 00:03:06,750 --> 00:03:09,180 column and all the other rows have zero. 36 00:03:11,280 --> 00:03:20,040 This means if both are zero, then the subject is English. So the third row has both zero, which is representing 37 00:03:20,340 --> 00:03:21,510 the third subject. 38 00:03:24,090 --> 00:03:28,070 Clearly, these zeros and ones are representing each category 39 00:03:29,100 --> 00:03:30,570 of the favorite subject variable. 
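The student example above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper name `make_dummies`, the `baseline` parameter, and the four-row subject list (science, science, English, maths, matching the rows described in the narration) are made up for this sketch and are not part of the course material.

```python
def make_dummies(values, baseline):
    """Build 0/1 dummy columns for every category except the baseline.

    values   -- list of category labels, one per row
    baseline -- the category that gets no column of its own; a row in this
                category has zero in all the dummy columns
    """
    categories = sorted(set(values))
    if baseline not in categories:
        raise ValueError(f"baseline {baseline!r} is not one of {categories}")
    # One column per non-baseline category: n categories -> n - 1 columns.
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
        if category != baseline
    }


# Hypothetical rows mirroring the example: two science, one English, one maths.
subjects = ["Science", "Science", "English", "Maths"]
dummies = make_dummies(subjects, baseline="English")

for name, column in dummies.items():
    print(name, column)
```

Three categories give two columns. The third row (English) is zero in both columns, which is how the baseline category is represented.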
40 00:03:33,930 --> 00:03:40,050 Now, to answer the question of why we created two dummy variables when we could have created a single one 41 00:03:40,170 --> 00:03:44,460 with values such as one for science, two for maths and three for English. 42 00:03:46,050 --> 00:03:49,100 This is because one, two and three carry more meaning. 43 00:03:49,650 --> 00:03:50,220 That is, 44 00:03:51,920 --> 00:03:54,620 two is actually two times one. 45 00:03:56,010 --> 00:03:58,530 But maths is not higher than, or two times, science. 46 00:04:00,740 --> 00:04:03,470 Maths, science and English are nominal data. 47 00:04:04,160 --> 00:04:07,160 It just says the name of the favorite subject. 48 00:04:08,030 --> 00:04:11,180 So we cannot assign numerical values which can be ordered. 49 00:04:12,430 --> 00:04:17,830 Whereas when we are assigning zero and one, it only represents true or false, or an on/off signal. 50 00:04:19,390 --> 00:04:22,780 Therefore, we create n minus one dummy variables. 51 00:04:24,630 --> 00:04:31,740 The value of each dummy variable is one for one single category, and the remaining category has 52 00:04:33,350 --> 00:04:37,040 a zero value for all the n minus one dummy variables. 53 00:04:39,840 --> 00:04:46,110 How to interpret the results of a regression analysis containing dummy variables coded like this will 54 00:04:46,110 --> 00:04:48,870 be discussed after we run the regression on our data.
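The point about ordered codes can be made concrete with a small arithmetic sketch. It is not a fitted model: the intercept and coefficients below are arbitrary numbers chosen for illustration, and the function names are hypothetical. It shows that a single column coded one, two, three forces equal steps between subjects, while two dummy columns let each subject have its own offset from the baseline.

```python
# Hypothetical integer coding of a nominal variable (the approach argued against).
CODES = {"Science": 1, "Maths": 2, "English": 3}


def predict_with_codes(intercept, slope, subject):
    """Linear prediction when the subject is entered as a single 1/2/3 column."""
    return intercept + slope * CODES[subject]


def predict_with_dummies(intercept, b_science, b_maths, subject):
    """Linear prediction with two 0/1 dummy columns; English is the baseline."""
    is_science = 1 if subject == "Science" else 0
    is_maths = 1 if subject == "Maths" else 0
    return intercept + b_science * is_science + b_maths * is_maths


# Arbitrary illustrative numbers, not estimates from any data.
intercept, slope = 10.0, 5.0

# With integer codes, every step between consecutive codes equals the slope.
gap_science_to_maths = (predict_with_codes(intercept, slope, "Maths")
                        - predict_with_codes(intercept, slope, "Science"))
gap_maths_to_english = (predict_with_codes(intercept, slope, "English")
                        - predict_with_codes(intercept, slope, "Maths"))
print(gap_science_to_maths, gap_maths_to_english)

# With dummies, each subject gets its own offset from the English baseline.
b_science, b_maths = 4.0, -3.0
for subject in ("English", "Science", "Maths"):
    print(subject, predict_with_dummies(intercept, b_science, b_maths, subject))
```

Whatever slope is chosen, the integer coding makes the science-to-maths gap equal to the maths-to-English gap, which is an ordering the subjects do not actually have.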