1
00:00:02,300 --> 00:00:09,000
As told in the theory lecture, we need to change the non-numerical values of our categorical

2
00:00:09,000 --> 00:00:12,060
variables into numerical values.

3
00:00:12,810 --> 00:00:15,360
We will do that by creating dummy variables.

4
00:00:17,580 --> 00:00:21,250
Creating dummy variables in Python is very easy.

5
00:00:23,780 --> 00:00:28,100
Remember, we have two variables that are categorical.

6
00:00:28,880 --> 00:00:32,690
The first one is airport, which has two categories,

7
00:00:34,090 --> 00:00:35,800
either YES or NO.

8
00:00:36,460 --> 00:00:38,780
And the other one is waterbody.

9
00:00:40,510 --> 00:00:46,830
It has four categories: Lake, River, Lake and River, and None.

10
00:00:50,730 --> 00:00:53,460
So let's create dummy variables.

11
00:00:56,900 --> 00:00:58,780
We'll write df equal to

12
00:01:01,640 --> 00:01:08,570
pd.get_dummies. get_dummies is a function of pandas.

13
00:01:08,990 --> 00:01:10,010
That's why we are writing

14
00:01:10,070 --> 00:01:11,750
pd.get_dummies.

15
00:01:14,780 --> 00:01:20,210
And in the bracket we'll mention our data frame, that is df. Run this.

16
00:01:22,940 --> 00:01:25,520
Our dummy variables are now created.

17
00:01:28,230 --> 00:01:30,510
Let's have a look at our data.

18
00:01:31,070 --> 00:01:34,680
We'll look at the first five values by using the head function.

19
00:01:40,410 --> 00:01:42,510
You can see, if we scroll to the right,

20
00:01:45,340 --> 00:01:49,630
for airport, we now have two columns: one

21
00:01:50,540 --> 00:01:52,980
where the airport variable had the value

22
00:01:53,050 --> 00:01:53,500
YES,

23
00:01:53,830 --> 00:01:57,610
and another one where the airport variable had the value

24
00:01:57,640 --> 00:01:58,030
NO.

25
00:02:02,100 --> 00:02:05,180
If you see here, everywhere the airport

26
00:02:05,210 --> 00:02:06,180
value was YES,

27
00:02:06,570 --> 00:02:07,920
now we have one

28
00:02:08,250 --> 00:02:09,630
in the airport_YES column.

29
00:02:10,230 --> 00:02:13,230
And everywhere the airport value was

30
00:02:13,230 --> 00:02:13,610
NO,

31
00:02:13,980 --> 00:02:15,620
we have one in the

32
00:02:15,660 --> 00:02:15,890
airport_NO

33
00:02:15,960 --> 00:02:16,290
column.

34
00:02:19,140 --> 00:02:26,640
Similarly, if you look at waterbody, now we have four numerical variables for our waterbody categorical

35
00:02:26,640 --> 00:02:31,050
variable. Similar to the airport variable, waterbody

36
00:02:32,160 --> 00:02:34,390
is also divided in a similar fashion.

37
00:02:35,890 --> 00:02:43,360
So wherever the waterbody was Lake, now we have one in the waterbody_Lake variable.

38
00:02:43,960 --> 00:02:49,390
Wherever the waterbody was River, we have one in the

39
00:02:49,390 --> 00:02:49,560
waterbody_River

40
00:02:49,560 --> 00:02:50,980
variable.

41
00:02:55,420 --> 00:03:01,390
But if you remember, we told you that the number of dummy variables should be one less than the number

42
00:03:01,390 --> 00:03:02,340
of categories.

43
00:03:05,290 --> 00:03:07,120
The reasoning behind that is this.

44
00:03:08,110 --> 00:03:10,050
Let's take the example of airport.

45
00:03:11,940 --> 00:03:18,790
So if the airport value is YES, we have one, and if the airport value is NO,

46
00:03:18,900 --> 00:03:21,180
we have zero in the airport_YES column.

47
00:03:21,900 --> 00:03:30,270
So in a way, this single variable is conveying all the information that we had in our airport variable.

48
00:03:31,080 --> 00:03:33,510
There is no need for the airport_NO variable.

49
00:03:36,220 --> 00:03:39,920
We are going to see this kind of relationship again;

50
00:03:40,100 --> 00:03:42,440
we call it perfect negative correlation.

51
00:03:43,570 --> 00:03:48,730
So everywhere the value of airport_NO is zero, we have one in the airport_YES value.

52
00:03:48,910 --> 00:03:49,750
And everywhere

53
00:03:50,110 --> 00:03:54,610
the airport_NO value is one, we have zero in the

54
00:03:54,670 --> 00:03:54,880
airport_YES

55
00:03:54,880 --> 00:03:55,150
value.

56
00:03:55,900 --> 00:03:59,260
So both of these variables are conveying the same information.

57
00:03:59,610 --> 00:04:00,080
That's why

58
00:04:00,130 --> 00:04:02,290
we are going to delete one of these variables.

59
00:04:03,160 --> 00:04:08,910
Similarly, we are also going to delete the waterbody_None variable, since

60
00:04:09,270 --> 00:04:12,020
if we have zero in all three of the other variables,

61
00:04:12,610 --> 00:04:17,690
that is waterbody_Lake, waterbody_River and waterbody_Lake and River,

62
00:04:18,040 --> 00:04:22,090
this means that there should be one in waterbody_None.

63
00:04:22,840 --> 00:04:24,760
So again, waterbody_None

64
00:04:24,820 --> 00:04:28,180
here is a redundant variable and we will delete that also.

65
00:04:29,230 --> 00:04:32,050
So let's delete these two variables.

66
00:04:33,360 --> 00:04:34,580
The syntax is del,

67
00:04:38,910 --> 00:04:43,810
df, and in the square bracket we'll mention airport_NO.

68
00:04:52,660 --> 00:04:54,630
Next, let's delete

69
00:04:54,820 --> 00:04:55,750
waterbody_None.

70
00:05:05,170 --> 00:05:08,250
Remember that N is capital in waterbody_None,

71
00:05:08,530 --> 00:05:10,630
and Python is a case-sensitive language.

72
00:05:12,830 --> 00:05:16,310
Now let's look at our df.

73
00:05:21,780 --> 00:05:25,860
You can see now we have only one variable for the airport, and that is

74
00:05:25,890 --> 00:05:26,340
airport_YES.

75
00:05:26,670 --> 00:05:32,610
Here the value one represents that an airport is present, and zero represents that an airport is not

76
00:05:32,610 --> 00:05:33,090
present.

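The steps described above, creating the dummy variables and then deleting the two redundant columns, can be sketched roughly as follows. The sample rows and the price column are made up for illustration and are not the lecture's data set; only the airport and waterbody columns and their categories come from the lecture. Passing dtype=int is an assumption made here so that the dummies print as 0/1 rather than True/False on recent pandas versions.

```python
import pandas as pd

# Hypothetical sample rows standing in for the lecture's data frame.
df = pd.DataFrame({
    "price": [24.0, 21.6, 34.7, 33.4],
    "airport": ["YES", "NO", "NO", "YES"],
    "waterbody": ["River", "Lake", "None", "Lake and River"],
})

# Replace each categorical column with one 0/1 column per category.
df = pd.get_dummies(df, dtype=int)

# Delete one redundant dummy for each original categorical variable.
# Column names are case-sensitive: "waterbody_None" has a capital N.
del df["airport_NO"]
del df["waterbody_None"]

print(df.head())
```

After this, airport is represented by the single airport_YES column and waterbody by three columns, one fewer than its number of categories.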
77
00:05:33,520 --> 00:05:42,060
Similarly, we have three separate variables for waterbody, and the presence of the number one represents

78
00:05:42,090 --> 00:05:44,730
that that waterbody is present near the city.
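The redundancy argument from the lecture, that airport_YES and airport_NO carry the same information, can be checked on a toy column. This is only a sketch with made-up values; the prefix argument is used here so that the column names match the ones in the lecture.

```python
import pandas as pd

# Made-up airport values, for illustration only.
airport = pd.Series(["YES", "NO", "NO", "YES", "NO"], name="airport")
dummies = pd.get_dummies(airport, prefix="airport", dtype=int)

# Every row has exactly one 1 across the two dummy columns,
# so either column can be reconstructed from the other.
row_sums = dummies.sum(axis=1)

# The correlation between the two columns is therefore -1.
corr = dummies["airport_YES"].corr(dummies["airport_NO"])

print(row_sums.tolist())
print(round(corr, 6))
```

The same reasoning applies to waterbody: when the three remaining dummies are all zero, the category must have been None.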