1 00:00:02,300 --> 00:00:09,000 As told in the theory lecture, we need to change the non-numerical values, that is, our categorical 2 00:00:09,000 --> 00:00:12,060 variables, into numerical values. 3 00:00:12,810 --> 00:00:15,360 We will do that by creating dummy variables. 4 00:00:17,570 --> 00:00:21,260 Creating dummy variables in Python is very easy. 5 00:00:23,780 --> 00:00:28,090 Remember, we have two variables that are categorical. 6 00:00:28,880 --> 00:00:32,690 The first one is airport, which has two categories, 7 00:00:34,090 --> 00:00:35,800 either YES or NO. 8 00:00:36,460 --> 00:00:38,790 And the other one is waterbody. 9 00:00:40,520 --> 00:00:46,830 It has four categories: Lake, River, Lake and River, and None. 10 00:00:50,730 --> 00:00:53,970 So let's create dummy variables in Python. 11 00:00:56,860 --> 00:00:58,770 We will write df equal to 12 00:01:01,640 --> 00:01:08,520 pd.get_dummies. get_dummies is a function of pandas. 13 00:01:08,990 --> 00:01:10,010 That's why we are writing 14 00:01:10,070 --> 00:01:11,750 pd.get_dummies. 15 00:01:14,790 --> 00:01:17,270 And in the bracket we will mention our dataframe, 16 00:01:17,550 --> 00:01:18,350 that is, df. 17 00:01:19,310 --> 00:01:20,220 If you run this, 18 00:01:22,340 --> 00:01:25,490 the dummy variables are all created. 19 00:01:28,230 --> 00:01:30,510 Let's have a look at our data. 20 00:01:31,050 --> 00:01:34,680 We will look at the first five values by using the head function. 21 00:01:40,410 --> 00:01:42,510 You can see, if you scroll to the right, 22 00:01:45,340 --> 00:01:49,630 for airport we now have two columns: one 23 00:01:50,540 --> 00:01:52,990 where the airport variable had the value 24 00:01:53,050 --> 00:01:53,500 YES, 25 00:01:53,830 --> 00:01:57,610 and another one where the airport variable had the value 26 00:01:57,650 --> 00:01:58,060 NO. 27 00:02:02,100 --> 00:02:05,190 If you see here, everywhere where airport 28 00:02:05,370 --> 00:02:06,170 was YES,
29 00:02:06,570 --> 00:02:07,890 now we have 1 30 00:02:08,250 --> 00:02:13,220 in the airport_YES column, and everywhere where the airport value was 31 00:02:13,220 --> 00:02:13,570 NO, 32 00:02:13,980 --> 00:02:16,290 we have 1 in the airport_NO column. 33 00:02:19,140 --> 00:02:26,640 Similarly, if you look at waterbody, now we have four numerical variables for our waterbody categorical 34 00:02:26,640 --> 00:02:31,050 variable. Similar to the airport variable, waterbody 35 00:02:32,160 --> 00:02:34,400 is also divided in a similar fashion. 36 00:02:35,890 --> 00:02:43,330 So wherever the water body was Lake, now we have 1 in the waterbody_Lake variable. 37 00:02:43,960 --> 00:02:49,390 Wherever the water body was River, we have 1 in the 38 00:02:49,390 --> 00:02:49,560 waterbody_River 39 00:02:49,560 --> 00:02:50,980 variable. 40 00:02:55,420 --> 00:03:01,390 But if you remember, we told you that the number of dummy variables should be one less than the number 41 00:03:01,390 --> 00:03:02,340 of categories. 42 00:03:05,290 --> 00:03:07,120 The reasoning behind that is as follows. 43 00:03:08,090 --> 00:03:10,070 Let's take the example of airport. 44 00:03:11,930 --> 00:03:20,120 So if the airport value is YES, we have 1, and if the airport value is NO, we have 0 in the 45 00:03:20,120 --> 00:03:21,170 airport_YES column. 46 00:03:21,890 --> 00:03:30,260 So in a way, this single variable is conveying all the information that we had in our airport variable. 47 00:03:31,070 --> 00:03:33,500 There is no need for the airport_NO variable. 48 00:03:36,220 --> 00:03:39,920 We are going to see that this kind of relationship 49 00:03:40,100 --> 00:03:42,440 is what we call full negative correlation. 50 00:03:43,570 --> 00:03:48,730 So everywhere the value of airport_NO is 0, we have 1 in the airport_YES value. 51 00:03:48,880 --> 00:03:49,750 And everywhere 52 00:03:50,110 --> 00:03:54,610 the airport_NO value is 1, we have 0 in the airport
53 00:03:54,620 --> 00:03:54,870 YES 54 00:03:54,880 --> 00:03:55,200 value. 55 00:03:55,900 --> 00:03:59,260 So both of these variables are conveying the same information. 56 00:03:59,610 --> 00:03:59,990 That's why 57 00:04:00,130 --> 00:04:02,300 we are going to delete one of these variables. 58 00:04:03,160 --> 00:04:10,960 Similarly, we are also going to delete the waterbody_None variable, since if we have 0 in all the 59 00:04:10,960 --> 00:04:16,900 three variables of waterbody, that is, waterbody_Lake, waterbody_River and the waterbody Lake 60 00:04:16,900 --> 00:04:17,680 and River variable, 61 00:04:18,040 --> 00:04:22,090 this means that there should be 1 in waterbody_None. 62 00:04:22,840 --> 00:04:24,760 So again, waterbody_None 63 00:04:24,820 --> 00:04:28,180 here is a redundant variable, and we will delete that also. 64 00:04:29,230 --> 00:04:32,050 So let's delete these two variables. 65 00:04:33,360 --> 00:04:34,610 The syntax is del, 66 00:04:39,040 --> 00:04:43,790 df, and then in the square bracket we will mention airport_NO. 67 00:04:52,660 --> 00:04:54,740 Next, let's delete 68 00:04:54,820 --> 00:04:55,750 waterbody_None. 69 00:05:05,180 --> 00:05:08,200 Remember that the N is capital in waterbody_None, 70 00:05:08,510 --> 00:05:10,630 and Python is a case-sensitive language. 71 00:05:14,860 --> 00:05:16,170 Let's look at the df. 72 00:05:21,780 --> 00:05:25,860 You can see now we have only one variable for the airport, and that is airport_YES. 73 00:05:26,670 --> 00:05:32,610 And the 1 value represents that an airport is present, and the 0 value represents that an airport is not 74 00:05:32,610 --> 00:05:33,090 present. 75 00:05:33,500 --> 00:05:42,060 Similarly, we have three separate variables for waterbody, and the presence of the number 1 represents 76 00:05:42,090 --> 00:05:44,720 that that water body is present near the city.
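The steps described in this lecture can be sketched roughly as follows. This is only a minimal illustration, not the exact notebook from the video: the small dataframe below is made-up toy data (the `price` column and all values are assumptions), and `dtype=int` is passed so that the dummy columns show 1 and 0 as in the lecture, since recent pandas versions otherwise return True/False.

```python
import pandas as pd

# Hypothetical toy data standing in for the course dataset (values are made up).
df = pd.DataFrame({
    "price": [24.0, 21.6, 34.7, 33.4],
    "airport": ["YES", "NO", "NO", "YES"],
    "waterbody": ["River", "Lake", "None", "Lake and River"],
})

# Create one dummy column per category of every categorical (text) column.
# dtype=int gives 1/0 values instead of booleans.
df = pd.get_dummies(df, dtype=int)

# Drop one redundant dummy per categorical variable.
# Column names are case sensitive: "airport_NO", "waterbody_None".
del df["airport_NO"]
del df["waterbody_None"]

# Look at the first rows of the resulting dataframe.
print(df.head())
```

Note that `pd.get_dummies` builds each column name from the original column name, an underscore, and the category value, so the combined category here comes out as `waterbody_Lake and River`. pandas also offers a `drop_first=True` argument, but it drops the alphabetically first category of each variable, which would remove `waterbody_Lake` rather than `waterbody_None`, so the explicit `del` matches the lecture more closely.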