1 00:00:00,910 --> 00:00:07,990 As discussed earlier, we need to change the non-numerical values in the categorical variables 2 00:00:08,180 --> 00:00:15,370 to numerical values by creating dummy variables. Creating a dummy variable in a software package like 3 00:00:15,460 --> 00:00:17,320 R is very easy. 4 00:00:18,790 --> 00:00:22,120 So we have two variables that are categorical in our dataset. 5 00:00:22,960 --> 00:00:25,450 One is airport, which has two categories, 6 00:00:25,750 --> 00:00:26,500 Yes and No. 7 00:00:27,160 --> 00:00:28,460 Let us open the dataset again. 8 00:00:30,770 --> 00:00:33,080 One is this airport, which has two categories. 9 00:00:33,710 --> 00:00:38,630 The other is waterbody, which has four categories: River, Lake, None, 10 00:00:38,690 --> 00:00:39,200 and Both. 11 00:00:40,390 --> 00:00:43,280 Now, to create dummy variables with numerical values 12 00:00:43,830 --> 00:00:44,880 from the data, 13 00:00:46,220 --> 00:00:50,660 that is, for all the categorical variables of your dataset in one go, 14 00:00:51,290 --> 00:00:55,650 you'll install a package called dummies. To install a package, 15 00:00:56,360 --> 00:01:04,610 we can write install.packages and, within brackets and double quotation marks, we'll write dummies. 16 00:01:07,860 --> 00:01:14,260 And this package is installed, and you can look at this package if you go to the Packages tab 17 00:01:14,740 --> 00:01:15,280 on the right. 18 00:01:16,390 --> 00:01:19,440 There is this dummies package. To load it, 19 00:01:19,510 --> 00:01:21,880 you just need to click on this checkbox. 20 00:01:23,050 --> 00:01:26,590 So basically, this package makes dummy variable creation easy. 21 00:01:28,090 --> 00:01:36,160 Next, we need to write one single line of code to get the dummy variables. We'll write df <- 22 00:01:38,680 --> 00:01:39,820 dummy.data.frame 23 00:01:42,360 --> 00:01:43,100 (dummy dot data dot frame), 24 00:01:44,990 --> 00:01:45,710 and then the brackets. 
25 00:01:45,770 --> 00:01:51,470 We just need to specify the dataset, which is df, and we'll run this. 26 00:01:52,480 --> 00:01:55,750 So now, just click this variable to view it. 27 00:01:58,540 --> 00:02:02,320 So if you now scroll to the right, you can see that for airport 28 00:02:02,380 --> 00:02:08,500 we now have two columns, one for where the airport variable had the value 29 00:02:08,500 --> 00:02:08,930 Yes 30 00:02:09,190 --> 00:02:10,120 and the other for No. 31 00:02:11,260 --> 00:02:12,880 So basically, this 32 00:02:13,030 --> 00:02:13,310 Yes 33 00:02:13,330 --> 00:02:13,690 variable, 34 00:02:14,470 --> 00:02:17,970 this contains 1 when the actual value of the airport variable was 35 00:02:18,070 --> 00:02:18,540 Yes. 36 00:02:19,840 --> 00:02:25,600 And this other column contains 1 if the actual value of the airport variable was No. 37 00:02:27,010 --> 00:02:30,230 Similarly, in the position of the waterbody variable, 38 00:02:31,900 --> 00:02:35,850 we have four new variables. Where there was a lake, 39 00:02:36,430 --> 00:02:40,720 we have 1 in this Lake column. Where there was a river, 40 00:02:40,780 --> 00:02:44,380 we have 1 in this River column, and so on. 41 00:02:46,300 --> 00:02:52,270 But if you remember, I told you that the number of dummy variables is actually one less than the number of 42 00:02:52,270 --> 00:02:52,960 categories. 43 00:02:54,070 --> 00:02:57,700 So, as the airport variable has two categories, Yes and No, 44 00:02:57,970 --> 00:02:59,980 we need only one dummy variable for this. 45 00:03:01,480 --> 00:03:02,590 So this airport 46 00:03:02,680 --> 00:03:02,940 Yes 47 00:03:03,020 --> 00:03:07,900 variable alone can serve the purpose, as 1 will represent 48 00:03:08,070 --> 00:03:08,440 Yes 49 00:03:08,530 --> 00:03:09,670 and 0 will represent 50 00:03:09,730 --> 00:03:10,030 No. 
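The step described above, one 0/1 column per category, can be sketched in base R so that it runs without any extra package. This is only an illustration under stated assumptions: the toy data frame, its values, the helper name make_dummies, and the underscore in the new column names are all made up here. The lecture itself uses dummy.data.frame from the dummies package on its own dataset.

```r
# Toy data (hypothetical values, not the lecture's dataset)
df <- data.frame(
  price     = c(24.0, 21.6, 34.7, 33.4),
  airport   = c("Yes", "No", "No", "Yes"),
  waterbody = c("River", "Lake", "None", "Both"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one 0/1 indicator column per category of `col`
make_dummies <- function(data, col) {
  out <- data.frame(row.names = seq_len(nrow(data)))
  for (lv in sort(unique(data[[col]]))) {
    out[[paste0(col, "_", lv)]] <- as.integer(data[[col]] == lv)
  }
  out
}

# Find every character (categorical) column and expand all of them in one go
cat_cols   <- names(df)[sapply(df, is.character)]
dummy_list <- lapply(cat_cols, function(col) make_dummies(df, col))

# Keep the numeric columns and append the indicator columns
df_dummies <- cbind(df[setdiff(names(df), cat_cols)], do.call(cbind, dummy_list))

print(names(df_dummies))

# With the dummies package installed and loaded, the lecture's one-liner is:
# df <- dummy.data.frame(df)
```

Unlike the package call shown in the lecture, this sketch appends the new columns at the end rather than placing them where the original variable was, so column positions will differ.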
51 00:03:11,350 --> 00:03:18,000 Similarly, for waterbody, we can keep these three variables, which are waterbody Lake, waterbody 52 00:03:18,010 --> 00:03:22,920 Lake and River, and waterbody River, and we will not need this, 53 00:03:23,220 --> 00:03:23,470 this 54 00:03:23,730 --> 00:03:24,160 variable, 55 00:03:24,190 --> 00:03:25,000 waterbody None. 56 00:03:26,200 --> 00:03:28,840 So now we need to delete these two variables, 57 00:03:29,680 --> 00:03:30,660 waterbody None 58 00:03:30,820 --> 00:03:32,890 and airport 59 00:03:32,960 --> 00:03:33,340 No. 60 00:03:34,300 --> 00:03:35,740 To delete these two variables, 61 00:03:36,490 --> 00:03:40,840 we need to get the position of these two columns. The position of the airport 62 00:03:40,870 --> 00:03:41,640 No variable, 63 00:03:42,310 --> 00:03:44,800 if we hover over this, 64 00:03:45,220 --> 00:03:47,700 we can see it is the 8th column. 65 00:03:49,570 --> 00:03:51,400 And the other variable that we want to delete, 66 00:03:52,780 --> 00:03:55,960 the position of this waterbody None column, is the 14th column. 67 00:03:57,970 --> 00:04:00,640 So the 8th and the 14th columns are the ones we will be deleting. 68 00:04:03,510 --> 00:04:05,550 We will write df <- 69 00:04:08,070 --> 00:04:09,770 df, square brackets. 70 00:04:09,890 --> 00:04:14,090 We want all the rows, so comma, minus 8. 71 00:04:16,520 --> 00:04:19,920 Run this. One variable is deleted. 72 00:04:21,030 --> 00:04:28,200 Now, since we deleted the 8th column, the column which we counted as the 14th earlier will be number 73 00:04:28,270 --> 00:04:30,170 13. If you 74 00:04:30,210 --> 00:04:32,370 check, it is column number 13. 75 00:04:34,530 --> 00:04:35,250 So we'll write 76 00:04:35,950 --> 00:04:39,500 df <- df, 77 00:04:40,750 --> 00:04:47,410 square brackets, comma, minus 13. Run this. 78 00:04:48,470 --> 00:04:49,580 Another variable is deleted. 79 00:04:51,140 --> 00:04:54,710 And let us look at the data to confirm: waterbody 
80 00:04:54,810 --> 00:04:57,240 None is deleted from this dataset. 81 00:04:58,850 --> 00:05:01,320 And airport No is deleted. 82 00:05:04,560 --> 00:05:12,030 So now all the categorical variables are converted to dummy variables, and these dummy variables contain 83 00:05:12,030 --> 00:05:12,570 numbers. 84 00:05:13,260 --> 00:05:16,470 So now we can run our analysis on this dataset.
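The position shift mentioned in the lecture (the 14th column becoming the 13th once the 8th is removed) can be reproduced on a small scale. The sketch below uses a made-up five-column data frame with hypothetical column names a to e, not the lecture's dataset, and also shows two alternatives that avoid recounting positions: dropping both positions in one step, or dropping by name.

```r
# Toy data frame with five columns (hypothetical names and values)
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12, e = 13:15)

# Goal: remove columns "b" (position 2) and "e" (position 5)

# Two-step deletion, as in the lecture: positions shift after the first drop
step1 <- df[, -2]     # drops "b"; "e" moves from position 5 to position 4
step2 <- step1[, -4]  # so the second drop must use 4, not 5

# One-step deletion: both positions refer to the original data frame
both <- df[, -c(2, 5)]

# Deletion by name: no position counting needed at all
by_name <- df[, !(names(df) %in% c("b", "e"))]

print(names(step2))
print(names(both))
print(names(by_name))
```

Dropping by name is usually the more robust choice, since it keeps working even if columns are added or reordered later.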