1 00:00:00,180 --> 00:00:06,270 Hello, all. So in all the previous sessions, we have analyzed our data and we have understood our 2 00:00:06,270 --> 00:00:07,830 data in a very proper way. 3 00:00:08,430 --> 00:00:14,190 That is, where most of our guests come from, what exactly the distribution of the price is, 4 00:00:14,430 --> 00:00:20,090 how exactly the price of a room varies, and in which month the number of guests is higher. 5 00:00:20,250 --> 00:00:23,140 We have analyzed all these trends as well. 6 00:00:23,730 --> 00:00:27,550 So now it's time for the machine learning aspect. 7 00:00:27,870 --> 00:00:35,790 Now, the very first task with respect to your machine learning approach is that you have to select 8 00:00:36,150 --> 00:00:41,970 some important features using the correlation concept for your machine learning model. 9 00:00:42,420 --> 00:00:49,110 So if you guys don't know what exactly correlation is, kindly follow all the previous videos that 10 00:00:49,110 --> 00:00:50,300 I have uploaded. 11 00:00:50,520 --> 00:00:58,150 So let me call head on my data so that I will get a quick overview of how exactly my data looks. 12 00:00:58,500 --> 00:01:03,040 So this is exactly my data on which I have to find our correlations. 13 00:01:03,040 --> 00:01:08,560 So to find the correlation, you just have to call a function, which is exactly corr(). 14 00:01:08,880 --> 00:01:10,650 So just execute it. 15 00:01:10,770 --> 00:01:15,930 So it will give us this amazing matrix: with respect to each and every feature, 16 00:01:15,940 --> 00:01:18,060 you have some correlation. 17 00:01:18,180 --> 00:01:24,740 And of course, you will see that for is_canceled with respect to is_canceled, 18 00:01:24,840 --> 00:01:31,050 you have a correlation of one, because a feature is always perfectly correlated with itself. 19 00:01:31,080 --> 00:01:36,400 Whereas for lead_time with respect to is_canceled, you have a correlation of 0.29.
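The correlation matrix described above can be sketched on a small stand-in table. This is a minimal sketch, not the notebook from the video: the rows below are made up for illustration, and only the column names are taken from the talk.

```python
import pandas as pd

# Toy stand-in for the bookings data; the values are invented for illustration.
data = pd.DataFrame({
    "is_canceled": [0, 1, 0, 1, 1, 0],
    "lead_time": [10, 200, 35, 150, 90, 5],
    "total_of_special_requests": [2, 0, 1, 0, 0, 3],
})

# Pairwise Pearson correlation between every pair of numeric columns.
corr_matrix = data.corr()
print(corr_matrix)

# The diagonal is 1.0 because each column is perfectly correlated with itself.
diagonal = [corr_matrix.loc[col, col] for col in corr_matrix.columns]
print(diagonal)
```

Each off-diagonal entry lies between -1 and 1 and measures the strength of the linear relationship between two columns.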
20 00:01:36,630 --> 00:01:44,670 It means that as this lead_time value increases, is_canceled also tends to increase, with a fairly weak 21 00:01:44,670 --> 00:01:46,200 positive linear relationship of strength 0.29. 22 00:01:46,230 --> 00:01:49,710 That is the exact meaning of this correlation. 23 00:01:50,160 --> 00:01:57,390 So now what I have to do, basically: I just need my correlation with respect to is_canceled. 24 00:01:57,600 --> 00:01:58,140 So for 25 00:01:58,140 --> 00:02:00,660 this, what can we do here? 26 00:02:00,660 --> 00:02:04,830 I'm just going to say data.corr(), 27 00:02:04,830 --> 00:02:07,860 and here I have to just paste is_canceled over there. 28 00:02:07,920 --> 00:02:14,340 And if I execute it, it will return this amazing result with all the correlation 29 00:02:14,340 --> 00:02:21,540 values with respect to is_canceled, because is_canceled is exactly my dependent feature that we 30 00:02:21,540 --> 00:02:25,160 have to predict: whether a booking is going to be canceled or not. 31 00:02:25,470 --> 00:02:33,150 So we can easily figure out over here how all these variables are going to impact this is_canceled 32 00:02:33,150 --> 00:02:33,710 feature. 33 00:02:33,720 --> 00:02:40,320 It means how all these variables are going to impact whether my bookings are going to be canceled or not. 34 00:02:40,590 --> 00:02:42,830 So this is the exact meaning of this correlation. 35 00:02:43,230 --> 00:02:44,280 So let's see. 36 00:02:44,280 --> 00:02:46,730 This is exactly my very first step over here. 37 00:02:46,770 --> 00:02:49,830 Let's say this is exactly my correlation. 38 00:02:49,830 --> 00:02:54,030 And after it, now we have to simply print this correlation. 39 00:02:54,030 --> 00:02:54,540 That's it. 40 00:02:54,540 --> 00:03:00,330 And this is exactly the final correlation value that you have to take care of. 41 00:03:00,570 --> 00:03:04,410 Let's say I have to sort this, so for this, here is what we can do.
42 00:03:04,410 --> 00:03:07,740 Very first, let me just call this correlation. 43 00:03:08,040 --> 00:03:16,080 And on this, I am going to call abs() to take the absolute value, because I have to neglect the negative and positive 44 00:03:16,080 --> 00:03:16,520 signs as well. 45 00:03:16,710 --> 00:03:20,940 So I have to just call abs(), and let me just execute all this. 46 00:03:21,210 --> 00:03:25,260 You will see all your negative signs have disappeared. 47 00:03:25,470 --> 00:03:32,790 And now on this, I have to just call sort_values and just execute all your stuff. 48 00:03:32,810 --> 00:03:36,480 So let's say I have to sort in descending order. 49 00:03:36,490 --> 00:03:38,730 So for this, I would say ascending 50 00:03:38,970 --> 00:03:40,420 equals False. 51 00:03:40,470 --> 00:03:41,930 That's it, just execute. 52 00:03:41,940 --> 00:03:49,590 And this is your amazing output over here that you really need to take care of, from which you have 53 00:03:49,590 --> 00:03:53,520 to extract some meaningful and amazing insights. 54 00:03:53,580 --> 00:03:59,820 So from these correlation values, you can directly say that lead time, the total number of special 55 00:03:59,820 --> 00:04:08,040 requests, the required car parking spaces, all these things are my most important features that we have to take 56 00:04:08,040 --> 00:04:11,920 care of, because their correlation is definitely high. 57 00:04:12,090 --> 00:04:15,180 Let's say I have to understand this data too. 58 00:04:15,360 --> 00:04:22,230 So for this, what we can do: let's say with respect to this dependent variable, I have to know, 59 00:04:22,950 --> 00:04:31,560 with respect to is_canceled, how many guests we have that have checked out, that have canceled, and that 60 00:04:31,710 --> 00:04:33,420 were a no-show. 61 00:04:33,870 --> 00:04:38,630 So for this, what we can do is say data.groupby.
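The ranking step described above (take the is_canceled column of the correlation matrix, drop the signs, sort in descending order) might look roughly like this. It is a sketch on invented toy values; the variable names `target_corr` and `ranked` are assumptions for illustration, not names from the video.

```python
import pandas as pd

# Toy stand-in for the bookings data; the values are invented for illustration.
data = pd.DataFrame({
    "is_canceled": [0, 1, 0, 1, 1, 0],
    "lead_time": [10, 200, 35, 150, 90, 5],
    "total_of_special_requests": [2, 0, 1, 0, 0, 3],
    "days_in_waiting_list": [0, 3, 0, 0, 1, 0],
})

# Correlation of every numeric column with the target column.
target_corr = data.corr()["is_canceled"]

# Ignore the sign, then sort so the strongest relationships come first.
ranked = target_corr.abs().sort_values(ascending=False)
print(ranked)
```

The target itself always comes first with a value of 1, so the entries to read are the ones that follow it.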
62 00:04:38,880 --> 00:04:46,070 So basically we have to group our data on the basis of is_canceled. 63 00:04:46,080 --> 00:04:52,590 So I'm going to say I have to group on the basis of this one, then I have to access this reservation 64 00:04:52,590 --> 00:04:59,570 status, which is exactly a column, which is, I would say, reservation_status. 65 00:05:00,130 --> 00:05:07,740 And on this, I have to simply call value counts, so I'm just going to say value_counts, 66 00:05:07,870 --> 00:05:11,410 just execute it, and it is this amazing output. 67 00:05:11,730 --> 00:05:15,820 You will see, when your booking isn't going to be canceled 68 00:05:16,300 --> 00:05:19,000 and the reservation status is Check-Out, 69 00:05:19,000 --> 00:05:20,770 you have that many guests. 70 00:05:21,100 --> 00:05:27,720 And when your booking is going to be canceled and your reservation status is Canceled, then you have that 71 00:05:27,730 --> 00:05:33,040 many guests, and when it is No-Show, you have that many guests. 72 00:05:33,250 --> 00:05:35,660 So you can simply understand your data. 73 00:05:35,830 --> 00:05:41,110 Yeah, there is a higher number of customers, a higher number of guests, who are going to check 74 00:05:41,110 --> 00:05:41,550 out. 75 00:05:41,560 --> 00:05:50,050 Let's say I have to fetch my numerical features and my categorical features separately from all these 76 00:05:50,050 --> 00:05:50,650 features. 77 00:05:50,860 --> 00:05:54,160 So very first, I have to exclude some variables. 78 00:05:54,310 --> 00:05:57,970 So I'm going to define a list, let's say 79 00:05:57,970 --> 00:05:58,840 list_not. 80 00:05:58,840 --> 00:06:07,090 So the very first variable that I have to exclude for my modeling purposes is exactly my days in waiting 81 00:06:07,090 --> 00:06:07,450 list.
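The grouping step described a few cues earlier (group by is_canceled, then count the reservation_status values inside each group) can be sketched as follows. The six rows are made up for illustration, and `status_counts` is a hypothetical variable name.

```python
import pandas as pd

# Toy stand-in for the bookings data; the rows are invented for illustration.
data = pd.DataFrame({
    "is_canceled": [0, 0, 0, 1, 1, 1],
    "reservation_status": [
        "Check-Out", "Check-Out", "Check-Out",
        "Canceled", "Canceled", "No-Show",
    ],
})

# For each value of is_canceled, count how often each reservation status occurs.
status_counts = data.groupby("is_canceled")["reservation_status"].value_counts()
print(status_counts)
```

The result is indexed by the pair (is_canceled, reservation_status), so a single count can be read with, for example, `status_counts.loc[(1, "No-Show")]`.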
82 00:06:07,780 --> 00:06:15,190 So the question that will come to your mind is why I have to exclude this: because this is a feature 83 00:06:15,460 --> 00:06:20,350 that is hardly going to contribute to whether a booking is going to be canceled or not. 84 00:06:20,560 --> 00:06:22,560 And you can definitely see over here, 85 00:06:22,840 --> 00:06:24,720 it also has a very low correlation value. 86 00:06:24,770 --> 00:06:29,860 It means it is not going to impact my model that much. 87 00:06:29,860 --> 00:06:32,410 So I have to basically exclude it. After it, 88 00:06:32,410 --> 00:06:37,730 we have to exclude, let's say, arrival_date_year. 89 00:06:37,740 --> 00:06:39,940 So I have to exclude this one as well. 90 00:06:40,330 --> 00:06:44,350 After that, what I have to do now: I have to just execute it. 91 00:06:44,350 --> 00:06:50,890 Now we have to fetch how many numerical features we have, or how many numerical columns we have. 92 00:06:51,310 --> 00:06:58,090 So for this, I'm just going to say for column in data.columns; I have to basically iterate over each 93 00:06:58,090 --> 00:06:58,810 and every column. 94 00:06:59,140 --> 00:07:01,180 Then I'm going to put a condition over here: 95 00:07:01,250 --> 00:07:06,820 the data type of that particular column must not be object. 96 00:07:06,820 --> 00:07:14,140 So I'm going to say data[column].dtype not equal to object. 97 00:07:14,140 --> 00:07:15,720 That is the exact meaning of this one. 98 00:07:16,060 --> 00:07:21,030 And if this condition is satisfied, I'm basically going to consider that column. 99 00:07:21,430 --> 00:07:24,370 But still, let's say I have to exclude these ones. 100 00:07:24,610 --> 00:07:33,670 So I'm going to add the condition as: and column not in whatever list I have defined, which is exactly 101 00:07:33,670 --> 00:07:34,770 list_not. 102 00:07:34,820 --> 00:07:40,210 So this is exactly my entire list comprehension code.
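The list comprehension just described might be written roughly as below. This is a sketch on an invented three-row table. It assumes text columns are stored with the object dtype, so the toy frame sets that dtype explicitly.

```python
import pandas as pd

# Toy stand-in for the bookings data; the values are invented for illustration.
data = pd.DataFrame({
    "is_canceled": [0, 1, 0],
    "lead_time": [10, 200, 35],
    "days_in_waiting_list": [0, 3, 0],
    "arrival_date_year": [2015, 2016, 2017],
    # Stored as object explicitly, matching the dtype check used in the talk.
    "country": pd.Series(["PRT", "GBR", "FRA"], dtype="object"),
})

# Numeric columns to leave out of the model, as named in the talk.
list_not = ["days_in_waiting_list", "arrival_date_year"]

# Keep every column that is not of object dtype and not in the exclusion list.
numerical_features = [
    col for col in data.columns
    if data[col].dtype != "object" and col not in list_not
]
print(numerical_features)
```

On this toy frame, the comprehension keeps `is_canceled` and `lead_time`, and drops the text column and the two excluded columns.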
103 00:07:40,450 --> 00:07:47,290 And if you guys are not that comfortable with this list comprehension code, what you can 104 00:07:47,290 --> 00:07:50,680 do is say: for column in data.columns. 105 00:07:50,680 --> 00:07:51,760 Let me show it here. 106 00:07:52,030 --> 00:07:53,290 So you have to very first 107 00:07:53,290 --> 00:08:00,160 add this condition; after it, you have to add this one; and after it, you have this condition and this 108 00:08:00,160 --> 00:08:00,370 one. 109 00:08:00,640 --> 00:08:04,030 So here I am going to say this is nothing but this one. 110 00:08:04,030 --> 00:08:09,690 And once I have all this stuff, then I am simply going to append to my list. Very first, 111 00:08:09,730 --> 00:08:11,110 I have to define some list. 112 00:08:11,320 --> 00:08:16,090 Let's say I'm going to define the list as cols. 113 00:08:16,450 --> 00:08:22,390 So I'm going to say cols.append, with whatever column I have. 114 00:08:22,750 --> 00:08:27,760 And now I have to simply print my cols list. 115 00:08:28,060 --> 00:08:34,400 So this is basically for those who are not that comfortable with this list comprehension code, 116 00:08:34,420 --> 00:08:37,930 because whenever you are going to work on real-world use cases, 117 00:08:38,200 --> 00:08:40,840 you always have to write optimized code. 118 00:08:40,850 --> 00:08:45,070 You can't write hundreds of lines of code. 119 00:08:45,190 --> 00:08:51,730 You always have to optimize your code, because more lines of code generally mean more 120 00:08:51,730 --> 00:08:52,560 to read and maintain. 121 00:08:52,570 --> 00:08:59,910 So let's say I'm going to name this list numerical_features; just execute it. 122 00:09:00,140 --> 00:09:02,990 Let's say I have to print my numerical features as well. 123 00:09:03,310 --> 00:09:10,180 So now you will see these are all the features that are exactly my numerical features.
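The explicit-loop version described above can be sketched like this, using the same invented toy table as a stand-in for the bookings data. The list name `cols` follows the talk, and the comparison at the end only illustrates that both forms select the same columns.

```python
import pandas as pd

# Toy stand-in for the bookings data; the values are invented for illustration.
data = pd.DataFrame({
    "is_canceled": [0, 1, 0],
    "lead_time": [10, 200, 35],
    "days_in_waiting_list": [0, 3, 0],
    "arrival_date_year": [2015, 2016, 2017],
    "country": pd.Series(["PRT", "GBR", "FRA"], dtype="object"),
})
list_not = ["days_in_waiting_list", "arrival_date_year"]

# Step-by-step version: start with an empty list and append matching columns.
cols = []
for col in data.columns:
    if data[col].dtype != "object" and col not in list_not:
        cols.append(col)
print(cols)

# The equivalent one-line list comprehension selects the same columns.
numerical_features = [
    col for col in data.columns
    if data[col].dtype != "object" and col not in list_not
]
print(cols == numerical_features)
```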
124 00:09:10,270 --> 00:09:13,760 So on these features, I can do a lot of preprocessing. 125 00:09:13,780 --> 00:09:17,590 Let's say, checking whether there are some outliers or not. 126 00:09:17,860 --> 00:09:21,320 So on these features, we can still do lots of preprocessing. 127 00:09:21,910 --> 00:09:29,110 And if you guys are still not that comfortable with this code, you can definitely go ahead with this loop logic. 128 00:09:29,350 --> 00:09:36,190 But try to work on this list comprehension code, because once you work on an industry use case, 129 00:09:36,340 --> 00:09:40,050 you always have to write such optimized code. 130 00:09:40,060 --> 00:09:43,330 That's why I'm going to recommend you follow this approach. 131 00:09:43,690 --> 00:09:47,590 So let me close the cell. 132 00:09:47,590 --> 00:09:54,520 And once I have all this stuff, what I have to do: let's say I need some categorical features 133 00:09:54,520 --> 00:09:54,910 as well. 134 00:09:54,910 --> 00:09:58,810 Let's say I have to exclude some categorical columns as well. 135 00:09:59,080 --> 00:09:59,470 So 136 00:09:59,690 --> 00:10:05,380 let me show all the columns: data.columns. These are exactly all the columns over here. So 137 00:10:05,390 --> 00:10:08,710 let me define a list, very first, as cat_not. 138 00:10:09,020 --> 00:10:16,880 And in this list, the very first is exactly my arrival_date_year, which is 139 00:10:16,880 --> 00:10:18,580 exactly this one. 140 00:10:18,590 --> 00:10:25,260 So I'm going to say this is the very first feature that I don't have to consider in my model building, 141 00:10:25,280 --> 00:10:29,930 because this is exactly a feature that would not impact your model that much. 142 00:10:30,180 --> 00:10:37,430 So the second feature is exactly your assigned_room_type, which is exactly this one. 143 00:10:37,430 --> 00:10:40,820 So I'm going to say this is exactly my second feature.
144 00:10:40,820 --> 00:10:47,210 After it, we have our booking_changes column, which is exactly this one. 145 00:10:47,210 --> 00:10:54,080 So I'm going to say this is exactly my, let's say, third column that I have to exclude. After that, what 146 00:10:54,080 --> 00:10:54,820 we have to do: 147 00:10:55,010 --> 00:11:00,620 we have some other columns, one of which is exactly my reservation_status. 148 00:11:00,620 --> 00:11:06,110 So I'm going to say this is nothing but my reservation_status. After it, 149 00:11:06,170 --> 00:11:12,950 what we have to do: we have something known as country, because country never impacts my model 150 00:11:12,950 --> 00:11:14,540 building that much. 151 00:11:14,540 --> 00:11:17,930 To some extent it will impact, but to a greater extent 152 00:11:17,930 --> 00:11:19,750 it will not impact that much. 153 00:11:20,180 --> 00:11:27,430 And the next column that I have to exclude is exactly my days_in_waiting_list. 154 00:11:27,440 --> 00:11:31,160 So yeah, that will never impact that much. 155 00:11:31,310 --> 00:11:33,950 So I have to just paste it over there. Once 156 00:11:33,950 --> 00:11:37,210 I have pasted it, let me just print all these things. 157 00:11:37,520 --> 00:11:41,060 So these are all the columns that I have to exclude. 158 00:11:41,270 --> 00:11:45,080 So now what I have to do: I just need some categorical features. 159 00:11:45,080 --> 00:11:55,060 So for this, I'm just going to say for column in data.columns, and I have to put a condition: if 160 00:11:55,220 --> 00:12:03,000 data[column].dtype, whichever column has a data type of object, 161 00:12:03,000 --> 00:12:09,950 so I'm going to say equals equals object; if this condition is satisfied, then only am I 162 00:12:09,950 --> 00:12:12,530 going to consider it in my list. 163 00:12:12,980 --> 00:12:17,930 And still you have to add one more condition: and column 164 00:12:18,290 --> 00:12:19,630 not in cat_not.
165 00:12:19,650 --> 00:12:24,800 It means that column must not be in my cat_not list. 166 00:12:25,040 --> 00:12:29,060 So that is my entire code, again a list comprehension. 167 00:12:29,570 --> 00:12:36,890 So if you guys are not that comfortable, you can definitely write the code that I have written 168 00:12:36,890 --> 00:12:42,790 just one or two minutes before, using the loop that I have shown you. 169 00:12:43,130 --> 00:12:50,480 So here, I'm going to say this is exactly my categorical_features; 170 00:12:50,480 --> 00:12:51,740 just execute it. 171 00:12:52,070 --> 00:12:53,960 So now I have to simply print. 172 00:12:54,260 --> 00:12:59,720 So these are all my categorical features that we have to take care of. 173 00:12:59,720 --> 00:13:04,760 So I hope you loved the session and found it very valuable and helpful. 174 00:13:05,040 --> 00:13:05,720 Thank you. 175 00:13:05,750 --> 00:13:06,700 Have a nice day. 176 00:13:07,190 --> 00:13:08,060 Keep learning. 177 00:13:08,060 --> 00:13:08,990 Keep growing. 178 00:13:08,990 --> 00:13:09,830 Keep practicing.
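For reference, the categorical feature selection from this session can be sketched as follows. The exclusion list follows the columns named in the talk. The toy rows, and the `hotel` and `meal` columns used to show what survives the filter, are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in for the bookings data; the values are invented for illustration.
# Text columns are stored as object explicitly, matching the dtype check below.
text = {
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel"],
    "meal": ["BB", "HB", "BB"],
    "country": ["PRT", "GBR", "FRA"],
    "assigned_room_type": ["A", "D", "A"],
    "reservation_status": ["Check-Out", "Canceled", "No-Show"],
}
data = pd.DataFrame({name: pd.Series(vals, dtype="object") for name, vals in text.items()})
data["lead_time"] = [10, 200, 35]

# Columns to leave out, as listed in the talk.
cat_not = [
    "arrival_date_year", "assigned_room_type", "booking_changes",
    "reservation_status", "country", "days_in_waiting_list",
]

# Keep every object-dtype column that is not in the exclusion list.
categorical_features = [
    col for col in data.columns
    if data[col].dtype == "object" and col not in cat_not
]
print(categorical_features)
```

Names in `cat_not` that are absent from the table are harmless here, because the comprehension only tests membership.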