1 00:00:00,180 --> 00:00:03,930 Hello, welcome to the very first session of this project. 2 00:00:04,080 --> 00:00:10,680 So this is exactly my data set on which I have to do certain kinds of analysis, on which I have to do 3 00:00:10,830 --> 00:00:13,940 certain kinds of preprocessing and lots of analysis. 4 00:00:14,160 --> 00:00:19,950 And at the end, I have to come up with some beautiful insights, some meaningful insights from this 5 00:00:19,950 --> 00:00:21,320 huge chunk of data. 6 00:00:21,660 --> 00:00:25,220 So you will see these are all the different features in my data. 7 00:00:25,500 --> 00:00:29,610 So very first, I have to load this data frame in my Jupyter notebook. 8 00:00:30,120 --> 00:00:32,420 So this is exactly my Jupyter notebook. 9 00:00:32,460 --> 00:00:36,580 And first of all, I'm just going to import some basic modules over here. 10 00:00:36,870 --> 00:00:42,310 So very first, let's say I'm just going to import my pandas module and I'm going to create an alias 11 00:00:42,330 --> 00:00:42,960 as pd. 12 00:00:43,290 --> 00:00:51,000 So pandas is exactly that library that is extensively used for data manipulation, data extraction 13 00:00:51,150 --> 00:00:54,460 and data modification and many more tasks as well. 14 00:00:54,870 --> 00:01:01,140 So after it, if you have to perform some numerical tasks on data, you guys can import the NumPy library as 15 00:01:01,140 --> 00:01:01,380 well. 16 00:01:01,740 --> 00:01:07,230 So here I'm going to say import numpy as np. After it, for visualization stuff, 17 00:01:07,500 --> 00:01:13,440 I'm going to import my matplotlib, and from this matplotlib I'm going to import my pyplot and I'm 18 00:01:13,440 --> 00:01:18,660 going to create an alias plt. After it, for better presentation, 19 00:01:18,660 --> 00:01:24,060 you guys can also import another library, which is exactly your seaborn library. 20 00:01:24,060 --> 00:01:26,960 So I'm going to say import seaborn as sns.
21 00:01:27,210 --> 00:01:32,450 So just execute it with the Enter shortcut and it will execute this cell. 22 00:01:32,700 --> 00:01:36,680 And after that, very first, you have to read your data. 23 00:01:36,930 --> 00:01:38,580 So I'm going to say pd dot, 24 00:01:38,850 --> 00:01:46,920 and here I have a function which is exactly my read underscore csv, and here you have to pass the path where 25 00:01:46,920 --> 00:01:49,380 exactly your dataset is available. 26 00:01:49,530 --> 00:01:55,080 You will see over here, this is exactly my dataset and it is available at this path. 27 00:01:55,080 --> 00:01:58,710 So I'm just going to copy this path and I'm just going to paste it over here. 28 00:01:58,920 --> 00:02:04,890 And after it, if you press Tab, you will get all the stuff, and I will add this particular part. 29 00:02:04,920 --> 00:02:08,610 So this is exactly the data set that you need. 30 00:02:08,800 --> 00:02:15,570 So I'm just going to put an r before it so that we get rid of all the issues that will 31 00:02:15,570 --> 00:02:19,190 come up while importing this data. 32 00:02:19,440 --> 00:02:27,150 So I'm just going to say it is exactly my df, and let's say if I want a preview of it, I just call 33 00:02:27,150 --> 00:02:28,860 head on it. 34 00:02:28,860 --> 00:02:33,660 And you will see this is exactly the data frame that you need. On this, 35 00:02:33,660 --> 00:02:41,550 you have to perform lots of preprocessing, lots of analysis, and at the end you have to conclude with your visualization. 36 00:02:41,910 --> 00:02:48,870 So you will see for this particular session, I have this assignment in which I have to perform data 37 00:02:48,870 --> 00:02:55,650 cleaning on our data, as well as lots of data preprocessing on our data, because whenever you are going to work 38 00:02:55,650 --> 00:03:00,210 on some real-world project, your data will always be raw. 39 00:03:00,330 --> 00:03:03,110 You will always have messy data.
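The loading step described above can be sketched as follows. This is a minimal sketch, not the session's actual notebook: the transcript does not give the file path or the full column list, so the CSV text, its column names, and the commented-out path are hypothetical stand-ins.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; the real path and the full set of
# columns are not given in the transcript.
csv_text = """meal,country,agent,adults,children,babies
BB,PRT,9,2,0,0
HB,GBR,,1,1,0
BB,,240,2,0,1
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# With a file on disk, prefixing the path with r makes it a raw string, so
# backslashes in a Windows path are not read as escape sequences:
# df = pd.read_csv(r"C:\some\folder\data.csv")  # hypothetical path

# head() returns the first rows (five by default) as a quick preview.
preview = df.head()
print(preview)
```

The r prefix only matters for paths containing backslashes; forward-slash paths can be passed as ordinary strings.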
40 00:03:03,390 --> 00:03:09,870 So you have to make your data ready for the analysis, or for the modeling purposes; that's why you have 41 00:03:09,870 --> 00:03:13,520 to perform lots of data cleaning, lots of preprocessing on your data. 42 00:03:13,840 --> 00:03:18,150 So let's say I'm going to check what exactly is the shape of my data frame. 43 00:03:18,160 --> 00:03:21,630 You will see it has that many rows and that many columns. 44 00:03:21,840 --> 00:03:27,090 And let's say I have to check whether I have any null value in the data frame or not. 45 00:03:27,270 --> 00:03:34,680 So for this, I'm going to say df dot isnull dot values dot any. 46 00:03:34,680 --> 00:03:42,900 And if I execute it, you will see it returns True; it means there is some kind of missing value available 47 00:03:42,900 --> 00:03:43,830 in your data. 48 00:03:44,340 --> 00:03:50,110 So let's say I just want the sum of all the missing values over here. 49 00:03:50,130 --> 00:03:54,660 So for this, I'm just going to call df dot isnull dot sum. 50 00:03:54,660 --> 00:04:00,330 You will see the number of missing values for each of the different features in your data over here. 51 00:04:00,570 --> 00:04:01,440 You will see this. 52 00:04:01,590 --> 00:04:03,690 This country has that many missing values. 53 00:04:03,700 --> 00:04:08,070 All these features have a huge number of missing values over here. 54 00:04:08,520 --> 00:04:11,730 So if you observe over here in this agent, 55 00:04:11,970 --> 00:04:23,160 if agent is not given, it means the booking was most likely made by a person, or you can say the booking 56 00:04:23,160 --> 00:04:27,960 was most likely made without one. And similarly over here in company, 57 00:04:28,290 --> 00:04:34,920 if none is given, or I can say wherever I have a missing value, it means it was most likely private.
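The shape and missing-value checks described above can be sketched like this. The small data frame is a hypothetical stand-in; the column names country, agent and company follow the features mentioned in the session, and the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature data frame with some missing values.
df = pd.DataFrame({
    "country": ["PRT", None, "GBR", "ESP"],
    "agent": [9.0, np.nan, np.nan, 240.0],
    "company": [np.nan, np.nan, 40.0, np.nan],
    "adults": [2, 1, 2, 0],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape

# isnull() marks each missing cell; .values.any() collapses that to one flag.
has_missing = df.isnull().values.any()

# Summing the boolean mask gives the number of missing values per column.
missing_per_column = df.isnull().sum()

print(n_rows, n_cols)
print(has_missing)
print(missing_per_column)
```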
58 00:04:34,920 --> 00:04:42,540 Similarly, all the other features we can explain, or I can say we can manage them according to our 59 00:04:42,780 --> 00:04:43,230 need. 60 00:04:43,530 --> 00:04:50,790 So here, let's say I'm just going to say df dot fillna, and here I'm going to say 61 00:04:51,150 --> 00:04:56,500 just fill it with zero, and I'm just going to update my data frame as well. 62 00:04:56,520 --> 00:04:59,820 So here I have to pass my inplace parameter as 63 00:05:00,040 --> 00:05:07,870 True. So just execute it, and after it, if again I'm going to call this sum over there, exactly this 64 00:05:07,870 --> 00:05:08,520 sum, 65 00:05:08,900 --> 00:05:12,820 now you will observe you don't have any missing value in your data. 66 00:05:12,850 --> 00:05:19,430 It means up to some extent your data is now ready for your analysis purposes, for the preprocessing purposes. 67 00:05:19,960 --> 00:05:26,110 So let's say in this meal feature, I'm 68 00:05:26,110 --> 00:05:28,010 just going to check 69 00:05:28,270 --> 00:05:31,840 what exactly is the count of each and every category in this. 70 00:05:32,110 --> 00:05:36,540 So for this, I'm just going to call value underscore counts over there. 71 00:05:36,550 --> 00:05:41,250 And if you execute the cell, you will see BB has that number of count. 72 00:05:41,260 --> 00:05:45,570 Similarly, each and every category has that much count. 73 00:05:45,910 --> 00:05:52,690 So after it, let's say I'm going to access this children column over there, and on children, 74 00:05:52,690 --> 00:06:00,370 if I call unique, now you'll see it has all these unique values of children available 75 00:06:00,370 --> 00:06:02,650 in your data frame, and the other columns work in a similar way.
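A minimal sketch of the fill, count and unique steps described above. The data frame is a hypothetical stand-in with made-up values, and the column name meal is an assumption about the feature named in the session.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature data frame; values are made up for illustration.
df = pd.DataFrame({
    "meal": ["BB", "BB", "HB", "SC"],
    "agent": [9.0, np.nan, np.nan, 240.0],
    "company": [np.nan, np.nan, 40.0, np.nan],
    "children": [0.0, np.nan, 2.0, 0.0],
    "adults": [2, 1, 2, 0],
    "babies": [0, 0, 1, 0],
})

# Replace every missing value with 0 and update df itself.
df.fillna(0, inplace=True)

# Re-check: the total number of missing cells should now be zero.
remaining_missing = df.isnull().sum().sum()

# Count how often each category occurs in one column.
meal_counts = df["meal"].value_counts()

# List the distinct values that occur in a column.
children_values = df["children"].unique()

print(remaining_missing)
print(meal_counts)
print(children_values)
```

Filling with 0 matches the reading given in the session, where a missing agent or company means none was involved.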
76 00:06:02,980 --> 00:06:08,890 Let's say I'm going to access adults, and on this, if I'm 77 00:06:08,890 --> 00:06:15,240 going to call unique, you will see it has these unique numbers of adults. In a similar way, 78 00:06:15,250 --> 00:06:21,520 if I'm going to access my babies and on this call unique, you will observe 79 00:06:21,520 --> 00:06:26,080 it has these unique numbers of babies available in your data frame. 80 00:06:26,290 --> 00:06:34,240 And from all the stuff that we have fetched over here, we can see that in our data children 81 00:06:34,240 --> 00:06:37,680 and adults and babies can't all be zero at the same time. 82 00:06:38,020 --> 00:06:41,130 So for this, what I'm going to do, I'm just going to create a filter. 83 00:06:41,170 --> 00:06:43,360 Let's say I'm going to create a filter. 84 00:06:43,360 --> 00:06:46,840 And in this filter, I'm going to say very first I have to access children, 85 00:06:47,200 --> 00:06:52,650 where children is equal to zero, so this is exactly my very first condition. 86 00:06:52,930 --> 00:06:58,120 So I'm going to say this is my very first condition, and let's say my second condition would 87 00:06:58,270 --> 00:07:01,520 be df of adults equal to zero. 88 00:07:01,540 --> 00:07:04,270 So this is exactly my second condition. 89 00:07:04,270 --> 00:07:11,030 And my third condition is exactly where my babies are equal to zero. 90 00:07:11,050 --> 00:07:13,750 So this is exactly my third condition. 91 00:07:13,900 --> 00:07:20,320 And if I'm going to pass this filter in my data frame, now you will 92 00:07:20,320 --> 00:07:21,220 see the matching rows over here.
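The three-condition filter described above can be sketched as follows, assuming the conditions are combined with a logical AND. The data frame and the names no_guests and wrong_entries are hypothetical, chosen only for this illustration.

```python
import pandas as pd

# Hypothetical miniature data frame; row 1 has no guests at all.
df = pd.DataFrame({
    "adults": [2, 0, 1, 0],
    "children": [0, 0, 2, 0],
    "babies": [0, 0, 0, 1],
})

# Each comparison gives a boolean Series; & combines them element-wise.
# The parentheses are needed because & binds tighter than ==.
no_guests = (df["children"] == 0) & (df["adults"] == 0) & (df["babies"] == 0)

# Passing the boolean mask to the data frame keeps only the matching rows.
wrong_entries = df[no_guests]

print(wrong_entries)
```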
93 00:07:21,250 --> 00:07:28,930 You will see over here, it has that many rows, and it means these are exactly your 94 00:07:29,080 --> 00:07:35,190 wrong entries in your data, because you will observe that all these values are 95 00:07:35,200 --> 00:07:35,580 zero at the same time. 96 00:07:35,590 --> 00:07:39,240 It means you have to remove all these rows. For this, 97 00:07:39,250 --> 00:07:43,570 what I'm going to do, let's say I'm just going to store it somewhere else. 98 00:07:43,570 --> 00:07:47,460 Let's say this is the data frame that I have to store first. Or what you can do: 99 00:07:47,710 --> 00:07:51,570 you guys can call the negation of this filter. 100 00:07:51,580 --> 00:07:53,620 So I'm just going to copy from here, 101 00:07:53,950 --> 00:07:58,320 just going to paste over here, and you call the negation of this. 102 00:07:58,510 --> 00:08:01,200 So in pandas, you have to use this operator over here. 103 00:08:01,300 --> 00:08:02,490 So just execute it. 104 00:08:02,500 --> 00:08:09,170 And it has that many rows in your data frame that don't contain any wrong entry. 105 00:08:09,490 --> 00:08:11,100 So let's say this is my data. 106 00:08:11,110 --> 00:08:12,280 This is my final data frame. 107 00:08:12,280 --> 00:08:14,490 So I'm going to say this is exactly my data. 108 00:08:14,500 --> 00:08:15,940 So just execute it. 109 00:08:16,120 --> 00:08:20,400 And if I'm going to call head over there, you will observe over here 110 00:08:20,680 --> 00:08:28,330 this is a data frame on which you have to perform certain kinds of analysis, because this data is almost, 111 00:08:28,330 --> 00:08:31,780 I can say, it is almost your preprocessed data. 112 00:08:31,780 --> 00:08:35,620 And on this data, you have to come up with some meaningful insights. 113 00:08:35,620 --> 00:08:38,290 You have to come up with some beautiful insights.
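Removing the wrong entries by negating the filter can be sketched like this. The operator referred to in the session is presumably ~, which is how pandas inverts a boolean mask element-wise. The data frame is a hypothetical stand-in, and the name data follows the final data frame mentioned above.

```python
import pandas as pd

# Hypothetical miniature data frame; row 1 has no guests at all.
df = pd.DataFrame({
    "adults": [2, 0, 1, 0],
    "children": [0, 0, 2, 0],
    "babies": [0, 0, 0, 1],
})

# Rows where children, adults and babies are all zero at the same time.
no_guests = (df["children"] == 0) & (df["adults"] == 0) & (df["babies"] == 0)

# ~ flips True and False, so this keeps every row that is not a wrong entry.
data = df[~no_guests]

print(data.shape)
print(data.head())
```

The original df is left unchanged here; data is a filtered result that keeps the original row labels.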
114 00:08:38,470 --> 00:08:40,210 So that is exactly what I mean. 115 00:08:40,570 --> 00:08:43,720 So that's all about the session; hope you liked this session very much. 116 00:08:43,970 --> 00:08:44,610 Thank you. 117 00:08:44,620 --> 00:08:45,620 Have a nice day. 118 00:08:45,970 --> 00:08:46,780 Keep learning. 119 00:08:46,780 --> 00:08:47,620 Keep growing. 120 00:08:47,950 --> 00:08:48,670 Keep practicing.