1 00:00:00,180 --> 00:00:06,000 Hello and welcome to the very first session of our New York Trip Data Analysis Project. 2 00:00:06,130 --> 00:00:09,330 This is the first session of our project. 3 00:00:09,510 --> 00:00:14,130 In this session, we have the following task to work on. 4 00:00:14,280 --> 00:00:22,290 The first problem statement is that we have to load and collect our entire data so that we 5 00:00:22,290 --> 00:00:29,010 can perform some analysis on it and then draw some conclusions from our 6 00:00:29,010 --> 00:00:29,490 data. 7 00:00:29,880 --> 00:00:34,160 So let me minimize this and open my Jupyter notebook. 8 00:00:34,590 --> 00:00:37,140 And this is my Jupyter notebook. 9 00:00:37,860 --> 00:00:44,330 First, I have to import some basic modules like pandas, NumPy, and the other basic modules 10 00:00:44,340 --> 00:00:52,250 that will help us create beautiful plots, and manipulate and analyze data in a 11 00:00:52,410 --> 00:00:53,970 much more efficient way. 12 00:00:54,180 --> 00:01:00,210 So first, let me import pandas, which is extensively used for data manipulation, data 13 00:01:00,210 --> 00:01:05,760 extraction, data filtering, and much more. So I'm going to import pandas and 14 00:01:05,760 --> 00:01:11,940 give it the alias pd. After that, I have to import NumPy, Numerical Python, which is 15 00:01:11,940 --> 00:01:19,320 extensively used for numerical computations. After that, I have to import a 16 00:01:19,320 --> 00:01:27,920 data visualization library, matplotlib.pyplot, and give it the alias plt. 17 00:01:27,930 --> 00:01:34,650 After that, I'm also going to import the seaborn module, which helps with data visualization, 18 00:01:34,650 --> 00:01:36,540 especially user-friendly visualization.
19 00:01:36,720 --> 00:01:39,870 And I'm going to give it the alias sns. 20 00:01:40,260 --> 00:01:41,830 Just execute the cell. 21 00:01:41,850 --> 00:01:46,690 It will take a couple of seconds. After that, here is what you have to do: 22 00:01:46,710 --> 00:01:50,390 you have to read the data, so look over here. 23 00:01:50,640 --> 00:01:55,410 This is the path where I have all the data for my project. 24 00:01:55,710 --> 00:02:03,150 So let me import another module, the os module, which helps us whenever we 25 00:02:03,150 --> 00:02:05,550 have to deal with the operating system. 26 00:02:05,580 --> 00:02:12,060 I'm just going to import os, and from os I'm going to call a function, listdir, 27 00:02:12,060 --> 00:02:14,010 which lists the contents of a directory. 28 00:02:15,000 --> 00:02:19,460 If you press Shift+Tab, you will get the documentation of this function. Here 29 00:02:19,470 --> 00:02:21,020 you have to pass your path. 30 00:02:21,180 --> 00:02:26,060 So I'm just going to copy this entire path and paste it here. 31 00:02:26,370 --> 00:02:33,510 Whatever path you pass here, it will return exactly what is available at that 32 00:02:33,510 --> 00:02:34,500 particular path. 33 00:02:34,500 --> 00:02:42,150 After that, you have to prefix the string with an r to make it a raw string, which tells Python, 34 00:02:42,390 --> 00:02:45,660 "This is my path, 35 00:02:45,690 --> 00:02:46,800 and the backslashes in it 36 00:02:46,980 --> 00:02:49,640 are not escape characters; 37 00:02:49,680 --> 00:02:51,720 they are part of my entire path." 38 00:02:51,840 --> 00:02:57,750 Otherwise, if you don't add this prefix, in some scenarios 39 00:02:57,750 --> 00:02:59,080 you will get bugs. 40 00:02:59,100 --> 00:03:02,580 So let me execute this cell.
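A minimal sketch of the two ideas in this part of the session: why the r prefix matters for a Windows-style path, and what os.listdir returns. The session's real path isn't reproduced here, so the path string and the temporary folder with its file names are made up for the demo.

```python
import os
import tempfile

# In a normal string literal, "\n" and "\t" are escape sequences (newline, tab),
# so this made-up Windows path is silently corrupted.
plain = "C:\new_data\trips"
# The r prefix makes a raw string: every backslash is kept as a literal character.
raw = r"C:\new_data\trips"
print(len(plain), len(raw))  # 15 17
print("\n" in plain)         # True: the path now contains a newline

# os.listdir(path) returns the names (not full paths) of the entries at that path.
with tempfile.TemporaryDirectory() as data_dir:  # stand-in for the project's data folder
    for name in ["trips_jan.csv", "trips_feb.csv", "notes.txt"]:
        open(os.path.join(data_dir, name), "w").close()
    entries = sorted(os.listdir(data_dir))  # listdir order is arbitrary, so sort

print(entries)  # ['notes.txt', 'trips_feb.csv', 'trips_jan.csv']
```

The raw string is the fix used in the session; os.path.join (used above to build file paths) is a portable alternative that avoids typing separators by hand.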
41 00:03:02,610 --> 00:03:06,180 It returns everything available at this particular path. 42 00:03:06,390 --> 00:03:11,550 So you can see here that it returns all the datasets available at this path. 43 00:03:11,820 --> 00:03:18,440 Let's say I need these last seven datasets for my analysis. 44 00:03:18,450 --> 00:03:20,730 So how am I going to access them? 45 00:03:20,910 --> 00:03:23,940 Note that what this returns is a list. 46 00:03:24,150 --> 00:03:31,380 So to access the last seven entries, I'm going to slice it with [-7:]. Just execute it, 47 00:03:31,380 --> 00:03:34,280 and it returns these seven datasets. 48 00:03:34,350 --> 00:03:36,440 Let me store this data. 49 00:03:36,450 --> 00:03:39,180 Let me store this list in a variable called files. 50 00:03:39,450 --> 00:03:42,060 And if I print files, it looks fine. 51 00:03:42,420 --> 00:03:46,150 Now let's say I don't want this one dataset yet. 52 00:03:46,170 --> 00:03:50,390 Let's say I'm going to keep this dataset for a later analysis. 53 00:03:50,390 --> 00:03:54,720 So what I'm going to do is call files.remove. 54 00:03:54,720 --> 00:03:56,580 I have to remove this dataset. 55 00:03:56,820 --> 00:04:00,990 So I'm going to pass this one and execute. 56 00:04:00,990 --> 00:04:08,220 And if I print files again, you will see I have six datasets here, and we have 57 00:04:08,220 --> 00:04:15,360 to collect the entire data from these six datasets and concatenate all of them. 58 00:04:15,360 --> 00:04:22,380 At the end, we will have our complete data, on which we have to perform some kind of analysis 59 00:04:22,380 --> 00:04:25,350 depending on the problem statement we have. 60 00:04:25,740 --> 00:04:28,310 So let's see what I'm going to do here.
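The slicing and removal steps above can be sketched with plain list operations. The file names below are hypothetical (the session's file names aren't given), as is the choice of which file to set aside. Because os.listdir makes no ordering guarantee, sorting the listing before slicing (not shown in the session) would keep [-7:] stable.

```python
# Hypothetical file names standing in for the directory listing.
all_files = [
    "readme.txt",
    "zones.csv",
    "trips_01.csv",
    "trips_02.csv",
    "trips_03.csv",
    "trips_04.csv",
    "trips_05.csv",
    "trips_06.csv",
    "trips_07.csv",
]

# A negative start index counts from the end, so [-7:] keeps the last seven entries.
files = all_files[-7:]

# list.remove deletes the first matching element in place (ValueError if it is absent).
files.remove("trips_07.csv")  # hypothetical: set aside for a later analysis

print(len(files))  # 6
print(files)
```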
61 00:04:28,320 --> 00:04:35,910 Now, first, I'm going to iterate over these files, and whatever DataFrame 62 00:04:35,910 --> 00:04:40,120 each one returns, I'm going to concatenate it. 63 00:04:40,200 --> 00:04:44,020 So I'm going to write for file in files. 64 00:04:44,040 --> 00:04:48,540 And for whatever file you have, you have to create a DataFrame from it. 65 00:04:48,570 --> 00:04:52,950 So I'm going to write pd.read_csv. 66 00:04:53,070 --> 00:04:59,880 Here you have the function read_csv, and first of all, 67 00:05:00,060 --> 00:05:04,300 if you press Shift+Tab here, you will see it takes a file path. 68 00:05:04,560 --> 00:05:08,340 So let me first specify what my path is. 69 00:05:08,610 --> 00:05:13,740 Let me copy my entire path from here. 70 00:05:13,950 --> 00:05:17,970 Let's say I'm going to define a variable here, and here 71 00:05:17,970 --> 00:05:24,970 I'm going to paste the path and prefix it with r, which makes it a raw string. 72 00:05:25,050 --> 00:05:31,620 Once that variable is defined, I'm going to pass my path first, then 73 00:05:31,620 --> 00:05:33,790 I have to append a backslash. 74 00:05:34,500 --> 00:05:42,900 Then I have to append the file name, which is exactly this one, 75 00:05:42,900 --> 00:05:44,510 the loop variable file. 76 00:05:44,520 --> 00:05:46,840 That's how I read each file: 77 00:05:46,870 --> 00:05:50,520 the path concatenated with the file name. 78 00:05:50,520 --> 00:05:53,990 After that, I have to specify an encoding as well. 79 00:05:54,180 --> 00:05:59,420 So I'm going to say my encoding is UTF-8. 80 00:05:59,430 --> 00:06:02,910 That covers the vast majority of cases.
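Reading a single file the way it's described above can be sketched as follows. The folder, file name, and column names are made up for the demo, and os.path.join is swapped in for the manual path + "\\" + file concatenation because it picks the right separator on any OS.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as data_dir:  # hypothetical data folder
    file = "trips_01.csv"  # hypothetical file name
    with open(os.path.join(data_dir, file), "w", encoding="utf-8") as fh:
        fh.write("pickup,dropoff\nManhattan,Brooklyn\nQueens,Bronx\n")

    # The session's version (Windows-only separator):
    #   full_path = data_dir + "\\" + file
    # os.path.join inserts the correct separator for the current OS.
    full_path = os.path.join(data_dir, file)

    # encoding="utf-8" is also pandas' default; passing it makes the intent explicit.
    df = pd.read_csv(full_path, encoding="utf-8")

print(df.shape)  # (2, 2)
```

If a file raises a UnicodeDecodeError, that usually means it was saved in a different encoding, and the encoding argument is the place to change it.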
81 00:06:02,910 --> 00:06:09,590 In most projects you will use UTF-8, because it is the most widely used encoding. 82 00:06:09,780 --> 00:06:11,670 So this returns the data. 83 00:06:12,210 --> 00:06:16,230 Let me store it. Once I have stored it, 84 00:06:16,380 --> 00:06:21,560 I'm going to call pd.concat. 85 00:06:21,570 --> 00:06:27,330 I have to concatenate each DataFrame from these iterations. 86 00:06:27,330 --> 00:06:32,730 But before calling pd.concat, let me define a new DataFrame here. 87 00:06:32,970 --> 00:06:42,090 So here I'm going to write final = pd.DataFrame(), and this is the 88 00:06:42,090 --> 00:06:43,140 DataFrame constructor. 89 00:06:43,320 --> 00:06:45,970 So final is a blank DataFrame. 90 00:06:46,170 --> 00:06:52,630 I'm basically going to append each file's data to final. 91 00:06:52,680 --> 00:06:58,650 So I'm going to pass final and the current DataFrame to pd.concat, and then I'm going to write final 92 00:06:58,660 --> 00:06:59,910 equals all of this. 93 00:06:59,910 --> 00:07:06,590 That means the final DataFrame holds all of this data. 94 00:07:06,600 --> 00:07:08,190 That's what I'm going to do. 95 00:07:08,400 --> 00:07:14,770 This is basically a simple approach to collecting this huge chunk of data. 96 00:07:14,880 --> 00:07:17,130 Now you just have to execute this. 97 00:07:17,430 --> 00:07:22,410 It will take a couple of seconds, depending on your machine's specifications. 98 00:07:22,500 --> 00:07:24,630 Now everything has executed. 99 00:07:24,630 --> 00:07:28,630 And if you are not that comfortable with this logic, 100 00:07:28,800 --> 00:07:31,340 there is an alternative approach for you.
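The loop-and-concatenate step can be sketched like this. The session starts from an empty final DataFrame and calls pd.concat on every pass; this sketch swaps in the commonly recommended variant of collecting the per-file DataFrames in a list and concatenating once, since each pd.concat call copies everything accumulated so far. The file names and columns are made up, and ignore_index=True is an addition not mentioned in the session.

```python
import os
import tempfile

import pandas as pd


def load_all(data_dir, files):
    """Read each CSV in `files` from `data_dir` and stack the rows into one DataFrame."""
    frames = []
    for file in files:
        path = os.path.join(data_dir, file)
        frames.append(pd.read_csv(path, encoding="utf-8"))
    # A single concat at the end copies the data once; ignore_index renumbers rows 0..n-1.
    return pd.concat(frames, ignore_index=True)


with tempfile.TemporaryDirectory() as data_dir:  # hypothetical data folder
    files = [f"trips_0{i}.csv" for i in range(1, 4)]  # hypothetical file names
    for i, file in enumerate(files):
        # Two made-up trips per file, just so there is something to combine.
        with open(os.path.join(data_dir, file), "w", encoding="utf-8") as fh:
            fh.write(f"trip_id,fare\n{2 * i},10.5\n{2 * i + 1},7.25\n")

    final = load_all(data_dir, files)

print(final.shape)                # (6, 2)
print(final["trip_id"].tolist())  # [0, 1, 2, 3, 4, 5]
```

Either way, the result is the same stack of rows; checking final.shape afterwards confirms how many rows and columns were collected.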
101 00:07:31,350 --> 00:07:37,590 You can create a separate DataFrame for this file, this one, and all these other 102 00:07:37,860 --> 00:07:38,280 ones. 103 00:07:38,280 --> 00:07:45,090 Once you have a DataFrame for each dataset, you can concatenate them using the pd.concat 104 00:07:45,090 --> 00:07:45,710 function. 105 00:07:45,720 --> 00:07:48,180 That's a slightly lengthier approach. 106 00:07:48,480 --> 00:07:54,630 The loop is the smarter approach, and it's the approach you should follow in real-world 107 00:07:54,630 --> 00:07:54,920 projects. 108 00:07:54,930 --> 00:07:58,480 That's why I showed you this approach. 109 00:07:58,500 --> 00:08:04,280 Now let me check the dimensions of this new DataFrame I've created. 110 00:08:04,290 --> 00:08:07,020 So let me check final.shape. 111 00:08:07,020 --> 00:08:10,050 You can see how many rows it has. 112 00:08:10,380 --> 00:08:14,390 It's amazing how much data we have. 113 00:08:14,610 --> 00:08:21,840 So that's your real-world data, on which you can perform your analysis. 114 00:08:21,900 --> 00:08:23,270 That's all for this session. 115 00:08:23,280 --> 00:08:25,170 Hopefully you liked this approach: 116 00:08:25,170 --> 00:08:25,620 how, 117 00:08:25,620 --> 00:08:33,660 just by using iteration, I collected this huge amount of data in a single 118 00:08:33,660 --> 00:08:33,990 DataFrame. 119 00:08:34,170 --> 00:08:37,290 I hope you enjoyed the session and that it 120 00:08:37,290 --> 00:08:39,030 is really helpful for you 121 00:08:39,240 --> 00:08:41,880 when you follow this approach. 122 00:08:42,390 --> 00:08:43,770 I hope you liked it. 123 00:08:44,010 --> 00:08:44,670 Thank you. 124 00:08:44,680 --> 00:08:45,620 Have a nice day.
125 00:08:45,630 --> 00:08:46,470 Keep learning. 126 00:08:46,470 --> 00:08:47,310 Keep growing. 127 00:08:47,550 --> 00:08:48,420 Keep practicing.