1 00:00:00,210 --> 00:00:06,600 Hello and welcome to the very first session of this Amazon data analysis project, in which we are going 2 00:00:06,600 --> 00:00:14,700 to perform data analysis and data preprocessing, with lots of data cleaning, to analyze this huge chunk of Amazon 3 00:00:14,700 --> 00:00:15,180 data. 4 00:00:15,360 --> 00:00:19,000 And then we will come up with some meaningful insights. 5 00:00:19,020 --> 00:00:25,620 So this is my Jupyter Notebook IDE, where I'm going to code in the Python programming language. 6 00:00:25,740 --> 00:00:32,790 So first I have to import all the necessary modules over here, or I can say all the necessary 7 00:00:32,790 --> 00:00:35,250 modules and classes over here. So, first, 8 00:00:35,250 --> 00:00:41,550 I'm going to import pandas and create its alias as pd, for data manipulation, 9 00:00:41,550 --> 00:00:44,190 for data stacking, for data filtering. 10 00:00:44,460 --> 00:00:50,540 Lots of use cases can be easily solved by using the pandas module. After that, 11 00:00:50,580 --> 00:00:52,800 I'm going to import the NumPy module. 12 00:00:52,810 --> 00:01:00,810 So I'm going to say import numpy and create its alias as np, and after that, for visualization and 13 00:01:00,810 --> 00:01:04,970 so on, I am going to import matplotlib and seaborn as well. 14 00:01:05,250 --> 00:01:12,960 So here I'm going to import matplotlib.pyplot, which is the pyplot module, as plt. 15 00:01:13,380 --> 00:01:19,050 After that, I also have to import seaborn, which is a somewhat more advanced visualization library. 16 00:01:19,380 --> 00:01:24,660 So I'm going to say import seaborn as sns, and just execute it. 17 00:01:24,970 --> 00:01:27,340 Now, what do you have to do next? 
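The imports described in this part of the session can be written as the short sketch below. It assumes that pandas, NumPy, matplotlib, and seaborn are already installed in the notebook environment; the aliases are the conventional ones the speaker names.

```python
# Conventional aliases for the four libraries named in the session.
# Assumption: all four packages are installed in the current environment.
import pandas as pd               # data manipulation and filtering
import numpy as np                # numerical arrays
import matplotlib.pyplot as plt   # basic plotting interface
import seaborn as sns             # higher-level statistical plots built on matplotlib

# Printing the library versions is a quick way to confirm the imports worked.
print(pd.__version__, np.__version__, sns.__version__)
```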
18 00:01:27,360 --> 00:01:33,300 First, you have to read your data, wherever it is: whether it is in databases, 19 00:01:33,510 --> 00:01:39,930 whether it comes from some APIs, or whether it is in some big 20 00:01:39,930 --> 00:01:45,240 data frameworks like Hadoop, Apache Hive, or Apache Pig, wherever it is. 21 00:01:45,390 --> 00:01:49,710 You have to read that data, because that's how you are going to work in the real world. 22 00:01:49,710 --> 00:01:54,750 Expect that you might not have data in a ready-made format. 23 00:01:55,020 --> 00:02:02,160 So at that time you have to read that data, because most of the time your data will be in a database, 24 00:02:02,160 --> 00:02:08,040 because companies, whether they are MNCs or top-notch companies, prefer to store their 25 00:02:08,040 --> 00:02:10,230 data in some databases. 26 00:02:10,250 --> 00:02:16,170 Similarly, over here, if you see, this is the database where you have all your data, 27 00:02:16,170 --> 00:02:21,930 and you also have the data in the form of a CSV, which is this reviews.csv. 28 00:02:22,050 --> 00:02:29,760 But anyhow, you have to read this data from this database. You will observe its extension is 29 00:02:29,760 --> 00:02:30,640 .sqlite. 30 00:02:30,660 --> 00:02:34,170 It means this data is stored in a SQLite database. 31 00:02:34,350 --> 00:02:38,850 So now you need a SQLite compiler to open this database. 32 00:02:39,000 --> 00:02:45,210 Either you can download that compiler for offline use and open this database 33 00:02:45,210 --> 00:02:53,490 over there, or you can go to Google and just search for "SQLite online compiler". 34 00:02:53,490 --> 00:02:54,250 Yeah, online 35 00:02:54,270 --> 00:02:59,400 SQLite compiler, whatever it is, and you just have to click on this very first hyperlink. 
36 00:02:59,670 --> 00:03:03,510 And this is the interface I am talking about. 37 00:03:03,750 --> 00:03:10,560 You will see that this is your SQLite online compiler, and here you have an option, 38 00:03:10,560 --> 00:03:10,980 File. 39 00:03:11,310 --> 00:03:16,560 Just go there, click on Open Database, and it will go to this page. 40 00:03:16,560 --> 00:03:18,690 You just have to select the path. 41 00:03:18,690 --> 00:03:23,250 And this is my Amazon dataset, and this is that database. 42 00:03:23,250 --> 00:03:24,690 You just have to open it. 43 00:03:24,990 --> 00:03:28,230 It will take a couple of seconds to open this database. 44 00:03:28,230 --> 00:03:31,800 Now you will observe your database gets opened over here. 45 00:03:31,800 --> 00:03:36,750 And this is the table, because this is a relational database. 46 00:03:36,750 --> 00:03:42,450 A relational database is one where your data is stored in the form of tables, in the form of 47 00:03:42,450 --> 00:03:43,410 rows and columns. 48 00:03:43,590 --> 00:03:44,730 Similarly, over here, 49 00:03:45,030 --> 00:03:49,350 all your data is stored in this reviews table. 50 00:03:49,350 --> 00:03:52,730 And if you click on this, you will see all your data. 51 00:03:52,740 --> 00:03:54,120 This, this, this: 52 00:03:54,120 --> 00:03:56,700 these are all the columns in your database. 53 00:03:56,700 --> 00:04:02,090 And if you want to view this data, or if you want to show this data, you just have to run this 54 00:04:02,100 --> 00:04:07,260 query: SELECT * FROM, then you have to give your table name. Here, 55 00:04:07,260 --> 00:04:14,970 my table name is reviews, and you just have to press Enter over here and click 56 00:04:14,970 --> 00:04:18,030 this Run over there, and it will run this query. 57 00:04:18,360 --> 00:04:21,420 So what exactly is the meaning of this SELECT * FROM reviews? 
58 00:04:21,420 --> 00:04:28,640 It means you have to select all rows and all columns from this particular table that you have in this 59 00:04:28,650 --> 00:04:29,460 database. 60 00:04:29,460 --> 00:04:31,260 Now you will see right over here, 61 00:04:31,680 --> 00:04:36,090 this is the data on which you have to perform certain kinds of analysis. 62 00:04:36,090 --> 00:04:37,110 You will see over here, 63 00:04:37,110 --> 00:04:39,150 this is very huge data. 64 00:04:39,150 --> 00:04:45,720 You will see how many rows I have; you can observe the approximate value here. 65 00:04:46,170 --> 00:04:52,830 So you have to load this data into your Jupyter Notebook, because you have to do a lot of preprocessing. 66 00:04:53,240 --> 00:04:59,100 You have to do a lot of analysis on the data, and then you have to come up with some meaningful insights 67 00:04:59,100 --> 00:04:59,550 from this 68 00:05:00,170 --> 00:05:05,590 chunk of data. So for this, first you need an additional module, so I'm just going to import 69 00:05:05,590 --> 00:05:10,660 sqlite3, and after that I just have to execute it. 70 00:05:10,660 --> 00:05:15,120 And first, you have to establish your connection to this database. 71 00:05:15,310 --> 00:05:21,340 So for this, what you can do is access this module, which is 72 00:05:21,550 --> 00:05:22,620 sqlite3. 73 00:05:22,630 --> 00:05:24,880 So I'm just going to say sqlite3. 74 00:05:25,090 --> 00:05:29,650 And here you have a function, which is the connect function. 75 00:05:29,830 --> 00:05:36,970 And here, first, you have to mention which database you have to connect to, which database you 76 00:05:36,970 --> 00:05:38,380 have to establish a connection with. 77 00:05:38,590 --> 00:05:44,290 So I'm just going to copy this path and paste it over there, and then 78 00:05:44,290 --> 00:05:45,250 press Tab. 
79 00:05:45,580 --> 00:05:49,570 This is the database I'm talking about. 80 00:05:49,810 --> 00:05:55,500 Let's say I'm going to store its object in con, or conn, or connection, whatever you want. 81 00:05:55,510 --> 00:05:56,560 It's all up to you. 82 00:05:57,100 --> 00:05:58,240 Just execute it. 83 00:05:58,240 --> 00:05:59,860 All this stuff gets executed. 84 00:05:59,860 --> 00:06:06,190 And if I show you the type of this, you will observe it is an object of 85 00:06:06,190 --> 00:06:09,220 this Connection class over here. 86 00:06:09,310 --> 00:06:13,050 Now, what you have to do: you have to read data. 87 00:06:13,060 --> 00:06:17,990 You have to read your data from this database using the pandas module. 88 00:06:18,340 --> 00:06:27,810 So I'm just going to say pd.read_sql_query; you can just press Tab 89 00:06:28,210 --> 00:06:30,780 to autocomplete it. 90 00:06:31,360 --> 00:06:33,850 And this is that function. Here, 91 00:06:33,850 --> 00:06:36,580 you have to say select... 92 00:06:37,060 --> 00:06:45,220 Oh, you have to say SELECT * FROM whatever the table name in your database is, which is my 93 00:06:45,220 --> 00:06:45,790 reviews. 94 00:06:45,970 --> 00:06:53,710 So here I am going to say SELECT * FROM reviews, and here you have to mention the connection that 95 00:06:53,710 --> 00:06:59,770 you have established over here, which is my con, or you can say connection. 96 00:07:00,160 --> 00:07:01,920 Now, what you can do: 97 00:07:02,110 --> 00:07:04,210 let's say it will return me some data frame. 98 00:07:04,210 --> 00:07:10,300 So I'm just going to store it in, let's say, df, or literally whatever data frame name you 99 00:07:10,300 --> 00:07:11,030 want to assign. 100 00:07:11,170 --> 00:07:13,300 Now you just have to execute this. 
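The connection and query steps described above can be sketched as follows. This is a minimal illustration, not the session's actual notebook: the real database path is never stated in the transcript, so an in-memory SQLite database with a hypothetical `reviews` table (the columns `Id`, `Score`, and `Summary` are made up for illustration) stands in for it. In the session, the path to the `.sqlite` file would be passed to `sqlite3.connect` instead of `":memory:"`.

```python
import sqlite3
import pandas as pd

# Assumption: an in-memory database stands in for the course's .sqlite file.
con = sqlite3.connect(":memory:")

# Hypothetical table and columns, created only so the example is runnable.
con.execute("CREATE TABLE reviews (Id INTEGER PRIMARY KEY, Score INTEGER, Summary TEXT)")
con.executemany(
    "INSERT INTO reviews (Score, Summary) VALUES (?, ?)",
    [(5, "Great"), (1, "Bad"), (4, "Good"), (3, "Okay"), (5, "Loved it"), (2, "Meh")],
)
con.commit()

# The object returned by connect() is an instance of sqlite3.Connection.
print(type(con))

# Read every row and column of the table into a pandas DataFrame.
df = pd.read_sql_query("SELECT * FROM reviews", con)
print(df)
```

The variable names `con` and `df` follow the names the speaker appears to use; any other names would work the same way.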
101 00:07:13,330 --> 00:07:19,020 It will take a couple of seconds, because you will observe this database is very huge. 102 00:07:19,030 --> 00:07:25,040 It means it will take somewhere around one or two minutes, depending upon what processor you are using, depending 103 00:07:25,070 --> 00:07:27,370 upon what generation of processor 104 00:07:27,370 --> 00:07:30,400 you have, what RAM you have, what hard disk you have. 105 00:07:30,550 --> 00:07:31,790 It all depends on that. 106 00:07:31,810 --> 00:07:33,730 So all this stuff gets executed. 107 00:07:33,730 --> 00:07:36,740 And if I call head to get a preview of it, 108 00:07:36,760 --> 00:07:42,130 you will observe this is the data frame on which you have to do certain kinds of analysis. 109 00:07:42,310 --> 00:07:44,580 And if I call shape over here, 110 00:07:44,950 --> 00:07:51,840 this is the number of rows in my data, and this is the number of columns in my data. 111 00:07:52,120 --> 00:07:57,540 Let's say I need only three rows from this database. 112 00:07:57,550 --> 00:08:02,140 So we can call this command as well: SELECT * FROM reviews, 113 00:08:02,140 --> 00:08:05,560 and here I'm going to say LIMIT 3. 114 00:08:05,560 --> 00:08:10,650 It means I just need three rows from this table. Let's just execute it. 115 00:08:10,690 --> 00:08:13,970 These are the top three rows of my data frame. 116 00:08:14,140 --> 00:08:17,620 Similarly, if I pass 5 here, you will observe 117 00:08:17,620 --> 00:08:23,650 these are my top five rows, and by calling df.head I still get my top five rows. 118 00:08:23,860 --> 00:08:25,890 So these are almost similar operations. 119 00:08:25,900 --> 00:08:32,110 So if you are familiar with MySQL or any other database, then this SQLite is 120 00:08:32,110 --> 00:08:35,280 just a piece of cake, literally just a piece of cake. 
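The preview, shape, and LIMIT steps above can be sketched like this. As before, the in-memory database and the `Id` and `Score` columns are assumptions made so that the example runs on its own. One deliberate difference from the session: the query adds `ORDER BY Id`, because SQL does not guarantee row order for a bare `LIMIT`, even though SQLite usually returns rows in insertion order for a simple table like this one.

```python
import sqlite3
import pandas as pd

# Assumption: small stand-in database with a hypothetical reviews table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE reviews (Id INTEGER PRIMARY KEY, Score INTEGER)")
con.executemany(
    "INSERT INTO reviews (Score) VALUES (?)",
    [(score,) for score in (5, 1, 4, 3, 5, 2, 4)],
)
con.commit()

df = pd.read_sql_query("SELECT * FROM reviews", con)

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# Let the database return only three rows instead of loading everything.
top3 = pd.read_sql_query("SELECT * FROM reviews ORDER BY Id LIMIT 3", con)
print(top3)

# head() with no argument returns the first five rows of an already loaded frame.
first5 = df.head()
print(first5)
```

For a very large table, pushing the `LIMIT` into the query avoids reading the whole table into memory, whereas `head()` only trims a frame that has already been loaded.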
121 00:08:35,290 --> 00:08:42,010 And if you want to read this data, this dataset, you just have to call the read_csv function from 122 00:08:42,010 --> 00:08:47,050 the pandas module, which is read_csv. 123 00:08:47,050 --> 00:08:51,630 And here you have to mention your path, which is this one. 124 00:08:51,630 --> 00:08:57,040 So I'm just going to copy from here, paste it here, and press Tab. 125 00:08:57,040 --> 00:09:00,720 And this is the dataset I'm talking about. 126 00:09:00,760 --> 00:09:07,240 So it will also give me the same data frame that this query gives us. If I execute it, 127 00:09:07,240 --> 00:09:12,670 it will take a couple of seconds to execute, depending upon what specifications you have. 128 00:09:12,940 --> 00:09:19,300 You will see over here that this is the entire data frame on which you have to perform certain 129 00:09:19,300 --> 00:09:20,250 kinds of analysis. 130 00:09:20,260 --> 00:09:26,950 So you will figure out there are two ways to read your data in this use case, because here you have 131 00:09:26,950 --> 00:09:29,970 data in your database as well as in your CSV file. 132 00:09:30,310 --> 00:09:37,780 But if you ask for my advice or my solution, I will always ask you to go ahead with this database 133 00:09:37,780 --> 00:09:41,110 approach, because whenever you work on some real-world use case, 134 00:09:41,380 --> 00:09:47,590 maybe you have your data in some databases, or maybe you have data in, let's say, some big 135 00:09:47,590 --> 00:09:51,370 data sources like Hadoop, like Apache Hive, like Apache Pig. 136 00:09:51,370 --> 00:09:56,340 So you have to extract that data from such big data sources. 137 00:09:56,560 --> 00:09:59,830 So at that time, establishing this connection 138 00:09:59,960 --> 00:10:04,110 and this query will play a very major role over here. 
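The two ways of loading the same data can be compared with a small sketch. Everything here is illustrative: the rows, the column names, and the temporary `reviews.csv` file are assumptions created on the fly, since the session's real file paths are not given in the transcript.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Hypothetical sample data shared by both sources.
rows = [(1, 5, "Great"), (2, 1, "Bad"), (3, 4, "Good")]
columns = ["Id", "Score", "Summary"]

# Source 1: a SQLite database (in memory here, a .sqlite file in the session).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE reviews (Id INTEGER, Score INTEGER, Summary TEXT)")
con.executemany("INSERT INTO reviews VALUES (?, ?, ?)", rows)
con.commit()
df_sql = pd.read_sql_query("SELECT * FROM reviews", con)
con.close()

# Source 2: a CSV file holding the same rows, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "reviews.csv")
    pd.DataFrame(rows, columns=columns).to_csv(csv_path, index=False)
    df_csv = pd.read_csv(csv_path)

# Both routes produce a DataFrame with the same shape and values.
print(df_sql.shape, df_csv.shape)
```

The CSV route is a single call, while the database route needs a connection and a query. The query is what lets you filter or limit rows before they ever reach pandas, which is the practical reason behind the speaker's preference for the database approach.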
139 00:10:04,460 --> 00:10:11,210 So that's why I'm going to suggest you always go ahead with this query and this block of code 140 00:10:11,210 --> 00:10:14,510 whenever you have data in the form of some database. 141 00:10:15,020 --> 00:10:16,580 So that's all about this session, and I hope 142 00:10:16,580 --> 00:10:17,570 you loved it very much. 143 00:10:18,020 --> 00:10:18,660 Thank you. 144 00:10:18,680 --> 00:10:19,550 Have a nice day. 145 00:10:19,880 --> 00:10:20,720 Keep learning. 146 00:10:20,720 --> 00:10:21,500 Keep going. 147 00:10:21,680 --> 00:10:22,100 Keep in.