Hello all, before going deep into the session, let's have a quick recap of what we have done in our previous session. So we have basically read our data using the SQL database. Very first, we established our connection to the database. Then, using this read_sql query, we successfully read the data from the table, and using our online compiler we got to know: yeah, this is exactly the table that is available inside that database. After that, we also read the same data using our read_csv function, which is exactly, let me show that, yeah, which is exactly this one. So if you ask my advice, I will always advise you to go with the database approach.

So in this session we have this assignment. The very first statement we have to deal with is that we have to perform sentiment analysis on our data. So what exactly is sentiment analysis? Sentiment analysis is all about determining what feelings a particular person has. Let's say I am saying, yeah, that celebrity looks awesome: it means I have a positive sentiment for that particular celebrity. Same thing: yeah, that guy is ugly, it means I have a negative sentiment for that particular guy that I'm seeing. This coffee is average, not that good, not that bad: it means I have a neutral sentiment for that coffee. So that's all about it; that's the idea behind what exactly sentiment analysis is.

So with respect to our data, you have to perform this sentiment analysis on the Summary column. For this you need some external library, because writing the code from scratch would definitely be very complex over here. Python gives that functionality so that we don't have to write a huge number of lines to perform sentiment analysis. For this we have to install an external module, which is exactly TextBlob, so I'm going to say pip install textblob. If you haven't installed it, just install it using this block of code. Now, after that, what you have to do is just import this TextBlob.
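For reference, here is the recap above in code form: a minimal sketch, assuming a SQLite file named database.sqlite and a table named Reviews. Both names are hypothetical placeholders for whatever your dataset actually uses.

    import sqlite3
    import pandas as pd

    # establish the connection to the database file
    conn = sqlite3.connect("database.sqlite")

    # read the whole table into a DataFrame (the database approach from last session)
    df = pd.read_sql_query("SELECT * FROM Reviews", conn)
    print(df.head())   # confirm this is the table we saw in the online compiler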
So I'm going to say: from this textblob module, you have to import a class, which is exactly my TextBlob class. So I'm just going to import my TextBlob class, just write it over here, and it is exactly that class. Now, what you have to do, let me show the thing. Let's say I'm just going to take df['Summary'], summary number zero. So if I'm going to print it, you will see this is exactly the very first summary given by some customer. And let's say I have to perform sentiment analysis on this particular text. See, I'm just going to store it in text, and if I print it, that's not a fancy task. Now you have to perform the sentiment analysis using this class. So I'm just going to pass this text into my TextBlob, and on this I have to call sentiment dot polarity, because we have to see what exactly the polarity of that particular sentiment is: whether it is plus one, minus one, between plus one and minus one, or zero. Plus one definitely indicates, yeah, it's a positive polarity, or you can say a positive sentiment, whereas zero refers to a neutral sentiment and minus one refers to a negative sentiment. Just execute it. Yeah, it seems we have a very positive sentiment with respect to this particular summary.

So it means you have to perform the sentiment analysis with respect to each and every summary. So I'm just going to say, very first, I have to define a blank list, which is exactly polarity. Whatever polarity I get with respect to each and every summary, I'm just going to store it in this list. After that, what I have to do is just write: for i in df['Summary']. Now, on each and every i, I'm going to say, very first, I have to execute this TextBlob, and into this I have to pass my i, and on this I'm going to call dot sentiment dot polarity. Then whatever the polarity is, I'm just going to append it to this list. That is our basic way to perform sentiment analysis on our data. But there is a hack: what if I have some missing values in my data? For this, what I can do is write these blocks of code inside my try block. So here I'm going to say: this is my try block, and whatever exception comes will get handled by, basically, the except block.
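Here is the single-summary step from above as a sketch; it assumes the DataFrame df and the Summary column from the earlier read:

    # pip install textblob   (only needed once)
    from textblob import TextBlob

    text = df['Summary'][0]              # the very first customer summary
    print(text)

    # polarity is a float in [-1.0, +1.0]:
    # > 0 positive, 0 neutral, < 0 negative
    print(TextBlob(text).sentiment.polarity)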
So here I'm going to say, very first, I have to write my try block, and here I have to say: whenever there is some exception, in such a case you have to just append zero to your list. So I'm just going to append 0 to my list. Make sure you have proper indentation; otherwise it will give you errors and some improper conclusions, and that wastes a lot of time. So make sure you have this proper indentation over here and over here. So just execute it. It will take somewhere around one to two minutes, depending upon what specifications you have. Now you will figure out the cell gets executed, and if I'm going to calculate the length of my polarity list, then you will see over here it has that many elements. Just see how complex the data we have is, because whenever you are going to work on some real-world aspect, you always have that much data. You don't have data of just some hundred rows or some thousand; you rarely get that little data.

So now what we have to do: we have to insert this polarity into my data frame. So I'm just going to say, very first, I'm going to create a copy of my data frame, and I'm going to store it in, let's say, data. And after that, what I have to do: I'm going to define a new column in this data frame, which is exactly polarity, and here I have to assign this polarity list as well. That's it. And if I'm going to execute it and call a head over it, you will figure out a new column named polarity, which contains the polarity with respect to each and every summary, has been added in.

So that is exactly the sentiment analysis with respect to the Summary column. In a similar way, you can perform sentiment analysis with respect to the Text column as well; it's all up to you. So let's go ahead with our next problem statement, the second one, in which I have to perform exploratory data analysis, that is EDA, for the positive sentences. So here I'm just going to say: wherever polarity is greater than zero, that is exactly my positive sentence. So here I will define a filter, which is exactly: data of polarity must be greater than zero. This is exactly my filter. I have to just pass this filter into my data so that I will have my filtered rows. Let's say I'm just going to store it somewhere.
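Putting those pieces together, the guarded loop, the new column, and the positive filter, as a sketch on the same df and Summary column:

    polarity = []
    for i in df['Summary']:
        try:
            polarity.append(TextBlob(i).sentiment.polarity)
        except Exception:
            polarity.append(0)        # missing/NaN summaries count as neutral

    print(len(polarity))              # one score per row of the DataFrame

    data = df.copy()                  # work on a copy of the data frame
    data['polarity'] = polarity       # new column: one polarity per summary
    print(data.head())

    fltr = data['polarity'] > 0       # boolean mask: positive sentiment only
    data_positive = data[fltr]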
So this is exactly my data_positive. Just execute it, and if on this data_positive I'm going to call shape, you will see we have a huge amount of data. Now let's say you have to perform your exploratory analysis on this data, on the positive data frame. It means you have to figure out, in the Summary column, what are those keywords the users are going to focus on, or you can say, what are those keywords the users are going to emphasize. So whenever you have that scenario where you have a huge chunk of data, and from that huge chunk of data you have to extract some popular keywords, in such scenarios always go ahead with your word cloud. So what exactly is a word cloud? A word cloud will always reflect those words that have a higher priority in the data.

So here I'm just going to say, very first, if you haven't installed wordcloud, you guys can simply install it using pip install wordcloud. I have already installed it, so it makes no sense at all to install it again. So now, from this wordcloud module, I'm going to say I have to import my WordCloud class, and I have to import something else, which is exactly my STOPWORDS. So what exactly are stop words? Words like the, he, she, it, they, then, his, him, her: these are exactly my stop words, because they make no sense at all in your analysis. So just execute it. Now what we have to do: we have to create a unique set of stop words. So I'm just going to say I just need my unique stop words, so I'm going to call a set on this, because that is providing my uniqueness. Let's say I'm just going to store it somewhere else, say, store it in stopwords. Just execute it.

Now, what I'm going to do: let's say on this data_positive I'm just going to call a head to get a preview of my data frame. You will see this is exactly my Summary column. Let's say I need the entire data of this Summary column as one single string. So for this, we can join this column. So this is what I'm going to do: I'm just going to join this Summary column. Here I'm going to use my join, and very first I have to access my data_positive, and in this I have to access the Summary. Let's say after doing the join, you have to store it, say, somewhere else.
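A sketch of that preparation: imports, the unique stop-word set, and the join. It assumes the Summary entries are strings (rows with missing summaries were already filtered out by polarity > 0), and total_text is the name settled on just below:

    # pip install wordcloud   (only needed once)
    from wordcloud import WordCloud, STOPWORDS

    print(data_positive.shape)        # how many positive rows we kept
    print(data_positive.head())       # preview, including the Summary column

    stopwords = set(STOPWORDS)        # set() guarantees each stop word is unique

    # glue every positive summary into one big string, separated by spaces
    total_text = ' '.join(data_positive['Summary'])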
So here I'm just going to say, let's say I have to store it in total_text; it's all up to you whatever name you want to assign. And after that, what I'm going to do: I'm just going to execute it. It will take some couple of seconds and gets executed successfully. And if I'm going to call a len on it now, you will figure out what exactly the total length is. Look at it, how huge an amount of data you have. And let's say you have to print the first ten thousand characters: you guys can print using this zero-to-ten-thousand slice, and you will see these are the first ten thousand characters of this total_text string. If you see it, you will figure out you have these dots and some special characters as well. It means you have to remove all these things.

So now what I'm going to do: I'm just going to say I have to, very first, import my regular expression module. That will be highly helpful for us whenever we have to deal with our text data, let's say whenever we have to clean our text data or whenever we have to do some modification on text data. So always go ahead with this re module. So here, what I'm going to do: I'm going to call my sub, which is exactly my substitute function in this module. And here I'm going to say: everything except a to z and capital A to Z, whatever we have, just eliminate it. So for this I'm going to say: except, for which you have to use this caret operator, a to z, and capital A to Z as well. So whenever you have anything else, just replace it with a space. That's what this substitute function will do. Then you have to tell on what data you have to perform this operation, so I have to perform this operation on total_text. And let's say I have to update it as well, so I'm just going to update it using this. Just execute it; it will take some couple of seconds.
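The first cleaning pass as a sketch: the pattern [^a-zA-Z] matches every character that is not a letter.

    import re

    print(len(total_text))            # how big the combined string is
    print(total_text[0:10000])        # peek at the first ten thousand characters

    # replace everything except a-z and A-Z with a space, and update in place
    total_text = re.sub('[^a-zA-Z]', ' ', total_text)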
Now this block of code gets successfully executed over here. And let's say I'm going to print this total_text again, let's say my first twenty thousand characters; it's all up to you. You will figure out over here, now you have some extra spaces. Look at here, look at here: you have some extra spaces. It means you still have to clean this data. See, whenever you are going to work on some real-world aspect, almost 70 to 80 percent of your total time will get spent on cleaning and preprocessing your data. So always be patient.

Now we have to remove these excess spaces. So for this, I'm going to say re dot sub, substitute, and this time, wherever I have more than one space, I have to replace it with one single space. So this is the way of writing it. Then I have to say on what data I have to perform this, so basically on total_text in this context, and then I'm going to say I have to update it as well. So just execute it; it will take some couple of seconds. Now, if I'm going to again print my first ten thousand characters, you will see you don't have any extra spaces in this data. So your data is clean up to some extent; still you have some dirtiness in your data, but that's okay.

So now what you can do: you can easily create a word cloud of this huge chunk of data, because from this huge chunk of data you need those highlighted keywords that have some higher priority. So what I'm going to do: I'm just going to initialize my WordCloud, which is exactly this one. Here you have all your custom parameters. Look at that: what is your width, what is your height, what are your stop words, and all these different, different things. So let's say I have to generate a word cloud having my own specification. So here I'm going to say: let's say I want a width of, let's say, a thousand, and after that I want a height of, let's say, five hundred. Then in this stopwords parameter I have to mention the stop-word set that I have created earlier. Now, what you have to do: using this generate, you have to generate your word cloud, and here you have to pass your total_text. Let's say I have to store it somewhere else, let's say I'm going to store it in wordcloud; this is exactly my cloud. Now, what you have to do: let's say I have to set my own figure size. So here I'm going to say plt dot figure with a figsize of, let's say, fifteen by five. That is exactly my window size in which my word cloud is going to be represented over here. Now I have to showcase this word cloud using this imshow function.
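Putting the second pass and the cloud together, a sketch using the width, height, figsize, and stop-word values just mentioned. It assumes matplotlib is installed and imported as plt:

    import matplotlib.pyplot as plt

    # collapse runs of two or more spaces into a single space
    total_text = re.sub(' +', ' ', total_text)
    print(total_text[0:10000])        # verify the extra spaces are gone

    wordcloud = WordCloud(width=1000, height=500,
                          stopwords=stopwords).generate(total_text)

    plt.figure(figsize=(15, 5))       # the 15 x 5 window size mentioned above
    plt.imshow(wordcloud)             # render the cloud image
    plt.axis('off')                   # hides the axes, as discussed right below
    plt.show()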
And here I have to mention my word cloud, which is exactly this one. And if I'm going to execute it, it will take some couple of seconds over here. So this is exactly that word cloud. But you will see you still have some axes, so you can disable these axes using this axis function, by passing 'off' as the parameter. Again execute it; it will again take some couple of seconds. Now, if you have to conclude something from this word cloud, you will figure out words like delicious, love, good, fast; there are many such keywords that are highly prioritized. It means users are going to focus on these words; it means most of the time, users are going to use these words. In a similar way, you can perform all this analysis for your negative sentiment as well. In the upcoming session, we are going to deal with that statement and many other problem statements as well. Hope you loved this session very much. Thank you, guys. Till then, keep learning, keep growing, stay motivated.