Hello all, before going deep into the session, let's have a quick recap of what we have done in our previous session. So we have basically read our data using the SQL database. Very first, we established our connection to the database. Then, using this read_sql query, we successfully read the data from the table, and using our online compiler we got to know: yeah, this is exactly the table that is available inside that database. After that, we also read the same data using our read_csv function, which is exactly, let me show that, yeah, which is exactly this one. So if you ask my advice, I will always advise you to go with the database approach.

So in this session we have this assignment. The very first statement we have to deal with is that we have to perform sentiment analysis on our data. So what exactly is sentiment analysis? Sentiment analysis is all about determining what feelings a particular person has. Let's say I am saying, yeah, that celebrity looks awesome: it means I have a positive sentiment for that particular celebrity. Same thing: yeah, that guy is ugly, it means I have a negative sentiment for that particular guy that I'm seeing. This coffee is average, not that good, not that bad: it means I have a neutral sentiment for that coffee. So that's all about it; that's the idea behind what exactly sentiment analysis is.

So with respect to our data, you have to perform this sentiment analysis on the Summary column. For this you need some external library, because writing the code from scratch would definitely be very complex over here. Python gives that functionality so that we don't have to write a huge number of lines to perform sentiment analysis. For this we have to install an external module, which is exactly TextBlob, so I'm going to say pip install textblob. If you haven't installed it, just install it using this block of code. Now, after that, what you have to do is just import this TextBlob.
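For reference, here is the recap above in code form: a minimal sketch, assuming a SQLite file named database.sqlite and a table named Reviews. Both names are hypothetical placeholders for whatever your dataset actually uses.

    import sqlite3
    import pandas as pd

    # establish the connection to the database file
    conn = sqlite3.connect("database.sqlite")

    # read the whole table into a DataFrame (the database approach from last session)
    df = pd.read_sql_query("SELECT * FROM Reviews", conn)
    print(df.head())   # confirm this is the table we saw in the online compiler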
So I'm going to say: from this textblob module, you have to import a class, which is exactly my TextBlob class. So I'm just going to import my TextBlob class, just write it over here, and it is exactly that class. Now, what you have to do, let me show the thing. Let's say I'm just going to take df['Summary'], summary number zero. So if I'm going to print it, you will see this is exactly the very first summary given by some customer. And let's say I have to perform sentiment analysis on this particular text. See, I'm just going to store it in text, and if I print it, that's not a fancy task. Now you have to perform the sentiment analysis using this class. So I'm just going to pass this text into my TextBlob, and on this I have to call sentiment dot polarity, because we have to see what exactly the polarity of that particular sentiment is: whether it is plus one, minus one, between plus one and minus one, or zero. Plus one definitely indicates, yeah, it's a positive polarity, or you can say a positive sentiment, whereas zero refers to a neutral sentiment and minus one refers to a negative sentiment. Just execute it. Yeah, it seems we have a very positive sentiment with respect to this particular summary.

So it means you have to perform the sentiment analysis with respect to each and every summary. So I'm just going to say, very first, I have to define a blank list, which is exactly polarity. Whatever polarity I get with respect to each and every summary, I'm just going to store it in this list. After that, what I have to do is just write: for i in df['Summary']. Now, on each and every i, I'm going to say, very first, I have to execute this TextBlob, and into this I have to pass my i, and on this I'm going to call dot sentiment dot polarity. Then whatever the polarity is, I'm just going to append it to this list. That is our basic way to perform sentiment analysis on our data. But there is a hack: what if I have some missing values in my data? For this, what I can do is write these blocks of code inside my try block. So here I'm going to say: this is my try block, and whatever exception comes will get handled by, basically, the except block.
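Here is the single-summary step from above as a sketch; it assumes the DataFrame df and the Summary column from the earlier read:

    # pip install textblob   (only needed once)
    from textblob import TextBlob

    text = df['Summary'][0]              # the very first customer summary
    print(text)

    # polarity is a float in [-1.0, +1.0]:
    # > 0 positive, 0 neutral, < 0 negative
    print(TextBlob(text).sentiment.polarity)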
So here I'm going to say, very first, I have to write my try block, and here I have to say: whenever there is some exception, in such a case you have to just append zero to your list. So I'm just going to append 0 to my list. Make sure you have proper indentation; otherwise it will give you errors and some improper conclusions, and that wastes a lot of time. So make sure you have this proper indentation over here and over here. So just execute it. It will take somewhere around one to two minutes, depending upon what specifications you have. Now you will figure out the cell gets executed, and if I'm going to calculate the length of my polarity list, then you will see over here it has that many elements. Just see how complex the data we have is, because whenever you are going to work on some real-world aspect, you always have that much data. You don't have data of just some hundred rows or some thousand; you rarely get that little data.

So now what we have to do: we have to insert this polarity into my data frame. So I'm just going to say, very first, I'm going to create a copy of my data frame, and I'm going to store it in, let's say, data. And after that, what I have to do: I'm going to define a new column in this data frame, which is exactly polarity, and here I have to assign this polarity list as well. That's it. And if I'm going to execute it and call a head over it, you will figure out a new column named polarity, which contains the polarity with respect to each and every summary, has been added in.

So that is exactly the sentiment analysis with respect to the Summary column. In a similar way, you can perform sentiment analysis with respect to the Text column as well; it's all up to you. So let's go ahead with our next problem statement, the second one, in which I have to perform exploratory data analysis, that is EDA, for the positive sentences. So here I'm just going to say: wherever polarity is greater than zero, that is exactly my positive sentence. So here I will define a filter, which is exactly: data of polarity must be greater than zero. This is exactly my filter. I have to just pass this filter into my data so that I will have my filtered rows. Let's say I'm just going to store it somewhere.
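Putting those pieces together, the guarded loop, the new column, and the positive filter, as a sketch on the same df and Summary column:

    polarity = []
    for i in df['Summary']:
        try:
            polarity.append(TextBlob(i).sentiment.polarity)
        except Exception:
            polarity.append(0)        # missing/NaN summaries count as neutral

    print(len(polarity))              # one score per row of the DataFrame

    data = df.copy()                  # work on a copy of the data frame
    data['polarity'] = polarity       # new column: one polarity per summary
    print(data.head())

    fltr = data['polarity'] > 0       # boolean mask: positive sentiment only
    data_positive = data[fltr]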
So this is exactly my data_positive. Just execute it, and if on this data_positive I'm going to call shape, you will see we have a huge amount of data. Now let's say you have to perform your exploratory analysis on this data, on the positive data frame. It means you have to figure out, in the Summary column, what are those keywords the users are going to focus on, or you can say, what are those keywords the users are going to emphasize. So whenever you have that scenario where you have a huge chunk of data, and from that huge chunk of data you have to extract some popular keywords, in such scenarios always go ahead with your word cloud. So what exactly is a word cloud? A word cloud will always reflect those words that have a higher priority in the data.

So here I'm just going to say, very first, if you haven't installed wordcloud, you guys can simply install it using pip install wordcloud. I have already installed it, so it makes no sense at all to install it again. So now, from this wordcloud module, I'm going to say I have to import my WordCloud class, and I have to import something else, which is exactly my STOPWORDS. So what exactly are stop words? Words like the, he, she, it, they, then, his, him, her: these are exactly my stop words, because they make no sense at all in your analysis. So just execute it. Now what we have to do: we have to create a unique set of stop words. So I'm just going to say I just need my unique stop words, so I'm going to call a set on this, because that is providing my uniqueness. Let's say I'm just going to store it somewhere else, say, store it in stopwords. Just execute it.

Now, what I'm going to do: let's say on this data_positive I'm just going to call a head to get a preview of my data frame. You will see this is exactly my Summary column. Let's say I need the entire data of this Summary column as one single string. So for this, we can join this column. So this is what I'm going to do: I'm just going to join this Summary column. Here I'm going to use my join, and very first I have to access my data_positive, and in this I have to access the Summary. Let's say after doing the join, you have to store it, say, somewhere else.
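A sketch of that preparation: imports, the unique stop-word set, and the join. It assumes the Summary entries are strings (rows with missing summaries were already filtered out by polarity > 0), and total_text is the name settled on just below:

    # pip install wordcloud   (only needed once)
    from wordcloud import WordCloud, STOPWORDS

    print(data_positive.shape)        # how many positive rows we kept
    print(data_positive.head())       # preview, including the Summary column

    stopwords = set(STOPWORDS)        # set() guarantees each stop word is unique

    # glue every positive summary into one big string, separated by spaces
    total_text = ' '.join(data_positive['Summary'])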
So here I'm just going to say, let's say I have to store it in total_text; it's all up to you whatever name you want to assign. And after that, what I'm going to do: I'm just going to execute it. It will take some couple of seconds and gets executed successfully. And if I'm going to call a len on it now, you will figure out what exactly the total length is. Look at it, how huge an amount of data you have. And let's say you have to print the first ten thousand characters: you guys can print using this zero-to-ten-thousand slice, and you will see these are the first ten thousand characters of this total_text string. If you see it, you will figure out you have these dots and some special characters as well. It means you have to remove all these things.

So now what I'm going to do: I'm just going to say I have to, very first, import my regular expression module. That will be highly helpful for us whenever we have to deal with our text data, let's say whenever we have to clean our text data or whenever we have to do some modification on text data. So always go ahead with this re module. So here, what I'm going to do: I'm going to call my sub, which is exactly my substitute function in this module. And here I'm going to say: everything except a to z and capital A to Z, whatever we have, just eliminate it. So for this I'm going to say: except, for which you have to use this caret operator, a to z, and capital A to Z as well. So whenever you have anything else, just replace it with a space. That's what this substitute function will do. Then you have to tell on what data you have to perform this operation, so I have to perform this operation on total_text. And let's say I have to update it as well, so I'm just going to update it using this. Just execute it; it will take some couple of seconds.
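The first cleaning pass as a sketch: the pattern [^a-zA-Z] matches every character that is not a letter.

    import re

    print(len(total_text))            # how big the combined string is
    print(total_text[0:10000])        # peek at the first ten thousand characters

    # replace everything except a-z and A-Z with a space, and update in place
    total_text = re.sub('[^a-zA-Z]', ' ', total_text)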
Now this block of code gets successfully executed over here. And let's say I'm going to print this total_text again, let's say my first twenty thousand characters; it's all up to you. You will figure out over here, now you have some extra spaces. Look at here, look at here: you have some extra spaces. It means you still have to clean this data. See, whenever you are going to work on some real-world aspect, almost 70 to 80 percent of your total time will get spent on cleaning and preprocessing your data. So always be patient.

Now we have to remove these excess spaces. So for this, I'm going to say re dot sub, substitute, and this time, wherever I have more than one space, I have to replace it with one single space. So this is the way of writing it. Then I have to say on what data I have to perform this, so basically on total_text in this context, and then I'm going to say I have to update it as well. So just execute it; it will take some couple of seconds. Now, if I'm going to again print my first ten thousand characters, you will see you don't have any extra spaces in this data. So your data is clean up to some extent; still you have some dirtiness in your data, but that's okay.

So now what you can do: you can easily create a word cloud of this huge chunk of data, because from this huge chunk of data you need those highlighted keywords that have some higher priority. So what I'm going to do: I'm just going to initialize my WordCloud, which is exactly this one. Here you have all your custom parameters. Look at that: what is your width, what is your height, what are your stop words, and all these different, different things. So let's say I have to generate a word cloud having my own specification. So here I'm going to say: let's say I want a width of, let's say, a thousand, and after that I want a height of, let's say, five hundred. Then in this stopwords parameter I have to mention the stop-word set that I have created earlier. Now, what you have to do: using this generate, you have to generate your word cloud, and here you have to pass your total_text. Let's say I have to store it somewhere else, let's say I'm going to store it in wordcloud; this is exactly my cloud. Now, what you have to do: let's say I have to set my own figure size. So here I'm going to say plt dot figure with a figsize of, let's say, fifteen by five. That is exactly my window size in which my word cloud is going to be represented over here. Now I have to showcase this word cloud using this imshow function.
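Putting the second pass and the cloud together, a sketch using the width, height, figsize, and stop-word values just mentioned. It assumes matplotlib is installed and imported as plt:

    import matplotlib.pyplot as plt

    # collapse runs of two or more spaces into a single space
    total_text = re.sub(' +', ' ', total_text)
    print(total_text[0:10000])        # verify the extra spaces are gone

    wordcloud = WordCloud(width=1000, height=500,
                          stopwords=stopwords).generate(total_text)

    plt.figure(figsize=(15, 5))       # the 15 x 5 window size mentioned above
    plt.imshow(wordcloud)             # render the cloud image
    plt.axis('off')                   # hides the axes, as discussed right below
    plt.show()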
And here I have to mention my word cloud, which is exactly this one. And if I'm going to execute it, it will take some couple of seconds over here. So this is exactly that word cloud. But you will see you still have some axes, so you can disable these axes using this axis function, by passing 'off' as the parameter. Again execute it; it will again take some couple of seconds. Now, if you have to conclude something from this word cloud, you will figure out words like delicious, love, good, fast; there are many such keywords that are highly prioritized. It means users are going to focus on these words; it means most of the time, users are going to use these words. In a similar way, you can perform all this analysis for your negative sentiment as well. In the upcoming session, we are going to deal with that statement and many other problem statements as well. Hope you loved this session very much. Thank you, guys. Till then, keep learning, keep growing, stay motivated.