Hello! In the previous session, we performed a lot of preprocessing on our data. We wrote all of that logic ourselves, because the sub function of the re module isn't very handy for that use case. After that, we used a function to remove punctuation, and then we dealt with the stopwords in our data, combining all of those pieces of logic. In this session, we are going to do still more preprocessing on our data, and at the end we will come up with some meaningful insights from this huge chunk of data.

Here's a quick way to check whether any hyperlinks or similar fragments are present in my text column. First, I access the text column with final['Text']. On it, I call str.contains with the substring I'm looking for, 'http', and then .sum(), because I just need the total number of rows that contain 'http' as a substring. When I execute it, it shows that thirty-five entries have 'http' as a substring.
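A minimal sketch of this check, assuming the reviews sit in a pandas DataFrame named final with a Text column. The sample rows are made up and mimic text whose punctuation has already been stripped, so a URL appears as one long "httpwww..." word:

```python
import pandas as pd

# Hypothetical stand-in for the cleaned review data from the previous session
final = pd.DataFrame({
    "Text": [
        "great coffee would buy again",
        "see ahref httpwwwamazoncomgpproductB000 for the link",
        "tastes stale",
        "link httpwwwamazoncomdpB001 in the review",
    ]
})

# str.contains returns a boolean Series; summing it counts the True values
http_rows = final["Text"].str.contains("http").sum()
print(http_rows)  # 2
```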
24 00:01:30,310 --> 00:01:31,670 Still you have thirty five rule. 25 00:01:31,680 --> 00:01:38,130 It means there are thirty five rules who have dirtiness so you have to remove this. 26 00:01:38,140 --> 00:01:41,820 So let me show you where exactly it has to be. 27 00:01:42,060 --> 00:01:48,840 So for this I'm just going to copy all this stuff and here I'm just going to say I have to remove this 28 00:01:48,840 --> 00:01:51,120 so I'm just executed. 29 00:01:51,120 --> 00:01:54,360 It will show me value in the form of force and rule. 30 00:01:54,600 --> 00:02:01,050 So wherever it is false, it means that a string or you can say at that particular index, it doesn't 31 00:02:01,050 --> 00:02:06,840 contain SCDP, wherever it is true, it means it has to be all there. 32 00:02:07,170 --> 00:02:13,880 So you will see you don't have to adhere to like me show all these two draws. 33 00:02:14,160 --> 00:02:18,150 So for days I'm just going to say, please dot set option. 34 00:02:18,540 --> 00:02:23,580 And here I have to say I have to display my maximum rules as two thousand. 35 00:02:23,970 --> 00:02:29,070 So I'm going to say display dot max underscore roles. 36 00:02:29,070 --> 00:02:34,830 And here I'm just going to say which is exactly my two thousand just executed. 37 00:02:34,830 --> 00:02:42,240 It will take some couple of seconds and you will figure out this twenty first index has as DDP as a 38 00:02:42,240 --> 00:02:43,110 service shrink. 39 00:02:43,110 --> 00:02:47,000 It means you have to remove this extra deeply in this string. 40 00:02:47,190 --> 00:02:50,010 So let me show you how exactly it looks like. 41 00:02:50,010 --> 00:02:55,260 So I'm going to say final of text of twenty one just executed. 42 00:02:55,260 --> 00:03:04,530 You will see it has this hyperreal phrase, SCDP, Amazon and all these kinds of things. 43 00:03:04,560 --> 00:03:10,700 You will see this is that dirtiness that I was talking about in the previous session. 
So you have to remove these hyperlinks from your data, and here is how I'm going to do it. First, you write some logic using the re module; if you haven't imported it yet, you can do so with import re. The re module has a compile function, and you can read its documentation to see all of its parameters and what they do. Using this function, you can find patterns in your data, so I have to specify a pattern.

The pattern matches either href or, after the pipe (which means "or"), http. After http comes a dot, which matches any character, and after that I write \w+. Here \w is a word character, which includes lowercase a to z, uppercase A to Z, the digits 0 to 9 and the underscore, and the plus means one or more word characters, so the match continues until the next non-word character, such as a space. That is the exact meaning of the line of code I've written here. This is my URL pattern.
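The narration suggests a pattern along the lines of r"href|http.\w+"; that exact string is a reconstruction and an assumption, not confirmed by the source. This sketch shows what such a pattern matches on made-up text:

```python
import re

# Assumed reconstruction of the pattern from the narration:
#   href  -> the literal text "href"
#   |     -> OR
#   http  -> the literal text "http"
#   .     -> exactly one character of any kind (except a newline)
#   \w+   -> one or more word characters [A-Za-z0-9_]
url_pattern = re.compile(r"href|http.\w+")

text = "see ahref httpwwwamazoncomgpproductB000 for the link"
print(url_pattern.findall(text))  # ['href', 'httpwwwamazoncomgpproductB000']

# On a raw URL the dot matches ":" and \w+ then fails at "/", so nothing matches
print(url_pattern.findall("http://www.amazon.com"))  # []
```

The second call suggests the pattern relies on punctuation having been removed first, as in the previous session; on raw text, something like r"https?://\S+" would be a more typical choice.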
I'll store it in a variable called url_pattern. Wherever this pattern occurs, I want to replace it with something else, namely an empty string. For that, I call url_pattern.sub, which takes what to substitute each match with and the text to search. Let's store the text we just looked at in a variable called review; you can print review as well. Now I pass review to sub and execute it. You will see there are no hyperlinks left in this text. With just these few lines of code, you can see the power of Python and of the regular expression module.

Now you have to create a function and paste that logic into it, so that you can apply the function to each and every row of the data. I'm going to call my function remove_url.
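Applied to a single review, the substitution looks roughly like this. It uses the same assumed pattern, and the review text is made up:

```python
import re

url_pattern = re.compile(r"href|http.\w+")  # assumed pattern, see above

review = "see ahref httpwwwamazoncomgpproductB000 for the link"

# sub(repl, string) replaces every match with repl; an empty string deletes it
cleaned = url_pattern.sub("", review)
print(cleaned)
```

Note that the lone "a" left over from an "<a href" tag survives, and two spaces remain where the link used to be.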
It takes review as its parameter. Copy the logic you wrote above, paste it inside the function, and make sure the indentation is right. Then return the result and execute the cell. Now you have to apply this function to your text, so I write final['Text'].apply(remove_url). After applying it, you also have to update the text column, so I assign the result back to final['Text'] and execute it. Everything runs, with a few warnings; don't worry about them.

Now, if I copy the earlier print statement and run it again, you'll find there are no hyperlinks left in that review. Similarly, let me select another row, say 25, and print it: again, there is no dirtiness in the data. That means your data is now ready for analysis. But you may still have a doubt: what if some 'http' substrings are still left in your data?
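A sketch of the function and the column update, under the same assumptions (pattern reconstructed from the narration, hypothetical sample rows):

```python
import re

import pandas as pd

# Assumed pattern, reconstructed from the narration
url_pattern = re.compile(r"href|http.\w+")


def remove_url(review):
    """Delete every url_pattern match from one review string."""
    return url_pattern.sub("", review)


# Hypothetical sample data standing in for the real review table
final = pd.DataFrame({
    "Text": [
        "great coffee would buy again",
        "see ahref httpwwwamazoncomgpproductB000 for the link",
    ]
})

# Series.apply calls remove_url once per row; assigning back updates the column
final["Text"] = final["Text"].apply(remove_url)

# Re-run the earlier check: no row should contain "http" any more
print(final["Text"].str.contains("http").sum())  # 0
```

The source doesn't show the warnings. A common cause, offered here only as a guess, is pandas' SettingWithCopyWarning when final is itself a filtered slice of a larger DataFrame; creating that slice with .copy() avoids it.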
So let me copy the earlier check and paste it here to see whether any row still contains it. Execute it, and you'll see the count is zero, which means there are no hyperlinks left in your data. But if I print one of the rows again, you'll still notice an extra value: this 'br' here. If 'br' shows up in text after text, it carries no meaning at all, so you have to remove it. If I call replace on this text and replace 'br' with an empty string, then when I execute it, the 'br' disappears.

Now you have to apply this replace to each and every row. For that, I write a loop: for i in range(0, len(final['Text'])), that is, over the whole length of the text column. For each i, I call replace on that row, replacing 'br' with an empty string, and I update each row with the result. Execute it; it takes a couple of seconds, and everything runs.
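The loop above does the job, but a plain replace of "br" also deletes those two letters inside ordinary words such as "bread" or "brown". The sketch below, on made-up rows, shows that side effect alongside a swapped-in alternative that is not the walkthrough's code: a single vectorized regex with word boundaries, so that only a standalone "br" is removed.

```python
import pandas as pd

# Hypothetical rows; "br" is probably left over from <br /> tags once punctuation is gone
final = pd.DataFrame({"Text": ["fresh bread br br tastes great", "brown sugar br"]})

# Plain substring replacement, equivalent to the loop: also damages "bread" and "brown"
naive = final["Text"].str.replace("br", "", regex=False)

# Word-boundary version: \b on both sides means "br" must be a whole word
final["Text"] = final["Text"].str.replace(r"\bbr\b", "", regex=True)

print(naive.tolist())
print(final["Text"].tolist())
```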
Now the wait is over, and it's time to build the word cloud representation of our data. If I call final.head(), I get a preview of what my data looks like; the text column is the feature I'm going to use for the word cloud. So I'm going to use WordCloud, and on it I set some custom parameters: a width of 800 and a height of 800. After that, I set stopwords equal to the stopwords I defined earlier. On this object, I call generate to build the word cloud.

generate needs some data, so I build one string out of the entire text: I call ' '.join on final['Text'], store the result in comment_words, and pass comment_words to the generate function. Execute that cell first; afterwards, when you start typing the name, you can just press Tab and comment_words autocompletes. Now I'm done with my word cloud object, which I've stored in wordcloud.
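WordCloud sizes each word by how often it occurs, so a rough, library-free way to preview which words will dominate is to build comment_words and count the words yourself. This is only an approximation: the DataFrame and stopword set below are hypothetical, and WordCloud does its own tokenization, so its counts may differ.

```python
from collections import Counter

import pandas as pd

# Hypothetical cleaned reviews and a hypothetical stopword set
final = pd.DataFrame({"Text": ["great coffee great taste", "the coffee is great", "taste is fine"]})
stopwords = {"the", "is"}

# One long string containing every review, as passed to WordCloud.generate
comment_words = " ".join(final["Text"])

# Rough preview of what the cloud will emphasize: word counts without stopwords
counts = Counter(word for word in comment_words.lower().split() if word not in stopwords)
print(counts.most_common(3))
```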
After that, you set up your figure. I call plt.figure, which takes a figsize argument; whatever size you want to assign is up to you. Then you display the cloud with plt.imshow(wordcloud), and after that you disable the axes with plt.axis('off'). Execute all of this, and here is the beautiful word cloud I was talking about. From it, you can definitely draw some conclusions about customer behavior: the bigger a word, the more often customers used it, so these are the keywords they rely on most of the time, and you can easily examine how your customers behave.

So that's all about this project. I hope you loved this session and the project, and I encourage you to explore as much as you can on your own. Thank you, and keep learning, keep growing, and stay motivated.
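Putting the word cloud steps from this session together, here is a sketch that follows the narrated settings (800 by 800, your own stopwords, generate, figure, imshow, axes off). The function name show_wordcloud and the default figsize of (8, 8) are hypothetical choices. The third-party imports sit inside the function so that the cell can be defined even where wordcloud isn't installed (pip install wordcloud matplotlib).

```python
def show_wordcloud(comment_words, stopwords, figsize=(8, 8)):
    """Build an 800 x 800 word cloud from comment_words and display it without axes."""
    # Third-party packages, imported lazily
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    wordcloud = WordCloud(width=800, height=800, stopwords=stopwords).generate(comment_words)

    plt.figure(figsize=figsize)  # figure size in inches; pick whatever you like
    plt.imshow(wordcloud)        # a WordCloud object can be passed to imshow directly
    plt.axis("off")              # hide the x/y axes and ticks
    plt.show()
    return wordcloud
```

Usage would be show_wordcloud(comment_words, stopwords), with both names built as in the earlier cells.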