1 00:00:00,266 --> 00:00:02,600 Hello and welcome to this R tutorial. 2 00:00:02,600 --> 00:00:04,766 So now, in this new step of the cleaning process, 3 00:00:04,766 --> 00:00:07,833 we will remove all the punctuation in the reviews of the corpus. 4 00:00:08,100 --> 00:00:10,600 And it's going to be as simple as before. 5 00:00:10,600 --> 00:00:13,633 We will copy this line, paste 6 00:00:13,666 --> 00:00:17,100 it below, and instead of removeNumbers here 7 00:00:17,133 --> 00:00:20,133 we will input removePunctuation. 8 00:00:20,966 --> 00:00:22,266 As simple as that. 9 00:00:22,266 --> 00:00:23,600 We don't need anything else. 10 00:00:23,600 --> 00:00:28,766 And that's ready to remove any punctuation in all the reviews of the corpus. 11 00:00:29,200 --> 00:00:30,433 All right, so let's check it out. 12 00:00:30,433 --> 00:00:32,966 We can actually check it out with the first review: 13 00:00:32,966 --> 00:00:35,966 "Wow", three little dots, "Loved this place." 14 00:00:36,033 --> 00:00:40,000 And so what we're supposed to obtain after applying this 15 00:00:40,000 --> 00:00:43,866 removePunctuation function to all the reviews of the corpus through the tm_map function: 16 00:00:44,300 --> 00:00:47,300 well, the three little dots are supposed to disappear. 17 00:00:47,566 --> 00:00:49,333 All right. So let's check it out. 18 00:00:49,333 --> 00:00:52,833 So remember, in the current version of the corpus right now, 19 00:00:52,833 --> 00:00:56,800 the first review is "wow", three little dots, "loved this place". 20 00:00:56,966 --> 00:01:00,066 So now let's select this new line of code, 21 00:01:00,633 --> 00:01:05,000 execute. New corpus of reviews created, with all the punctuation removed. 22 00:01:05,166 --> 00:01:10,166 So let's go back to the console and let's press the up arrow to get 23 00:01:10,300 --> 00:01:14,000 the line of code that is giving us access to the first review. 24 00:01:14,200 --> 00:01:16,900 Here it is. So let's press enter now. 
25 00:01:16,900 --> 00:01:20,400 And as you can see, the three little dots disappeared. 26 00:01:20,833 --> 00:01:24,600 Therefore that's exactly what the removePunctuation function does here. 27 00:01:24,800 --> 00:01:28,666 It removes any kind of punctuation, including dots, commas, 28 00:01:28,666 --> 00:01:29,100 colons, 29 00:01:29,100 --> 00:01:32,100 semicolons or any other kind of punctuation. 30 00:01:32,366 --> 00:01:33,100 All right. 31 00:01:33,100 --> 00:01:34,266 So next step done. 32 00:01:34,266 --> 00:01:36,433 We are ready to move on to the next step, 33 00:01:36,433 --> 00:01:40,066 which will be to remove all the non-relevant words in the reviews. 34 00:01:40,266 --> 00:01:43,300 So for example, if we have a look at this first review here, 35 00:01:43,300 --> 00:01:45,566 well, the word "this" is not very relevant. 36 00:01:45,566 --> 00:01:49,200 You know, "this" doesn't give any hint on whether the review is positive 37 00:01:49,200 --> 00:01:50,100 or negative. 38 00:01:50,100 --> 00:01:53,833 So this is typically a word that we don't want to have in the final sparse matrix 39 00:01:54,033 --> 00:01:55,666 because it is not relevant. 40 00:01:55,666 --> 00:01:58,666 So we will remove it in the next step of the cleaning process. 41 00:01:58,700 --> 00:02:01,633 And we'll do the same for all the other words of the same kind. 42 00:02:01,633 --> 00:02:03,366 So let's do that in the next tutorial. 43 00:02:03,366 --> 00:02:05,133 And until then, enjoy machine learning.
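
For readers who want to see the punctuation-removal step outside the video, here is a minimal sketch of the same idea in Python. It is not the R code shown on screen: the list `corpus`, the second sample review, and the helper `remove_punctuation` are all hypothetical stand-ins, and the sketch only handles ASCII punctuation. It assumes the reviews have already been lower-cased by an earlier cleaning step.

```python
import string

# Hypothetical stand-in for the corpus: a plain list of lower-cased reviews.
# The first entry mirrors the first review discussed in the tutorial.
corpus = ["wow... loved this place.", "crust is not good."]

# Translation table that deletes every ASCII punctuation character
# (dots, commas, colons, semicolons and so on).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(text: str) -> str:
    """Return text with all ASCII punctuation characters deleted."""
    return text.translate(_PUNCT_TABLE)


# Apply the cleaning step to every review in the corpus.
cleaned_corpus = [remove_punctuation(review) for review in corpus]

# Check the result on the first review, as the tutorial does in the console.
print(cleaned_corpus[0])
```

Printing the first cleaned review is the same sanity check the tutorial performs: the three dots after "wow" and the final period should be gone, while the words themselves are untouched.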