1 00:00:00,530 --> 00:00:07,430 So now let us move on to removing non-letter characters. Now, certain characters such as numbers 2 00:00:07,430 --> 00:00:15,020 or punctuation marks will cause our word detection to fail, because words won't look exactly as they 3 00:00:15,020 --> 00:00:16,700 are spelled in our dictionary file. 4 00:00:17,150 --> 00:00:23,240 For example, say you have the last word as "you." that is, "you" followed by a dot. 5 00:00:23,930 --> 00:00:29,570 If we didn't remove the dot at the end of the string, it wouldn't be counted as an English word, because 6 00:00:29,570 --> 00:00:31,700 the word "you" 7 00:00:31,760 --> 00:00:34,880 isn't spelled with a period in the dictionary. 8 00:00:35,450 --> 00:00:41,390 So to avoid such misinterpretations, numbers and punctuation marks need to be removed. 9 00:00:41,810 --> 00:00:48,230 OK, so the previously explained get English word count function calls the function remove non 10 00:00:48,230 --> 00:00:54,210 letters on a string to remove any number and punctuation characters from it. 11 00:00:54,470 --> 00:01:00,890 So if we look at the remove non letters function, which takes a message string as a parameter: OK, first the 12 00:01:00,890 --> 00:01:04,640 letters-only list is set to a blank list. 13 00:01:04,910 --> 00:01:10,170 Then, for each symbol in the message string, if the symbol is in the letters-and-space string, 14 00:01:10,190 --> 00:01:12,970 we append the symbol to the letters-only list. 15 00:01:13,520 --> 00:01:20,360 OK, now over here we are creating a blank list, the letters-only list, and after that we use a for 16 00:01:20,360 --> 00:01:27,190 loop to loop over each character in the message argument. Next, the for loop checks whether 17 00:01:27,270 --> 00:01:34,670 the character exists in the letters-and-space string. This is how we check whether the character is a number or 18 00:01:34,670 --> 00:01:35,780 a punctuation mark.
19 00:01:36,290 --> 00:01:42,950 OK, if it doesn't exist in the letters-and-space string, it is skipped. If the character 20 00:01:42,950 --> 00:01:48,260 does exist in the string, it's added to the end of the list using the append method over here. 21 00:01:48,730 --> 00:01:51,560 OK, now let's have a look at the next topic. 22 00:01:51,560 --> 00:01:53,250 That is the append list method. 23 00:01:53,780 --> 00:01:59,960 OK, now when we add a value to the end of a list, we say we are appending a value to the list in that 24 00:01:59,960 --> 00:02:00,200 case. 25 00:02:00,410 --> 00:02:06,620 So this is done with lists so frequently in Python that there is an append list method that takes 26 00:02:06,620 --> 00:02:09,290 a single argument to append to the end of the list. 27 00:02:09,650 --> 00:02:15,020 Now, for example, just to check this out in the interactive shell, if we go back here and 28 00:02:15,040 --> 00:02:18,590 here, we create a list, list1, that's equal to a blank list. 29 00:02:19,410 --> 00:02:28,430 OK, after this, we call list1 dot append and we give it the string Python. When we then enter list1, 30 00:02:28,530 --> 00:02:30,700 it prints the value, that is, Python. 31 00:02:31,110 --> 00:02:36,240 Again, we call list1 dot append, and here we add hacking. 32 00:02:37,320 --> 00:02:40,660 And again, we look at the value of list1. 33 00:02:41,280 --> 00:02:44,200 So it prints Python and then hacking, which was added. 34 00:02:44,520 --> 00:02:51,150 So after we have created an empty list named list1, we can enter list1 dot append with a particular 35 00:02:51,150 --> 00:02:53,310 string to add a string value to the list. 36 00:02:53,670 --> 00:02:59,530 Then we can enter list1 again to return the value stored in list1, which is right now Python. 37 00:02:59,850 --> 00:03:03,600 If you again use append on list1, hacking is added at the end of the list. 38 00:03:03,930 --> 00:03:07,630 Now in list1 we are getting both Python and hacking.
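The interactive shell session described above can be reproduced with a short script. This is a minimal sketch: the name list1 follows the spoken "list one", and its exact spelling on screen is an assumption.

```python
# Start with an empty list (spelling of the name list1 is assumed from the narration).
list1 = []

# append() takes a single argument and adds it to the end of the list.
list1.append('Python')
print(list1)  # ['Python']

# A second call adds the new value after the existing one.
list1.append('hacking')
print(list1)  # ['Python', 'hacking']
```

Note that append() changes the list in place and returns None, so its result is not assigned back to list1.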
39 00:03:08,430 --> 00:03:15,240 So similarly, we also use the append method to add items to the letters-only list we created 40 00:03:15,240 --> 00:03:16,320 in our code earlier. 41 00:03:16,710 --> 00:03:23,310 OK, so whenever we call append on the letters-only list with the symbol in this line, it basically goes on 42 00:03:23,310 --> 00:03:28,370 appending the values, building up a list of letter characters. 43 00:03:28,650 --> 00:03:30,480 Let's go back to our program here. 44 00:03:31,140 --> 00:03:38,910 And after finishing the for loop over here, that is, the loop in remove non letters with the append inside it, 45 00:03:39,240 --> 00:03:43,410 OK, now the letters-only list should be a list of each letter 46 00:03:43,410 --> 00:03:50,160 and space character from the original message. Now, because a list of one-character strings isn't useful 47 00:03:50,160 --> 00:03:51,660 for finding English words, 48 00:03:52,050 --> 00:03:58,590 we join the character strings in the letters-only list into one string and return that, so that 49 00:03:58,920 --> 00:04:06,240 we have return, a blank string, then dot join, with the letters-only list. Now, to concatenate the list items 50 00:04:06,240 --> 00:04:12,120 in the letters-only list into one long string, we call the join string method on a blank string, that 51 00:04:12,120 --> 00:04:19,290 is, an opening plus a closing single apostrophe, and this joins the strings in the letters-only list with a blank string 52 00:04:19,530 --> 00:04:26,850 between them. This string value is then returned as the remove non letters function's return value. 53 00:04:27,330 --> 00:04:32,280 OK, now coming to detecting English words. 54 00:04:32,580 --> 00:04:39,330 Now when a message is encrypted, or rather when it is decrypted with the wrong key, it will often produce 55 00:04:39,330 --> 00:04:42,360 far more non-letter and non-space characters 56 00:04:42,360 --> 00:04:45,360 than are found in a typical English message.
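The remove non letters function walked through earlier (blank list, for loop, membership check, append, then join on a blank string) might look roughly like the sketch below. The identifier names and the exact contents of the letters-and-space string are assumptions, since the narration does not spell them out.

```python
# Assumed contents: upper- and lowercase letters plus space, tab and newline.
LETTERS_AND_SPACE = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz \t\n'


def remove_non_letters(message_str):
    """Return message_str with every non-letter, non-space character dropped."""
    letters_only = []  # start with a blank list
    for symbol in message_str:
        # Keep the character only if it is a letter or whitespace.
        if symbol in LETTERS_AND_SPACE:
            letters_only.append(symbol)
    # Join the one-character strings with a blank string between them.
    return ''.join(letters_only)


print(remove_non_letters('you.'))  # you
```

Calling join on a blank string means nothing is placed between the characters, so the list collapses back into ordinary text.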
57 00:04:45,720 --> 00:04:51,830 Also, the words it produces will often be random and not found in the dictionary of English words. 58 00:04:52,170 --> 00:04:58,800 So if we go to the definition of the is English check function, we can check for both of these issues in 59 00:04:58,800 --> 00:05:02,310 the given string. Now here in this function, 60 00:05:02,580 --> 00:05:08,850 we have set this up to accept a string argument, that is, the message string, and 61 00:05:08,850 --> 00:05:14,160 return a Boolean value of True when the string is English text and False when it is not. 62 00:05:14,430 --> 00:05:16,500 Now this function has three parameters. 63 00:05:16,500 --> 00:05:23,010 One is the message string, then the word percentage, which is 20, and the letter percentage, which is 85. 64 00:05:23,250 --> 00:05:29,970 The first parameter contains the string to be checked. The second and third parameters set default percentages 65 00:05:29,970 --> 00:05:36,740 for words and letters, which the string must contain in order to be confirmed as English. 66 00:05:37,110 --> 00:05:40,370 OK, now a percentage is just a number between 0 and 100 67 00:05:40,380 --> 00:05:45,390 that shows how much of something there is in proportion to the total number of those things. 68 00:05:45,630 --> 00:05:51,580 So we'll explore how to use these default arguments and calculate percentages in the following sections now. 69 00:05:51,960 --> 00:05:59,910 Now, using default arguments: sometimes a function will almost always have the same values passed to it 70 00:06:00,150 --> 00:06:03,330 when it is called. Instead of including these 71 00:06:03,330 --> 00:06:10,050 for every function call, you can specify a default argument in the function's definition statement itself. 72 00:06:10,500 --> 00:06:16,770 Like here, we have defined three parameters, with default arguments of 20 and 85 provided 73 00:06:16,770 --> 00:06:20,860 for the word percentage and letter percentage respectively.
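To illustrate how default arguments behave, here is a small sketch. The function show_thresholds and its parameter names are hypothetical and only mirror the three-parameter shape described above (a message plus defaults of 20 and 85); it is not the course's actual function.

```python
def show_thresholds(message_str, word_percentage=20, letter_percentage=85):
    """Hypothetical helper: report which thresholds apply to this call."""
    return (word_percentage, letter_percentage)


# Only the required argument: both defaults are used.
print(show_thresholds('Hello world'))          # (20, 85)

# Two arguments: the second overrides word_percentage only.
print(show_thresholds('Hello world', 50))      # (50, 85)

# Three arguments: both defaults are overridden.
print(show_thresholds('Hello world', 50, 60))  # (50, 60)

# A keyword argument overrides just the third parameter.
print(show_thresholds('Hello world', letter_percentage=60))  # (20, 60)
```

Parameters with defaults must come after the parameters without them in the def statement, which is why the message comes first.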
74 00:06:21,180 --> 00:06:23,130 Now this is English check function 75 00:06:23,130 --> 00:06:25,650 can be called with one, two, or three arguments. 76 00:06:25,920 --> 00:06:31,500 If no arguments are passed for the word percentage or the letter percentage, then the values assigned to these 77 00:06:31,500 --> 00:06:34,340 parameters will be the default arguments. 78 00:06:34,560 --> 00:06:42,180 OK, now the default arguments define what percent of the message string needs to be made up of 79 00:06:42,180 --> 00:06:44,850 real English words for the is English 80 00:06:44,910 --> 00:06:49,200 check function to determine that the message string is an English string, 81 00:06:49,230 --> 00:06:55,560 and what percent of the message needs to be made up of letters or spaces instead of numbers or punctuation 82 00:06:55,560 --> 00:06:55,980 marks. 83 00:06:56,430 --> 00:07:03,210 For example, if is English check is called with only one argument, the default arguments are used for the word 84 00:07:03,210 --> 00:07:08,730 and letter percentages, which means 20 percent of the string needs to be made up of English words and 85 85 00:07:08,730 --> 00:07:11,460 percent of the string needs to be made up of letters. 86 00:07:11,880 --> 00:07:18,660 Now, these percentages work for detecting English in most cases, but you might 87 00:07:18,700 --> 00:07:25,660 want to try different argument combinations in specific cases, when the is English check function 88 00:07:26,110 --> 00:07:29,350 needs a looser or more restrictive test. 89 00:07:30,460 --> 00:07:36,490 So in those situations, a program can just pass arguments for the word percentage and the 90 00:07:36,490 --> 00:07:39,640 letter percentage instead of using the default ones.
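The two checks and the effect of overriding the defaults can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the course's exact code: the identifier names are guesses from the narration, and ENGLISH_WORDS is a tiny hypothetical stand-in for the dictionary file.

```python
LETTERS_AND_SPACE = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz \t\n'
# Hypothetical stand-in for the real dictionary file.
ENGLISH_WORDS = {'THE', 'QUICK', 'BROWN', 'FOX', 'IS', 'HERE'}


def remove_non_letters(message_str):
    """Drop every character that is not a letter or whitespace."""
    return ''.join(s for s in message_str if s in LETTERS_AND_SPACE)


def get_english_word_count(message_str):
    """Return the fraction (0.0 to 1.0) of words found in ENGLISH_WORDS."""
    words = remove_non_letters(message_str.upper()).split()
    if not words:
        return 0.0  # avoid dividing by zero when no words remain
    matches = sum(1 for word in words if word in ENGLISH_WORDS)
    return matches / len(words)


def is_english_check(message_str, word_percentage=20, letter_percentage=85):
    """Return True if enough words and enough letters look like English."""
    if not message_str:
        return False
    words_match = get_english_word_count(message_str) * 100 >= word_percentage
    num_letters = len(remove_non_letters(message_str))
    letters_match = num_letters / len(message_str) * 100 >= letter_percentage
    return words_match and letters_match


print(is_english_check('The quick brown fox is here.'))               # True
print(is_english_check('The fox 12345 67890'))                        # False
print(is_english_check('The fox 12345 67890', letter_percentage=40))  # True
```

The second message fails with the defaults because only 9 of its 19 characters are letters or spaces, while passing a looser letter percentage of 40 lets it through.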