1
00:00:00,150 --> 00:00:07,170
Now, this process of determining how frequently a letter appears in plaintexts and in ciphertexts

2
00:00:07,170 --> 00:00:13,590
is called frequency analysis, and understanding frequency analysis is an important step in hacking

3
00:00:13,590 --> 00:00:14,760
the Vigenère cipher.

4
00:00:15,210 --> 00:00:21,480
So we will use letter frequency analysis to break the Vigenère cipher in the next session.

5
00:00:21,810 --> 00:00:27,360
So in this particular session, we are going to cover letter frequency, the sort() method's

6
00:00:27,360 --> 00:00:30,600
key and reverse keyword arguments,

7
00:00:30,930 --> 00:00:32,670
passing functions as values

8
00:00:32,670 --> 00:00:39,180
instead of calling the functions, and converting dictionaries to lists using the keys(), values() and items()

9
00:00:39,180 --> 00:00:39,680
methods.
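
The Python features just listed can be sketched in a few lines. This is only an illustration: the names by_length, words and counts are made up for this example and are not taken from the course files.

```python
# Illustrative only: these names are hypothetical, not from the course files.
def by_length(word):
    """Sort key: sort() calls this once per item and sorts on the result."""
    return len(word)

words = ["cipher", "key", "plaintext", "code"]
# Pass the function itself (no parentheses) as the key argument.
# reverse=True sorts from the largest key value to the smallest.
words.sort(key=by_length, reverse=True)

counts = {"E": 12, "T": 9, "Z": 1}
letters = list(counts.keys())    # the dictionary's keys as a list
totals = list(counts.values())   # the dictionary's values as a list
pairs = list(counts.items())     # (key, value) tuples as a list
```

Writing key=by_length hands sort() the function as a value; writing key=by_length() would try to call it first, which is not what is wanted here.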

10
00:00:40,230 --> 00:00:44,100
So let's understand analyzing the frequency of letters in a text.

11
00:00:44,410 --> 00:00:49,370
Now, when you flip a coin, about half the time it comes up heads and half the time it comes up

12
00:00:49,380 --> 00:00:49,860
tails.

13
00:00:50,310 --> 00:00:54,350
That is, the frequency of heads and the frequency of tails should be about the same.

14
00:00:54,840 --> 00:01:02,370
We can represent the frequency as a percentage by dividing the total number of times an event occurs by the total

15
00:01:02,370 --> 00:01:08,670
number of attempts at that particular event and then multiplying the quotient by one hundred. So we can

16
00:01:08,670 --> 00:01:15,390
learn much about a coin from its frequency of heads and tails: whether the coin is fair or unfairly

17
00:01:15,390 --> 00:01:18,080
weighted, or even if it has two heads.
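
As a small sketch of that percentage calculation (the function name frequency_percent and the flip counts are made up for illustration):

```python
def frequency_percent(occurrences, attempts):
    """Return how often an event occurred, as a percentage of all attempts."""
    return occurrences * 100 / attempts

# Hypothetical experiment: 52 heads out of 100 flips.
heads_percent = frequency_percent(52, 100)
tails_percent = frequency_percent(48, 100)
```

A result far from 50 percent over many flips would suggest a weighted coin, and 100 percent heads would suggest a two-headed one.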

18
00:01:18,510 --> 00:01:23,430
We can also learn much about a ciphertext from the frequency of its letters.

19
00:01:23,760 --> 00:01:27,870
Some letters in the English alphabet are used more often than others.

20
00:01:28,140 --> 00:01:37,140
For example, the letters E, T, A and O appear most frequently in English words, whereas the letters J, X, Q

21
00:01:37,140 --> 00:01:37,380
and Z

22
00:01:37,380 --> 00:01:39,440
appear less frequently in English.

23
00:01:39,840 --> 00:01:45,720
So we'll use these differences in letter frequencies in the English language to crack Vigenère-

24
00:01:45,720 --> 00:01:46,920
encrypted messages.

25
00:01:47,400 --> 00:01:49,800
Now we will see a graph.

26
00:01:50,130 --> 00:01:57,750
Basically, you can compile it from various sources for the frequency analysis, and then

27
00:01:57,960 --> 00:02:03,690
you can sort those letter frequencies in order from the greatest frequency to the least frequency,

28
00:02:03,690 --> 00:02:04,380
for example.

29
00:02:04,500 --> 00:02:12,060
OK, now, likewise, the letters that appear most often in a ciphertext, such as a simple substitution

30
00:02:12,060 --> 00:02:18,120
ciphertext, are more likely to have been encrypted from the most commonly found English letters, like

31
00:02:18,120 --> 00:02:19,800
E, T or A.

32
00:02:19,800 --> 00:02:25,230
Similarly, the letters that appear least often in the ciphertext are more likely to have been encrypted

33
00:02:25,230 --> 00:02:28,760
from X, Q or Z, for example, in the plaintext.

34
00:02:29,080 --> 00:02:35,730
So now we come to matching letter frequencies. To find the letter frequencies in a message, we will

35
00:02:35,730 --> 00:02:43,020
use an algorithm that simply orders the letters in a string from the highest frequency to the lowest frequency.

36
00:02:43,380 --> 00:02:50,400
Then the algorithm uses this ordered string to calculate what in this particular section is called the frequency

37
00:02:50,400 --> 00:02:57,150
match score, which we will use to determine how similar the string's letter frequency is to that of

38
00:02:57,150 --> 00:02:58,140
standard English.

39
00:02:58,410 --> 00:03:04,470
So to calculate the frequency match score for the ciphertext, we start with zero and then add a point

40
00:03:04,470 --> 00:03:05,220
each time

41
00:03:05,520 --> 00:03:12,810
one of the most frequent English letters (E, T, A, O, I or N) appears among the six most

42
00:03:12,810 --> 00:03:19,650
frequent letters of the ciphertext. We will also add a point to the score each time one of the least frequent

43
00:03:19,650 --> 00:03:23,070
letters (V, K, J, X, Q or Z) appears

44
00:03:23,310 --> 00:03:30,480
among the six least frequent letters of the ciphertext. The frequency match score of a string

45
00:03:30,480 --> 00:03:32,610
can range from zero to twelve.

46
00:03:32,940 --> 00:03:38,370
Knowing the frequency match score of a ciphertext can reveal important information about the original

47
00:03:38,370 --> 00:03:39,100
plaintext.

48
00:03:39,510 --> 00:03:48,570
So, for example, consider using frequency analysis on a Vigenère cipher. To hack the Vigenère cipher,

49
00:03:48,570 --> 00:03:51,570
we need to decrypt the subkeys individually.

50
00:03:51,930 --> 00:03:59,670
That means we can't rely on English word detection, because we won't be able to decrypt enough

51
00:03:59,670 --> 00:04:01,890
of the message using just one subkey.

52
00:04:02,080 --> 00:04:08,970
Instead, we will decrypt the letters encrypted with one subkey and perform frequency analysis to

53
00:04:08,970 --> 00:04:15,950
determine which decrypted ciphertext produces a letter frequency that most closely matches that of

54
00:04:15,960 --> 00:04:16,860
regular English.

55
00:04:17,160 --> 00:04:21,720
In other words, we will need to find which decryption has the highest frequency

56
00:04:21,730 --> 00:04:26,510
match score, which is a good indication that we have found the correct subkey.

57
00:04:26,940 --> 00:04:31,040
We repeat this process for the second, third, fourth and fifth subkeys as well.
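
One way to picture the per-subkey search is the sketch below. It is an assumption-laden illustration, not the hacking program from the next session: the function names nth_letters, shift_decrypt and rank_subkeys are hypothetical, and the scoring function is passed in as a value so that any frequency measure can be plugged in.

```python
import string

LETTERS = string.ascii_uppercase

def nth_letters(ciphertext, n, key_length):
    """Return the letters encrypted with the n-th subkey (n starts at 1).

    Non-letters are skipped, assuming they were not encrypted.
    """
    only_letters = [c for c in ciphertext.upper() if c in LETTERS]
    return "".join(only_letters[n - 1::key_length])

def shift_decrypt(text, subkey_letter):
    """Undo the shift that one subkey letter applies (A = 0, B = 1, ...)."""
    shift = LETTERS.index(subkey_letter)
    return "".join(LETTERS[(LETTERS.index(c) - shift) % 26] for c in text)

def rank_subkeys(letters, score):
    """Try all 26 subkeys and return (subkey, score) pairs, best first."""
    results = [(k, score(shift_decrypt(letters, k))) for k in LETTERS]
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results
```

For a guessed key length of five, calling rank_subkeys(nth_letters(ciphertext, n, 5), score) for n from 1 to 5 gives 26 trial decryptions per subkey position.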

58
00:04:31,650 --> 00:04:37,590
So just for now, we are just guessing that the key length is five letters. Because there are twenty-six

59
00:04:37,590 --> 00:04:39,520
decryptions for each subkey,

60
00:04:39,540 --> 00:04:46,960
the computer only has to perform twenty-six plus twenty-six plus twenty-six plus twenty-six plus

61
00:04:46,960 --> 00:04:51,960
twenty-six, and that is one hundred thirty decryptions for the five-letter

62
00:04:51,960 --> 00:04:58,890
key. So this is much easier than performing decryptions for every possible subkey combination, which would be

63
00:04:58,890 --> 00:04:59,370
26 to the fifth power, that is,

64
00:04:59,910 --> 00:05:08,160
11,881,376 decryptions. So there are more steps to hack the Vigenère cipher,

65
00:05:08,790 --> 00:05:10,890
which we will learn in the next session, obviously,

66
00:05:10,890 --> 00:05:17,220
when we write the hacking program. For now, let's write a module that will perform frequency analysis

67
00:05:17,220 --> 00:05:19,560
using the following helper functions.

68
00:05:19,560 --> 00:05:25,350
First is getLetterCount(), which will take a string parameter and return a dictionary that

69
00:05:25,350 --> 00:05:29,560
has a count of how often each letter appears in the string.

70
00:05:29,880 --> 00:05:35,640
Second is getFrequencyOrder(), which takes a string parameter and returns a string of twenty-six

71
00:05:35,640 --> 00:05:40,890
letters ordered from most frequent to least frequent in the string parameter.

72
00:05:40,920 --> 00:05:46,260
And finally, englishFreqMatchScore(), which takes a string parameter and returns an integer from

73
00:05:46,260 --> 00:05:50,610
zero to 12 indicating the string's frequency match score.
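
Before the real module is written in the next session, here is a minimal sketch of how three such helpers could fit together, using the scoring rule described earlier. The snake_case names, the ETAOIN constant and the tie-breaking rule (letters with equal counts are placed in reverse ETAOIN order) are assumptions made for this illustration; the course's own file may differ.

```python
import string

# Assumed ordering of English letters, most frequent first.
ETAOIN = "ETAOINSHRDLCUMWFGYPBVKJXQZ"
LETTERS = string.ascii_uppercase

def get_letter_count(message):
    """Return a dictionary mapping each letter A-Z to its count in message."""
    counts = {letter: 0 for letter in LETTERS}
    for ch in message.upper():
        if ch in counts:
            counts[ch] += 1
    return counts

def get_frequency_order(message):
    """Return the 26 letters ordered from most to least frequent in message."""
    counts = get_letter_count(message)
    # Highest count first; ties go in reverse ETAOIN order (an assumption),
    # so a text with no useful counts does not score well by accident.
    ordered = sorted(LETTERS, key=lambda l: (-counts[l], -ETAOIN.index(l)))
    return "".join(ordered)

def english_freq_match_score(message):
    """Return a score from 0 to 12 for how English-like the frequencies are."""
    order = get_frequency_order(message)
    # One point for each of E, T, A, O, I, N among the six most frequent.
    score = sum(1 for l in ETAOIN[:6] if l in order[:6])
    # One point for each of V, K, J, X, Q, Z among the six least frequent.
    score += sum(1 for l in ETAOIN[-6:] if l in order[-6:])
    return score
```

The lambda passed to sorted() is another case of passing a function as a value instead of calling it.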

74
00:05:50,820 --> 00:05:56,630
So that is the background of what frequency analysis will do in the next session.

75
00:05:56,650 --> 00:06:01,050
Now we will start by creating the source code for matching letter frequencies.

76
00:06:01,320 --> 00:06:03,550
We will create a separate file for it.

77
00:06:03,600 --> 00:06:10,020
And after creating the file, we will also understand how the file has been created, which

78
00:06:10,020 --> 00:06:16,080
will be helpful for us in the next session to know what the other techniques are for hacking

79
00:06:16,080 --> 00:06:17,690
the Vigenère cipher also.

80
00:06:17,880 --> 00:06:23,760
So we will see how to write a frequency analysis program in the next session.

81
00:06:23,920 --> 00:06:25,860
That's it from this session for now.

82
00:06:26,220 --> 00:06:28,150
We will see you in the next one.

83
00:06:28,320 --> 00:06:29,310
Thank you very much.