1 00:00:00,720 --> 00:00:02,430 Hello and welcome back. 2 00:00:02,940 --> 00:00:09,450 In this video, we are going to talk about the principles of performing JavaScript analysis. 3 00:00:12,230 --> 00:00:20,960 We are going to take a look at malicious scripts and their uses, malicious JavaScript, script obfuscation 4 00:00:21,470 --> 00:00:24,260 and how to defeat obfuscation. 5 00:00:26,670 --> 00:00:37,440 As we mentioned before, a PDF document or any other document can be used to download further malware. 6 00:00:38,190 --> 00:00:45,090 And this is often done using embedded scripts, for example, in a PDF. 7 00:00:45,300 --> 00:00:54,540 We may have an embedded JavaScript, and this JavaScript could take advantage of some exploit in the PDF 8 00:00:54,540 --> 00:01:00,820 reader to execute code that can download further malware onto the machine. 9 00:01:02,410 --> 00:01:09,150 This can also happen with other kinds of documents, such as Office documents, which may contain embedded VBA 10 00:01:09,150 --> 00:01:11,460 scripts that can do the same thing. 11 00:01:14,750 --> 00:01:23,120 And also, we have seen earlier that there are various vectors of attack. For example, a document 12 00:01:23,630 --> 00:01:32,030 that contains malicious JavaScript or other scripts could be used to download further malware. 13 00:01:33,360 --> 00:01:41,070 JavaScript can also be sent in via email attachments, and if a user were to open these attachments, 14 00:01:41,400 --> 00:01:50,820 it would trigger the embedded JavaScript to download further malware. There can also be other types of attack 15 00:01:51,240 --> 00:01:52,240 from web pages. 16 00:01:52,260 --> 00:02:00,810 If a user were to visit certain web pages that have embedded JavaScript, the browser on the user's 17 00:02:00,810 --> 00:02:09,060 machine could inadvertently execute that embedded JavaScript and cause some additional malware to 18 00:02:09,060 --> 00:02:09,720 be downloaded.
19 00:02:12,400 --> 00:02:20,470 So there are certain things we keep a lookout for when performing JavaScript analysis, or any kind 20 00:02:20,470 --> 00:02:22,940 of document analysis for that matter. 21 00:02:23,350 --> 00:02:30,800 And these are URLs. URLs contain the website from which the target 22 00:02:32,410 --> 00:02:35,390 will be trying to download additional malware. 23 00:02:36,870 --> 00:02:43,710 We also keep a lookout for commands, for example, commands to execute PowerShell, and these commands 24 00:02:43,710 --> 00:02:51,480 can do additional tasks that can cause damage to the machine or even download further additional 25 00:02:51,480 --> 00:02:52,050 tools. 26 00:02:53,220 --> 00:02:55,620 We also keep a lookout for filenames. 27 00:02:55,880 --> 00:03:04,410 Filenames could be names of the new files which have been downloaded and saved on the target victim's 28 00:03:04,410 --> 00:03:11,220 machine, and filenames can also contain a path where these additional files are being saved. 29 00:03:14,700 --> 00:03:23,690 Here is an example of a JavaScript. As you can see here, there are several variables being created. 30 00:03:24,210 --> 00:03:31,630 For example, the variable b will be used to create an array here. 31 00:03:31,650 --> 00:03:33,120 The string here 32 00:03:34,650 --> 00:03:43,410 is followed by a dot. In JavaScript, this means that you are performing a function call on the 33 00:03:43,410 --> 00:03:43,890 string. 34 00:03:44,280 --> 00:03:47,950 So the function call in this case is a split function call. 35 00:03:48,540 --> 00:03:58,550 So what it does is it takes the string here and splits it into elements of an array based on the separator. 36 00:03:58,710 --> 00:03:59,490 So the separator, 37 00:03:59,520 --> 00:04:01,190 in this case, is a space here.
38 00:04:01,620 --> 00:04:09,180 So if there's a space, as you can see in the string, this split function will split this string into 39 00:04:09,180 --> 00:04:17,160 two elements, element one containing the first URL and element two containing the second URL, and 40 00:04:17,160 --> 00:04:20,820 put them in an array which is called b. 41 00:04:21,120 --> 00:04:24,480 So b now is an array containing these two strings. 42 00:04:25,650 --> 00:04:35,090 And then further down you can see here a variable that actually contains the string for the user's 43 00:04:35,100 --> 00:04:43,120 TEMP environment variable. This TEMP environment variable here is a string containing 44 00:04:43,120 --> 00:04:51,300 the path to the temporary location on the user's machine, and it concatenates TEMP with a character string 45 00:04:51,300 --> 00:04:55,460 from the fromCharCode function with 92 as a parameter. 46 00:04:55,830 --> 00:05:04,140 So this 92 here is an ASCII code which represents the backslash, and then appended to it is 47 00:05:04,350 --> 00:05:06,550 the number 107. 48 00:05:06,870 --> 00:05:11,640 So this variable is trying to create a path to store something. 49 00:05:12,810 --> 00:05:20,580 And in this line here, the variable contains the object which will be used to connect to the Internet. 50 00:05:21,210 --> 00:05:24,290 Same thing we see in this line here. 51 00:05:24,300 --> 00:05:28,140 You can see that this for loop here is looping through. 52 00:05:28,770 --> 00:05:36,660 It starts with zero and goes up to the length of the variable, the length of the array b. So since the length 53 00:05:36,660 --> 00:05:44,080 of the array is two, it is actually looping two times, once for each string in the array. 54 00:05:44,910 --> 00:05:51,720 And every time we loop through, it is trying to connect to the Internet using b[i]. 55 00:05:51,870 --> 00:05:55,090 When you loop through the first iteration, i would be zero.
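The split call and the path construction described above can be sketched as follows. This is only an illustration: the two URLs, the TEMP value and the variable names other than b are invented placeholders, not the ones shown on screen.

```javascript
// Hypothetical placeholder string holding two URLs separated by a space.
const urlString = "http://example.com/a.bin http://example.org/b.bin";

// split(" ") cuts the string at every space and returns the pieces as an array.
const b = urlString.split(" ");

// String.fromCharCode(92) returns the character with code 92, the backslash.
const backslash = String.fromCharCode(92);

// Assumed stand-in for the value of the TEMP environment variable.
const temp = "C:\\Users\\victim\\AppData\\Local\\Temp";

// TEMP + backslash + 107 gives the path the script would save to.
const savePath = temp + backslash + 107;

console.log(b.length, b[0], b[1]);
console.log(savePath);
```

Because b has two elements, a loop from zero up to b.length runs twice, once per URL.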
56 00:05:55,380 --> 00:06:02,070 So in the first iteration, it is constructing a URL out of this string here. 57 00:06:02,550 --> 00:06:10,080 And then from there it will call, on the next line, send, which is to connect to this 58 00:06:10,080 --> 00:06:13,950 URL to perform some kind of download. 59 00:06:14,580 --> 00:06:18,800 And here, in an if statement, it checks whether the result is successful. 60 00:06:18,930 --> 00:06:22,260 200 is the status for success. 61 00:06:22,590 --> 00:06:29,520 If it is successful, it would then open, on this line here, and then save to a file. 62 00:06:30,000 --> 00:06:39,810 And then here is the path which we saw just now, containing the complete path to where this file is going 63 00:06:39,810 --> 00:06:40,620 to be saved. 64 00:06:40,980 --> 00:06:43,500 And the name is constructed here as well, 65 00:06:43,950 --> 00:06:46,230 with an extension. 66 00:06:46,950 --> 00:06:53,360 And over here, on the next line, it runs the file once it has been saved to that location. 67 00:06:53,940 --> 00:06:58,140 So this is an example of a malicious JavaScript code. 68 00:07:00,250 --> 00:07:10,600 However, very often we will not find the JavaScript code so easy to read and in plain text. Most often 69 00:07:10,600 --> 00:07:17,560 it will be obfuscated, and this is probably what we are more likely to see. 70 00:07:18,040 --> 00:07:20,700 It doesn't have any clear formatting. 71 00:07:20,920 --> 00:07:27,620 It has got obfuscation, substitution, replacements and so on to make it difficult for the 72 00:07:28,000 --> 00:07:30,640 analyst to understand what is going on. 73 00:07:32,200 --> 00:07:41,830 Script obfuscation can use any of these four, or all of them, in various combinations. 74 00:07:42,880 --> 00:07:44,350 The first is formatting. 75 00:07:44,860 --> 00:07:47,290 That is, it will remove any formatting.
76 00:07:47,290 --> 00:07:55,750 That is to make it harder to see and understand the code. It can then also inject additional unnecessary 77 00:07:55,750 --> 00:07:58,340 code to make it even more confusing. 78 00:07:59,530 --> 00:08:05,260 Then it will also perform data obfuscation by trying to 79 00:08:06,590 --> 00:08:13,850 create strings in various ingenious ways, rather than simply writing them out so that people can easily 80 00:08:13,850 --> 00:08:14,970 understand the strings. 81 00:08:16,070 --> 00:08:23,270 And then there's also substitution, where the variable names will be replaced by junk characters or 82 00:08:23,270 --> 00:08:23,940 junk names, 83 00:08:23,970 --> 00:08:28,490 so that it is difficult to understand the function of a particular variable. 84 00:08:30,800 --> 00:08:40,340 So the first obfuscation method is formatting. This is an example of formatting obfuscation. 85 00:08:40,640 --> 00:08:47,690 On the left you can see it is clearly formatted, easy for a programmer or analyst 86 00:08:47,840 --> 00:08:48,370 to read. 87 00:08:48,800 --> 00:08:55,140 And this is easy to understand. On the right, the formatting has been removed and all we see is just a 88 00:08:55,190 --> 00:08:59,900 junk blob of characters, which is almost impossible to understand. 89 00:09:00,620 --> 00:09:08,540 The solution to overcome this kind of formatting obfuscation is to use beautification programs. 90 00:09:08,870 --> 00:09:16,750 So beautification programs will be able to reformat this and make it formatted nicely, as shown 91 00:09:16,760 --> 00:09:17,300 on the left. 92 00:09:20,620 --> 00:09:30,640 So this is an example of code which has had its formatting removed. After applying the beautification process 93 00:09:30,640 --> 00:09:30,980 to it, 94 00:09:31,390 --> 00:09:33,730 it is now much easier to read.
95 00:09:36,260 --> 00:09:43,510 As you can see here, there is a clear indication of a for loop here, and there's a block here, and within 96 00:09:43,670 --> 00:09:45,500 it this if statement as well. 97 00:09:48,940 --> 00:09:56,380 The second way in which obfuscation can happen is through injecting extraneous code, and that means adding 98 00:09:56,390 --> 00:10:01,850 extra lines of code in order to confuse the analyst. For example, here on the left, 99 00:10:02,200 --> 00:10:03,580 this is the original code. 100 00:10:03,940 --> 00:10:10,240 And on the right, you can see unnecessary code has been added to make it difficult to read and understand. 101 00:10:10,870 --> 00:10:17,020 As you can see here, the important parts of the code are: random1 is assigned 10, random2 is assigned 102 00:10:17,020 --> 00:10:20,310 5, and random3 is assigned random1 plus random2. 103 00:10:20,650 --> 00:10:25,530 However, in between you can see additional extraneous code being injected. 104 00:10:25,900 --> 00:10:33,670 For example, here a is assigned a value, b is assigned another value, 105 00:10:33,970 --> 00:10:35,890 and c is assigned a value on this line. 106 00:10:36,190 --> 00:10:40,890 So a, b and c are never used after that. 107 00:10:41,170 --> 00:10:48,600 So the solution to overcome this kind of extraneous-code obfuscation is to search for variables and code 108 00:10:48,940 --> 00:10:52,990 that are only used once and to remove them from the code. 109 00:10:55,590 --> 00:11:03,360 Take a look at our example here. In this example, you will see a lot of unnecessary code being 110 00:11:03,360 --> 00:11:04,020 injected. 111 00:11:05,810 --> 00:11:08,900 Now, these are the unnecessary lines of code, which are used 112 00:11:10,470 --> 00:11:12,810 only once and never used again.
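The search for variables that appear only once can be sketched as below. This is a naive, regex-based illustration, not a real parser: it ignores strings, comments and scoping, and both the function name findSingleUseVariables and the values given to a, b and c are made up for the example.

```javascript
// Return the names of variables declared with `var` that occur exactly once
// in the source, that is, they are declared and then never referenced again.
function findSingleUseVariables(source) {
  const declared = [...source.matchAll(/\bvar\s+([A-Za-z_$][\w$]*)/g)].map((m) => m[1]);
  const singleUse = [];
  for (const name of new Set(declared)) {
    // Escape `$` so the name can be used safely inside a regular expression.
    const escaped = name.replace(/[$]/g, "\\$&");
    // Match the name only as a whole identifier, not as part of a longer one.
    const wholeName = new RegExp("(?<![\\w$])" + escaped + "(?![\\w$])", "g");
    const occurrences = source.match(wholeName) || [];
    if (occurrences.length === 1) singleUse.push(name);
  }
  return singleUse;
}

// Hypothetical sample in the spirit of the slide: a, b and c are filler.
const sample = [
  "var random1 = 10;",
  "var a = 10 + 10;",
  "var random2 = 5;",
  "var b = 1234;",
  "var random3 = random1 + random2;",
  "var c = 6;",
  "console.log(random3);"
].join("\n");

console.log(findSingleUseVariables(sample));
```

The reported names are only candidates for removal; each one still needs a manual check before the line is deleted.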
113 00:11:13,140 --> 00:11:23,190 So these are meant to confuse and add unnecessary complexity to the code. After we have removed them, you 114 00:11:23,190 --> 00:11:25,920 will see the code is much cleaner and shorter. 115 00:11:29,820 --> 00:11:33,660 Another example is data obfuscation. 116 00:11:34,070 --> 00:11:42,020 This technique is where we use operations to make data unreadable or confusing, for example, in 117 00:11:42,020 --> 00:11:42,870 these two lines. 118 00:11:43,030 --> 00:11:46,630 Here you can see the original lines are as follows. 119 00:11:47,120 --> 00:11:55,220 But after going through data obfuscation, you will see that the first line now is being complicated 120 00:11:55,220 --> 00:11:55,760 further. 121 00:11:56,360 --> 00:12:05,510 So what it does is it is still trying to create the string "fun", but it is doing so by injecting 122 00:12:05,510 --> 00:12:09,060 unnecessary operations into the line. 123 00:12:09,950 --> 00:12:20,390 For example, here it is concatenating a blank character to f, and here it is converting the ASCII representation 124 00:12:20,390 --> 00:12:25,860 using the fromCharCode function, which actually would be u. 125 00:12:26,450 --> 00:12:27,950 And finally, the n here. 126 00:12:29,130 --> 00:12:37,590 It continues to do this, and a number here will be split up into a complicated mathematical 127 00:12:37,800 --> 00:12:41,100 operation to make it difficult to understand. 128 00:12:42,180 --> 00:12:51,540 So the solution for this kind of obfuscation is to replace all these unnecessary operations with readable 129 00:12:51,540 --> 00:12:52,040 values. 130 00:12:52,050 --> 00:12:58,850 That means you have to go through all of this manually and then replace it to get back its original 131 00:12:59,700 --> 00:13:01,650 simplified form.
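As a rough sketch of replacing such operations with readable values, the snippet below rebuilds the string "fun" the obfuscated way and then folds the source text back into a plain literal. The helper simplifyStrings is a hypothetical name, and it only handles two simple patterns (a numeric String.fromCharCode call and adjacent double-quoted literals joined by +), so it is an illustration rather than a general deobfuscator.

```javascript
// The obfuscated construction described above: empty string + "f" + char 117 + "n".
const obfuscated = "" + "f" + String.fromCharCode(117) + "n";

// Fold simple constant expressions in source text into readable literals.
function simplifyStrings(source) {
  // Step 1: String.fromCharCode(<number>) becomes the quoted character itself.
  let out = source.replace(/String\.fromCharCode\((\d+)\)/g,
    (match, code) => JSON.stringify(String.fromCharCode(Number(code))));
  // Step 2: merge neighbouring string literals joined by + until none remain.
  const adjacent = /"((?:[^"\\]|\\.)*)"\s*\+\s*"((?:[^"\\]|\\.)*)"/;
  while (adjacent.test(out)) {
    out = out.replace(adjacent, '"$1$2"');
  }
  return out;
}

const line = 'var s = "" + "f" + String.fromCharCode(117) + "n";';
console.log(obfuscated);
console.log(simplifyStrings(line));
```

Arithmetic that hides a number would need a similar folding step, which this sketch leaves out.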
132 00:13:04,610 --> 00:13:14,160 Now, this is the example we saw earlier, and you can see here that on all these lines here, 133 00:13:14,200 --> 00:13:19,860 and so on, a replace function is being called. 134 00:13:20,480 --> 00:13:30,560 So what it is doing is taking this whole string here and, wherever it finds the junk substring, 135 00:13:30,890 --> 00:13:32,510 it removes that substring. 136 00:13:33,860 --> 00:13:42,110 Now, if we were to replace all these unnecessary operations, we would end up with something like this. 137 00:13:42,830 --> 00:13:48,830 So this is what you get after removing all the data obfuscation, and you can see that it is much easier 138 00:13:48,830 --> 00:13:49,340 to read. 139 00:13:50,420 --> 00:13:55,220 Another method for confusing the analyst is to use substitution. 140 00:13:55,580 --> 00:14:02,360 Substitution is where you modify the variable names to random names, meaningless names. 141 00:14:02,360 --> 00:14:08,300 For example, on the left you see that this variable is called password. From reading the name of the variable, 142 00:14:08,300 --> 00:14:12,910 we can clearly understand that its purpose is to store a password. 143 00:14:13,250 --> 00:14:18,750 But on the right it has been replaced with a random string, which has no meaning at all. 144 00:14:19,430 --> 00:14:23,850 So the purpose of this is to obfuscate, to make it difficult to understand. 145 00:14:24,290 --> 00:14:32,300 Now, in order to overcome this kind of substitution obfuscation, we need to analyze to find the 146 00:14:32,300 --> 00:14:33,580 meaning of the variable. 147 00:14:33,860 --> 00:14:40,880 And once we have found the meaning of it, we should then go on to search for all occurrences of that variable and 148 00:14:40,880 --> 00:14:43,250 replace them with something meaningful. 149 00:14:44,150 --> 00:14:48,830 So let's take a look at how this is used in practice.
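The search-and-replace step for a substituted variable name can be sketched like this. The junk name xq9Zk, the sample source and the helper renameIdentifier are all invented for illustration. The match is whole-identifier only, but it is still text based, so it would also rename the same token inside a string literal or comment.

```javascript
// Replace every whole-identifier occurrence of oldName with newName.
function renameIdentifier(source, oldName, newName) {
  // Escape `$` so the old name can be used safely inside a regular expression.
  const escaped = oldName.replace(/[$]/g, "\\$&");
  const wholeName = new RegExp("(?<![\\w$])" + escaped + "(?![\\w$])", "g");
  // A function replacement avoids special handling of `$` in newName.
  return source.replace(wholeName, () => newName);
}

// Hypothetical obfuscated line: xq9Zk plainly holds whatever the user typed.
const obfuscatedSource =
  'var xq9Zk = prompt("Enter password"); if (xq9Zk.length < 8) { alert("Too short"); }';

const readable = renameIdentifier(obfuscatedSource, "xq9Zk", "password");
console.log(readable);
```

Renaming one variable at a time, and rereading the code after each rename, keeps mistakes easy to spot.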
150 00:14:51,110 --> 00:15:00,500 Coming back to our example over here, we can see that if we were to replace all the meaningless variable 151 00:15:00,500 --> 00:15:05,100 names with meaningful names, our code now becomes much easier to read. 152 00:15:05,840 --> 00:15:13,220 As you can see, this name has no meaning at all, but by replacing it with meaningful words, 153 00:15:13,520 --> 00:15:17,390 we can see that now things begin to make sense. 154 00:15:18,230 --> 00:15:20,900 The same thing with the second one here. 155 00:15:21,140 --> 00:15:25,720 Once you replace it with a meaningful variable name, 156 00:15:26,030 --> 00:15:27,450 it begins to make sense. 157 00:15:29,710 --> 00:15:36,880 So the same thing applies to all the other meaningless variables. We just have to replace them with 158 00:15:36,890 --> 00:15:38,160 meaningful names. 159 00:15:40,270 --> 00:15:48,310 So here are some tips to help you with the deobfuscation process. First, work one layer at a time. Although there are 160 00:15:48,310 --> 00:15:55,000 four possible ways in which a file can be obfuscated, only select one at a time 161 00:15:55,000 --> 00:16:02,390 and tackle it. Second, try to reformat the code so that it has proper indentation and spaces, so that 162 00:16:02,390 --> 00:16:11,080 it is easier to read and understand. Third, we can try to execute the malicious code in a sandbox, 163 00:16:11,350 --> 00:16:18,730 that is, a virtual machine, and then once it's executed, the code will deobfuscate itself, and 164 00:16:18,730 --> 00:16:26,140 then we will be able to dump the deobfuscated version from memory and examine the deobfuscated code. 165 00:16:26,680 --> 00:16:33,400 So this is much easier than trying to manually deobfuscate the various pieces of code. 166 00:16:34,210 --> 00:16:41,170 One example of a tool that can help us do that is peepdf, which we have seen before in a previous 167 00:16:41,470 --> 00:16:42,070 lesson.
168 00:16:42,790 --> 00:16:47,230 And peepdf can beautify and execute JavaScript in a sandbox. 169 00:16:47,590 --> 00:16:50,920 And this tool, as you have seen already, is available to us, 170 00:16:51,070 --> 00:16:52,390 and we have used it before. 171 00:16:52,930 --> 00:16:59,290 And it has some built-in commands that we can use in order to perform code beautification and 172 00:16:59,290 --> 00:17:00,580 code analysis. 173 00:17:02,760 --> 00:17:11,130 Another useful tool is one based on SpiderMonkey, the Mozilla implementation of JavaScript. 174 00:17:11,810 --> 00:17:19,380 So its author has taken this JavaScript implementation and created a useful tool that can do the 175 00:17:19,380 --> 00:17:20,640 deobfuscation for us. 176 00:17:21,750 --> 00:17:31,500 Some of the JavaScript functions like eval, document.write and window.navigate are used by scripts 177 00:17:31,860 --> 00:17:34,720 when they finish with the deobfuscation process. 178 00:17:35,040 --> 00:17:41,970 And so this tool takes advantage of these functions in order to be able to dump out the deobfuscated code 179 00:17:42,120 --> 00:17:43,930 for further analysis. 180 00:17:45,420 --> 00:17:47,200 So that's all for this lesson. 181 00:17:47,490 --> 00:17:48,720 Thank you for watching.
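The idea of catching the code at the moment the script hands it to eval can be sketched in a few lines. This is not how the tool mentioned above is implemented; it is a minimal illustration, and captureEvalArguments is a hypothetical helper. It replaces eval with a logger for one script, but everything else in that script still runs with full privileges, so it should only ever be tried inside an isolated virtual machine.

```javascript
// Run scriptSource with `eval` replaced by a function that records its
// argument instead of executing it, and return everything that was recorded.
function captureEvalArguments(scriptSource) {
  const captured = [];
  const fakeEval = (code) => {
    captured.push(String(code)); // keep the deobfuscated payload for analysis
    return undefined;            // do not execute the payload
  };
  // Inside the generated function, the parameter named `eval` shadows the real one.
  const runner = new Function("eval", scriptSource);
  runner(fakeEval);
  return captured;
}

// Harmless stand-in for an obfuscated script: it builds a string, then evals it.
const sampleScript = 'var p = "aler" + "t(1)"; eval(p);';

console.log(captureEvalArguments(sampleScript));
```

The same shadowing approach could be extended to stand-ins for document.write or window.navigate by passing fake document and window objects as extra parameters.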