Yet another PDF/XDP Malware

Today we’re going to analyze yet another sample of PDF containing an XDP form. The difference between this sample and the one of my previous post is that this one will be less about JavaScript deobfuscation and more about anti-analysis tricks.

If you want to follow hands-on the analysis, this is the link to the malware sample (password: infected29A). Also make sure to update Profiler to the current 2.6.2 version!

MD5: 4D686BCEE50538C969647CF8BB6601F6
SHA-256: 01F13FE4E597F832E8EDA90451B189CDAFFF80F8F26DEE31F6677D894688B370

Let’s open the Zip archive. The first thing we notice is that the file has been incorrectly identified as CFBF.

That’s because the beginning of the file contains a CFBF signature:

If we were to open the file directly from the file-system, we would be prompted to choose the correct file format:

But as such is not the case, we simply go to the decompressed stream in the Zip archive (or to the CFBF document, it doesn’t matter), position the cursor to the start of the file and press Ctrl+E.

We select the PDF format and then open the newly created embedded file in the hierarchy.

What we’ll notice by looking at the summary is that a stream failed to decompress, because it hit the memory limit. A tool-tip informs us that we can tweak this limit from the settings. So let’s click on “Go to report” in the tool-bar.

This will bring us to the main window. From there we can go to the settings and increase the limit.

In our case, 100 MBs are enough, since the stream which failed to decompress is approximately 90 MBs. Let’s click on “Save settings”, click on “Computer Scan” and then back to our file.

Let’s now repeat the procedure to load the embedded file as PDF and this time we won’t get the warning:

Just for the sake of cleanliness, we can also select the mistakenly identified CFBF embedded file and press “Delete”, in order to remove it from the analysis.

We are informed by the summary that the PDF contains an interactive form and, in fact, we can already see the XDP as child of the PDF.

We could directly proceed with the analysis of the XFA, but let’s just step back a second to analyze a trick this malware uses to break automatic analysis. The XFA is contained in the object 1.0 of the PDF.

Let’s go with the cursor to the stream part of the object (the one in turquoise), then let’s open the context menu and click on “Ranges->Select continuous range” (alternatively Ctrl+Alt+A). This will select the stream data of the object. Let’s now press Ctrl+T to invoke the filters and apply the unpack/zlib filter. If we now click on “Preview”, we’ll notice that an error is reported.

The stream is still decompressed, but it also reports an error. This is one of the trick this malware uses to break automatic analysis: the ZLib stream is corrupted at the very end.

Let’s now open the XFA. Immediately we can see another simple trick to fool identification of the XDP: a newline byte at the start.

Given the huge size of the XDP it’s not wise to open it in the text editor, but we can look at the extracted JavaScript from the summary.

Here are the various parts which make up the JavaScript code:

The first part contains the information needed to construct ROP for the various versions of Adobe Reader. In the last part we can see that the JavaScript code sprays the heap. So probably they rely on a huge image embedded in the XDP (which is actually the reason why the XDP is so big) to trigger the exploit.

The field name is aptly named “ImageCrash”.

Let’s go back to the shellcode part and let’s analyze that. I’m talking about the part of code which starts with:

We could of course copy that part of a text view, remove the \u, then convert to bytes and then apply a filter to reorder them, as in JavaScript the words are in big-endian. But we can do it even more elegantly and make our shellcode appears as an embedded file. So let’s select the byte array from the hex editor:

Let’s now press Ctrl+E and click on the “Filters” button.

What we want to do is to first remove the “\u” escape. So we add the filter misc/replace and specify “\u” as in and nothing as out (we leave ascii mode as default). Now we have stripped the data from the escape characters. Now we need to convert it from ascii hex to bytes. So we add the convert/from_hex filter. The last step, as already mentioned, is that we need to switch the byte order in the words. To do that, we’ll use the lua/custom filter. I only modified slightly the default script:

If you want to avoid this part, you can simply import the filters I created:

By opening the embedded shellcode file, Profiler will have automatically detected the shellcode:

By looking at the hex-view we can already guess where the shellcode is going to download its payload to execute from:

But let’s analyze it anyway. Let’s press Ctrl+A and then Ctrl+R. Let’s execute the action “Debug->Shellcode to executable” to debug the shellcode with a debugger like OllyDbg.

Here’s the (very simple) analysis:

You can also download the Profiler project with the complete analysis already performed (same password: infected29A). Please notice, you’ll be prompted twice for the password: once for the project and once for the Zip archive.

I hope you enjoyed the read!

Malware in a MSG

Even though sending malware via zipped attachments in spam emails is nothing new and had been around for eons but many people are still puzzled at how it works. Thus, I will go through with you on how to do it with Profiler. I will try to fill in required information about where to look out for information and how decode some of the information.

Firstly, we are going to learn how are a bit about the .msg file format and how is it used to store a message object in a .msg file, which then can be shared between clients or message stores that use the file system.

From an investigator’s point of view, you should always analyze the .msg file without installing Outlook. In order to analyze the .msg file without Outlook, we can read more about the file format from:

  • http://download.microsoft.com/download/5/D/D/5DD33FDF-91F5-496D-9884-0A0B0EE698BB/[MS-OXMSG].pdf
  • https://msdn.microsoft.com/en-us/library/cc463912(v=exchg.80).aspx
  • http://www.fileformat.info/format/outlookmsg/

The purpose of this post is to give a better technical understanding of how attackers makes use spam emails to spread malware.

[ Sample used in the analysis ]
MD5: BC1DF9947B9CF27B2A826E3B68C897B4
SHA256: C7AC39F8240268099EC49A3A4FF76174A50F1906BBB40AE6F88425AF303A44BB
Sample: Sample

[ Part 1 : Getting Started ]
For those who want to follow along, this is a link to the .msg file. Do note, this is a MALICIOUS file, so please do the analysis in a “safe” environment. The password to the attachment is “infected29A

Now, let’s start getting our hands dirty…and open the suspicious .msg file.

The msg file is already flagged by Profiler, as it contains some suspicious features.
Each “__substg” contains valuable pieces of information. The first four of the eight digits at the end tells you what kind of information it is (Property). The last four digits tells you the type (binary, ascii, Unicode, etc)

  • 0x007d: Message header
  • 0x0C1A: Sender name
  • 0x0C1F: Sender email
  • 0x0E1D: Subject (normalized)
  • 0x1000: Message body

[ Part 2 : Email investigation ]
If we are interested in email investigation, let’s check out the following file, “__substg1.0_0C1F001F”.

As we can see below, the sender’s email address is “QuinnMuriel64997@haarboutique-np.nl
But is it really sent from Netherlands?

Well, let’s check out the message header located in “__substg1.0_007D001F” to verify that.

If we were to do through the message header, do a whois on “haarboutique-np.nl” and check out the MX server. We can confirm that the sender is spoofing email as well.

From the message header, we can conclude that the sender sent the email from “115.78.135.85” as shown in the image and the extracted message header as shown below.

    Received: from [115.78.135.85] ([115.78.135.85])
    by mta02.dkim.jp (8.14.4/8.13.8) with ESMTP id u44L8X41032666
    for <info@dkim.jp>; Thu, 5 May 2016 06:08:35 +0900

Whois information showed that IP address where this spam email is sent from is from Vietnam.
But it doesn’t mean that the attacker is from Vietnam. Anyone in the world can buy web hosting services in Vietnam. This is just to let you know that the attacker is definitely not sending from “haarboutique-np.nl

[ Part 3 : Email investigation ]
Using this information opening the “__substg1.0_0E1D001F” file and we can see the subject, “Re:

Hmmmm…this doesn’t look any useful at all. Let’s try opening the file, “__substg1.0_1000001F”, containing the “subject body” instead.

      “Hi, info

Please find attached document you requested. The attached file is your account balance and transactions history.

Regards,
Muriel Quinn”

Awesome, Muriel Quinn is sending me my account balance and transactions history which I may or may not have requested at all. Awesome, he is also attaching the files to the email just for me. This is definitely suspicious to me.

[ Part 4 : Email attachment ]
Now that we are interested in the attachments, let’s look at “Root Entry/__attach_version1.0_#00000000” and refer to the specifications again.

  • //Attachments (37xx):
  • 0x3701: Attachment data
  • 0x3703: Attach extension
  • 0x3704: Attach filename
  • 0x3707: Attach long filenm
  • 0x370E: Attach mime tag

If we were to look at “__substg1.0_3704001F”, we will see that the filename of the attachment is called “transa~1.zip” and the display name “__substg1.0_3001001F” of the attachment is called “transactions-625.zip”.

Now let’s look at the actual data located within “__substg1.0_37010102” as shown below.

Now, let’s press “Ctrl+A” to select the entire contents. Then copy it into a new file as shown in the image below.

But as we can see on the left, Profiler can identify what is inside the attachment. There are 3 Javascript files inside the .zip file.

Now let’s fire up “New Text View” and copy the contents of “transactions 774219.js” as shown below.

Press “Ctrl+R” and select “Beautify JavaScript” and Profiler will “JSBeautify” it for you. But let’s add some “Colouring” to it by doing “Right-click -> Language -> JavaScript” as shown below.

We can use Profiler to debug the JavaScript but I shall leave that as an exercise for the readers.
The decoded JavaScript will look something like this.

As we can see from the image above, it is downloading from “http://infograffo[.]com[.]br/lkdd9ikfds” and saving it as “ew3FbUdAB.exe” in the victims’ TEMP directory.

We won’t be going through on reversing the malware.

In the meantime, we hope you enjoyed reading this and would be happy to receive your feedback!

PDF/XDP Malware Reversing

Recently version 2.6 of Profiler has been released and among the improvements support for XDP has been introduced. For those of you who are unfamiliar with XPD, here’s the Wikipedia description:

“XML Data Package (XDP) is an XML file format created by Adobe Systems in 2003. It is intended to be an XML-based companion to PDF. It allows PDF content and/or Adobe XML Forms Architecture (XFA) resources to be packaged within an XML container.

XDP is XML 1.0 compliant. The XDP may be a standalone document or it may in turn be carried inside a PDF document.

XDP provides a mechanism for packaging form components within a surrounding XML container. An XDP can also package a PDF file, along with XML form and template data. When the XFA (XML Forms Architecture) grammars used for an XFA form are moved from one application to another, they must be packaged as an XML Data Package.”

So I’ll use the occasion to show the reversing of a nice PDF with all the goodies. Let’s open the suspicious PDF.

The PDF is already heavily flagged by Profiler, as it contains many suspicious features.

If we take a look, just out of curiosity, at the object 8 of the PDF we will notice that the XDP data contains a bogus endstream keyword to fool the parsers of security solutions.

Profiler handles this correctly, so we don’t have to do anything, just worth mentioning.

Let’s take a look at the raw XDP data.

As you can see, it is completely unreadable because of the XML escaped characters. Even this is not really important for us, since the XML parser of Profiler handles this automatically, again just worth mentioning.

So let’s open directly the embedded XDP child and we can see a readable and nicely indented XML.

We can see that the XML contains JavaScript code, but Profiler already warns us of this. So let’s just click on the warning.

The code isn’t readable. So let’s select the JavaScript portion and then press Ctrl+R->Beautify JavaScript.

Much better, isn’t it?

The code is quite easy to understand although it’s obfuscated. It takes a value straight from the XDP, processes it and then calls eval on it.

This is the value it takes:

What we want is the result of the processing, before eval is called. So what I did is to modify slightly the JavaScript code like this:

I didn’t paste now the entire value in here as it was way too big, but I did so in the code edit:

At this point, we can just press Ctrl+R->Debug/Execute JavaScript and get the result of the execution.

We will get the following code:

What it does is basically to spray the heap using an array. It changes the payload based on the version of Adobe Reader. The version is retrieved by calling the _l5 function.

Now we could just examine the _l1 or _l2 payloads directly, but just to make sure I let the code generate a spray portion. So I changed the code accordingly and avoided to actually spray a lot of data.

We can run this script in the JavaScript debugger (Ctrl+R->Debug JavaScript).

The final print will give us the payload in memory. We can copy the just the initial part, avoiding the padding. Let’s paste the string into a text editor in Profiler and then Ctrl+R->Hex string to bytes.

If we look at the payload, we can see that the beginning (the marked portion) looks like ROP code. So in order to avoid looking for the gadgets in memory, let’s skip the ROP as it most likely is only going to jump to the actual shellcode. Let’s assume that is the case and thus focus on the data which follows.

We can see a web address at the end of the data. So we could just assume that the shellcode downloads an executable and runs it. But just for the sake of completeness, let’s analyze it.

We can of course disassemble the shellcode by applying a filter to it (Ctrl+T->x86 disasm). But what we’ll do is to use a debugger via Ctrl+R->Shellcode to execute. This way we can quickly step through what it does.

Here’s the commented code:

So yes, in the end it just downloads the file from the address we’ve seen and tries to execute it, then tries to register it as a COM object. Some AV-evasion techniques are also present.

Cheers!

Analysis of CVE-2013-3906 (TIFF)

This is just a demonstration of malware analysis with Profiler, I haven’t looked into previous literature on the topic. So, perhaps there’s nothing new here, but I hope it will be of help for our users.

We open the main DOCX file. The first embedded file we analyze is the TIFF image, which stands out because it’s the only image.

TIFF directories

Among the directories we have two which specify an embedded JPEG. So we just select the data area according to the offset and length value and load it as an embedded JPEG (it’s a good idea to do this automatically in the future: bear with us, TIFF support has been just introduced).

JPEG data

JPEG embedded

Now we can inspect the embedded JPEG.

JPEG meta

The JPEG looks strange. Even by looking at the format fields, it looks malformed. Given certain anomalies we can suppose that the TIFF might be used as some sort of vector for something. Let’s keep that in mind and let’s take a look at other files. Two of them stand out: they have a .bin extension and are CFBFs (same format as DOC files).

CFBF Foreign

We immediately notice various problems. Foreign data is abundant and the file looks malformed. More alarmingly lots of shellcode warnings are reported. Let’s look at a random one (they are all the same basically).

CFBF Shellcode

Let’s start with the analysis of this shellcode.

The start of the shellcode is just a decryption loop which xors every byte following the last call with 0xEE. So, that’s exactly what we’re going to do. We select 0x27E bytes after the last call, then we press Ctrl+T to open the filters view.

Decrypt shellcode

We use two filters: ‘misc/basic‘ to xor the bytes and ‘disasm/x86‘ to disassemble them.

At this point it’s useful to load the shellcode into a debugger. So let’s select the decrypted shellcode and press Ctrl+R and activate the ‘Shellcode to executable‘ action:

Shellcode to executable

The options dialog will pop up.

Shellcode to executable options

After pressing OK, the debugger will be executed. Depending on your debugger, put a break-point on the beginning of the shellcode and run the code.

The start of the shellcode resolves API addresses:

This code basically retrieves the base of ‘kernel32.dll‘ and then resolves API address by calling 0x120d for 0x12 times (which is the amount of APIs to be resolved).

This is the function which resolves API names hashes to addresses:

The API hashes are stored at the end of the shellcode. Here are the hashes and the resolved APIs:

We can actually close the debugger now. Once the API names are known, it’s easy to continue the analysis statically.

The next step in the shellcode is to suspend all other threads in the current process:

It then looks for the handle in the current process for the main DOCX file:

It reads the entire file (minus the first 0x24 bytes) into memory and looks for a certain signature:

It creates a new file in the temp directory:

Now comes the juicy part. It reads few parameters after the matched signature, including a size parameter, and uses these parameters to decrypt a region of data:

Let’s select in the hex view the same region (we find the start address by looking for the signature just as the shellcode does).

Encrypted payload

Then we press Ctrl+E to add an embedded file and we click on filters. Since the decryption routine is too complex to be expressed through a simple filter, we have finally a good reason to use a Lua filter, in particular the ‘lua/custom‘ one.

Lua custom filter

As you can see, a sample stub is already provided when clicking on the script value in the options. We have to modify it only slightly for our purposes.

By clicking on preview, we can see that the file start with a typical MZ header. Thus, we can specify a PE file when loading the payload.

PE payload

By inspecing the PE file we can see that it’s a DLL among other things. Now, before analyzing the DLL, let’s finish the shellcode analysis.

The payload is now decrypted. So the shellcode just writes it to the temporary file, loads the DLL, unloads it and then terminates the current thread.

That’s it. Now we can go back to the payload dll.

One of the resources embedded in the PE file is a DOCX.

Payload DOCX

Probably a sane document to re-open in a reader to avoid making the user suspicious of a document which didn’t open.

There’s another suspicious embedded resource, but before making hypotheisis about it, let’s do a complete analysis of the DLL, (don’t worry: it’s very small). For this purpose I used IDA Pro and the decompiler.

It starts with the DllMain:

The called function:

It basically retrieves both embedded resources and then performs some stuff with both of them. First we take a look at what it does with the DOCX resource:

As hypothized previously, it just launches a new instance of the current process with the sane DOCX this time.

Now let’s take a look at what it does with the resource we haven’t yet identified:

It dumps it to file (with a vbe extension) and then executes it with ‘cscript.exe‘. I actually didn’t know about VBE files: they’re just encoded VBS files. To decode it I used an online tool by GreyMagic.

VBE resource

The VBS code is quite easy to read and quite boring. The only interesting part in my opinion is the update mechanism.

It basically looks for certain comments in two YouTube pages with a regex. The url captured by the regex is then used to perform the update.

It sends back to the URL various information about the current machine:

As a final anecdote: the main part of the script just contains an enormous byte array which is dumped to file. The file is a PNG with a RAR file appended to it which in turn contains a VBE encoding executable. The script itself doesn’t seem to make use of this, so no idea why it’s there.

Appended RAR

That’s all. While it may seem a lot of work, writing the article took much longer than performing the actual analysis (~30 minutes, including screenshots).

We hope it may be of help or interest to someone.

Disclosure: Creating undetected malware for OS X

While this PoC is about static analysis, it’s very different than applying a packer to a malware. OS X uses an internal mechanism to load encrypted Apple executables and we’re going to exploit the same mechanism to defeat current anti-malware solutions.

OS X implements two encryption systems for its executables (Mach-O). The first one is implemented through the LC_ENCRYPTION_INFO loader command. Here’s the code which handles this command:

This code calls the set_code_unprotect function which sets up the decryption through text_crypter_create:

The text_crypter_create function is actually a function pointer registered through the text_crypter_create_hook_set kernel API. While this system can allow for external components to register themselves and handle decryption requests, we couldn’t see it in use on current versions of OS X.

The second encryption mechanism which is actually being used internally by Apple doesn’t require a loader command. Instead, it signals encrypted segments through a flag.

Protected flag

The ‘PROTECTED‘ flag is checked while loading a segment in the load_segment function:

The unprotect_segment function sets up the range to be decrypted, the decryption function and method. It then calls vm_map_apple_protected.

Two things about the code above. The first 3 pages (0x3000) of a Mach-O can’t be encrypted/decrypted. And, as can be noticed, the decryption function is dsmos_page_transform.

Just like text_crypter_create even dsmos_page_transform is a function pointer which is set through the dsmos_page_transform_hook kernel API. This API is called by the kernel extension “Dont Steal Mac OS X.kext“, allowing for the decryption logic to be contained outside of the kernel in a private kernel extension by Apple.

Apple uses this technology to encrypt some of its own core components like “Finder.app” or “Dock.app”. On current OS X systems this mechanism doesn’t provide much of a protection against reverse engineering in the sense that attaching a debugger and dumping the memory is sufficient to retrieve the decrypted executable.

However, this mechanism can be abused by encrypting malware which will no longer be detected by the static analysis technologies of current security solutions.

To demonstrate this claim we took a known OS X malware:

Scan before encryption

Since this is our public disclosure, we will say that the detection rate stood at about 20-25.

And encrypted it:

Scan after encryption

After encryption has been applied, the malware is no longer detected by scanners at VirusTotal. The problem is that OS X has no problem in loading and executing the encrypted malware.

The difference compared to a packer is that the decryption code is not present in the executable itself and so the static analysis engine can’t recognize a stub or base itself on other data present in the executable, since all segments can be encrypted. Thus, the scan engine also isn’t able to execute the encrypted code in its own virtual machine for a more dynamic analysis.

Two other important things about the encryption system is that the private key is the same and is shared across different versions of OS X. And it’s not a chained encryption either: but per-page. Which means that changing data in the first encrypted page doesn’t affect the second encrypted page and so on.

Our flagship product, Cerbero Profiler, which is an interactive file analysis infrastructure, is able to decrypt protected executables. To dump an unprotected copy of the Mach-O just perform a “Select all” (Ctrl+A) in the main hex view and then click on “Copy into new file” like in the screen-shot below.

Mach-O decryption

The saved file can be executed on OS X or inspected with other tools.

Decrypted Mach-O

Of course, the decryption can be achieved programmatically through our Python SDK as well. Just load the Mach-O file, initialize it (ProcessLoadCommands) and save to disk the stream returned by the GetStream.

A solution to mitigate this problem could be one of the following:

  • Implement the decryption mechanism like we did.
  • Check the presence of encrypted segments. If they are present, trust only executables with a valid code signature issued by Apple.
  • 3. Check the presence of encrypted segments. If they are present, trust only executables whose cryptographic hash matches a trusted one.

This kind of internal protection system should be avoided in an operating system, because it can be abused.

After we shared our internal report, VirusBarrier Team at Intego sent us the following previous research about Apple Binary Protection:

http://osxbook.com/book/bonus/chapter7/binaryprotection/
http://osxbook.com/book/bonus/chapter7/tpmdrmmyth/
https://github.com/AlanQuatermain/appencryptor

The research talks about the old implementation of the binary protection. The current page transform hook looks like this:

VirusBarrier Team also reported the following code by Steve Nygard in his class-dump utility:

https://bitbucket.org/nygard/class-dump/commits/5908ac605b5dfe9bfe2a50edbc0fbd7ab16fd09c

This is the correct decryption code. In fact, the kernel extension by Apple, just as in the code above provided by Steve Nygard, uses the OpenSSL implementation of Blowfish.

We didn’t know about Nygard’s code, so we did our own research about the topic and applied it to malware. We would like to thank VirusBarrier Team at Intego for its cooperation and quick addressing of the issue. At the time of writing we’re not aware of any security solution for OS X, apart VirusBarrier, which isn’t tricked by this technique. We even tested some of the most important security solutions individually on a local machine.

The current 0.9.9 version of Cerbero Profiler already implements the decryption of Mach-Os, even though it’s not explicitly written in the changelist.

We didn’t implement the old decryption method, because it didn’t make much sense in our case and we’re not aware of a clean way to automatically establish whether the file is old and therefore uses said encryption.

These two claims need a clarification. If we take a look at Nygard’s code, we can see a check to establish the encryption method used:

It checks the first dword in the encrypted segment (after the initial three non-encrypted pages) to decide which decryption algorithm should be used. This logic has a problem, because it assumes that the first encrypted block is full of 0s, so that when encrypted with AES it produces a certain magic and when encrypted with Blowfish another one. This logic fails in the case the first block contains values other than 0. In fact, some samples we encrypted didn’t produce a magic for this exact reason.

Also, current versions of OS X don’t rely on a magic check and don’t support AES encryption. As we can see from the code displayed at the beginning of the article, the kernel doesn’t read the magic dword and just sets the Blowfish magic value as a constant:

So while checking the magic is useful for normal cases, security solutions can’t rely on it or else they can be easily tricked into using the wrong decryption algorithm.

If your organization wishes to be informed by us in the future before public disclosure about findings & issues, it can contact us and become a technical partner for free.

Creating undetected malware for OS X

We have discovered a way to defeat current anti-malware solutions. We will publicly disclose the full details of the issue in a few weeks.

In the meantime, we’re more than happy to confidentially disclose the information with interested organizations (either security vendors or known companies which could benefit from it). Just send an email to: info@icerbero.com