JBIG2 Encoded Malware in PDFs

The upcoming version 2.7 of Profiler adds support for JBIG2 encoding inside PDFs. Although JBIG2 is intended only for images, it can be used to encode arbitrary data. Quoting the PDF documentation:

The JBIG2Decode filter (PDF 1.4) decodes monochrome (1 bit per pixel) image data that has been encoded using JBIG2 encoding. JBIG stands for the Joint Bi-Level Image Experts Group, a group within the International Organization for Standardization (ISO) that developed the format. JBIG2 is the second version of a standard originally released as JBIG1.

JBIG2 encoding, which provides for both lossy and lossless compression, is useful only for monochrome images, not for color images, grayscale images, or general data. The algorithms used by the encoder, and the details of the format, are not described here. A working draft of the JBIG2 specification can be found through the Web site for the JBIG and JPEG (Joint Photographic Experts Group) committees at http://www.jpeg.org.

Here’s a PDF malware trying to conceal its XFA form by encoding it via JBIG2:

And the decoded content:

While this is in no way common in PDF malware, it’s an effective trick to prevent automatic and manual analysis, since JBIG2 is seldom supported by security tools.
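A quick way to triage a PDF for this trick is to look for the JBIG2Decode filter name in the raw file. The Python sketch below is only an illustration and not how Profiler works internally: the helper names are made up, and it assumes the object dictionaries are not hidden inside compressed object streams. It does account for the #xx hex escapes that PDF names may use.

```python
import re

# PDF names may encode any character as '#' followed by two hex digits,
# so "/JBIG2Decode" can also be written as e.g. "/JBIG2#44ecode".
_NAME_RE = re.compile(rb"/([^\s()<>\[\]{}/%]+)")
_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_pdf_name(raw: bytes) -> bytes:
    """Resolve #xx escapes in a PDF name (given without the leading slash)."""
    return _ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def find_jbig2_filters(pdf_data: bytes) -> list:
    """Return the offsets of every name that normalizes to JBIG2Decode."""
    return [m.start() for m in _NAME_RE.finditer(pdf_data)
            if normalize_pdf_name(m.group(1)) == b"JBIG2Decode"]
```

Since the scan runs over raw bytes, a hit inside binary stream data is possible, so every offset should be checked by hand.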

Yet another PDF/XDP Malware

Today we’re going to analyze yet another sample of a PDF containing an XDP form. The difference between this sample and the one in my previous post is that this one will be less about JavaScript deobfuscation and more about anti-analysis tricks.

If you want to follow the analysis hands-on, this is the link to the malware sample (password: infected29A). Also make sure to update Profiler to the current 2.6.2 version!

MD5: 4D686BCEE50538C969647CF8BB6601F6
SHA-256: 01F13FE4E597F832E8EDA90451B189CDAFFF80F8F26DEE31F6677D894688B370

Let’s open the Zip archive. The first thing we notice is that the file has been incorrectly identified as CFBF.

That’s because the beginning of the file contains a CFBF signature:
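The misidentification is easy to reproduce outside of Profiler. The Python sketch below (function and constant names are illustrative) checks for the CFBF magic at offset 0 and, separately, for a %PDF- header anywhere in the first 1024 bytes, which is the window Adobe Reader is known to tolerate. A file matching both is a good candidate for this kind of format confusion.

```python
CFBF_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary Format
PDF_HEADER = b"%PDF-"
PDF_HEADER_WINDOW = 1024  # Adobe Reader accepts the header within this range


def sniff_formats(data: bytes) -> list:
    """Return every format whose signature is found in the given file data."""
    formats = []
    if data.startswith(CFBF_MAGIC):
        formats.append("CFBF")
    if PDF_HEADER in data[:PDF_HEADER_WINDOW]:
        formats.append("PDF")
    return formats
```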

If we were to open the file directly from the file-system, we would be prompted to choose the correct file format:

But since that is not the case here, we simply go to the decompressed stream in the Zip archive (or to the CFBF document, it doesn’t matter), position the cursor at the start of the file and press Ctrl+E.

We select the PDF format and then open the newly created embedded file in the hierarchy.

What we’ll notice by looking at the summary is that a stream failed to decompress, because it hit the memory limit. A tool-tip informs us that we can tweak this limit from the settings. So let’s click on “Go to report” in the tool-bar.

This will bring us to the main window. From there we can go to the settings and increase the limit.

In our case, 100 MB is enough, since the stream which failed to decompress is approximately 90 MB. Let’s click on “Save settings”, click on “Computer Scan” and then go back to our file.
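The memory limit exists because a small compressed stream can expand enormously. As a rough illustration of the idea (not Profiler’s implementation), here is how a capped inflate can be written in Python with the standard zlib module. The function name and the default limit are assumptions made for the example.

```python
import zlib


def inflate_with_limit(data: bytes, limit: int = 100 * 1024 * 1024):
    """Decompress zlib data, giving up once the output exceeds `limit` bytes.

    Returns (output, within_limit); `within_limit` is False when the cap was hit.
    """
    d = zlib.decompressobj()
    # max_length caps how much output a single call may produce;
    # asking for one extra byte tells us whether the limit was exceeded.
    out = d.decompress(data, limit + 1)
    if len(out) > limit:
        return out[:limit], False
    return out, True
```

With a limit below the real size, the caller gets a truncated buffer and a flag, which is roughly the situation the summary warning describes.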

Let’s now repeat the procedure to load the embedded file as PDF and this time we won’t get the warning:

Just for the sake of cleanliness, we can also select the mistakenly identified CFBF embedded file and press “Delete”, in order to remove it from the analysis.

We are informed by the summary that the PDF contains an interactive form and, in fact, we can already see the XDP as child of the PDF.

We could directly proceed with the analysis of the XFA, but let’s just step back a second to analyze a trick this malware uses to break automatic analysis. The XFA is contained in the object 1.0 of the PDF.

Let’s go with the cursor to the stream part of the object (the one in turquoise), then let’s open the context menu and click on “Ranges->Select continuous range” (alternatively Ctrl+Alt+A). This will select the stream data of the object. Let’s now press Ctrl+T to invoke the filters and apply the unpack/zlib filter. If we now click on “Preview”, we’ll notice that an error is reported.

The stream is still decompressed, but an error is reported as well. This is one of the tricks this malware uses to break automatic analysis: the ZLib stream is corrupted at the very end.
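To see why this breaks naive tools, consider that a one-shot decompression call typically throws away everything when the stream ends badly. A tolerant approach feeds the data in small pieces and keeps whatever was produced before the error. The following Python sketch shows the idea with the standard zlib module; the function name is hypothetical and this is not necessarily how Profiler’s unpack/zlib filter works.

```python
import zlib
from typing import Optional, Tuple


def inflate_tolerant(data: bytes, chunk_size: int = 1) -> Tuple[bytes, Optional[str]]:
    """Decompress as much as possible; return (output, error message or None)."""
    d = zlib.decompressobj()
    out = bytearray()
    for i in range(0, len(data), chunk_size):
        try:
            out += d.decompress(data[i:i + chunk_size])
        except zlib.error as exc:
            # Keep what earlier chunks produced; only the failing chunk is lost.
            return bytes(out), str(exc)
        if d.eof:
            break
    if not d.eof:
        return bytes(out), "stream ended before the end-of-stream marker"
    return bytes(out), None
```

Feeding one byte at a time recovers the most data but is slow on a stream of this size. A larger chunk_size is faster, at the cost of losing the output of the chunk in which the error occurs.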

Let’s now open the XFA. Immediately we can see another simple trick to fool identification of the XDP: a newline byte at the start.
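A format check that expects the XML markup at offset 0 is defeated by this single byte. A more forgiving check, sketched here in Python with a hypothetical helper name, skips an optional UTF-8 byte order mark and leading whitespace before looking at the content. It assumes the conventional xdp namespace prefix, which a real document is not required to use.

```python
def looks_like_xdp(data: bytes) -> bool:
    """Heuristic XDP check that tolerates a UTF-8 BOM and leading whitespace."""
    head = data[:4096]
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    head = head.lstrip(b" \t\r\n")
    # Must start with markup and mention the (assumed) xdp root element.
    return head.startswith(b"<") and b"<xdp:xdp" in head
```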

Given the huge size of the XDP it’s not wise to open it in the text editor, but we can look at the extracted JavaScript from the summary.

Here are the various parts which make up the JavaScript code:

The first part contains the information needed to construct ROP for the various versions of Adobe Reader. In the last part we can see that the JavaScript code sprays the heap. So probably they rely on a huge image embedded in the XDP (which is actually the reason why the XDP is so big) to trigger the exploit.

The field is aptly named “ImageCrash”.

Let’s go back to the shellcode part and let’s analyze that. I’m talking about the part of code which starts with:

We could of course copy that part from a text view, remove the \u, convert the result to bytes and then apply a filter to reorder them, since in the JavaScript escapes the words are written in big-endian order. But we can do it even more elegantly and make our shellcode appear as an embedded file. So let’s select the byte array from the hex editor:

Let’s now press Ctrl+E and click on the “Filters” button.

What we want to do first is remove the “\u” escape. So we add the filter misc/replace and specify “\u” as in and nothing as out (we leave ascii mode as default). This strips the escape characters from the data. Next we need to convert it from ascii hex to bytes, so we add the convert/from_hex filter. The last step, as already mentioned, is to switch the byte order in the words. To do that, we’ll use the lua/custom filter. I only slightly modified the default script:
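For readers without Profiler at hand, the same three steps can be reproduced in a few lines of Python. This is just an equivalent sketch of the filter chain, not the Lua script used in the post, and the function name is made up.

```python
def unescape_unicode_shellcode(text: str) -> bytes:
    """Turn a JavaScript "\\uXXXX" string into the bytes it occupies in memory."""
    # Step 1 (misc/replace): drop the "\u" escape prefix.
    hex_digits = text.replace("\\u", "")
    # Step 2 (convert/from_hex): ASCII hex to raw bytes.
    raw = bytearray.fromhex(hex_digits)
    # Step 3 (custom filter): swap the two bytes of every 16-bit word,
    # because each escape is written high byte first.
    raw[0::2], raw[1::2] = raw[1::2], raw[0::2]
    return bytes(raw)
```

For example, the escape \u4142 becomes the bytes 0x42 0x41. Any stray character that is not a hex digit makes bytearray.fromhex raise a ValueError, which is a useful hint that the selection included more than the escaped string.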

If you want to avoid this part, you can simply import the filters I created:

When we open the embedded shellcode file, we see that Profiler has automatically detected the shellcode:

By looking at the hex-view we can already guess where the shellcode is going to download its payload from:

But let’s analyze it anyway. Let’s press Ctrl+A and then Ctrl+R. Let’s execute the action “Debug->Shellcode to executable” to debug the shellcode with a debugger like OllyDbg.

Here’s the (very simple) analysis:

You can also download the Profiler project with the complete analysis already performed (same password: infected29A). Please notice, you’ll be prompted twice for the password: once for the project and once for the Zip archive.

I hope you enjoyed the read!

PDF/XDP Malware Reversing

Version 2.6 of Profiler has recently been released and, among the improvements, support for XDP has been introduced. For those of you who are unfamiliar with XDP, here’s the Wikipedia description:

“XML Data Package (XDP) is an XML file format created by Adobe Systems in 2003. It is intended to be an XML-based companion to PDF. It allows PDF content and/or Adobe XML Forms Architecture (XFA) resources to be packaged within an XML container.

XDP is XML 1.0 compliant. The XDP may be a standalone document or it may in turn be carried inside a PDF document.

XDP provides a mechanism for packaging form components within a surrounding XML container. An XDP can also package a PDF file, along with XML form and template data. When the XFA (XML Forms Architecture) grammars used for an XFA form are moved from one application to another, they must be packaged as an XML Data Package.”

So I’ll use the occasion to show the reversing of a nice PDF with all the goodies. Let’s open the suspicious PDF.

The PDF is already heavily flagged by Profiler, as it contains many suspicious features.

If we take a look, just out of curiosity, at the object 8 of the PDF we will notice that the XDP data contains a bogus endstream keyword to fool the parsers of security solutions.
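The trick works against parsers that stop at the first endstream keyword instead of honoring the /Length entry of the stream dictionary. The Python sketch below contrasts the two strategies. It is deliberately simplified: the function name is invented, /Length is assumed to be a direct integer rather than an indirect reference, and a robust parser would cross-check both results since /Length can be wrong too.

```python
import re

_LENGTH_RE = re.compile(rb"/Length\s+(\d+)")
_STREAM_RE = re.compile(rb"stream\r?\n")


def extract_stream(obj: bytes):
    """Return (naive, by_length) views of the stream data in a PDF object.

    `naive` stops at the first "endstream" keyword; `by_length` trusts the
    /Length entry. Assumes /Length is a direct integer, not a reference.
    """
    start = _STREAM_RE.search(obj).end()
    naive = obj[start:obj.index(b"endstream", start)]
    length = int(_LENGTH_RE.search(obj).group(1))
    by_length = obj[start:start + length]
    return naive, by_length
```

When the two results differ, the data contains an embedded endstream keyword, which is itself worth flagging.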

Profiler handles this correctly, so we don’t have to do anything; it’s just worth mentioning.

Let’s take a look at the raw XDP data.

As you can see, it is completely unreadable because of the XML escaped characters. This is not really important for us either, since the XML parser of Profiler handles it automatically; again, it’s just worth mentioning.
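Outside of an XML parser, this kind of escaping can be undone with a single standard library call in Python. The wrapper name below is illustrative. Note that html.unescape understands the full set of HTML named entities, which is a superset of what XML defines, so it is slightly more permissive than a strict XML parser.

```python
import html


def unescape_xml_text(raw: str) -> str:
    """Decode character references such as &#60; or &#x3E; and named entities."""
    return html.unescape(raw)
```

For instance, the text &#60;script&#x3E; decodes to an opening script tag.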

So let’s open directly the embedded XDP child and we can see a readable and nicely indented XML.

We can see that the XML contains JavaScript code, but Profiler already warns us of this. So let’s just click on the warning.

The code isn’t readable. So let’s select the JavaScript portion and then press Ctrl+R->Beautify JavaScript.

Much better, isn’t it?

The code is quite easy to understand although it’s obfuscated. It takes a value straight from the XDP, processes it and then calls eval on it.

This is the value it takes:

What we want is the result of the processing, before eval is called. So what I did is to modify slightly the JavaScript code like this:

I didn’t paste now the entire value in here as it was way too big, but I did so in the code edit:

At this point, we can just press Ctrl+R->Debug/Execute JavaScript and get the result of the execution.

We will get the following code:

What it does is basically to spray the heap using an array. It changes the payload based on the version of Adobe Reader. The version is retrieved by calling the _l5 function.

Now we could just examine the _l1 or _l2 payloads directly, but just to make sure I let the code generate a spray portion. So I changed the code accordingly and avoided actually spraying a lot of data.

We can run this script in the JavaScript debugger (Ctrl+R->Debug JavaScript).

The final print will give us the payload in memory. We can copy just the initial part, avoiding the padding. Let’s paste the string into a text editor in Profiler and then Ctrl+R->Hex string to bytes.

If we look at the payload, we can see that the beginning (the marked portion) looks like ROP code. So in order to avoid looking for the gadgets in memory, let’s skip the ROP as it most likely is only going to jump to the actual shellcode. Let’s assume that is the case and thus focus on the data which follows.

We can see a web address at the end of the data. So we could just assume that the shellcode downloads an executable and runs it. But just for the sake of completeness, let’s analyze it.
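Spotting such an address by eye works for a small payload; for a larger one, a simple scan for printable URLs does the job. The Python sketch below is a minimal example with an invented function name. It only finds addresses stored as plain ASCII, so an encoded or XOR-ed shellcode would not reveal anything this way.

```python
import re

# A scheme followed by a run of printable, non-space ASCII characters.
_URL_RE = re.compile(rb"(?:https?|ftp)://[\x21-\x7e]{4,}")


def find_urls(blob: bytes) -> list:
    """Return printable ASCII URLs found in a binary blob."""
    return [m.group().decode("ascii") for m in _URL_RE.finditer(blob)]
```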

We can of course disassemble the shellcode by applying a filter to it (Ctrl+T->x86 disasm). But what we’ll do is use a debugger via Ctrl+R->Shellcode to executable. This way we can quickly step through what it does.

Here’s the commented code:

So yes, in the end it just downloads the file from the address we’ve seen and tries to execute it, then tries to register it as a COM object. Some AV-evasion techniques are also present.


CVE-2010-0188: PDF/Form/TIFF

Given the good reception of the last post, I’ve decided to dedicate more time to posting use cases for the Profiler. Today we’re going to analyze a PDF exploiting CVE-2010-0188. It’s quite old, as the name suggests, but that doesn’t really matter for the sake of the demonstration. There was no particular reason for picking this one: I just downloaded a pack of malicious PDFs from contagiodump.blogspot.com.

Opening the Zip archive with the Profiler, I chose a random PDF. It is flagged as risky by the Profiler, because it contains an interactive form. If we take a look at the embedded form it’s easy to recognize an embedded image in it which basically represents the whole data of the form. Let’s load this image as an embedded file:

Embedded TIFF

We need to specify the ‘convert/from_base64‘ filter in order to load the actual data. The content of the image is quite obvious. Lots of repetitive bytes, some suspicious strings and some bytes with higher entropy which a trained eye can easily spot as being x86 instructions.

The repetition of the 0x0C 0x90 sequence is easily identifiable as a slide for the shellcode that follows:

Thus, the space after the slide is the start of the actual shellcode. Let’s disassemble it with the Profiler:
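The same reasoning can be automated: decode the base64 image data, then look for the longest run of the repeated two-byte sequence and treat the offset where it ends as the likely start of the shellcode. The Python sketch below uses hypothetical helper names, and the minimum run length of 16 repetitions is an arbitrary threshold chosen for the example.

```python
import base64


def load_embedded_image(b64_text) -> bytes:
    """Equivalent of the convert/from_base64 step."""
    return base64.b64decode(b64_text)


def find_slide(data: bytes, pattern: bytes = b"\x0c\x90", min_repeats: int = 16):
    """Return (start, end) of the longest run of `pattern`, or None."""
    best = None
    i = 0
    step = len(pattern)
    while True:
        i = data.find(pattern * min_repeats, i)
        if i < 0:
            return best
        end = i
        # Extend the run for as long as the pattern keeps repeating.
        while data.startswith(pattern, end):
            end += step
        if best is None or end - i > best[1] - best[0]:
            best = (i, end)
        i = end
```

The end offset returned by find_slide is where we would point the disassembler.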

Shellcode disasm

In order to quickly analyze the shellcode we can debug it. We select the portion from 0x134 to 0x29E, press Ctrl+R and run the action ‘Shellcode to executable‘. If you don’t have this action, update your copy of the Profiler.

Shellcode to EXE action

What it does is create a Portable Executable out of the bytes selected in the hex view, so that we can easily debug them with any debugger.

Shellcode to EXE

Optionally we can specify an application to automatically open the generated file. In this case, as you can see, I have selected OllyDbg.

Here’s the analysis of the shellcode:

Very standard code as you can see. It downloads a file with URLDownloadToFileA, executes it with WinExec and quits.

The next time I’ll try to pick out something more recent.

XFA Interactive Form Inspection

The upcoming 0.9.2 version of the Profiler introduces detection of Acro/XFA interactive forms inside PDFs. This technology has been abused numerous times (some recent cases come to mind), so it is now being reported as a potential threat.

The video below shows the inspection of an XFA Interactive Form and how to load a base64-encoded GIF image embedded in it.

Stay tuned!

PDF object search output

In the upcoming 0.8.0 version of the Profiler it will be possible to print out the matches of PDF object searches. This comes in very handy during analysis if we want to know, for instance, all values for a given key. The option can be activated in the initial configuration dialog.

PDF object search output option

In this case we’re going to search for URI keys (which specify links).

URI results

URI search has also been added as a predefined search.
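To give an idea of what such a search yields, here is a rough Python approximation that pulls the values of URI keys out of raw PDF data. It is a sketch under clear assumptions: the function name is invented, the objects must be neither encrypted nor stored in compressed object streams, and only literal strings without unescaped nested parentheses are handled.

```python
import re

# A literal string is delimited by parentheses; "\(" and "\)" are escapes.
_URI_RE = re.compile(rb"/URI\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)


def find_uri_values(pdf_data: bytes) -> list:
    """Return the raw values of every /URI key written as a literal string."""
    return [m.group(1) for m in _URI_RE.finditer(pdf_data)]
```

The action subtype /S /URI is skipped automatically, because a name rather than a parenthesized string follows it.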

PDF AES256 (Revision 6)

The upcoming version 0.7.9 of the Profiler features support for the still to be publicly released PDF symmetric encryption revision 6. While the PDF specifications are not yet freely available, Adobe has already started supporting the new standard.

This is part of our effort to keep the product up-to-date with the latest standards.

PDF object search

The soon to be released version 0.7.4 of the Profiler features a useful PDF object search functionality. The introduction of this feature was possible thanks to the newly introduced parameters API and format specific actions.

PDF search action

Through this action it’s possible to perform predefined searches, such as for objects with streams, JavaScript or embedded files.

PDF JavaScript search

But it’s also possible to perform custom dictionary searches.

PDF dictionary search

Matches can be tagged and highlighted. Also, new searches can be performed without resetting previous matches. In fact, in the screenshot below we can observe two different types of matches.

PDF search matches

This was a requested feature and it certainly is very useful when analyzing PDFs. Stay tuned as more important news will soon be announced. 🙂

The security of non-exec files

This article is based on a speech I gave a couple of months ago at DeepSec. I wrote it during the summer, which means I would now expand on some of the paragraphs. Nonetheless, I hope you’ll enjoy the read.


As we know, there has been a huge increase in malware attacks carried out with files other than executable ones. I’m aware that this is a very generic definition. If we consider a PDF with JavaScript stored inside, would you call it an executable? Probably you wouldn’t, although the script might be executed. Even saying that an executable can only be a file which contains native machine code isn’t accurate. A .NET assembly which contains only managed code would still be considered an executable. But a Shockwave Flash file (with its SWF extension) may not be regarded as standing in the same category. Of course, a Shockwave Flash file is not the same thing as a .NET assembly, but they both contain byte code which at some point is converted into machine code and is executed.

This means that the barriers between executable and non-executable files are thin and in many cases there’s a problem of perception, hence the difficulty of giving this article a completely accurate title. A more appropriate one would have been: the security of all those files generally perceived as harmless or, at least, less dangerous than applications. You may guess why I opted for the other title.

Does this look infected? (no, I’m talking about the file)

This is the most feared issue. How can a non-exec file infect a system? Basically through:

  • Scripting or byte code
  • Shellcode (buffer overflows)
  • Dangerous format features

These vectors are the most common for infection.

Scripting and byte code (security α 1/functionality)

Many file types offer the capability to execute code. However, a distinction has to be drawn between those file formats which offer it just as an additional feature and those formats which completely rely on it.

Shockwave Flash has been a very popular infection vector thanks to its powerful byte code. While it may be apparent even to an unskilled user that a Flash game on the internet is a sort of application, it’s not as apparent under other circumstances.

Very often playing a video in a web browser involves Flash. And I’ve heard many users referring to this as “Flash videos”. They don’t know that what actually happens is that a Flash file is downloaded and its ActionScript code executed.

Download the PDF to continue the reading.