Exposing the Core (part 2)

The release date of the upcoming 0.9.3 version is drawing nearer. Several format classes have already been exposed to Python and in this post I’m going to show you some code snippets. Since it’s impossible to demonstrate all format classes (12 have already been exposed) and all their methods (a single class may contain dozens of methods), the purpose of the snippets below is only to give the reader an idea of what can be achieved.

The SDK organization has changed a bit: because of its increasing size it made sense to subdivide it into modules. Thus, there’s now the Pro.Core module, the Pro.UI one and one module for each format (e.g. Pro.PE).

PDF

This is how we can output to text the raw stream of a PDF:

Output:

Streams in PDFs are usually compressed. Here’s how we can decode the same stream:

Output:
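Independently of the SDK, decoding the most common kind of PDF stream (one using the FlateDecode filter) boils down to zlib inflation. The following is a minimal sketch using only the Python standard library, not the Profiler API; the helper name decode_flate_stream and the sample content are made up for illustration:

```python
import zlib

def decode_flate_stream(raw: bytes) -> bytes:
    """Inflate the bytes found between 'stream' and 'endstream'."""
    # A decompress object ignores bytes that follow the compressed data,
    # such as the end-of-line marker that precedes 'endstream'.
    return zlib.decompressobj().decompress(raw)

# Simulate a compressed content stream followed by a trailing end-of-line.
original = b"BT /F1 12 Tf (Hello) Tj ET"
compressed = zlib.compress(original)
decoded = decode_flate_stream(compressed + b"\r\n")
print(decoded)
```

Streams that declare other filters (or a chain of them) need the matching decoders applied in order.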

We might also want to iterate through the key/value pairs of a PDF dictionary. Iterators have been implemented wherever they could be applied. While they don’t yet support the standard Python syntax, they are very easy to use:

Output:
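An iterator that doesn't support the standard Python syntax can still be wrapped so that it works in a for loop. The sketch below assumes a Java-style interface with hasNext() and next() methods; that interface, the PairIterator class and the sample dictionary are assumptions made for illustration, and the SDK's actual method names may differ:

```python
class PairIterator:
    """Hypothetical Java-style iterator over key/value pairs (not an SDK class)."""

    def __init__(self, mapping):
        self._items = list(mapping.items())
        self._pos = 0

    def hasNext(self):
        return self._pos < len(self._items)

    def next(self):
        item = self._items[self._pos]
        self._pos += 1
        return item

def iterate(it):
    """Adapt a hasNext()/next() iterator to a Python generator."""
    while it.hasNext():
        yield it.next()

# The adapter lets the explicit iterator be consumed with a plain for loop.
pairs = []
for key, value in iterate(PairIterator({"/Type": "/Catalog", "/Pages": "2 0 R"})):
    pairs.append((key, value))
print(pairs)
```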

Iterating through the objects of a PDF amounts to the same logic:

CFBF (DOC, XLS, PPT, MSI, etc.)

Iterating through the directories of a CFBF can be as simple as:

Output:

Retrieving a stream is equally easy:

Output:

SWF

Here’s how to output the disasm of an ActionScript2 Flash file:

The same can be done for ActionScript3 using the ABCFileObject class.

Class

This is how to disassemble a Java Class file:

DEX

This is how to disassemble an Android DEX file class:

In the upcoming post(s) I’m going to put it all together and do some very interesting things.
So stay tuned as the best is yet to come!

Exposing the Core (part 1)

The main feature of the upcoming 0.9.3 version of the Profiler is the expansion of the public SDK. This basically means that a substantial subset of the internal classes will be exposed. Although it’s only a subset, there’s no way to document all methods and functions. Fortunately, many of them should be quite intuitive.

Some of the most important classes are:

  • NTContainer: this is a generic container which is used to encapsulate data such as files and memory. It’s an extremely important class, since it’s used extensively. For the time being, containers can be created through SDK functions such as createContainerFromFile and newContainer.
  • NTBuffer/NTContainerBuffer/CFFBuffer/etc.: used to efficiently read small amounts of data from a source in sequence.
  • NTTextStream/NTTextBuffer/NTTextStringBuffer: used to output text. Indentation can be specified.
  • NTXml: used to parse XML. Fast and secure. This class is based on RapidXML.
  • CFFObject: the class from which every format class inherits (ZipObject, PEObject, etc). A very small subset of this class is exposed for now. This will change in the future.
  • CFFStruct: representation of a file format structure.
  • CFFFlags: representation of flags in a CFFStruct.

One of the new additions is that Python can now use filters as well. Do you remember the post about Widget and Views? Let’s use the same code base and change just a few lines:

By modifying just three lines, we XOR all opened files with the value 0xCC and then show the resulting data in the hex view. The Profiler provides a huge number of filters for any kind of operation and they can be chained, so we could easily compress a file and then encrypt it with AES by just replacing one line in the sample above. The function applyFilters optionally displays a default wait dialog that lets the user interrupt the operation (if it is executing in the main thread). Please remember that the easiest way to obtain the needed filters XML string is to use the UI view and the export command from the list (context menu->Export…).
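To make the idea of chained filters concrete, here is a rough standalone sketch in plain Python. It does not use the Profiler SDK or its filters XML format: xor_filter and apply_filters are hypothetical helpers, and since the standard library has no AES, zlib compression followed by a XOR stands in for the "compress, then encrypt" chain:

```python
import zlib
from functools import reduce

def xor_filter(key):
    """Return a filter that XORs every byte with key (hypothetical helper)."""
    return lambda data: bytes(b ^ key for b in data)

def apply_filters(data, filters):
    """Feed the output of each filter into the next one, left to right."""
    return reduce(lambda acc, f: f(acc), filters, data)

data = b"MZ\x90\x00"

# A single filter: XOR with 0xCC.
xored = apply_filters(data, [xor_filter(0xCC)])

# A chain: compress first, then XOR the compressed bytes.
chained = apply_filters(data, [zlib.compress, xor_filter(0xCC)])

# Undoing the chain means applying the inverse filters in reverse order.
restored = apply_filters(chained, [xor_filter(0xCC), zlib.decompress])
print(xored.hex(), restored == data)
```

Changing the operation amounts to changing one entry of the list, which mirrors the "replace one line" point made above.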

NTBuffer raises an exception when a read operation fails. Thus, it should be used as follows:
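The general pattern (read in a loop, let the exception signal the end of the data) can be illustrated without the SDK. In this sketch ByteReader is a hypothetical stand-in written for this example, not NTBuffer, and its method names are assumptions:

```python
import struct

class ByteReader:
    """Hypothetical sequential reader that raises EOFError on a short read."""

    def __init__(self, data, offset=0):
        self._data = data
        self._pos = offset

    def read(self, size):
        chunk = self._data[self._pos:self._pos + size]
        if len(chunk) != size:
            raise EOFError("read past end of data")
        self._pos += size
        return chunk

    def u16(self):
        # Little-endian unsigned 16-bit value.
        return struct.unpack("<H", self.read(2))[0]

def read_words(data):
    reader = ByteReader(data)
    values = []
    try:
        while True:
            values.append(reader.u16())
    except EOFError:
        pass  # end of data, or a truncated trailing value: stop cleanly
    return values

words = read_words(b"\x01\x00\x02\x00\xff")
print(words)
```

Wrapping the whole loop in a single try block keeps the reading code free of per-read bounds checks.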

A small snippet to show how to use NTXml:

Along with the core, several of the file objects will be exposed. A text dump of a structure could be as easy as:

Please notice that the code above omits several checks. We need to make sure that c is valid and that Load succeeds. I’ll omit these checks here to keep the code minimal.

You might say that printing out a single structure is an easy task. So let’s take a look at another cooler sample:

These few lines output an entire .NET method such as:

Nice, isn’t it? Remember we can change the indentation programmatically.

Of course, it will also be possible to get the object currently being analyzed and similar stuff. But we’ll see how to do that in another post.

If you’re wondering why the case convention for methods is not always the same, the reason is simple. CFFObject/CFFStruct/etc are based on older code which followed the Win32-like convention. Consequently all derived classes like PEObject follow this convention. All other classes use the camel-case convention.

News for version 0.9.2

The new version of the Profiler is out with the following news:

removed virtual memory constraint: large files are now supported
added decompression bomb detection
added media preview for image files
added preview for several PE resources
added text preview for Office Word Documents
added format selection to open file dialog
display format choice dialog when more than one format has been detected
added XFA interactive forms detection inside PDFs
added from/to hex and base64 filters
automatically detect files in Zip archives missing a Central Directory
increased PySide integration
– fixed Office VBA extraction bug
– fixed bug in PDF V4 and V5 Revision encryption

Format detection & selection

To better help with the identification of files which can be interpreted as different formats, the individual file dialog now features some additions.

As you can see, the identified formats for the currently selected file are listed (it’s a simple GIF file with a PDF appended at the end). The dialog also gives the user the ability to manually choose the format to use for loading the file. While all this could be achieved before, it wasn’t as handy as it is now.
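To see why one file can match more than one format, consider a deliberately naive signature scan. This is only an illustration in plain Python and not how the Profiler detects formats; the SIGNATURES table and detect_formats are made up for this example, and real detection validates the structure rather than just looking for magic bytes:

```python
# Hypothetical signature table: format name -> possible magic values.
SIGNATURES = {
    "GIF": (b"GIF87a", b"GIF89a"),
    "PDF": (b"%PDF-",),
    "ZIP": (b"PK\x03\x04",),
}

def detect_formats(data):
    """Return (format, offset) pairs for the first hit of each signature."""
    hits = []
    for name, magics in SIGNATURES.items():
        offsets = [data.find(m) for m in magics]
        offsets = [o for o in offsets if o != -1]
        if offsets:
            hits.append((name, min(offsets)))
    return sorted(hits, key=lambda hit: hit[1])

# A fake GIF header with a PDF appended after 16 bytes of padding.
sample = b"GIF89a" + b"\x00" * 16 + b"%PDF-1.4\n%%EOF"
formats = detect_formats(sample)
print(formats)
```

When such a scan reports more than one candidate, something (or someone) has to pick the format to load, which is the choice the dialog exposes.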

However, it wouldn’t make sense to display the file selection dialog when the user uses the shell integration or drops a file to open it. So, instead the Profiler displays a choice dialog for the format in case multiple formats are detected.

Conversion filters

Some new filters are available: from/to hex/base64.

While the actions in the Profiler already featured a mechanism to do these conversions, having them as filters is extremely useful, because it allows using them to load embedded files or to convert large portions of data.
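For reference, these are the same conversions expressed with the Python standard library. This is not the Profiler's filter interface, just a small sketch of what the from/to hex and base64 operations do to a few sample bytes:

```python
import base64
import binascii

payload = b"MZ\x90\x00"

# To hex and back.
hex_text = binascii.hexlify(payload)
from_hex = binascii.unhexlify(hex_text)

# To base64 and back.
b64_text = base64.b64encode(payload)
from_b64 = base64.b64decode(b64_text)

print(hex_text, b64_text)
```

Decoding is the direction that matters for embedded files: once the encoded text has been turned back into bytes, the result can be inspected like any other file.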

Damaged Zip archives

While it has always been possible to manually extract data, or partial data, from damaged Zip files through filters (e.g. from those missing a Central Directory), the embedded data is now automatically analyzed and ready for inspection. This means that even when a Zip archive is truncated and some compressed files are truncated as well, they will nonetheless be automatically detected and made available for inspection by the user.
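The reason recovery is possible at all is that every entry in a Zip archive is preceded by its own local file header, so the Central Directory is not strictly needed to locate the data. The sketch below shows the idea in plain Python; it is not the Profiler's implementation, scan_local_headers is a hypothetical helper, and it assumes the sizes are stored in the local header (entries written with a data descriptor would need extra handling):

```python
import io
import struct
import zipfile
import zlib

LOCAL_SIG = b"PK\x03\x04"
# 30-byte local file header: signature, version, flags, method, time, date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
HEADER = struct.Struct("<4sHHHHHIIIHH")

def scan_local_headers(data):
    """Yield (name, content) for each local file header found in data."""
    pos = data.find(LOCAL_SIG)
    while pos != -1 and pos + HEADER.size <= len(data):
        fields = HEADER.unpack_from(data, pos)
        method, csize, nlen, xlen = fields[3], fields[7], fields[9], fields[10]
        name_start = pos + HEADER.size
        name = data[name_start:name_start + nlen].decode("utf-8", "replace")
        start = name_start + nlen + xlen
        payload = data[start:start + csize]  # shorter if the file is truncated
        if method == 8:  # deflate: raw stream, partial input yields partial output
            payload = zlib.decompressobj(-15).decompress(payload)
        # Other methods are returned as stored bytes.
        yield name, payload
        pos = data.find(LOCAL_SIG, start + csize)

# Build a small archive in memory, then cut off its Central Directory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", b"hello " * 10)
    zf.writestr("b.txt", b"world")
raw = buf.getvalue()
damaged = raw[:raw.find(b"PK\x01\x02")]

recovered = dict(scan_local_headers(damaged))
print(sorted(recovered))
```

The standard zipfile module refuses to open the damaged bytes because it starts from the end of the file, while the header scan still finds both entries.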

As you can see many improvements have been introduced. The most important of them is of course the removal of the virtual memory constraints as it represents an important step in the roadmap of the Profiler. Stay tuned as the next version will be important as well!