Rich Text Format support (including OLE extraction)

The work on the upcoming 0.9.4 version of the Profiler has just begun, but there’s already an addition worth covering in depth: support for RTF files. In particular, there are two things which are quite useful: the preview of raw text and the extraction of OLE objects.

Let’s start with the first one, which is very easy.

Text preview

The text shown in the preview can also be retrieved programmatically with the following code:

from Pro.Core import NTTextBuffer
from Pro.UI import proContext

# grab the RTF object from the currently active scan provider
sp = proContext().currentScanProvider()
rtf = sp.getObject()
# dump the raw text into a text buffer and print it
out = NTTextBuffer()
rtf.Output(out)
print(out.buffer)
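
If you want to keep the extracted text rather than just print it, the buffer can also be written to disk. This is only a sketch: it assumes out.buffer is a plain Python string (which the print() call above suggests), and the output file name is an arbitrary example.

# write the extracted text to a file (example file name)
with open("rtf_text.txt", "w") as f:
    f.write(out.buffer)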

And now the more interesting part: embedded OLE objects. While RTF is usually regarded as a safer format than its DOC counterpart, it is able to embed foreign objects through OLE technology. This technique can be, and is, used by malware authors to conceal the real threat.

Let’s take a look at a file with two embedded objects (a DOC and a PPT).

OLE

What is being viewed in the image above is the metadata of a JPEG contained in a PPT contained in an OLE Stream contained in an RTF file. Nice, isn’t it?

It should be noted that OLE objects in RTF files are stored as OLE Streams, an undocumented format (as far as we know). The Profiler is able to parse it nonetheless, and as can be observed, this format can contain some interesting information.

OLE Stream metadata

Apart from the original file name, we can observe paths which include the user name.

News for version 0.9.3

The new version is out with the following news:

– subdivided the Python SDK into modules
– exposed many core and file format classes to Python (part 2)
– exposed filters to Python
– introduced Python hooks
– introduced Python key providers
– improved SDK documentation
– added extensions view
– added file formats scan option
– added decryption keys view
– fixed occasional concurrency issue with large files
– fixed embedded files manual addition issue (affected versions: >= 0.9.1)

Most of the items in the list have been demonstrated in previous posts. The only addition left to discuss is the key dialog. When an encrypted file gets decrypted with a key provided either by the user or by a script, that key ends up in a special list of matched keys. This list can now be inspected by the user.

If some files have been decrypted, an additional “Decryption keys” button will be shown. Just click on it and you’ll get the list of matched keys.

That’s all. Enjoy!

Detect broken PE manifests

In the previous post we gave a brief introduction to how hooks work. If you haven’t read that post, you’re encouraged to do so in order to understand this one. What we’re going to do in this post is something practical: verifying the XML correctness of the PE manifests contained in the executables in the Windows directory.

The hook INI entry:

[PE: verify manifests]
file = pe_hooks.py
scanned = detectBrokenManifest
mode = batch
formats = PE

And the python code:

from Pro.Core import *
from Pro.PE import *

def detectBrokenManifest(sp, ud):
    # exclude the file by default; it is included only if a broken manifest is found
    sp.exclude()
    pe = sp.getObject()
    it = pe.ResourceIterator()
    if not it.MoveToRoot(RES_TYPE_CONFIGURATION_FILES):
        return
    # walk all the items under the manifest (configuration files) resource directory
    while it.Next() and it.RootName() == RES_TYPE_CONFIGURATION_FILES:
        s = it.Data()
        offs = pe.RvaToOffset(s.Num(0)) # same as s.Num("OffsetToData")
        sz = s.Num(1) # same as s.Num("Size")
        if offs == INVALID_OFFSET or sz == 0:
            continue
        data = pe.Read(offs, sz)
        xml = NTXml()
        if xml.parse(data) != NTXml_ErrNone:
            # the manifest XML doesn't parse: include the file in the report
            sp.include()
            break

That’s it!

What the code above does is ask the PE object for a resource iterator. This class, as our customers can see from the SDK documentation, is capable of both iterating and moving to a specific resource directory or item. Thus, it first moves to the RES_TYPE_CONFIGURATION_FILES directory and then goes through all of its items. If the XML parsing fails, the file is included in our final report.
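
As an aside, before wiring this up as a hook, the same check can be tried out interactively on a file that is already open, reusing the proContext().currentScanProvider() call from the text-preview snippet earlier. This is only a sketch and assumes the object currently loaded is a PE.

from Pro.Core import *
from Pro.PE import *
from Pro.UI import proContext

# sketch: check the manifests of the PE file currently open in the workspace
pe = proContext().currentScanProvider().getObject()
it = pe.ResourceIterator()
if it.MoveToRoot(RES_TYPE_CONFIGURATION_FILES):
    while it.Next() and it.RootName() == RES_TYPE_CONFIGURATION_FILES:
        s = it.Data()
        offs = pe.RvaToOffset(s.Num(0))
        sz = s.Num(1)
        if offs == INVALID_OFFSET or sz == 0:
            continue
        xml = NTXml()
        print("manifest parses:", xml.parse(pe.Read(offs, sz)) == NTXml_ErrNone)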

So let’s proceed and do the actual scan. First we need to activate the extension from the extensions view:

Then we need to specify the Windows directory as our scan directory and the kind of file format we’re interested in scanning (PE).

Let’s wait for the scan to complete and we’ll get the final results.

So it seems these files have a problem with their manifests. Let’s open one and go to its manifest resources:

(if the XML is missing new-lines, just hit “Run action (Ctrl+R)->XML indenter”)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- Copyright (c) Microsoft Corporation -->
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <assemblyIdentity
    name=""Microsoft.Windows.Shell.DevicePairingFolder""
    processorArchitecture=""x86""
    version=""5.1.0.0""
    type="win32"/>
  <description>Wireless Devices Explorer</description>
  <dependency>
    <dependentAssembly>
      <assemblyIdentity
        type="win32"
        name="Microsoft.Windows.Common-Controls"
        version="6.0.0.0"
        processorArchitecture="*"
        publicKeyToken="6595b64144ccf1df"
        language="*" />
    </dependentAssembly>
  </dependency>
</assembly>

As you can see, some attributes in assemblyIdentity contain doubled quotes. I don’t know whether this DLL was created with Visual C++, but I do remember that this could happen when specifying manifest fields in the project configuration dialog.

Exposing the Core (part 4, Hooks)

Hooks are an extremely powerful extension to the scanning engine of Cerbero Suite. They allow the user to customize scans and do all sorts of things. Because there’s basically no limit to the possible applications, I’ll just give a brief introduction in this post. In the following post I’ll demonstrate their use with a real-world case.

Just like key providers introduced in the previous post, hooks have their INI configuration file as well (hooks.cfg). This can contain a minimal hook entry:

[Test Hook]
file = test_hooks.py
scanned = scanned

And the python code:

def scanned(sp, ud):
    print(sp.getObjectFormat())

scanned gets called after every file scan and prints out the format of the object. This function is not called from the main thread, so it’s not possible to call UI functions. However, print is thread-safe, and when doing a batch scan it will just output to stdout.
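
For instance, a slightly less chatty variant could react only to the formats it cares about directly inside the callback. This is just a sketch and assumes getObjectFormat() returns the same names used in the formats configuration field (e.g. "PE", "SWF").

def scanned(sp, ud):
    # report only the formats we're interested in, silently ignore the rest
    fmt = sp.getObjectFormat()
    if fmt in ("PE", "SWF"):
        print(fmt)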

Now let’s open Cerbero Suite and go to the new Extensions view.

Extensions view

You’ll notice that the box next to the name of the hook we just created is unchecked. This means that it’s disabled and hence won’t be called. We can enable it manually, or we can specify in the INI file that our extension should be enabled by default:

[Test Hook]
file = test_hooks.py
scanned = scanned
enable = yes

We also specify the scan mode we are interested in:

; not specifying a mode is the same as: mode = single|batch
mode = batch

Now the extension will be notified only when doing batch scans. To be even more selective, it’s possible to specify the file format(s) we are interested in:

; this is an optional field, you can omit it and the hook will be notified for any format
formats = PE|SWF

Ok, now let’s create a small sample which actually does something. Let’s say we want to perform a search among the disassembled code of Java Class files and include in the resulting report only those files which contain a particular string.

The configuration entry:

[Search Java Class]
file = test_hooks.py
scanned = searchJavaClass
mode = batch
formats = Class

And the code:

def searchJavaClass(sp, ud):
    from Pro.Core import NTTextBuffer
    cl = sp.getObject()
    out = NTTextBuffer()
    cl.Disassemble(out)
    # search string
    ret = out.buffer.find("HelloWorld") != -1
    sp.include(ret)

Let’s activate the extension by checking its box and then perform a custom scan only on files identified as Java Classes.

Class scan

The result will be:

Class report

The method ScanProvider::include(bool b) is what tells Cerbero Suite which files have to be included in the final report (its counterpart is ScanProvider::exclude(bool b)). Of course, there could be more than one hook active during a scan and a file can be both excluded and included. The logic is that include has priority over exclude and once a file has been included by a hook it can’t be excluded by another one.
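
To make that priority rule concrete, here is a contrived sketch of two hooks active during the same batch scan. The entry names, file name and logic are purely illustrative, and it assumes getObjectFormat() reports the same name used in the formats field.

[Include Classes]
file = priority_hooks.py
scanned = includeClasses
mode = batch

[Exclude Everything]
file = priority_hooks.py
scanned = excludeEverything
mode = batch

And the callbacks:

def includeClasses(sp, ud):
    # include Java Class files in the final report
    if sp.getObjectFormat() == "Class":
        sp.include()

def excludeEverything(sp, ud):
    # try to keep every file out of the report
    sp.exclude()

With both hooks enabled, Java Class files still show up in the report: once includeClasses has included them, excludeEverything can’t take them back out.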

Although the search hook above already has a purpose, it’s not very handy to have to change the code in order to perform different searches. Thus, hooks can optionally implement two more callbacks: init and end. Both of these callbacks are called from the main UI thread (so that it’s safe to call UI functions). The first one is called before any scan operation is performed, while the latter is called after all of them have finished.

The syntax for these callbacks is the following:

def init():
    print("init")
    return print  # returns what the other callbacks will get as their 'ud' argument

def end(ud):
    ud("end")

Instead of using ugly global variables, init can optionally return the user data that is passed on to the other callbacks. end is useful to perform cleanup operations. In our sample above we don’t really need to clean up anything; we just need an input box to ask the user for a string to search for. So we just need to add an init callback.

[Search Java Class]
file = test_hooks.py
init = initSearchJavaClass
scanned = searchJavaClass
mode = batch
formats = Class

And add the new logic to the code:

def initSearchJavaClass():
    from Pro.UI import ProInput
    return ProInput.askText("Insert string:")

def searchJavaClass(sp, ud):
    if ud is None:
        return
    from Pro.Core import NTTextBuffer
    cl = sp.getObject()
    out = NTTextBuffer()
    cl.Disassemble(out)
    # search string
    ret = out.buffer.find(ud) != -1
    sp.include(ret)
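
If we also wanted a cleanup or summary step, an end callback could be added. The following is only a sketch: the end INI field, the dictionary keys and the counting logic are my own illustration (the count relies on list appends being atomic in CPython, since scanned runs outside the main thread).

[Search Java Class]
file = test_hooks.py
init = initSearchJavaClass
scanned = searchJavaClass
end = endSearchJavaClass
mode = batch
formats = Class

def initSearchJavaClass():
    from Pro.UI import ProInput
    text = ProInput.askText("Insert string:")
    if text is None:
        return None
    # share the search string and a list of matches with the other callbacks
    return {"needle": text, "matches": []}

def searchJavaClass(sp, ud):
    if ud is None:
        return
    from Pro.Core import NTTextBuffer
    cl = sp.getObject()
    out = NTTextBuffer()
    cl.Disassemble(out)
    ret = out.buffer.find(ud["needle"]) != -1
    if ret:
        # appending to a list is atomic in CPython, so this is safe outside the main thread
        ud["matches"].append(True)
    sp.include(ret)

def endSearchJavaClass(ud):
    # called from the main UI thread once the batch scan has finished
    if ud is not None:
        print("files matching the search: %d" % len(ud["matches"]))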

Of course, this sample could be improved endlessly by adding options, regular expressions, support for more file formats, etc. But that is beyond the scope of this post, which was just to briefly introduce hooks.

The upcoming version of Cerbero Suite which includes all the improvements of the previous weeks is almost ready. Stay tuned!

Exposing the Core (part 3, Key Providers)

This post will be about key providers, the first kind of extension to the scan engine we’re going to look at. Key providers are nothing more than a convenient way to provide keys, through scripting, to files which require a decryption key (e.g. an encrypted PDF).

Let’s take for instance an encrypted Zip file. If we’re not doing a batch scan, Cerbero Suite will show its dialog asking the user to enter a decryption key. While this dialog already has the ability to accept multiple keys and to remember them, there are things it can’t do. For example, it is not suitable for trying out key dictionaries (copying and pasting them is inefficient) or for generating a key based on environmental factors (like the name of the file requiring decryption).

This may sound all a bit complicated, but don’t worry. One of the main objectives of Cerbero Suite is to allow users to do things in the simplest way possible. Thus, showing a practical sample is the best way to demonstrate how it all works.

You’ll notice that the upcoming version of Cerbero Suite contains a keyp.cfg_sample file in its config directory. This file can be used as a template to create our first provider: just rename it to keyp.cfg. Like all configuration files, this is an INI file as well. This is what an entry for a key provider looks like:

[KeyProvider Test]
file = key_provider.py
callback = keyProvider
; this is an optional field, you can omit it and the provider will be used for any format
formats = Zip|PDF

It is pretty much self-explanatory. It tells Cerbero Suite where our callback is located (the relative path defaults to the plugins/python directory), and it can optionally specify the formats which may be used in conjunction with this provider. The Python code can be as simple as:

from Pro.Core import *

def keyProvider(obj, index):
    if index != 0:
        return None
    l = NTVariantList()
    l.append("password")
    return l

The provider returns a single key (‘password’). This means that when one of the specified file formats is encrypted, all registered key providers will be asked to provide decryption keys. If one key works, the file is automatically decrypted.

The returned list can contain even thousands of keys; it is up to the user to decide how many to return. The index argument can be used to decide which batch of keys must be returned: it starts at 0 and is incremented by l.size(). The key provider will be called until a match is found or it no longer returns any keys. Thus, be careful not to always return a key without checking the index, otherwise you’ll end up in an endless loop.
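
As a sketch of how that index bookkeeping works in practice, a provider could feed a word list in fixed-size chunks; the file name passwords.txt and the chunk size below are just placeholders.

from Pro.Core import *

CHUNK_SIZE = 1000

def keyProvider(obj, index):
    # index starts at 0 and advances by the size of the list returned last time,
    # so it can be used directly as an offset into the dictionary
    try:
        with open("passwords.txt", "r") as f:
            words = f.read().splitlines()
    except IOError:
        return None
    chunk = words[index:index + CHUNK_SIZE]
    if not chunk:
        return None  # no more keys: the provider won't be called again
    l = NTVariantList()
    for w in chunk:
        l.append(w)
    return l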

When a string is appended to the list, it will be converted internally to bytes by the conversion handlers (this means that a single string could, for instance, first be converted to UTF-8 and then to ASCII in order to obtain a match). Sometimes you want to return the exact bytes to be matched: in that case, just append a bytearray object to the list.
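
For example, a minimal sketch returning raw bytes instead of a string (the key bytes below are invented):

from Pro.Core import *

def keyProvider(obj, index):
    if index != 0:
        return None
    l = NTVariantList()
    # a bytearray is matched as-is, skipping the string conversion handlers
    l.append(bytearray(b"\x00\x01secret\xff"))
    return l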

The first key provider sample could also be transformed into key generation based on variables:

from Pro.Core import *

def keyProvider(obj, index):
    if index != 0:
        return None
    name = obj.GetStream().name()
    # do some operations involving the file name
    variable_part = ...
    l = NTVariantList()
    l.append("static_part" + variable_part)
    return l

And this comes in handy when we want to avoid typing in passwords for certain Zip archives which follow a fixed decryption key schema.

So, to sum up: key providers are powerful and easy-to-use extensions which allow us to test key dictionaries against various file formats (those for which Cerbero Suite supports decryption) and to avoid the all-too-frequent hassle of having to type in common passwords.