Scan Providers

Version 2.5.0 is close to being released and comes with the last type of extension exposed to Python: scan providers. Scan providers extensions are not only the most complex type of extensions, but also the most powerful ones as they allow to add support for new file formats entirely from Python!

This feature required exposing a lot more of the SDK to Python and can’t be completely discussed in one post. This post is going to introduce the topic, while future posts will show real life examples.

Let’s start from the list of Python scan providers under Extensions -> Scan providers:

Scan provider extensions

This list is retrieved from the configuration file ‘scanp.cfg’. Here’s an example entry:

The name of the section has two purposes: it specifies the name of the format being supported (in this case ‘TEST’) and also the name of the extension, which automatically is associated to that format (in this case ‘.test’, case insensitive). The hard limit for format names is 9 characters for now, this may change in the future if more are needed. The label is the description. The ext parameter is optional and specifies additional extensions to be associated to the format. group specifies the type of file which is being supported; available groups are: img, video, audio, doc, font, exe, manexe, arch, db, sys, cert, script. file specifies the Python source file and allocator the function which returns a new instance of the scan provider class.

Let’s start with the allocator:

It just returns a new instance of TestScanProvider, which is a class dervided from ScanProvider:

Every scan provider has some mandatory methods it must override, let’s begin with the first ones:

_clear gives a chance to free internal resources when they’re no longer used. In Python this is not usually important as member objects will automatically be freed when their reference count reaches zero.

_getObject must return the internal instance of the object being parsed. This must return an instance of a CFFObject derived class.

_initObject creates the object instance and loads the data stream into it. In the sample above we assume it being successful. Otherwise, we would have to return SCAN_RESULT_ERROR. This method is not called by the main thread, so that it doesn’t block the UI during long parse operations.

Let’s take a look at the TestObject class:

This is a minimalistic implementation of a CFFObject derived class. Usually it should contain at least an override of the CustomLoad method, which gives the opportunity to fail when the data stream is first loaded through the Load method. SetDefaultEndianness wouldn’t even be necessary, as every object defaults to little endian by default. SetObjectFormatName, on the other hand, is very important, as it sets the internal format name of the object.

Let’s now take a look at how we scan a file:

The code above will issue a single warning concerning native code. When _startScan returns SCAN_RESULT_OK, _threadScan will be called from a thread other than the main UI one. The logic behind this is that _startScan is actually called from the main thread and if the scan of the file doesn’t require complex operations, like in the case above, then the method could return SCAN_RESULT_FINISHED and then _threadScan won’t be called at all. During a threaded scan, an abort by the user can be detected via the isAborted method.

From the UI side point of view, when a scan entry is clicked in summary, the scan provider is supposed to return UI information.

This will display a text field with a predefined content when the user clicks the scan entry in the summary. This is fairly easy, but what happens when we have several entries of the same type and need to differentiate between them? There’s where the data member of ScanEntryData plays a role, this is a string which will be included in the report xml and passed again back to _scanViewData as an xml node.

For instance:

Becomes this in the final XML report:

The dnode argument of _scanViewData points to the ‘d’ node and its first child will be the ‘o’ node we passed. the xml argument represents an instance of the NTXml class, which can be used to retrieve the children of the dnode.

But this is only half of the story: some of the scan entries may represent embedded files (category SEC_File), in which case the _scanViewData method must return the data representing the file.

Apart from scan entries, we may also want the user to explore the format of the file. To do that we must return a tree representing the structure of our file:

The enableIDs method must be called right after creating a new FormatTree class. The code above creates a format item with id 1 with a child item with id 2, which results in the following:

Format tree

But of course, we haven’t specified neither labels nor different icons in the function above. This information is retrieved for each item when required through the following method:

The various items are identified by their id, which was specified during the creation of the tree.

The UI data for each item is retrieved through the _formatViewData method:

This will display a custom view with a table and a hex view separated by a splitter:

Custom view

Of course, also have specified the callback for our custom view:

It is good to remember that format item IDs and IDs used in custom views are used to encode bookmark jumps. So if they change, saved bookmark jumps become invalid.

And here again the whole code for a better overview:

If you have noticed from the screen-shot above, the analysed file is called ‘a.t’ and as such doesn’t automatically associate to our ‘test’ format. So how does it associate anyway?

Clearly Profiler doesn’t rely on extensions alone to identify the format of a file. For external scan providers a signature mechanism based on YARA has been introduced. In the config directory of the user, you can create a file named ‘yara.plain’ and insert your identification rules in it, e.g.:

This rule will identify the format as ‘test’ if the first 4 bytes of the file match the string ‘test’: the name of the rule identifies the format.

The file ‘yara.plain’ will be compiled to the binary ‘yara.rules’ file at the first run. In order to refresh ‘yara.rules’, you must delete it.

One important thing to remember is that a rule isn’t matched against an entire file, but only against the first 512 bytes.

Of course, our provider behaves 100% like all other providers and can be used to load embedded files:

Embedded files

Our new provider is used automatically when an embedded file is identified as matching our format.

YARA 3.2.0 support

The upcoming 2.3 version of Profiler includes support for the latest YARA engine. This new release is scheduled for the first week of January and it will include YARA on all supported platforms.

One inherent technical advantage of having YARA support in Profiler is that it will be possible to scan for YARA rules inside embedded files/objects, like files in a Zip archive, in a CHM file, in an OLEStream, streams in a PDF, etc.

The YARA engine itself has been compiled with all standard modules (except for cuckoo). Even the magic module is available, since libmagic is also supported by Profiler.

The initial YARA integration comes as a hook extension, an action and Python SDK support. The YARA Python support is the official one and differs from it only in the import statement. You can run existing YARA Python code without modification by using the following import syntax:

So let’s start a YARA scan. To do that, we need to enable the YARA hook extension. On Windows remember to configure Python in case you haven’t yet, since all extensions have been written in it.

When a scan is started, a YARA settings dialog will show up.

This dialog lets us choose various settings including the type of rules to load.

There are four possibilities. A simple text field containing YARA rules, a plain text rules file, a compiled rules file or a custom expression which must eval to a valid Rules object.

The report settings specify how we will be alerted of matches. The ‘only matches’ option makes sure that only files (or their sub-files) with a match will be included in the final report. The ‘add to meta-data” option causes the matches to be visible as meta-data strings of a file. The ‘as threats’ option reports every match as a 100% risk threat. The ‘print to output’ option prints the matches to the output view.

Since we had the ‘only matches’ option enabled, we will find only matching files in our final report.

And since we had also the ‘to meta-data’ option enabled, we will see the matches when opening a file in the workspace.

The YARA scan functionality comes also as an action when we find ourselves in a hex view. You can either scan the whole hex data or select a range. Then press Ctrl+R to run an action and select ‘YARA scan’.

In this case we won’t be given report options, since the only thing which can be performed is to print out matches in the output view.

Like this:

Of course, all supported platforms come also with the official YARA command line utility.

Since this has been a customer request for quite some time, I think it will be appreciated by some of our users.

Stripping symbols from an ELF

Just as the previous post about stripping symbols from a Mach-O binary, here’s one about stripping them from an ELF binary.

The syntax to execute the script is the same as in the previous post, only the called function changes:

Here’s the code:

That’s it!

And here again the complete script which you can use to strip both a Mach-O and an ELF binary.

The Linux x64 version of Profiler should be released soon. So stay tuned!

Stripping symbols from a Mach-O

A common mistake many developers do is to leave names of local symbols inside applications built on OS X. Using the strip utility combined with the compiler visibility flags is, unfortunately, not enough.

So I wrote a small script for Profiler to be run from the command line and I integrated it in the build process.

The syntax to execute the script is:

Here’s the whole code:

The code zeroes names of symbols which are not associated to any section in the binary.

Easy. 🙂

Command-line scripting

The upcoming 2.1 version of Profiler adds support for command-line scripting. This is extremely useful as it enables users to create small (or big) utilities using the SDK and also to integrate those utilities in their existing tool-chain.

The syntax to run a script from command-line is the following:

If we want to run a specific function inside a script, the syntax is:

This calls the function ‘bar’ inside the script ‘foo.py’.

Everything following the script/function is passed on as argument to the script/function itself.

When no function is specified, the arguments can be retrieved from sys.argv.

For the command-line above the output would be:

When a function is specified, the number of arguments are passed on to the function directly:

If you actually try one of these examples, you’ll notice that Profiler will open its main window, focus the output view and you’ll see the output of the Python print functions in there. The reason for this behaviour is that the command-line support also allows us to instrument the UI from command-line. If we want to have a real console behaviour, we must also specify the ‘-c’ parameter. Remember we must specify it before the ‘-r’ one as otherwise it will be consumed as an argument for the script.

However, on Windows you won’t get any output in the console. The reason for that is that on Windows applications can be either console or graphical ones. The type is specified statically in the PE format and the system acts accordingly. There are some dirty tricks which allow to attach to the parent’s console or to allocate a new one, but neither of those are free of glitches. We might come up with a solution for this in the future, but as for now if you need output in console mode on Windows, you’ll have to use another way (e.g. write to file). Of course, you can also use a message box.

But a message box in a console application usually defies the purpose of a console application. Again, this issue affects only Windows.

So, let’s write a small sample utility. Let’s say we want to print out all the import descriptor module names in a PE. Here’s the code:

Running the code above with the following command line:

Produces the following output:

Of course, this is not a very useful utility. But it’s just an example. It’s also possible to create utilities which modify files. There are countless utilities which can be easily written.

Another important part of the command-line support is the capability to register logic providers on the fly. Which means we can force a custom scan logic from the command-line.

This script scans a single file given to it as argument. All callbacks, aside from init, are optional (they default to None).

That’s all! Expect the new release soon along with an additional supported platform. 🙂

EML attachment detection and inspection

The upcoming 0.9.9 version of the Profiler includes some very useful SDK additions. Among these, the addEmbeddedObject method (to add embedded objects) and a new hook notification called ‘scanning’. The scanning notification should be used for long operations and/or to add embedded objects. In this post we’ll demonstrate these new features with a little script to detect attachments in EML files.

EML attachments

One of the advantages of using the Profiler is that we are be able to inspect the sub-files of the attachments as well. The screenshot above shows a PNG contained in an ODT attachment. Nice, isn’t it?

But the nicest part is how little code is necessary to extend the functionality of the Profiler. These are the lines to add to the user hook configuration file:

And this is the Python code:

That’s it. Of course, this is just a demonstration, to improve it we could add support for more encodings apart from ‘base64’ like ‘Quoted-Printable’ for instance.

Some email programs like Thunderbird store EML files by appending them in one single file. In fact, as you can see, the screenshot above displays the attachments of an entire Inbox database. 😉

EML attachment types

Also notice that in the code the addEmbeddedObject method is called by specifying a base64 decode filter to load the file. We can, of course, specify multiple filters and Lua ones as well. This makes it extremely easy to load files without having to write code to decode/decrypt/decompress them. The “?” parameter leaves the Profiler to identify the format of the attachment.

Disasm options & filters

The upcoming version 0.9.4 of the Profiler introduces improvements to several disasm engines: ActionScript3, Dalvik, Java, MSIL. In particular it adds options, so that the user can decide whether to include file offsets and opcodes in the output.

Disasm options

The code indentation can be changed as well.

Another important addition is that these engines have been exposed as filters. This is especially noteworthy since byte code can sometimes be injected or stored outside of a method body, so that it is necessary to be able to disassemble raw data.

Disasm filters

Of course these filters can be used from Python too.

Output:

In the future it will be possible to output a filter directly to NTTextStream, avoiding the need to read from NTContainer.

Stay tuned!

Detect broken PE manifests

In the previous post we’ve seen a brief introduction of how hooks work. If you haven’t read that post, you’re encouraged to do so in order to understand this one. What we’re going to do in this post is something practical: verifying the XML correctness of PE manifests contained in executables in the Windows directory.

The hook INI entry:

And the python code:

That’s it!

What the code above does is to ask the PE object for a resource iterator. This class, as our customers can observe from the SDK documentation, is capable of both iterating and moving to a specific resource directory or item. Thus, first it moves to the RES_TYPE_CONFIGURATION_FILES directory and then goes through all its items. If the XML parsing does fail, then the file is included in our final report.

So let’s proceed and do the actual scan. First we need to activate the extension from the extensions view:

Then we need to specify the Windows directory as our scan directory and the kind of file format we’re interested scanning (PE).

Let’s wait for the scan to complete and we’ll get the final results.

So seems these file have a problem with their manifests. Let’s open one and go to its manifest resources:

(if the XML is missing new-lines, just hit “Run action (Ctrl+R)->XML indenter”)

As you can see some attributes in assemblyIdentity contain double quotes. I don’t know whether this DLL has been created with Visual C++, but I do remember that this could happen when specifying manifests fields in the project configuration dialog.

Exposing the Core (part 4, Hooks)

Hooks are an extremely powerful extension to the scanning engine of the Profiler. They allow the user to do customize scans and do all sorts of things. Because there’s basically no limit to the applications, I’ll just try to give a brief introduction in this post. In the following post I’ll demonstrate their use with a real-world case.

Just like key providers introduced in the previous post, hooks have their INI configuration file as well (hooks.cfg). This can contain a minimal hook entry:

And the python code:

scanned gets called after every file scan and prints out the format of the object. This function is not being called from the main thread, so it’s not possible to call UI functions. However, print is thread-safe and when doing a batch scan will just output to stdout.

Now let’s open the Profiler and go to the new Extensions view.

Extensions view

You’ll notice that the box next to the name of the hook we just created is unchecked. This means that it’s disabled and hence won’t be called. We can enable it manually or we may even specify from the INI file to enable our extension by default:

We also specify the scan mode we are interested in:

Now the extension will be notified only when doing batch scans. To be even more selective, it’s possible to specify the file format(s) we are interested in:

Ok, now let’s create a small sample which actually does something. Let’s say we want to perform a search among the disassembled code of Java Class files and include in the resulting report only those files which contain a particular string.

The configuration entry:

And the code:

Let’s activate the extension by checking its box and then perform a custom scan only on files identified as Java Classes.

Class scan

The result will be:

Class report

The method ScanProvider::include(bool b) is what tells the Profiler which files have to be included in the final report (its counterpart is ScanProvider::exclude(bool b)). Of course, there could be more than one hook active during a scan and a file can be both excluded and included. The logic is that include has priority over exclude and once a file has been included by a hook it can’t be excluded by another one.

Although the few lines above already have a purpose, it’s not quite handy having to change the code in order to perform different searches. Thus, hooks can optionally implement two more callbacks: init and end. Both these callbacks are called from the main UI thread (so that it’s safe to call UI functions). The first one is called before any scan operation is performed, while the latter after all of them have finished.

The syntax for for these callbacks is the following:

Instead of using ugly global variables, init can optionally return the user data passed on to the other callbacks. end is useful to perform cleanup operations. But in our sample above we don’t really need to clean up anything, we just need an input box to ask the user for a string to be searched. So we just need to add an init callback.

And add the new logic to the code:

Of course, this sample could be improved endlessly by adding options, regular expressions, support for more file formats etc. But that is beyond the scope of this post which was just briefly introduce hooks.

The upcoming version of the Profiler which includes all the improvements of the previous weeks is almost ready. Stay tuned!

Exposing the Core (part 3, Key Providers)

This post will be about key providers, which are the first kind of extension to the scan engine we’re going to see. Key providers are nothing else than a convenient way to provide keys through scripting to files which require a decryption key (e.g. an encrypted PDF).

Let’s take for instance an encrypted Zip file. If we’re not doing a batch scan, the Profiler will ask the user with its dialog to enter a decryption key. While this dialog already has the ability to accept multiple keys and also remember them, there are things it can’t do. For example it is not suitable for trying out key dictionaries (copy and pasting them is inefficient) or to generate a key based on environmental factors (like the name of the file requiring the decryption).

This may sound all a bit complicated, but don’t worry. One of the main objectives of the Profiler is to allow users to do things in the simplest way possible. Thus, showing a practical sample is the best way to demonstrate how it all works.

You’ll notice that the upcoming version of the Profiler contains a keyp.cfg_sample in its config directory. This file can be used as template to create our first provider, just rename it to keyp.cfg. As all configuration files, this is an INI as well. This is what an entry for a key provider looks like:

Which is pretty much self explaining. It tells the Profiler where our callback is located (the relative path defaults to the plugins/python directory) and it can also optionally specify the formats which may be used in conjunction with this provider. The Python code can be as simple as:

The provider returns a single key (‘password’). This means that when one of the specified file formats is encrypted, all registered key providers will be asked to provide decryption keys. If one key works, the file is automatically decrypted.

The returned list can contain even thousands of keys, it is up to the user to decide the amount returned. The index argument can be used to decide which bulk of keys must be returned, it starts at 0 and is incremented by l.size(). The key provider will be called until a match is found or it doesn’t return any more keys. Thus, be careful not to always return a key without checking the index, otherwise it’ll result in an endless loop.

When a string is appended to the list, then it will be converted internally by the conversion handlers to bytes (this means that a single string could, for instance, first be converted to UTF8 then to Ascii in order to obtain a match). Sometimes you want to return the exact bytes to be matched. In that case just append a bytearray object to the list.

The same sample could be transformed into a key generation based on variables:

And this comes handy when we want to avoid typing in passwords for certain Zip archives which have a fixed decryption key schema.

So, to sum up key providers are powerful and easy-to-use extensions which allow us to test out key dictionaries on various file formats (those for which the Profiler supports decryption) and to avoid the all too frequent hassle of having to type common passwords.

Another part will follow this one and in my opinion it will be even more interesting than the previous posts. After that yet another post will follow which is going to show a real-world case for demonstration purposes. Then it shouldn’t take long for the new version to be released. Stay tuned!