sudosecure.net

              is anything truly secure…


Analyzing PDF files and Shellcode

Posted by jeremy on 11th November 2008

With all the talk over the recent Adobe CVE-2008-2992 vulnerability being exploited in the wild, I thought it would be a good time to document how I go about analyzing PDF files and shellcode.  Before I get into how I do this type of analysis, I would like to thank all the contributors to MalwareDomainList.com, as they supplied me with several malicious PDF files and links to malicious PDF files.  I should also note that PDF file analysis is a new subject for me, and I have spent the last few days really diving into what exactly makes up a PDF file and the additional functionality Adobe makes available for PDF files.

The very first thing I do is take a look at the PDF file using either “less” or “more” to display its contents in the terminal window, just to see if I can spot anything out of the ordinary.  The stream sections in a PDF file can be compressed, and when they are they will usually show up as a block of unreadable binary data between the “stream” and “endstream” keywords.

To decompress the stream data and clean up the formatting in the PDF file I use a tool called pdftk, with the following command:

pdftk bad.pdf output bad_dumped.pdf uncompress

This command takes the “bad.pdf” file as input and outputs it to “bad_dumped.pdf” while decompressing it.  The really cool thing about the uncompress argument is that you don’t need to know whether the PDF file is compressed or not, as it will not hurt anything or error out if it isn’t.  So as a rule of thumb I usually just provide the uncompress argument.  pdftk will also attempt to format the PDF file, making it easier to read using a text editor or the “less” command.  Taking a look at the bad_dumped.pdf file now using the “less” command, we can see the compressed stream was really obfuscated JavaScript set to run when the PDF is opened.
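If pdftk is not at hand, the decompression step can be approximated in a few lines of Python. This is only a minimal sketch, not what pdftk does internally: it assumes the streams use the common Flate (zlib) filter, finds them with a simple regular expression rather than a real PDF parser, and the function name `inflate_streams` is made up for illustration.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords (naive, not a full PDF parser).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the decompressed contents of every stream that zlib can inflate."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj() tolerates trailing bytes (such as the newline
            # before "endstream") that follow the zlib data.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not Flate-compressed, or encoded with another filter: skip it.
            continue
        if data:
            results.append(data)
    return results
```

Calling `inflate_streams(open("bad.pdf", "rb").read())` returns a list of byte strings that can then be searched for JavaScript by eye or with a text search.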

To extract the obfuscated JavaScript I simply highlight it, copy and paste it into a text editor, vi in my case.  To deobfuscate the JavaScript I normally use SpiderMonkey or Malzilla.  This time I chose to use SpiderMonkey and simply replaced the “eval” calls with “print” calls, and after several iterations the obfuscated code was deobfuscated.  The two significant portions of the deobfuscated code are:

As you have probably figured out already, the first image demonstrates this is one of the exploits currently targeting the Adobe CVE-2008-2992 vulnerability, and the second image is the shellcode being passed into the util.printf function.  Now to take a look at the shellcode.

To extract the shellcode I simply highlight it, copy and paste it into a text editor for manipulation.  I prefer to use vi, so to clean up the shellcode I normally use regular expressions and substitutions.  To clean up this particular instance I simply executed these two commands in vi:

:%s/[\"+]//g
:%j!

The first command is a substitution that removes all the " and + characters.  The second joins all the lines together, without adding spaces, to make one long string of text.  The result is the shellcode as one long string.
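The same cleanup can be sketched in Python for anyone who does not live in vi. The function name `clean_shellcode_text` is hypothetical, and the sketch only mirrors the two vi commands above: it deletes every double quote and plus sign, then joins the lines with no separator.

```python
def clean_shellcode_text(js_fragment):
    """Drop double quotes and plus signs, then join all lines with no separator."""
    stripped = js_fragment.replace('"', "").replace("+", "")
    # Like :%j! in vi, joining neither adds nor removes whitespace.
    return "".join(stripped.splitlines())
```

Any spaces that were in the original fragment survive, which is harmless for the decoding steps that follow because they only look at the `%u` escapes.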

Now there are several different ways to analyze shellcode, but I tend to use just two.  The first way is some simple perl-fu that simply outputs the character representation of the shellcode.  The perl-fu part I got from an ISC SANS diary entry made by Daniel Wesemann.  Here is the command I execute:

cat shellcode.file | perl -pe 's/\%u(..)(..)/chr(hex($2)).chr(hex($1))/ge'

In this case it does not work and displays the following:

Since this post is focused more on the steps I use to analyze PDF files and shellcode than on this particular exploit attempt, here is an example of this technique actually working.  First the shellcode extracted from another PDF and rendered in the same JavaScript manner:

Executing the exact same command string as before:

cat shellcode.file2 | perl -pe 's/\%u(..)(..)/chr(hex($2)).chr(hex($1))/ge'

Results in this:

As you can see, the shellcode is simple in that it downloads a file into the Windows system directory via urlmon and executes it.
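For reference, here is a rough Python counterpart to the perl one-liner. It is a sketch under a few assumptions: the names `decode_percent_u` and `printable_strings` are made up, only valid hex digits are accepted after `%u` (the perl version takes any two characters), and any text that is not a `%u` escape is dropped instead of being passed through.

```python
import re

PERCENT_U = re.compile(r"%u([0-9a-fA-F]{2})([0-9a-fA-F]{2})")
PRINTABLE = re.compile(rb"[\x20-\x7e]{4,}")


def decode_percent_u(text):
    """Turn JavaScript %uXXXX escapes into raw bytes, low byte first."""
    out = bytearray()
    for high, low in PERCENT_U.findall(text):
        # %uAABB is a 16-bit little-endian word: emit 0xBB, then 0xAA.
        out += bytes([int(low, 16), int(high, 16)])
    return bytes(out)


def printable_strings(data):
    """Return runs of four or more printable ASCII characters found in data."""
    return [run.decode("ascii") for run in PRINTABLE.findall(data)]
```

For example, `decode_percent_u("%u7468%u7074")` yields the bytes for "http", because each pair of bytes is swapped. To keep the result for later analysis, write it in binary mode, e.g. `open("shellcode.out", "wb").write(decode_percent_u(open("shellcode.file").read()))`.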

The second method I use to analyze shellcode is with the libemu library and its test application sctest.  Libemu is a library providing basic x86 emulation and sctest is part of its test suite.  sctest will not work in all cases, but you can extend its functionality by writing your own test application using the libemu library.

The first thing we need to do is parse the JavaScript encoded shellcode and write it to a file.  We can do this using the same perl-fu from above like this:

cat shellcode.file | perl -pe 's/\%u(..)(..)/chr(hex($2)).chr(hex($1))/ge' > shellcode.out

As you can probably see, we are simply redirecting the output to a file instead of to the console window.  To verify this step worked you can inspect the newly created file with a tool called hexdump.  Comparing the output from the following two commands will verify this:

hexdump -C shellcode.out

cat shellcode.file | perl -pe 's/\%u(..)(..)/chr(hex($2)).chr(hex($1))/ge' | hexdump -C

These two commands should result in the exact same output and look something like this:

The reason we cannot simply highlight and copy the output of the perl-fu command from the console window and paste it into a text file is that the console character set cannot display the full range of characters correctly, resulting in “???” being displayed.  This is not the case when you redirect to a file, as the characters don’t have to be interpreted and displayed on the console.  If you don’t believe me, simply cat out the shellcode.out file and compare it to what it looks like when you open it in a text editor like vi.
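To look at the raw bytes without involving the terminal's character set at all, a small hex viewer is enough. The sketch below is a loose imitation of the hexdump -C layout, not a byte-for-byte clone of it, and the name `hex_view` is hypothetical.

```python
def hex_view(data, width=16):
    """Format bytes as offset, hex columns and a printable-ASCII gutter."""
    pad = width * 3 - 1  # room for `width` two-digit values separated by spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk).ljust(pad)
        # Anything outside printable ASCII is shown as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part}  |{text_part}|")
    return "\n".join(lines)
```

Printing `hex_view(open("shellcode.out", "rb").read())` shows every byte as two hex digits, so nothing is lost to unprintable characters.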

Now that we have dumped the shellcode into a file we can pass it into sctest via a stdin redirection for analysis.  Here is the command I use:

sctest -Ss 100000 < shellcode.out

This results in the following output:

Looking at the output we can clearly see a URL, and you can conclude with a fair amount of confidence that this is the URL used to drop a binary using something like urlmon, as we saw before.  There are plenty of more in-depth procedures for analyzing PDFs and shellcode, but I have found the procedures I explained in this post to work on about 90% of all the PDF files and shellcode I have looked at in the past.
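When there is a lot of output to read through, a short script can pull out anything that looks like a URL. This is a minimal sketch that works on any blob of bytes, whether that is saved emulator output or shellcode that stores its URL in the clear; it does not emulate or decode anything itself, and the name `find_urls` is made up.

```python
import re

# http:// or https:// followed by printable, non-space ASCII characters.
URL_RE = re.compile(rb"https?://[\x21-\x7e]+")


def find_urls(blob):
    """Return every URL-like run of printable characters found in blob."""
    return [match.decode("ascii") for match in URL_RE.findall(blob)]
```

Treat every hit as a lead to verify, since the pattern will happily match junk that merely starts with "http://".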

This pretty much concludes my post on how to analyze a PDF file and JavaScript shellcode.  The following are commands and links you may find useful in expanding the analysis procedures above.

PDF Related:

The following command will output PDF document Metadata, Bookmarks and Page Labels:

pdftk bad.pdf dump_data

PDF document metadata can be very useful in finding out information about the author of the PDF document, the date it was created, and the dates it was modified.  Speaking of gathering information on the author and using metadata to investigate a PDF document, I found the article “Shoulder Surfing a Malicious PDF Author” by Didier Stevens to be an outstanding example of what can be learned from this data.  Didier has published a PDF parsing tool written in Python called pdf-parser.py, which looks to be very promising for analyzing PDF files.  I just started playing with the tool today, so I can’t really elaborate on its functionality and usability, but I can say that after only a few minutes I was able to extract the same data as I did with the pdftk tool.
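The metadata report from pdftk is plain text, so it is easy to load into a script. The sketch below assumes the report lists document information as alternating "InfoKey: name" and "InfoValue: value" lines, which is how the pdftk output I have seen looks; other lines are ignored. The name `parse_info` is hypothetical.

```python
def parse_info(dump_text):
    """Pair up InfoKey/InfoValue lines from a pdftk metadata report."""
    info = {}
    key = None
    for line in dump_text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # lines without "name: value" shape are ignored
        if name == "InfoKey":
            key = value
        elif name == "InfoValue" and key is not None:
            info[key] = value
            key = None
    return info
```

Fields such as Author, Creator and CreationDate, when present, end up as dictionary keys that can be compared across a batch of suspicious files.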

JavaScript Related:
Malzilla 1.2.0 was just recently released, and Bobby has kindly added several new features and some documentation in the zip file.  Malzilla can be a time saver for anyone who has to deobfuscate JavaScript on a regular basis, and it can handle lots of obfuscated code with very little JavaScript knowledge required.  I would definitely recommend this tool to anyone wanting to get into deobfuscating JavaScript.  Another really cool option found in Malzilla is the shellcode emulator, which is a wrapper for the sctest libemu test tool.

Shellcode Related:
Another tool written by Didier that I find useful for analyzing shellcode is XORSearch.  It’s basically a small, lightweight application that will try to brute force shellcode that has been XORed, which is very common.  A little hint to any Mac OS X users out there: to compile XORSearch you have to remove the #include <malloc.h> line from the source, as that header is deprecated and not installed.
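The core idea behind XOR brute forcing fits in a few lines. The following is a minimal sketch of the single-byte XOR case only, not a reimplementation of XORSearch; the function name `xor_search` and its return format are assumptions made for illustration.

```python
def xor_search(data, keyword):
    """Try all 256 single-byte XOR keys and report where keyword appears."""
    hits = []
    for key in range(256):
        decoded = bytes(b ^ key for b in data)
        position = decoded.find(keyword)
        if position != -1:
            # Record the key and the offset of the first match under that key.
            hits.append((key, position))
    return hits
```

Searching for a keyword such as `b"http"` is a reasonable starting point, since download-and-execute shellcode has to carry a URL somewhere. Key 0 corresponds to data that was never XORed in the first place.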

As always if you have any questions or comments regarding this post feel free to hit me up anytime, I always enjoy hearing from someone that actually read my post.  ;)
