Recently I have been focusing on targeted PDF files. I had always ignored PDFs that merely acted as a vehicle for SWF files, but a lot of these CVE-2011-0611 exploits are finding their way into my inbox, so I thought it was time to do some analysis. With a couple of samples in hand, I expected this to be a pretty simple task, but I noticed there isn’t a really good set of tools on hand to get it done.
Googling the basic terms brought me to Lenny Zeltser’s May blog post on doing just what I wanted: extracting SWFs. The post was well written and documented an extraction from a recent PDF, but it used Didier’s tools. While those tools and methods worked for that particular PDF, they failed on most of the samples I had. The reason is that the tools have several limitations when parsing a PDF: they do not handle ObjStms (object streams), do not always decode objects, and do not properly handle object versions. No tool is perfect, but we have better parsers to pick from and should be able to perform simple tasks like this with ease.
Rather than complain, I turned my efforts to Peepdf and the PDF X-RAY build. Using functions within Peepdf, I wrote a simple command line tool called swf_mastah to extract a SWF file from a PDF. The benefits of this tool are that it handles ObjStms, decodes all the samples I have, handles encryption, and accounts for multiple object versions.
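To make the last step of an extraction concrete, here is a minimal sketch of how a SWF can be carved out of stream data once a parser has already decoded it. This is not the swf_mastah source and it does not use Peepdf; the function name carve_swf and the version sanity check are my own assumptions for illustration. It relies only on the documented SWF header layout: a 3-byte signature (FWS for uncompressed, CWS for zlib-compressed), a 1-byte version, and a 4-byte little-endian length of the uncompressed file.

```python
import struct
import zlib

# Signatures for uncompressed (FWS) and zlib-compressed (CWS) SWF files.
SWF_SIGNATURES = (b"FWS", b"CWS")


def carve_swf(data):
    """Return a sorted list of (offset, swf_bytes) found in decoded stream data.

    Hypothetical helper: 'data' is assumed to be the already-decoded
    contents of a PDF stream object, as bytes.
    """
    found = []
    for sig in SWF_SIGNATURES:
        start = data.find(sig)
        while start != -1:
            header = data[start:start + 8]
            if len(header) == 8:
                version = header[3]
                (length,) = struct.unpack("<I", header[4:8])
                # Heuristic sanity check (an assumption, not part of the format).
                if 0 < version < 50 and length >= 8:
                    if sig == b"FWS":
                        # Uncompressed: the length field is the total file size.
                        body = data[start:start + length]
                        if len(body) == length:
                            found.append((start, body))
                    else:
                        # Compressed: everything after the 8-byte header is one
                        # zlib stream; the length field is the inflated size.
                        try:
                            inflater = zlib.decompressobj()
                            raw = inflater.decompress(data[start + 8:])
                            if len(raw) + 8 == length:
                                end = len(data) - len(inflater.unused_data)
                                found.append((start, data[start:end]))
                        except zlib.error:
                            pass  # Not a real zlib stream; keep scanning.
            start = data.find(sig, start + 1)
    return sorted(found)
```

Each returned pair can be written straight to disk, for example with open("sample_%d.swf" % offset, "wb"). Note that this only covers the carving step; handling ObjStms, filters, encryption and object versions is the job of the PDF parser that feeds it.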
Putting SWF Mastah to Use
Any working tool needs testing, so I picked out two samples from my repository and gave it a whirl.
Sample One: 2368a8f55ee78d844896f05f94866b07
I haven’t tested this tool against all my samples, but I made sure to pick the most complex ones and it fared well. If anything, this tool should serve as another option for analysts performing PDF research. As I use it I will make changes and updates to the GitHub repository for PDF X-RAY, so always make sure your repositories are synced.