Behind the scenes I have been working on improving how the malpdfobj format is created, what information it stores and how it is parsed. In its current state there is duplication in a couple of areas. I found this annoying when parsing the data, as it felt natural for some pieces of information to be stored together rather than separately. This was mainly an issue when pulling the content out of each object. Ideally I wanted the hash, length and version contained in that area as well.
To handle this issue I removed the per-object entries from the hash portion, so it now holds only the hash_data.file information. The object information that was in hash_data has been moved to the object contents area. I also added a hexadecimal representation of each object to the output. Since the whole purpose of this format is to share data and find better ways to detect malicious PDFs, I also included some simple checks to classify an object as suspicious. Right now these checks are based on characteristics seen in other malware, simple string detection and some other unusual characteristics. The suspicious flag should be taken lightly, but so far it seems to do a good job of hitting the payload when using PDF XRAY (a tool built on top of the format).
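As a rough illustration of the idea, here is a minimal Python sketch of a per-object record that keeps the hash, length, version, hex representation and suspicious flag next to the object content. The field names, the use of MD5, the function name and the marker strings are all assumptions made for this example; they are not the actual malpdfobj output or the real checks.

```python
import binascii
import hashlib

# Hypothetical marker strings; the real checks are not listed in this post.
SUSPICIOUS_MARKERS = (b"/JavaScript", b"/JS", b"/OpenAction", b"unescape", b"eval(")


def build_object_record(obj_id, version, raw):
    """Bundle everything known about one PDF object into a single dict.

    obj_id and version are the PDF object number and version; raw is the
    object's bytes. All key names below are illustrative assumptions.
    """
    return {
        "id": obj_id,
        "version": version,
        "length": len(raw),
        # MD5 is an assumption; the post only says "hash".
        "md5": hashlib.md5(raw).hexdigest(),
        # Hex representation of the raw object bytes.
        "hex": binascii.hexlify(raw).decode("ascii"),
        # latin-1 maps every byte to a character, so this never fails.
        "content": raw.decode("latin-1"),
        # Simple string detection: flag the object if any marker appears.
        "suspicious": any(marker in raw for marker in SUSPICIOUS_MARKERS),
    }


record = build_object_record(5, 0, b"<< /S /JavaScript /JS (eval(unescape('%41'))) >>")
benign = build_object_record(1, 0, b"<< /Type /Catalog >>")
```

With everything in one record, a parser can read the content and its metadata in a single lookup instead of joining against a separate hash section. As noted above, a flag produced by string matching like this is only a hint and will produce false positives.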
I need to separate these changes from some of the PDF XRAY work to avoid any issues when releasing. I plan on committing the changes soon, so keep an eye on Twitter or GitHub for the pushed changes. In the next few days there will be a couple more posts on some storage-related work I have been doing to harvest data from the entire collection, the work on the API and the core work on PDF XRAY. I am pleased to say that I have been using the tool daily now and it seems to do a good job of helping me out throughout the day. I will be interested to see how it works for others.
Note: MalPdfObj will be changing frequently until a stable release is published. If you are using the format in your environment then please contact me as I can provide a more detailed roadmap and private updates for the tool/output.