• Malware Sample Format in MongoDB

    by  • December 30, 2010 • Uncategorized

    I finally got a chance to sit down and work on the format for malicious samples that will be inserted into MongoDB. I am not certain this is exactly how the final format will look, but it works for now and holds a lot of the data I have. In its current form it stores hash data, structure data, and scanning results.
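    As a rough sketch of how those three parts fit together, the Node.js snippet below builds a document with the same top-level keys as the sample that follows (`hashes`, `structure`, `scanners`). The helper names `buildSampleDoc` and `digest` are hypothetical, and the `structure` and `scanners` values are assumed to come from an existing parser and existing scan results (the sample's structure fields appear to be pdfid-style output). Unlike the sample, which stores `"0"` for SHA-1 and SHA-256, the sketch computes all three digests.

```javascript
// Minimal sketch: build a sample document with the three top-level sections
// described above. buildSampleDoc is a hypothetical helper, not an existing API.
const crypto = require("crypto");

// Hex digest of the raw file bytes for the given algorithm ("md5", "sha1", "sha256").
function digest(algorithm, data) {
  return crypto.createHash(algorithm).update(data).digest("hex");
}

function buildSampleDoc(fileBytes, structure, scanners) {
  return {
    hashes: {
      md5: digest("md5", fileBytes),
      sha1: digest("sha1", fileBytes),
      sha256: digest("sha256", fileBytes),
    },
    structure, // assumed: parser output such as the pdfid-style fields in the sample
    scanners,  // assumed: e.g. { virustotal: {...}, wepawet: {} }
  };
}

// Usage with placeholder bytes rather than a real malicious PDF.
const doc = buildSampleDoc(
  Buffer.from("%PDF-1.7\n"),
  { header: "%PDF-1.7", isPdf: "True" },
  { virustotal: {}, wepawet: {} }
);
console.log(doc.hashes.md5);
```

    Computing every digest from the same bytes keeps `hashes.md5` consistent with the `<md5>.pdf.vir` file naming seen in the sample's `filename` field.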

    { "hashes" : { "md5" : "04a82f084e8e61bcc513fe738e983b01", "sha1" : "0", "sha256" : "0"},
      "structure" : { "countChatAfterLastEof": "", "errorMessage": "", "dates": {"date": []}, "filename": "04a82f084e8e61bcc513fe738e983b01.pdf.vir", "nonStreamEntropy": "", "header": "%PDF-1.7", "version": "0.0.11", "entropy": "", "errorOccured": "False", "isPdf": "True", "keywords": { "keyword": [ {"count": 18, "hexcodecount": 0, "name": "obj"}, {"count": 18, "hexcodecount": 0, "name": "endobj"}, {"count": 4, "hexcodecount": 0, "name": "stream"}, {"count": 4, "hexcodecount": 0, "name": "endstream"}, {"count": 1, "hexcodecount": 0, "name": "xref"}, {"count": 1, "hexcodecount": 0, "name": "trailer"}, {"count": 1, "hexcodecount": 0, "name": "startxref"}, {"count": 1, "hexcodecount": 0, "name": "/Page"}, {"count": 0, "hexcodecount": 0, "name": "/Encrypt"}, {"count": 0, "hexcodecount": 0, "name": "/ObjStm"}, {"count": 3, "hexcodecount": 0, "name": "/JS"}, {"count": 4, "hexcodecount": 0, "name": "/JavaScript"}, {"count": 0, "hexcodecount": 0, "name": "/AA"}, {"count": 1, "hexcodecount": 0, "name": "/OpenAction"}, {"count": 0, "hexcodecount": 0, "name": "/AcroForm"}, {"count": 0, "hexcodecount": 0, "name": "/JBIG2Decode"}, {"count": 0, "hexcodecount": 0, "name": "/RichMedia"}, {"count": 0, "hexcodecount": 0, "name": "/Launch"}, {"count": 0, "hexcodecount": 0, "name": "/Colors > 2^24"} ]}, "countEof": "", "streamEntropy": "", "totalEntropy": ""},
      "scanners" : { "virustotal" : {"nProtect": "", "CAT-QuickHeal": "", "McAfee": "Generic.dx!rkx", "TheHacker": "Trojan/VB.gen", "VirusBuster": "", "NOD32": "a variant of Win32/Qhost.NTY", "F-Prot": "", "Symantec": "", "Norman": "", "Avast": "Win32:Malware-gen", "eSafe": "Win32.TRVB.Acgy", "ClamAV": "", "Kaspersky": "Trojan.Win32.VB.acgy", "BitDefender": "Trojan.Generic.3611249", "Comodo": "Heur.Suspicious", "F-Secure": "Trojan.Generic.3611249", "DrWeb": "Trojan.Hosts.37", "AntiVir": "TR/VB.acgy.1", "TrendMicro": "", "McAfee-GW-Edition": "Trojan.VB.acgy.1", "Sophos": "", "eTrust-Vet": "", "Authentium": "", "Jiangmin": "", "Antiy-AVL": "Trojan/Win32.VB", "a-squared": "Trojan.Win32.VB!IK", "Microsoft": "", "ViRobot": "", "Prevx": "Medium Risk Malware", "GData": "Trojan.Generic.3611249", "AhnLab-V3": "", "VBA32": "", "Sunbelt": "Trojan.Win32.Generic!BT", "PCTools": "", "Rising": "", "Ikarus": "Trojan.Win32.VB", "Fortinet": "", "AVG": "Generic17.ASTJ", "Panda": "Adware/AccesMembre", "Avast5": "Win32:Malware-gen"}, "wepawet" : {} } }
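    In the `virustotal` sub-document above, an engine that did not flag the file maps to an empty string, so a quick detection summary is a count of the non-empty labels. The sketch below assumes exactly that convention; `summarizeVirusTotal` is a hypothetical helper, and the input is a handful of entries copied from the sample.

```javascript
// Hypothetical helper: summarize a "scanners.virustotal" sub-document, where each
// key is an engine name and an empty string is taken to mean "no detection".
function summarizeVirusTotal(vtResults) {
  const engines = Object.keys(vtResults);
  const flagged = engines.filter((name) => vtResults[name] !== "");
  return {
    total: engines.length,
    detected: flagged.length,
    // Keep only the engines that produced a label, with their labels.
    labels: Object.fromEntries(flagged.map((name) => [name, vtResults[name]])),
  };
}

// A few entries copied from the sample document.
const vt = {
  nProtect: "",
  McAfee: "Generic.dx!rkx",
  Kaspersky: "Trojan.Win32.VB.acgy",
  ClamAV: "",
  Avast5: "Win32:Malware-gen",
};
const summary = summarizeVirusTotal(vt);
console.log(`${summary.detected}/${summary.total} engines flagged the sample`);
// prints "3/5 engines flagged the sample"
```

    Inside MongoDB the same convention allows a query such as `{ "scanners.virustotal.Kaspersky": { $ne: "" } }` to select samples a given engine labeled; note that `$ne` also matches documents that lack the field entirely.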

    The sample above contains only a small portion of the information I plan to store. Eventually I will store every name used in a PDF (such as /JS or /OpenAction) along with its count, object details including lengths and hashes, and much more. MongoDB lets me ensure indexes on fields nested anywhere inside the main document, which keeps queries on those fields fast. This helps in several ways; as one example, say I wanted to find every sample containing an object whose MD5 hash is 09a0f7aae0e22b5d80c7950890f3f738.

    In Mongo I could ensure an index:

    db.malpdfs.ensureIndex( { "structure.objects.object.hashes.md5" : 1 } );

    With that index in place, I can search all of my samples for the given hash:

    db.malpdfs.find( { "structure.objects.object.hashes.md5" : "09a0f7aae0e22b5d80c7950890f3f738" } );
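    The query works even though the hash sits several levels deep, because MongoDB's dot notation reaches into embedded documents and, when it meets an array, into each element of that array. The snippet below is a simplified in-memory illustration of that matching rule, not MongoDB's implementation (it ignores positional paths such as `a.0.b`). It assumes the planned, not yet shown, layout in which `structure.objects.object` is an array of objects that each carry `hashes.md5`; the helper names and both sample documents are hypothetical.

```javascript
// Simplified illustration of dot-notation matching; not MongoDB's actual code.
// Collect every value reachable from `value` by following the path `parts`.
function valuesAtPath(value, parts) {
  if (parts.length === 0) return [value];
  if (Array.isArray(value)) {
    // Arrays are traversed element by element, as dot notation does in a query.
    return value.flatMap((item) => valuesAtPath(item, parts));
  }
  if (value === null || typeof value !== "object" || !(parts[0] in value)) return [];
  return valuesAtPath(value[parts[0]], parts.slice(1));
}

// Equality match: any reached value equals the target, or is an array containing it.
function matchesEquality(doc, path, target) {
  return valuesAtPath(doc, path.split(".")).some((v) =>
    Array.isArray(v) ? v.includes(target) : v === target
  );
}

// Hypothetical documents using the assumed structure.objects.object[] layout.
const samples = [
  { hashes: { md5: "sample-A" }, structure: { objects: { object: [
    { hashes: { md5: "09a0f7aae0e22b5d80c7950890f3f738" } },
    { hashes: { md5: "other-object" } },
  ] } } },
  { hashes: { md5: "sample-B" }, structure: { objects: { object: [
    { hashes: { md5: "other-object" } },
  ] } } },
];

const matching = samples
  .filter((d) => matchesEquality(d, "structure.objects.object.hashes.md5",
                                 "09a0f7aae0e22b5d80c7950890f3f738"))
  .map((d) => d.hashes.md5);
console.log(matching.join(", "));
// prints "sample-A"
```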

    Such a query could reveal malicious objects that are reused between samples, which would give me a better way to identify malicious files. This is just one of the ways I see this querying and indexing being useful, and I plan to work on it more.
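    To look for that kind of reuse across the whole collection rather than one hash at a time, one approach is to invert the relationship: map each object hash to the set of samples containing it, then keep the hashes seen in more than one sample. Below is a client-side sketch under the same assumed `structure.objects.object[].hashes.md5` layout; `findReusedObjects` and the corpus are hypothetical. Later MongoDB releases can do this grouping server-side with the aggregation framework (`$unwind` on the objects array followed by `$group` on the object hash).

```javascript
// Hypothetical sketch: find object MD5s that appear in more than one sample.
function findReusedObjects(samples) {
  const seenIn = new Map(); // object md5 -> Set of sample md5s containing it
  for (const sample of samples) {
    const objects =
      (sample.structure && sample.structure.objects && sample.structure.objects.object) || [];
    for (const obj of objects) {
      const objMd5 = obj.hashes && obj.hashes.md5;
      if (!objMd5) continue; // skip objects without a recorded hash
      if (!seenIn.has(objMd5)) seenIn.set(objMd5, new Set());
      seenIn.get(objMd5).add(sample.hashes.md5);
    }
  }
  const reused = {};
  for (const [objMd5, sampleIds] of seenIn) {
    if (sampleIds.size > 1) reused[objMd5] = [...sampleIds].sort();
  }
  return reused;
}

const corpus = [
  { hashes: { md5: "sample-A" }, structure: { objects: { object: [
    { hashes: { md5: "shared-object" } }, { hashes: { md5: "only-in-A" } } ] } } },
  { hashes: { md5: "sample-B" }, structure: { objects: { object: [
    { hashes: { md5: "shared-object" } } ] } } },
  { hashes: { md5: "sample-C" }, structure: {} }, // no objects recorded yet
];

console.log(JSON.stringify(findReusedObjects(corpus)));
// prints {"shared-object":["sample-A","sample-B"]}
```

    Using a Set per object hash means a sample that contains the same object several times is still counted once.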
