Over the past few days I have been extracting more and more characteristics from this malware, and I have run into a problem. The data is dynamic on multiple levels, which makes it hard to store in a standard relational database. What exactly do I mean? Well, let's say I want to store details about each object within a PDF in a table with a fixed set of columns. This becomes an issue because PDFs vary greatly in the number of objects they have; one PDF could contain 10 while another contains 100.
Keeping the above example in mind, imagine running into that problem in many other areas across the data. This has caused me some trouble, as I have been using a MySQL database up until this point to store all my data. Despite the setback, I think I have found a suitable solution that may actually work out much better given the format most of my data is in. MongoDB is a document-oriented database that lets me store all my data without conforming to a strict schema, meaning records with different shapes and sizes can sit side by side. While its query language is not as robust as SQL, the malware data fits quite well into the design of MongoDB.
Rather than breaking all data associated with a PDF entry into separate tables, MongoDB allows me to embed the relevant data within the PDF entry itself. So far this structure appears to be extremely beneficial: I can still keep each PDF entry separate, but store all data associated with the entry (no matter how much there is) in the same spot. I am hoping that this does not cause issues in the future, but I think it is worth the risk given that I have such a small dataset right now and plenty of time. I am currently diving into how MongoDB works and developing a structure for my data, and I am hoping to have something releasable soon.
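
A rough sketch of what such an embedded record could look like in Python is below. The helper `build_pdf_document`, the field names (`hash`, `object_count`, `objects`, `javascript_found`), and the sample values are all assumptions made up for illustration, not a final structure. The point is only that a PDF with 2 objects and a PDF with 100 objects can both be described by one self-contained record each, and that one record can carry a field the other lacks.

```python
def build_pdf_document(file_hash, objects, **extra):
    """Build one self-contained record for a PDF, embedding its objects.

    All field names here are hypothetical placeholders.
    """
    doc = {
        "hash": file_hash,
        "object_count": len(objects),
        "objects": list(objects),  # embedded list, can be any length
    }
    doc.update(extra)  # optional fields that only some PDFs have
    return doc


# A small PDF with two objects.
small = build_pdf_document(
    "hypothetical-hash-1",
    [{"id": 1, "type": "Catalog"}, {"id": 2, "type": "Page"}],
)

# A larger PDF with 100 objects and an extra field the small one lacks.
large = build_pdf_document(
    "hypothetical-hash-2",
    [{"id": i, "type": "Stream", "length": 100 + i} for i in range(1, 101)],
    javascript_found=True,
)

for doc in (small, large):
    print(doc["hash"], doc["object_count"], sorted(doc.keys()))

# With a running MongoDB server and the pymongo driver installed, storing
# these records would look roughly like this (the database and collection
# names are made up):
#
#   from pymongo import MongoClient
#   collection = MongoClient()["malware"]["pdfs"]
#   collection.insert_one(small)
#   collection.insert_one(large)
```

Because each record is just a nested dictionary, nothing about the storage layout has to change when a new characteristic is extracted; the trade-off is that consistency between records is no longer enforced by the database and has to be handled in code.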