This post contains a lot of information about targeted attacks and introduces several branching paths one could take to gather further information. It is worth detailing how this research came about so that readers can follow the paths taken and the choices made.
I find that there isn’t much focus on the vehicle, or dropper, used in attacks. In public analysis, more often than not it is ignored, or analyzed only after a write-up on the payload has been released. Separating out these details loses context and additional information that could further narrow down the attacker’s motives, goals and behavior. This post covers a mini research project built on small details shared between what later appeared to be a set of targeted attacks, spread over several months across multiple years.
Here is a direct link to the data referenced and shared (note: some columns, such as column K, are hidden; these mainly contain the MD5 listing of files from dynamic analysis of every sample):
C2s were extracted from the PCAPs using my PCAP Tools scripts and then verified manually. Only in a few cases did I find what appeared to be a working C2 or an attempt to post data back out to a server. Those cases looked similar to one another and could likely be used to identify the payload on the system. Attributes shared between files were color-coded in the spreadsheet so that it is easy to see when old attacks were being reused.
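The PCAP Tools scripts themselves are not reproduced here. As a rough illustration of this kind of extraction, here is a minimal stdlib-only sketch that pulls HTTP request lines and Host headers (where a callback or an outbound POST usually shows up) out of a capture. It assumes classic libpcap files (not pcapng) with Ethernet framing, IPv4 and unreassembled TCP segments; VLAN tags, IPv6, DNS and requests split across segments are all ignored. The function names are hypothetical.

```python
import struct
from typing import Iterator, List, Optional, Tuple

# Classic libpcap magic numbers (microsecond timestamps) as they appear on disk.
_PCAP_MAGIC = {b"\xd4\xc3\xb2\xa1": "<", b"\xa1\xb2\xc3\xd4": ">"}
LINKTYPE_ETHERNET = 1


def iter_frames(pcap: bytes) -> Iterator[bytes]:
    """Yield raw link-layer frames from a classic .pcap file (pcapng is not handled)."""
    endian = _PCAP_MAGIC.get(pcap[:4])
    if endian is None or len(pcap) < 24:
        raise ValueError("not a classic pcap file")
    (linktype,) = struct.unpack(endian + "I", pcap[20:24])
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")
    offset = 24  # size of the global header
    while offset + 16 <= len(pcap):
        _sec, _usec, incl_len, _orig_len = struct.unpack(endian + "IIII", pcap[offset:offset + 16])
        offset += 16
        yield pcap[offset:offset + incl_len]
        offset += incl_len


def tcp_payload(frame: bytes) -> Optional[Tuple[str, bytes]]:
    """Return (destination IPv4 address, TCP payload), or None for anything else."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # EtherType IPv4; VLAN tags not handled
        return None
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4
    if ip[9] != 6:  # IP protocol 6 = TCP
        return None
    total_len = struct.unpack("!H", ip[2:4])[0]
    tcp = ip[ihl:total_len]  # also trims Ethernet padding
    if len(tcp) < 20:
        return None
    data_offset = (tcp[12] >> 4) * 4
    dst = ".".join(str(b) for b in ip[16:20])
    return dst, tcp[data_offset:]


def http_requests(pcap: bytes) -> List[Tuple[str, str, str, str]]:
    """Collect (dst_ip, method, host, path) for GET/POST requests that start a TCP segment."""
    found = []
    for frame in iter_frames(pcap):
        parsed = tcp_payload(frame)
        if parsed is None:
            continue
        dst, payload = parsed
        if not payload.startswith((b"GET ", b"POST ")):
            continue
        head = payload.split(b"\r\n\r\n", 1)[0].decode("latin-1")
        request_line, *header_lines = head.split("\r\n")
        method, path = request_line.split(" ")[:2]
        host = ""
        for line in header_lines:
            name, _, value = line.partition(":")
            if name.strip().lower() == "host":
                host = value.strip()
        found.append((dst, method, host, path))
    return found
```

Typical use would be `http_requests(open("sample.pcap", "rb").read())`; any POST entries are the first candidates for the "post data back out" cases, and every result still needs the manual verification described above.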
Timeframes and Outside Open Intelligence
Now that I knew the files were connected in a few ways, I searched for information about each hash (when it was first seen, write-ups, etc.). A large portion of my files had some presence on Contagio. This provided me with the original emails, the dates these files were sent and who they were sent to, and in some cases it confirmed what I had seen dropped.
I did not factor in any of the email content, but doing so could have revealed even more trends. Once the files were sorted, a clear pattern emerged. Throughout 2010, targeted attacks were sent out almost continuously. Files sharing payloads, clean PDFs and C2s were easy to spot. In some cases the attackers changed nothing at all, even after months had passed.
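The color-coding described above can also be done programmatically. The following is a minimal sketch, assuming each sample is a dict with hypothetical fields `payload_md5`, `clean_pdf_md5`, `c2` (a string or a list) and a `sent` date such as one taken from Contagio. A union-find structure merges any two samples that share a value, so links are transitive: A and B sharing a payload plus B and C sharing a decoy PDF puts all three in one group. `date_span` then reports how long a group stayed in use.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Set, Tuple

# Hypothetical field names; adjust to whatever columns the spreadsheet actually uses.
LINK_FIELDS = ("payload_md5", "clean_pdf_md5", "c2")


def cluster_samples(samples: Dict[str, dict]) -> List[Set[str]]:
    """Group samples that share any linking field value, directly or transitively."""
    parent = {name: name for name in samples}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    first_seen_in: Dict[Tuple[str, str], str] = {}
    for name, record in samples.items():
        for field in LINK_FIELDS:
            values = record.get(field) or []
            if isinstance(values, str):
                values = [values]
            for value in values:
                key = (field, value)
                if key in first_seen_in:
                    # Merge this sample's group into the group that first had this value.
                    parent[find(name)] = find(first_seen_in[key])
                else:
                    first_seen_in[key] = name

    groups: Dict[str, Set[str]] = defaultdict(set)
    for name in samples:
        groups[find(name)].add(name)
    return list(groups.values())


def date_span(cluster: Set[str], samples: Dict[str, dict]) -> Tuple[date, date, int]:
    """First and last 'sent' date in a cluster, plus the gap between them in days."""
    dates = sorted(samples[name]["sent"] for name in cluster)
    return dates[0], dates[-1], (dates[-1] - dates[0]).days
```

A large day count on a cluster whose samples share an identical payload is exactly the "changed nothing after months" case.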
Thoughts and Observations
Command and Control
In some cases the command and control addresses overlapped, but more often they did not. However, several net blocks appeared more often than others, and in a few cases domains were reused. Given how much time has passed, I did little beyond observing the callbacks. Those with historical data may be able to make more sense of these details and identify more connections.
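Spotting recurring net blocks and reused domains can be sketched with Python's `ipaddress` module. This assumes IPv4 C2 addresses and uses a /24 as a stand-in for a "net block", which is an arbitrary choice; historical WHOIS or routing data would give the real allocation boundaries. Function names and input shapes (sample name mapped to its C2 values) are hypothetical.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def recurring_netblocks(c2_ips: Dict[str, Iterable[str]], prefix: int = 24,
                        min_samples: int = 2) -> Dict[str, List[str]]:
    """Map each IPv4 block of the given prefix length to the samples calling back into it,
    keeping only blocks seen in at least `min_samples` samples."""
    blocks: Dict[str, Set[str]] = defaultdict(set)
    for sample, ips in c2_ips.items():
        for ip in ips:
            # strict=False masks off the host bits, e.g. 203.0.113.7/24 -> 203.0.113.0/24
            net = ipaddress.IPv4Network(f"{ip}/{prefix}", strict=False)
            blocks[str(net)].add(sample)
    return {net: sorted(s) for net, s in blocks.items() if len(s) >= min_samples}


def reused_domains(c2_domains: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each C2 domain used by more than one sample to those samples.
    Domains are lower-cased and stripped of a trailing dot before comparison."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for sample, domains in c2_domains.items():
        for domain in domains:
            seen[domain.lower().rstrip(".")].add(sample)
    return {d: sorted(s) for d, s in seen.items() if len(s) > 1}
```

Counting distinct samples per block, rather than distinct addresses, keeps one noisy sample with many callbacks from inflating a block's importance.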
Many of the files sent had a policy or government theme. This was determined from the clean files dropped and some of the email information collected. Many of the files contained discussions of policy changes, threats regarding war readiness, human rights, and general news content copied from the web.
This post does not provide enough detail to link these attacks to a single group or entity. It does, however, provide a high-level view of the operations of some or many of the actors involved in infiltrating systems with the intent of remaining there for quite some time.
I am releasing my spreadsheet of data in the hope that others interested in this will look into it further. To avoid any conflicts, this data has not been shared elsewhere; it is now open to the public and free to use. I plan to spend more time looking at the final payloads dropped to see whether any commonalities exist between them, and will post updates if anything interesting is found.
This project exposed many issues with how this sort of work is done in the public space. You are often left to your own devices and to whatever others have preserved. In this case, the dates and times of these files were valuable for identifying clusters and payload reuse, but that information was still limited. Unfortunately, there is no single source one can go to for this information. You often walk away with only a small portion of the attack and thus can’t make a full determination.
Having a large amount of potentially related data also raises the question of how it is stored and queried. Ideally this project’s data would have gone into some sort of database, but I didn’t feel the planning involved was worth the effort. It should be noted that a schema for all this data is very difficult to develop and maintain. For that reason alone I lean towards a NoSQL database yet again, but more thought needs to go into the problem as a whole before attempting to solve it.
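One possible middle ground, shown here only as a sketch and not as a NoSQL database, uses SQLite from Python's standard library: each sample is stored as a schemaless JSON document, so new fields never require a schema change, and only the values used for linking go into a narrow index table that is cheap to join against. The table layout, the indexed field names and the helper functions are all hypothetical.

```python
import json
import sqlite3
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS samples (
    id  TEXT PRIMARY KEY,
    doc TEXT NOT NULL              -- full record as JSON; fields may vary per sample
);
CREATE TABLE IF NOT EXISTS indicators (
    kind      TEXT NOT NULL,       -- e.g. 'c2', 'payload_md5', 'clean_pdf_md5'
    value     TEXT NOT NULL,
    sample_id TEXT NOT NULL REFERENCES samples(id),
    PRIMARY KEY (kind, value, sample_id)
);
"""

INDEXED_KINDS = ("c2", "payload_md5", "clean_pdf_md5")  # hypothetical linking fields


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_sample(conn: sqlite3.Connection, sample_id: str, doc: dict) -> None:
    """Store (or replace) a sample document and re-index its linking values."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR REPLACE INTO samples (id, doc) VALUES (?, ?)",
                     (sample_id, json.dumps(doc)))
        conn.execute("DELETE FROM indicators WHERE sample_id = ?", (sample_id,))
        for kind in INDEXED_KINDS:
            values = doc.get(kind) or []
            if isinstance(values, str):
                values = [values]
            for value in values:
                conn.execute("INSERT OR IGNORE INTO indicators (kind, value, sample_id) "
                             "VALUES (?, ?, ?)", (kind, value, sample_id))


def get_sample(conn: sqlite3.Connection, sample_id: str) -> Optional[dict]:
    row = conn.execute("SELECT doc FROM samples WHERE id = ?", (sample_id,)).fetchone()
    return json.loads(row[0]) if row else None


def related_samples(conn: sqlite3.Connection, sample_id: str) -> List[Tuple[str, str, str]]:
    """Return (other_sample, kind, shared_value) for every indicator shared with sample_id."""
    rows = conn.execute(
        """
        SELECT DISTINCT other.sample_id, other.kind, other.value
        FROM indicators AS mine
        JOIN indicators AS other
          ON other.kind = mine.kind AND other.value = mine.value
        WHERE mine.sample_id = ? AND other.sample_id != ?
        ORDER BY other.sample_id, other.kind
        """,
        (sample_id, sample_id),
    )
    return rows.fetchall()
```

The trade-off is that only the indexed kinds can be joined efficiently; anything else in the JSON document has to be filtered in Python, which is acceptable at spreadsheet scale.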
Attribution is difficult, but finding commonalities and shared resources between attacks like these makes grouping much easier. Trends are impossible to glean without data, and the best analysis and connection-making seem to happen when data is plentiful. Those interested in joining this fight and making a difference should start thinking about open ways of storing all this data and easily identifying the relationships within it. Questions, comments, feedback and emails are welcome.