Tuesday, 29 October 2013

Researchers Isolate Blackhole Exploit Kit Symptoms, Pinpoint Infected Twitter Accounts

Blackhole Study Cluster Graph
If you wanted to research how a program could distinguish malicious email messages from ordinary mail, you'd want to analyze millions of real-world samples, bad and good. However, unless you have a friend at the NSA, you'd have a hard time getting those samples. Twitter, on the other hand, is a broadcast medium. Virtually every tweet is visible to anyone who's interested. Professor Jeanna Matthews and Ph.D. student Joshua White at Clarkson University leveraged this fact to discover a reliable identifier for tweets generated by the Blackhole Exploit Kit. Their presentation was recognized as the best paper at the 8th International Conference on Malicious and Unwanted Software (Malware 2013 for short).
Anybody with an urge to send spam, create an army of bots, or steal personal information can get started by purchasing the Blackhole Exploit Kit (BEK). Matthews reported that one estimate suggests the BEK was involved in more than half of all malware infestations in 2012. Another report ties the BEK to 29 percent of all malicious URLs. Despite the recent arrest of Blackhole's alleged author, the kit remains a significant problem, and one of its many ways of spreading involves taking over Twitter accounts. The infected accounts send tweets containing links that, if clicked, claim their next victim.
Below the Line
Matthews and White collected multiple terabytes of data from Twitter over the course of 2012. Matthews estimates that their data set contains from 50 to 80 percent of all tweets during that time. What they got was much more than just 140 characters per tweet. Each tweet's JSON header contains a wealth of information about the sender, the tweet, and its connection with other accounts.
They started with a simple fact: some BEK-generated tweets include specific phrases like "It's you on photo?" or more provocative phrases like "You were nude at party) cool photo)." By mining the huge dataset for these known phrases, they identified infected accounts. This in turn let them turn up new phrases and other markers of BEK-generated tweets.
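To make that first step concrete, here is a minimal sketch of phrase-based seeding. It is not the researchers' code. It assumes each tweet arrives as one JSON object per line with a "text" field and a nested "user" object carrying a "screen_name"; the function names and the normalization step are illustrative choices, and the two phrases are the ones quoted above.

```python
import json

# The two known phrases quoted in the article, lowercased for matching.
SEED_PHRASES = (
    "it's you on photo?",
    "you were nude at party) cool photo)",
)


def normalize(text):
    """Lowercase, straighten curly apostrophes, and collapse whitespace."""
    return " ".join(text.lower().replace("\u2019", "'").split())


def find_seed_accounts(tweet_lines, phrases=SEED_PHRASES):
    """Map screen name -> list of that account's tweets containing a known phrase.

    tweet_lines is any iterable of JSON strings, one tweet per string
    (an assumed layout, not necessarily the one used in the study).
    """
    flagged = {}
    for line in tweet_lines:
        tweet = json.loads(line)
        text = normalize(tweet.get("text", ""))
        if any(phrase in text for phrase in phrases):
            name = tweet["user"]["screen_name"]
            flagged.setdefault(name, []).append(tweet["text"])
    return flagged
```

The other tweets from the flagged accounts are then the natural place to look for new phrases and markers, which is the bootstrapping loop the paragraph above describes.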
The paper itself is scholarly and complete, but the end result is quite simple. They developed a metric that, when applied to the output of a given Twitter account, could reliably separate infected accounts from clean ones. If the account scores above a certain threshold, it's fine; if it falls below that line, it's infected.
Who Infected Who?
With this clear method for distinguishing infected accounts in place, they went on to analyze the contagion process. Suppose account B, which is clean, follows account A, which is infected. If account B becomes infected shortly after a BEK post by account A, chances are very good that account A was the source. The researchers modeled these relationships in a cluster graph that very clearly showed a small number of accounts causing huge numbers of infections. These are accounts set up by a Blackhole Exploit Kit owner specifically for the purpose of spreading infection.
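The attribution rule can be sketched in a few lines. This is an illustration under stated assumptions, not the model from the paper: `follows` and `bek_posts` are hypothetical inputs (who follows whom, and the timestamps in seconds of each account's BEK-flagged tweets), the 24-hour window is an arbitrary stand-in for "shortly after", and an account's infection time is taken to be its earliest BEK-flagged tweet.

```python
from collections import Counter


def likely_sources(follows, bek_posts, window_seconds=24 * 3600):
    """Return (source, victim) edges for probable infections.

    follows:   dict mapping account -> set of accounts it follows
    bek_posts: dict mapping account -> list of timestamps (seconds) of
               its BEK-flagged tweets
    A followed account is a candidate source if it posted a BEK tweet
    within window_seconds before the victim's first BEK tweet; the
    candidate with the most recent such post is chosen.
    """
    edges = []
    for victim, times in bek_posts.items():
        if not times:
            continue
        infected_at = min(times)
        best = None  # (timestamp, account) of the closest preceding post
        for followed in follows.get(victim, ()):
            for posted_at in bek_posts.get(followed, ()):
                if 0 < infected_at - posted_at <= window_seconds:
                    if best is None or posted_at > best[0]:
                        best = (posted_at, followed)
        if best is not None:
            edges.append((best[1], victim))
    return edges


def top_spreaders(edges, n=3):
    """Count how many infections each source is credited with."""
    return Counter(source for source, _ in edges).most_common(n)
```

Accounts that top the `top_spreaders` count correspond to the hubs in the cluster graph: a handful of sources credited with a large share of the infections.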
Matthews noted that at this point they had the capability to notify users whose accounts are infected, but they felt this could be seen as too invasive. She's working on getting together with Twitter to see what can be done.
Modern data mining and big-data analysis techniques allow researchers to find patterns and relationships that would have been simply impossible to uncover just a few years ago. Not every quest for knowledge pays off, but this one did, in spades. I sincerely hope Professor Matthews manages to get Twitter interested in a practical application of this research.
