Zeus. Poison Ivy. Conficker. Stuxnet. WannaCry. Even years after discovery, the names of these malware families are still infamous. But new digital threats are constantly arising, and malware production is booming. That means network defenders must learn to categorize newly discovered malware in a blink. To succeed, they’ll need the right tools. Until now, the reliably labeled data needed to build and test those tools has been unavailable. Booz Allen’s new dataset will help cybersecurity teams accurately analyze malware faster than ever.
The ability to quickly pin down the family of malware used during a cyber attack can be a massive boon to an incident responder. Family classification not only provides immediate insight into the characteristics and behaviors of a malware sample, it is also a core part of triage, remediation, and attribution efforts. But figuring all this out quickly under pressure is hard. Organizations need new tools that automate malware family classification so that defenders can swiftly understand the nature of a threat and take action. Building those tools, in turn, requires better data.
The lack of reliably labeled data is a major obstacle to the development of any malware family classification tool. One reason is that manual analysis is the only way to be sure which family a particular sample belongs to; only labels derived this way are said to have “ground truth” confidence. Such analysis is very time-consuming for even a single file, so nearly all datasets label malware with less reliable methods (such as relying on antivirus products).
Using low-quality labels to judge the performance of a malware family classifier can lead to biased or misleading evaluation results, and that’s a big problem. A cybersecurity team charged with defending an organization must be able to trust its analysis toolset. To enable high-confidence benchmarking of malware classification tools, Booz Allen has created the Malware Open-source Threat Intelligence Family (MOTIF) dataset.
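To make the evaluation problem concrete, here is a minimal Python sketch of how the same classifier can score very differently depending on the labels it is judged against. Everything in it is an assumption made for illustration: the eight samples, the "antivirus-derived" labels, and the classifier predictions are invented, and none of it comes from MOTIF or any real product.

```python
# Illustrative only: all labels below are made up, not drawn from MOTIF
# or from any real antivirus product or classifier.

def accuracy(predicted, reference):
    """Fraction of samples where the prediction matches the reference label."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)

# Hypothetical ground-truth families, as manual analysis would assign them.
ground_truth = ["zeus", "zeus", "conficker", "wannacry",
                "stuxnet", "zeus", "conficker", "wannacry"]

# Hypothetical lower-confidence labels; samples 1, 4, and 7 are mislabeled.
noisy_labels = ["zeus", "conficker", "conficker", "wannacry",
                "zeus", "zeus", "conficker", "stuxnet"]

# Hypothetical classifier output that happens to repeat the same mistakes.
predictions = ["zeus", "conficker", "conficker", "wannacry",
               "zeus", "zeus", "wannacry", "stuxnet"]

accuracy_vs_truth = accuracy(predictions, ground_truth)
accuracy_vs_noisy = accuracy(predictions, noisy_labels)

print(f"Accuracy against ground truth: {accuracy_vs_truth:.3f}")
print(f"Accuracy against noisy labels: {accuracy_vs_noisy:.3f}")
```

In this toy case the classifier is right on only half of the samples, yet it appears to be right on seven of eight when scored against the noisy labels, because it shares their errors. A benchmark built on ground-truth labels avoids rewarding that kind of agreement.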