Universal Adversarial Perturbations for Malware
Machine learning has become a valuable tool for many security applications, making it possible to predict new threats and attacks by learning patterns from data. However, its popularity also brings new risks, as machine learning algorithms have been shown to have vulnerabilities of their own. This offers attackers new opportunities to evade detection by exploiting the weaknesses of machine learning systems. For instance, in 2017 the Cerber ransomware family was found to include specific capabilities to evade detection by the machine learning components present in some antivirus products. Understanding the vulnerabilities of machine learning algorithms is therefore fundamental to mitigating the effect of possible attacks and to developing more robust systems.
The adversarial machine learning research community has shown that these algorithms are vulnerable to adversarial examples: inputs that are indistinguishable from genuine data points but capable of producing errors and evading detection. Analysing the impact of this threat on cyber security applications requires careful modelling of the adversary’s capabilities to manipulate objects in the input space rather than in the feature representation used internally by the machine learning algorithms. For example, in the context of malware detection, many adversarial examples found in the feature space (the space of features used by the learning algorithms) do not correspond to valid or achievable modifications in the problem space, i.e. they may not result in valid manipulations of the software, or they may produce non-functional malware. Thus, characterising the attacker’s capabilities for specific cyber security applications is essential to better understand these vulnerabilities and to mitigate them more effectively [1].
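To make this distinction concrete, the following sketch (a toy illustration of our own, not a model taken from [1]) contrasts an unconstrained feature-space perturbation with a problem-space-style manipulation, under the simplifying assumption of binary features where realistic changes can only add content (e.g. unused API calls) and never remove features the program needs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary feature vector of one malware sample
# (1 = feature present, e.g. a permission or an API call).
x = rng.integers(0, 2, size=20)

# Feature-space perturbation: arbitrary bit flips, which may also remove
# features the program actually needs, so the result may not correspond
# to working software.
delta = rng.integers(0, 2, size=20)
x_feature_adv = x ^ delta                 # 1s can become 0s

# Problem-space-style manipulation: only additions are allowed
# (e.g. injecting benign, unused code), so existing features survive.
x_problem_adv = np.clip(x + delta, 0, 1)  # only 0 -> 1 flips

print("features removed by feature-space attack:",
      int(np.sum((x == 1) & (x_feature_adv == 0))))
print("features removed by problem-space attack:",
      int(np.sum((x == 1) & (x_problem_adv == 0))))  # always 0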
Understanding the systemic vulnerabilities
Our research teams at Imperial College London and the Research Institute CODE, in collaboration with the Systems Security Research Lab (S2Lab) at King’s College London, are investigating the systemic vulnerabilities of machine learning algorithms to adversarial examples for malware detection. To this end, we have analysed the impact of Universal Adversarial Perturbations (UAPs) on these classifiers [2]. UAPs are a class of adversarial examples in which the same perturbation or manipulation, applied to many different inputs, induces errors in the target machine learning system. In this context, UAPs enable attackers to cheaply reuse the same collection of predefined perturbations and evade detection for different types of input malware with high probability. UAP attacks therefore represent a significant threat, as attackers naturally gravitate towards low-effort/high-reward strategies to maximise profit, and they are a natural fit for the Malware-as-a-Service (MaaS) business model. Our experimental analyses, in the Windows and Android domains, show that unprotected machine learning classifiers for malware detection are highly vulnerable to UAP attacks, both in the feature space and in the problem space, even if the attacker has limited knowledge of the target algorithm. Our empirical results show that applying the same type of transformations to different malware inputs generates functional malware capable of evading detection with high probability.
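The following sketch (a toy set-up of our own, not the experimental pipeline from [2]) illustrates the core idea behind a feature-space UAP: a single add-only perturbation, chosen greedily so that it causes a linear malware classifier to miss as many samples as possible at once.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features = 30

# Toy linear malware classifier: score > 0 means "malware detected".
w, b = rng.normal(size=n_features), -0.5

# Toy binary feature vectors, keeping only samples the classifier flags.
X = rng.integers(0, 2, size=(200, n_features)).astype(float)
X = X[X @ w + b > 0]

def evasion_rate(mask):
    """Fraction of samples evading detection after adding the features in mask."""
    X_adv = np.clip(X + mask, 0, 1)       # add-only manipulation
    return float(np.mean(X_adv @ w + b <= 0))

# Greedy universal perturbation: repeatedly add the single feature that
# maximises the evasion rate across *all* malware samples simultaneously.
mask = np.zeros(n_features)
for _ in range(5):                        # budget of five added features
    best = max(range(n_features),
               key=lambda j: evasion_rate(np.clip(mask + np.eye(n_features)[j], 0, 1)))
    mask[best] = 1.0

print("universal perturbation adds features:", np.flatnonzero(mask))
print(f"evasion rate with one shared perturbation: {evasion_rate(mask):.2f}")
```

The same small mask evades detection for many distinct samples at once, which is what makes UAP attacks so cheap for an attacker to reuse.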
Patching the vulnerabilities
Adversarial training has proven to be one of the most effective approaches to defend against these threats. The idea is to use adversarial examples during the training of the learning algorithms to make them robust to adversarial examples and UAP attacks. However, protecting against multiple types of perturbations at the same time is challenging. Adversarial training strategies in the related work typically consider adversarial examples in the feature space. In the context of malware detection, this is not necessarily the best strategy: most of the adversarial examples used to train the learning algorithm are unrealistic, as attackers cannot reproduce them in the problem space. In other words, adversarial training in the feature space patches vulnerabilities that do not necessarily exist in the problem space and can therefore offer only limited protection in practical settings. To overcome this limitation, we have proposed a novel approach that performs adversarial training using knowledge of the UAP attacks carried out in the problem space [2]. In particular, by means of a probabilistic model, we transfer knowledge of the perturbations in the problem space to the feature space. This allows us to patch vulnerabilities that are plausible in the problem space, resulting in adversarially trained learning algorithms that offer a better level of protection against realistic attacks. Although this does not guarantee that the learning algorithms are robust to all possible perturbations in the problem space, the methodology can be applied incrementally to patch the algorithms when new adversarial transformations or vulnerabilities are found in the problem space.
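As a rough illustration of this idea, the sketch below (toy data and names are ours; the probabilistic model in [2] is more involved) augments the training set with copies of the malware samples perturbed only by feature-space changes that are assumed to be realisable in the problem space.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n_features = 30

# Toy data set: binary feature vectors with labels (1 = malware).
X = rng.integers(0, 2, size=(600, n_features)).astype(float)
w_true = rng.normal(size=n_features)
y = (X @ w_true > 0).astype(int)

# Hypothetical feature-space perturbations known to be realisable in the
# problem space (e.g. distilled from observed problem-space UAP attacks).
realisable = [np.eye(n_features)[j] for j in (3, 7, 12)]

def perturb(x):
    """Apply one randomly chosen realisable, add-only perturbation."""
    return np.clip(x + realisable[rng.integers(len(realisable))], 0, 1)

# Adversarial training: add perturbed copies of the malware samples, still
# labelled as malware, so the classifier learns to resist these UAPs.
X_adv = np.array([perturb(x) for x, label in zip(X, y) if label == 1])
X_train = np.vstack([X, X_adv])
y_train = np.concatenate([y, np.ones(len(X_adv))])

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"training accuracy with adversarial augmentation: {clf.score(X_train, y_train):.2f}")
```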
[1] F. Pierazzi, F. Pendlebury, J. Cortellazzi, L. Cavallaro. “Intriguing Properties of Adversarial ML Attacks in the Problem Space.” IEEE Symposium on Security and Privacy, pp. 1332-1349, 2020.
[2] R. Labaca-Castro, L. Muñoz-González, F. Pendlebury, G. Dreo Rodosek, F. Pierazzi, L. Cavallaro. “Universal Adversarial Perturbations for Malware.” arXiv preprint: arXiv:2102.06747, 2021.
(By Dr Luis Muñoz-González, Research Associate at Imperial College London, and Raphael Labaca-Castro, PhD Candidate at the Research Institute CODE)