Maxwell’s Demon in MLP-Mixer: towards transferable adversarial attacks
Cybersecurity volume 7, Article number: 6 (2024)
Abstract
Models based on MLP-Mixer architecture are becoming popular, but they still suffer from adversarial examples. Although it has been shown that MLP-Mixer is more robust to adversarial attacks compared to convolutional neural networks (CNNs), there has been no research on adversarial attacks tailored to its architecture. In this paper, we fill this gap. We propose a dedicated attack framework called Maxwell’s demon Attack (MA). Specifically, we break the channel-mixing and token-mixing mechanisms of the MLP-Mixer by perturbing inputs of each Mixer layer to achieve high transferability. We demonstrate that disrupting the MLP-Mixer’s capture of the main information of images by masking its inputs can generate adversarial examples with cross-architectural transferability. Extensive evaluations show the effectiveness and superior performance of MA. Perturbations generated based on masked inputs obtain a higher success rate of black-box attacks than existing transfer attacks. Moreover, our approach can be easily combined with existing methods to improve the transferability both within MLP-Mixer based models and to models with different architectures. We achieve up to 55.9% attack performance improvement. Our work exploits the true generalization potential of the MLP-Mixer adversarial space and helps make it more robust for future deployments.
Introduction
Convolutional Neural Networks (CNNs) have become the de facto standard in the field of computer vision. Deep Neural Networks (DNNs) based on CNNs, such as DenseNet (Huang et al. 2017), MobileNet (Sandler et al. 2018), EfficientNet (Tan and Le 2019) and ReXNet (Han et al. 2021), continue to improve classification performance. However, with the development of attention-based transformers in natural language processing, new vision models applying the transformer structure have emerged, such as ViT (Dosovitskiy et al. 2020), T2T-ViT (Yuan et al. 2021) and DeiT (Touvron et al. 2021). The performance of these models has caught up with CNNs and is challenging their position in computer vision. Further research showed that convolution and attention mechanisms are not strictly necessary for good performance and that MultiLayer Perceptrons (MLPs) alone can also perform well, which led to the proposal of MLP-Mixer (Tolstikhin et al. 2021).
DNNs are known to carry security risks and to be vulnerable to adversarial examples: an adversary adds a well-designed, imperceptible perturbation to a clean input, leading the DNN to an incorrect result. Given these risks, it is important to understand whether the recently proposed ViTs and MLP-Mixer are vulnerable to adversarial attacks. The adversarial transferability of ViTs has been well studied (Naseer et al. 2021). In contrast, MLP-Mixer has not been carefully studied in the black-box adversarial setting, and there is no research on the transferability of adversarial attacks against MLP-Mixer. In this work, we focus specifically on transfer-based adversarial attacks and study how to improve the transferability of adversarial examples generated by MLP-Mixer.
Our analysis of MLP-Mixer is based on the following findings. MLP-Mixer differs in architecture from CNNs. Similar to ViTs, MLP-Mixer uses image patches as input, but it uses neither convolution nor attention mechanisms. Instead, its architecture is based entirely on MLPs. MLP-Mixer contains two types of layers: token-mixing MLPs, which mix spatial location information, and channel-mixing MLPs, which mix channel information. Information from different patches and channels can thus be fully mixed, enabling MLP-Mixer to capture the main information of the image. Disturbing this information-mixing mechanism can prevent the generated adversarial examples from overfitting the source model, which can improve their cross-architecture transferability.
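For reference, the two mixing steps can be written as in the MLP-Mixer paper (Tolstikhin et al. 2021). The notation below is borrowed from that paper and is not used elsewhere in this article: \(X \in {\mathbb {R}}^{S \times C}\) is the table of S patch embeddings with C channels, \(\sigma\) is an element-wise nonlinearity (GELU), and \(W_1, \ldots , W_4\) are the weights of the two MLPs.

\[U_{*,i} = X_{*,i} + W_2\,\sigma \bigl (W_1\,\text {LayerNorm}(X)_{*,i}\bigr ), \quad i = 1, \ldots , C,\]
\[Y_{j,*} = U_{j,*} + W_4\,\sigma \bigl (W_3\,\text {LayerNorm}(U)_{j,*}\bigr ), \quad j = 1, \ldots , S.\]

The first line is the token-mixing MLP, which acts on each column of the table and therefore mixes information across spatial locations. The second line is the channel-mixing MLP, which acts on each row and therefore mixes information across channels.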
We propose a novel adversarial attack called Maxwell’s demon Attack (MA). The channel-mixing MLPs allow communication between different channels, and the token-mixing MLPs allow communication between different spatial locations. By randomly masking the inputs of each Mixer layer of MLP-Mixer, we break its channel-mixing and token-mixing mechanisms, making it impossible for different locations and different channels to communicate normally, so that MLP-Mixer cannot capture the main information of the image. This achieves an effect similar to Dropout (Srivastava et al. 2014): it prevents the generated adversarial examples from overfitting the MLP-Mixer and improves their fooling rate on the target models. As shown in Fig. 1, compared to the original method, our method further forces the target model to focus on the wrong regions of the adversarial examples.
Our proposed MA method is a detachable component that can be easily combined with existing methods. We conduct extensive experiments on models with multiple architectures using the ImageNet validation set. The adversarial examples generated by our method improve the fooling rate by up to 55.9%. Our work opens new perspectives for exploring the vulnerabilities of MLP-Mixer and explaining transfer attacks across architectures. It provides insights for enhancing the robustness of MLP-Mixer to support its future deployment.
In summary, our main contributions are as follows:
- We analyze the channel-mixing and token-mixing mechanisms in MLP-Mixer and generate adversarial examples with high transferability by breaking them. The proposed MA is applicable to any model based on the MLP-Mixer architecture.
- Our approach can be combined with existing attacks and significantly improves their performance, addressing the inability of existing attacks to transfer across architectures. Our results demonstrate the feasibility of cross-architecture black-box attacks.
- We conducted transfer attack experiments using 2 different white-box MLP-Mixers against 7 black-box MLPs, 10 black-box ViTs, and 39 CNNs. Across these extensive experimental evaluations, our method achieves the best performance.
Related work
Adversarial attack and transferability
Adversarial attacks are divided into white-box attacks and black-box attacks. White-box attacks, such as FGSM (Goodfellow et al. 2014) and PGD (Madry et al. 2017), require access to all information about the target model. Black-box attacks do not need to know the target model’s internals, and the mainstream approach is the transfer-based attack. A transfer-based attack requires a substitute model that is similar to the target model; adversarial examples are generated by a white-box attack on the substitute model, and the target model is then attacked by virtue of their transferability. The goal of transfer-based attacks is therefore to improve the transferability of the adversarial examples. MIM (Dong et al. 2018) enhances the transferability of FGSM by adding a momentum term to the gradient. DIM (Xie et al. 2019) improves transferability by creating diverse input patterns. TIM (Dong et al. 2019) improves transferability by convolving the gradient with a predefined kernel. Our method can be easily combined with these existing methods to further improve the transferability of adversarial examples.
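To make the momentum idea concrete, the following is a minimal sketch of an MIM-style iterative update under an \(l_{\infty }\) budget, written in plain Python on flat lists. The function name `mim_attack`, the `grad_fn` callback (standing in for the gradient of the loss with respect to the input on the white-box source model) and the step size `eps / steps` are illustrative assumptions, not details taken from this paper.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def mim_attack(x, grad_fn, eps, steps, mu=1.0):
    """Momentum iterative sign-gradient attack (sketch).

    x       : clean input as a flat list of floats
    grad_fn : hypothetical callback returning dLoss/dInput as a list
    eps     : l_inf perturbation budget
    steps   : number of iterations
    mu      : momentum decay factor
    """
    alpha = eps / steps              # assumed per-step size
    x_adv = list(x)
    g = [0.0] * len(x)               # accumulated (momentum) gradient
    for _ in range(steps):
        grad = grad_fn(x_adv)
        l1 = sum(abs(v) for v in grad) or 1.0   # avoid division by zero
        # Accumulate the L1-normalised gradient into the momentum term.
        g = [mu * gi + gr / l1 for gi, gr in zip(g, grad)]
        # Ascend the loss along the sign of the momentum term.
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project back into the l_inf ball of radius eps around x.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


# Toy usage: a loss that is linear in the input has a constant gradient.
w = [1.0, -2.0, 0.5]
x_adv = mim_attack([0.0, 0.0, 0.0], lambda z: w, eps=0.3, steps=10)
```

Setting `mu=0` reduces this sketch to a plain iterative sign-gradient attack. A real image attack would additionally clip `x_adv` to the valid pixel range after each step.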
Robustness of new architectures
Regarding the robustness of new architectures, we mainly focus on ViTs and MLP-Mixer. Benz et al. (2021) investigated the adversarial robustness of ViTs and MLP-Mixer and found that MLP-Mixer is vulnerable to universal adversarial perturbations, but they did not explore its adversarial transferability. To our knowledge, there is currently no work investigating the transferability of adversarial examples generated by MLP-Mixer. Naseer et al. (2021) introduced two strategies to enhance the transferability of adversarial examples generated by ViTs. The first, called Self-Ensemble, uses the output of each ViT block to generate adversarial examples; the second, called Token Refinement, trains a classifier head for each ViT block and uses the output of each head to generate adversarial examples. We tried applying these two strategies directly to MLP-Mixer, but the effect was not significant. After our modification and the introduction of our proposed MA, the transferability of the adversarial examples generated by MLP-Mixer is substantially improved.
Methodology
Consider a clean image sample \(x \in X\) with ground-truth label \(y \in Y\), a source model \({\mathcal {F}}(x): X \rightarrow Y\), and a target model \({\mathcal {M}}\) under attack. We focus on untargeted adversarial attacks: the goal of the transfer-based attack is to generate, using the information of the source model \({\mathcal {F}}\), an adversarial example \(x_{adv}\) that changes the target model’s prediction (\({\mathcal {M}}(x_{adv})\ne y\)). To keep the adversarial example imperceptible to the human eye, its modification relative to the clean sample must be bounded; we use the \(l_{\infty }\) norm for this restriction, i.e., \(\Vert x_{adv}-x \Vert _{\infty } < \epsilon\). The optimization problem of generating the adversarial example is defined as follows:
\[\mathop {\arg \max }\limits _{x_{adv}} J({\mathcal {F}}(x_{adv}), y), \quad \text {s.t.} \ \Vert x_{adv}-x \Vert _{\infty } < \epsilon , \tag{1}\]
where \(J(\cdot ,\cdot )\) is the loss function (e.g. cross-entropy).
An MLP-Mixer model \({\mathcal {F}}\) with n Mixer layers can be defined as:
\[{\mathcal {F}}(x) = f_c\bigl (l_n(\cdots l_2(l_1(x))\cdots )\bigr ), \tag{2}\]
where \(l_i\) represents a single Mixer layer comprising a token-mixing layer and a channel-mixing layer, and \(f_c\) is the final classification head.
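Our MA method controls the input of each Mixer layer. We multiply the input I of each Mixer layer of MLP-Mixer by a masking matrix M, which can be defined as follows:
\[\tilde{I} = \begin{cases} I \odot M, & \text {with probability } P, \\ I, & \text {with probability } 1-P, \end{cases} \qquad M_{ij} \sim \text {Bernoulli}(p), \tag{3}\]
where \(\odot\) represents the element-wise product. With probability P, we mask the input of each Mixer layer; M is a matrix of the same shape as I whose entries are drawn directly from the Bernoulli distribution with parameter p.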
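The masking step described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the authors' implementation: the name `ma_mask` is hypothetical, a Bernoulli draw of 1 is taken to mean "keep the entry", and no Dropout-style rescaling of the surviving entries is applied because the text does not mention one. The default values of `P` and `p` are the ones selected later in the experiments.

```python
import random


def ma_mask(layer_input, P=0.7, p=0.8, rng=random):
    """Randomly mask one Mixer-layer input (sketch).

    layer_input : list of rows, e.g. tokens x channels
    P           : probability that masking is applied at all
    p           : Bernoulli parameter; each entry of M is 1 with probability p
                  (assumed here to mean "keep")
    rng         : object with a random() method, e.g. random.Random(seed)
    """
    if rng.random() >= P:
        # With probability 1 - P the input passes through unchanged.
        return [row[:] for row in layer_input]
    # Otherwise multiply element-wise by a freshly drawn 0/1 matrix M.
    return [[v * (1.0 if rng.random() < p else 0.0) for v in row]
            for row in layer_input]


rng = random.Random(0)
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
masked = ma_mask(x, P=0.7, p=0.8, rng=rng)
```

Because every Mixer layer receives an input of the same shape, the same function can be called before each layer with a fresh random draw.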
As shown in Fig. 2, our MA method controls the input of each Mixer layer. By masking part of the input, we destroy the channel-mixing and token-mixing mechanisms of MLP-Mixer, thereby improving the transferability of adversarial examples against MLP-based models. Masking the input of each Mixer layer also achieves a Dropout-like effect. Unlike Dropout, which drops neurons, our method drops parts of the feature maps of each layer. This prevents adversarial examples from overfitting the source MLP-Mixer model, thereby improving their transferability to non-MLP models such as CNNs and ViTs.
Our method benefits from the fact that each Mixer layer structure of MLP-Mixer is the same, so it only needs to generate a masking matrix of one size, thus saving computational overhead, which is not possible in most CNNs. Our method is a detachable component that can be easily combined with existing gradient-based methods, such as PGD, MIM, DIM, TIM.
Our method can also be combined with the Self-Ensemble (SE) and Token Refinement (TR) methods that attack ViTs. SE and TR are also components that can be combined with gradient-based methods. SE obtains the output of each block of a ViT and feeds each output into the final classifier head. After obtaining all the outputs of the classifier head, SE calculates their average as the input of the loss function. We transplant SE into MLP-Mixer and combine it with our MA. The SE combined with MA method can be defined as follows:
\[{\mathcal {F}}_k(x) = f_c\bigl (\tilde{l}_k(\cdots \tilde{l}_2(\tilde{l}_1(x))\cdots )\bigr ), \qquad {\mathcal {F}}_{SE}(x) = \frac{1}{n}\sum _{k=1}^{n}{\mathcal {F}}_k(x), \tag{4}\]
where \(\tilde{l}_i\) denotes the Mixer layer \(l_i\) applied to its masked input and \(f_c\) is the final classifier head.
Specifically, as shown in Algorithm 1, when MA is combined with the SE method, the input of each Mixer layer is masked and the output of each Mixer layer is fed to the final classifier head, thus generating a Self-Ensemble of n MLP-Mixer networks with different depths. Finally, we collect all the outputs of the classifier head, calculate their average, and perform backpropagation to obtain the gradient used to update the adversarial example.
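TR trains a classifier head for each block of a ViT, uses the output of each classifier head to calculate a loss value, and then averages all the loss values as the final loss value. We also transplant TR into MLP-Mixer and combine it with our MA. The TR combined with MA method can be defined as follows:
\[J_{TR}(x, y) = \frac{1}{n}\sum _{k=1}^{n} J\bigl (f_c^k\bigl (\tilde{l}_k(\cdots \tilde{l}_2(\tilde{l}_1(x))\cdots )\bigr ), y\bigr ), \tag{5}\]
where \(\tilde{l}_i\) denotes the Mixer layer \(l_i\) applied to its masked input, \(J\) is the loss function, and \(f_c^k\) is the classifier head we trained for the k-th Mixer layer. The algorithm combining MA with SE and TR is shown in Algorithm 1.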
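The difference between the two aggregation rules can be illustrated with a short sketch: SE averages the classifier outputs and computes one loss, whereas TR computes one loss per head and averages the losses. The function names and the toy logits below are hypothetical, and the code only covers the aggregation step; it does not model the Mixer layers, the masking, or the trained heads.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of a single example, computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def se_loss(head_outputs, label):
    """Self-Ensemble style: average the logits first, then take one loss."""
    n = len(head_outputs)
    mean_logits = [sum(col) / n for col in zip(*head_outputs)]
    return cross_entropy(mean_logits, label)


def tr_loss(head_outputs, label):
    """Token-Refinement style: one loss per head, then average the losses."""
    return sum(cross_entropy(z, label) for z in head_outputs) / len(head_outputs)


# Hypothetical logits from n = 3 depths for a 3-class problem.
outputs = [[2.0, 0.5, -1.0], [0.1, 0.3, 0.0], [1.5, 1.4, -0.2]]
print(se_loss(outputs, 0), tr_loss(outputs, 0))
```

Since cross-entropy is convex in the logits, the loss of the averaged logits never exceeds the average of the per-head losses, so the two rules generally produce different gradients even when given the same outputs.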
Experiments
In this section, experimental results of the proposed method are presented. The experimental settings are introduced in “Settings” section, the experimental results of attacking different architecture models are introduced in “Improve transferability to MLP-based models”–“Attack against defense approaches” sections, and the parameter selection and ablation experiments are introduced in “Effect of probability values” and “Ablation study” sections respectively.
Settings
We choose Mixer-B/16 and Mixer-L/16 in MLP-Mixer as source models. For the target models, we report experimental results on the following. Based on the CNN architecture: VGG (VGG-13, VGG-16, VGG-19) (Simonyan and Zisserman 2014), ResNet (ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152) (He et al. 2016), DenseNet (DenseNet-121, DenseNet-161, DenseNet-169, DenseNet-201) (Huang et al. 2017), ReXNet (ReXNetV1-10, ReXNetV1-13, ReXNetV1-15, ReXNetV1-20, ReXNetV1-30) (Han et al. 2021) and MobileNet-V2. After adversarial training: ResNet152-denoise (Xie et al. 2019), ResNet50-FreeAT (Shafahi et al. 2019), ResNet50-FastAT (Wong et al. 2020) and EfficientNet (AdvEfficientNet-b0, AdvEfficientNet-b1, AdvEfficientNet-b2) (Tan and Le 2019). Based on the transformer architecture: ViT-B/16 (Dosovitskiy et al. 2020), T2T-ViT (T2T-12, T2T-14, T2T-19) (Yuan et al. 2021) and DeiT-B (Touvron et al. 2021). Based on the MLP architecture: ResMLP-36 (Touvron et al. 2021). These models are provided by TIMM (Wightman 2019); the experimental results of more models can be found in the supplementary material. We randomly selected 1k samples from the ImageNet validation set that are correctly classified by all the above models. We use the fooling rate to assess the transferability of adversarial examples, i.e. the percentage of adversarial examples whose predicted labels on the target model are inconsistent with the ground-truth labels. We uniformly set the perturbation budget \(\epsilon\) to 16, and the number of attack iterations T to 50.
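The fooling rate defined above amounts to a simple count. The sketch below uses a hypothetical helper name and made-up labels purely for illustration.

```python
def fooling_rate(predicted_labels, true_labels):
    """Percentage of adversarial examples whose predicted label differs from
    the ground-truth label on the target model."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one example is required")
    fooled = sum(p != t for p, t in zip(predicted_labels, true_labels))
    return 100.0 * fooled / len(true_labels)


# Toy usage with made-up labels: two of four predictions are wrong.
rate = fooling_rate([3, 7, 1, 0], [3, 2, 1, 5])
```

Because the evaluation samples are chosen so that every model classifies the clean images correctly, each mismatch counted here can be attributed to the adversarial perturbation rather than to an ordinary classification error.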
Improve transferability to MLP-based models
In this section, we discuss the experimental results of adversarial transferability between MLP-Mixer and black-box MLP-based models. The first two columns of Table 1 report the results of white-box and black-box attacks on the MLP-Mixer models. Since our method prevents adversarial examples from overfitting the source model, the fooling rate may drop slightly in white-box attacks. When attacking models of the same architecture, however, our method greatly improves the fooling rate. When DIM is combined with our method, adversarial examples generated on Mixer-L/16 improve the fooling rate on Mixer-B/16 by 55.9%.
The third column of Table 1 reports the results with the MLP-based ResMLP-36 as the target model. When the basic adversarial attack methods PGD, MIM, DIM and TIM are combined with our method, the adversarial examples generated on Mixer-B/16 improve the fooling rate on ResMLP-36 by about 20%. When DIM is combined with our method, adversarial examples generated on Mixer-L/16 improve the fooling rate on ResMLP-36 by 38.0%. The SE and TR methods combined with our method further improve the transferability of adversarial examples on ResMLP-36. These results demonstrate that, by masking the input of each Mixer layer, our method breaks the channel-mixing and token-mixing mechanisms of MLP-Mixer and thereby improves the transferability of adversarial examples to MLP-based models.
Improve transferability to CNNs
As shown in the last three columns of Table 1, we report the experimental results of CNN-based VGG-16, ResNet-50 and MobileNet-V2 as target models. PGD, MIM, DIM and TIM, when combined with our method, can improve the fooling rate of adversarial examples generated by Mixer-B/16 on VGG-16 by more than 20%.
It is worth noting that the SE and TR methods combined with our method improve the transferability by about 20% compared to the original methods on VGG-16, ResNet-50 and MobileNet-V2. Although SE and TR are not designed for CNNs, the fooling rate of the generated adversarial examples against CNNs is further improved after SE and TR are combined with our method. Fig. 1 shows adversarial examples generated on Mixer-B/16 together with Grad-CAM (Gildenblat 2021) images generated on VGG-16: compared to the original method, our method further forces the target model to focus on the wrong regions of the adversarial examples. This demonstrates the effectiveness and potential of our method.
Attack against defense approaches
As shown in Fig. 3, the results on the adversarially trained ResNet50-FastAT, ResNet50-FreeAT, AdvEfficientNet-b0 and ResNet152-denoise models demonstrate that our method can further overcome adversarial defense methods such as FastAT, FreeAT, adversarial training and feature denoising. Our attack method is effective not only against ordinary CNNs but also against robust models. PGD, MIM, DIM and TIM, as well as SE and TR, further improve the fooling rate of adversarial examples on the robust models when combined with our method.
The experimental results demonstrate that our MA method combined with existing adversarial attack methods comprehensively improves the transferability of adversarial examples on MLP-based models, transformer-based models, CNN-based models and robust models. This means that adversarial examples generated with our method on a single model transfer well to MLP-based, CNN-based and transformer-based models. Our method achieves the effect of using an ensemble model on a single model, but uses fewer resources and is faster.
Effect of probability values
As shown in Eq. 3, two probability values need to be set in our method: the probability P of whether to mask the input, and the probability p used to generate the masking matrix from the Bernoulli distribution. We report the mean fooling rate on each kind of model as these probabilities are adjusted; the source model is Mixer-B/16, and the attack method is a combination of MIM, TR and our MA method. We first test the probability P of whether to mask the input, with the probability p of the Bernoulli distribution initially set to 0.8. As shown by the solid line in Fig. 4, when P is 0.7, the fooling rate of the generated adversarial examples reaches its maximum on multiple models. We then test the probability p of the Bernoulli distribution, setting the probability P to 0.7. As shown by the dotted line in Fig. 4, the fooling rate of the generated adversarial examples on multiple models first rises and then declines, reaching its maximum when p is 0.8. We therefore set P to 0.7 and p to 0.8.
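The two-stage selection described above, tuning P with p held fixed and then tuning p with the chosen P, can be sketched as follows. The `evaluate` callback, the candidate grid and the toy scoring function are hypothetical stand-ins for running the attack and measuring the mean fooling rate.

```python
def sweep_probabilities(evaluate, grid, p_init=0.8):
    """Tune P first with p fixed, then tune p with the best P fixed.

    evaluate : hypothetical callback (P, p) -> mean fooling rate
    grid     : candidate probability values to try
    p_init   : value of p held fixed while P is tuned
    """
    best_P = max(grid, key=lambda P: evaluate(P, p_init))
    best_p = max(grid, key=lambda p: evaluate(best_P, p))
    return best_P, best_p


# Toy stand-in for the real evaluation, peaked at P = 0.7, p = 0.8.
def toy_score(P, p):
    return -((P - 0.7) ** 2) - (p - 0.8) ** 2


grid = [i / 10 for i in range(1, 11)]
best = sweep_probabilities(toy_score, grid)
```

A one-parameter-at-a-time search like this is cheaper than a full grid over both values, but it is not guaranteed to find the jointly optimal pair.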
Ablation study
We perform ablation experiments on our method. The attack method is a combination of MIM, TR and our method. The source model is Mixer-B/16, and the target model is DenseNet-201. The experimental results are shown in Fig. 5. When the probability P of whether to mask the input is removed, the fooling rate decreases compared to setting P to 0.7. Although the fooling rate flattens once the probability p of the Bernoulli distribution exceeds 0.9, it does not exceed the maximum value obtained at \(P=0.7\), which indicates that every part of our method contributes.
Conclusion
We propose a novel transfer-based attack, called Maxwell’s demon Attack (MA). By using MA to mask part of the input of each Mixer layer of the MLP-Mixer, we are able to greatly improve the transferability of its generated adversarial examples. On some CNN-based models, the adversarial examples generated by our method on the MLP-Mixer even exceed the transferability of the adversarial examples generated using CNNs. Our method can be simply combined with existing adversarial attack methods against CNNs and ViTs. We conduct experiments on models with multiple architectures, and the experimental results demonstrate the superiority of our method. To our knowledge, this is the first work to study the adversarial transferability of MLP-Mixer.
Availability of data and materials
Our data comes from the open source dataset ImageNet at http://image-net.org/.
References
Benz P, Ham S, Zhang C, Karjauv A, Kweon IS (2021) Adversarial robustness comparison of vision transformer and mlp-mixer to cnns. arXiv preprint arXiv:2110.02797
Dong Y, Liao F, Pang T, Su H, Zhu J, Hu X, Li J (2018) Boosting adversarial attacks with momentum. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9185–9193
Dong Y, Pang T, Su H, Zhu J (2019) Evading defenses to transferable adversarial examples by translation-invariant attacks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4312–4321
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, et al (2020) An image is worth 16 x 16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
Gildenblat J, contributors (2021) PyTorch library for CAM methods. GitHub
Goodfellow IJ, Shlens J, Szegedy C (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572
Han D, Yun S, Heo B, Yoo Y (2021) Rethinking channel dimensions for efficient model design. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 732–741
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708
Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083
Naseer M, Ranasinghe K, Khan S, Khan FS, Porikli F (2021) On improving adversarial transferability of vision transformers. arXiv preprint arXiv:2106.04169
Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520
Shafahi A, Najibi M, Ghiasi A, Xu Z, Dickerson JP, Studer C, Davis LS, Taylor G, Goldstein T (2019) Adversarial training for free! In: NeurIPS
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International conference on machine learning. PMLR, pp 6105–6114
Tolstikhin I, Houlsby N, Kolesnikov A, Beyer L, Zhai X, Unterthiner T, Yung J, Keysers D, Uszkoreit J, Lucic M, et al (2021) Mlp-mixer: an all-mlp architecture for vision. arXiv preprint arXiv:2105.01601
Touvron H, Bojanowski P, Caron M, Cord M, El-Nouby A, Grave E, Izacard G, Joulin A, Synnaeve G, Verbeek J, et al (2021) Resmlp: feedforward networks for image classification with data-efficient training. arXiv preprint arXiv:2105.03404
Touvron H, Cord M, Douze M, Massa F, Sablayrolles A, Jégou H (2021) Training data-efficient image transformers & distillation through attention. In: International conference on machine learning. PMLR, pp 10347–10357
Wightman R (2019) PyTorch image models. GitHub. https://doi.org/10.5281/zenodo.4414861
Wong E, Rice L, Kolter JZ (2020) Fast is better than free: revisiting adversarial training. arXiv preprint arXiv:2001.03994
Xie C, Wu Y, van der Maaten L, Yuille AL, He K (2019) Feature denoising for improving adversarial robustness. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 501–509
Xie C, Zhang Z, Zhou Y, Bai S, Wang J, Ren Z, Yuille AL (2019) Improving transferability of adversarial examples with input diversity. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2730–2739
Yuan L, Chen Y, Wang T, Yu W, Shi Y, Jiang Z, Tay FE, Feng J, Yan S (2021) Tokens-to-token vit: training vision transformers from scratch on imagenet. arXiv preprint arXiv:2101.11986
Acknowledgements
There is no third person or organisation to acknowledge.
Funding
The authors declare that the research did not use any funding sources. There are no funding sources to disclose.
Author information
Contributions
H.L.: Research, Data analysis, Documentation, Reporting, Implementations, Problem formulation, Coding, Testing. Y.W.: Supervision, Management, Validation. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lyu, H., Wang, Y., Tan, Ya. et al. Maxwell’s Demon in MLP-Mixer: towards transferable adversarial attacks. Cybersecurity 7, 6 (2024). https://doi.org/10.1186/s42400-023-00196-3
DOI: https://doi.org/10.1186/s42400-023-00196-3