
DTA: distribution transform-based attack for query-limited scenario

Abstract

In generating adversarial examples, conventional black-box attack methods rely on abundant feedback from the to-be-attacked model, querying it repeatedly until the attack succeeds, which usually amounts to thousands of trials per attack. This may be unacceptable in real applications, since a Machine Learning as a Service (MLaaS) platform usually returns only the final result (i.e., the hard label) to the client, and a system equipped with certain defense mechanisms can easily detect malicious queries. A more feasible alternative is a hard-label attack that operates under a limited query budget. To implement this idea, in this paper we bypass the dependency on the to-be-attacked model and exploit the characteristics of the distributions of adversarial examples, reformulating the attack problem in a distribution-transform manner and proposing a distribution transform-based attack (DTA). DTA builds a statistical mapping from the benign example to its adversarial counterparts by modeling the conditional likelihood under the hard-label black-box setting. In this way, it is no longer necessary to query the target model frequently. A well-trained DTA model can directly and efficiently generate a batch of adversarial examples for a given input, which can be used to attack unseen models based on the assumed transferability. Furthermore, we surprisingly find that the well-trained DTA model is not sensitive to the semantic space of the training dataset, meaning that the model yields acceptable attack performance on other datasets. Extensive experiments validate the effectiveness of the proposed idea and the superiority of DTA over the state-of-the-art.

Introduction

Recent progress in machine learning has revealed a critical problem of deep neural networks (DNNs): most DNNs are vulnerable to adversarial examples, i.e., they are misled by examples corrupted with human-imperceptible noise (Szegedy et al. 2014; Goodfellow et al. 2015; Kurakin et al. 2017; Dong et al. 2018). This lack of robustness and its inexplicability have attracted extensive research attention devoted to improving model robustness and AI security. While most existing studies focus on adversarial attacks in a synthesizing way, i.e., generating adversarial examples by directly modifying the pixels of digital images, several studies have shown that it is possible to attack an AI system physically (Duan et al. 2020; Liu et al. 2022). A typical scenario is autonomous driving, where the driving system relies on deep learning-based techniques to identify traffic signs and other road information for accurate driving decisions. The studies by Liu et al. (2019a) and Eykholt et al. (2018) show that well-designed disturbances imposed on traffic signs can easily deceive the recognition module of the driving system, posing a significant threat to people's lives and property.

While various defense methods against adversarial attacks are constantly being proposed (Akhtar et al. 2018; Wang et al. 2021; Madaan et al. 2020; Zhang et al. 2020; Guo et al. 2023), ever more powerful attack methods (Carlini and Wagner 2017; Sun et al. 2018; Mirsky 2023) keep emerging and have proved able to defeat those defenses. This attack-defense game will continue along with the development of deep learning and modern AI systems.

The literature on adversarial attacks can be grouped into two classes: white-box and black-box attacks. In the white-box setting, details of the target model, such as its structure and parameters, are known before the attack method is designed. In the black-box setting, by contrast, the model details are inaccessible, and only the hard label or the label probabilities returned by the target model for a specific input can be obtained through querying. Clearly, the black-box attack is more feasible than the white-box attack in real applications, since the technical details of an online artificial intelligence system are generally invisible to the public; this is especially true for the hard-label setting.

A typical option for attackers in the black-box setting is to issue thousands of queries to collect enough feedback for iteratively optimizing the adversarial example, which is called the optimization-based attack. The problem here is that the querying-and-optimizing process can consume massive computing resources and time (Guo et al. 2019; Tu et al. 2019), making it an inefficient way to perform a successful attack. On the other hand, an advanced AI system may be equipped with defense mechanisms that resist intentional attacks (Wu et al. 2020); a good case in point is the Google Cloud Vision API (GCV).Footnote 1 In this case, too many attack trials can easily be detected by the system. These conditions dramatically limit the applicability of black-box attacks and call for a query-limited hard-label attack strategy in practical applications.

Besides, the optimization-based attack methods tend to overfit the adversarial examples to the target model in order to achieve high attack performance. We empirically find that such adversarial examples have low transferability and cannot effectively attack other target models, which limits the possibility of exploring cross-model knowledge about adversarial robustness.

To solve the problems discussed above, in this paper we formulate the synthesis of adversarial examples in a distribution-transform manner. We advocate that the adversarial distribution and the normal distribution are misaligned but mutually transformable. The parts shared during the transformation can be well conditioned on the input example itself, while the misaligned parts are recovered by a generative model that reconstructs the distribution from random noise and the conditions. Assuming that the vulnerabilities of different deep models exhibit similar effects, we can reasonably collect many adversarial examples from existing attack methods and use them to characterize the adversarial distribution. In this way, the generative model that synthesizes adversarial examples can be optimized in a statistical pipeline. As a result, the model can generate batches of examples for attacking without many queries. To be clear, this advantage stems from the distribution of the existing adversarial examples and their transferability, both of which are encoded by the generative model. The attack model can also be applied to a data source that was not involved in training. In the implementation, we develop a conditional normalizing flow-based model to achieve the above goal. The main contributions of this paper can be summarized as follows:

  • We formulate the black-box attack problem as a generative framework from the perspective that the adversarial distribution can be translated from the normal distribution under certain conditions. From this perspective, the adversarial examples are transferable across different models and different image contents.

  • We develop a conditional normalizing flow-based attack method (DTA) that simulates the transformation from the normal distribution to the adversarial distribution. Unlike the existing black-box methods, which need thousands of queries, DTA significantly reduces the query times during an attack while achieving an acceptable attack success rate. Notably, DTA requires only ONE query to perform a successful attack in most cases.

  • The proposed DTA can generate adversarial examples with high transferability to different black-box models. The well-trained model is not sensitive to the semantic spaces of the training dataset, and we empirically demonstrate that the model trained on ImageNet can be used to generate effective adversarial examples on other datasets.

  • Extensive evaluations on black-box attacks show that the proposed DTA beats the state-of-the-art hard-label attacks in terms of attack success rate, query times, and transferability, which demonstrates the validity of the proposed DTA in the adversarial attack.

The rest of this paper is organized as follows. We briefly review the methods relating to adversarial attacks in “Related work” section. In “Preliminary” section, we provide the preliminaries of adversarial attack and normalizing flow. “Methodology” section introduces the details of the proposed DTA framework. The experiments are presented in “Experiments” section, with the conclusion drawn in “Conclusions” section.

Related work

In this section, we briefly review the most relevant methods to the current work. For comprehensive literature on adversarial attacks (including white-box and black-box), please refer to Ding and Xu (2020), Chakraborty et al. (2018).

Black-box attack

A typical case of an adversarial attack is the black-box setting, which concerns the practice in real applications. Due to the limited information about the target model, the black-box attack is more difficult than the white-box one and receives limited attention from the community. The rationale of most existing methods is the transferability of adversarial examples across models, which allows examples generated with white-box methods to attack black-box models. For example, the integrated adversarial training method proposed by Tramèr et al. (2018) and the image transformation method proposed by Guo et al. (2018) can effectively carry out transfer attacks.

The ZOO attack proposed by Chen et al. (2017) was one of the earliest query-based black-box attack methods; it employed zeroth-order optimization to construct a gradient estimator by querying the target model and then used the estimated gradient to minimize the Carlini and Wagner (C&W) loss (Carlini and Wagner 2017) to find adversarial examples. Ilyas et al. (2018) employed a normal-distribution search density to estimate the gradient of the DNN classifier F(x) and adopted projected gradient descent to minimize the loss for generating adversarial examples. Instead of directly minimizing the objective of adversarial example generation, \(\mathcal {N}\)ATTACK (Li et al. 2019) fits the distribution around the clean data that the adversarial examples follow. In another work, Ilyas et al. (2019) observed that the gradients used by PGD show high correlation in time and across data, and they used bandit optimization techniques to integrate this prior knowledge about gradients into the attack, proposing a method called Bandits & Priors that reduces the number of queries during an attack.

Adversarial attacks using generative models

The existing adversarial attack methods based on generative models generally rely on generative adversarial networks (GANs) to synthesize adversarial examples (Baluja and Fischer 2018; Wang and Yu 2019; Huang and Zhang 2020). Most of these methods focus on the white-box attack, where the gradient of the target model is required to update the parameters of the GAN. In the black-box setting, a surrogate model is used to approximate the output of the target model, which also drives the gradients of the former to approximate those of the latter, so that the optimized model has a vulnerability similar to the target model (Huang and Zhang 2020; Xiao et al. 2018). Previous works that synthesize adversarial examples with normalizing flow models are AdvFlow (Dolatabadi et al. 2020) and \(\mathcal{C}\mathcal{G}\)-Attack (Feng et al. 2022). AdvFlow first maps the input image to a hidden representation with a pre-trained flow model and searches for a suitable disturbance in the hidden space, using natural evolution strategies (NES) to optimize the most helpful disturbance in an iterative manner. \(\mathcal{C}\mathcal{G}\)-Attack first trains a conditional flow model (i.e., c-Glow; Lu and Huang 2020) on local white-box models with an additional adversarial loss and then carries out the black-box attack with this well-trained flow model. Note that both AdvFlow and \(\mathcal{C}\mathcal{G}\)-Attack are also query-based and need the models' full outputs (soft labels) for attacking; they therefore require many queries or more detailed outputs from the target model to perform a successful attack, and they are limited when attacking physically deployed black-box models that only return the predicted label.

The discussion above shows that most existing black-box attack methods require thousands of queries and detailed outputs from the target model to estimate the gradient and then carry out the attack iteratively to obtain a compelling adversarial example. In this situation, the attack is inefficient and impractical, and the time and computational consumption can be considerable. In addition, the transferability of the adversarial examples obtained by querying and optimization is often limited; in other words, the generated adversarial examples overfit the target model and fail to attack other models. When attacks across different datasets are considered, the existing methods have very limited capability to succeed. However, both cross-model and cross-dataset attack ability are valuable for real applications, especially when we do not have many chances to perform attack trials.

Therefore, the black-box attack calls for a method that performs attacks on different models and different datasets directly, efficiently, and effectively within limited queries and information. To achieve this goal, we know from previous studies that adversarial examples follow a particular distribution related to the normal examples, and learning such an adversarial distribution could help us explore the vulnerability of different models. Hence, we are well motivated to develop a generative model that transforms the distribution of clean examples (or is conditioned on it) into the adversarial one. It is also possible to achieve cross-dataset attacks by involving an increasing number of adversarial examples during offline learning.

Preliminary

Before introducing the details of the proposed framework, in this section, we first present the preliminary knowledge about adversarial attacks and normalizing flows.

Adversarial attack

Given a well-trained DNN classifier f and a correctly classified input \((\varvec{x},y) \sim D\), we have \(f(\varvec{x})=y\), where D denotes the accessible dataset. An adversarial example \(\varvec{x}'\) is a neighbor of \(\varvec{x}\) that satisfies \(f(\varvec{x} ') \ne y\) and \(\left\| \varvec{x}' - \varvec{x} \right\| _p \le \epsilon\), where the \(\ell _p\) norm is used as the metric function and \(\epsilon\) is usually a small value such as 8 or 16 for image intensities in [0, 255]. With this definition, finding an adversarial example becomes a constrained optimization problem:

$$\begin{aligned} \varvec{x}_{adv}= \underset{\left\| \varvec{x}'-\varvec{x} \right\| _p \le \epsilon }{\arg \max }\ \ell (f(\varvec{x}') \ne y), \end{aligned}$$
(1)

where \(\ell\) stands for a loss function that measures the confidence of the model outputs.

In the optimization-based methods, the above problem is solved by computing the gradients of the loss function in Eq. 1 to generate the adversarial example. By contrast, in this work, we formulate a statistical transformation from \(P(\varvec{x})\) to \(P(\varvec{x}')\) instead of involving an online optimization process.
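
To make the optimization-based baseline concrete, the following is a minimal sketch of an \(\ell _\infty\) PGD-style loop in PyTorch; the step size, iteration count, and random start are illustrative assumptions rather than the exact settings of any method compared later.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Illustrative L-inf PGD: maximize the cross-entropy loss of the true
    label y within an eps-ball around the clean input x (pixel values in [0, 1])."""
    x, y = x.detach(), y.detach()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # ascend the loss
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project onto the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```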

Normalizing flow

Normalizing flows (Dinh et al. 2015; Kingma and Dhariwal 2018) are a class of probabilistic generative models constructed from a series of fully invertible components. The invertibility allows transforming the original distribution into a new one and vice versa. By optimizing the model, a simple distribution (such as a Gaussian) can be transformed into the complex distribution of real data. Training a normalizing flow is an explicit likelihood maximization. Since the model is a fully invertible and differentiable function that maps a random vector \(\varvec{z}\) drawn from the Gaussian distribution to another vector \(\varvec{x}\), it can be used to generate high-dimensional and complex data.

Specifically, given an invertible function \({f:} \ \mathbb {R}^d\rightarrow \mathbb {R}^d\) and two random variables \(\varvec{z}\sim p(\varvec{z})\) and \(\varvec{z}'\sim p(\varvec{z}')\) with \(\varvec{z}' = f(\varvec{z})\), the change-of-variables rule gives

$$\begin{aligned} p(\varvec{z}') = p(\varvec{z})\left| det \frac{\partial {f^{-1}}}{\partial {\varvec{z}'}} \right| , \end{aligned}$$
(2)
$$\begin{aligned} p(\varvec{z}) = p(\varvec{z}')\left| det \frac{\partial {f}}{\partial {\varvec{z}}} \right| , \end{aligned}$$
(3)

where det denotes the determinant operation. The above equation follows a chaining rule, in which a series of invertible mappings can be chained to approximate a sufficiently complex distribution, i.e.,

$$\begin{aligned} \varvec{z}_K = f_K \circ \cdots \circ f_2 \circ f_1(\varvec{z}_0), \end{aligned}$$
(4)

where each f is an invertible function called a flow step. Equation 4 is shorthand for \(f_K(f_{K-1}(\cdots f_1(\varvec{z}_0)))\). Assuming that \(\varvec{x}\) is the observed example and \(\varvec{z}\) is the hidden representation, we write the generative process as

$$\begin{aligned} \varvec{x}=f_{\theta }(\varvec{z}), \end{aligned}$$
(5)

where \(f_{\theta }\) is the composition of all the flow steps f in Eq. 4. Based on the change-of-variables theorem, we write the negative log-density of \(\varvec{x}=\varvec{z}_K\) as follows:

$$\begin{aligned} -\log {p_K}(\varvec{z}_K)=-\log p_0(\varvec{z}_0)-\sum _{k=1}^{K}\log \left| det\frac{\partial \varvec{z}_{k-1}}{\partial \varvec{z}_{k}} \right| , \end{aligned}$$
(6)

where \(\varvec{z}_k=f_k(\varvec{z}_{k-1})\) is used implicitly. Training a normalizing flow amounts to minimizing the above function, which exactly maximizes the likelihood of the observed training data. Hence, the optimization is stable and easy to implement.

Conditional normalizing flow

In certain cases, the transformation between distributions is conditioned on external variables; for example, a face is conditioned on age, gender, expression, etc. This has already been considered in generative models such as CVAE (Sohn et al. 2015) and CGAN (Mirza and Osindero 2014). In flow-based models, conditional normalizing flows allow us to involve the conditions in each flow step. Specifically, the invertible function f accepts both the input variable \(\varvec{z}\) and the condition variable c as inputs, formally expressed as \(\varvec{z}'=f(\varvec{z};c)\), with the inverse mapping \(\varvec{z}=f^{-1}(\varvec{z}';c)\). Denoting the k-th flow step by \(f_k\), the change-of-variables theorem gives

$$\begin{aligned} -\log {p_K}(\varvec{z}_K;c) =-\log p_0(\varvec{z}_0;c) -\sum _{k=1}^{K}\log \left| det\frac{\partial \varvec{z}_{k-1}}{\partial f_k(\varvec{z}_{k-1};c)} \right| . \end{aligned}$$
(7)

Given a well-trained flow model, we first sample \(\varvec{z}_0\) from the Gaussian distribution and then perform a forward flow as

$$\begin{aligned} \varvec{x}=f_\theta (\varvec{z}_0;c). \end{aligned}$$
(8)

If we are interested in computing the probability density of an observed example \(\varvec{x}\), the inverse mapping is expressed as

$$\begin{aligned} \varvec{z}_0=f^{-1}_\theta (\varvec{x};c). \end{aligned}$$
(9)
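
To make the conditional flow step concrete, below is a minimal sketch of a conditional affine coupling layer in PyTorch: the sub-network receives one half of the input together with the condition c, predicts a scale and shift for the other half, and exposes both directions of Eqs. 8 and 9 along with the log-determinant term needed in Eq. 7. The layer sizes and the tanh stabilization are illustrative assumptions, not the exact design used later in the paper.

```python
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    """One conditional flow step z' = f(z; c), with a tractable inverse and log|det J|."""
    def __init__(self, dim, cond_dim, hidden=256):
        super().__init__()
        self.half = dim // 2
        # sub-network predicts (log-scale, shift) for the second half,
        # conditioned on the first half and the external condition c
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, z, c):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        log_s, t = self.net(torch.cat([z1, c], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)            # keep the scales in a stable range
        z2 = z2 * torch.exp(log_s) + t
        log_det = log_s.sum(dim=1)           # contribution to log|det df/dz|
        return torch.cat([z1, z2], dim=1), log_det

    def inverse(self, z_out, c):
        z1, z2 = z_out[:, :self.half], z_out[:, self.half:]
        log_s, t = self.net(torch.cat([z1, c], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)
        z2 = (z2 - t) * torch.exp(-log_s)
        return torch.cat([z1, z2], dim=1)
```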

Methodology

In this section, we introduce the whole framework of the proposed generative adversarial attack, along with the details of model learning and inference.

The DTA framework

Recall that conventional attack methods generate the adversarial perturbation by performing a complex inference procedure based on the target model; the perturbation is then added to the original example, resulting in the final adversarial example. This process depends heavily on the inference result, incurs heavy computational cost, and generally produces a single "optimal" example according to certain criteria. By contrast, in this paper we start from a novel perspective and propose a generative adversarial attack method, which is called the distribution transform-based attack (DTA). Specifically, we advocate that all adversarial examples could follow a certain distribution that is misaligned with the normal distribution. This is mainly caused by the fixed training data involved in optimizing different deep models: the training data characterize a fixed distribution that is approximated by those models during training, and hence the distribution of data unseen during training is common to the models as well. This explains why we consider the adversarial examples (most of which are unseen during training) to follow a misaligned distribution. At this point, we reasonably assume a transformation from the distribution of normal examples to the distribution of adversarial examples. Since the two types of data exhibit similar appearances, the two distributions ideally overlap with each other and can be transformed mutually.

The whole framework of the proposed method is illustrated in Fig. 1. Based on the above discussion, we propose to collect a large number of adversarial examples \(\varvec{X}'\) by employing existing white-box attack methods. Although these examples look similar to the normal examples \(\varvec{X}\), a direct transformation between the two types of examples is difficult or even prohibitive, because the small perturbation can be overwhelmed by the complex structures and textures of the normal example and is therefore hard for the generative model to capture. To alleviate this issue, we let the small perturbations be conditioned on the normal inputs, which provide cues for the generative process. Specifically, a conditional normalizing flow is employed to implement the conditional generation, which synthesizes the adversarial example from the normal example and a random variable (Lu and Huang 2020; Pumarola et al. 2020; Liu et al. 2019b). The random variable diversifies the generated examples: once the flow model is well trained, we can randomly sample in the latent space \(\varvec{Z}\) and obtain a batch of adversarial examples through a forward pass of the flow model. The details of the flow model and the training and inference processes are discussed in the following sections.

Fig. 1 The framework of the distribution transform-based attack. \(\varvec{X}'\) is the adversarial space characterized by the collected adversarial examples and \(\varvec{X}\) is the space of the corresponding original clean examples. The hidden space \(\varvec{Z}\) follows a simple Gaussian distribution

Conditional normalizing flow for attack

To implement a powerful normalizing flow with a strong ability to process image textures, we employ the basic GLOW model (Kingma and Dhariwal 2018), which uses convolutional, coupling, and normalization operations in its construction. Since the original GLOW model does not consider conditions in the probabilistic modeling, we follow Ardizzone et al. (2019) and Lu and Huang (2020) to properly integrate the image content as conditions. The architecture of the flow model is illustrated in Fig. 2. A basic flow step is a stack of an Actnorm layer, a 1x1 convolutional layer, and an affine coupling layer. A single flow block is constructed by cascading a squeeze layer, K flow steps, and a split layer. The whole architecture is built by repeating the flow block \(L-1\) times, followed by the final layers, which consist of a squeeze layer and K flow steps. The details of the Actnorm layer, the 1x1 convolutional layer, the affine coupling layer, and the squeeze and split layers can be found in GLOW (Kingma and Dhariwal 2018). Regarding the conditions involved in each layer, it has been shown that feeding the original image directly as the condition is unsuitable: the original image provides only low-level features, which are insufficient for feature modeling and burden the sub-networks in the affine coupling layer. Instead, high-level features are preferable. Hence, we follow Ardizzone et al. (2019) and Lu and Huang (2020), who suggest employing a pre-trained deep model to extract high-level features that are used as the condition. Specifically, we use a VGG-19 model pre-trained on CIFAR-10, SVHN, and ImageNet, respectively, and extract the features from the last convolutional layers. It is also possible to replace the VGG-19 model with other proper choices. During model training, the VGG-19 model can be fixed or optimized jointly with the flow model; in the current work, we fix this feature extractor for simplicity.
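
As an illustration of the condition branch, the sketch below extracts frozen convolutional features from a VGG-19 model. Using torchvision's ImageNet weights (and a recent torchvision version that accepts the string weight name) and taking the full `features` stack are assumptions for illustration; the paper uses VGG-19 models matched to each dataset and takes features from the last convolutional layers.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class ConditionNet(nn.Module):
    """Frozen VGG-19 convolutional features used as the condition c(x)."""
    def __init__(self):
        super().__init__()
        # torchvision ImageNet weights are an illustrative stand-in for the
        # per-dataset VGG-19 models described in the text
        self.features = vgg19(weights="IMAGENET1K_V1").features
        for p in self.features.parameters():
            p.requires_grad = False   # the condition network is kept fixed

    @torch.no_grad()
    def forward(self, x):
        return self.features(x)       # high-level feature maps used as the condition
```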

Fig. 2 The conditional normalizing flow model for attack. The left depicts a basic conditional flow step, while the right plots the whole architecture

Adversarial data collection

Recall that the adversarial examples obtained with existing white-box attack methods play a key role in the proposed framework. Hence, we present the details of how these examples are obtained here instead of in the experiment section.

The datasets concerned in the current work are CIFAR-10 (Krizhevsky et al. 2009), SVHN (Netzer et al. 2011), and ImageNet (Russakovsky et al. 2015), and the to-be-attacked models are also trained on these datasets. Specifically, the training sets of CIFAR-10 and SVHN are selected, while for ImageNet we choose about 30,000 images from the validation set. All these data are used as normal examples by the white-box attack methods to generate adversarial examples. On CIFAR-10, the PGD method (Madry et al. 2018) is employed as the attacker, with the pre-trained ResNet-50 as the target model. The MI-FGSM method based on multi-model integration (Dong et al. 2018) is employed on SVHN and ImageNet. For SVHN, ResNet-50 (He et al. 2016b), InceptionV3 (Szegedy et al. 2016), and SENet-18 (Hu et al. 2018) are integrated; the models are modified versions of the public onesFootnote 2 and trained from scratch. For ImageNet, InceptionV4 (Szegedy et al. 2017), InceptionResnetV2 (Szegedy et al. 2017), and ResNetV2-101 (He et al. 2016a) are integrated; the models are pre-trained and publicly accessible.Footnote 3 The adversarial examples are generated under two perturbation levels, \(\epsilon = 8\) and \(\epsilon = 16\). The other hyperparameter settings of the attack methods follow the respective papers (Madry et al. 2018; Guo et al. 2019; Dolatabadi et al. 2020). In this way, we collect a batch of adversarial examples that will be used to optimize the proposed flow model. Note that the adversarial examples generated on a certain dataset are used to train the flow model that attacks target models of the corresponding dataset; cross-dataset attacks are not applicable here.
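
For reference, the following is a minimal sketch of how the ensemble-based MI-FGSM collection step might look in PyTorch (logit averaging over the ensemble and momentum on the L1-normalized gradient); the fusion scheme, step count, and momentum value are illustrative assumptions rather than the exact settings of Dong et al. (2018).

```python
import torch
import torch.nn.functional as F

def mifgsm_ensemble(models, x, y, eps=16/255, steps=10, mu=1.0):
    """Illustrative MI-FGSM over an ensemble of white-box models: average the
    logits, accumulate a momentum term on the L1-normalized gradient, and keep
    the perturbation inside the eps-ball (pixel values in [0, 1])."""
    x, y = x.detach(), y.detach()
    alpha = eps / steps
    g = torch.zeros_like(x)
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = torch.stack([m(x_adv) for m in models]).mean(dim=0)
        loss = F.cross_entropy(logits, y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            g = mu * g + grad / (grad.abs().mean(dim=(1, 2, 3), keepdim=True) + 1e-12)
            x_adv = x_adv + alpha * g.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()  # paired with x as one (clean, adversarial) training example
```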

To make a fair comparison in the experiments, the normal examples used for testing are different from those used for training. Specifically, for CIFAR-10 and SVHN, the test sets are employed as the input examples. For ImageNet, we randomly select 1000 images from the validation set, which are completely different from the 30,000 images mentioned above.

Training details

As introduced in the "Conditional normalizing flow" section, training the conditional normalizing flow maximizes the likelihood of the training data with respect to the model parameters. Formally, let a collected adversarial example be denoted by \(\varvec{x}'\sim \varvec{X}'\) and the corresponding normal example by \(\varvec{x}\sim \varvec{X}\), for which the condition network produces the features \(c(\varvec{x})\) (abbreviated as c). The hidden representation follows the Gaussian distribution, i.e., \(\varvec{z} \sim \mathcal {N}(0,1)\). The flow model is denoted by f with parameters \(\theta\), and we have \(\varvec{x}'=f_\theta (\varvec{z};c)\) and \(\varvec{z}=f^{-1}_\theta (\varvec{x}';c)\). Then, the loss function to be minimized is expressed as

$$\begin{aligned} \begin{aligned} L(\theta ;\varvec{z},\varvec{x}',c)&=-\log p(\varvec{x}'|\varvec{z};c,\theta ) \\&= -\log p_{\varvec{z}}(f^{-1}_\theta (\varvec{x}';c);c)-\log \left| det \frac{\partial f^{-1}_\theta (\varvec{x}';c)}{\partial \varvec{x}'}\right| , \end{aligned} \end{aligned}$$
(10)

where the right-hand side of the above equation can be expanded layer-wise according to Eq. 7. By optimizing the above objective, the learned distribution \(p(\varvec{x}'|\varvec{z};c,\theta )\) characterizes the adversarial distribution as expected.

The task of interest here is to generate an adversarial example whose appearance is similar to the example fed into the condition. Hence, we must ensure that the generation process from \(\varvec{z}\) to \(\varvec{x}'\) brings no surprising result. To this end, we impose an MSE loss during training. Specifically, the difference between the generated adversarial example \(\varvec{x}'\) and the corresponding training target \(\varvec{x}_{tr}^{'}\) is minimized according to

$$\begin{aligned} L_{MSE}(\theta ;\varvec{z},c) = ||f_\theta (\varvec{z};c) - \varvec{x}_{tr}^{'} ||_2, \end{aligned}$$
(11)

where \(\varvec{z}\) is randomly sampled from the Gaussian distribution in each training iteration.

Note that the losses in Eqs. 10 and 11 supervise the model in different spaces: the former computes the loss in the hidden space, while the latter operates in the adversarial space. Optimizing the two losses simultaneously can bring unexpected effects since their propagation directions conflict. Hence, we propose to perform back-propagation based on the two losses alternately. In each iteration, we first update the model parameters based on Eq. 10. Then, for the same input batch containing \(\varvec{x}\), we randomly sample a batch of \(\varvec{z}\) and perform a forward flow to generate a batch of \(\varvec{x}'\). The MSE loss between \(\varvec{x}'\) and \(\varvec{x}_{tr}^{'}\) is computed to update the model parameters, followed by the next iteration.

In the training process, we use the Adam algorithm to optimize the model parameters, while the learning rate is set as \(10^{-4}\), the momentum is set to 0.999, and the maximal iteration number is 10,000.
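
The alternating update described above can be sketched as follows. Here `flow.nll(x_adv, c)` (the negative log-likelihood of Eq. 10), `flow.generate(z, c)` (the forward pass of Eq. 8), and `flow.latent_dim` are hypothetical interfaces of the conditional flow rather than the authors' code, and treating \(\varvec{x}_{tr}^{'}\) as the collected adversarial counterpart of the clean input is our reading of Eq. 11.

```python
import torch

def train_dta(flow, cond_net, loader, iters=10000, lr=1e-4, device="cuda"):
    """Sketch of the alternating optimization: one NLL step (Eq. 10) followed by
    one MSE step (Eq. 11) on each batch of pre-collected (clean, adversarial) pairs."""
    opt = torch.optim.Adam(flow.parameters(), lr=lr, betas=(0.9, 0.999))
    step = 0
    while step < iters:
        for x_clean, x_adv in loader:
            x_clean, x_adv = x_clean.to(device), x_adv.to(device)
            with torch.no_grad():
                c = cond_net(x_clean)            # fixed condition features

            # step 1: maximize the likelihood of the collected adversarial examples
            nll = flow.nll(x_adv, c).mean()
            opt.zero_grad()
            nll.backward()
            opt.step()

            # step 2: keep generated examples close to the collected targets
            z = torch.randn(x_clean.size(0), flow.latent_dim, device=device)
            mse = (flow.generate(z, c) - x_adv).pow(2).mean()
            opt.zero_grad()
            mse.backward()
            opt.step()

            step += 1
            if step >= iters:
                break
```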

Generation of adversarial examples

Given a well-trained flow model \(f_\theta\), the hidden representations of the collected adversarial examples are expected to follow the assumed Gaussian distribution \(\mathcal {N}(0, 1)\). In practice, however, we find that these representations have shifted mean and standard deviation (std) values, possibly because the training data are insufficient. One might suspect that the involved MSE loss biases the center of the Gaussian distribution, but experiments show that the shift occurs even without the MSE loss. Based on this observation, we also surprisingly find that sampling \(\varvec{z}\) according to the shifted mean and std values yields better performance than sampling from \(\mathcal {N}(0, 1)\). Hence, before generating adversarial examples, we compute the hidden representations of all the training adversarial examples and use them to calculate the mean \(\varvec{\mu }\) and std \(\sigma\), resulting in a new distribution \(\mathcal {N}(\varvec{\mu }, \sigma ^2)\).

To generate an adversarial example for a given normal input \(\varvec{x}\), we first randomly sample \(\varvec{z}\) from \(\mathcal {N}(\varvec{\mu }, \sigma ^2)\) and then perform a forward pass \(\varvec{x}_{gen}=f_\theta (\varvec{z}; c(\varvec{x}))\). For fairness of comparison, we follow the existing attack methods and constrain the perturbation within a certain range. Once we obtain the adversarial example \(\varvec{x}_{gen}\), we employ the clip function

$$\begin{aligned} \varvec{x}'=Clip(\varvec{x}+Clip( \varvec{x}_{gen}-\varvec{x}, -\epsilon , \epsilon ),0,1) \end{aligned}$$
(12)

to ensure the imperceptibility of the perturbation, where \(\epsilon\) is the acceptable noise budget during the attack. Two common cases are considered, \(\epsilon =8\) and \(\epsilon =16\) for pixel values in [0, 255] (scaled to \(\epsilon =8/255\) and \(\epsilon =16/255\) for pixel values in [0, 1] in the code implementation).
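
A minimal sketch of this inference procedure is given below; `flow.inverse` and `flow.generate` are hypothetical interfaces of the conditional flow, and the latent statistics \(\varvec{\mu }\) and \(\sigma\) are assumed to broadcast over the batch dimension.

```python
import torch

@torch.no_grad()
def estimate_latent_stats(flow, cond_net, loader, device="cuda"):
    """Mean and std of the latent codes of the collected adversarial examples,
    giving the shifted sampling distribution N(mu, sigma^2)."""
    zs = []
    for x_clean, x_adv in loader:
        c = cond_net(x_clean.to(device))
        zs.append(flow.inverse(x_adv.to(device), c))
    zs = torch.cat(zs, dim=0)
    return zs.mean(dim=0, keepdim=True), zs.std(dim=0, keepdim=True)

@torch.no_grad()
def dta_generate(flow, cond_net, x, mu, sigma, eps=16/255, n=10):
    """Sample z from the shifted Gaussian, run a forward flow pass, and clip the
    perturbation to the eps budget as in Eq. 12 (pixel values in [0, 1])."""
    c = cond_net(x)
    candidates = []
    for _ in range(n):                              # a batch of candidate adversarial examples
        z = mu + sigma * torch.randn(x.size(0), *mu.shape[1:], device=x.device)
        x_gen = flow.generate(z, c)
        delta = torch.clamp(x_gen - x, -eps, eps)   # inner clip of Eq. 12
        candidates.append(torch.clamp(x + delta, 0.0, 1.0))
    return candidates
```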

Algorithm 1 Distribution Transform-based Attack

The whole algorithm of DTA is listed in Alg. 1, which can help readers reimplement our method step by step.

Experiments

In this section, we evaluate the performance of the proposed DTA on black-box adversarial attacks through extensive experiments and comparisons.

Table 1 The performance comparison of black-box adversarial attack on the CIFAR-10 dataset, with the perturbation \(\epsilon =8\) and \(\epsilon =16\)
Table 2 The performance comparison of black-box adversarial attack on the SVHN dataset, with the perturbation \(\epsilon =8\) and \(\epsilon =16\)

Settings

As mentioned previously, three popular datasets are considered, including CIFAR-10 (Krizhevsky and Hinton 2009), SVHN (Netzer et al. 2011), and ImageNet (Russakovsky et al. 2015).

Regarding the target models to be attacked, we employ public models pre-trained on the corresponding datasets, or models trained from scratch when pre-trained weights are not publicly accessible. The main target models include VGG-16 (Simonyan and Zisserman 2015), MobileNetV2 (Sandler et al. 2018), and ShuffleNetV2 (Ma et al. 2018). For CIFAR-10 and ImageNet, we use the pre-trained weights from the GitHub repository pytorch-cifar-modelsFootnote 4 and from PyTorch,Footnote 5 respectively. For SVHN, we train these models from scratch; training of each model is stopped when the best performance is obtained, with classification accuracy on the test set above 90%.

To objectively evaluate the performance of the proposed framework, we compare it with related state-of-the-art decision-based (hard-label) methods, including Bandits (Ilyas et al. 2019), Sign-OPT (Cheng et al. 2020), Rays (Chen and Gu 2020), Tangent Attack (Tangent) (Ma et al. 2021), Triangle Attack (TA) (Wang et al. 2022), and CGBA (Reza et al. 2023). The implementations of these methods are based on the released code with the default settings in the corresponding papers. The proposed DTA is implemented with the PyTorch framework. For quantitative comparison, we use the metrics of attack success rate (ASR), average query count, and median query count, as in previous works (Chen and Gu 2020; Dong et al. 2022).
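
For clarity on how these numbers can be aggregated, the sketch below computes ASR and the average/median query counts from per-example attack records; counting only attacks that succeed within the query cap in the query statistics is an assumption about the evaluation convention, not a detail stated by the compared papers.

```python
import statistics

def summarize_attack(records, max_queries=500):
    """records: list of (success: bool, queries: int), one entry per attacked example.
    Returns (ASR in %, average query count, median query count)."""
    successful = [q for ok, q in records if ok and q <= max_queries]
    asr = 100.0 * len(successful) / len(records)
    avg_q = sum(successful) / len(successful) if successful else float("nan")
    med_q = statistics.median(successful) if successful else float("nan")
    return asr, avg_q, med_q
```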

All the experiments are conducted on a GPU server with a single Tesla V100 32GB GPU, 2 x Xeon Silver 4208 CPU, and RAM 256GB.

Table 3 The performance comparison of black-box adversarial attack on the ImageNet dataset, with the perturbation \(\epsilon =8\) and \(\epsilon =16\)

Quantitative comparison with the state-of-the-arts

Evaluation on ASR and query times: Recall that the proposed DTA aims to lower the number of queries while maintaining a pleasing attack success rate. The success rate under a sufficiently large query budget may reach a certain bound, but this is beyond the scope of the current work. Hence, we make the comparison under a set of limited query budgets by setting the maximal number of queries to 100, 200, 300, 400, and 500. The selected competitors are all hard-label attacks; thus, an attack is counted as successful only if it succeeds within the predefined query budget, and otherwise it is a failure. The comparisons on CIFAR-10 under \(\epsilon = 8\) and \(\epsilon = 16\) are shown in Table 1, and the results on SVHN are listed in Table 2. DTA achieves higher attack success rates than the competitors in most cases, which validates that the proposed generative model can synthesize effective adversarial examples. It should especially be noted that the average query number required by DTA is much smaller than that required by the other methods.

The experiment on ImageNet poses a challenging case for our end-to-end adversarial example generation since the data are much more complex than CIFAR-10 and SVHN. The results on ImageNet are listed in Table 3, where we consider the perturbation level \(\epsilon =16\). The maximal number of queries is again limited to 100, 200, 300, 400, and 500. Our method is superior to the baselines on all metrics and is competitive with Rays on ASR in most instances. Again, DTA requires a very limited number of queries to perform a successful attack. Considering the attack performance in query-limited scenarios, we report the empirical results of Bandits, Rays, TA, and CGBA in the following sections. Besides, we prefer to report the results under the noise budget \(\epsilon =16\) in most cases.

Evaluation on defense models: To evaluate the performance of DTA in attacking robust models, we make a comparison by employing adv-inception-v3 (Adv-Inc-v3) (Tramèr et al. 2018), Ens3-adv-inception-v3 (Inc-v3\(_{ens3}\)) (Tramèr et al. 2018), Ens4-adv-inception-v3 (Inc-v3\(_{ens4}\)) (Tramèr et al. 2018), and Ens-adv-inception-resnet-v2 (IncRes-v2\(_{ens}\)) (Tramèr et al. 2018) as the target models, all of which are adversarially trained. We first employ the 1000 selected images mentioned above to generate corresponding adversarial images on VGG-16 and then test the attack performance of these generated examples on the four defense models. The parameters of all these pre-trained models are available from the GitHub repository tf_to_pytorch_model.Footnote 6 The results in Table 4 show that DTA achieves an attack success rate of about 11.32–15.77% on these robust models, whereas the baselines Bandits, Rays, TA, and CGBA only obtain 5.74–13.65%, 3.14–7.79%, 2.88–5.19%, and 0.77–3.44%, respectively. This implies that the adversarial examples generated by DTA are more likely to attack deep models successfully, even defense models.

Table 4 Evaluation of adversarial robust accuracy (lower is better, \(\downarrow\)) on defense models
Table 5 Evaluation of attacks by DEEPSEC

Besides the above comparisons, we are also interested in evaluating our method and the competitors with different metrics. DEEPSEC (Ling et al. 2019) is a useful tool for the assessment of adversarial examples, providing ten evaluation indicators. From the perspective of classification outcomes, DEEPSEC provides (1) Misclassification Ratio (MR), (2) Average Confidence of Adversarial Class (ACAC), and (3) Average Confidence of True Class (ACTC). From the perspective of imperceptibility, it provides (1) Average \(L_p\) Distortion (\(ALD_{p}\), including \(L_{0}\), \(L_{2}\), and \(L_{\infty }\)), (2) Average Structural Similarity (ASS), and (3) Perturbation Sensitivity Distance (PSD). From the perspective of the robustness of adversarial examples, it provides (1) Noise Tolerance Estimation (NTE), (2) Robustness to Gaussian Blur (RGB), (3) Robustness to Image Compression (RIC), and (4) Computation Cost (CC). We select 7 indicators as the evaluation metrics, as shown in Table 5. In this experiment, we optimize the ResNet-20 model (He et al. 2016b) on CIFAR-10 until the best performance (\(\ge 90\%\)) on the test set is obtained. Then, 1000 images are selected as the normal examples by DEEPSEC (according to the given instructions). The adversarial examples are generated by Bandits, Rays, TA, CGBA, and DTA, with the maximal query number set to 100 and ResNet-20 as the target model. Given all adversarial examples generated during the attack, we finally employ DEEPSEC to compute the corresponding metrics. As shown in Table 5, our method is superior to the other methods in terms of misclassification rate and robustness, with MR (45.98%), ACAC (0.73), ACTC (0.19), PSD (153.63), and NTE (0.51), which reveals that the adversarial examples generated by DTA have stronger attack and anti-detection capabilities.

Query distribution

To see the advantage of the proposed framework in terms of the number of queries per attack, we plot the histograms of the query numbers required to perform a successful attack on CIFAR-10 and SVHN in Fig. 3. The test sets of CIFAR-10 and SVHN are used to compute the statistics, with ShuffleNetV2 (Ma et al. 2018) as the target model. The maximal query number is limited to 500. For clarity, each bar denotes how many normal examples yield successful attacks with the number of queries noted on the x-axis. In all cases, the proposed DTA can perform a successful attack on most examples with only ONE query. The average query counts of DTA on CIFAR-10 and SVHN are only 17.05 and 38.08 under \(\epsilon =16\), respectively. Notably, on ShuffleNetV2, DTA enables 88% and 90% of the examples to be attacked successfully within a handful of queries when \(\epsilon =8\) and \(\epsilon =16\), respectively. On the other hand, Rays and Bandits often require hundreds of queries to perform a successful attack, and a small query budget (such as \(\le 100\)) does not allow these methods to work well. As Guo et al. (2019) indicate, the distribution of the histogram is highly right-skewed and hence the median query count is a more representative aggregate statistic than the average query count. The results show that the median value of our method is only ONE in all cases on CIFAR-10, which sufficiently validates the proposed generative idea for generating adversarial examples.

Fig. 3 Histogram of the query number required to perform a successful attack on CIFAR-10 and SVHN. The median and mean lines denote the median query count and the average query count of the corresponding method (indicated by color), respectively

Fig. 4 The attack success rate matrix of Bandits, Rays, TA and DTA on CIFAR-10 (left) and SVHN (right)

Transferability

The motivation of the current work states that the generation of adversarial examples is generally based on the assumption of transferability, i.e., an adversarial example generated against one model can be used to attack other, different models. To verify that this assumption is valid for black-box attacks, we follow previous work (Zhao et al. 2020; Dolatabadi et al. 2020) and examine the transferability of the generated adversarial examples across different models on CIFAR-10 and SVHN. Specifically, we select 8 models, including ResNet-50 (He et al. 2016b), VGG-16 (Simonyan and Zisserman 2015), VGG-19 (Simonyan and Zisserman 2015), ShuffleNetV2 (Ma et al. 2018), MobileNetV2 (Sandler et al. 2018), InceptionV3 (Szegedy et al. 2016), DenseNet-169 (Huang et al. 2017), and GoogLeNet (Szegedy et al. 2015). Following the settings in Kurakin et al. (2017), we randomly select 1000 images from the test set that are classified correctly by the model while the corresponding adversarial examples are misclassified. The generated adversarial examples are then used to attack the other models. For a fair comparison, we set \(\epsilon =16\) and the maximal query number to 500 for all cases.

DTA is compared with Bandits, Rays, and TA in the untargeted black-box attack setting. The ASR matrices on the two datasets are shown in Fig. 4. The row indicates which model is targeted during the generation of adversarial examples (we only keep the adversarial examples that successfully attack the target model), while the column indicates which model is attacked by these generated examples. The figure shows that the transfer ASR of DTA on CIFAR-10 ranges from 33.6 to 79.6%, while the baseline methods achieve 11.6–52.0%, 7.5–40.3%, and 8.7–52.9%, respectively. In other words, the examples generated by DTA attain a higher (about 26.1–27.0% higher in most cases) attack success rate on changed models than those generated by Bandits and Rays, validating the superior transferability of DTA. This is because the baseline methods rely heavily on the feedback of the target model during each query and cannot extract transferable features, whereas our method learns an adversarial distribution that does not collapse onto a particular model.
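
This evaluation protocol can be sketched as follows: for each source model, only adversarial examples that already fool it are kept, and each matrix entry is the fraction of those examples that also fool the column model. The dictionary-based interface is an illustrative assumption.

```python
import torch

@torch.no_grad()
def transfer_asr_matrix(models, adv_sets):
    """models: {name: nn.Module}; adv_sets[src]: list of (x_adv, y) batches that
    already fool models[src].  Entry [src][tgt] is the percentage that also
    fools models[tgt] (untargeted: the prediction differs from the true label y)."""
    names = list(models.keys())
    matrix = {src: {} for src in names}
    for src in names:
        for tgt in names:
            fooled, total = 0, 0
            for x_adv, y in adv_sets[src]:
                pred = models[tgt](x_adv).argmax(dim=1)
                fooled += (pred != y).sum().item()
                total += y.numel()
            matrix[src][tgt] = 100.0 * fooled / total
    return matrix
```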

Dataset- and model-agnostic attack

To evaluate the performance of DTA on examples with different semantics and on different model structures, we first conduct attack experiments on datasets other than the ImageNet training dataset. Specifically, the test datasets include VOC 2007 (VOC-07) (Everingham et al. 2010), VOC 2012 (VOC-12) (Everingham et al. 2010), Places365 (Pla-365) (Zhou et al. 2017), and Caltech101 (Cal-101) (Fei-Fei et al. 2004). The target models include VGG-19 (Simonyan and Zisserman 2015), InceptionV3 (Szegedy et al. 2016), ResNet-152 (He et al. 2016b), and WideResNet-50 (Zagoruyko and Komodakis 2016), all implemented in PyTorch. The attack results in Table 6 show that the DTA trained on ImageNet can generate effective adversarial examples on other datasets without retraining. In certain situations, the attack success rate exceeds 90%, with the maximal query budget limited to 100. To be clear, we do not care how the ground-truth labels of those datasets affect the current DTA; we simply calculate the attack success rate by comparing the outputs of the original clean image and its adversarial counterpart, analogous to the Evasion Rate (Matachana et al. 2020).

Table 6 The attack success rates on other datasets that are not involved in the DTA training process

We further apply DTA to attack vision transformers, which are quite different from traditional CNNs, including ViT-16 (Dosovitskiy et al. 2021), ViT-32 (Dosovitskiy et al. 2021), and Swin-B (Liu et al. 2021). Here, DTA is trained on the data pairs collected from CNNs with a noise budget of \(\epsilon =16\). The empirical results reported in Table 7 show that DTA obtains a 26–41% attack success rate with limited queries. This demonstrates that even when attacking transformers, DTA can still generate adversarial examples and achieve an acceptable attack effect on different ViT models, and it further illustrates the high adaptability of DTA in model-agnostic black-box scenarios.

Table 7 The attack success rate (ASR, %) versus average queries (Avg.Q) of different transformers trained on the ImageNet dataset

Ablation study

Loss and hyper-parameters

The proposed method involves the MSE loss and hyper-parameters, such as L and K, which affect the model depth. We examine the influence of these factors on the CIFAR-10 dataset. The target model is the pre-trained VGG-16 (Simonyan and Zisserman 2015). During the attack, the maximal number of queries is limited to 500.

Table 8 The attack performance comparison of DTA optimized with and without the MSE loss; w. means with the MSE loss, w.o. means without the MSE loss
Table 9 The attack success rate (ASR (%)) and average query number (Avg.Q) under different K's; here we fix the number of flow blocks as \(L=3\)
Table 10 The attack success rate (ASR (%)) and average query number (Avg.Q) under different L's; here we fix the number of flow steps as \(K=3\)

First, we evaluate the performance of DTA with and without the MSE loss. Without the MSE loss means that the updating step in line 4 of Alg. 1 is omitted. The comparison is listed in Table 8, which shows that the flow model benefits from the MSE loss, yielding a notable improvement in both attack success rate and average query number.

Next, we test how the model depth, or representative capacity, affects the attack performance. Two experiments are considered. In the first, we fix \(L=3\) and examine the influence of K over \(\{2, 4, 6, 8\}\). In the second, we fix \(K=2\) and evaluate the performance of L over \(\{1, 2, 3, 4\}\). The results are shown in Tables 9 and 10, respectively. Different settings produce similar performance in both ASR and average query number, which suggests that the attack ability of the proposed model on CIFAR-10 does not benefit from increasing the model depth. This may be because the data in CIFAR-10 are simple; hence, we set \(K=2\) and \(L=2\) on small datasets, e.g., CIFAR-10 and SVHN, whereas on ImageNet, which contains complex data, we set \(K=8\) and \(L=5\).

Fig. 5 The performance of DTA when using different numbers of attack models

Furthermore, we examine the generalization ability of the adversarial examples generated by DTA, i.e., we test the performance of DTA when different numbers of white-box attack models are involved. Specifically, we use ten pre-trained models on ImageNet, including VGG-19 (Simonyan and Zisserman 2015), ResNet-152 (He et al. 2016b), InceptionV3 (Szegedy et al. 2016), DenseNet-201 (Huang et al. 2017), WideResNet-50 (Zagoruyko and Komodakis 2016), VGG-16 (Simonyan and Zisserman 2015), ResNet-101 (He et al. 2016b), MobileNetV2 (Sandler et al. 2018), DenseNet-121 (Huang et al. 2017), and DenseNet-169 (Huang et al. 2017). The first five models are used for generating the training adversarial examples, while the rest serve as the black-box test models to evaluate the attack effect of DTA. In this experiment, we select different numbers of models from the first five for example generation; the results are plotted in Fig. 5. It can be clearly observed that the more models are used in performing the attack, the better the performance obtained on all target models. This indicates that using more models when collecting adversarial examples yields an increased universal success rate in black-box attacks.

Improved performance by shifted means and stds

It is common to sample from a Gaussian distribution during the inverse of a normalizing flow model to generate the expected data; however, in this work we find that simply sampling from \(\mathcal {N}(0,1)\) to obtain the adversarial examples leads to poor attack performance. Instead, for the well-trained normalizing flow model, we first feed in the training data to obtain the corresponding latent codes \(\varvec{z}\), then compute the mean \(\hat{\varvec{\mu }}\) and std \(\hat{\delta }\) of \(\varvec{z}\) and use the Gaussian distribution \(\hat{\mathcal {N}}(\hat{\varvec{\mu }},\hat{\delta }^2)\) as the adversarial latent space \(\hat{\varvec{z}}\) for sampling, which enhances the attack performance. The empirical results are reported in Table 11: equipped with the shifted mean \(\hat{\varvec{\mu }}\) and std \(\hat{\delta }\), the attack success rate is improved by 49.16–59.65%, 35.35–36.72%, and 26.38–37.49% on CIFAR-10, SVHN, and ImageNet, respectively. These results demonstrate that the latent space has been shifted, guided by the adversarial-clean example pairs.

Table 11 Attack success rate with (w.) and without (w.o.) the shifted mean and std on three benchmark datasets under noise budgets \(\epsilon =8\) and \(\epsilon =16\)

Comparison with a GAN-based method

As declared above, the adversarial and normal examples come from different distributions that are misaligned but mutually transformable, and we characterize the adversarial distribution with locally collected adversarial examples in a generative manner; more specifically, a conditional normalizing flow is used to learn the transformation in this paper. To verify whether other generative models are also qualified for this task, we apply the same pipeline to the GAN borrowed from GAP (Poursaeed et al. 2018) and present the attack performance in Fig. 6. As the results show, the GAN can also learn the mapping relationship between the two types of examples, but its attack capability is unsatisfactory; in addition, as the query budget increases, the attack success rate of DTA increases significantly, while that of the GAN-based method does not. Again, these results demonstrate the superiority of the proposed conditional likelihood-based DTA in generating examples belonging to the adversarial distribution.

Fig. 6 The attack performance of DTA and the GAN-based method under different query budgets. The dataset is ImageNet and the noise budget is \(\epsilon =16\)

Conclusions

In this paper, we propose a novel hard-label black-box adversarial attack framework based on a generative idea. The motivation is that public datasets force public models to learn a common distribution, causing the models to exhibit similar vulnerabilities. Hence, the adversarial distributions of different models could also be similar, which inspires the transferability assumption in many adversarial attack methods. Based on this assumption, we advocate that there exists a certain mapping from the distribution of normal examples to the distribution of adversarial examples. Accordingly, a conditional normalizing flow-based generative model is developed to implement the mapping function. By collecting a batch of adversarial examples from existing white-box attacks, we can optimize the flow model to explicitly correlate the adversarial examples with Gaussian-style hidden representations. To diversify the generation process, the normal examples are fed into the model as conditions. An elaborated generation process further improves the performance of the generated examples. Extensive experiments validate the proposed idea and demonstrate the superiority of DTA in terms of attack success rate, average query number, and median query number. In particular, our method can achieve a successful attack with only ONE query, which verifies that we have learned the adversarial distribution; by contrast, the other hard-label methods generally require hundreds of queries to accomplish an attack. We also surprisingly find that the proposed model can perform effective cross-dataset attacks, which means that the model is not sensitive to the label space of the classification task. In summary, this work provides a promising framework with the advantages of low query counts, a high success rate, and an efficient inference process, which could guide future research on adversarial attacks in a new direction.

Availability of data and materials

The data that support the findings of this study are openly available at https://www.cs.toronto.edu/kriz/cifar.html, http://ufldl.stanford.edu/housenumbers/, and https://image-net.org/, reference numbers Krizhevsky and Hinton (2009), Netzer et al. (2011), and Russakovsky et al. (2015).

Notes

  1. https://cloud.google.com/vision.

  2. https://github.com/kuangliu/pytorch-cifar.

  3. https://github.com/tensorflow/models/tree/master/research/slim.

  4. https://github.com/chenyaofo/pytorch-cifar-models.

  5. https://github.com/pytorch.

  6. https://github.com/ylhz/tf_to_pytorch_model.

References

  • Akhtar N, Liu J, Mian A (2018) Defense against universal adversarial perturbations. In: CVPR, pp. 3389–3398. https://doi.org/10.1109/CVPR.2018.00357

  • Ardizzone L, Lüth C, Kruse J, Rother C, Köthe U (2019) Guided image generation with conditional invertible neural networks. CoRR arXiv:abs/1907.02392

  • Baluja S, Fischer I (2018) Learning to attack: adversarial transformation networks. In: AAAI, pp 2687–2695

  • Carlini N, Wagner DA (2017) Towards evaluating the robustness of neural networks. In: S&P. https://doi.org/10.1109/SP.2017.49

  • Chakraborty A, Alam M, Dey V, Chattopadhyay A, Mukhopadhyay D (2018) Adversarial attacks and defences: a survey. CoRR arXiv:abs/1810.00069

  • Chen J, Gu Q (2020) Rays: a ray searching method for hard-label adversarial attack. In: KDD, pp 1739–1747. https://doi.org/10.1145/3394486.3403225

  • Chen P, Zhang H, Sharma Y, Yi J, Hsieh C (2017) ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In: ACM AISec@CCS, pp 15–26. https://doi.org/10.1145/3128572.3140448

  • Cheng M, Singh S, Chen PH, Chen P, Liu S, Hsieh C (2020) Sign-opt: a query-efficient hard-label adversarial attack. In: ICLR

  • Ding J, Xu Z (2020) Adversarial attacks on deep learning models of computer vision: a survey. ICA3PP 12454:396–408. https://doi.org/10.1007/978-3-030-60248-2_27


  • Dinh L, Krueger D, Bengio Y (2015) NICE: non-linear independent components estimation. In: ICLR

  • Dolatabadi HM, Erfani SM, Leckie C (2020) Advflow: inconspicuous black-box adversarial attacks using normalizing flows. In: NeurIPS

  • Dong Y, Liao F, Pang T, Su H, Zhu J, Hu X, Li J (2018) Boosting adversarial attacks with momentum. In: CVPR. https://doi.org/10.1109/CVPR.2018.00957

  • Dong Y, Cheng S, Pang T, Su H, Zhu J (2022) Query-efficient black-box adversarial attacks guided by a transfer-based prior. IEEE Trans Pattern Anal Mach Intell 44(12):9536–9548. https://doi.org/10.1109/TPAMI.2021.3126733

  • Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N (2021) An image is worth 16x16 words: transformers for image recognition at scale. In: ICLR

  • Duan R, Ma X, Wang Y, Bailey J, Qin AK, Yang Y (2020) Adversarial camouflage: Hiding physical-world attacks with natural styles. In: CVPR. https://doi.org/10.1109/CVPR42600.2020.00108

  • Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2010) The Pascal visual object classes (VOC) challenge. Int J Comput Vis 88(2):303–338

  • Eykholt K, Evtimov I, Fernandes E, Li B, Rahmati A, Xiao C, Prakash A, Kohno T, Song, D (2018) Robust physical-world attacks on deep learning visual classification. In: CVPR. https://doi.org/10.1109/CVPR.2018.00175

  • Fei-Fei L, Fergus R, Perona P (2004) Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In: CVPR, p 178. https://doi.org/10.1109/CVPR.2004.383

  • Feng Y, Wu B, Fan Y, Liu L, Li Z, Xia S (2022) Boosting black-box attack with partially transferred conditional adversarial distribution. In: CVPR, pp 15074–15083. https://doi.org/10.1109/CVPR52688.2022.01467

  • Goodfellow IJ, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. In: ICLR

  • Guo C, Rana M, Cissé M, van der Maaten L (2018) Countering adversarial images using input transformations. In: ICLR

  • Guo C, Gardner JR, You Y, Wilson AG, Weinberger KQ (2019) Simple black-box adversarial attacks. ICML 97:2484–2493

  • Guo F, Sun Z, Chen Y, Ju L (2023) Towards the universal defense for query-based audio adversarial attacks on speech recognition system. Cybersecurity 6(1):1–18

  • He K, Zhang X, Ren S, Sun J (2016a) Identity mappings in deep residual networks. ECCV 9908:630–645

  • He K, Zhang X, Ren S, Sun J (2016b) Deep residual learning for image recognition. In: CVPR, pp 770–778. https://doi.org/10.1109/CVPR.2016.90

  • Huang G, Liu Z, van der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: CVPR, pp 2261–2269. https://doi.org/10.1109/CVPR.2017.243

  • Huang Z, Zhang T (2020) Black-box adversarial attack with transferable model-based embedding. In: ICLR

  • Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: CVPR, pp 7132–7141. https://doi.org/10.1109/CVPR.2018.00745

  • Ilyas A, Engstrom L, Athalye A, Lin J (2018) Black-box adversarial attacks with limited queries and information. ICML 80:2142–2151

  • Ilyas A, Engstrom L, Madry A (2019) Prior convictions: black-box adversarial attacks with bandits and priors. In: ICLR

  • Kingma DP, Dhariwal P (2018) Glow: generative flow with invertible 1x1 convolutions. In: NeurIPS, pp 10236–10245

  • Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Technical report, Computer Science Department, University of Toronto

  • Kurakin A, Goodfellow IJ, Bengio S (2017) Adversarial examples in the physical world. In: ICLR

  • Li Y, Li L, Wang L, Zhang T, Gong B (2019) NATTACK: learning the distributions of adversarial examples for an improved black-box attack on deep neural networks. ICML 97:3866–3876

  • Ling X, Ji S, Zou J, Wang J, Wu C, Li B, Wang T (2019) DEEPSEC: a uniform platform for security analysis of deep learning model. In: S&P, pp 673–690. https://doi.org/10.1109/SP.2019.00023

  • Liu A, Liu X, Fan J, Ma Y, Zhang A, Xie H, Tao D (2019a) Perceptual-sensitive GAN for generating adversarial patches. In: AAAI. https://doi.org/10.1609/aaai.v33i01.33011028

  • Liu R, Liu Y, Gong X, Wang X, Li H (2019b) Conditional adversarial generative flow for controllable image synthesis. In: CVPR, pp 7992–8001. https://doi.org/10.1109/CVPR.2019.00818

  • Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: ICCV, pp 9992–10002

  • Liu P, Xu X, Wang W (2022) Threats, attacks and defenses to federated learning: issues, taxonomy and perspectives. Cybersecurity 5(1):1–19

  • Lu Y, Huang B (2020) Structured output learning with conditional generative flows. In: AAAI, pp 5005–5012

  • Ma N, Zhang X, Zheng H, Sun J (2018) Shufflenet V2: practical guidelines for efficient CNN architecture design. ECCV 11218:122–138. https://doi.org/10.1007/978-3-030-01264-9_8

  • Ma C, Guo X, Chen L, Yong J, Wang Y (2021) Finding optimal tangent points for reducing distortions of hard-label attacks. In: NeurIPS, pp 19288–19300

  • Madaan D, Shin J, Hwang SJ (2020) Adversarial neural pruning with latent vulnerability suppression. ICML 119:6575–6585

  • Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A (2018) Towards deep learning models resistant to adversarial attacks. In: ICLR

  • Matachana AG, Co KT, Muñoz-González L, Martínez-Rego D, Lupu EC (2020) Robustness and transferability of universal attacks on compressed models. CoRR arXiv:2012.06024

  • Mirsky Y (2023) Ipatch: a remote adversarial patch. Cybersecurity 6(1):18

  • Mirza M, Osindero S (2014) Conditional generative adversarial nets. CoRR arXiv:1411.1784

  • Netzer Y, Wang T, Coates A, Bissacco A, Wu B, Ng A (2011) Reading digits in natural images with unsupervised feature learning. In: NIPS workshop on deep learning and unsupervised feature learning

  • Poursaeed O, Katsman I, Gao B, Belongie SJ (2018) Generative adversarial perturbations. In: CVPR, pp 4422–4431

  • Pumarola A, Popov S, Moreno-Noguer F, Ferrari V (2020) C-flow: conditional generative flow models for images and 3d point clouds. In: CVPR, pp 7946–7955. https://doi.org/10.1109/CVPR42600.2020.00797

  • Reza MF, Rahmati A, Wu T, Dai H (2023) Cgba: curvature-aware geometric black-box attack. In: ICCV, pp 124–133

  • Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein MS, Berg AC, Li F (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y

  • Sandler M, Howard AG, Zhu M, Zhmoginov A, Chen L (2018) Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation. CoRR arXiv:1801.04381

  • Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: ICLR

  • Sohn K, Lee H, Yan X (2015) Learning structured output representation using deep conditional generative models. In: NeurIPS, pp 3483–3491

  • Sun L, Tan M, Zhou Z (2018) A survey of practical adversarial example attacks. Cybersecurity 1:1–9

  • Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow IJ, Fergus R (2014) Intriguing properties of neural networks. In: ICLR

  • Szegedy C, Liu W, Jia Y, Sermanet P, Reed SE, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: CVPR, pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594

  • Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: CVPR, pp 2818–2826. https://doi.org/10.1109/CVPR.2016.308

  • Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: AAAI, pp 4278–4284

  • Tramèr F, Kurakin A, Papernot N, Goodfellow IJ, Boneh D, McDaniel PD (2018) Ensemble adversarial training: attacks and defenses. In: ICLR

  • Tu C, Ting P, Chen P, Liu S, Zhang H, Yi J, Hsieh C, Cheng S (2019) Autozoom: autoencoder-based zeroth order optimization method for attacking black-box neural networks. In: AAAI, pp 742–749. https://doi.org/10.1609/aaai.v33i01.3301742

  • Wang H, Yu C (2019) A direct approach to robust deep learning using adversarial networks. In: ICLR

  • Wang J, Chang X, Wang Y, Rodríguez RJ, Zhang J (2021) Lsgan-at: enhancing malware detector robustness against adversarial examples. Cybersecurity 4:1–15

  • Wang X, Zhang Z, Tong K, Gong D, He K, Li Z, Liu W (2022) Triangle attack: a query-efficient decision-based adversarial attack. ECCV 13665:156–174. https://doi.org/10.1007/978-3-031-20065-6_10

  • Wu H, Liu AT, Lee H (2020) Defense for black-box attacks on anti-spoofing models by self-supervised learning. In: INTERSPEECH, pp 3780–3784. https://doi.org/10.21437/Interspeech.2020-2026

  • Xiao C, Li B, Zhu J, He W, Liu M, Song D (2018) Generating adversarial examples with adversarial networks. In: IJCAI, pp 3905–3911. https://doi.org/10.24963/ijcai.2018/543

  • Zagoruyko S, Komodakis N (2016) Wide residual networks. In: BMVC

  • Zhang Y, Li Y, Liu T, Tian X (2020) Dual-path distillation: a unified framework to improve black-box attacks. ICML 119:11163–11172

  • Zhao P, Chen P, Wang S, Lin X (2020) Towards query-efficient black-box adversary with zeroth-order natural gradient descent. In: AAAI, pp 6909–6916

  • Zhou B, Lapedriza A, Khosla A, Oliva A, Torralba A (2017) Places: a 10 million image database for scene recognition. IEEE Trans Pattern Anal Mach Intell 40:1452–1464

Acknowledgements

Not applicable.

Funding

This work is supported in part by the National Natural Science Foundation of China under Grants 62162067, 62101480 and 62362068 (Research and Application of Object Detection based on Artificial Intelligence), in part by the Yunnan Province expert workstations under Grant 202305AF150078, and in part by the Scientific Research Fund Project of the Yunnan Provincial Education Department under Grant 2023Y0249.

Author information

Authors and Affiliations

Authors

Contributions

RL: conceptualization; methodology; writing (original draft). WZ: validation; supervision; funding acquisition. XJ: investigation; software. SG: visualization; investigation; formal analysis. YW: data curation; resources. RW: writing (review and editing); project administration; funding acquisition.

Corresponding author

Correspondence to Ruxin Wang.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Liu, R., Zhou, W., Jin, X. et al. DTA: distribution transform-based attack for query-limited scenario. Cybersecurity 7, 8 (2024). https://doi.org/10.1186/s42400-023-00197-2

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s42400-023-00197-2

Keywords