DTA: distribution transform-based attack for query-limited scenario
Cybersecurity volume 7, Article number: 8 (2024)
Abstract
In generating adversarial examples, conventional black-box attack methods rely on sufficient feedback from the to-be-attacked models by repeatedly querying until the attack succeeds, which usually results in thousands of trials during an attack. This may be unacceptable in real applications, since a Machine Learning as a Service (MLaaS) platform usually returns only the final result (i.e., the hard label) to the client, and a system equipped with certain defense mechanisms can easily detect malicious queries. A more feasible alternative is a hard-label attack in which the attacker is permitted only a limited number of queries. To implement this idea, in this paper we bypass the dependency on the to-be-attacked model and exploit the characteristics of the distributions of adversarial examples, reformulating the attack problem in a distribution-transform manner and proposing a distribution transform-based attack (DTA). DTA builds a statistical mapping from a benign example to its adversarial counterparts by tackling the conditional likelihood under the hard-label black-box setting. In this way, it is no longer necessary to query the target model frequently. A well-trained DTA model can directly and efficiently generate a batch of adversarial examples for a given input, which can be used to attack unseen models based on the assumed transferability. Furthermore, we surprisingly find that the well-trained DTA model is not sensitive to the semantic space of the training dataset, meaning that the model yields acceptable attack performance on other datasets. Extensive experiments validate the effectiveness of the proposed idea and the superiority of DTA over the state-of-the-art.
Introduction
The recent progress in machine learning reveals a critical problem of deep neural networks (DNNs): most DNNs are vulnerable to adversarial examples, i.e., they are misled by particular examples corrupted by human-imperceptible noise (Szegedy et al. 2014; Goodfellow et al. 2015; Kurakin et al. 2017; Dong et al. 2018). This lack of robustness and its inexplicability have attracted extensive research attention devoted to improving model robustness and AI security. While most existing studies focus on adversarial attacks in a synthesizing way, i.e., the adversarial examples are generated by directly modifying the pixels of digital images, certain trials have shown that it is possible to attack an AI system physically (Duan et al. 2020; Liu et al. 2022). A typical scenario is autonomous driving, where the driving system relies on deep learning-based techniques to identify traffic signs or other road information for accurate driving decisions. The studies by Liu et al. (2019a) and Eykholt et al. (2018) show that well-designed disturbances imposed on traffic signs can easily deceive the recognition module in the driving system, posing a significant threat to people's lives and property.
While various defense methods against adversarial attacks are constantly being proposed (Akhtar et al. 2018; Wang et al. 2021; Madaan et al. 2020; Zhang et al. 2020; Guo et al. 2023), increasingly powerful attack methods (Carlini and Wagner 2017; Sun et al. 2018; Mirsky 2023) keep emerging and have proven able to defeat those defenses. This attack-defense game will continue along with the development of deep learning and modern AI systems.
The literature on adversarial attacks can be grouped into two classes: white-box and black-box attacks. In the white-box setting, the details of the target model, such as its structure and parameters, are known before the attack is designed. In contrast, in the black-box setting, the model details are inaccessible; only the hard label or the label probabilities returned by the target model for a specific input can be obtained via a querying-based attack. Clearly, the black-box attack is more feasible than the white-box attack in real applications, since the technical details of an online artificial intelligence system are generally invisible to the public, especially under the hard-label setting.
A typical option for attackers in the black-box setting is to use thousands of queries to collect enough feedback for optimizing the adversarial example iteratively, which is called the optimization-based attack. Nevertheless, the querying-and-optimizing process can consume massive computing resources and time (Guo et al. 2019; Tu et al. 2019), making it an inefficient way to perform a successful attack. On the other hand, an advanced AI system can be equipped with certain defense mechanisms that resist intentional attacks (Wu et al. 2020); a good case in point is the Google Cloud Vision API (GCV). In this case, too many attack trials can be easily detected by the system. Hence, all these conditions dramatically limit the applicability of black-box attacks and pose the necessity of a query-limited hard-label attack strategy in practical applications.
Besides, the optimization-based attack methods tend to overfit the adversarial examples to the target model in pursuit of high attack performance. We empirically find that those generated adversarial examples have low transferability and cannot attack other target models effectively. This limits the possibility of exploring cross-model knowledge about adversarial robustness.
To solve the problems discussed above, in this paper we formulate the synthesis of adversarial examples in a distribution-transform manner. We advocate that the adversarial distribution and the normal distribution are misaligned but transferable. The common parts during the transfer can be well conditioned on the input example itself, while the misaligned parts are enriched by employing a generative model that recovers the distributions from random noise and conditions. Assuming that the vulnerability of different deep models exhibits similar effects, we can reasonably collect many adversarial examples from existing attack methods, which are then used to characterize the adversarial distribution. In this way, it is possible to optimize a generative model that synthesizes adversarial examples in a statistical pipeline. As a result, the model can generate batches of examples for attacking without many queries. To be clear, this advantage stems from the distribution of the existing adversarial examples and their transferability, which are encoded by the generative model. It is also possible to apply the attack model to a different data source that has not been involved in training. In the implementation, we develop a conditional normalizing flow-based model to achieve the above goal. The main contributions of this paper can be summarized as follows:

We formulate the black-box attack problem as a generative framework from the perspective that the adversarial distribution can be translated from the normal distribution under certain conditions. From this perspective, the adversarial examples are transferable across different models and different image contents.

We develop a conditional normalizing flow-based attack method (DTA) that simulates the transformation from the normal distribution to the adversarial distribution. Unlike the existing black-box methods, which need thousands of queries, DTA significantly reduces the query times during an attack while achieving an acceptable attack success rate. Notably, DTA requires only ONE query to perform a successful attack in most cases.

The proposed DTA can generate adversarial examples with high transferability to different black-box models. The well-trained model is not sensitive to the semantic spaces of the training dataset, and we empirically demonstrate that the model trained on ImageNet can be used to generate effective adversarial examples on other datasets.

Extensive evaluations on black-box attacks show that the proposed DTA beats the state-of-the-art hard-label attacks in terms of attack success rate, query times and transferability, which demonstrates the validity of the proposed DTA in the adversarial attack.
The rest of this paper is organized as follows. We briefly review the methods relating to adversarial attacks in “Related work” section. In “Preliminary” section, we provide the preliminaries of adversarial attack and normalizing flow. “Methodology” section introduces the details of the proposed DTA framework. The experiments are presented in “Experiments” section, with the conclusion drawn in “Conclusions” section.
Related work
In this section, we briefly review the methods most relevant to the current work. For comprehensive literature on adversarial attacks (including white-box and black-box), please refer to Ding and Xu (2020), Chakraborty et al. (2018).
Black-box attack
A typical case of adversarial attack is the black-box setting, which matches the practice in real applications. Due to the limited information about the target model, the black-box attack is more difficult than the white-box one and has received comparatively limited attention from the community. The rationale of most existing methods is the transferability of adversarial examples across models, which allows examples generated using white-box methods to attack black-box models. For example, the ensemble adversarial training method proposed by Tramèr et al. (2018) and the image transformation method proposed by Guo et al. (2018) can effectively carry out transfer attacks.
The ZOO attack proposed by Chen et al. (2017) was one of the earliest query-based black-box attack methods. It employed zeroth-order optimization to construct a gradient estimator by querying the target model and then used the estimated gradient to minimize the Carlini and Wagner (C&W) loss (Carlini and Wagner 2017) to find adversarial examples. Ilyas et al. (2018) employed a normal-distribution search density to estimate the gradient of the DNN classifier F(x) and adopted projected gradient descent to minimize the loss for generating adversarial examples. Instead of minimizing the target of adversarial example generation, \(\mathcal {N}\)ATTACK (Li et al. 2019) fits the distribution around the clean data that the adversarial examples follow. In another work, Ilyas et al. (2019) observed that the gradients used by PGD show high correlation across time and data, and then used bandit optimization techniques to integrate this prior knowledge about gradients into the attack, proposing a method called Bandits & Priors which reduces the number of queries during an attack.
Adversarial attacks using generative models
The existing adversarial attack methods based on generative models generally rely on the generative adversarial network (GAN) to synthesize adversarial examples (Baluja and Fischer 2018; Wang and Yu 2019; Huang and Zhang 2020). Most of these methods focus on the white-box attack, where the gradient of the target model is required to update the parameters of the GAN. In the black-box setting, a surrogate model is used to approximate the output of the target model, which also drives the gradients of the former to approximate those of the latter, such that the optimized model has a similar vulnerability to the target model (Huang and Zhang 2020; Xiao et al. 2018). The previous works that synthesize adversarial examples with a normalizing flow model are AdvFlow (Dolatabadi et al. 2020) and \(\mathcal{C}\mathcal{G}\)-Attack (Feng et al. 2022). AdvFlow first maps the input image to a hidden representation via a pre-trained flow model and then searches for a suitable disturbance in the hidden space, using natural evolution strategies (NES) to optimize the most helpful disturbance in an iterative updating manner. \(\mathcal{C}\mathcal{G}\)-Attack first trains a conditional flow model (i.e., cGlow, Lu and Huang 2020) on local white-box models with an additional adversarial loss and then carries out the black-box attack with this well-trained flow model. Note that both AdvFlow and \(\mathcal{C}\mathcal{G}\)-Attack are also query-based and need the models' full outputs (soft labels) for attacking, which requires many queries or more detailed outputs from the target model to perform a successful attack and limits their applicability to physically deployed black-box models that only return the true label.
The discussion above shows that the existing black-box attack methods mostly require thousands of queries and detailed outputs from the target model to estimate the gradient, and then carry out the attack iteratively to obtain a compelling adversarial example. In this situation, the attack is inefficient and impractical, while the time and computational consumption can be considerable. In addition, the transferability of the adversarial examples obtained by querying and optimization is often limited; in other words, the generated adversarial examples overfit the target model and fail to attack other target models. When considering attacks across different datasets, the existing methods possess very limited capability to perform a successful attack. However, both the cross-model attack ability and the cross-dataset attack ability are sometimes valuable for real applications, especially when we do not have many chances to perform attack trials.
Therefore, the black-box attack calls for a method that can directly, efficiently, and effectively attack different models and different datasets within limited queries and information. To achieve this goal, we know from previous studies that adversarial examples follow a particular distribution related to the normal examples, and learning such an adversarial distribution can help us explore the vulnerability of different models. Hence, we are well motivated to develop a generative model that transfers from (or is conditioned on) the distribution of clean examples to the adversarial one. It is also possible to achieve cross-dataset attacks by involving an increasing number of adversarial examples during offline learning.
Preliminary
Before introducing the details of the proposed framework, in this section, we first present the preliminary knowledge about adversarial attacks and normalizing flows.
Adversarial attack
Given a well-trained DNN classifier f and a correctly classified input \((\varvec{x},y) \sim D\), we have \(f(\varvec{x})=y\), where D denotes the accessible dataset. The adversarial example \(\varvec{x}'\) is a neighbor of \(\varvec{x}\) and satisfies \(f(\varvec{x}') \ne y\) and \(\left\| \varvec{x}' - \varvec{x} \right\|_p \le \epsilon\), where the \(\ell _p\) norm is used as the metric function and \(\epsilon\) is usually a small value such as 8 or 16 for image intensities in [0, 255]. With this definition, the problem of finding an adversarial example becomes a constrained optimization problem:
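(A standard form of this objective, assumed here, is the following; \(\ell\) is defined below.)

$$\max _{\varvec{x}'}\ \ell \big (f(\varvec{x}'),\, y\big ) \quad \text {s.t.} \quad \left\| \varvec{x}' - \varvec{x}\right\| _p \le \epsilon \qquad (1)$$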
where \(\ell\) stands for a loss function that measures the confidence of the model outputs.
In the optimization-based methods, the above problem is solved by computing the gradients of the loss function in Eq. 1 to generate the adversarial example. By contrast, in this work, we formulate a statistical transformation from \(P(\varvec{x})\) to \(P(\varvec{x}')\) instead of involving an online optimization process.
Normalizing flow
Normalizing flows (Dinh et al. 2015; Kingma and Dhariwal 2018) are a class of probabilistic generative models constructed from a series of completely reversible components. The reversible property allows transforming the original distribution into a new one and vice versa. By optimizing the model, a simple distribution (such as a Gaussian distribution) can be transformed into the complex distribution of real data. The training process of normalizing flows is indeed an explicit likelihood maximization. Considering that the model is expressed by a fully invertible and differentiable function that transfers a random vector \(\varvec{z}\) from the Gaussian distribution to another vector \(\varvec{x}\), we can employ such a model to generate high-dimensional and complex data.
Specifically, given a reversible function \({f:} \ \mathbb {R}^d\rightarrow \mathbb {R}^d\) and two random variables \(\varvec{z}\sim p(\varvec{z})\) and \(\varvec{z}'\sim p(\varvec{z}')\) where \(\varvec{z}' = f(\varvec{z})\), the change-of-variables rule states that
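(in the standard form assumed here)

$$p(\varvec{z}') = p(\varvec{z}) \left| \det \frac{\partial f(\varvec{z})}{\partial \varvec{z}}\right| ^{-1}, \quad \text {or equivalently} \quad \log p(\varvec{z}') = \log p(\varvec{z}) - \log \left| \det \frac{\partial f(\varvec{z})}{\partial \varvec{z}}\right| ,$$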
where det denotes the determinant operation. The above equation follows a chaining rule, in which a series of invertible mappings can be chained to approximate a sufficiently complex distribution, i.e.,
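(writing the chain in the composed form assumed here for Eq. 4)

$$\varvec{z}_K = (f_K \circ f_{K-1} \circ \cdots \circ f_1)(\varvec{z}_0), \qquad (4)$$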
where each f is a reversible function called a flow step. Equation 4 is shorthand for \(f_K(f_{K-1}(\cdots f_1(\varvec{z}_0)))\). Assuming that \(\varvec{x}\) is the observed example and \(\varvec{z}\) is the hidden representation, we write the generative process as
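(in the standard form assumed here)

$$\varvec{z} \sim \mathcal {N}(\varvec{0}, \varvec{I}), \qquad \varvec{x} = f_{\theta }(\varvec{z}),$$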
where \(f_{\theta }\) is the composition of all f in Eq. 4. Based on the change-of-variables theorem, we write the log-density function of \(\varvec{x}=\varvec{z}_K\) as follows:
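(in the standard form assumed here)

$$\log p(\varvec{x}) = \log p(\varvec{z}_0) - \sum _{k=1}^{K} \log \left| \det \frac{\partial \varvec{z}_k}{\partial \varvec{z}_{k-1}}\right| ,$$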
where we use \(\varvec{z}_k=f_k(\varvec{z}_{k-1})\) implicitly. Training a normalizing flow amounts to minimizing the negative of the above function, which exactly maximizes the likelihood on the observed training data. Hence, the optimization is stable and easy to implement.
Conditional normalizing flow
In certain cases, the transformation between distributions is conditioned on external variables; for example, a face image may be conditioned on age, gender, expression, etc. This has already been considered in generative models such as CVAE (Sohn et al. 2015) and CGAN (Mirza and Osindero 2014). In flow-based models, the conditional normalizing flow allows us to involve the conditions in each flow step. Specifically, the reversible function f accepts both the input variable \(\varvec{z}\) and the condition variable c as inputs, which is formally expressed as \(\varvec{z}'=f(\varvec{z};c)\), while the inverse mapping is \(\varvec{z}=f^{-1}(\varvec{z}';c)\). By denoting the Kth flow step as \(f_K\), the change-of-variables theorem says that
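(with \(\varvec{x}=\varvec{z}_K\) and \(\varvec{z}_{k-1} = f_k^{-1}(\varvec{z}_k; c)\), the conditional form assumed here is)

$$\log p(\varvec{x} \mid c) = \log p(\varvec{z}_0) + \sum _{k=1}^{K} \log \left| \det \frac{\partial f_k^{-1}(\varvec{z}_k; c)}{\partial \varvec{z}_k}\right| . \qquad (7)$$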
Given a well-trained flow model, we first sample \(\varvec{z}_0\) from the Gaussian distribution and then perform a forward flow as
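(in the chained form assumed here)

$$\varvec{x} = f_{\theta }(\varvec{z}_0; c) = f_K\big (\cdots f_2(f_1(\varvec{z}_0; c); c)\cdots ; c\big ).$$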
If we are interested in computing the probability density of an observed example \(\varvec{x}\), the inverse mapping is expressed as
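(in the corresponding form assumed here)

$$\varvec{z}_0 = f_{\theta }^{-1}(\varvec{x}; c) = f_1^{-1}\big (\cdots f_{K-1}^{-1}(f_K^{-1}(\varvec{x}; c); c)\cdots ; c\big ),$$

after which the density of \(\varvec{x}\) follows from Eq. 7.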
Methodology
In this section, we introduce the whole framework of the proposed adversarial attack in a generative manner, along with the details of model learning and inference.
The DTA framework
Recall that the conventional attack methods generate the adversarial perturbation by performing a complex inference based on the target model; the perturbation is then added to the original example, resulting in the final adversarial example. This process is highly dependent on the inference result, incurs heavy computational cost, and generally produces a single “optimal” example according to certain criteria. By contrast, in this paper, we start from a novel perspective and propose a novel generative adversarial attack method, called the distribution transform-based attack (DTA). Specifically, we advocate that all adversarial examples could follow a certain distribution that is misaligned with the normal distribution. This is mainly caused by the fixed training data involved in optimizing different deep models. In other words, the training data characterize a fixed distribution that is approximated by those models during training; hence, the distribution of data unseen during training is likewise common to these models. This explains why we consider that the adversarial examples (most of which are unseen during training) follow a misaligned distribution. At this point, we reasonably assume a transformation from the distribution of normal examples to the distribution of adversarial examples. Since these two types of data exhibit similar appearances, the two distributions ideally overlap with each other and can be transformed mutually.
The whole framework of the proposed method is illustrated in Fig. 1. Based on the above discussion, we propose to collect a large number of adversarial examples \(\varvec{X}'\) by employing existing white-box attack methods. While these examples look similar to the normal examples \(\varvec{X}\), a direct transformation between the two types of examples is nevertheless difficult or even prohibitive. This is because the small perturbation could be overwhelmed by the complex structures and textures in the normal example and thus be insensitive to the generation model. To alleviate this issue, we consider that the small perturbations should be conditioned on the normal inputs, which provide cues in the generative process. Specifically, the conditional normalizing flow is employed to implement the conditional generation process, which allows synthesizing the adversarial example based on the normal example and a random variable (Lu and Huang 2020; Pumarola et al. 2020; Liu et al. 2019b). The random variable diversifies the generated examples; that is, when the flow model is well trained, we can randomly sample in the latent space \(\varvec{Z}\) to generate a batch of adversarial examples through a forward pass of the flow model. The details of the flow model and the training and inference processes are discussed in the following sections.
Conditional normalizing flow for attack
To implement a powerful normalizing flow with a strong ability to process image textures, we employ the basic GLOW model (Kingma and Dhariwal 2018), which involves the convolutional operation, the coupling operation, and the normalization operation in the model construction. Since the original GLOW model does not consider conditions in the probabilistic modeling, we follow the work in Ardizzone et al. (2019), Lu and Huang (2020) to properly integrate the image content into the conditions. The architecture of the flow model is illustrated in Fig. 2. As seen, a basic flow step is a stack of the Actnorm layer, the 1×1 convolutional layer, and the affine coupling layer. A single flow block is constructed by cascading a squeeze layer, K flow steps, and a split layer. The whole architecture is then built by repeating the flow block \(L-1\) times, followed by the final layers, which consist of a squeeze layer and K flow steps. The details of the Actnorm layer, the 1×1 convolutional layer, the affine coupling layer, and the squeeze and split layers can be found in GLOW (Kingma and Dhariwal 2018). Regarding the conditions involved in each layer, it has been shown that the original image is unsuitable to be fed directly as the condition. This is because the original image provides very low-level features, which are insufficient for feature modeling and can burden the subnetworks in the affine coupling layer. Instead, high-level features are preferable. Hence, we follow the options in Ardizzone et al. (2019), Lu and Huang (2020), which suggest employing a pre-trained deep model to extract high-level features that are used as the condition. Specifically, we use the VGG19 model pre-trained on CIFAR10, SVHN, and ImageNet, respectively, and extract the features from the last convolutional layers. It is also possible to replace the VGG19 model with other proper choices. During model training, the VGG19 model can be fixed or optimized jointly with the flow model. In the current work, we fix this feature extraction model for simplicity.
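To make the conditional coupling concrete, below is a minimal sketch of a conditional affine coupling layer, assuming PyTorch. The layer widths, the tanh-stabilized scale, and the way the condition features are concatenated are illustrative assumptions rather than the exact DTA configuration; the condition tensor is assumed to be resized to the spatial resolution of the flow features.

```python
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    """One conditional affine coupling step: the first half of the channels is kept,
    and a scale/shift for the second half is predicted from it plus the condition."""

    def __init__(self, channels: int, cond_channels: int, hidden: int = 128):
        super().__init__()
        assert channels % 2 == 0, "channel count must be even for the split"
        self.net = nn.Sequential(
            nn.Conv2d(channels // 2 + cond_channels, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 3, padding=1),  # outputs log-scale and shift
        )

    def forward(self, z, cond):
        z1, z2 = z.chunk(2, dim=1)
        log_s, t = self.net(torch.cat([z1, cond], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)              # keep the scale numerically stable
        z2 = z2 * torch.exp(log_s) + t         # affine transform of the second half
        log_det = log_s.flatten(1).sum(dim=1)  # this step's log|det| contribution
        return torch.cat([z1, z2], dim=1), log_det

    def inverse(self, z, cond):
        z1, z2 = z.chunk(2, dim=1)
        log_s, t = self.net(torch.cat([z1, cond], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)
        z2 = (z2 - t) * torch.exp(-log_s)
        return torch.cat([z1, z2], dim=1)
```

In a full model, several such steps are interleaved with Actnorm and 1×1 convolution layers inside each flow block, as described above.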
Adversarial data collection
Recall that the adversarial examples obtained using existing white-box attack methods play a key role in the proposed framework. Hence, regarding how these examples are obtained, we present the details here instead of in the experiment section.
The datasets concerned in the current work include CIFAR10 (Krizhevsky et al. 2009), SVHN (Netzer et al. 2011), and ImageNet (Russakovsky et al. 2015), and the to-be-attacked models are also trained on these datasets. Specifically, the training sets of CIFAR10 and SVHN are selected, while for ImageNet we choose about 30,000 images from the validation set. All these data are used as normal examples by the white-box attack methods to generate adversarial examples. On CIFAR10, the PGD method (Madry et al. 2018) is employed as the attacker, with the pre-trained ResNet50 as the target model. The MI-FGSM method based on multi-model integration (Dong et al. 2018) is employed on SVHN and ImageNet. For SVHN, ResNet50 (He et al. 2016b), InceptionV3 (Szegedy et al. 2016), and SENet18 (Hu et al. 2018) are integrated; the models are modified versions of the public ones and trained from scratch. For ImageNet, InceptionV4 (Szegedy et al. 2017), InceptionResnetV2 (Szegedy et al. 2017), and ResNetV2-101 (He et al. 2016a) are integrated; these models are pre-trained and publicly accessible. The adversarial examples are generated under two perturbation levels, \(\epsilon = 8\) and \(\epsilon = 16\). The other hyperparameter settings of the attack methods follow the respective papers (Madry et al. 2018; Guo et al. 2019; Dolatabadi et al. 2020). In this way, we collect a batch of adversarial examples that will be used to optimize the proposed flow model. Note that the adversarial examples generated on a certain dataset are used to train the flow model that attacks the target model of the corresponding dataset; the cross-dataset attack is not applicable here.
To make a fair comparison in the experiments, the normal examples used for testing are different from those used for training. Specifically, for CIFAR10 and SVHN, the test sets are employed as the input examples. For ImageNet, we randomly select 1000 images from the validation set, which are completely different from the 30,000 images mentioned above.
Training details
As introduced in “Conditional normalizing flow” section, training the conditional normalizing flow means maximizing the likelihood function on the training data with respect to the model parameters. Formally, assume that the collected adversarial example is denoted by \(\varvec{x}'\sim \varvec{X}'\). The normal example is denoted by \(\varvec{x}\sim \varvec{X}\), for which the condition network produces the features \(c(\varvec{x})\) (abbreviated as c). The hidden representation follows the Gaussian distribution, i.e., \(\varvec{z} \sim \mathcal {N}(0,1)\). The flow model is denoted by f, parameterized by \(\theta\), such that \(\varvec{x}'=f_\theta (\varvec{z};c)\) and \(\varvec{z}=f_\theta^{-1}(\varvec{x}';c)\). Then, the loss function to be minimized is expressed as
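(assuming the standard negative log-likelihood form)

$$\mathcal {L}_{nll}(\theta ) = -\log p(\varvec{x}' \mid c; \theta ) = -\log p\big (f_{\theta }^{-1}(\varvec{x}'; c)\big ) - \log \left| \det \frac{\partial f_{\theta }^{-1}(\varvec{x}'; c)}{\partial \varvec{x}'}\right| , \qquad (10)$$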
where the right-hand side of the above equation can be expanded layer-wise according to Eq. 7. By optimizing the above objective, the learned distribution \(p(\varvec{x}'|\varvec{z};c,\theta )\) characterizes the adversarial distribution as expected.
The task of interest here is to generate an adversarial example that has a similar appearance to the example fed into the condition. Hence, we must ensure that the generation process from \(\varvec{z}\) to \(\varvec{x}'\) brings no surprising result. To implement this, we impose an MSE loss in the training process. Specifically, the difference between the generated adversarial example \(\varvec{x}'\) and the corresponding collected adversarial example \(\varvec{x}_{tr}^{'}\) is minimized according to
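(in the form assumed here)

$$\mathcal {L}_{mse}(\theta ) = \big \| f_{\theta }(\varvec{z}; c(\varvec{x})) - \varvec{x}_{tr}^{'}\big \| _2^2, \qquad (11)$$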
where \(\varvec{z}\) is randomly sampled from the Gaussian distribution in each training iteration.
Note that the above losses in Eqs. 10 and 11 provide supervision in different spaces: the former computes the loss in the hidden space while the latter concerns the adversarial space. Optimizing the losses simultaneously can bring unexpected effects since the loss propagation directions conflict. Hence, we propose to perform backpropagation based on the two losses alternately. To be clear, in each iteration we first update the model parameters based on Eq. 10. Then, for the same input batch containing \(\varvec{x}\), we randomly sample a batch of \(\varvec{z}\) and perform a forward flow to generate a batch of \(\varvec{x}'\). The MSE loss between \(\varvec{x}'\) and \(\varvec{x}_{tr}^{'}\) is computed to update the model parameters, followed by the next iteration.
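A minimal sketch of this alternating scheme, assuming PyTorch, is given below; flow, cond_net, optimizer, and loader are placeholders, and the flow object is assumed to expose inverse(x, c) returning the latent code together with the log-determinant, and forward(z, c) returning an image.

```python
import torch

def train_one_epoch(flow, cond_net, optimizer, loader):
    """One epoch of the alternating NLL / MSE updates sketched in the text."""
    for x, x_adv in loader:                 # pairs of clean and collected adversarial examples
        with torch.no_grad():
            c = cond_net(x)                 # condition features from the fixed feature extractor

        # Step 1: negative log-likelihood update in the hidden space (Eq. 10).
        z, log_det = flow.inverse(x_adv, c)
        nll = 0.5 * (z ** 2).flatten(1).sum(dim=1) - log_det  # -log N(z;0,I) - log|det|, up to a constant
        optimizer.zero_grad()
        nll.mean().backward()
        optimizer.step()

        # Step 2: MSE update in the adversarial (image) space (Eq. 11).
        z_rand = torch.randn_like(z)        # fresh latent sample for this iteration
        x_gen = flow.forward(z_rand, c)
        mse = ((x_gen - x_adv) ** 2).mean()
        optimizer.zero_grad()
        mse.backward()
        optimizer.step()
```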
In the training process, we use the Adam algorithm to optimize the model parameters, with the learning rate set to \(10^{-4}\), the momentum set to 0.999, and the maximal iteration number set to 10,000.
Generation of adversarial examples
Given a well-trained flow model \(f_\theta\), the hidden representations of the collected adversarial examples are expected to follow the assumed Gaussian distribution \(\mathcal {N}(0, 1)\). In practice, however, we find that these representations have shifted mean and standard deviation (std) values. This may be because the training data are insufficient. One might suspect that the involved MSE loss biases the center of the Gaussian distribution, but experiments show that the shift occurs even without the MSE loss. Based on this observation, we also surprisingly find that sampling \(\varvec{z}\) using the shifted mean and std values brings better performance than sampling from \(\mathcal {N}(0, 1)\). Hence, before generating adversarial examples, we compute the hidden representations of all the training adversarial examples and use them to calculate the mean value \(\varvec{\mu }\) and the std value \(\sigma\), resulting in a new distribution \(\mathcal {N}(\varvec{\mu }, \sigma ^2)\).
To generate an adversarial example, given an input normal example \(\varvec{x}\), we first randomly sample \(\varvec{z}\) from \(\mathcal {N}(\varvec{\mu }, \sigma ^2)\) and then perform a forward pass via \(\varvec{x}_{gen}=f_\theta (\varvec{z}; c(\varvec{x}))\). For fairness of comparison, we follow the existing attack methods and constrain the perturbation within a certain range. Once we obtain the adversarial example \(\varvec{x}_{gen}\), we employ the clip function
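(element-wise, in the form assumed here)

$$\varvec{x}_{adv} = \mathrm {clip}\big (\varvec{x}_{gen},\ \varvec{x} - \epsilon ,\ \varvec{x} + \epsilon \big )$$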
to ensure the imperceptible property of the perturbation, where \(\epsilon\) is the acceptable noise budget during the attack. Two common cases are considered, \(\epsilon =8\) and \(\epsilon =16\) for pixel values in [0, 255] (scaled to \(\epsilon =8/255\) and \(\epsilon =16/255\) when pixel values lie in [0, 1] in the code implementation).
The whole algorithm of DTA is listed in Alg. 1, which should help readers reimplement our method step by step.
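For reference, a minimal sketch of the attack-time generation step, assuming PyTorch, is shown below; flow, cond_net, mu, and sigma are placeholders for the trained flow model, the fixed feature extractor, and the shifted statistics described above.

```python
import torch

def generate_adversarial(flow, cond_net, x, mu, sigma, eps=16 / 255.0):
    """Sample a latent code from the shifted Gaussian, run the forward flow
    conditioned on the clean image, and project onto the noise budget."""
    with torch.no_grad():
        c = cond_net(x)                           # condition features of the clean input
        z = mu + sigma * torch.randn_like(mu)     # sample from the shifted N(mu, sigma^2)
        x_gen = flow.forward(z, c)                # forward flow: latent -> candidate adversarial image
    x_adv = torch.min(torch.max(x_gen, x - eps), x + eps)  # keep the perturbation within the budget
    return x_adv.clamp(0.0, 1.0)                  # stay in the valid pixel range
```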
Experiments
In this section, we evaluate the performance of the proposed DTA on blackbox adversarial attacks through extensive experiments and comparisons.
Settings
As mentioned previously, three popular datasets are considered, including CIFAR10 (Krizhevsky and Hinton 2009), SVHN (Netzer et al. 2011), and ImageNet (Russakovsky et al. 2015).
Regarding the target models to be attacked, we employ the public models pre-trained on the corresponding datasets, or models trained from scratch if no pre-trained version is publicly accessible. Specifically, the main target models include VGG16 (Simonyan and Zisserman 2015), MobileNetV2 (Sandler et al. 2018), and ShuffleNetV2 (Ma et al. 2018). For CIFAR10 and ImageNet, we use the pre-trained weights from the GitHub repository pytorch-cifar-models and from PyTorch, respectively. For SVHN, we trained these models from scratch, stopping the training of each model once its best performance was obtained, with classification accuracy on the test set above 90%.
To objectively evaluate the performance of the proposed framework, we compare with related state-of-the-art decision-based (hard-label) methods, including Bandits (Ilyas et al. 2019), Sign-OPT (Cheng et al. 2020), RayS (Chen and Gu 2020), Tangent Attack (Tangent) (Ma et al. 2021), Triangle Attack (TA) (Wang et al. 2022) and CGBA (Reza et al. 2023). The implementations of these methods are based on the released codes with the default settings in the corresponding papers. The proposed DTA is implemented using the PyTorch framework. For the quantitative comparison, we use the metrics of attack success rate (ASR), average query count and median query count, following previous works (Chen and Gu 2020; Dong et al. 2022).
All the experiments are conducted on a GPU server with a single Tesla V100 32 GB GPU, two Xeon Silver 4208 CPUs, and 256 GB of RAM.
Quantitative comparison with the state-of-the-art
Evaluation on ASR and query times: Recall that the proposed DTA aims to lower the query times while maintaining a pleasing attack success rate. The success rate with sufficiently many queries may reach a certain bound, but this is beyond the scope of the current work. Hence, we make the comparison under a set of limited queries by setting the maximal number of queries to 100, 200, 300, 400, and 500. The selected competitors are all hard-label attacks. Thus, an attack is counted as successful only if it succeeds within the predefined query number; otherwise, it is counted as a failure. The comparisons on CIFAR10 under \(\epsilon = 8\) and \(\epsilon = 16\) are shown in Table 1, while the results on SVHN are listed in Table 2. It can be seen that DTA achieves higher attack success rates than the competitors in most cases, which validates that the proposed generative model can synthesize effective adversarial examples. It should be especially noted that the average query number required by DTA is much smaller than that required by the other methods.
The experiment on ImageNet poses a challenging case for our end-to-end adversarial example generation since the data are much more complex than CIFAR10 and SVHN. The results on ImageNet are listed in Table 3, where we consider the perturbation level \(\epsilon =16\). The maximal number of queries is again limited to 100, 200, 300, 400, and 500. We see that our method is superior to the baselines on all metrics and is competitive with RayS on ASR in most instances. Again, DTA requires a very limited number of queries to perform a successful attack. Considering the attack performance in query-limited scenarios, we report the empirical results of Bandits, RayS, TA, and CGBA in the following sections. Besides, we prefer to report the results under the noise budget \(\epsilon =16\) in most cases.
Evaluation on defense models: To evaluate the performance of DTA on attacking robust models, we make a comparison by employing adv-inception-v3 (Adv-Inc-v3) (Tramèr et al. 2018), Ens3-adv-inception-v3 (Inc-v3\(_{ens3}\)) (Tramèr et al. 2018), Ens4-adv-inception-v3 (Inc-v3\(_{ens4}\)) (Tramèr et al. 2018), and Ens-adv-inception-resnet-v2 (IncRes-v2\(_{ens}\)) (Tramèr et al. 2018) as the target models, all of which are adversarially trained. We first employ the selected 1000 images mentioned above to generate the corresponding adversarial images on VGG16 and then test the attack performance of these generated examples on the four defense models. The parameters of all these pre-trained models are available from the GitHub repository tf_to_pytorch_model. The results illustrated in Table 4 show that DTA achieves an 11.32–15.77% attack success rate on these robust models. The baseline methods, Bandits, RayS, TA and CGBA, however, only obtain 5.74–13.65%, 3.14–7.79%, 2.88–5.19% and 0.77–3.44%, respectively. This implies that the adversarial examples generated by DTA are more likely to attack deep models successfully, even defended ones.
Besides the above comparisons, we are also interested in evaluating the performance of our method and the competitors using different metrics. DEEPSEC (Ling et al. 2019) is a useful tool for assessing adversarial examples, which provides ten evaluation indicators. Specifically, from the perspective of classification outcomes, DEEPSEC provides (1) Misclassification Ratio (MR), (2) Average Confidence of Adversarial Class (ACAC), and (3) Average Confidence of True Class (ACTC). From the perspective of imperceptibility, it provides (1) Average \(L_p\) Distortion (\(ALD_{p}\), including \(L_{0}\), \(L_{2}\), and \(L_{\infty }\)), (2) Average Structural Similarity (ASS), and (3) Perturbation Sensitivity Distance (PSD). From the perspective of the robustness of adversarial examples, it provides (1) Noise Tolerance Estimation (NTE), (2) Robustness to Gaussian Blur (RGB), (3) Robustness to Image Compression (RIC), and (4) Computation Cost (CC). We select 7 indicators as the evaluation metrics, as shown in Table 5. In this experiment, we optimize the ResNet20 model (He et al. 2016b) on CIFAR10 until the best performance (\(\ge 90\%\)) on the test set is obtained. Then, 1000 images are selected as the normal examples by DEEPSEC (according to the given instructions). The adversarial examples are generated by Bandits, RayS, TA, CGBA, and DTA. The maximal query number is set to 100 and the target model is ResNet20. Given all adversarial examples generated during the attack, we finally employ DEEPSEC to compute the corresponding metrics. As shown in Table 5, our method is superior to the other methods in terms of misclassification rate and robustness, with MR (45.98%), ACAC (0.73), ACTC (0.19), PSD (153.63) and NTE (0.51), which reveals that the adversarial examples generated by DTA have stronger attack capabilities and anti-detection capabilities.
Query distribution
To see the advantage of the proposed framework regarding the query number for each attack, we plot the histogram of query numbers used to perform a successful attack in Fig. 3 for CIFAR10 and SVHN. The test sets of CIFAR10 and SVHN are used to compute the statistics, while ShuffleNetV2 (Ma et al. 2018) is employed as the target model. The maximal query number is limited to 500. For clarity, each bar denotes how many normal examples yield successful attacks with the query count noted on the x-axis. As observed, in all cases, the proposed DTA can perform a successful attack on most examples with only ONE query. The average query counts of DTA for CIFAR10 and SVHN are only 17.05 and 38.08 under \(\epsilon =16\), respectively. Notably, on ShuffleNetV2, DTA enables 88% and 90% of examples to be attacked successfully within a handful of queries when \(\epsilon =8\) and \(\epsilon =16\), respectively. On the other hand, RayS and Bandits often require hundreds of queries to perform a successful attack, and a small number of queries (such as \(\le 100\)) does not allow these methods to work well. As Guo et al. (2019) indicate, the distribution of the histogram is highly right-skewed and hence the median query count is a more representative aggregate statistic than the average query count. The results show that the median values of our method are only ONE in all cases on CIFAR10, which sufficiently validates the proposed generative idea for generating adversarial examples.
Transferability
The motivation of the current work is that the generation of adversarial examples is generally based on the assumption of transferability, meaning that an adversarial example generated for one model can be used to attack other, different models. To verify that this assumption is valid for black-box attacks, we follow the previous work (Zhao et al. 2020; Dolatabadi et al. 2020) and examine the transferability of the generated adversarial examples across different models on CIFAR10 and SVHN. Specifically, we select 8 models including ResNet50 (He et al. 2016b), VGG16 (Simonyan and Zisserman 2015), VGG19 (Simonyan and Zisserman 2015), ShuffleNetV2 (Ma et al. 2018), MobileNetV2 (Sandler et al. 2018), InceptionV3 (Szegedy et al. 2016), DenseNet169 (Huang et al. 2017), and GoogLeNet (Szegedy et al. 2015). Following the settings in Kurakin et al. (2017), we randomly select 1000 images from the test set that are classified correctly by the model while the corresponding adversarial examples are misclassified. The generated adversarial examples are used to attack the other models. For a fair comparison, we set \(\epsilon =16\) and the maximal query number to 500 in all cases.
DTA is compared with Bandits, RayS and TA in the untargeted black-box attack setting. The ASR matrices on the two datasets are shown in Fig. 4. The row indicates which model is targeted during the generation of adversarial examples (we only preserve the adversarial examples that attack the target model successfully), while the column indicates which model is attacked by the generated examples. From this figure, we can see that the transfer ASR of DTA on CIFAR10 ranges from 33.6 to 79.6%, while the baseline methods achieve 11.6–52.0%, 7.5–40.3% and 8.7–52.9%, respectively. This means that the examples generated by DTA produce a higher (about 26.1–27.0% higher in most cases) attack success rate on the other models than those generated by the baselines, validating the superior transferability of DTA. This is because the baseline methods heavily rely on the feedback of the target model during each query and cannot extract transferable features. By contrast, our method learns the adversarial distribution, which does not collapse to a certain model.
Dataset- and model-agnostic attack
To evaluate the performance of DTA on examples with different semantics and on different model structures, we first conduct attack experiments on datasets other than the ImageNet training data. Specifically, the test datasets include VOC 2007 (VOC07) (Everingham et al. 2010), VOC 2012 (VOC12) (Everingham et al. 2010), Places365 (Pla365) (Zhou et al. 2017), and Caltech101 (Cal101) (Fei-Fei et al. 2004). The target models include VGG19 (Simonyan and Zisserman 2015), InceptionV3 (Szegedy et al. 2016), ResNet152 (He et al. 2016b), and WideResNet50 (Zagoruyko and Komodakis 2016), all of which are implemented in PyTorch. The attack results are illustrated in Table 6, which shows that the DTA trained on ImageNet is able to generate effective adversarial examples on other datasets without retraining. In certain situations, the attack success rate can exceed 90%, with the maximal query number limited to 100. To be clear, we do not care about how the ground-truth labels of those datasets affect the current DTA; we only calculate the attack success rate by comparing the outputs of the original clean image and the corresponding adversarial counterpart, analogous to the Evasion Rate (Matachana et al. 2020).
We further apply DTA to attack transformers, which are quite different from traditional CNNs, including ViT-16 (Dosovitskiy et al. 2021), ViT-32 (Dosovitskiy et al. 2021), and Swin-B (Liu et al. 2021). Here, DTA is trained on the data pairs collected from CNNs with noise budget \(\epsilon =16\). The empirical results reported in Table 7 show that DTA obtains a 26–41% attack success rate with limited queries. This phenomenon demonstrates that even when attacking transformers, DTA can still generate adversarial examples and achieve an acceptable attack effect on different ViT models. Furthermore, it illustrates the high adaptability of DTA in model-agnostic black-box scenarios.
Ablation study
Loss and hyperparameters
The proposed method concerns the settings of the MSE loss and the hyperparameters, such as L and K, which affect the model depth. We examine the influence of these factors on the CIFAR10 dataset. The target model is the pretrained VGG16 (Simonyan and Zisserman 2015). During the attack, the maximal number of queries is limited to 500.
First, we evaluate the performance of DTA with and without the MSE loss. Without the MSE loss means that the updating step in the 4th line of Alg. 1 is omitted. The comparison is listed in Table 8, which shows that the flow model benefits from the MSE loss, yielding a notable improvement in both attack success rate and average query number.
Next, we test how the model depth, or representative capacity, affects the attack performance. Two experiments are considered here. In the first, we fix \(L=3\) and examine the influence of K over \(\{2, 4, 6, 8\}\). In the second, we fix \(K=2\) and evaluate the performance of L over \(\{1, 2, 3, 4\}\). The results are shown in Tables 9 and 10, respectively. As seen, different settings produce similar performance in both ASR and average query number, which suggests that the attack ability of the proposed model on CIFAR10 does not benefit from increasing the model depth. This may be because the data in CIFAR10 are simple; hence, we set \(K=2\) and \(L=2\) on small datasets, e.g., CIFAR10 and SVHN. On ImageNet, which contains complex data, we set \(K=8\) and \(L=5\).
Furthermore, we examine the generalization ability of the adversarial examples generated by DTA, i.e., we test the performance of DTA with different numbers of white-box attack models involved. Specifically, we use ten pre-trained models on ImageNet, including VGG19 (Simonyan and Zisserman 2015), ResNet152 (He et al. 2016b), InceptionV3 (Szegedy et al. 2016), DenseNet201 (Huang et al. 2017), WideResNet50 (Zagoruyko and Komodakis 2016), VGG16 (Simonyan and Zisserman 2015), ResNet101 (He et al. 2016b), MobileNetV2 (Sandler et al. 2018), DenseNet121 (Huang et al. 2017), and DenseNet169 (Huang et al. 2017). The first five models are used for generating the training adversarial examples, while the rest are used as the black-box test models to evaluate the attack effect of DTA. In this experiment, we select different numbers of models from the first five for example generation; the results are plotted in Fig. 5. It can be clearly observed that the more models used in performing the attack, the better the performance obtained on all the target models. This indicates that we can use more models in the process of collecting adversarial examples to gain an increased universal success rate in black-box attacks.
Improved performance by shifted means and stds
It is common to sample from a Gaussian distribution when generating the expected data with a normalizing flow model; however, in this work, we found that simply sampling from \(\mathcal {N}(0,1)\) to obtain the adversarial examples leads to poor attack performance. Instead, for the well-trained normalizing flow model, we first feed in the training data to obtain the corresponding latent codes \(\varvec{z}\), then compute their mean \(\hat{\varvec{\mu }}\) and standard deviation \(\hat{\delta }\), and use the resulting Gaussian distribution \(\hat{\mathcal {N}}(\hat{\varvec{\mu }},\hat{\delta }^2)\) as the adversarial latent space \(\hat{\varvec{z}}\) for sampling to enhance the attack performance. We report the empirical results in Table 11. As the results show, equipped with the shifted mean \(\hat{\varvec{\mu }}\) and std \(\hat{\delta }\), the attack success rate is improved by 49.16–59.65%, 35.35–36.72% and 26.38–37.49% on CIFAR10, SVHN and ImageNet, respectively. These results demonstrate that the latent space has indeed been shifted, guided by the adversarial-clean example pairs.
Comparison with a GAN-based method
As declared above, the adversarial and normal examples come from different distributions, which are misaligned but transferable, and we characterize the adversarial distribution with locally collected adversarial examples in a generative manner; more specifically, a conditional normalizing flow is involved to learn the transformation in this paper. To verify whether other generative models are also qualified for this task, we apply the same pipeline to a GAN borrowed from GAP (Poursaeed et al. 2018) and present the attack performance in Fig. 6. As the results show, the GAN can also learn the mapping relationship between these two types of examples, but its attack capability is unsatisfactory; in addition, as the query budget increases, the attack success rate of DTA increases significantly, while that of the GAN-based method does not. Again, these results demonstrate the superiority of our proposed conditional likelihood-based DTA method in generating examples that belong to the adversarial distribution.
Conclusions
In this paper, we propose a novel hard-label black-box adversarial attack framework based on a generative idea. The motivation is that public datasets force public models to learn a common distribution, causing the models to exhibit similar vulnerabilities. Hence, the adversarial distributions of different models could also be similar, which inspires the transferability assumption in many adversarial attack methods. Based on this assumption, we advocate that there could be a certain mapping from the distribution of normal examples to the distribution of adversarial examples. Accordingly, a conditional normalizing flow-based generative model is developed to implement the mapping function. By collecting a batch of adversarial examples from existing white-box attacks, we can optimize the flow model to explicitly correlate the adversarial examples with Gaussian-style hidden representations. To diversify the generation process, the normal examples are fed into the conditions of the probabilistic model. An elaborated generation process helps us to improve the performance of the generated examples. Extensive experiments validate the proposed idea and demonstrate the superiority of DTA in attack success rate, average query number and median query number. In particular, our method can achieve a successful attack within only ONE query, which verifies that we have learned the adversarial distribution. By contrast, the other hard-label methods generally require hundreds of queries to accomplish an attack. We also surprisingly find that the proposed model can perform effective cross-dataset attacks, which means that the model is not sensitive to the label space of the classification task. In summary, this work provides a promising framework with the advantages of low query times, high success rate, and an efficient inference process, which could guide future research on adversarial attacks in a new direction.
Availability of data and materials
The data that support the findings of this study are openly available at https://www.cs.toronto.edu/~kriz/cifar.html, http://ufldl.stanford.edu/housenumbers/, and https://image-net.org/; see Krizhevsky and Hinton (2009), Netzer et al. (2011), and Russakovsky et al. (2015).
References
Akhtar N, Liu J, Mian A (2018) Defense against universal adversarial perturbations. In: CVPR, pp. 3389–3398. https://doi.org/10.1109/CVPR.2018.00357
Ardizzone L, Lüth C, Kruse J, Rother C, Köthe U (2019) Guided image generation with conditional invertible neural networks. CoRR arXiv:abs/1907.02392
Baluja S, Fischer I (2018) Learning to attack: adversarial transformation networks. In: AAAI, pp 2687–2695
Carlini N, Wagner DA (2017) Towards evaluating the robustness of neural networks. In: S&P. https://doi.org/10.1109/SP.2017.49
Chakraborty A, Alam M, Dey V, Chattopadhyay A, Mukhopadhyay D (2018) Adversarial attacks and defences: a survey. CoRR arXiv:abs/1810.00069
Chen J, Gu Q (2020) RayS: a ray searching method for hard-label adversarial attack. In: KDD, pp 1739–1747. https://doi.org/10.1145/3394486.3403225
Chen P, Zhang H, Sharma Y, Yi J, Hsieh C (2017) ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In: ACM AISec@CCS, pp 15–26. https://doi.org/10.1145/3128572.3140448
Cheng M, Singh S, Chen PH, Chen P, Liu S, Hsieh C (2020) Sign-OPT: a query-efficient hard-label adversarial attack. In: ICLR
Ding J, Xu Z (2020) Adversarial attacks on deep learning models of computer vision: a survey. ICA3PP 12454:396–408. https://doi.org/10.1007/978-3-030-60248-2_27
Dinh L, Krueger D, Bengio Y (2015) NICE: nonlinear independent components estimation. In: ICLR
Dolatabadi HM, Erfani SM, Leckie C (2020) AdvFlow: inconspicuous black-box adversarial attacks using normalizing flows. In: NeurIPS
Dong Y, Liao F, Pang T, Su H, Zhu J, Hu X, Li J (2018) Boosting adversarial attacks with momentum. In: CVPR. https://doi.org/10.1109/CVPR.2018.00957
Dong Y, Cheng S, Pang T, Su H, Zhu J (2022) Query-efficient black-box adversarial attacks guided by a transfer-based prior. IEEE Trans Pattern Anal Mach Intell 44(12):9536–9548. https://doi.org/10.1109/TPAMI.2021.3126733
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N (2021) An image is worth 16x16 words: transformers for image recognition at scale. In: ICLR
Duan R, Ma X, Wang Y, Bailey J, Qin AK, Yang Y (2020) Adversarial camouflage: hiding physical-world attacks with natural styles. In: CVPR. https://doi.org/10.1109/CVPR42600.2020.00108
Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2010) The Pascal visual object classes (VOC) challenge. Int J Comput Vis 88(2):303–338
Eykholt K, Evtimov I, Fernandes E, Li B, Rahmati A, Xiao C, Prakash A, Kohno T, Song D (2018) Robust physical-world attacks on deep learning visual classification. In: CVPR. https://doi.org/10.1109/CVPR.2018.00175
Fei-Fei L, Fergus R, Perona P (2004) Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In: CVPR, p 178. https://doi.org/10.1109/CVPR.2004.383
Feng Y, Wu B, Fan Y, Liu L, Li Z, Xia S (2022) Boosting black-box attack with partially transferred conditional adversarial distribution. In: CVPR, pp 15074–15083. https://doi.org/10.1109/CVPR52688.2022.01467
Goodfellow IJ, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. In: ICLR
Guo C, Rana M, Cissé M, van der Maaten L (2018) Countering adversarial images using input transformations. In: ICLR
Guo C, Gardner JR, You Y, Wilson AG, Weinberger KQ (2019) Simple black-box adversarial attacks. ICML 97:2484–2493
Guo F, Sun Z, Chen Y, Ju L (2023) Towards the universal defense for query-based audio adversarial attacks on speech recognition system. Cybersecurity 6(1):1–18
He K, Zhang X, Ren S, Sun J (2016a) Identity mappings in deep residual networks. ECCV 9908:630–645
He K, Zhang X, Ren S, Sun J (2016b) Deep residual learning for image recognition. In: CVPR, pp 770–778. https://doi.org/10.1109/CVPR.2016.90
Huang G, Liu Z, van der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: CVPR, pp 2261–2269. https://doi.org/10.1109/CVPR.2017.243
Huang Z, Zhang T (2020) Black-box adversarial attack with transferable model-based embedding. In: ICLR
Hu J, Shen L, Sun G (2018) Squeezeandexcitation networks. In: CVPR, pp 7132–7141. https://doi.org/10.1109/CVPR.2018.00745
Ilyas A, Engstrom L, Athalye A, Lin J (2018) Black-box adversarial attacks with limited queries and information. ICML 80:2142–2151
Ilyas A, Engstrom L, Madry A (2019) Prior convictions: black-box adversarial attacks with bandits and priors. In: ICLR
Kingma DP, Dhariwal P (2018) Glow: generative flow with invertible 1x1 convolutions. In: NeurIPS, pp 10236–10245
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Computer Science Department, University of Toronto, technical report 1
Kurakin A, Goodfellow IJ, Bengio S (2017) Adversarial examples in the physical world. In: ICLR
Li Y, Li L, Wang L, Zhang T, Gong B (2019) NATTACK: learning the distributions of adversarial examples for an improved black-box attack on deep neural networks. ICML 97:3866–3876
Ling X, Ji S, Zou J, Wang J, Wu C, Li B, Wang T (2019) DEEPSEC: a uniform platform for security analysis of deep learning model. In: S&P, pp 673–690. https://doi.org/10.1109/SP.2019.00023
Liu A, Liu X, Fan J, Ma Y, Zhang A, Xie H, Tao D (2019a) Perceptual-sensitive GAN for generating adversarial patches. In: AAAI. https://doi.org/10.1609/aaai.v33i01.33011028
Liu R, Liu Y, Gong X, Wang X, Li H (2019b) Conditional adversarial generative flow for controllable image synthesis. In: CVPR, pp 7992–8001. https://doi.org/10.1109/CVPR.2019.00818
Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: ICCV, pp 9992–10002
Liu P, Xu X, Wang W (2022) Threats, attacks and defenses to federated learning: issues, taxonomy and perspectives. Cybersecurity 5(1):1–19
Lu Y, Huang B (2020) Structured output learning with conditional generative flows. In: AAAI, pp 5005–5012
Ma N, Zhang X, Zheng H, Sun J (2018) ShuffleNet V2: practical guidelines for efficient CNN architecture design. ECCV 11218:122–138. https://doi.org/10.1007/978-3-030-01264-9_8
Ma C, Guo X, Chen L, Yong J, Wang Y (2021) Finding optimal tangent points for reducing distortions of hard-label attacks. In: NeurIPS, pp 19288–19300
Madaan D, Shin J, Hwang SJ (2020) Adversarial neural pruning with latent vulnerability suppression. ICML 119:6575–6585
Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A (2018) Towards deep learning models resistant to adversarial attacks. In: ICLR
Matachana AG, Co KT, MuñozGonzález L, MartínezRego D, Lupu EC (2020) Robustness and transferability of universal attacks on compressed models. CoRR arXiv:abs/2012.06024
Mirsky Y (2023) Ipatch: a remote adversarial patch. Cybersecurity 6(1):18
Mirza M, Osindero S (2014) Conditional generative adversarial nets. CoRR arXiv:abs/1411.1784
Netzer Y, Wang T, Coates A, Bissacco A, Wu B, Ng A (2011) Reading digits in natural images with unsupervised feature learning. In: NIPS workshop on deep learning and unsupervised feature learning
Poursaeed O, Katsman I, Gao B, Belongie SJ (2018) Generative adversarial perturbations. In: CVPR, pp 4422–4431
Pumarola A, Popov S, Moreno-Noguer F, Ferrari V (2020) C-Flow: conditional generative flow models for images and 3D point clouds. In: CVPR, pp 7946–7955. https://doi.org/10.1109/CVPR42600.2020.00797
Reza MF, Rahmati A, Wu T, Dai H (2023) CGBA: curvature-aware geometric black-box attack. In: ICCV, pp 124–133
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein MS, Berg AC, Li F (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y
Sandler M, Howard AG, Zhu M, Zhmoginov A, Chen L (2018) Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation. CoRR arXiv:abs/1801.04381
Simonyan K, Zisserman A (2015) Very deep convolutional networks for largescale image recognition. In: ICLR
Sohn K, Lee H, Yan X (2015) Learning structured output representation using deep conditional generative models. In: NeurIPS, pp 3483–3491
Sun L, Tan M, Zhou Z (2018) A survey of practical adversarial example attacks. Cybersecurity 1:1–9
Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow IJ, Fergus R (2014) Intriguing properties of neural networks. In: ICLR
Szegedy C, Liu W, Jia Y, Sermanet P, Reed SE, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: CVPR, pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: CVPR, pp 2818–2826. https://doi.org/10.1109/CVPR.2016.308
Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inceptionv4, inceptionresnet and the impact of residual connections on learning. In: AAAI, pp 4278–4284
Tramèr F, Kurakin A, Papernot N, Goodfellow IJ, Boneh D, McDaniel PD (2018) Ensemble adversarial training: attacks and defenses. In: ICLR
Tu C, Ting P, Chen P, Liu S, Zhang H, Yi J, Hsieh C, Cheng S (2019) AutoZOOM: autoencoder-based zeroth order optimization method for attacking black-box neural networks. In: AAAI, pp 742–749. https://doi.org/10.1609/aaai.v33i01.3301742
Wang H, Yu C (2019) A direct approach to robust deep learning using adversarial networks. In: ICLR
Wang J, Chang X, Wang Y, Rodríguez RJ, Zhang J (2021) LSGAN-AT: enhancing malware detector robustness against adversarial examples. Cybersecurity 4:1–15
Wang X, Zhang Z, Tong K, Gong D, He K, Li Z, Liu W (2022) Triangle attack: a query-efficient decision-based adversarial attack. ECCV 13665:156–174. https://doi.org/10.1007/978-3-031-20065-6_10
Wu H, Liu AT, Lee H (2020) Defense for black-box attacks on anti-spoofing models by self-supervised learning. In: INTERSPEECH, pp 3780–3784. https://doi.org/10.21437/Interspeech.2020-2026
Xiao C, Li B, Zhu J, He W, Liu M, Song D (2018) Generating adversarial examples with adversarial networks. In: IJCAI, pp 3905–3911. https://doi.org/10.24963/ijcai.2018/543
Zagoruyko S, Komodakis N (2016) Wide residual networks. In: BMVC
Zhang Y, Li Y, Liu T, Tian X (2020) Dual-path distillation: a unified framework to improve black-box attacks. ICML 119:11163–11172
Zhao P, Chen P, Wang S, Lin X (2020) Towards query-efficient black-box adversary with zeroth-order natural gradient descent. In: AAAI, pp 6909–6916
Zhou B, Lapedriza A, Khosla A, Oliva A, Torralba A (2017) Places: a 10 million image database for scene recognition. IEEE Trans Pattern Anal Mach Intell 40:1452–1464
Acknowledgements
Not applicable.
Funding
This work is supported in part by the National Natural Science Foundation of China under Grants 62162067, 62101480 and 62362068, Research and Application of Object Detection based on Artificial Intelligence, in part by the Yunnan Province expert workstations under Grant 202305AF150078, and in part by the Scientific Research Fund Project of Yunnan Provincial Education Department under Grant 2023Y0249.
Author information
Authors and Affiliations
Contributions
RL: conceptualization; methodology; writing-original draft. WZ: validation; supervision; funding acquisition. XJ: investigation; software. SG: visualization; investigation; formal analysis. YW: data curation; resources. RW: writing-review and editing; project administration; funding acquisition.
Corresponding author
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Liu, R., Zhou, W., Jin, X. et al. DTA: distribution transform-based attack for query-limited scenario. Cybersecurity 7, 8 (2024). https://doi.org/10.1186/s42400-023-00197-2
DOI: https://doi.org/10.1186/s42400-023-00197-2