Threats, attacks and defenses to federated learning: issues, taxonomy and perspectives

Empirical attacks on Federated Learning (FL) systems indicate that FL is fraught with numerous attack surfaces throughout the FL execution. These attacks can not only cause models to fail in specific tasks, but also infer private information. While previous surveys have identified the risks, listed the attack methods available in the literature or provided a basic taxonomy to classify them, they mainly focused on the risks in the training phase of FL. In this work, we survey the threats, attacks and defenses to FL throughout the whole process of FL in three phases, including Data and Behavior Auditing Phase, Training Phase and Predicting Phase. We further provide a comprehensive analysis of these threats, attacks and defenses, and summarize their issues and taxonomy. Our work considers security and privacy of FL based on the viewpoint of the execution process of FL. We highlight that establishing a trusted FL requires adequate measures to mitigate security and privacy threats at each phase. Finally, we discuss the limitations of current attacks and defense approaches and provide an outlook on promising future research directions in FL.


Introduction
As smart cities grow in popularity, the multisource heterogeneous data generated by various organizations and individuals have grown in both volume and diversity. However, businesses and people are hesitant to exchange data because of concerns about data privacy, which has led to the emergence of data silos. Several attempts have been made to address these privacy threats, among which FL has shown its superiority because it allows multiple local workers to train a model together without revealing sensitive information about their local data. In December 2018, the IEEE Standards Committee approved the standard project on the architectural framework and application of federated machine learning. Subsequently, more and more scholars and technical experts joined the standards working group and participated in drafting IEEE standards on FL.
At present, FL has been combined with Multi-task Learning (Smith et al. 2017), Reinforcement Learning (Qi et al. 2021), Graph Neural Networks and other artificial intelligence algorithms, and applied in many fields. In addition, collaborative learning methods similar to FL, such as Assisted Learning (Xian et al. 2020) and Split Learning (Vepakomma et al. 2018), have also been proposed. The application of FL in Natural Language Processing (NLP) is also being widely studied. Lin et al. (2021) open-sourced a research-oriented FedNLP framework, which aims to study privacy-preserving methods in NLP with FL. Many aggregation algorithms and open-source frameworks for FL have also been proposed (Mothukuri et al. 2021): the former include FedAvg (McMahan et al. 2017), SMC-AVG (Bonawitz et al. 2016) and FedProx, and the latter include FATE, TensorFlow Federated and PySyft.
Although FL can effectively break data silos, there are many inborn security and privacy threats. Before the model is trained, malicious local workers may destroy the integrity, confidentiality, and availability of data, and thus contaminate the model. In general, the key roles of FL include two parts: central server and local workers (or local clients). The adversary can compromise the central server or a part of local workers. When the model is being trained, the adversary can manipulate the global model by controlling the samples or model updates. This will result in degraded performance of the global model, or leave a backdoor. In addition, in the model training and predicting phases, the adversary can also infer the private information of other honest local workers, including membership inference and attribute inference. Even though differential privacy and other privacy-preserving algorithms have been implemented within FL, attacks against FL can still succeed (Cheu et al. 2021).
Many existing surveys mainly focused on listing and describing various attack methods and defense strategies (Mothukuri et al. 2021; Enthoven and Al-Ars 2020). However, these surveys only analyze security and privacy threats in the training phase. In this work, we analyze the security and privacy threats according to the multi-phase framework of the FL execution, including Data and Behavior Auditing, Training and Predicting. We identify the issues and provide a taxonomy of threats, attacks and defenses on FL. We also provide perspectives on how to build a trusted FL.

FL concepts and challenges
Definition. FL is defined as a machine learning paradigm in which multiple clients work together to train a model under the coordination of a central server, while the training data remains stored locally (Kairouz et al. 2019). According to the type of local workers, FL can be divided into cross-device and cross-silo settings. Cross-device workers are primarily mobile phones, tablets, speakers, and other terminal IoT devices, which may disconnect at any time during model training. Cross-silo workers are mainly large institutions with high data storage and computing capabilities. In the fully decentralized setting, FL can be combined with blockchain (Warnat-Herresthal et al. 2021) or secure multi-party computing technology. In this work, we focus on security and privacy threats against centralized FL.

A Categorization of Federated Learning
In FL, models are trained locally and aggregated at a central server. A global model is obtained after several rounds of parameter/gradient aggregation. Unlike in distributed machine learning, the central server in FL does not have access to the local workers' data. The data distribution among local workers can be independent and identically distributed (i.i.d.) or non-independent and identically distributed (non-i.i.d.). The types of FL mainly include Horizontal FL (HFL), Vertical FL (VFL) and Federated Transfer Learning (FTL). The specific description of each type is as follows. HFL is suitable for local workers whose datasets share few samples but many features. Most existing work has focused on the security and privacy of HFL. VFL is suitable for scenarios where local workers share the same sample IDs but few features. VFL consists of encrypted entity alignment and encrypted model training. As the number of workers increases, the amount of computation increases accordingly. SecureBoost (Cheng et al. 2019) is the most representative VFL model, and it supports multiple workers participating in VFL in the FATE framework. FTL is suitable for scenarios with little overlap in both sample IDs and features.

Fully decentralized learning
To avoid malicious or semi-honest third parties (central servers), fully decentralized learning emerged (Kim et al. 2018). Fully decentralized learning is usually combined with blockchain, which has proven to be effective in protecting data privacy. Warnat-Herresthal et al. (2021) proposed a decentralized collaborative computing method called Swarm Learning (SL), which combines privacy preservation, edge computing and a blockchain-based peer-to-peer network. Weng et al. (2021) proposed DeepChain, which achieves data confidentiality and computation auditability based on a blockchain incentive mechanism and privacy-preserving methods. By combining blockchain techniques with privacy-preserving algorithms, fully decentralized learning strengthens the trust guarantees of collaborative computing.

Learning mechanisms
The idea of FL is to jointly train a global model by optimizing the parameters θ with the updates of multiple local workers. Basically, there are two aggregation mechanisms: synchronized SGD (Shokri and Shmatikov 2015) and FedAvg (McMahan et al. 2017). In synchronized SGD, each local worker computes the gradient on one batch of its own data and uploads it to the server. In FedAvg, each local worker performs several epochs of gradient descent and uploads the updated parameters to the server. The central server then aggregates those gradients or parameters.
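The two aggregation mechanisms can be contrasted in a few lines. The following is a minimal sketch in plain Python, assuming parameters and gradients are flat lists of floats; the function names and the toy values are hypothetical, and weighting FedAvg by local dataset size follows the usual formulation rather than anything specified in this section.

```python
def fedavg(worker_params, num_samples):
    """FedAvg-style aggregation: average parameters, weighted by local dataset size."""
    total = sum(num_samples)
    dim = len(worker_params[0])
    return [
        sum(n * params[j] for params, n in zip(worker_params, num_samples)) / total
        for j in range(dim)
    ]


def sync_sgd_step(theta, worker_grads, lr):
    """One synchronized-SGD round: average the uploaded gradients, then take one step."""
    k = len(worker_grads)
    avg_grad = [sum(g[j] for g in worker_grads) / k for j in range(len(theta))]
    return [t - lr * g for t, g in zip(theta, avg_grad)]


# Hypothetical toy values for three workers.
params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # locally trained parameters
sizes = [10, 30, 60]                             # local dataset sizes
global_params = fedavg(params, sizes)
```

In both cases the server only sees gradients or parameters, never raw data, which is exactly the information the inference attacks discussed later try to exploit.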

The relationship between FL and privacy computing
Privacy computing refers to a range of information technologies that analyze data while ensuring that the private information of the data providers is not revealed. In other words, privacy computing is a collection of "data available but not visible" technologies, including FL, secure multi-party computing (MPC), trusted execution environment (TEE), differential privacy (DP), etc. Among them, FL is a derivative technique that integrates distributed machine learning with privacy techniques; secure multi-party computing is a cryptography-based privacy computing technique; trusted execution environment is a trusted hardware-based privacy computing technique; differential privacy is a rigorous mathematical definition of privacy. These techniques are often used in combination to compute on and analyze data while ensuring the security and privacy of the original data.

The challenge of heterogeneity
With the diversification and complexity of the local workers, the concerns of mutual trust, efficiency, and convergence quality become increasingly obvious. In practical applications, FL needs to overcome the heterogeneity of devices in storage, computing and communication capabilities, non-i.i.d. data, and differing model requirements across local application environments. One effective way to address these heterogeneity challenges is to implement personalized FL in three aspects: device, data and model (Smith et al. 2017).

The challenge of communication
Communication cost is a major bottleneck for federated computing, as local workers need to interact with the central server many times and the connections are often unstable. Therefore, how to improve the transmission efficiency while ensuring the accuracy of the joint computation is an important issue. Existing work indicates that sparse matrices (Konečný et al. 2016) and model compression can significantly reduce the communication overhead with little impact on model accuracy.
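One common way to obtain a sparse update is top-k sparsification, where a worker uploads only the k largest-magnitude entries of its update. The sketch below illustrates that idea in plain Python; it is an illustrative stand-in and not necessarily the scheme used in the works cited above, and the function names and values are hypothetical.

```python
def top_k_sparsify(update, k):
    """Worker side: keep the k largest-magnitude entries, return (indices, values)."""
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    kept = sorted(order[:k])
    return kept, [update[i] for i in kept]


def densify(indices, values, dim):
    """Server side: rebuild a dense update, treating dropped entries as zero."""
    dense = [0.0] * dim
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


update = [0.01, -0.90, 0.05, 0.40, -0.02, 0.00]   # hypothetical local update
idx, vals = top_k_sparsify(update, k=2)
restored = densify(idx, vals, len(update))
```

Here only two index/value pairs are transmitted instead of six floats; the accuracy impact depends on how much of the update's energy is concentrated in the retained entries.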

The challenge of security and privacy threats
The attack surface of FL has expanded because of its distributed nature. For example, malicious local workers may try to steal the private information of honest local workers, or they can collude to impair the performance of the final global model.

Multi-phases framework of trusted FL
As shown in Fig. 1, the multi-phases framework of the FL execution can be divided into three phases, including Data and Behavior Auditing, Training and Predicting. The model faces different security and privacy threats at each phase of FL execution. We argue that establishing a trusted FL requires taking effective measures at each phase to fully mitigate security and privacy threats.

• Data and behavior auditing phase
In general, contaminated data and malicious behavior are the main factors affecting model performance. On the one hand, the data of local workers may be contaminated by label noise or feature noise. On the other hand, the historical behavior of local workers may be malicious. The local workers' systems may have some vulnerabilities. These vulnerabilities may have been exploited by adversaries. These threats will impact the subsequent training and prediction of FL. If the risk of data and behavior auditing phase is minimized, the probability of poisoning attacks and privacy inference attacks may decrease.
• Training phase
FL requires multiple local workers to work collaboratively to train a global model. In the model training phase, malicious local workers can manipulate their data, model gradients and parameters. Therefore, if adversaries compromise the local workers, they can disturb the integrity of the training dataset or model to impair the performance of the global model. Besides, the central server can also launch passive or active inference attacks. In addition, during the upload and download of model updates, the models may be eavesdropped on by intermediaries in the communication channel, resulting in model updates being tampered with or stolen. Therefore, it is necessary to protect the transfer of model updates between the local workers and the central server.

• Predicting phase
Once the model is trained, the global model is deployed onto the local worker devices, regardless of whether they participated in the training or not. In this phase, evasion attacks and privacy inference attacks occur frequently. Evasion attacks usually do not change the target model, but trick it into producing false predictions. Privacy inference attacks can reconstruct the characteristics of the model and the raw data. The effectiveness of these attacks depends on the knowledge available to the adversaries.

Data and behavior auditing phase
The performance of FL depends on the high-quality data of the local workers and the benign historical behavior of the local workers and central server. Once the data quality is low or there exists malicious behavior, the trained model may become ineffective or even harmful. This section analyzes the threat model, attacks and defenses during the data and behavior auditing phase.

Threat model
In FL, the data of each local worker is available but invisible. Local workers have absolute control over their data. This rule makes it difficult to audit the data quality and historical behavior of all local workers. Therefore, a malicious local worker can silently modify the training data to influence the final global model. In addition, data quality issues, such as unlabeled, noisy or incomplete data, may arise during data collection, transmission and processing. These may have a significant impact on data-based decision-making (Jiang et al. 2021).

Attacks
The data and behavior auditing phase is the first line of defense to ensure the credibility of FL. If this line is breached, a malicious local worker can use low-quality or poisoned data to decrease the performance of the global model or even corrupt the model. In this phase, the local workers and central server are exposed to existing system, software, and network security threats. Adversaries can cause damage to data and systems by social engineering, penetration attacks, backdoor attacks, and advanced persistent threat (APT) attacks etc. For example, in the cross-device scenario, if the devices have some vulnerabilities, the adversaries can exploit these vulnerabilities to compromise the data and the model (Wang et al. 2014). In addition, insiders can also directly undermine core data and systems by abusing their authorities. Inadvertent errors, and environmental factors in the phases of data collection and transmission will also have a certain impact on the subsequent data analysis.

Defenses
Before the model training phase, one way to ensure the credibility of the FL model is to audit the data quality of the local workers. High-confidence data can effectively reduce the occurrence of poisoning attacks and improve the effectiveness of the model. However, there are few works on data quality assessment in FL, and the fact that the data of local workers cannot be aggregated makes an overall assessment challenging. Another way is to evaluate the historical behavior of the local workers and the central server. Credibility measurement and verification methods should be developed based on the system logs.
In addition, the trustworthiness of the local workers should also be dynamically evaluated in the training process (Akujuobi et al. 2019). In general, malicious local workers usually behave differently than most trusted local workers. Therefore, by auditing the model behavior uploaded to the central server, the untrusted local workers can be eliminated.
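As a rough illustration of auditing uploaded model behavior, the sketch below flags workers whose update lies far from the coordinate-wise median of all updates. This particular rule, the name `flag_suspicious`, and the threshold `factor` are assumptions made for illustration; the discussion above does not prescribe a specific detection method.

```python
import math
import statistics


def flag_suspicious(updates, factor=3.0):
    """Return indices of workers whose update is far from the coordinate-wise median.

    `factor` is a hypothetical threshold multiplier, not a value from the literature.
    """
    dim = len(updates[0])
    median = [statistics.median(u[j] for u in updates) for j in range(dim)]
    dists = [
        math.sqrt(sum((u[j] - median[j]) ** 2 for j in range(dim)))
        for u in updates
    ]
    cutoff = factor * statistics.median(dists)
    return [i for i, d in enumerate(dists) if d > cutoff]


updates = [
    [0.10, -0.20], [0.12, -0.18], [0.09, -0.22], [0.11, -0.19],
    [5.00, 4.00],   # worker 4 uploads an outlying update
]
suspects = flag_suspicious(updates)
```

A distance rule like this only works when most workers are honest and their updates are similar; under strongly non-i.i.d. data, benign workers may also look like outliers.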

Training phase
As mentioned earlier, the training phase of FL mainly involves poisoning attacks and privacy inference attacks (white-box). An adversary may launch privacy inference attacks to obtain the victim's privacy information, or launch poisoning attacks locally to affect the performance of the global model. We explain these two attacks in detail in the following subsections. FL (McMahan et al. 2016) has recently emerged as a solution to protect data privacy. However, existing work suggested that adversaries can infer different levels of sensitive information from the updated gradients in FL (Hitaj et al. 2017;Nasr et al. 2019;Zhu and Han 2020). In this section, we analyze the reasons for privacy leakage, threat models, attack methods and defense strategies for privacy inference attacks in the training phase.

The reasons of privacy leakage
Several common forms of privacy leakage are listed below.

• Leakage from embedding layer
When a deep learning model learns from non-numerical data with a sparse and discrete input space, it first converts the input into a low-dimensional vector representation through the embedding layer. For example, in natural language processing, each word in the vocabulary V is a discrete token and is mapped to a vector after learning. The parameters of the embedding layer can be represented by the matrix $W_{emb} \in \mathbb{R}^{|V| \times d}$, where |V| denotes the size of the vocabulary and d denotes the dimensionality of the embedding. For a specific text, the gradient update of the embedding layer depends only on the words that appear in the text, and the gradient updates corresponding to all other words are 0. Based on this observation, an adversary can directly infer which words the local workers used during FL training.
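The observation about zero rows can be made concrete with a toy example. The sketch below assumes a sum-pooled embedding lookup, so every looked-up token receives the same upstream gradient; the vocabulary, the private sentence and the gradient values are all hypothetical.

```python
def embedding_gradient(token_ids, upstream_grad, vocab_size):
    """Gradient of the loss w.r.t. an embedding matrix of shape |V| x d.

    Only rows of tokens that were actually looked up receive a non-zero gradient.
    """
    d = len(upstream_grad)
    grad = [[0.0] * d for _ in range(vocab_size)]
    for t in token_ids:
        for j in range(d):
            grad[t][j] += upstream_grad[j]
    return grad


def infer_used_tokens(grad):
    """Attacker side: any non-zero row reveals that the token appeared in the text."""
    return {row for row, g in enumerate(grad) if any(x != 0.0 for x in g)}


vocab = ["the", "patient", "has", "diabetes", "flu", "no"]   # hypothetical vocabulary
private_text = [0, 1, 2, 3]                                   # "the patient has diabetes"
shared_grad = embedding_gradient(private_text, [0.3, -0.1, 0.2], len(vocab))
leaked = sorted(vocab[i] for i in infer_used_tokens(shared_grad))
```

The attacker recovers the set of words but not their order, which is already enough to expose sensitive terms in this toy case.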

• Leakage from FC layer
The fully connected (FC) layer is usually an indispensable component in a deep learning model. The main function of the FC layer is to map the distributed features to the sample label space. Recent studies demonstrated that both the ground-truth labels and the inputs to any FC layer (Geiping et al. 2020; Pan et al. 2020) can be restored from the gradients.
In regular classification tasks, the deep learning model generally ends with an FC layer, and the loss is calculated by cross-entropy after softmax activation. After the softmax, the output values lie between 0 and 1. Therefore, the gradient with respect to the output of the correct label is negative, and it is positive for all other labels. Hence, the ground-truth labels can be reconstructed from the shared gradients. In addition, the input of a fully connected layer can be calculated from its gradients, regardless of the layer's position in the neural network (Geiping et al. 2020).
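Both leaks can be checked on a single softmax layer. The sketch below assumes the final FC layer has a bias term and that the gradients come from a single sample; under those assumptions the bias gradient equals p - onehot(label), and each row of the weight gradient is that value times the layer input. Weights, input and label are hypothetical toy values.

```python
import math


def fc_softmax_gradients(W, b, h, label):
    """Gradients of cross-entropy(softmax(W h + b), label) w.r.t. W and b."""
    z = [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i for row, b_i in zip(W, b)]
    m = max(z)                                   # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    delta = [e / s - (1.0 if i == label else 0.0) for i, e in enumerate(exps)]
    grad_W = [[d_i * h_j for h_j in h] for d_i in delta]
    return grad_W, delta                         # delta is also the bias gradient


def recover_label(grad_b):
    """Only the true class has a negative bias gradient (p_c - 1 < 0)."""
    return min(range(len(grad_b)), key=lambda i: grad_b[i])


def recover_fc_input(grad_W, grad_b):
    """Row i of grad_W equals grad_b[i] * h, so h = grad_W[i] / grad_b[i]."""
    i = max(range(len(grad_b)), key=lambda k: abs(grad_b[k]))
    return [g / grad_b[i] for g in grad_W[i]]


W = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [-0.5, 0.1, 0.1]]
b = [0.0, 0.1, -0.1]
h_private, y_private = [1.5, 0.0, 2.0], 2        # private layer input and label
gW, gb = fc_softmax_gradients(W, b, h_private, y_private)
```

With batch-averaged gradients the signs and ratios mix across samples, which is why the single-sample case is the easiest one to attack.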

• Leakage from the model gradients
The model training is usually regarded as a high-level representation of the data (Lyu 2018), which makes gradient-based privacy inference attacks possible (Aono et al. 2017). Recent work demonstrated that gradients can reveal whether a specific sample was used for training (Shokri et al. 2017), expose the properties or the class representatives of the training samples, and even completely restore the original training data (Stella et al. 2021; Zhang et al. 2020; Hitaj et al. 2017).

Threat model
In FL, the local workers, the central server, and the communication between the central server and the local workers are considered viable points for the implementation of attack methods. Since FL requires the central server and local workers to exchange gradients/parameters information, white-box attacks can be implemented in FL setting. A comparison of the threat models is summarized in Table 1.

• Server-side attacks
A server can be assumed to be either honest-but-curious or malicious. The server's knowledge includes the model's structure, weights, and gradients for each epoch of the local workers. Basically, an honest-but-curious server does not modify the network structure or send malicious global parameters, whereas a malicious server may do both. Meanwhile, it is usually assumed that the adversaries have unlimited computing resources.

• Eavesdropping attacks
The adversaries located in the communication channel between the central server and local workers can launch eavesdropping attacks. They can steal or tamper with meaningful information, such as model weights or gradients, in each communication round.

• Worker-side attacks
It can be assumed that K workers (K ≥ 2) collaboratively train a joint model on their local datasets using a commonly agreed FL algorithm.
When K = 2, one of the workers is the adversary, whose goal is to steal information about the training data of the other, targeted local worker. In this case, the adversary can access the model structure, weights, and gradients of the target worker, just like a server-side adversary. In addition, the adversary takes part in training the model but cannot modify the model structure.
When K > 2, there are workers who are neither the adversary nor the victim. In this case, the adversary cannot accurately obtain the gradient of the target victim, which increases the difficulty of the attack.

Attacks
According to their inference targets, privacy inference attacks can be categorized as membership inference attacks, class representative inference attacks, property inference attacks, and data reconstruction attacks. Table 2 lists the representative privacy inference attacks against FL in the training phase.

• Membership inference attacks
Membership inference attacks aim to determine whether a specific sample was used to train the network (Shokri et al. 2017). An adversary can conduct both passive and active membership inference attacks (Nasr et al. 2019; Melis et al. 2019). Passive attacks generally do not modify the learning process, and only make inferences by observing the updated model parameters. Active adversaries can tamper with the training protocol of the FL model and trick other participants into exposing their private information. A straightforward way is for the adversary to share malicious updates that induce the FL global model to reveal more information about the local data of other local workers. Nasr et al. (2019) presented a comprehensive privacy analysis (CPA) of deep learning by exploiting the privacy vulnerabilities of the SGD algorithm. Their experimental results showed that gradients closer to the output layer leak more information, i.e., members and non-members produce different distributions during training. However, their work lacks a theoretical proof of the bounds of the privacy leakage.

• Class representative inference attacks
Class representative inference attacks aim to obtain prototypical samples of a target label that the adversary does not own. Hitaj et al. (2017) proposed an active insider inference attack, called the Generative Adversarial Network (GAN) attack, on collaborative deep learning models. Experimental results demonstrated that any malicious local worker using this method could infer private information from other participants. However, the experiments require that all members of a class are similar and that the adversary has prior knowledge of the victim's data labels.

• Property inference attacks
The goal of property inference attacks is to infer meta characteristics of other participants' training data. Adversaries can obtain specific properties of the victim's training data through active or passive inference based on auxiliary label information about the target properties. Passive adversaries can only observe model updates and train a binary attribute classifier for the target property to perform inference. Active adversaries can deceive the FL model into better separating data with and without the target attributes, thereby stealing more information. However, the requirement for auxiliary training data may limit the applicability of these attacks.

• Data reconstruction attacks
Data reconstruction attacks aim to accurately reconstruct the training samples and/or associated labels that were used during training.

DLG/iDLG
Previous work has made some contributions in inferring training data features from gradients, but these methods are generally considered "shallow" leakage. Deep Leakage from Gradients (DLG) (Zhu and Han 2020) was the first exploration to fully reveal the private training data from gradients, obtaining the training inputs as well as the labels in only a few iterations. The core idea of DLG is to synthesize pairs of "dummy" inputs and labels by matching their "dummy" gradients to the real ones, which can be described as the Euclidean matching term (1):

$$\arg\min_{x, y} \left\| \nabla W(x, y) - \nabla W(x^*, y^*) \right\|^2 \quad (1)$$

where (x, y) denotes the "dummy" input and the corresponding "dummy" label, (x*, y*) denotes the ground-truth training data and label, and ∇W(·, ·) denotes the gradient of the loss with respect to the network weights W. Experimental results demonstrated that the training images and labels can be jointly reconstructed with a batch size up to 8 and an image resolution up to 64×64 in shallow and smooth architectures.
Although DLG performs better than the previous "shallow" leakage methods, it has difficulty recovering the ground-truth labels consistently and often fails because of poor initialization. Subsequently, Improved Deep Leakage from Gradients (iDLG) showed theoretically as well as empirically that the ground-truth labels can be recovered with 100% accuracy from the signs of the corresponding gradients, which improves the fidelity of the extracted data. However, the algorithm only works when the shared gradients are computed from a single input sample.
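The gradient-matching idea behind DLG can be illustrated on a toy model. The sketch below is not the DLG implementation: it assumes a linear model with squared loss instead of a neural network, and it replaces DLG's L-BFGS optimizer with a greedy random search so that it runs with the standard library only. All names and values are hypothetical.

```python
import random


def gradient(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 w.r.t. w for a toy linear model."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [residual * xi for xi in x]


def matching_loss(w, dummy_x, dummy_y, real_grad):
    """Squared Euclidean distance between dummy and observed gradients."""
    dummy_grad = gradient(w, dummy_x, dummy_y)
    return sum((a - b) ** 2 for a, b in zip(dummy_grad, real_grad))


def local_search(w, real_grad, steps=2000, scale=0.1, seed=0):
    """Greedy random search over (x, y); a stand-in for DLG's L-BFGS optimizer."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in w]             # random "dummy" input
    y = rng.gauss(0, 1)                          # random "dummy" label
    start = best = matching_loss(w, x, y, real_grad)
    for _ in range(steps):
        cand_x = [xi + rng.gauss(0, scale) for xi in x]
        cand_y = y + rng.gauss(0, scale)
        loss = matching_loss(w, cand_x, cand_y, real_grad)
        if loss < best:                          # keep only improving moves
            x, y, best = cand_x, cand_y, loss
    return x, y, start, best


w = [0.5, -1.0, 2.0]                      # shared model weights (known to the attacker)
x_true, y_true = [1.0, 2.0, -1.0], 0.5    # private sample (hypothetical)
observed = gradient(w, x_true, y_true)    # what the worker uploads
x_hat, y_hat, loss_start, loss_end = local_search(w, observed)
```

The matching loss is exactly zero at the private sample, which is what makes it a usable attack objective. In this toy model the gradient does not identify the sample uniquely, so the search may settle on a different minimizer; the reconstructions reported for DLG rely on richer models and a stronger optimizer.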

Inverting gradients
The effectiveness of DLG/iDLG relies on the strong assumptions of a shallow network and low-resolution recovery, which are far from realistic scenarios. Geiping et al. (2020) noted that these assumptions are not necessary when the attack is designed properly. They therefore proposed to use cosine similarity (Chinram et al. 2021) with a Total Variation (TV) restriction (Rudin et al. 1992) as the cost function.
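A cost of this shape can be written down directly. The sketch below assumes flattened gradient vectors, a single-channel image stored as a list of rows, an anisotropic TV term, and a hypothetical weight `alpha`; it only evaluates the cost and does not perform the optimization.

```python
import math


def cosine_distance(g1, g2):
    """1 - cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    return 1.0 - dot / (n1 * n2)


def total_variation(img):
    """Anisotropic TV: sum of absolute differences between neighbouring pixels."""
    rows, cols = len(img), len(img[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:
                tv += abs(img[r + 1][c] - img[r][c])
            if c + 1 < cols:
                tv += abs(img[r][c + 1] - img[r][c])
    return tv


def inversion_cost(dummy_grad, real_grad, dummy_img, alpha=0.01):
    """Gradient direction mismatch plus alpha times an image smoothness prior."""
    return cosine_distance(dummy_grad, real_grad) + alpha * total_variation(dummy_img)
```

Unlike the Euclidean term, the cosine term ignores the gradient magnitude and matches only its direction, while the TV term favours piecewise-smooth images.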
Experimental results demonstrated that it is possible to restore the image even in realistic deep and non-smooth architectures.

GradInversion
The recovery of a single image's label in iDLG has yielded great benefits for image restoration (Geiping et al. 2020). GradInversion (Yin et al. 2021) implemented batch-wise label reconstruction from the gradients of the final FC layer, enabling the restoration of larger image batches in complex training settings. To recover more specific details, GradInversion also introduced a set of regularization terms, such as image fidelity regularization and group consistency regularization. The optimization function can be formulated as (3):

$$\arg\min_{x} \; \mathcal{L}_{grad}(x; W, \Delta W) + \mathcal{R}_{fidelity}(x) + \mathcal{R}_{group}(x) \quad (3)$$

where x is a dummy input batch, W denotes the network weights, and ΔW denotes the batch-averaged gradient of images x* and labels y*. Experimental results indicated that even for complex datasets and deep networks, batch-wise images can be reconstructed with high fidelity through GradInversion. However, the paper only considered the gradient of a single local descent step.

GRNN
Generative Adversarial Networks (GANs) have been shown to be effective in recovering data information. However, GAN-based techniques require additional information, such as class labels, which are generally unavailable in privacy-preserving learning (Hitaj et al. 2017). Recently, the Generative Regression Neural Network (GRNN) was proposed, which is capable of restoring training data and the corresponding labels without auxiliary data. Experimental results indicated that GRNN outperforms the DLG/iDLG methods with stronger robustness, better stability and higher accuracy. However, like GradInversion, it only considered the gradient of a single local descent step.

Defenses
Existing strategies for resisting privacy inference are usually based on processing the shared gradient information, including: (1) Compression Gradients; (2) Cryptology Gradients; and (3) Perturbation Gradients, as shown in Table 3.

• Compression gradients
The compressibility and sparsity of gradients are mainly exploited as tricks to reduce communication and computational overhead (Haddadpour et al. 2021). Abdelmoniem (2021) presented a statistical-based gradient compression technique for distributed training systems, which effectively improves communication and computation efficiency. Intuitively, these methods can be directly transferred to FL privacy protection because they reduce the information available for privacy inference. In DLG, the experimental results suggested that compressing the gradients can successfully prevent deep leakage.

• Cryptology gradients
Secure multi-party computation (SMC) (Yao 1982) enables individual participants to perform joint calculations on their inputs without revealing their own information. This process ensures a high degree of privacy and accuracy. However, it is also computation- and communication-intensive. In addition, SMC in the FL scenario requires the workers to coordinate with each other during the training process, which is usually impractical.
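A simple way to see how a joint sum can be computed without revealing individual inputs is additive secret sharing. The sketch below uses that technique as an illustration only; it is not Yao's original protocol, and the modulus, function names and integer-encoded updates are hypothetical.

```python
import random

PRIME = 2_147_483_647  # field modulus (hypothetical choice)


def share(secret, n_parties, rng):
    """Split an integer into n additive shares that sum to it modulo PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def secure_sum(secrets, seed=0):
    """Each party shares its value; only sums of shares are ever published."""
    rng = random.Random(seed)
    n = len(secrets)
    all_shares = [share(s, n, rng) for s in secrets]
    # Party j adds up the j-th share it received from every party.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    return sum(partial_sums) % PRIME


quantized_updates = [12, 7, 30]   # e.g. fixed-point encoded gradient entries
total = secure_sum(quantized_updates)
```

Every party exchanges a share with every other party, which hints at the coordination and communication cost noted above.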

• Perturbation gradients
The core idea of differential privacy (DP) (Abadi et al. 2016; Triastcyn and Faltings 2019) is to protect data privacy by adding random noise to sensitive information. Basically, DP can be divided into three categories: centralized DP (CDP), local DP (LDP) and distributed DP (DDP). In FL, CDP adds noise to the aggregated local model gradients through a trusted aggregator to ensure the privacy of the entire dataset. The effectiveness of CDP requires numerous workers in the FL system, so it does not apply to H2B scenarios with a small number of workers. In LDP and DDP, the workers control the noise perturbation, which can provide stronger privacy protection. However, LDP usually needs to add sufficient calibrated noise to guarantee data privacy, which may impair the performance of the model (Seif et al. 2020). DDP guarantees the privacy of each worker by incorporating encryption protocols, which can lead to higher training costs.
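The perturbation step itself is usually a norm clip followed by Gaussian noise. The sketch below shows that step in isolation, assuming a flat update vector; `clip_norm` and `noise_multiplier` are hypothetical parameters, and no privacy accounting (the resulting ε and δ) is performed. Whether the function runs on each worker or once on the aggregate is what separates the local from the centralized setting.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale the update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def privatize(update, clip_norm, noise_multiplier, rng):
    """Clip, then add Gaussian noise with std = noise_multiplier * clip_norm."""
    sigma = noise_multiplier * clip_norm
    return [v + rng.gauss(0.0, sigma) for v in clip_update(update, clip_norm)]


rng = random.Random(0)
raw = [3.0, 4.0]                                   # L2 norm 5
clipped = clip_update(raw, clip_norm=1.0)          # rescaled to norm 1
noisy = privatize(raw, clip_norm=1.0, noise_multiplier=0.5, rng=rng)
```

Clipping bounds how much any single update can influence the result, and the noise scale is tied to that bound; a larger `noise_multiplier` gives stronger protection at the cost of accuracy.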

Poisoning attacks
Poisoning attacks on machine learning models have been widely studied. These attacks occur in the training phase against FL. On the one hand, adversaries can impair the performance of the final global model on untargeted tasks. On the other hand, adversaries can inject a backdoor into the final global model. In general, poisoning attacks can be categorized as data poisoning and model poisoning.

Threat model
The adversaries can manipulate some local workers to participate in the training process of FL and modify the model updates. The modification methods include changing data features, labels, model parameters, or gradients. The proportion of manipulated local workers and the amount of modified training data are the key factors affecting the final training result. Because of the distributed setting and practical applications of FL, the data distribution can be either i.i.d. or non-i.i.d., and these attacks may be carried out under either condition.

Attacks
In general, poisoning attacks can be divided into data poisoning attacks and model poisoning attacks, as well as targeted attacks (backdoor attacks) and untargeted attacks (byzantine attacks) (Mothukuri et al. 2021; Enthoven and Al-Ars 2020).
• Data poisoning and model poisoning attacks
Data poisoning attacks. Data poisoning attacks mainly change the training dataset, for example by adding noise or flipping the labels.
Model poisoning attacks. The purpose of model poisoning attacks is to arbitrarily manipulate the model updates. These attacks can cause the global model to deviate from the normal model, resulting in degraded model performance or leaving backdoors in the final global model.
Moreover, local workers sometimes just get the global model, but do not contribute data and computing resources. Such local workers can upload virtual updates, e.g. random parameters, to the central server. These attacks are called free riding attacks (Lin et al. 2019;Zong et al. 2018). Free riding attacks can also be classified as the model poisoning attacks.
What are the differences and similarities between data poisoning and model poisoning attacks?
For data poisoning attacks, adversaries can only add specific noise to the data or change the labels to affect the performance of the global model. For model poisoning attacks, adversaries usually actively influence the update of the model, e.g., by changing the objective function. Data poisoning attacks may not be as effective as model poisoning attacks (Bhagoji et al. 2019).
Most existing data poisoning and model poisoning attacks construct poisoned samples by adding a specific trigger to the data or by flipping the labels. Few methods implement poisoning attacks by adding triggers while leaving the labels unchanged.
• Byzantine and backdoor attacks
Byzantine attacks
Byzantine attacks are untargeted attacks whose goal is to cause the global model to fail.
Backdoor attacks
The goal of a backdoor attack is to make the model fail on a particular task while leaving the normal task unaffected. To some degree, a backdoor attack is one type of targeted poisoning attack.
Backdoor attacks insert hidden triggers into the global model during training, generally by changing specific features. In the predicting phase, the attack succeeds only when a sample activates the backdoor task. Therefore, only adversaries who know how to trigger the backdoor task can successfully launch the attack (this idea can also be applied to model identity authentication (Xiangrui et al. 2020)). However, current work mainly focuses on image datasets, and how to inject backdoors on text datasets needs to be further explored.
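The two ways of constructing poisoned samples described above, label flipping and trigger insertion, can be illustrated with a minimal sketch. The sample layout (a feature list plus an integer label) and the helper names `flip_labels` and `add_trigger` are assumptions made for illustration only; they are not taken from any of the cited works.

```python
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (features, label); hypothetical layout


def flip_labels(data: List[Sample], source: int, target: int) -> List[Sample]:
    """Label flipping: relabel every sample of class `source` as `target`."""
    return [(x, target if y == source else y) for x, y in data]


def add_trigger(data: List[Sample], trigger_idx: List[int],
                trigger_value: float, target: int) -> List[Sample]:
    """Backdoor poisoning: stamp a fixed pattern on the chosen feature
    positions and relabel the sample with the attacker's target class."""
    poisoned = []
    for x, _ in data:
        x_new = list(x)  # copy so the clean data stay untouched
        for i in trigger_idx:
            x_new[i] = trigger_value
        poisoned.append((x_new, target))
    return poisoned


clean = [([0.1, 0.2, 0.3, 0.4], 0), ([0.5, 0.6, 0.7, 0.8], 1)]
flipped = flip_labels(clean, source=1, target=0)
backdoored = add_trigger(clean, trigger_idx=[2, 3], trigger_value=1.0, target=0)
```

A compromised local worker would mix such samples into its local training set; how many to mix in is one of the key factors mentioned in the threat model.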

• Perspectives of poisoning attacks
We summarize the perspectives of poisoning attacks with the following five questions.
Q1. How to improve the effectiveness of poisoning attacks?
Bagdasaryan et al. (2020) indicated that any local worker can upload a malicious model to the central server during the training phase. They presented a general method called "constrain-and-scale", which enables adversaries to generate a model with high accuracy on both the main task and the backdoor task. In addition, they used an objective function designed to avoid detection: it rewards the accuracy of the model and penalizes deviation from what the aggregator considers "normal". By adding the penalty term $L_{ano}$, the objective (loss) function is modified as follows:
$$L_{model} = \alpha L_{class} + (1 - \alpha) L_{ano}$$
The adversary's training data include both normal inputs and backdoor inputs, so that $L_{class}$ balances the accuracy of the main task and the backdoor task, while the hyperparameter $\alpha$ weights classification accuracy against the anomaly penalty. $L_{ano}$ can be any type of regularization, such as the p-norm distance between weight/gradient matrices. In fact, model poisoning attacks are mainly realized by modifying the objective function.
Bhagoji et al. (2019) mainly studied a targeted attack on FL initiated by a few malicious local workers. They proposed the idea of simple boosting, in which malicious local workers boost their updates to overcome the effect of the normal local workers and the central server on the model update.
In order to improve the stealth of this attack, Bhagoji et al. (2019) proposed the ideas of stealthy model poisoning and alternating minimization, which help the adversaries avoid detection by the central server.
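The boosting (scaling) step discussed above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes plain unweighted averaging of n local models, a global learning rate of 1, and benign updates that are close to zero (a nearly converged model). The function names `fedavg` and `boosted_update` are hypothetical.

```python
from typing import List

Vector = List[float]


def fedavg(global_model: Vector, local_models: List[Vector]) -> Vector:
    """Unweighted averaging: G' = G + (1/n) * sum_i (L_i - G)."""
    n = len(local_models)
    return [g + sum(m[j] - g for m in local_models) / n
            for j, g in enumerate(global_model)]


def boosted_update(global_model: Vector, backdoored: Vector,
                   scale: float) -> Vector:
    """Malicious submission G + scale * (X - G), where X is the model the
    adversary wants the server to end up with."""
    return [g + scale * (x - g) for g, x in zip(global_model, backdoored)]


G = [1.0, -2.0, 0.5]                  # current global model
X = [1.5, -1.0, 0.0]                  # adversary's backdoored model
benign = [list(G) for _ in range(4)]  # assumption: benign updates are ~0
malicious = boosted_update(G, X, scale=5.0)  # scale = n = 5 workers
new_global = fedavg(G, benign + [malicious])
```

Under these assumptions the averaged model coincides with X, which is why a single boosted update can dominate the aggregate. The large norm of such an update is also what makes it detectable, motivating the penalty term and the stealth techniques described above.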

Q2. What are the conditions for a successful poisoning attack?
Sun et al. (2019) compared the "random sampling attack" with the "fixed frequency attack". The "random sampling attack" randomly selects malicious local workers in each round, whereas the "fixed frequency attack" ensures that one malicious local worker participates every f rounds. They showed that the performance of attacks depends on the proportion of malicious local workers. Baruch et al. (2019) indicated that changing the model within a certain small range is enough to mount a non-omniscient attack, and that some existing defenses (Krum, Trimmed Mean, Bulyan) can be bypassed when the participants' data are i.i.d.
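The difference between the two participation patterns can be made concrete with a small simulation. This is an illustrative sketch only: the function names, the population sizes and the choice of round 0 as the first attack round are assumptions, not details taken from Sun et al. (2019).

```python
import random
from typing import List


def fixed_frequency_rounds(num_rounds: int, f: int) -> List[int]:
    """Malicious workers per round: exactly one every f rounds."""
    return [1 if t % f == 0 else 0 for t in range(num_rounds)]


def random_sampling_rounds(num_rounds: int, num_workers: int,
                           num_malicious: int, per_round: int,
                           seed: int = 0) -> List[int]:
    """Malicious workers per round when the server samples `per_round`
    workers uniformly from a population containing `num_malicious`."""
    rng = random.Random(seed)
    is_malicious = [True] * num_malicious + [False] * (num_workers - num_malicious)
    return [sum(rng.sample(is_malicious, per_round)) for _ in range(num_rounds)]


fixed = fixed_frequency_rounds(num_rounds=10, f=5)
sampled = random_sampling_rounds(num_rounds=10, num_workers=100,
                                 num_malicious=10, per_round=20)
```

In the random sampling case the number of malicious workers per round fluctuates around `per_round * num_malicious / num_workers`, so the proportion of compromised workers directly controls how often the attack is applied.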
Q3. How to make a backdoor task stealthier?
Xie et al. (2020) proposed a distributed backdoor attack. The original trigger, which would otherwise be added to samples in one local worker, is decomposed into several sub-triggers that are added to samples in different local workers. Hence, each compromised local worker trains its local model using only part of the trigger. In the predicting phase, all sub-triggers can be combined on a single sample to launch a backdoor attack. Distributing the trigger in this way makes the attack harder to detect.
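The decomposition of a trigger into sub-triggers can be sketched as follows. The pixel-coordinate representation, the round-robin split and the names `split_trigger` and `stamp` are illustrative assumptions; the cited work may decompose triggers differently.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]  # (row, column)


def split_trigger(trigger: List[Pixel], num_parts: int) -> List[List[Pixel]]:
    """Round-robin split of a global trigger into local sub-triggers."""
    return [trigger[i::num_parts] for i in range(num_parts)]


def stamp(image: List[List[float]], pixels: List[Pixel],
          value: float = 1.0) -> List[List[float]]:
    """Return a copy of `image` with the given pixels set to `value`."""
    out = [row[:] for row in image]
    for r, c in pixels:
        out[r][c] = value
    return out


global_trigger = [(0, 0), (0, 1), (1, 0), (1, 1)]
sub_triggers = split_trigger(global_trigger, num_parts=2)

blank = [[0.0] * 3 for _ in range(3)]
# Training phase: each compromised worker stamps only its own sub-trigger.
local_poisoned = [stamp(blank, sub) for sub in sub_triggers]
# Predicting phase: all sub-triggers are combined on a single sample.
test_time = stamp(blank, [p for sub in sub_triggers for p in sub])
```

No single worker's poisoned data contain the full pattern, yet the combined sample at prediction time is identical to one stamped with the global trigger.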
Q4. What are the triggers that can launch a successful backdoor attack?
Existing work indicated that using samples from the tail of the data distribution (edge cases) as triggers can effectively launch backdoor attacks. These samples are unlikely to be part of the training or test data. This provides an idea for finding backdoor triggers.

Q5. Can poisoning attacks bypass the defense strategies?
Existing work shows that the answer to this question is yes. Fang et al. (2020) studied model poisoning attacks against byzantine-robust FL. They demonstrated that poisoning attacks can succeed even against robust aggregation algorithms such as Krum, Bulyan, Trimmed Mean and Median. Their attacks greatly increase the error rate of the global model learned by these four robust aggregation algorithms.

Defenses
There are two types of defense methods against poisoning attacks, namely robust aggregation and differential privacy.

• Robust aggregation
The central server can independently verify the performance of the global model on a validation dataset. The central server can also check whether some local workers' updates are statistically different from the other local workers' updates (Bhagoji et al. 2019).
Various byzantine-robust aggregation methods have been proposed to defend against malicious local workers. Sun et al. (2019) showed that thresholding the norm of updates can mitigate the attack without affecting the model performance. Fang et al. (2020) generalized RONI and TRIM, which were designed to defend against data poisoning attacks, to defend against their model poisoning attacks. RFA (Pillutla et al. 2019) aggregates the local models by computing a weighted geometric median using the smoothed Weiszfeld's algorithm. FoolsGold (Fung et al. 2018) is a defense method against sybil attacks on FL. FoolsGold adapts the learning rate (aggregation weight) of local models based on the model similarity in each round. In the Median method (Yin et al. 2018), the central server sorts each parameter across the local models and takes the median as the next-round global model. As in Median, in the Trimmed Mean method (Yin et al. 2018) the server also sorts each parameter across the m local models. Then, the central server removes the largest and smallest β values, and computes the mean of the remaining m − 2β values as the next-round global model. Krum (Blanchard et al. 2017) selects as the global model the local model that is most similar to the other local models. Even if the selected local model comes from a compromised local worker, its influence will be limited. Mhamdi et al. (2018)
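The Median, Trimmed Mean and Krum rules described above can be sketched in a few lines. This is a minimal illustration on flat parameter vectors, assuming equal weighting of local workers; the function names and the toy updates are hypothetical, and the Krum score follows the usual formulation (sum of squared distances to the n − f − 2 nearest neighbours, where f is the assumed number of byzantine workers).

```python
from statistics import median
from typing import List

Vector = List[float]


def coordinate_median(models: List[Vector]) -> Vector:
    """Median rule: per-coordinate median across local models."""
    return [median(col) for col in zip(*models)]


def trimmed_mean(models: List[Vector], beta: int) -> Vector:
    """Trimmed Mean rule: drop the beta largest and beta smallest values of
    each coordinate, then average the remaining m - 2*beta values."""
    m = len(models)
    if not 0 <= 2 * beta < m:
        raise ValueError("need 0 <= 2*beta < number of models")
    out = []
    for col in zip(*models):
        kept = sorted(col)[beta:m - beta]
        out.append(sum(kept) / len(kept))
    return out


def krum(models: List[Vector], num_byzantine: int) -> Vector:
    """Krum rule: return the local model whose summed squared distance to
    its n - f - 2 nearest neighbours is smallest."""
    n = len(models)
    k = n - num_byzantine - 2
    if k < 1:
        raise ValueError("need n - f - 2 >= 1")

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    scores = []
    for i, mi in enumerate(models):
        dists = sorted(sq_dist(mi, mj) for j, mj in enumerate(models) if j != i)
        scores.append(sum(dists[:k]))
    return models[scores.index(min(scores))]


# Four similar benign updates and one outlier from a compromised worker.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [50.0, -50.0]]
agg_median = coordinate_median(updates)
agg_trimmed = trimmed_mean(updates, beta=1)
agg_krum = krum(updates, num_byzantine=1)
```

On this toy input all three rules ignore the outlier, whereas a plain mean would be pulled far away from the benign updates. As noted under Q5, carefully crafted updates that stay close to the benign ones can still bypass such rules.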

Predicting phase
In the model predicting phase, there are still security and privacy threats, as shown in Table 4. The global model is visible to the local workers and the central server, which may increase the possibility of launching attacks in the predicting phase. Malicious local workers or a malicious central server may infer honest local workers' sensitive information from the global model.

Evasion attacks
Evasion attacks aim to fool the target model by constructing specific samples called adversarial examples. Usually, subtle noise that cannot be detected by human beings is added to the input samples and causes the model to give incorrect classification results. A classic example is a panda image that, with a small amount of added noise, is identified as a gibbon (Goodfellow et al. 2015).
The existence of adversarial examples has been attributed to the linear behavior of models in high-dimensional space (Goodfellow et al. 2015) and to non-robust features (Gilmer et al. 2019).
According to the optimization objective, evasion attacks can be divided into targeted attacks with class-specific errors, and untargeted attacks that do not consider class-specific errors. Evasion attacks have attracted wide attention and have been applied in many scenarios, such as attacking autonomous driving (Lu et al. 2017), internet of things (Yulei 2021), face recognition (Sharif et al. 2016), and speech recognition (Carlini et al. 2016).

Threat model
From the perspective of the adversary's knowledge, the attacks can be divided into white-box and black-box attacks. In white-box attacks, the adversary has complete knowledge of the target model, including the neural network structure, model parameters and output. In contrast, in black-box attacks, the adversary does not know the neural network architecture, parameters, or other information about the target model, and can only carry out the attack based on the query results of the target model.

Attacks
The main research direction of the evasion attacks (adversarial examples attacks) is to design adversarial examples and to break through the robustness of the model.

• In computer vision (CV)
White-box evasion attacks are mainly based on optimization, gradients, classification hyperplanes and so on. Optimization-based methods formulate the search for the smallest possible adversarial perturbation as an optimization problem. The most representative methods are C&W (Carlini and Wagner 2017) and L-BFGS (Szegedy et al. 2014). The core idea of gradient-based methods is to modify the input sample along the gradient direction. The main methods include one-step attacks, such as FGSM (Goodfellow et al. 2015), and iterative attacks, such as i-FGSM (Kurakin et al. 2017). Classification hyperplane-based methods aim to find the minimum perturbation that fools deep networks, such as DeepFool (Moosavi-Dezfooli et al. 2016). Black-box evasion attacks are mainly based on transferability, gradient estimation and decision boundaries (Ji et al. 2021).
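The one-step gradient-based idea behind FGSM can be sketched on a logistic regression model, for which the input gradient of the cross-entropy loss has the closed form (p − y)·w. The model, its weights and the toy input below are assumptions chosen for illustration; attacks on deep networks obtain the same gradient by backpropagation.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(w: List[float], b: float, x: List[float], y: int,
         eps: float) -> List[float]:
    """One-step attack: x_adv = x + eps * sign(grad_x loss).
    For logistic regression with cross-entropy loss, grad_x = (p - y) * w."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


w, b = [2.0, -3.0], 0.0   # hypothetical white-box model parameters
x, y = [0.3, 0.1], 1      # clean input, correctly classified as class 1
x_adv = fgsm(w, b, x, y, eps=0.2)
p_clean, p_adv = predict(w, b, x), predict(w, b, x_adv)
```

Each coordinate moves by at most `eps`, yet the predicted class changes. Iterative variants repeat this step with a smaller step size and project back into the allowed perturbation region.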
• In natural language processing (NLP)
Evasion attacks in the CV domain have made significant breakthroughs in attack methods. However, there are still many challenges in NLP tasks. Due to the inherent differences between image and text data, evasion attacks for CV tasks cannot be directly applied to NLP tasks. First, image data (such as pixel values) are continuous, whereas text data are discrete, so it is challenging to perturb text along the gradient direction. Second, a tiny change in the pixel values perturbs the image in a way that is hard for human beings to detect, whereas even minor perturbations of text data can be easily detected.
Adversarial examples for text data can be crafted at the character, word and sentence levels (Zeng et al. 2020). There are three representative methods of generating adversarial examples in text classification: genetic attack (Ren et al. 2020), HotFlip (Ebrahimi et al. 2018) and MHA.

• Empirical defense
Many researchers suggest that image preprocessing and feature transformation can defend against evasion attacks. However, these methods are almost ineffective when the adversary knows the defense methods (Ji et al. 2021). Security-by-obscurity mechanisms improve model security by hiding information, and mainly include model fusion, gradient masking and randomization (Ji et al. 2021).
The main method that affects the decision boundary is adversarial training (Madry et al. 2018). In order to improve the robustness of the model, the defender generates adversarial examples and mixes them with the original samples to train the model. However, in CV, adversarial training tends to overfit the model to the specific constraint region, which degrades the generalization performance of the model.

• Certified defense
Certified defense (Lécuyer et al. 2019) has been studied in recent years, and it is provably robust to certain kinds of adversarial perturbations. Cohen et al. (2019) proved a tight robustness guarantee in the $\ell_2$ norm for smoothing with Gaussian noise. Strong empirical results suggest that randomized smoothing is a promising direction for future research into robust adversarial classification.
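The guarantee for Gaussian smoothing can be illustrated with a short sketch. For noise level σ, if the smoothed classifier returns the top class with probability at least p_A and every other class with probability at most p_B, the prediction is certifiably constant within an $\ell_2$ radius of (σ/2)(Φ⁻¹(p_A) − Φ⁻¹(p_B)). The base classifier and the probability values below are hypothetical; a real certification procedure would replace them with statistically valid bounds estimated from many noisy samples.

```python
import random
from collections import Counter
from statistics import NormalDist
from typing import Callable, List


def certified_radius(p_a: float, p_b: float, sigma: float) -> float:
    """l2 radius R = sigma / 2 * (Phi^-1(p_a) - Phi^-1(p_b))."""
    inv = NormalDist().inv_cdf
    return 0.5 * sigma * (inv(p_a) - inv(p_b))


def smoothed_predict(base: Callable[[List[float]], int], x: List[float],
                     sigma: float, n: int, seed: int = 0) -> int:
    """Majority vote of the base classifier under Gaussian input noise."""
    rng = random.Random(seed)
    votes = Counter(
        base([xi + rng.gauss(0.0, sigma) for xi in x]) for _ in range(n)
    )
    return votes.most_common(1)[0][0]


def toy_classifier(x: List[float]) -> int:
    """Hypothetical base classifier: class 1 iff the first feature is positive."""
    return 1 if x[0] > 0 else 0


label = smoothed_predict(toy_classifier, [2.0, 0.0], sigma=0.5, n=200)
radius = certified_radius(p_a=0.9, p_b=0.1, sigma=0.5)
```

Larger σ yields a larger certified radius for the same class probabilities, but it also makes the base classifier's task harder, which is the accuracy and robustness trade-off inherent in this family of defenses.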

Privacy inference attacks
Privacy inference attacks also occur in the predicting phase. These attacks include model inversion, membership inference, and model extraction.

Threat model
In the model predicting phase, the adversaries may have no knowledge of the parameters of the model and may only be able to query the model. In particular, different assumptions about the adversary's knowledge, such as access to auxiliary data or not, and access to the confidence vector or to the label only, make it difficult for attack and defense methods to be generally applicable.

• Model inversion
Model inversion attacks mainly use the APIs provided by a machine learning system to obtain preliminary information about the model. With this preliminary information, the adversaries can analyze the model to obtain relevant information about the original data (Jayaraman and Evans 2019). We argue that model inversion attacks can be categorized into attribute inference attacks and property inference attacks.
Attribute inference attacks (Fredrikson et al. 2014; Yeom et al. 2018) aim to learn hidden sensitive attributes of a sample. The prediction results of machine learning models often contain a lot of information from which the sample can be inferred. Fredrikson et al. (2014) proposed that the information about the input contained in the confidence output can be exploited for input inversion attacks. Property inference attacks (Song and Shmatikov 2020) try to infer whether the training dataset has a specific property. We argue that the difference between attribute and property inference attacks is that attribute inference attacks obtain features involved in the main task, while property inference attacks obtain features independent of the main task.

• Membership inference
Membership inference attacks aim to test whether a specific data point is part of the training dataset. Shokri et al. (2017) first proposed this attack, casting it as a supervised learning problem. Specifically, the adversary trains multiple shadow models to mimic the behavior of the target model, and trains an attack model on data derived from the shadow models' outputs. Salem et al. (2019) pointed out that this method makes many assumptions about the adversary, such as the use of several shadow models, knowledge of the target model structure, and a dataset from the same distribution as the target model's training dataset. They relaxed these assumptions and studied three different types of attacks. Choquette-Choo et al. (2021) and Li and Zhang (2021) focused on how to implement the attack in the label-only setting. These methods are based on the intuition that it is more difficult to perturb member inputs to mislead the target model than to perturb non-member inputs. The fundamental reason for the success of membership inference attacks is the overfitting of the target model. Yeom et al. (2018) [...] (Tramèr et al. 2016).
Hyperparameter extraction attacks try to recover the underlying hyperparameters, such as the regularization coefficient (Wang and Gong 2018).
Grosso et al. (2021) analysed fundamental bounds on information leakage, which can help in constructing privacy-preserving ML models. Prior work concluded that the following types of data privacy-preserving measures could be adopted: model structure defense (e.g. reducing the sensitivity of the model to training samples and the overfitting of the model), information obfuscation defense (e.g. obfuscating the output of the model), and query control defense (e.g. limiting the number of queries). Understanding why attacks succeed is very important for studying defense methods, and existing defense methods have been shown to still have some defects. For example, overfitting is the main reason why membership inference attacks can succeed, and data augmentation can effectively prevent overfitting. However, Kaya and Dumitras (2021) evaluated two membership inference attacks against seven data augmentation mechanisms and differential privacy. They found that "applying augmentation does not limit the risk", so more robust defense methods need to be studied.
In particular, differential privacy is used to protect data privacy (Papernot et al. 2018). During training, random noise may be added to the data, objective function, gradients, parameters, or output. At inference time, the model's generalization performance is reduced because of the noise added during training, so there is a trade-off between privacy and utility.
In order to achieve the utility-loss guarantees, Jia et al. (2019) added crafted noise to each confidence score vector to turn it into an adversarial example against black-box membership inference attacks. This method can mislead the adversary's attack model, and it belongs to information obfuscation defense.
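The overfitting intuition behind membership inference can be illustrated with a very simple confidence-threshold attack, roughly in the spirit of the relaxed-assumption attacks mentioned above: an overfitted model tends to be more confident on training members than on unseen samples. The confidence vectors and the threshold below are made-up values for illustration, and `infer_membership` is a hypothetical helper, not an attack implementation from the cited works.

```python
from typing import List


def infer_membership(confidence_vectors: List[List[float]],
                     threshold: float) -> List[bool]:
    """Guess 'member' when the model's top confidence reaches the threshold."""
    return [max(v) >= threshold for v in confidence_vectors]


# Hypothetical outputs of an overfitted target model.
member_outputs = [[0.98, 0.01, 0.01], [0.02, 0.96, 0.02]]      # training samples
non_member_outputs = [[0.55, 0.30, 0.15], [0.40, 0.35, 0.25]]  # unseen samples

guesses = infer_membership(member_outputs + non_member_outputs, threshold=0.9)
```

Information obfuscation defenses target exactly this signal: perturbing or truncating the confidence vector narrows the gap between members and non-members, at some cost in the utility of the returned scores.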

• Security and privacy threats on VFL and FTL
Most previous work has focused on security and privacy threats in HFL, while work on security and privacy threats in VFL/FTL is limited. In VFL, usually only one local worker has the labels of the training data. Hence, whether the threats in HFL still exist in VFL/FTL and whether there are new threats in VFL/FTL deserve further study. Some attacks against VFL have been proposed. For example, Luo et al. (2020) proposed a feature inference attack against VFL in the predicting phase. Weng et al. (2020) implemented two practical attacks against VFL based on logistic regression and XGBoost.
• Limitations of attack scenarios
For property and membership inference attacks in the training phase, if the adversaries are local workers, they can only obtain the aggregated information from the other local workers. Therefore, they can only infer that a specific sample or property exists in the overall dataset of the other local workers. How to determine which honest local worker the specific information belongs to is an open problem. For data reconstruction attacks, the existing work assumes that the adversaries are located at the central server. They can collect the parameters or gradients of all local workers and launch a white-box data reconstruction attack. However, these attacks can only recover a single sample or a batch of samples when iteration = 1, where iteration means the number of stochastic gradient update steps per epoch. How to implement data reconstruction attacks under epoch > 1 and iteration > 1 is a big challenge.
For evasion attacks and poisoning attacks, success depends on finding or generating appropriate samples as triggers. For discrete datasets, further work on evasion and poisoning attacks is needed. Apart from the most obvious difference, namely that evasion attacks occur in the predicting phase and poisoning attacks occur in the training phase, it is valuable to analyze the connections and differences between them in theory (Pang et al. 2020; Suciu et al. 2018; Demontis et al. 2019).
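The single-sample limitation of data reconstruction noted above can be seen in a small worked case. For one fully connected layer z = Wx + b trained on a single sample, the gradients satisfy ∂L/∂W[i][j] = δ[i]·x[j] and ∂L/∂b[i] = δ[i], so an adversary who observes both gradients can recover x by division. The sketch below assumes a squared-error loss and made-up parameters; the helper names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def layer_gradients(w: Matrix, b: Vector, x: Vector,
                    target: Vector) -> Tuple[Matrix, Vector]:
    """Gradients of 0.5 * ||Wx + b - target||^2 w.r.t. W and b (one sample)."""
    z = [sum(wij * xj for wij, xj in zip(row, x)) + bi
         for row, bi in zip(w, b)]
    delta = [zi - ti for zi, ti in zip(z, target)]
    grad_w = [[di * xj for xj in x] for di in delta]
    return grad_w, delta


def reconstruct_input(grad_w: Matrix, grad_b: Vector) -> Vector:
    """Recover x from a row of grad_w divided by the matching bias gradient."""
    i = max(range(len(grad_b)), key=lambda k: abs(grad_b[k]))
    return [g / grad_b[i] for g in grad_w[i]]


w = [[0.5, -0.1, 0.3], [0.2, 0.4, -0.6]]
b = [0.1, -0.2]
x_private = [0.2, -0.7, 1.5]
grad_w, grad_b = layer_gradients(w, b, x_private, target=[1.0, 0.0])
x_recovered = reconstruct_input(grad_w, grad_b)
```

When gradients are averaged over a batch or accumulated over several local update steps, the ratio no longer isolates a single input, which is one way to see why reconstruction under epoch > 1 and iteration > 1 is harder.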

• Weakness of the defense strategies
Recent evidence suggests that the defense methods of FL have some shortcomings. For example, robust aggregation algorithms can be circumvented by poisoning attacks; DP affects the usability of the model; SMC and HE can reduce efficiency to some extent (Kanagavelu et al. 2020). With the continuous improvement of attack methods, targeted defense strategies need to be put forward as soon as possible to ensure the security and privacy of FL. Besides, previous work emphasized detecting whether the local workers are trusted, but the local workers should also confirm whether the central server is trusted (Guowen et al. 2020; Guo et al. 2021) in the training phase. Previous work also established that adversaries can extract memorized information from the model. Therefore, how to make the trained model remember less information about the data is also a research direction.

• Building a trustworthy FL
There are many threats against FL in every phase, from data and behavior auditing through model training to predicting. In particular, data and behavior auditing for FL should be paid more attention, as it is the first line of defense for FL security and privacy. In addition, more trustworthiness measurement and assessment methods can be investigated to evaluate the trustworthiness of local workers and central servers before the model training phase. In the model training phase, centralized FL needs to employ privacy-preserving and security technologies, as well as advanced machine learning algorithms. Warnat-Herresthal et al. (2021) constructed a decentralized collaborative learning platform based on blockchain. This platform fully considers the trusted access of institutions, and employs Trusted Execution Environment (TEE), DP and HE to protect private information. This platform can provide experience for centralized FL. Building an FL system on blockchain may be more reliable due to its immutability and decentralization.

Conclusion
Federated Learning (FL) has recently emerged as a solution to the issue of data silos. However, FL itself is still riddled with attack surfaces that pose risks to data privacy and model robustness. In this work, we identify the issues and provide a taxonomy of FL threats based on the phases of its execution, including the data and behavior auditing phase, training phase and predicting phase. Finally, we present the perspectives of FL. Our work indicates that FL is a promising privacy-enhancing technology. However, building a trusted FL system is confronted with security and privacy issues inherited from its distributed nature. One should consider the threats existing in all the phases of the execution of FL, including the data and behavior auditing phase, training phase and predicting phase.