Transparency order versus confusion coefficient: a case study of NIST lightweight cryptography S-Boxes

Side-channel resistance is now widely accepted as a crucial factor in deciding the security assurance level of cryptographic implementations. In most cases, non-linear components (e.g., S-Boxes) of cryptographic algorithms are chosen as the primary targets of side-channel attacks (SCAs). To measure the side-channel resistance of S-Boxes, three theoretical metrics have been proposed: reVisited transparency order (VTO), confusion coefficients variance (CCV), and minimum confusion coefficient (MCC). However, the practical effectiveness of these metrics remains unclear. Taking the 4-bit and 8-bit S-Boxes used in NIST Lightweight Cryptography candidates as concrete examples, this paper presents a comprehensive study of the applicability of these metrics. First, we empirically investigate the relations among the three metrics for the targeted S-Boxes, and find that CCV is almost linearly correlated with VTO, while MCC is inconsistent with the other two. Furthermore, to verify which metric is more effective in which scenarios, we perform simulated and practical experiments on nine 4-bit S-Boxes under non-profiled and profiled attacks, respectively. The experiments show that for quantifying the side-channel resistance of S-Boxes under non-profiled attacks, VTO and CCV are more reliable while MCC fails. We also observe that none of these three metrics is suitable for measuring the resistance of S-Boxes against profiled SCAs. Finally, we try to verify whether these metrics can be applied to compare the resistance of S-Boxes with different sizes. Unfortunately, all of them are invalid in this scenario.


Introduction
With the emergence and explosive development of the Internet of Things, a large number of highly constrained devices are interconnected and working in concert to accomplish certain tasks (Zhu and Reddi 2017). To protect the security of most applications, lightweight cryptographic algorithms tailored for constrained devices have been researched for more than a decade (Heuser et al. 2020). Specifically, NIST has initiated a process to solicit, evaluate, and standardize lightweight cryptographic algorithms (NIST 2021). Subsequently, many ingenious ciphers have been proposed (Zhang et al. 2019; Dobraunig and Mennink 2019).
The security evaluation of lightweight cryptographic algorithms is a topic of interest due to their wide application prospects. In particular, the resistance of cryptographic implementations against side-channel attacks (SCAs) has been recognized as a crucial factor (Heuser et al. 2020). Essentially, SCAs exploit physical leakages (e.g., power consumption (Kocher et al. 1999), electromagnetic emanations (Brier et al. 2004)) from cryptosystems to recover their underlying sensitive data. Generally speaking, SCAs can be divided into two classes: non-profiled attacks, such as differential power analysis (DPA) (Kocher et al. 1999) and correlation power analysis (CPA) (Brier et al. 2004), and profiled attacks, such as template attacks (TA) (Chari et al. 2002) and deep learning (DL) based profiled attacks (Maghrebi et al. 2016; Cagli et al. 2017; Wouters et al. 2020).
Open Access, Cybersecurity. *Correspondence: zhouyongbin@iie.ac.cn
When performing an efficient SCA, non-linear components (e.g., S-Boxes) of cryptographic algorithms are typically chosen as the primary targets (Carlet 2005). Therefore, to evaluate the side-channel resistance of a lightweight cipher, it is important to study how to measure the intrinsic resistance of S-Boxes against SCAs. Consequently, various metrics have been proposed, such as the DPA signal-to-noise ratio (Guilley et al. 2004), transparency orders (Prouff 2005; Chakraborty et al. 2017; Li et al. 2020), confusion coefficients (Fei et al. 2012), and the non-absolute indicator (Carlet et al. 2021).
Among those metrics, transparency orders and confusion coefficients are the most commonly used to compare and select optimal S-Boxes with high SCA resistance. As for the former, the original transparency order (TO) (Prouff 2005) and the modified transparency order (MTO) (Chakraborty et al. 2017) have been widely used to select 4 × 4 S-Boxes, 6 × 6 S-Boxes, and 8 × 8 S-Boxes (Kavut and Baloğlu 2016; Patranabis et al. 2019). However, it has been pointed out that both TO and MTO are flawed (Li et al. 2020), and the notion of reVisited transparency order (VTO) was further proposed in Li et al. (2020). As far as we know, VTO has been used to select 4 × 4 S-Boxes in Runlian et al. (2020) and 8 × 8 S-Boxes in Martínez-Díaz and Freyre-Echevarria (2020). As for confusion coefficients, confusion coefficient variance (CCV) and minimum confusion coefficient (MCC) were proposed by  and Guilley et al. (2015), respectively. CCV has been used to heuristically select optimal 4 × 4 and 8 × 8 S-Boxes for cryptographic algorithms (Ege et al. 2015; Freyre-Echevarría et al. 2020), while MCC has not received much attention. Furthermore, some studies consider both transparency orders and confusion coefficients to select optimal S-Boxes against SCAs (de la Cruz Jiménez 2018; Martínez-Díaz and Freyre-Echevarria 2020).
However, the practical effectiveness of these metrics remains unclear. Specifically, for transparency orders, the existing research is limited to the analysis of TO or MTO, and there is a lack of research on the recently proposed VTO. For confusion coefficients, the effectiveness of CCV and MCC needs to be further verified. Therefore, we mainly focus on investigating the applicability and relations of VTO, CCV, and MCC in this work.
Our Contributions. In this paper, we give a comprehensive study of the applicability of three typical theoretical metrics for side-channel analysis, namely VTO, CCV and MCC. We take the 4-bit and 8-bit S-Boxes used in NIST Lightweight Cryptography candidates as concrete examples for our analysis. Firstly, we empirically investigate the relations among three metrics for targeted S-boxes. The metric values of these S-Boxes show that CCV is almost linearly correlated with VTO, while MCC is inconsistent with the other two metrics.
Next, to verify the effectiveness of these metrics, we perform simulated and practical experiments on nine 4-bit S-Boxes in the non-profiled and profiled scenarios, respectively. For the non-profiled scenario, when VTO (resp. CCV) difference value of two S-Boxes is relatively large, the S-Box with a lower VTO (resp. higher CCV) value is generally more resistant to attacks. However, when VTO and CCV values of S-Boxes turn relatively close to each other, these two metrics become inaccurate to some extent. Interestingly, the MCC fails to work in quantifying the resistance of S-Boxes against CPA attacks. For the profiled scenario, template attacks and deep learning based profiled attacks are performed, respectively. Unfortunately, none of these three metrics (VTO, CCV and MCC) is suitable for measuring the resistance of S-Boxes against profiled SCAs.
Finally, we try to verify whether these metrics can be applied to compare the resistance of S-Boxes with different sizes. Our results show that none of them can be used for this purpose.
The rest of the paper is organized as follows. "Notations and preliminaries" section gives preliminary notions on S-Boxes and theoretical metrics evaluating the resiliency of S-Boxes against SCAs. "Evaluation of S-Boxes" section provides basic information on the S-Boxes we evaluated and the results based on the theoretical metrics. Then, in "Non-profiled side-channel attacks against 4 × 4 S-Boxes" section, we demonstrate the simulated and practical results of non-profiled attacks on nine 4-bit S-Boxes. The results of profiled attacks are shown in "Profiled side-channel attacks" section. Furthermore, we verify whether these metrics can be applied to compare the resistance of S-Boxes with different sizes in "4 × 4 S-Boxes versus 8 × 8 S-Boxes" section. Finally, we conclude our work in "Conclusions and future work" section.

Notations and preliminaries
In this section, we first give basic notions about the cryptographic properties of S-Boxes. Then, we introduce the notions of reVisited transparency order (VTO), confusion coefficient variance (CCV), and minimum confusion coefficient (MCC).

Boolean functions and S-Boxes
Let $\mathbb{F}_2^n$ be the vector space that contains all the n-bit binary vectors, where n is a positive integer. For every vector $u \in \mathbb{F}_2^n$, we denote by H(u) the Hamming weight (HW) of u. A Boolean function on n variables can be viewed as a mapping from $\mathbb{F}_2^n$ to $\mathbb{F}_2$, and the mappings from the vector space $\mathbb{F}_2^n$ to the vector space $\mathbb{F}_2^m$ are called (n, m)-vectorial Boolean functions, where $m \le n$. An (n, m)-function F that performs substitution in the cryptosystem is commonly referred to as an n × m S-Box. Generally, S-Boxes have to be chosen carefully to satisfy cryptographic properties such as resistance to linear and differential cryptanalysis.
For each (n, m)-function F, the Boolean functions $f_1, \ldots, f_m$ defined by $F(x) = (f_1(x), \ldots, f_m(x))$ are called the coordinate functions of F. Let $z \in \mathbb{F}_2^m$ be a vector whose binary coordinates are all zero except one, which is assumed to be at index j. The j-th component function of F is the single-output Boolean function $z \cdot F$, which we also denote by $F_j$. The cross-correlation spectrum between two Boolean functions $f_1, f_2$ is defined as $C_{f_1,f_2}(u) = \sum_{x \in \mathbb{F}_2^n} (-1)^{f_1(x) \oplus f_2(x \oplus u)}$ for every $u \in \mathbb{F}_2^n$.
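The cross-correlation spectrum defined above can be illustrated with a minimal Python sketch. The helper names (`component`, `cross_correlation`), the 0-based bit index used for the component functions, and the choice of the PRESENT S-Box as a sample 4-bit S-Box are assumptions made for illustration only; they are not taken from the evaluated S-Box list.

```python
from typing import List

# Illustrative 4-bit S-Box (the PRESENT S-Box), used here only as an example.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def component(sbox: List[int], j: int) -> List[int]:
    """Truth table of the component function selecting output bit j (0-based)."""
    return [(y >> j) & 1 for y in sbox]


def cross_correlation(f1: List[int], f2: List[int], u: int) -> int:
    """C_{f1,f2}(u) = sum over x of (-1)^(f1(x) XOR f2(x XOR u))."""
    # 1 - 2*b maps bit b in {0, 1} to the sign (-1)^b.
    return sum(1 - 2 * (f1[x] ^ f2[x ^ u]) for x in range(len(f1)))


f0, f1 = component(SBOX, 0), component(SBOX, 1)
auto_at_zero = cross_correlation(f0, f0, 0)          # always 2^n for u = 0
spectrum_01 = [cross_correlation(f0, f1, u) for u in range(len(SBOX))]
```

For a bijective S-Box every component function is balanced, so the values in `spectrum_01` sum to zero, which is a convenient sanity check.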

ReVisited transparency order
Following the work of Prouff on transparency order (TO) (Prouff 2005), Li et al. (2020) proposed the reVisited transparency order (VTO), defined in Eq. (1), where $\beta_i$ denotes the value of the i-th bit of the register initial state $\beta$, and $C_{F_i,F_j}(a)$ denotes the cross-correlation spectrum between the component functions $F_i$ and $F_j$. Specifically, the VTO metric assumes that target devices leak the HW value of $v \oplus \beta$, where v denotes the data being processed and $\beta$ denotes the register initial state, which is assumed to be constant. In Eq. (1), the value of VTO(F) is obtained by traversing all register initial states $\beta \in \mathbb{F}_2^m$, and it represents the worst-case context when implementing the S-Box. However, in practice, the strategy of the adversary depends on the target device. As a result, we set the value of $\beta$ to zero for each S-Box implementation in our experiments. This corresponds to our context, in which the target microcontroller leaks the HW value of the manipulated value v. The corresponding value of VTO is denoted as $\mathrm{VTO}_0(F)$.

Confusion coefficient variance
Fei et al. (2012) introduced another metric called the confusion coefficient. This metric measures the probability of occurrences for which key hypotheses $k_i$ and $k_j$ result in different intermediate values v. For DPA attacks, it can be calculated as the expectation of the squared distance between the leakages of the v values under the two keys. That is, it can be computed as:

$$\kappa(k_i, k_j) = E\left[\left(L(F(p \oplus k_i)) - L(F(p \oplus k_j))\right)^2\right],$$

where L denotes the leakage function, p denotes the arbitrary inputs, and E is the mean operator.
Then, the variance of all confusion coefficients with respect to each possible $k_i$ and $k_j$ under the HW leakage model was proposed as a metric. An S-Box with a higher confusion coefficient variance (CCV) value has a higher resistance against SCAs. Formally, for all key pairs $k_i, k_j$ with $k_i \ne k_j$, the CCV of an S-Box is calculated as follows:

$$\mathrm{CCV}(F) = \mathrm{Var}_{k_i \ne k_j}\left(E\left[\left(H(F(p \oplus k_i)) - H(F(p \oplus k_j))\right)^2\right]\right).$$

Minimum confusion coefficient
Guilley et al. (2015) pointed out that when the signal-to-noise ratio (SNR) of the leakage is low, the empirical success rate of DPA, CPA and the optimal distinguisher mainly depends on the minimum confusion coefficient (MCC), $\min_{k \ne k^*} \kappa'(k^*, k)$, where $k^*$ denotes the secret key and k denotes a key hypothesis that is not the secret key. The lower the value of MCC, the lower the success probability of extracting the secret key based on leakages associated with the S-Box. Here $\kappa'(k^*, k)$ is calculated according to Eq. (3), which is slightly different from $\kappa(k^*, k)$, but the difference does not affect the order of the different S-Boxes. Note that the distribution of $\kappa'(k^*, k)$ is independent of the particular choice of $k^*$; the values are only permuted. Therefore, $k^*$ can be set to 0 during the calculation. In Heuser et al. (2016) and Heuser et al. (2020), the effectiveness of using MCC to measure the resistance of different S-Boxes against CPA and the optimal distinguisher was validated through simulated experiments.
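As a rough illustration of the confusion coefficient and its variance under the HW leakage model, the following Python sketch averages the squared HW distance over all inputs p and then takes the variance over all ordered key pairs. The use of a uniform input distribution, the population variance, and the PRESENT S-Box as sample data are assumptions of this sketch; the normalization used in the cited works may differ.

```python
from statistics import pvariance
from typing import List

# Illustrative 4-bit S-Box (the PRESENT S-Box), used here only as an example.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(v: int) -> int:
    """Hamming weight of a non-negative integer."""
    return bin(v).count("1")


def kappa(sbox: List[int], ki: int, kj: int) -> float:
    """Mean squared HW distance between S-Box outputs under keys ki and kj."""
    n = len(sbox)
    return sum((hw(sbox[p ^ ki]) - hw(sbox[p ^ kj])) ** 2 for p in range(n)) / n


def ccv(sbox: List[int]) -> float:
    """Population variance of kappa over all ordered key pairs with ki != kj."""
    n = len(sbox)
    values = [kappa(sbox, ki, kj)
              for ki in range(n) for kj in range(n) if ki != kj]
    return pvariance(values)


smallest_kappa = min(kappa(SBOX, 0, k) for k in range(1, len(SBOX)))
ccv_value = ccv(SBOX)
```

Because $\kappa(k_i, k_j)$ depends only on $k_i \oplus k_j$ when p is uniform, fixing one key to 0 (as done for `smallest_kappa`) covers all distinct values.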

Evaluation of S-Boxes
In this section, we first show basic information on the S-Boxes we investigate. Next, the values of $\mathrm{VTO}_0$, CCV, and MCC of these S-Boxes are given.

Investigated S-Boxes
Of the 25 NIST Lightweight Cryptography second-round candidates that use S-Boxes as the nonlinear component, 18 schemes use 4-bit or 8-bit S-Boxes. Therefore, we mainly evaluate the 4-bit and 8-bit S-Boxes in this work.
More precisely, we focus on the following 11 S-Boxes.  Table 1. Note that a cipher may use several different S-Boxes (e.g., SATURNIN). In addition, the above nine S-Boxes are also used in other NIST candidate ciphers. For instance, the GIFT S-Box is also used in ESTATE (ESTATE TweGIFT-128) (Chakraborti et al. 2020

Non-profiled side-channel attacks against 4 × 4 S-Boxes
Among various non-profiled attacks, we focus on CPA due to its simplicity and efficiency. In fact, CPA is equivalent to multi-bit DPA up to a change of the attacker's leakage modeling (Doget et al. 2011). Therefore, $\mathrm{VTO}_0$, CCV and MCC can all, in theory, be used to measure the resistance of S-Boxes against CPA under the HW leakage model. Concretely, CPA recovers the secret key by selecting the key that maximizes the Pearson correlation coefficient between the actual leakage and the estimated leakage based on the assumed secret key. That is, $\hat{k} = \arg\max_{k} \rho(L_{k^*}, L_k)$, where ρ(X, Y) denotes the Pearson correlation coefficient between X and Y, $L_{k^*}$ represents the measured traces, and $L_k$ denotes the estimated leakages.
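The CPA key-selection rule described above can be sketched in a few lines of Python. This is a toy example under stated assumptions: single-sample traces, HW predictions of the S-Box output, a hand-written `pearson` helper, the PRESENT S-Box as sample data, and simulated traces with unit-variance Gaussian noise. None of these choices are claimed to match the authors' code.

```python
import random
from typing import List, Tuple

# Illustrative 4-bit S-Box (the PRESENT S-Box), used here only as an example.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(v: int) -> int:
    return bin(v).count("1")


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def cpa(sbox: List[int], plaintexts: List[int],
        traces: List[float]) -> Tuple[int, List[float]]:
    """Return the key hypothesis with the largest correlation, plus all scores."""
    scores = [pearson(traces, [hw(sbox[p ^ k]) for p in plaintexts])
              for k in range(len(sbox))]
    return max(range(len(sbox)), key=scores.__getitem__), scores


# Toy usage: simulate HW leakage with Gaussian noise, then attack it.
rng = random.Random(0)
true_key = 0xA
plaintexts = [rng.randrange(16) for _ in range(2000)]
traces = [hw(SBOX[p ^ true_key]) + rng.gauss(0.0, 1.0) for p in plaintexts]
recovered_key, scores = cpa(SBOX, plaintexts, traces)
```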

Experiments of the unprotected S-Boxes
We first perform simulated and practical attacks against the nine unprotected 4 × 4 S-Boxes and compare their CPA resistance.

Simulated experiments
Experimental setup We implement the S-Boxes in the same way by using look-up tables, and leakages are simulated under the HW model with additive noise, where $F(p \oplus k^*)$ denotes the sensitive variable, and ω denotes a Gaussian random variable centered at zero with a standard deviation σ. In the experimental setup, the value of σ varies in the set $\{2^{-1}, 2^{-1/2}, 1, 2^{1/2}, 2, 2^{3/2}, 4, 2^{5/2}\}$.
Experimental results In the field of side-channel analysis, success rate (Standaert et al. 2005) is a common metric to evaluate an attack. Here, for each attack, we evaluate the minimum number of traces N required to achieve an attack success rate of 90%, as it is a sound way to evaluate the efficiency of a side-channel attack (Mangard 2004). The attack results are shown in Fig. 1a.
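The evaluation procedure described here (simulate noisy HW leakage, run CPA, and look for the number of traces that reaches a 90% success rate) can be sketched as follows. The sketch assumes a leakage of the form HW plus Gaussian noise, a random key per repetition, a coarse scan over a hypothetical list of candidate trace counts instead of an exact search for the minimum N, and the PRESENT S-Box as sample data.

```python
import random
from typing import List, Optional

# Illustrative 4-bit S-Box (the PRESENT S-Box), used here only as an example.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
HW = [bin(v).count("1") for v in range(16)]


def cpa_best_key(plaintexts: List[int], traces: List[float]) -> int:
    """Key hypothesis whose HW prediction correlates best with the traces."""
    n = len(traces)
    mt = sum(traces) / n
    ct = [t - mt for t in traces]
    st = sum(c * c for c in ct)
    best_key, best_rho = 0, float("-inf")
    for k in range(16):
        pred = [HW[SBOX[p ^ k]] for p in plaintexts]
        mp = sum(pred) / n
        num = sum((a - mp) * c for a, c in zip(pred, ct))
        den = (sum((a - mp) ** 2 for a in pred) * st) ** 0.5
        rho = num / den if den > 0 else 0.0
        if rho > best_rho:
            best_key, best_rho = k, rho
    return best_key


def success_rate(n_traces: int, sigma: float, repeats: int,
                 rng: random.Random) -> float:
    """Fraction of independent simulated attacks that rank the true key first."""
    wins = 0
    for _ in range(repeats):
        key = rng.randrange(16)
        pts = [rng.randrange(16) for _ in range(n_traces)]
        trs = [HW[SBOX[p ^ key]] + rng.gauss(0.0, sigma) for p in pts]
        if cpa_best_key(pts, trs) == key:
            wins += 1
    return wins / repeats


def min_traces(sigma: float, candidates: List[int], repeats: int = 100,
               target: float = 0.9, seed: int = 0) -> Optional[int]:
    """First candidate trace count whose estimated success rate reaches target."""
    rng = random.Random(seed)
    for n in candidates:
        if success_rate(n, sigma, repeats, rng) >= target:
            return n
    return None


n_needed = min_traces(0.5, [5, 20, 80, 320], repeats=20)
```

A finer candidate grid and more repetitions give a smoother estimate at the cost of run time; the values used in the usage line are deliberately small.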
It can be observed that when the noise is low, the numbers of traces required for successful attacks on different S-Boxes are very close. As the noise increases, the difference between S-Boxes becomes more significant. However, the order of the S-Boxes' resistance against CPA attacks is basically the same under different noise levels. We therefore mainly take the result with a noise variance of $2^5$ as an example for ease of observation.
According to our experimental results, S-Boxes with lower $\mathrm{VTO}_0$ and higher CCV values are more resistant against CPA. For example, the S-Boxes of PHOTON and GIFT are more resilient than the S-Boxes of SKINNY-64 and Spook. However, the difficulty of attacking an S-Box is quite different from the outcome of the MCC metric. For example, the MCC value of Elephant's S-Box is higher than that of Spook's S-Box, while the number of required traces for the former is approximately 1.5 times that for the latter.
One may also note that there is sometimes a discordance between $\mathrm{VTO}_0$ (CCV) and the simulation results. For example, the $\mathrm{VTO}_0$ (CCV) value of Elephant's S-Box is higher (lower) than that of PHOTON's S-Box, while Elephant's S-Box is more resilient than PHOTON's S-Box. As for VTO, the reason for this phenomenon is explained in Li et al. (2020): it is due to the different perspectives of VTO and the success rate metric when quantifying the SCA resistance of S-Boxes. In detail, the basic idea of VTO is to quantify the difference between the score for the correct key and the average score for the other hypotheses, whereas the success rate metric quantifies the number of successful attacks (i.e., the number of attacks in which the correct key is ranked first) among all attacks performed. As for CCV, we argue that it takes into account the distinctiveness level of the S-Box outputs for all key hypothesis pairs, which is also different from the basic idea of the success rate. Besides, the number of traces used for attacks is limited, whereas the notions of VTO and CCV assume that the number of traces is sufficient for the noise to be omitted.
Overall, when the difference of the $\mathrm{VTO}_0$ (CCV) values of the two S-Boxes is relatively large, the S-Box with a lower $\mathrm{VTO}_0$ (higher CCV) value is generally more resistant to CPA attacks. However, when the difference of the $\mathrm{VTO}_0$ (CCV) values of the two S-Boxes is relatively small, these two metrics lack the accuracy to evaluate the resiliency of S-Boxes. Besides, MCC fails to work in our experiments.

Practical experiments
Experimental setup In practical experiments, all nine S-Boxes are implemented on a CW308-STM32F Target Board (for the ChipWhisperer CW308 UFO Board) with the STM32F405RGT6 Arm 32-bit Cortex-M4 device, and the power traces are captured through the ChipWhisperer-Lite Capture Board (O'Flynn and Chen 2014). The sampling rate is set to 29.5 MHz, and the 500 points around the sensitive operations are used for the attack. As in the simulated experiments, the S-Boxes are implemented by using look-up tables, and the register initial state β is set to 0. To study the performance of the three metrics at different noise levels, the attacks are performed on the raw traces and on traces with added Gaussian noise. Before adding noise, we standardize the traces (zero mean and unit variance). The values of σ are the same as in the simulated experiments.
Experimental results The attack results are shown in Fig. 1b. The −∞ on the x axis indicates that the attack is performed on the raw traces with no additional noise. It can be observed that for most S-Box examples, the results obtained are consistent with the simulated results. However, in certain cases, the results are slightly inconsistent with the simulation results. We infer that this is because the leakages in the real environment do not fully satisfy the HW leakage model and the noise does not fulfill the Gaussian noise assumption.

Experiments of the masked S-Boxes
Masking, due to its provable security and good device independence, has been one of the most widely adopted countermeasures against SCAs (Duc et al. 2019). Naturally, the effectiveness of the three metrics is an important question when masking is adopted. Based on the work in Rivain et al. (2009), the CPA results for dth-order masked S-Boxes of the same size are only related to the masking order d under the HW model for Boolean masking schemes, and the function of the S-Box does not affect the security gain from unprotected S-Boxes to dth-order masked S-Boxes. Thus, the three metrics ($\mathrm{VTO}_0$, CCV, and MCC) should be independent of the masking order for Boolean masking when higher-order CPA attacks are utilized. We first try to verify this by simulated experiments.

Simulated experiments
Experimental setup As for the simulation of masking, we separately simulate first- and second-order masked S-Boxes. So two and three points corresponding to their shares $y_i$ are simulated, and we have

$$y = y_0 \oplus y_1 \oplus \cdots \oplus y_d, \quad (4)$$

where y denotes the output of the S-Box while $p \oplus k^*$ is the input. Each $y_i$ (i > 0) is generated randomly, and $y_0$ is processed such that Eq. (4) is satisfied. Each share of y is under the HW model, and the value of the initial state β is set to zero. So each leakage point corresponding to $y_i$ can be simulated as $L_{y_i} = \mathrm{zscore}(H(y_i)) + \omega_i$, where $\omega_i$ denotes the Gaussian noise centered at zero with a standard deviation σ at this moment. In the first-order masking experiments, the value of σ varies in the set $2^{-3/4}, 2^{-1/2}, 2^{-1/4}, 1, 2$ [...] One can note that the results of masked S-Boxes are basically consistent with those of unprotected S-Boxes, especially in the case of low noise. The S-Box of Elephant is the most resistant against CPA attacks, and the S-Boxes of Spook and SKINNY-64 are the weakest. In addition, as the noise increases, the order of the S-Boxes' resistance against CPA attacks fluctuates slightly in the experimental results. We argue that this is because the increased noise makes the evaluation results unstable.
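The masked leakage simulation described above (random shares satisfying Eq. (4), one standardized HW leakage point per share, plus Gaussian noise) might look like the following Python sketch. The function names, the empirical z-score computed over the simulated sample, and the PRESENT S-Box as sample data are assumptions for illustration.

```python
import random
from functools import reduce
from statistics import mean, pstdev
from typing import List, Tuple

# Illustrative 4-bit S-Box (the PRESENT S-Box), used here only as an example.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(v: int) -> int:
    return bin(v).count("1")


def share_output(y: int, d: int, rng: random.Random, bits: int = 4) -> List[int]:
    """Split y into d+1 Boolean shares with y = y_0 ^ y_1 ^ ... ^ y_d."""
    randoms = [rng.randrange(1 << bits) for _ in range(d)]   # y_1 .. y_d
    y0 = reduce(lambda a, b: a ^ b, randoms, y)              # forces Eq. (4)
    return [y0] + randoms


def zscore(values: List[float]) -> List[float]:
    """Standardize to zero mean and unit variance (empirical, population std)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def simulate_masked_traces(sbox: List[int], key: int, n_traces: int, d: int,
                           sigma: float, seed: int = 0
                           ) -> Tuple[List[int], List[List[int]], List[List[float]]]:
    """Return plaintexts, share tuples, and traces with one sample per share."""
    rng = random.Random(seed)
    plaintexts = [rng.randrange(len(sbox)) for _ in range(n_traces)]
    shared = [share_output(sbox[p ^ key], d, rng) for p in plaintexts]
    columns = []
    for i in range(d + 1):
        z = zscore([hw(s[i]) for s in shared])
        columns.append([v + rng.gauss(0.0, sigma) for v in z])
    traces = [list(row) for row in zip(*columns)]
    return plaintexts, shared, traces


pts, shared, traces = simulate_masked_traces(SBOX, key=0x6, n_traces=200,
                                             d=2, sigma=1.0)
```

The sketch stops at trace generation; the higher-order CPA step that combines the share samples is not shown, since the combining function is not specified in this section.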

Practical experiments
Experimental setup For the first- and second-order masking cases, the masking schemes proposed in Benadjila et al. (2020) and Valiveti and Vivek (2020) are adopted and implemented as our attack targets, respectively.
Experimental results The attack results are shown in Figs. 2b and 3b. It can be observed that for most S-Box examples, we obtain similar results. Namely, those S-Boxes with lower $\mathrm{VTO}_0$ (higher CCV) values still have higher CPA resistance in real environments.

Profiled side-channel attacks
In this section, we further investigate the resistance of different S-Boxes against profiled side-channel attacks and check whether the three metrics are applicable to the profiled attack scenario.
Profiled side-channel attacks consist of two phases: the offline profiling phase and the online attack phase. The attacker is assumed to have an open copy of the target device to learn the leakage distribution and to perform attacks with the learned models. In the profiling phase, the attacker has a device with knowledge of the implemented secret key and acquires a set of N side-channel traces $L_{\text{profiling}} = \{l_j \mid j = 1, 2, \ldots, N\}$. Each trace $l_j$ corresponds to the sensitive variable $y_j = f(p_j, k)$ in one encryption (or decryption) with known key $k \in K$ and plaintext (or ciphertext) $p_j$. Once the acquisition is done, the attacker builds suitable models and computes the estimation of the probability

$$\Pr[L = l \mid Y = y] \quad (5)$$

from a profiling set $\{(l_j, y_j)\}_{j=1}^{N}$. Then, in the attack phase, the attacker attempts to recover the unknown key in the target device with the help of the profiled leakage details.
Specifically, we launch template attacks and deep learning based profiled attacks by simulated and practical experiments.

Template attacks on the nine 4 × 4 S-Boxes
Fig. 2 CPA attacks on first-order masked 4 × 4 S-Boxes

Among profiled attacks, the template attack (TA) (Chari et al. 2002) and its modified version, the efficient template attack (ETA) (Choudary and Kuhn 2013), are the most popular and widely used approaches. In TA, the attacker assumes that L | Y has a multivariate Gaussian distribution, and estimates the mean vector $\mu_y$ and the covariance matrix $\Sigma_y$ for each $y \in Y$ (i.e., the so-called templates). In this way, Eq. (5) is approximated by the Gaussian probability distribution function with parameters $\mu_y$ and $\Sigma_y$. In ETA, the attacker replaces the covariance matrices with one pooled covariance matrix to cope with some statistical difficulties (Choudary and Kuhn 2013). In this paper, ETA is adopted to evaluate the resistance of the S-Boxes. In the attack phase, the attacker acquires a small new set of traces $L_{\text{attack}} = \{l_j \mid j = 1, 2, \ldots, Q\}$ with a fixed unknown key $k^*$. With the knowledge of the established models, the estimated posterior probabilities can be calculated via Bayes' theorem. Then the attacker can select the key that maximizes the probability following the Maximum Likelihood strategy, as in Eq. (6). Equation (6) holds only when acquisitions are independent, which is a practical condition in reality. Notice that the attacker can launch a high-order template attack if the leakages exist in high-order moments of sample points, such as when defeating masking countermeasures. Similar to the previous section, we also study the resistance of S-Boxes in unprotected, first- and second-order masking cases, respectively.
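The template construction and maximum-likelihood key selection described above can be sketched in Python as follows. This is a simplified illustration under several assumptions: 16 templates indexed by the S-Box output value, two simulated points of interest leaking the HW with hypothetical gains 1.0 and 0.5, a pooled covariance estimated with N minus the number of classes as the divisor, uniform priors (so posteriors reduce to likelihoods), and the PRESENT S-Box as sample data.

```python
import random
from typing import List, Tuple

# Illustrative 4-bit S-Box (the PRESENT S-Box), used here only as an example.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
HW = [bin(v).count("1") for v in range(16)]
Vec = List[float]


def solve(a: List[Vec], b: Vec) -> Vec:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def profile(traces: List[Vec], labels: List[int],
            n_classes: int = 16) -> Tuple[List[Vec], List[Vec]]:
    """Per-class mean vectors and a single pooled covariance matrix."""
    dim = len(traces[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for t, y in zip(traces, labels):
        counts[y] += 1
        sums[y] = [s + v for s, v in zip(sums[y], t)]
    means = [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]
    pooled = [[0.0] * dim for _ in range(dim)]
    for t, y in zip(traces, labels):
        d = [v - mu for v, mu in zip(t, means[y])]
        for i in range(dim):
            for j in range(dim):
                pooled[i][j] += d[i] * d[j]
    dof = len(traces) - n_classes
    return means, [[v / dof for v in row] for row in pooled]


def log_likelihoods(trace: Vec, means: List[Vec], pooled: List[Vec]) -> Vec:
    """Gaussian log-likelihood per class, up to a constant shared by all classes."""
    out = []
    for mu in means:
        d = [x - m for x, m in zip(trace, mu)]
        w = solve(pooled, d)                       # pooled^{-1} (l - mu)
        out.append(-0.5 * sum(di * wi for di, wi in zip(d, w)))
    return out


def eta_attack(plaintexts: List[int], traces: List[Vec],
               means: List[Vec], pooled: List[Vec]) -> int:
    """Sum log-likelihoods over independent traces and return the best key."""
    scores = [0.0] * 16
    for p, t in zip(plaintexts, traces):
        ll = log_likelihoods(t, means, pooled)
        for k in range(16):
            scores[k] += ll[SBOX[p ^ k]]
    return max(range(16), key=scores.__getitem__)


def simulate(key: int, n: int, sigma: float, rng: random.Random):
    """Two simulated points of interest leaking the HW with different gains."""
    pts = [rng.randrange(16) for _ in range(n)]
    trs = [[HW[SBOX[p ^ key]] * g + rng.gauss(0.0, sigma) for g in (1.0, 0.5)]
           for p in pts]
    return pts, trs


rng = random.Random(1)
prof_key, attack_key = 0x3, 0xB
prof_pts, prof_trs = simulate(prof_key, 3000, 0.5, rng)
means, pooled = profile(prof_trs, [SBOX[p ^ prof_key] for p in prof_pts])
att_pts, att_trs = simulate(attack_key, 50, 0.5, rng)
recovered = eta_attack(att_pts, att_trs, means, pooled)
```

Because the covariance is pooled, the Gaussian normalization constant is identical for every class and can be dropped from the score, which is why `log_likelihoods` returns only the quadratic term.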

Experiments of the Unprotected S-Boxes
Experimental setup We perform both simulated and practical attacks to compare different S-Boxes. As for simulated experiments, the leakages are simulated in the same way as in the non-profiled scenario. In detail, we generate 3 points of interest (PoIs) corresponding to the output of the S-Boxes. As for practical experiments, the experimental setup is exactly the same as that in the previous section, and we pre-select the 3 PoIs with the highest Pearson correlation coefficient. We profile 16 efficient templates using 10,000 traces for each S-Box. Attacks are performed at almost no leakage noise, low leakage noise and high leakage noise levels (σ = 0.1, σ = 1 and σ = 2), respectively. For each S-Box, we run ETA attacks 100 times with randomly selected sub-samples of the attack set for evaluation and record the minimum number of traces N required to achieve an attack success rate of 90%.
Experimental results The experimental results are shown in Fig. 14 of the Appendix. We can observe that the resistance of different unprotected S-Boxes against ETA attacks is very close, even under the high noise condition. We believe the main reason is that the efficient templates characterize the leakages well in both simulated and practical experiments. Therefore, we further investigate the resistance of different S-Boxes in first- and second-order masking cases.

Experiments of the masked S-Boxes
In the profiling phase, we first profile 16 efficient templates using 10,000 traces for each share. Next, in the attack phase, we match the leakages to the profiled templates, which are denoted as $M_i$ with $i \in \{0, 1, \ldots, d\}$. Then we get the probability $P(Y_j^i = y_j^i \mid l_j^i, M_i)$ using the efficient templates for each trace, where $y_j^i$ denotes the i-th share of the output of the S-Box corresponding to the j-th trace, and $l_j^i$ denotes the leakage for the i-th share of the j-th trace. The probability of $y_j$ can be expressed as:

$$P(y_j \mid l_j) = \sum_{(y_j^0, \ldots, y_j^d) \in S} \prod_{i=0}^{d} P(Y_j^i = y_j^i \mid l_j^i, M_i),$$

where S is the set $\{(y_j^0, \ldots, y_j^d) \mid y_j = y_j^0 \oplus \cdots \oplus y_j^d\}$, and $l_j$ denotes the leakages of all shares of the j-th trace. With the information of the inverse mapping and the plaintext, $P(y_j)$ can be mapped to $P_j(k)$. Adding up the $P_j(k)$ of all the attack traces gives P(k), and the key hypothesis corresponding to the maximum value of P(k) is the revealed key.
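The share-combination step described above amounts to an XOR-convolution of the per-share probability vectors, followed by a mapping from S-Box outputs to key hypotheses. A minimal Python sketch, assuming each share's probabilities are already available as a length-16 list (here filled with hypothetical values) and using the PRESENT S-Box as sample data:

```python
from functools import reduce
from typing import List

# Illustrative 4-bit S-Box (the PRESENT S-Box), used here only as an example.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def xor_convolve(pa: List[float], pb: List[float]) -> List[float]:
    """Distribution of a ^ b for independent a ~ pa and b ~ pb."""
    out = [0.0] * len(pa)
    for a, x in enumerate(pa):
        for b, y in enumerate(pb):
            out[a ^ b] += x * y
    return out


def combine_shares(share_probs: List[List[float]]) -> List[float]:
    """P(y): sum over share tuples XORing to y of the product of share probabilities."""
    return reduce(xor_convolve, share_probs)


def key_scores(plaintexts: List[int],
               per_trace_share_probs: List[List[List[float]]],
               sbox: List[int]) -> List[float]:
    """Add up P_j(k) = P(y_j = S(p_j ^ k)) over all attack traces."""
    scores = [0.0] * len(sbox)
    for p, share_probs in zip(plaintexts, per_trace_share_probs):
        py = combine_shares(share_probs)
        for k in range(len(sbox)):
            scores[k] += py[sbox[p ^ k]]
    return scores


def delta(value: int, size: int = 16) -> List[float]:
    """Hypothetical 'perfect template' output: all mass on one share value."""
    return [1.0 if v == value else 0.0 for v in range(size)]


# Toy usage: plaintext 3, key 7, so y = SBOX[3 ^ 7] = 9, shared as 5 ^ 12.
scores = key_scores([3], [[delta(5), delta(12)]], SBOX)
best_key = max(range(16), key=scores.__getitem__)
```

The sketch follows the text in summing probabilities across traces; summing log-probabilities instead would correspond to a product over traces, which is a different (also common) scoring rule.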
Experimental setup As for simulated experiments, we generate 3 PoIs corresponding to each share of the output of S-Boxes. As for practical experiments, we also preselect 3 PoIs for each share to construct templates and perform attacks. The remaining experimental settings are the same as those in the previous experiments.
Experimental results The attack results of the first- and second-order masking cases with different noise levels are shown in Figs. 4 and 5, respectively. As for the second-order masking case, the increase of noise seriously affects the stability of the attack results and the accuracy of the evaluation, so we only show the experimental results for σ = 0.1 and σ = 1. It can be observed that in both first- and second-order masking implementations, when the noise level is very low, the resistance of different S-Boxes against ETA attacks is still very close. We therefore think that in very low-noise scenarios, it does not seem necessary to consider how to select optimal 4 × 4 S-Boxes against ETA attacks.
As the noise increases, the difference between S-Boxes becomes slightly more significant. However, the practical results are not consistent with the simulation results. We infer that the main reasons are that the leakages in the real environment do not fully satisfy the HW leakage model and that the noise does not fulfill the Gaussian noise assumption. Moreover, as the noise increases, the accuracy of the constructed templates is seriously affected. In addition, neither the simulated results nor the practical results are consistent with the results of any of the three metrics. We argue that this is because the characterization of the noise, rather than the intrinsic properties of S-Boxes, is the dominant factor affecting the effectiveness of the attacks. Therefore, these metrics may not be suitable for evaluating the resistance of S-Boxes against template attacks.
In addition, we find that the difference between S-Boxes against ETA is far smaller than that against CPA attacks, and the experimental results of ETA are not consistent with those of CPA attacks. For example, the S-Box of Elephant is the most resistant to CPA attacks, but clearly not the most resistant to ETA attacks, and none of the 4-bit S-Boxes is significantly more resistant than the others. We also perform attacks that target the HW of the outputs of the S-Boxes (profiling 5 efficient templates), again with no clear pattern that could be observed. A possible reason is that the intrinsic properties of the S-Boxes we analyzed are relatively close to each other. In any case, when selecting optimal S-Boxes, it is necessary to comprehensively consider the resistance of S-Boxes against different types of attacks. It is not sufficient to consider only transparency orders or confusion coefficients.

Deep learning based profiled attacks
Recently, deep learning techniques have gained substantial interest in the side-channel analysis community. Previous research has shown that deep learning based attacks are a very efficient alternative to the state-of-the-art profiled attacks, and can even outperform the traditional profiled attacks (Maghrebi et al. 2016; Cagli et al. 2017). We explore the resistance of the nine 4 × 4 S-Boxes against such attacks, and whether the three metrics are effective when measuring the resistance against deep learning based attacks. According to the work in Wouters et al. (2020), when the traces are synchronized, Multi Layer Perceptron (MLP) models are as effective as Convolutional Neural Network (CNN) models. Since we only consider aligned traces in this work, the attacks are performed with MLP networks.
In this subsection, all experiments are conducted on an Intel(R) Xeon(R) CPU E5-2667 v4 @3.20 GHz 32 core machine with two NVIDIA TITAN Xp GPUs. We design our MLP models following the recent work of Wouters et al. (2020). For the unprotected and first-order masking cases, the MLP is composed of one hidden layer with 10 neurons. For the second-order masking case, the MLP is composed of two hidden layers with 10 neurons each. Each layer is activated by the ReLU function, and He Uniform initialization is used to improve the weight initialization. The output layer contains 16 neurons activated by the softmax function. Cross-entropy is used as the loss function. As a remark, the network architectures used in this subsection are surely not optimal, as our goal is not to select the optimal parameters.
For the training of the MLP networks, the mini-batch size is 128 and the maximum number of epochs is 100. The network kernel weights are recorded for the best validation loss. Once the training is done, we reconstruct the neural network with the best recorded weights. The learning rate is initially 0.005, and a technique called One Cycle Policy (Smith 2017) is used to choose the right learning rate.
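To make the network shape concrete, the following dependency-free Python sketch builds the layer structure described above (10 inputs, one or two hidden layers of 10 ReLU neurons, 16 softmax outputs) and evaluates the forward pass and cross-entropy loss. It is only a structural illustration: the function names are hypothetical, He Uniform is taken to mean weights drawn uniformly from ±sqrt(6 / fan_in), applying it to the output layer as well is an assumption, and training (mini-batches, One Cycle Policy, checkpointing) is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]   # (weights[out][in], biases[out])


def he_uniform(fan_in: int, fan_out: int, rng: random.Random) -> List[List[float]]:
    """Weights drawn uniformly from [-limit, limit] with limit = sqrt(6 / fan_in)."""
    limit = math.sqrt(6.0 / fan_in)
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


def build_mlp(n_inputs: int, hidden: Sequence[int], n_classes: int = 16,
              seed: int = 0) -> List[Layer]:
    """One (weights, biases) pair per layer; biases start at zero."""
    rng = random.Random(seed)
    sizes = [n_inputs] + list(hidden) + [n_classes]
    return [(he_uniform(a, b, rng), [0.0] * b) for a, b in zip(sizes, sizes[1:])]


def forward(model: List[Layer], x: List[float]) -> List[float]:
    """ReLU on hidden layers, softmax on the output layer."""
    for idx, (w, b) in enumerate(model):
        z = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if idx < len(model) - 1:
            x = [max(0.0, v) for v in z]
        else:
            m = max(z)                               # shift for numerical stability
            e = [math.exp(v - m) for v in z]
            total = sum(e)
            x = [v / total for v in e]
    return x


def cross_entropy(probs: List[float], label: int) -> float:
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


mlp_unprotected = build_mlp(10, [10])        # unprotected and first-order cases
mlp_second_order = build_mlp(10, [10, 10])   # second-order case
probs = forward(mlp_second_order, [0.5] * 10)
loss = cross_entropy(probs, 3)
```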

Experiments of the Unprotected S-Boxes
Experimental setup As for simulated experiments, we generate 10 sample points for each trace, of which the first three points are PoIs corresponding to the output of S-Boxes and the rest are randomly generated in [0,4]. As for practical experiments, 10 samples that contain information on the output of S-Boxes are captured for each trace. There are 10,000 traces for profiling and 5,000 traces for the attack. In the profiling traces, 90% are used for training and 10% are used for validation. We run each attack 100 times with randomly selected sub-samples of attack sets and record the minimum number of traces required to achieve an attack success rate of 90%. Since the training of the neural network might be unstable, we repeat the experiments 10 times and take the average results.
Experimental results The experimental results are shown in Fig. 15 of the Appendix. Similar to the results of ETA attacks, the resistance of different unprotected S-Boxes against deep learning based attacks is still very close, even under the high noise condition. Next, we further investigate the resistance of different S-Boxes in first- and second-order masking cases.

Experiments of the masked S-Boxes
Experimental setup Both the simulated and practical traces consist of 10 sample points. As for simulated experiments, we generate 3 PoIs corresponding to each share of the output of S-Boxes, and the rest are randomly generated in [0,4]. As for practical experiments, 10 samples that contain information on each share of the output of S-Boxes are captured. For the first-order masking case, there are 10,000 traces for profiling and 10,000 traces for the attack. For the second-order masking case, there are 30,000 traces for profiling and 20,000 traces for the attack.
Experimental results The results of first- and second-order masking cases are shown in Figs. 6 and 7, respectively. Similar to the results of ETA attacks, when the noise level is very low, the resistance of different S-Boxes against deep learning based attacks is still very close in both first- and second-order masking cases. As the noise increases, the difference between S-Boxes becomes more obvious. However, we still cannot find patterns in the experimental results. On the one hand, the practical results are not consistent with the simulation results. In addition to the reasons mentioned above, the instability of the network training may also contribute to this phenomenon. On the other hand, neither the simulated results nor the practical results are consistent with the results of the three metrics. Namely, none of the three metrics is suitable for evaluating the resistance of S-Boxes against deep learning based attacks. Therefore, quantifying the resistance of S-Boxes against deep learning based attacks remains an open problem.
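A minimal sketch of how such masked traces could be simulated is given below. It assumes Boolean masking, Hamming-weight leakage of each share, and three PoIs per share laid out consecutively (6 PoIs for first order, 9 for second order, which fits in the 10 samples); the text does not state the exact layout, so this is one possible reading. The PRESENT S-Box is again only an illustrative stand-in, and `simulate_masked_traces` is a hypothetical name.

```python
import numpy as np

# Illustrative 4-bit S-Box (PRESENT), used here only as a stand-in.
SBOX = np.array([0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2])
HW = np.array([bin(v).count("1") for v in range(16)])

def simulate_masked_traces(n_traces, key, order, sigma, rng,
                           n_samples=10, poi_per_share=3):
    plaintexts = rng.integers(0, 16, size=n_traces)
    labels = SBOX[plaintexts ^ key]
    # Boolean masking (assumption): `order` uniform 4-bit masks per trace.
    masks = rng.integers(0, 16, size=(n_traces, order))
    masked = labels ^ np.bitwise_xor.reduce(masks, axis=1)
    shares = np.column_stack([masked, masks])  # order + 1 shares
    # Non-PoI samples: uniform in [0, 4].
    traces = rng.uniform(0.0, 4.0, size=(n_traces, n_samples))
    # Assumed layout: poi_per_share consecutive PoIs for each share.
    for j in range(order + 1):
        lo, hi = j * poi_per_share, (j + 1) * poi_per_share
        traces[:, lo:hi] = HW[shares[:, j]][:, None]
    traces += rng.normal(0.0, sigma, size=traces.shape)
    return plaintexts, labels, shares, traces

rng = np.random.default_rng(0)
key = 0x7  # hypothetical fixed key nibble
p1, y1, s1, t1 = simulate_masked_traces(10_000, key, order=1, sigma=1.0, rng=rng)
p2, y2, s2, t2 = simulate_masked_traces(30_000, key, order=2, sigma=1.0, rng=rng)
```

The labels passed to the network remain the unmasked S-Box outputs, so the model has to combine the leakage of all shares on its own.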

4 × 4 S-Boxes versus 8 × 8 S-Boxes
In this section, taking several 4 × 4 S-Boxes and 8 × 8 S-Boxes as examples, we verify whether VTO, CCV and MCC can be applied to compare the resistance of S-Boxes with different sizes through simulated and practical experiments.

Non-profiled side-channel attacks
From the perspective of theoretical analysis, among nine 4 × 4 S-Boxes, the S-Box of PHOTON is the hardest to attack, and the S-Box of Spook is one of the easiest to attack. In addition, according to the experimental results, the S-Box of Elephant is the most resistant against CPA attacks, and the S-Box of Spook is one of the easiest to attack. Considering the above factors, we select the S-Boxes of PHOTON, Elephant, and Spook as the representatives of the 4 × 4 S-Boxes to compare with the 8 × 8 S-Boxes of SKINNY-128 and AES.
Experimental setup We study the resistance of S-Boxes in unprotected, first- and second-order masking cases, respectively. The simulated and practical experiments are performed with different noise levels. Due to the simulated traces and practical traces are standardized (zero mean and unit variance) before Gaussian noise
Experimental results The results of simulated and practical experiments are shown in Figs. 8 and 9, respectively. In the unprotected case, we can observe that the S-Boxes of SKINNY-128 and AES perform worse than that of Elephant, similar to that of PHOTON, and better than that of Spook. Therefore, carefully selected 4 × 4 S-Boxes can be even more resistant against CPA attacks than certain 8 × 8 S-Boxes. However, according to the values of the theoretical metrics, the two 8 × 8 S-Boxes lead to higher values of VTO 0 and MCC than the 4 × 4 S-Boxes, which implies that 8 × 8 S-Boxes are more vulnerable to attacks. In terms of CCV, the resistance of the S-Box of SKINNY-128 should be worse than that of PHOTON and Elephant, and slightly better than that of Spook. As for the S-Box of AES, it should be the easiest to attack among all the S-Boxes. The inconsistency between theoretical analysis and practical results indicates that none of the three metrics can be used to quantify and compare S-Boxes with different sizes.
As for first- and second-order masking cases, the two 8 × 8 S-Boxes perform much better than the 4 × 4 S-Boxes. The main reason is that the 8-bit masks provide much better randomization than the 4-bit masks. Of course, the larger size of S-Boxes also leads to higher implementation costs. This is a trade-off between security and cost, which is outside the scope of this work.
In addition, for the two 8 × 8 S-Boxes we evaluated, the S-Box of SKINNY-128 always performs better than that of AES. However, the results in Heuser et al. (2016) show that the 4 × 4 S-Boxes they studied have a different side-channel resiliency, while the difference in the 8 × 8 S-Boxes is only theoretically present. We argue that a good selection of 8 × 8 S-Boxes could also result in an improvement in inherent resilience.
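For readers less familiar with the non-profiled setting, the following is a minimal sketch of a CPA attack on simulated unprotected traces, together with an empirical success-rate estimate over random sub-samples of the attack set. It assumes a Hamming-weight leakage model, three PoIs per trace, and uses the PRESENT S-Box as an illustrative 4-bit stand-in; the key value, the noise level and the helper names `simulate`, `cpa_scores` and `success_rate` are hypothetical.

```python
import numpy as np

# Illustrative 4-bit S-Box (PRESENT), used here only as a stand-in.
SBOX = np.array([0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2])
HW = np.array([bin(v).count("1") for v in range(16)])

def simulate(n, key, sigma, rng, n_samples=10, n_poi=3):
    # HW leakage on the first n_poi samples, uniform [0, 4] elsewhere.
    p = rng.integers(0, 16, size=n)
    t = rng.uniform(0.0, 4.0, size=(n, n_samples))
    t[:, :n_poi] = HW[SBOX[p ^ key]][:, None]
    return p, t + rng.normal(0.0, sigma, size=t.shape)

def cpa_scores(plaintexts, traces):
    # For each key guess, the largest |Pearson correlation| over all samples
    # between the hypothetical leakage HW(S(p XOR k)) and the traces.
    t = traces - traces.mean(axis=0)
    t_norm = np.sqrt((t ** 2).sum(axis=0))
    scores = np.zeros(16)
    for k in range(16):
        h = HW[SBOX[plaintexts ^ k]].astype(float)
        h -= h.mean()
        corr = (h @ t) / (np.sqrt((h ** 2).sum()) * t_norm)
        scores[k] = np.abs(corr).max()
    return scores

def success_rate(plaintexts, traces, key, n_attack, repeats, rng):
    # Fraction of random sub-samples for which the best guess is the key.
    hits = 0
    for _ in range(repeats):
        idx = rng.choice(len(traces), size=n_attack, replace=False)
        hits += int(np.argmax(cpa_scores(plaintexts[idx], traces[idx])) == key)
    return hits / repeats

rng = np.random.default_rng(0)
key = 0xA  # hypothetical key nibble
p, t = simulate(5_000, key, sigma=0.5, rng=rng)
best_guess = int(np.argmax(cpa_scores(p[:2000], t[:2000])))
sr = success_rate(p, t, key, n_attack=1000, repeats=20, rng=rng)
```

Sweeping `n_attack` upward until `sr` first reaches 0.9 gives the kind of "minimum number of traces" figure that is used to compare S-Boxes.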

Profiled side-channel attacks
In this section, we compare the resistance of 4-bit and 8-bit S-Boxes against profiled side-channel attacks. The S-Boxes used are the same as above.

Template attacks
Experimental setup We study the resistance of S-Boxes in unprotected and first-order masking cases. The simulated and practical experiments are performed with different noise levels. We profile 16 efficient templates using 10,000 traces for each 4 × 4 S-Box, and profile 256 efficient templates using 160,000 traces for each 8 × 8 S-Box. Therefore, the number of profiling traces for each class of 4 × 4 S-Boxes and 8 × 8 S-Boxes is roughly the same (about 625 traces per class on average).
Experimental results The results of the unprotected and first-order cases are shown in Figs. 10 and 11, respectively. In the unprotected case, we can observe that the resistance of 8-bit S-Boxes and that of 4-bit S-Boxes are quite close. The main reason is that the efficient templates characterize the leakages well. As for the first-order case, it is obvious that the two 8-bit S-Boxes are more resistant against ETA attacks than the 4-bit S-Boxes. This seems natural, since 8-bit S-Boxes have a significantly larger number of classes than 4-bit S-Boxes. In addition, in practical experiments, the difference between the 4-bit and 8-bit S-Boxes is larger than that in the simulated experiments. We infer that the main reasons are that the leakages in the real environment do not fully satisfy the HW leakage model and that the noise does not fulfill the Gaussian noise assumption. Because the traces of the 8-bit S-Box are divided into 256 classes, the constructed templates require higher precision, and the accuracy therefore decreases faster.
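As a rough illustration of the profiled setting for an unprotected 4 × 4 S-Box, the sketch below builds one Gaussian template per S-Box output value with a single pooled covariance matrix and ranks key guesses by summed log-likelihood. Treating "efficient templates" as pooled-covariance templates is an assumption here, as are the simulated Hamming-weight leakage, the PRESENT S-Box stand-in, the key value and the helper names.

```python
import numpy as np

# Illustrative 4-bit S-Box (PRESENT), used here only as a stand-in.
SBOX = np.array([0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2])
HW = np.array([bin(v).count("1") for v in range(16)])

def simulate(n, key, sigma, rng, n_samples=10, n_poi=3):
    # HW leakage on the first n_poi samples, uniform [0, 4] elsewhere.
    p = rng.integers(0, 16, size=n)
    t = rng.uniform(0.0, 4.0, size=(n, n_samples))
    t[:, :n_poi] = HW[SBOX[p ^ key]][:, None]
    return p, t + rng.normal(0.0, sigma, size=t.shape)

def build_templates(traces, labels, n_classes=16):
    # One mean vector per class, one covariance pooled over all classes.
    means = np.stack([traces[labels == c].mean(axis=0)
                      for c in range(n_classes)])
    resid = traces - means[labels]
    pooled_cov = resid.T @ resid / (len(traces) - n_classes)
    return means, np.linalg.inv(pooled_cov)

def template_attack(means, inv_cov, plaintexts, traces):
    # With a shared covariance, the log-likelihood of a key guess reduces
    # (up to a constant) to minus half the summed Mahalanobis distances.
    scores = np.zeros(16)
    for k in range(16):
        d = traces - means[SBOX[plaintexts ^ k]]
        scores[k] = -0.5 * np.einsum("ij,jk,ik->", d, inv_cov, d)
    return int(np.argmax(scores)), scores

# Average profiling traces per class in the two settings of the text.
per_class_4bit = 10_000 // 16     # 625
per_class_8bit = 160_000 // 256   # 625

rng = np.random.default_rng(0)
key = 0x5  # hypothetical key nibble, known during profiling
prof_p, prof_t = simulate(10_000, key, sigma=1.0, rng=rng)
means, inv_cov = build_templates(prof_t, SBOX[prof_p ^ key])
atk_p, atk_t = simulate(500, key, sigma=1.0, rng=rng)
guess, scores = template_attack(means, inv_cov, atk_p, atk_t)
```

Pooling the covariance keeps the number of estimated parameters per class small, which matters more for the 256-class case where each class still only receives a few hundred profiling traces.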

Deep Learning Based Profiled Attacks
Experimental setup We study the resistance of 4 × 4 S-Boxes and 8 × 8 S-Boxes against deep learning based profiled attacks. The simulated and practical experiments are performed with different noise levels. We profile 16 efficient templates using 10,000 traces for each 4 × 4 S-Box, and profile 256 efficient
Experimental results The results of the unprotected and first-order cases are shown in Figs. 12 and 13, respectively. It is obvious that, in both unprotected and first-order cases, the two 8-bit S-Boxes are more resistant against deep learning based profiled attacks than the 4-bit S-Boxes. This implies that, when the leakages cannot be characterized very accurately, S-Boxes with larger sizes are more resistant than S-Boxes with smaller sizes. Interestingly, for the first-order case, practical attacks perform even better than simulated attacks. We conjecture that the irregular noise in practical traces alleviates overfitting during the training of the networks. This phenomenon also shows that, when evaluating the resistance of S-Boxes against deep learning based side-channel attacks, it is not sufficient to perform simulated experiments alone.

Conclusions and future work
In this paper, taking the S-Boxes used in NIST Lightweight Cryptography candidates as concrete examples, we give a comprehensive study of the applicability of three popular theoretical metrics for side-channel analysis, namely VTO, CCV and MCC. Firstly, we find that CCV is almost linearly correlated with VTO, while MCC is inconsistent with the other two metrics. Next, to verify which metric is more effective in which scenarios, we perform simulated and practical experiments on nine 4-bit S-Boxes in the non-profiled and profiled scenarios, respectively. For the non-profiled attacks, when the difference between the VTO (resp. CCV) values of two S-Boxes is relatively large, the S-Box with a lower VTO (resp. higher CCV) value is generally more resistant to CPA attacks. However, when the VTO and CCV values of S-Boxes are relatively close to each other, these two metrics become less accurate. Interestingly, MCC fails to quantify the resistance of S-Boxes against CPA attacks. As for the profiled scenario, we perform efficient template attacks and deep learning based profiled attacks. However, none of the three metrics is suitable for measuring the resistance of S-Boxes against profiled SCAs. Finally, we try to verify whether these metrics can be applied to compare the resistance of S-Boxes with different sizes. Unfortunately, all three metrics fail when measuring and comparing S-Boxes with different sizes.
Since VTO and CCV lack the accuracy to evaluate the resistance of S-Boxes against CPA-like attacks, it is important to further analyze the reasons for the lack of precision of the existing metrics, and then to explore a theoretical metric that fits reality better. Additionally, exploring the theoretical relationship between transparency order and confusion coefficients may be helpful for proposing such a new metric.