In this section, we further investigate the resistance of different S-Boxes against profiled side-channel attacks and check whether the three metrics are applicable to profiled attacks scenario.

Profiled side-channel attacks consist of two phases: an offline profiling phase and an online attack phase. The attacker is assumed to have an open copy of the target device, which is used to learn the leakage distribution; the learned models are then used to attack the target. In the profiling phase, the attacker controls a device with a known secret key and acquires a set of *N* side-channel traces \({\mathcal{L}}_{\text{profiling}}=\left\{ \varvec{{\widetilde{l}}}_{j} \mid j=1,2, \ldots , N\right\}\). Each trace \(\varvec{{\widetilde{l}}}_{j}\) corresponds to a sensitive variable \(y_{j}=f\left( p_{j}, k\right)\) in one encryption (or decryption) with known key \(k \in {\mathcal{K}}\) and plaintext (or ciphertext) \(p_{j}\). Once the acquisition is done, the attacker builds suitable models and estimates the probability:

$$\begin{aligned} {\text{Pr}}[\varvec{L} \mid Y=y] \,, \end{aligned}$$

(5)

from the profiling set \(\left\{ (\varvec{{\widetilde{l}}}_{j}, y_{j})\right\} _{j=1}^{N}\). Then, in the attack phase, the attacker attempts to recover the unknown key in the target device with the help of the profiled leakage models.

Specifically, we launch template attacks and deep learning based profiled attacks in both simulated and practical experiments.

### Template attacks on the nine \(4 \times 4\) S-Boxes

Among profiled attacks, the template attack (TA) (Chari et al. 2002) and its modified version, the efficient template attack (ETA) (Choudary and Kuhn 2013), are the most popular and widely used approaches. In TA, the attacker assumes that \(\varvec{L} \mid Y\) follows a multivariate Gaussian distribution and estimates the mean vector \(\varvec{\mu }_{y}\) and covariance matrix \(\varvec{\Sigma }_{y}\) for each \(y \in {\mathcal{Y}}\) (the so-called templates). In this way, Eq. (5) is approximated by the Gaussian probability density function with parameters \(\varvec{\mu }_{y}\) and \(\varvec{\Sigma }_{y}\). In ETA, the attacker replaces the per-class covariance matrices with one pooled covariance matrix to cope with some statistical difficulties (Choudary and Kuhn 2013). In this paper, ETA is adopted to evaluate the resistance of the S-Boxes. In the attack phase, the attacker acquires a small new set of traces \({\mathcal{L}}_{\text{attack}}=\left\{ \varvec{l}_{j} \mid j=1,2, \ldots , Q\right\}\) with a fixed unknown key \(k^*\). Using the established models, the posterior probabilities are estimated *via* Bayes' theorem, and the attacker selects the key that maximizes the probability following the maximum-likelihood strategy:

$$\begin{aligned} k^{*}=\underset{k \in {\mathcal{K}}}{{\text{argmax}}} \prod _{j=1}^{Q} \frac{{\text{Pr}}\left[ \varvec{L}=\varvec{l}_{j} \mid Y=f\left( p_{j}, k\right) \right] \cdot {\text{Pr}}\left[ Y=f\left( p_{j}, k\right) \right] }{{\text{Pr}}\left[ \varvec{L}=\varvec{l}_{j}\right] }. \end{aligned}$$

(6)

Equation (6) holds only when the acquisitions are independent, which is a realistic assumption in practice. Note that the attacker can launch a higher-order template attack if the leakage lies in higher-order moments of the sample points, e.g., to defeat masking countermeasures.
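As an illustration, the ETA profiling step and the maximum-likelihood key ranking of Eq. (6) can be sketched as follows for a univariate HW leakage. The choice of S-Box (PRESENT's, as one representative 4-bit S-Box), the noise level, and the trace counts are illustrative only and do not reproduce our exact experimental setup:

```python
import random

# PRESENT's 4-bit S-Box, used here purely as an illustrative example.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
HW = [bin(v).count("1") for v in range(16)]

def leak(p, k, sigma, rng):
    """Univariate HW leakage of the S-Box output plus Gaussian noise."""
    return HW[SBOX[p ^ k]] + rng.gauss(0.0, sigma)

def profile(n, sigma, rng):
    """Build 16 'efficient templates': per-class means plus one pooled variance."""
    samples = []
    for _ in range(n):
        p, k = rng.randrange(16), rng.randrange(16)
        samples.append((SBOX[p ^ k], leak(p, k, sigma, rng)))
    means, counts = [0.0] * 16, [0] * 16
    for y, l in samples:
        means[y] += l
        counts[y] += 1
    means = [m / max(c, 1) for m, c in zip(means, counts)]
    pooled = sum((l - means[y]) ** 2 for y, l in samples) / (n - 16)
    return means, pooled

def attack(traces, means, pooled):
    """Maximum-likelihood key ranking (cf. Eq. (6)) with a uniform prior on Y."""
    score = [0.0] * 16
    for p, l in traces:
        for k in range(16):
            mu = means[SBOX[p ^ k]]
            # Gaussian log-likelihood, constant terms dropped.
            score[k] += -((l - mu) ** 2) / (2 * pooled)
    return max(range(16), key=lambda k: score[k])

rng = random.Random(1)
sigma, k_true = 0.2, 0xA
means, pooled = profile(10_000, sigma, rng)
attack_set = []
for _ in range(100):
    p = rng.randrange(16)
    attack_set.append((p, leak(p, k_true, sigma, rng)))
print(hex(attack(attack_set, means, pooled)))
```

Since the per-trace scores are summed in the log domain, the product in Eq. (6) never underflows even for large \(Q\).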

Similar to the previous section, we also study the resistance of S-Boxes in unprotected, first- and second-order masking cases, respectively.

#### Experiments of the unprotected S-Boxes

*Experimental setup* We perform both simulated and practical attacks to compare the different S-Boxes. As for the simulated experiments, the leakages are simulated in the same way as in the non-profiled scenario. In detail, we generate 3 points of interest (PoIs) corresponding to the output of the S-Boxes. As for the practical experiments, the experimental setup is exactly the same as that in the previous section, and we pre-select the 3 PoIs with the highest Pearson correlation coefficient. We profile 16 efficient templates using 10,000 traces for each S-Box. Attacks are performed at nearly-no, low and high leakage-noise levels (\(\sigma =0.1\), \(\sigma =1\) and \(\sigma =2\), respectively). For each S-Box, we run the ETA attack 100 times with randomly selected sub-samples of the attack set and record the minimum number of traces *N* required to achieve an attack success rate of 90%.
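The simulated PoIs described above can be generated along the following lines; the function name and the use of three identically leaking PoIs with independent noise are illustrative assumptions:

```python
import random

HW = [bin(v).count("1") for v in range(16)]

def simulate_trace(sbox_out, sigma, n_pois=3, rng=random):
    """Three points of interest, each leaking HW(S-Box output)
    plus independent Gaussian noise of standard deviation sigma."""
    return [HW[sbox_out] + rng.gauss(0.0, sigma) for _ in range(n_pois)]

rng = random.Random(0)
for sigma in (0.1, 1.0, 2.0):  # the three noise levels used in the experiments
    t = simulate_trace(0xB, sigma, rng=rng)
    print(sigma, [round(x, 2) for x in t])
```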

*Experimental results* The experimental results are shown in Fig. 14 of the Appendix. We observe that the resistance of the different unprotected S-Boxes against ETA attacks is very close, even under the high-noise condition. We believe the main reason is that the efficient templates characterize the leakages well in both the simulated and the practical experiments. Therefore, we further investigate the resistance of the different S-Boxes in the first- and second-order masking cases.

#### Experiments of the masked S-Boxes

In the profiling phase, we first profile 16 efficient templates, denoted \({\varvec{M}}^{{\varvec{i}}}\) with \(i \in \{0,1,\ldots ,d\}\), using 10,000 traces for each share. Next, in the attack phase, we match the leakages to the profiled templates and obtain the probability \(P\left( Y_{j}^{i}=y_{j}^{i} \mid \varvec{l}_{j}^{i}, {\varvec{M}}^{{\varvec{i}}}\right)\) for each trace, where \(y_{j}^{i}\) denotes the *i*-th share of the S-Box output corresponding to the *j*-th trace, and \(\varvec{l}_{j}^{i}\) denotes the leakage of the *i*-th share of the *j*-th trace. The probability of \(y_{j}\) can be expressed as:

$$\begin{aligned} P\left( y_{j} \mid \varvec{l}_{j}, {\varvec{M}}\right) =\sum _{{\mathcal{S}}} \prod _{i=0}^{d} P\left( y_{j}^{i} \mid \varvec{l}_{j}^{i}, {\varvec{M}}^{{\varvec{i}}}\right) , \end{aligned}$$

where \({\mathcal{S}}\) is the set \(\left\{ \left( y_{j}^{0}, \ldots , y_{j}^{d}\right) \mid y_{j}=y_{j}^{0} \oplus \cdots \oplus y_{j}^{d}\right\}\), and \(\varvec{l}_{j}\) denotes the leakages of all shares of the *j*-th trace. With the inverse mapping and the plaintext, \(P(y_{j})\) can be mapped to a key-hypothesis probability \(P_{j}(k)\). Summing \(P_{j}(k)\) over all attack traces, the key hypothesis that maximizes \(P(k)\) is taken as the recovered key.
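A direct (if naive) way to evaluate the sum over \({\mathcal{S}}\) is to enumerate all share tuples and accumulate the product of the per-share probabilities. The function below is a sketch of this recombination step, not our actual implementation:

```python
from itertools import product

def combine_shares(share_probs):
    """Recombine per-share probability tables P(y^i | l^i, M^i) into P(y):
    sum over all share tuples whose XOR equals y (the set S in the text)."""
    n = len(share_probs[0])  # 16 classes for a 4-bit S-Box
    p_y = [0.0] * n
    for shares in product(range(n), repeat=len(share_probs)):
        y, prob = 0, 1.0
        for i, s in enumerate(shares):
            y ^= s
            prob *= share_probs[i][s]
        p_y[y] += prob
    return p_y

# Two shares (first-order masking): a perfectly known first share combined
# with a uniform second share makes every output value equally likely.
uniform = [1 / 16] * 16
peaked = [0.0] * 16
peaked[5] = 1.0
print(combine_shares([peaked, uniform]))
```

The enumeration costs \(16^{d+1}\) products; since the recombination is an XOR-convolution, it could also be computed with a Walsh-Hadamard transform for larger alphabets.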

*Experimental setup* As for simulated experiments, we generate 3 PoIs corresponding to each share of the output of S-Boxes. As for practical experiments, we also pre-select 3 PoIs for each share to construct templates and perform attacks. The remaining experimental settings are the same as those in the previous experiments.

*Experimental results* The attack results for the first- and second-order masking cases at different noise levels are shown in Figs. 4 and 5, respectively. In the second-order masking case, increasing noise seriously affects the stability of the attack results and the accuracy of the evaluation, so we only show the experimental results for \(\sigma =0.1\) and \(\sigma =1\). It can be observed that in both first- and second-order masking implementations, when the noise level is very low, the resistance of the different S-Boxes against ETA attacks remains very close. We therefore conclude that in very low-noise scenarios it is hardly necessary to consider how to select optimal \(4 \times 4\) S-Boxes against ETA attacks.

As the noise increases, the differences between S-Boxes become slightly more significant. However, the practical results are not consistent with the simulated ones. We infer that the main reasons for this inconsistency are that the leakages in the real environment do not fully satisfy the HW leakage model and that the noise does not fulfill the Gaussian assumption. Moreover, as the noise increases, the accuracy of the constructed templates is seriously degraded. In addition, neither the simulated nor the practical results are consistent with any of the three metrics. We argue that this is because the characterization of the noise, rather than the intrinsic properties of the S-Boxes, is the dominant factor in the effectiveness of the attacks. Therefore, these metrics may not be suitable for evaluating the resistance of S-Boxes against template attacks.

In addition, we find that the differences between S-Boxes under ETA are far smaller than under CPA attacks, and the ETA results are not consistent with the CPA results. For example, the S-Box of Elephant is the most resistant to CPA attacks but clearly not the most resistant to ETA attacks, and none of the 4-bit S-Boxes is significantly more resistant than the others. We also perform attacks targeting the HW of the S-Box outputs (profiling 5 efficient templates), again with no clear pattern. A possible reason is that the intrinsic properties of the analyzed S-Boxes are relatively close to each other. In any case, when selecting optimal S-Boxes, it is necessary to comprehensively consider their resistance against different types of attacks; it is not sufficient to consider only transparency orders or confusion coefficients.

### Deep learning based profiled attacks

Recently, deep learning techniques have gained substantial interest in the side-channel analysis community. Previous research has shown that deep learning based attacks are a very efficient alternative to state-of-the-art profiled attacks, and can even outperform them (Maghrebi et al. 2016; Cagli et al. 2017). We explore the resistance of the nine \(4 \times 4\) S-Boxes against such attacks, and whether the three metrics are effective in measuring this resistance. According to Wouters et al. (2020), when the traces are synchronized, Multi Layer Perceptron (MLP) models are as effective as Convolutional Neural Network (CNN) models. Since we only consider aligned traces in this work, we perform attacks based on MLP networks.

In this subsection, all experiments are conducted on an Intel(R) Xeon(R) CPU E5-2667 v4 @3.20 GHz 32 core machine with two NVIDIA TITAN Xp GPUs. We use the Keras library (version 2.2.2) with the TensorFlow library (version 1.10.0) as the backend for MLP.

*MLP architecture* Following the recent work of Wouters et al. (2020), we design our MLP models as follows. For the unprotected and first-order masking cases, the MLP has one hidden layer with 10 neurons; for the second-order masking case, it has two hidden layers with 10 neurons each. Each hidden layer is activated by the ReLU function, and He uniform initialization is used to improve the weight initialization. The output layer contains 16 neurons activated by the softmax function, and cross-entropy is used as the loss function. As a remark, the network architectures used in this subsection are certainly not optimal, as our goal is not to select the optimal parameters.
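For concreteness, the forward pass of the described one-hidden-layer model can be sketched in plain Python (our experiments use Keras; this sketch only mirrors the architecture, and biases are omitted for brevity):

```python
import math, random

def he_uniform(fan_in, fan_out, rng):
    """He uniform initialization: U(-sqrt(6/fan_in), +sqrt(6/fan_in))."""
    limit = math.sqrt(6.0 / fan_in)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def matvec(x, w):
    return [sum(xi * w[i][j] for i, xi in enumerate(x))
            for j in range(len(w[0]))]

class MLP:
    """10 input samples -> 10 ReLU units -> 16-way softmax
    (the unprotected / first-order model described in the text)."""
    def __init__(self, rng, n_in=10, n_hidden=10, n_out=16):
        self.w1 = he_uniform(n_in, n_hidden, rng)
        self.w2 = he_uniform(n_hidden, n_out, rng)

    def forward(self, x):
        return softmax(matvec(relu(matvec(x, self.w1)), self.w2))

def cross_entropy(probs, label):
    """The training loss for a single example with true class `label`."""
    return -math.log(probs[label])

net = MLP(random.Random(0))
p = net.forward([0.5] * 10)
print(len(p), round(sum(p), 6))
```

The second-order model simply inserts one more 10-neuron ReLU layer between `w1` and the output layer.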

For the training of the MLP networks, the mini-batch size is 128 and the maximum number of epochs is 100. The kernel weights achieving the best validation loss are recorded, and once training is done, we reconstruct the network with these best weights. The learning rate is initially 0.005, and the One Cycle Policy (Smith 2017) is used to choose a suitable learning rate.
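The One Cycle Policy can be sketched as a piecewise-linear schedule; the ramp fractions and the final annealing divisor below are illustrative choices, not the exact parameters of Smith (2017) or of our training:

```python
def one_cycle_lr(step, total_steps, max_lr=0.005, div=10.0, final_div=100.0):
    """Piecewise-linear one-cycle schedule: ramp up for the first 45% of
    steps, ramp down for the next 45%, then anneal to max_lr/final_div."""
    up = int(0.45 * total_steps)
    base = max_lr / div
    if step < up:                        # warm-up phase
        return base + (max_lr - base) * step / up
    if step < 2 * up:                    # cool-down phase
        return max_lr - (max_lr - base) * (step - up) / up
    tail = total_steps - 2 * up          # final annealing phase
    return base - (base - max_lr / final_div) * (step - 2 * up) / max(tail, 1)

schedule = [one_cycle_lr(s, 100) for s in range(100)]
print(round(max(schedule), 4))
```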

#### Experiments of the unprotected S-Boxes

*Experimental setup* As for the simulated experiments, we generate 10 sample points for each trace, of which the first three are PoIs corresponding to the output of the S-Boxes and the rest are randomly generated in [0, 4]. As for the practical experiments, 10 samples containing information on the output of the S-Boxes are captured for each trace. There are 10,000 traces for profiling and 5,000 traces for the attack. Of the profiling traces, 90% are used for training and 10% for validation. We run each attack 100 times with randomly selected sub-samples of the attack set and record the minimum number of traces required to achieve an attack success rate of 90%. Since the training of the neural networks can be unstable, we repeat the experiments 10 times and take the average results.
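The success-rate estimation over random sub-samples can be sketched as follows; the toy "attack" here (a majority vote over noisy key guesses) merely stands in for a full ETA or MLP attack, and all parameters are illustrative:

```python
import random

def success_rate(attack_fn, pool, n_traces, k_true, runs=100, rng=random):
    """Fraction of runs (random sub-samples of the attack set) in which
    the attack recovers the true key."""
    hits = 0
    for _ in range(runs):
        sample = rng.sample(pool, n_traces)
        hits += (attack_fn(sample) == k_true)
    return hits / runs

def min_traces_for(attack_fn, pool, k_true, target=0.9, rng=random):
    """Smallest N whose estimated success rate reaches the target
    (simple linear scan; a bisection would also work)."""
    for n in range(1, len(pool) + 1):
        if success_rate(attack_fn, pool, n, k_true, rng=rng) >= target:
            return n
    return None

# Toy attack: each 'trace' carries a key guess that is correct with
# probability ~0.6; the attack is a majority vote over the sample.
rng = random.Random(7)
pool = [(0xA if rng.random() < 0.6 else rng.randrange(16)) for _ in range(500)]
maj = lambda s: max(set(s), key=s.count)
n = min_traces_for(maj, pool, 0xA, rng=rng)
print(n)
```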

*Experimental results* The experimental results are shown in Fig. 15 of the Appendix. Similar to the results of ETA attacks, the resistance of different unprotected S-Boxes against deep learning based attacks is still very close, even under the high noise condition. Next, we further investigate the resistance of different S-Boxes in first- and second-order masking cases.

#### Experiments of the masked S-Boxes

*Experimental setup* Both the simulated and the practical traces consist of 10 sample points. As for the simulated experiments, we generate 3 PoIs corresponding to each share of the output of the S-Boxes, and the rest are randomly generated in [0, 4]. As for the practical experiments, 10 samples containing information on each share of the output of the S-Boxes are captured. For the first-order masking case, there are 10,000 traces for profiling and 10,000 traces for the attack; for the second-order masking case, there are 30,000 traces for profiling and 20,000 traces for the attack.

*Experimental results* The results for the first- and second-order masking cases are shown in Figs. 6 and 7, respectively. Similar to the ETA results, when the noise level is very low, the resistance of the different S-Boxes against deep learning based attacks is still very close in both the first- and second-order masking cases. As the noise increases, the differences between S-Boxes become more obvious. However, we still cannot find patterns in the experimental results. On the one hand, the practical results are not consistent with the simulated ones; in addition to the reasons mentioned above, the instability of the network training may also contribute to this phenomenon. On the other hand, neither the simulated nor the practical results are consistent with any of the three metrics; that is, none of the three metrics is suitable for evaluating the resistance of S-Boxes against deep learning based attacks. Therefore, quantifying the resistance of S-Boxes against deep learning based attacks still has a long way to go.