Quantum key recovery attack on SIMON32/64

The quantum security of lightweight block ciphers is receiving more and more attention. However, existing quantum attacks on lightweight block ciphers have focused only on quantum exhaustive search, while quantum attacks combined with classical cryptanalysis methods have not been well studied. In this paper, we study a quantum key recovery attack on SIMON32/64 using the Quantum Amplitude Amplification (QAA) algorithm in the Q1 model. First, we reanalyze the quantum circuit complexity of quantum exhaustive search on SIMON32/64. We estimate the Clifford gate count more accurately and reduce the T gate count. The T-depth and full depth are also reduced by our minor modifications. Then, using four differentials given by Biryukov et al. at FSE 2014 as our distinguisher, we give a quantum key recovery attack on 19-round SIMON32/64. We treat the two phases of the key recovery attack as two separate QAA instances, the first of which consists of four sub-QAA instances. We design the quantum circuits of these two QAA instances and estimate their quantum circuit complexity. We conclude that the quantum circuit complexity of our quantum key recovery attack is lower than that of quantum exhaustive search. Our work is the first to study a quantum dedicated attack on SIMON32/64, and the first to study the complexity of quantum dedicated attacks from the perspective of quantum circuit complexity, which gives a more fine-grained analysis of their complexity.


Introduction
The development of quantum computation poses a threat to classical cryptosystems. Shor's algorithm (Shor 1994) can break public-key cryptosystems based on integer factorization and discrete logarithms, which has given rise to post-quantum cryptography. As for symmetric cryptosystems, before Simon's algorithm (Simon 1997) was applied in quantum cryptanalysis, only Grover's algorithm (Grover 1997) was available, which gives a quadratic speed-up.
*Correspondence: yangli@iie.ac.cn. 1 State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, 100093 Beijing, China. 2 School of Cyber Security, University of Chinese Academy of Sciences, 100049 Beijing, China
Quantum cryptanalysis against block ciphers has received much attention in recent years. Following the notions of PRF security in the quantum setting proposed by Zhandry (2012), there are two security models in quantum cryptanalysis against block ciphers, called the Q1 model and the Q2 model by Kaplan et al. (2016b).
Q1 model: The adversary is only allowed to make classical online queries and do quantum offline computation.
Q2 model: The adversary is allowed to do offline quantum computation and make online quantum superposition queries. That is, the adversary could query in a superposition state to the oracle and get a superposition state as a query result.
We can observe that Q1 model is more realistic than Q2 model for the reason that it's up to the oracle whether to allow superposition access. However, it's still meaningful to study Q2 model to prepare for the future with highly developed quantum communication technology.
In fact, quantum cryptanalysis in the Q2 model has been studied for a long time. In 2010, Kuwakado and Morii constructed a quantum distinguisher on the 3-round Feistel structure (Kuwakado and Morii 2010) using Simon's algorithm in the Q2 model. In 2012, they recovered the key of the Even-Mansour cipher, again using Simon's algorithm (Kuwakado and Morii 2012). At CRYPTO 2016, Kaplan et al. extended the results in (Kuwakado and Morii 2010;2012) and applied Simon's algorithm to attack a series of encryption modes and authenticated encryption schemes such as CBC-MAC, PMAC and OCB (Kaplan et al. 2016a). In the Q2 model, Simon's algorithm can be combined with Grover's algorithm in quantum cryptanalysis against block ciphers. Leander and May (2017) first used this idea to attack the FX-construction in the Q2 model. Inspired by this work, Dong et al. (2020a) gave a quantum key recovery attack on full-round GOST, also in the Q2 model. Besides, the Bernstein-Vazirani (BV) algorithm (Bernstein and Vazirani 1997) can also be applied in quantum cryptanalysis. Li and Yang (2015) proposed two methods to execute quantum differential cryptanalysis based on the BV algorithm. Then, Xie and Yang extended the results in Li and Yang (2015) and presented several new methods to attack block ciphers using the BV algorithm (Xie and Yang 2019).
In the Q1 model, quantum cryptanalysis appears to be less powerful. The most trivial quantum attack is quantum exhaustive search, which defines the generic security of block ciphers in the quantum setting. Grassl et al. presented quantum circuits to implement an exhaustive key search on AES and estimated the quantum resources in the Q1 model (Grassl et al. 2016). After that, several other works explored the quantum circuit design of AES (Almazrooie et al. 2018;Jaques et al. 2020;Zou et al. 2020;Langenberg et al. 2020). Besides, there have been many attempts at quantum dedicated attacks combined with classical cryptanalysis methods, e.g. differential and linear cryptanalysis (Kaplan et al. 2016b), meet-in-the-middle attacks (Hosoyamada and Sasaki 2018;Bonnetain et al. 2019), and rebound attacks (Hosoyamada and Sasaki 2020;Dong et al. 2020b).
Research on lightweight block ciphers has received much attention over the past decade. Several lightweight primitives have been proposed, to name just a few, SIMON (Beaulieu et al. 2015), SPECK (Beaulieu et al. 2015), SKINNY (Beierle et al. 2016) and PRESENT (Bogdanov et al. 2007). To prepare for a future with large-scale quantum computers, it is necessary to study the quantum security of lightweight block ciphers. There have been several attempts to study quantum generic attacks on some lightweight block ciphers (Anand et al. 2020c;Jang et al. 2020;Anand et al. 2020b). In this paper, we focus on the quantum security of SIMON. SIMON (Beaulieu et al. 2015) is a family of lightweight block ciphers proposed by the NSA in 2013, with outstanding hardware implementation performance. In the classical setting, there have been many dedicated attacks on SIMON. However, in the quantum setting, the only quantum attack on SIMON is that of Anand et al. (2020c), who presented the quantum circuit of Grover's algorithm on SIMON variants and gave the corresponding quantum resource estimates, which is a quantum generic attack. To further explore the quantum security of SIMON, we need to study dedicated quantum attacks on SIMON. Notably, when measuring attack complexity, the existing quantum dedicated attacks all studied the encryption complexity, while we are the first to use the quantum circuit resource cost as a measure of complexity.
Attack model We consider the chosen-plaintext attack on SIMON32/64 in the Q1 model, where the adversary is allowed to make classical online queries to the encryption oracle and can choose random message pairs with input difference x. To achieve such an attack, the adversary needs to implement a transformation that prepares a quantum superposition state when given q classical plaintext-ciphertext pairs as input. We suppose this process is efficient. Thus we can ignore the quantum resource cost of this process.
Our contribution In this paper, we study the quantum key recovery attack on SIMON32/64 using Quantum Amplitude Amplification (QAA) in the Q1 model. Our contributions can be summarized in the following two aspects.
1 We reanalyze the quantum circuit complexity of quantum master-key search (QMKS) on SIMON32/64. On one hand, we give a more accurate estimate of the Clifford gate count and reduce the T gate count. We reduce the number of executions of the key expansion process, which brings down the number of NOT gates and CNOT gates. Besides, counting the Clifford gates obtained from decomposing Toffoli gates in the total number of Clifford gates gives a more accurate estimate of the Clifford gate count. We also reduce the number of T gates by decomposing multi-controlled NOT gates with ancilla qubits. On the other hand, we give a more thorough analysis of the circuit depth. The depth we focus on here is the depth of quantum circuits composed only of Clifford + T gates. We make some modifications to the code implementing SIMON32/64, which reduce the T-depth and full depth of the circuits. Compared to (Anand et al. 2020c), we give a more accurate and thorough complexity analysis of the QMKS quantum circuit.
2 We present our quantum round-key recovery attack (QRKR) on 19-round SIMON32/64, built on the classical round-key recovery attack (CRKR) in (Biryukov et al. 2014). We treat the partial key guessing phase and the exhaustive search phase as two separate QAA instances and design the corresponding quantum circuits. The first QAA instance includes four sub-QAA instances corresponding to the four processes of key recovery using four differentials. Then we estimate the complexity of our quantum circuits. At last, we make a simple comparison among QMKS, QRKR and CRKR. We conclude that the encryption complexity of QRKR is the lowest among these three attacks and that the quantum circuit complexity of QRKR is lower than that of QMKS. That is, we give a quantum dedicated attack on 19-round SIMON32/64 that has lower complexity than the quantum generic attack in terms of both encryption complexity and quantum circuit complexity.
Different from former quantum dedicated attacks, which focused only on encryption complexity, our work takes the first step in studying the quantum circuit complexity of quantum dedicated attacks.
Outline The rest of the paper is organized as follows. In "Preliminaries" section, we introduce the notations used in this paper and the background knowledge of SIMON block cipher, QAA algorithm and quantum circuit. In "The quantum master-key exhaustive search attack on 19-round SIMON32/64" section, we reanalyze the quantum circuit complexity of quantum exhaustive search attack on SIMON32/64. In "The quantum round-key key recovery attack on 19-round SIMON32/64" section, we describe the quantum round-key key recovery attack on 19-round SIMON32/64. In "The complexity analysis" section, we compare the complexity of our attack, quantum master-key search attack and classical differential attack. In "Conclusion" section, we make a summary of this paper.

Notations
In this section, we list the notations used in this paper in Table 1.

Brief Description of SIMON
In this section, we describe SIMON briefly. SIMON is a lightweight block cipher with a Feistel structure. There are many SIMON variants to suit different computing scenarios, which differ in block size, key size, word size and number of rounds. The block size of SIMON is 2n bits while the key size is mn bits. We use SIMON2n/mn to denote the SIMON variants, whose parameters are listed in Table 2.
Table 1 Notations used in this paper: the input block of Round-i in SIMON32/64; the j-th bit of L_i (the index of the rightmost bit is 0); the round key of Round-i in SIMON32/64; the input difference to Round-i.

Round function
The i-th iteration structure of SIMON2n/mn is shown in Fig. 1. The round function of SIMON2n/mn consists of bit-wise AND, cyclic left rotation and bit-wise XOR, as specified in Beaulieu et al. (2015).
Key schedule The round keys are generated from the master key in two cases. 1. When i = 0, 1, · · · , m − 1, the round key of Round-i is the i-th word of the master key. 2. When i = m, m + 1, · · · , r − 1, the round key is computed from the preceding round keys, where $z_j$ is a constant sequence and $c = 2^n - 4$. The key schedule is linear. Thus we can derive the master key from any mn independent bits of subkeys. In particular, for SIMON32/64, once we have the round keys of any four adjacent rounds, the master key can be easily deduced.
Related works In the classical setting, there have already been some attack results on SIMON. We summarize some attacks on SIMON32/64 in Table 3. However, in the quantum setting, the only quantum attack on SIMON is the quantum exhaustive search in Anand et al. (2020c). To further explore the quantum security of the SIMON block cipher, we study the quantum dedicated attack on SIMON32/64 in this paper. According to the analysis in "The complexity analysis", we also list the complexity of the quantum generic attack and our quantum dedicated attack in Table 3 for comparison.
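The round function, the linear key schedule and the recovery of the master key from four adjacent round keys described in this section can be sketched in Python as follows. This is a minimal sketch written from the public SIMON specification (Beaulieu et al.), not the authors' code: the function names, the (left, right) word order and the helper `master_from_adjacent` are illustrative assumptions.

```python
# Sketch of SIMON32/64 based on the public SIMON specification.
# Function names and the (left, right) word order are illustrative choices.

N = 16                      # word size n
M = 4                       # number of key words m
MASK = (1 << N) - 1
C = (1 << N) - 4            # c = 2^n - 4
# z_0 has period 31; the key schedule indexes it modulo 62.
Z0 = "1111101000100101011000011100110" * 2


def rol(x, r):
    """Cyclic left rotation of an n-bit word."""
    return ((x << r) | (x >> (N - r))) & MASK


def ror(x, r):
    """Cyclic right rotation of an n-bit word."""
    return ((x >> r) | (x << (N - r))) & MASK


def f(x):
    """Nonlinear part of the round: (x <<< 1 AND x <<< 8) XOR (x <<< 2)."""
    return (rol(x, 1) & rol(x, 8)) ^ rol(x, 2)


def mix(k_a, k_b):
    """Linear part of the m = 4 key schedule applied to two round keys."""
    t = ror(k_a, 3) ^ k_b
    return t ^ ror(t, 1)


def expand_key(master, rounds):
    """master = [K_0, K_1, K_2, K_3]; returns the first `rounds` round keys."""
    k = list(master)
    for i in range(M, rounds):
        z = int(Z0[(i - M) % 62])
        k.append(C ^ z ^ k[i - M] ^ mix(k[i - 1], k[i - 3]))
    return k[:rounds]


def encrypt(block, round_keys):
    left, right = block
    for k in round_keys:
        left, right = right ^ f(left) ^ k, left
    return left, right


def decrypt(block, round_keys):
    left, right = block
    for k in reversed(round_keys):
        # The previous left word is the current right word.
        left, right = right, left ^ f(right) ^ k
    return left, right


def master_from_adjacent(keys4, first_round):
    """Recover [K_0..K_3] from four adjacent round keys K_t..K_{t+3}."""
    window = list(keys4)
    t = first_round
    while t > 0:
        # Invert K_{t+3} = c ^ z_{t-1} ^ K_{t-1} ^ mix(K_{t+2}, K_t).
        z = int(Z0[(t - 1) % 62])
        prev = window[3] ^ C ^ z ^ mix(window[2], window[0])
        window = [prev] + window[:3]
        t -= 1
    return window


master = [0x0100, 0x0908, 0x1110, 0x1918]      # K_0 .. K_3 (example values)
keys19 = expand_key(master, 19)                # 19-round variant
recovered = master_from_adjacent(keys19[15:19], 15)
```

Because every step of the key schedule is an XOR of rotated words and constants, stepping backwards needs no guessing, which is why four adjacent round keys suffice in this sketch.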

Brief Description of QAA algorithm
In this section, we describe QAA algorithm briefly. QAA algorithm is a natural generalization of Grover's algorithm that searches all solutions in an unsorted database. Compared to classical algorithm, QAA algorithm can achieve quadratic speed-up. According to (Brassard et al. 2002), QAA algorithm can be summarized in the following theorem.
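Theorem 1 Let A be any quantum algorithm that uses no measurements, and let $g : \{0, 1\}^n \to \{0, 1\}$ be any Boolean function. Let p be the initial success probability of A. Suppose $p > 0$, and set $m = \lfloor \pi/(4\theta) \rfloor$, where $\theta$ is defined by $\sin^2(\theta) = p$ and $0 < \theta \le \pi/2$. If we compute $G^m A|0\rangle$, where G is the QAA iterator, and measure the system, the outcome is good with probability at least $\max(1 - p, p)$.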
The quantum circuit for the QAA algorithm is displayed in Fig. 2. For simplicity, we call a search problem solved with the QAA algorithm a QAA instance, and every iteration of a QAA instance a QAA iteration. For a QAA instance with M solutions among N elements, we call the elements that are solutions GOOD and the elements that are not solutions BAD. We define a function $g : \{0, 1\}^n \to \{0, 1\}$ with $g(x) = 1$ if x is GOOD and $g(x) = 0$ if x is BAD. Based on g, we construct an oracle $U_g$ that flips the phase of the GOOD elements, $U_g|x\rangle = (-1)^{g(x)}|x\rangle$. The process of QAA is as follows:
1 Apply A to the initial state $|0\rangle$ to get $|\psi\rangle = A|0\rangle = |GOOD\rangle + |BAD\rangle$.
2 Call the QAA iteration $m = \lfloor \pi/(4\theta) \rfloor$ times. Each iteration has two steps. The first step applies $U_g$ to the quantum state, which gives $U_g|\psi\rangle = -|GOOD\rangle + |BAD\rangle$. The second step applies the diffusion operator $2|s\rangle\langle s| - I$ to the resulting state, where $|s\rangle$ is the equal superposition of all elements.
3 Measure the first register and obtain one of the solutions.
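The three steps above can be checked numerically on a small search space. The following is a minimal classical simulation, assuming A prepares the uniform superposition (the Grover special case) and using hypothetical example values for the search space size and the GOOD set; it tracks real amplitudes only and is not a circuit implementation.

```python
import math


def qaa_success_probability(n_items, good, iterations):
    """Simulate QAA when A prepares the uniform superposition."""
    amp = [1.0 / math.sqrt(n_items)] * n_items        # |psi> = A|0>
    for _ in range(iterations):
        # U_g: flip the sign of every GOOD element.
        amp = [-a if i in good else a for i, a in enumerate(amp)]
        # Diffusion 2|s><s| - I: reflect every amplitude about the mean.
        mean = sum(amp) / n_items
        amp = [2.0 * mean - a for a in amp]
    return sum(amp[i] ** 2 for i in good)


def optimal_iterations(n_items, n_good):
    """m = floor(pi / (4 * theta)) with sin^2(theta) = p = n_good / n_items."""
    theta = math.asin(math.sqrt(n_good / n_items))
    return math.floor(math.pi / (4.0 * theta))


N_ITEMS = 1024                       # hypothetical search space size
GOOD = {3, 500, 777, 1000}           # hypothetical set of solutions
m = optimal_iterations(N_ITEMS, len(GOOD))
p_final = qaa_success_probability(N_ITEMS, GOOD, m)
theta = math.asin(math.sqrt(len(GOOD) / N_ITEMS))
p_formula = math.sin((2 * m + 1) * theta) ** 2
```

In this setting the simulated probability agrees with the closed form $\sin^2((2m+1)\theta)$, which is the quantity bounded from below in Theorem 1.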
We can observe that, compared to the original Grover's algorithm, the operator H is replaced by an arbitrary unitary operator A. We must carry out plenty of measurements to

Quantum circuit
In this section, we briefly introduce the relevant background on quantum circuits. Quantum logic gates are the foundation of quantum circuits, and a quantum circuit can be seen as a sequence of quantum logic gates. To measure the complexity of a quantum circuit, we consider the number of gates, the number of qubits and the depth. When computing the depth of a quantum circuit, we adopt the full parallelism assumption as in Jaques et al. (2020), which means a quantum circuit can apply any number of gates simultaneously so long as these gates act on disjoint sets of qubits. The Clifford + T gate set forms a set of universal quantum gates. The Clifford group is defined as the group of unitary operators that map the group of Pauli operators to itself under conjugation. The Clifford gates are then defined as elements of the Clifford group. The basic Clifford gates include the H gate, the S gate and the CNOT gate. However, universal quantum computation cannot be achieved with Clifford gates alone. That is, a non-Clifford gate must be added to the gate set, and the T gate is usually the choice. The matrix representations of the Clifford + T gate set are shown in Eq. (1).
$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad S = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}, \quad \mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad T = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix} \tag{1}$$
According to (Amy et al. 2013), all Clifford group operations have transversal implementations and thus are relatively simple to implement, while non-Clifford gates require much more sophisticated and costly techniques. The surface codes, which promise higher thresholds than concatenated code schemes, also have a significantly more complicated T gate implementation than any of the Clifford group generators. As a result, it is important to study the number of T gates in a quantum circuit in order to measure the complexity of quantum computation. Besides, Amy et al. (2013) proposed T-depth as a cost function of quantum circuits. We can observe that research on reducing the T-depth of quantum circuits has received more and more attention.
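The full depth and T-depth measures used above can be illustrated with a small scheduler. This is a minimal sketch under the full parallelism assumption: each gate is placed in the earliest layer after all gates that touch the same qubits. The gate-list format `(name, qubits)` and the function name `depth` are assumptions made for this example, not taken from any particular framework.

```python
def depth(gates, counts=lambda name: True):
    """Depth of a gate list under the full parallelism assumption.

    gates  : iterable of (name, qubits) with qubits a tuple of indices
    counts : predicate selecting which gates occupy a layer; gates that
             do not count still propagate dependencies between qubits.
    """
    level = {}          # qubit -> number of counted layers used so far
    best = 0
    for name, qubits in gates:
        start = max(level.get(q, 0) for q in qubits)
        end = start + (1 if counts(name) else 0)
        for q in qubits:
            level[q] = end
        best = max(best, end)
    return best


# Hypothetical toy circuit on three qubits.
circuit = [
    ("H", (0,)),
    ("T", (0,)),
    ("T", (1,)),
    ("CNOT", (0, 1)),
    ("T", (1,)),
    ("H", (2,)),
]

full_depth = depth(circuit)                             # every gate counts
t_depth = depth(circuit, counts=lambda g: g == "T")     # only T gates count
```

For this toy circuit the two T gates on qubits 0 and 1 share one T-stage, and the final T gate must wait for the CNOT, so the T-depth is 2 while the full depth is 4.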
In classical computation, the Toffoli gate is a universal reversible logic gate, while in quantum computation it needs to be decomposed into Clifford + T gates for real implementation. According to (Nielsen and Chuang 2001), the decomposition of the Toffoli gate is shown in Fig. 3. That is, a Toffoli gate can be decomposed into 7 T gates, 6 CNOT gates, 2 H gates and 1 S gate with T-depth 7 and full depth 13. Then, to reduce the T-depth, Amy et al. (2013) proposed a decomposition of the Toffoli gate with T-depth 3 and full depth 10, shown in Fig. 4, and conjectured that this T-depth is optimal for circuits without ancillas. Although the T-depth could be reduced further to 1 with ancilla qubits according to Figure 1 in Selinger (2013), the number of CNOT gates increases considerably. After an overall consideration of the gate counts and T-depth of quantum circuits, we adopt the method in Fig. 4 to decompose the Toffoli gate in this paper.
Fig. 3 The decomposition of Toffoli gate in (Nielsen and Chuang 2001)
In the QAA iterator G, there are two multi controlled-NOT gates. For the real implementation of the QAA algorithm, we need to decompose each multi controlled-NOT gate into a series of Toffoli gates, and then decompose the Toffoli gates into Clifford + T gates. According to (Nielsen and Chuang 2001), the n-fold controlled-NOT can be decomposed into 2n − 3 Toffoli gates using n − 2 ancilla qubits. We show the decomposition of the n-fold controlled-NOT in Fig. 5. Here, we introduce the concept of Toffoli-depth, which is similar to T-depth and means the number of stages in the circuit involving Toffoli gates. In our analysis, computing the Toffoli-depth is the first step in computing the T-depth and full depth of quantum circuits. We can observe that the Toffoli-depth of Fig. 5 is 2n − 3. Since each Toffoli gate has full depth 10 and T-depth 3, the full depth of implementing an n-fold controlled-NOT is 20n − 30, and the T-depth is 6n − 9. It is worth noting that the depth we are talking about refers to the depth of quantum circuits containing only Clifford gates and T gates. That is, we need to decompose all Toffoli gates into Clifford + T gates before computing the depth of quantum circuits.
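The cost formulas for the n-fold controlled-NOT can be written as a small helper. This is a sketch of the arithmetic only: the per-Toffoli T-depth 3 and full depth 10 come from the text above, while the T count of 7 per Toffoli is stated there for the Nielsen and Chuang circuit and is assumed here to hold for the Fig. 4 decomposition as well. The function name is hypothetical.

```python
TOFFOLI_T_COUNT = 7        # assumption: same T count as the Nielsen-Chuang circuit
TOFFOLI_T_DEPTH = 3        # per-Toffoli T-depth of the Fig. 4 decomposition
TOFFOLI_FULL_DEPTH = 10    # per-Toffoli full depth of the Fig. 4 decomposition


def multi_controlled_not_cost(n):
    """Cost of an n-fold controlled-NOT built from a chain of Toffoli gates."""
    if n < 2:
        raise ValueError("the Toffoli chain needs at least two controls")
    toffolis = 2 * n - 3                  # also the Toffoli-depth of the chain
    return {
        "toffoli_gates": toffolis,
        "ancilla_qubits": n - 2,
        "t_gates": TOFFOLI_T_COUNT * toffolis,
        "t_depth": TOFFOLI_T_DEPTH * toffolis,        # equals 6n - 9
        "full_depth": TOFFOLI_FULL_DEPTH * toffolis,  # equals 20n - 30
    }


cost_64 = multi_controlled_not_cost(64)   # 64-fold gate, as used in U_s of QMKS
cost_57 = multi_controlled_not_cost(57)   # 57-fold gate of the sub-QAA instances
cost_32 = multi_controlled_not_cost(32)   # 32-fold gate of the sub-QAA oracle
```

Because the Toffoli gates in the chain act one after another, the depth simply scales with the Toffoli-depth 2n − 3, which is what the two closed-form expressions express.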

The quantum master-key exhaustive search attack on 19-round SIMON32/64
In this section, to compare the attacks on the same scale, we reanalyze the quantum circuit complexity of QMKS using the QAA algorithm, based on the result in Anand et al. (2020c), where Anand et al. presented Grover's search algorithm on SIMON variants and estimated the quantum resources needed to implement such an attack.
At first, we present the quantum circuit complexity of implementing 19-round SIMON32/64. From Table 3 in Anand et al. (2020c), we can easily derive the gate count of implementing 19-round SIMON32/64. However, when computing the circuit depth, we obtained results different from those in (Anand et al. 2020c). Anand et al. implemented all SIMON variants in Qiskit (Koch et al. 2019), where the circuit depth can be calculated using a Qiskit function. After running the code for SIMON32/64 given by Anand et al. (2020c) in Anand et al. (2020a), we found that the Qiskit function computes the depth of the quantum circuit without decomposing the Toffoli gates, which makes the circuit depth calculation incomplete. In our estimate, Toffoli gates are decomposed into Clifford + T gates before computing the circuit depth. Besides, we made some small modifications to the code implementing SIMON32/64, which reduce the full depth and T-depth. In our modifications, we perform one operation on all bits first and then perform the next operation on all bits, instead of performing all operations on each bit one by one. Our modified code is given in (Lau I 2021). We list the quantum circuit complexity of implementing SIMON32/64 in Table 4.
Then we reanalyze the quantum circuit complexity of QMKS's quantum circuit, shown in Fig. 6. To implement the circuit in Fig. 6, we need to implement the QAA iterator $G = U_s U_g$. The implementation of $U_g$ is in Fig. 7, in which 3 plaintext-ciphertext pairs are chosen for the uniqueness of the solution. The operator $U_s$ consists of two 64-fold Hadamard gates and one 64-fold controlled-NOT gate. Here, we reanalyze the quantum circuit complexity of quantum exhaustive search on SIMON32/64 from the following three points. ... Anand et al. (2020c), which overestimated the full depth and T-depth of G. We estimated that the Toffoli-depth of QAA iterator G is 96.
Then we can easily get the full depth and T-depth of G, as shown in the second line of Table 4. We can observe that our estimated depths are smaller than the results in Anand et al. (2020c). This is due to the slight modification we made to the circuit implementation of SIMON32/64. In addition, we did not ignore the depth of implementing the two multi controlled-NOT gates, which makes our estimate more accurate and thorough.
Through the above analysis, we present our more accurate estimates for the QAA iterator G in Table 5. To find the master key in the key space $\{0, 1\}^{64}$, we need to iterate the QAA iterator $G = U_s U_g$ for $\frac{\pi}{4} 2^{32}$ times. From the results in Table 5, we can easily get the quantum circuit complexity of quantum exhaustive search on SIMON32/64 in Table 6. In our estimates, the number of Clifford gates is a little higher than that in

The quantum round-key key recovery attack on 19-round SIMON32/64
In this section, we describe the quantum round-key key recovery attack on 19-round SIMON32/64 and give the corresponding quantum circuit as well as its quantum resource estimate. At first, we recall the classical key recovery attack on 19-round SIMON32/64 in Biryukov et al. (2014), where Biryukov et al. presented four 13-round differentials with which they recovered the round keys from Round-16 to Round-19. Then we use these four 13-round differentials as our distinguisher and apply the QAA algorithm to the two phases of the key recovery attack on 19-round SIMON32/64. At last, we compare the complexity of our key recovery attack and exhaustive search on 19-round SIMON32/64 in terms of encryption complexity and quantum resources separately.
1 Plaintexts Collecting: Similar to (Biryukov et al. 2014), we construct a set P of $2^{23}$ plaintexts with 9 bits fixed. Unlike (Biryukov et al. 2014), we need only one right pair. By varying some fixed bits of the plaintexts in P and guessing 2 bits of the round key $K_0$, we can identify $2^{28.5}$ pairs which satisfy the input difference $x_i$ to Round-3 for each $D_i$ and for each guess of the two bits of $K_0$. In total we get a set of $2^{30.5}$ plaintext pairs for each $D_i$, and there must be a right pair in this set.
2 Filtering: The $2^{30.5}$ pairs of plaintexts are filtered by verifying the fixed 14 bits of the corresponding difference $\Delta^{18}$. After filtering, the number of plaintext pairs is reduced to $2^{30.5-18} = 2^{12.5}$ for each differential.
3 Partial key guessing: For each differential, we need to recover 25 key bits. The key recovery processes using the four differentials are quite similar, so we only describe the key recovery process using $D_2$. We denote all the key bits in $D^K_2$ by $k_1$ and denote the input ciphertext pair by $(E(m), E(m \oplus x_2))$. The keys that satisfy Eq. (2) are called candidate keys. Eq. (2) holds with probability $2^{-14}$, which means there are $2^{25} \times 2^{12.5}/2^{14} = 2^{23.5}$ plaintext-key pairs that satisfy Eq. (2). In expectation, we get $2^{23.5}$ candidate keys for $D^K_2$. Then we use the other three differentials to carry out the similar key recovery process and get $2^{23.5}$ candidate keys for each of them.
4 Exhaustive search: We run an exhaustive search on the $2^{33}$ candidate keys for the 39 key bits in $D_c$, denoted by $k_1$, and the $2^{25}$ values of the remaining 25 key bits, denoted by $k_2$, to get the unique correct key that satisfies $E_{k_1\|k_2}(m_1) = c_1 \wedge E_{k_1\|k_2}(m_2) = c_2$.

The quantum partial key guessing phase in QRKR
In this section, we give the quantum circuit of Step 3 and the corresponding quantum resource estimate. We consider the Q1 model as our attack model, in which both Step 1 and Step 2 are classical processes. Thus, to design the quantum circuit of the quantum key recovery, we only need to regard Step 3 and Step 4 as two separate QAA instances. In Step 3, four differentials are used to get candidate keys for the 39 key bits in $D_c$. So the QAA instance of Step 3 is actually the combination of four sub-QAA instances corresponding to the four processes of partial key guessing using the four differentials. The input of every sub-QAA instance is $2^{25}$ partial keys and $2^{12.5}$ plaintext pairs, while the output is a superposition state of $2^{23.5}$ plaintext-key pairs. We need to design a quantum circuit for each sub-QAA instance. Once we have the quantum circuit of one sub-QAA instance using one differential, we can easily design the other three, because the four key recovery processes are quite similar. Moreover, our analysis shows that the costs of these four quantum circuits are exactly the same. Thus here we only provide the quantum circuit of the key recovery process using $D_2$. Our sub-QAA instance searches for the key-plaintext pairs that satisfy Eq. (2). The quantum circuit of this sub-QAA instance is in Fig. 8. To achieve our attack, we need to implement two operators $C_1$ and $C_2$ when given the classical tuples $(m_i, E(m_i), E(m_i \oplus x_2))$, $i = 1, \cdots, 2^{12.5}$. The operator $C_1$ is defined as $C_1|0\rangle = \sum_{i=1}^{2^{12.5}} |m_i\rangle$, and the operator $C_2$ prepares the superposition of the classical tuples $(m_i, E(m_i), E(m_i \oplus x_2))$. We suppose the implementation of the operators $C_1$ and $C_2$ is efficient so that their cost can be ignored. To implement the quantum circuit in Fig. 8, we need to implement $U_g$ and $U_s$ separately. The main cost of the operator $U_s$ comes from one 57-fold controlled-NOT gate.
The main cost of the operator $U_g$ comes from the computation of h and one 32-fold controlled-NOT gate. The operator h corresponds to the process of computing $\Delta^{15}$ from the given ciphertext pair, denoted by $(E(m), E(m \oplus x_2))$, and the 25 key bits in $D^K_2$, denoted by $k_1$.
Here, we describe the implementation of $U_g$. At first, we define a function h, which maps the partial key $k_1$ and the ciphertext pair $(E(m), E(m \oplus x_2))$ to the difference $\Delta^{15}$. Then, based on h, we define a function g, which takes the value 1 exactly for the key-plaintext pairs that satisfy Eq. (2).
According to the above process, we provide our quantum circuit of h in Fig. 9. A simple analysis of the circuit shows that the implementation of h uses 232 CNOT gates and 60 Toffoli gates. As for the circuit depth, the total depth of h is 99 and the T-depth of h is 24.
Having the quantum circuit of h, we could easily estimate the cost of quantum partial key guessing using differential D 2 in Table 7. Following the same process, we can easily design the quantum circuit of the other three sub-QAA instances using D 1 , D 3 , D 4 separately. And the cost of other three sub-QAA instances can also be seen in Table 7.
Fig. 8 The quantum circuit of partial key guessing using $D_2$
At last, we describe our method of generating candidate keys. Our sub-QAA instance of Step 3 outputs a superposition state of the $2^{23.5}$ plaintext-key pairs that satisfy Eq. (2) among $2^{12.5}$ plaintext pairs and $2^{25}$ partial keys after $\frac{\pi}{4}\sqrt{2^{14}}$ iterations. To get candidate keys, we measure the key register many times. The probability of measuring the right partial key is $2^{-23.5}$. That is, we expect to get the right partial key after running this sub-QAA instance $2^{23.5}$ times. In expectation, we get $2^{23.5}\left[1 - \left(1 - \frac{1}{2^{23.5}}\right)^{2^{23.5}}\right] \approx 2^{22.8}$ different candidate keys for the 25 key bits in $D^K_2$ from $2^{23.5}$ measurements. After combining the results of the other three sub-QAA instances, we get $(2^{22.8})^4/(2^{19} \times 2^{20} \times 2^{22}) = 2^{30.2}$ candidate keys for the 39 key bits in $D_c$. Although the cost of this process is a little high, we failed to find a more efficient way to get all candidate keys. Kaplan et al. also adopted a similar method in Kaplan et al. (2016b), generating all candidate keys by measuring the key register many times. However, in their method, they ensured that each newly obtained candidate key was different from the ones obtained before by excluding the keys already obtained in the QAA oracle. To implement their method with a quantum circuit, a sequence of multi controlled-NOT gates needs to be added to the QAA oracle. That is, a new quantum circuit has to be designed for every run, which would greatly increase the quantum resources. Besides, the number of iterations grows with the number of elements to be excluded, which makes their encryption complexity high as well. In our method, although we need to measure many times, we do not need to design a new quantum circuit in each run, which saves quantum resources.
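The candidate-key counts in this section can be reproduced with a few lines of arithmetic. The sketch below assumes, as the text implicitly does, that each measurement returns one of the $2^{23.5}$ solutions uniformly at random; all quantities are handled as base-2 logarithms, and the function and variable names are illustrative.

```python
import math


def log2_expected_distinct(log2_n, log2_draws):
    """log2 of the expected number of distinct values seen when drawing
    2^log2_draws times uniformly from 2^log2_n equally likely values."""
    n = 2.0 ** log2_n
    draws = 2.0 ** log2_draws
    # Probability that one fixed value is never drawn: (1 - 1/n)^draws.
    miss = math.exp(draws * math.log1p(-1.0 / n))
    return math.log2(n * (1.0 - miss))


# Distinct candidates per differential after 2^23.5 measurements.
per_differential = log2_expected_distinct(23.5, 23.5)

# Denominator 2^19 * 2^20 * 2^22 used in the text when combining four sets.
OVERLAP_LOG2 = 19 + 20 + 22

classical_candidates = 4 * 23.5 - OVERLAP_LOG2        # (2^23.5)^4 / 2^61
quantum_candidates = 4 * 22.8 - OVERLAP_LOG2          # with the rounded 2^22.8
quantum_unrounded = 4 * per_differential - OVERLAP_LOG2
```

The factor $1 - (1 - 1/N)^N$ is close to $1 - 1/e$ for large N, so roughly 63% of the candidates are seen. Note that the exponent 30.2 in the text is obtained from the rounded value 22.8; carrying the unrounded per-differential value through gives a slightly larger exponent.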

Remark 1
We consider a practical model, the Q1 model. In Fig. 8, the operator $C_1$ prepares a superposition of the $2^{12.5}$ classical plaintexts $m_i$, $i = 1, 2, \cdots, 2^{12.5}$, and the operator $C_2$ prepares a superposition of the $2^{12.5}$ classical tuples $(m_i, E(m_i), E(m_i \oplus x_2))$. Actually, it is not known whether operators exist that achieve such a transformation, whose difficulty is equal to that of preparing a superposition of random states. The choice of classical tuples may influence the efficiency of the operators $C_1$ and $C_2$. If the classical tuples have structure, it may be possible to prepare the target superposition state efficiently.

The quantum exhaustive key search phase in QRKR
In this section, we give the quantum circuit of Step 4 and estimate its quantum resources.
We define another QAA instance whose search space consists of the $2^{30.2}$ candidate keys for the 39 key bits in $D_c$, denoted by $k_1$, and the $2^{25}$ values of the remaining 25 key bits, denoted by $k_2$. According to (Jaques et al. 2020), ... The quantum circuit of Step 4 is in Fig. 10. The C operator is a creation operator, which creates the superposition state of the $2^{30.2}$ candidate keys for the 39 key bits in $D_c$ from the all-zero state. It is defined as $C|0\rangle = \sum_{i=1}^{2^{30.2}} |(k_1)_i\rangle$. As before, we assume that this process is efficient so that the cost of the operator C can be ignored. Then, we need to implement the quantum circuits of $U_g$ and $U_s$ separately. The main cost of $U_s$ is one 64-fold controlled-NOT gate. The main cost of $U_g$ is four SIMON instances, and the circuit of $U_g$ is shown in Fig. 11.
Fig. 9 The quantum circuit of function h. The input register y is equal to $E_{k_1}(m) \oplus E_{k_1}(m \oplus x_2)$. We use red dotted lines to split the calculation process of $\Delta^{19}, \Delta^{18}, \Delta^{17}, \Delta^{16}, \Delta^{15}$
At first, we define a function h, which corresponds to the encryption of $m_1, m_2$ under the given key $k_1\|k_2$.
Then, based on h, we define a function g that takes the value 1 exactly when $E_{k_1\|k_2}(m_1) = c_1 \wedge E_{k_1\|k_2}(m_2) = c_2$. The operator $U_g$ is defined from g accordingly. We need to iterate the QAA operator $G = U_s U_g$ for $\frac{\pi}{4}\sqrt{2^{55.2}}$ times. The cost of Step 4 is given in Table 8.
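The iteration counts of the three QAA instances in this paper follow the same pattern, $\frac{\pi}{4}\sqrt{N/M}$ for M solutions among N elements. A minimal sketch of this arithmetic, working with base-2 logarithms and using a hypothetical helper name, is:

```python
import math


def log2_qaa_iterations(log2_space, log2_solutions=0.0):
    """log2 of (pi / 4) * sqrt(N / M) for N = 2^log2_space elements
    and M = 2^log2_solutions solutions (small-angle approximation)."""
    return math.log2(math.pi / 4.0) + (log2_space - log2_solutions) / 2.0


# QMKS: one master key among 2^64.
qmks_iterations = log2_qaa_iterations(64)

# One sub-QAA instance of Step 3: 2^25 partial keys times 2^12.5 pairs,
# with 2^23.5 plaintext-key pairs satisfying Eq. (2).
sub_qaa_iterations = log2_qaa_iterations(25 + 12.5, 23.5)

# Step 4: 2^30.2 candidates for k_1 times 2^25 values of k_2, one solution.
step4_iterations = log2_qaa_iterations(30.2 + 25)
```

Since $\log_2(\pi/4) \approx -0.35$, each count is slightly below the square root of the ratio N/M; a single sub-QAA instance, for example, needs only about 100 iterations.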

The complexity analysis
Our research is related to three attacks, QMKS, QRKR, CRKR. In this section, we compare the complexity of these three attacks. On one hand, we compare the encryption complexity and data complexity of QMKS, QRKR and CRKR. On the other hand, we compare the quantum circuit complexity of QMKS and QRKR.

Encryption complexity and data complexity comparison
In this section, we compare the complexity of QMKS, QRKR and CRKR in terms of encryption complexity and data complexity.
In QMKS, to recover the master key, we need to carry out $\frac{\pi}{4} 2^{32} \times \frac{19}{32} \times 6 \approx 2^{33.5}$ encryptions, where 6 represents the six SIMON instances in one QAA iteration. In our QRKR, $4 \times 2^{23.5} \times \frac{\pi}{4}\sqrt{2^{14}} \times \frac{4}{19} \times 2 + \frac{\pi}{4}\sqrt{2^{55.2}} \times 4 \approx 2^{31.3}$ encryptions are needed. In the first term, the leading 4 represents the four sub-QAA instances using the four differentials, and $\frac{4}{19}$ represents the complexity of 4-round decryption. In the second term, 4 represents the four SIMON instances. On the whole, the encryption complexity of QRKR is slightly lower than that of QMKS. Besides, the encryption complexity of CRKR is $2^{34}$ from Table 3. That is, the encryption complexity of QRKR is also lower than that of CRKR. We can observe that the main encryption complexity comes from Step 3, generating candidate keys. As a result, if the complexity of Step 3 could be reduced further, QRKR could achieve much lower encryption complexity.
Fig. 10 The quantum circuit of exhaustive search in QRKR
Fig. 11 The quantum circuit of $U_g$ in Fig. 10
Although the data complexity isn't our focus, we still offer the comparison here for completeness. In QMKS, 3 plaintexts are enough for the uniqueness of the solution. In CRKR, the data complexity is $2^{31}$ to get 4 right pairs in expectation. However, in our QRKR, we only need to get one right pair in expectation, so the data complexity of QRKR is $2^{30}$. That is, the data complexity of QRKR is lower than that of CRKR.
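The two encryption-complexity expressions above can be evaluated directly. The sketch below simply transcribes them as stated in the text, including the factor 2 in the Step 3 term, and reports base-2 logarithms; the variable names are illustrative.

```python
import math

QUARTER_PI = math.pi / 4.0

# QMKS: (pi/4) * 2^32 iterations, six SIMON instances per iteration,
# scaled by 19/32 as in the text.
qmks = QUARTER_PI * 2.0 ** 32 * (19.0 / 32.0) * 6.0

# QRKR Step 3: four sub-QAA instances, each run 2^23.5 times with
# (pi/4) * sqrt(2^14) iterations; 4/19 accounts for 4-round decryption,
# and the trailing factor 2 is taken from the expression in the text.
step3 = 4.0 * 2.0 ** 23.5 * QUARTER_PI * math.sqrt(2.0 ** 14) * (4.0 / 19.0) * 2.0

# QRKR Step 4: (pi/4) * sqrt(2^55.2) iterations, four SIMON instances each.
step4 = QUARTER_PI * math.sqrt(2.0 ** 55.2) * 4.0

qrkr = step3 + step4

log2_qmks = math.log2(qmks)
log2_step3 = math.log2(step3)
log2_step4 = math.log2(step4)
log2_qrkr = math.log2(qrkr)
```

Evaluating the terms separately makes the remark in the text concrete: the Step 3 term is the larger of the two contributions to the QRKR total.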

Quantum circuit complexity comparison
In this section, we compare the complexity of QMKS and QRKR in terms of quantum circuit complexity.
We need to run four sub-QAA instances in Step 3, so multiplying the gate counts in Table 7 by 4 gives the quantum resources of Step 3, listed in the second line of Table 9. The cost of Step 4 is listed in the third line of Table 9. From Table 9, we can observe that the cost of Step 3 in QRKR is far lower than that of Step 4, so it can be neglected. The main cost of QRKR therefore comes from Step 4, and it is lower than that of QMKS. Thus, the quantum circuit complexity of QRKR is lower than that of QMKS.
In summary, we obtain a quantum dedicated attack on SIMON32/64 that has lower encryption complexity and lower quantum circuit complexity than the quantum generic attack. Besides, both the encryption complexity and the data complexity of our attack are lower than those of the classical key recovery attack in (Biryukov et al. 2014). However, the complexity gap between our attack and exhaustive search in the quantum setting is not large, because generating candidate keys is costly.

Conclusion
In this paper, we studied the quantum key recovery attack on SIMON32/64 using the QAA algorithm in the Q1 model. We reanalyzed the quantum circuit complexity of quantum exhaustive search on SIMON32/64 and offered the first quantum dedicated attack on SIMON32/64. Our work also studied quantum dedicated attacks from the perspective of quantum circuit complexity for the first time, which can provide a research basis for performing real attacks on quantum computers in the future. On the one hand, we gave more accurate estimates of the quantum circuit complexity of quantum exhaustive search on SIMON32/64 than the results in (Anand et al. 2020c). We counted the Clifford gates more comprehensively and reduced the number of T gates. We also reduced the T-depth and full depth via small modifications. On the other hand, using the four differentials in (Biryukov et al. 2014) as our differential distinguisher, we gave our quantum key recovery attack on 19-round SIMON32/64. We treated the two phases of the key recovery attack as two separate QAA instances and gave their quantum circuits together with the corresponding quantum circuit complexity analysis. The first QAA instance is composed of four sub-QAA instances corresponding to the four differentials. Finally, we compared the complexity of our quantum key recovery attack, the quantum exhaustive search attack and the classical key recovery attack. We found that our attack has the lowest encryption complexity and that its quantum circuit complexity is lower than that of the quantum exhaustive search attack. However, we generated all the candidate keys by repeated measurement and did not find a better way to generate them, which is the bottleneck in reducing the complexity. In future work, we may try to combine other key recovery techniques with our quantum dedicated attack, such as the dynamic key-guessing techniques proposed by Wang et al.
Besides, more effort should be made to study how to reduce the complexity of generating candidate keys. Further, we could investigate the physical feasibility of our attack by considering the decoherence time of quantum computers and the duration of CNOT operations, because two-qubit operations take longer than single-qubit operations.