Minimizing CNOT-count in quantum circuit of the extended Shor’s algorithm for ECDLP

The elliptic curve discrete logarithm problem (ECDLP) is a popular choice for cryptosystems due to its high level of security. However, with the advent of the extended Shor’s algorithm, there is concern that ECDLP may soon be vulnerable. While the algorithm does offer hope in solving ECDLP, it is still uncertain whether it can pose a real threat in practice. From the perspective of the quantum circuits of the algorithm, this paper analyzes the feasibility of cracking ECDLP using an ion trap quantum computer with improved quantum circuits for the extended Shor’s algorithm. We give precise quantum circuits for the extended Shor’s algorithm to calculate discrete logarithms on elliptic curves over prime fields, including modular subtraction, three different modular multiplications, and modular inverse. Additionally, we incorporate and improve upon windowed arithmetic in the circuits to reduce the CNOT-counts. Whereas previous studies mostly focused on minimizing the number of qubits or the depth of the circuit, we focus on minimizing the number of CNOT gates in the circuit, which greatly affects the running time of the algorithm on an ion trap quantum computer. Specifically, we begin by presenting implementations of basic arithmetic operations with the lowest known CNOT-counts, along with improved constructions for modular inverse, point addition, and windowed arithmetic. Next, we precisely estimate that, to execute the extended Shor’s algorithm with the improved circuits to factor an n-bit integer, the CNOT-count required is $1237n^3/\log n + 2n^2 + n$. Finally, we analyze the running time and feasibility of the extended Shor’s algorithm on an ion trap quantum computer.

Since the elliptic curve discrete logarithm problem (ECDLP) was proposed, it has been widely used in cryptosystems because of its strong security. Although the proposal of the extended Shor's algorithm offers hope for cracking ECDLP, it is debatable whether the algorithm can actually pose a threat in practice. From the perspective of the quantum circuit of the algorithm, we analyze the feasibility of cracking ECDLP with improved quantum circuits using an ion trap quantum computer.
We give precise quantum circuits for the extended Shor's algorithm to calculate discrete logarithms on elliptic curves over prime fields, including modular subtraction, three different modular multiplications, modular inverse, and windowed arithmetic. Whereas previous studies mostly focused on minimizing the number of qubits or the depth of the circuit, we minimize the number of CNOTs, which greatly affects the time to run the algorithm on an ion trap quantum computer. First, we give the implementation of the basic arithmetic with the lowest known number of CNOTs and the construction of an improved modular inverse, point addition, and the windowing technique. Then, we precisely estimate the number of CNOTs in the improved quantum circuits needed to perform the extended Shor's algorithm for factoring an n-bit integer, which is $1237n^3/\log n + 2n^2 + n$. We analyze the running time and feasibility of the extended Shor's algorithm on an ion trap quantum computer according to the resulting CNOT count.

Introduction
Elliptic curve cryptography (ECC) has attracted wide attention for its unique advantages since it was introduced in the 1980s [1,2]. The security of ECC is based on the elliptic curve discrete logarithm problem (ECDLP), which is the discrete logarithm problem (DLP) on the cyclic subgroup generated by a point on the elliptic curve, and which is more complex than the DLP. Although there have been many attempts to solve the DLP, the best known classical algorithm for the DLP is still exponentially complex [3]. Fortunately, with the development of quantum computing, the emergence of quantum algorithms offers hope for solving such problems. The most representative and compelling quantum algorithm is Shor's algorithm [4,5], which can theoretically solve the DLP over the multiplicative group of a prime field in polynomial time. This algorithm can be extended to elliptic curve groups (we call it the extended Shor's algorithm in this paper), which makes the ECDLP theoretically tractable for a quantum computer and thus poses a threat to cryptosystems based on the ECDLP. However, the number of gates in a quantum algorithm's circuit determines the time needed to run the algorithm on a quantum computer, and the exact number of quantum gates of the extended Shor's algorithm has not been analyzed. Therefore, it is debatable whether the extended Shor's algorithm can pose a threat to ECC, which is exactly the question we address.
Quantum computers implement quantum computation by taking as input a superposition of quantum states representing all the different possible inputs and simultaneously evolving them into the corresponding outputs using a sequence of unitary transformations [6][7][8][9][10][11][12][13][14]. A quantum computation can be described as a quantum circuit in which the unitary transformations are represented by quantum gates. The most basic quantum gates are the controlled-NOT (i.e., CNOT) gate and single-qubit gates. In an ion trap quantum computer, the operation time of a non-adjacent CNOT is much longer than that of single-qubit gates, and CNOTs can only be executed serially [15]. Therefore, the number of CNOTs contained in the quantum circuit of a quantum algorithm largely determines the running time of the algorithm.
Since the advent of the first quantum algorithm to attack ECC in 1995 [16], research in this field has attracted extensive attention. Refs. [17][18][19] proposed quantum algorithms that attack the ECDLP defined on the finite fields $\mathbb{F}_p$ and $\mathbb{F}_{2^m}$, respectively. Ref. [20] studied the extended Shor's algorithm to attack the ECDLP on $\mathbb{F}_p$ and improved the modular inversion algorithm of [18]. The resources needed in terms of Toffoli gates were $448n^3\log_2(n) + 4090n^3$, but only a rough $O(n^3)$ result was given for the CNOT gates. Ref. [21] improved the Kaliski algorithm used in [20]: fewer T gates were used in the modular inversion circuit, and the windowed arithmetic of Ref. [22] was briefly introduced for calculating the ECDLP but was not discussed in detail. With regard to the size of the quantum computer, i.e., the number of qubits, a quantum circuit for calculating the discrete logarithm problem on a binary elliptic curve was optimized in Ref. [23].
Note that previous papers did not analyze in detail the number of CNOT gates required by the quantum circuit, although with the development of ion trap quantum computers the number of CNOT gates largely determines the running time of the algorithm [24]. Therefore, this paper analyzes the feasibility of the quantum attack on the ECDLP by studying the number of CNOT gates in the circuit and discusses the application of windowed arithmetic in detail. It is worth noting that, given the physical limitations of quantum computers, we consider whether a sufficiently large future quantum computer can complete the extended Shor's algorithm in a reasonable running time, so we do not focus on the number of qubits.

Our contributions
In this paper, we give precise quantum circuits for the extended Shor's algorithm to calculate discrete logarithms on elliptic curves over prime fields. More specifically, we make the following contributions.
1. We construct and improve the circuits of basic operations, including modular subtraction, three different modular multiplications, modular inverse, and windowed arithmetic, and thereby further improve the quantum circuit of the extended Shor's algorithm.
2. We incorporate the windowing technique with a focus on optimizing the number of CNOT gates, and we analyze the running time of the extended Shor's algorithm on ion trap quantum computers according to the CNOT count we obtain.
3. We study whether the extended Shor's algorithm can be completed in a reasonable running time under the premise that the fault-tolerant quantum computer has enough space, further illustrating whether Shor's algorithm can really pose a threat to cryptosystems such as ECC.

Outline
The rest of the paper is organized as follows. The Preliminaries section introduces the ECDLP and the elliptic curve group law. The Quantum circuits for algebraic problems section introduces the basic circuits required by the algorithm to compute scalar multiplication on elliptic curve groups, including modular multiplication, modular inverse, and windowed arithmetic. In the Quantum circuits of point addition on elliptic curves groups section, we design a new method to calculate the point addition reversibly out-of-place (storing the results in a new register), which differs from the in-place approach (replacing the input value by the sum) in [20] and reduces the number of CNOTs. The Discussion and conclusion section discusses the time required to attack the ECDLP.

Preliminaries
In this section, we first give a brief description of the DLP and then present Shor's algorithm for solving it. Next, we elaborate on the algorithm for solving the ECDLP, which we call the extended Shor's algorithm.
Shor's quantum algorithm for solving the DLP
DLP. Let $g$ be a generator of a finite cyclic group $G$ with known order $\mathrm{ord}(g) = k$, i.e., $g^k = 1$. The DLP over $G$ is defined as follows: given an element $x \in G$, determine the unique $r \in [0, |G| - 1]$ such that $g^r = x$; we then write $r = \log_g x$. Consider the case where $G$ is the additive group $\mathbb{Z}_N$, where $N$ is a positive integer and $\gcd(g, N) = 1$. Here the DLP is to find $r$ satisfying $r \cdot g \equiv x \bmod N$. The DLP over $\mathbb{Z}_N$ can be solved in polynomial time ($O(\log_2^2 N)$) by finding the multiplicative inverse of $g$ modulo $N$ with the extended Euclidean algorithm [18]. However, in the group $G = \mathbb{Z}_p^*$ (i.e., the multiplicative group modulo $p$, with $g^r \equiv x \bmod p$), no polynomial-time classical algorithm for the DLP (i.e., for calculating $r = \log_g x$) is known; in 1994, Shor [4,5] proposed a quantum algorithm that can theoretically solve this problem in polynomial time.
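The easy additive case mentioned above can be illustrated with a minimal classical sketch. The function name `dlp_additive` and the numbers used are illustrative assumptions, not taken from the paper; the inverse is computed with Python's built-in three-argument `pow`, which requires Python 3.8 or later.

```python
def dlp_additive(g: int, x: int, N: int) -> int:
    """Solve r * g = x (mod N) in the additive group Z_N, assuming gcd(g, N) = 1."""
    g_inv = pow(g, -1, N)      # multiplicative inverse of g modulo N
    return (g_inv * x) % N     # r = g^{-1} * x mod N


# Toy instance (hypothetical values): find r with r * 7 = 5 (mod 30).
r = dlp_additive(7, 5, 30)
assert (r * 7) % 30 == 5
```

No comparably simple shortcut is known for the multiplicative group $\mathbb{Z}_p^*$ or for elliptic curve groups, which is why the quantum algorithm is of interest there.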
Shor's quantum algorithm. Specifically, Shor's algorithm uses three quantum registers to solve the DLP; each quantum register has $n$ qubits, where $q = 2^n$ satisfies $p \le q < 2p$. Shor's algorithm for the DLP is as follows.
Algorithm 1 Shor's quantum algorithm for DLP
Require: $g$, $x$ and $p$, such that $\gcd(g, p) = 1$.
Ensure: integer $r$ such that $g^r = x \bmod p$.
4: Perform a quantum Fourier transform on the first two registers.
5: Measure the registers to obtain an outcome $|c\rangle|d\rangle|z\rangle$, and determine $r$ with high probability by classical post-processing on the measured results.
Using the above algorithm, Shor proved that $r$ can be calculated with high probability in polynomial time. Building on Shor's algorithm for the DLP, we next consider the case of the ECDLP.
Extended Shor's quantum algorithm for solving the ECDLP
ECDLP. Let $\mathbb{F}_p$ be a field of characteristic $p \neq 2, 3$. An elliptic curve over $\mathbb{F}_p$ is the set of solutions $(x, y) \in \mathbb{F}_p \times \mathbb{F}_p$ to the equation
$$y^2 = x^3 + Ax + B,$$
where $A, B \in \mathbb{F}_p$ satisfy $4A^3 + 27B^2 \neq 0$, together with the point at infinity $\mathcal{O}$.
4: Perform a quantum Fourier transform on the first two registers.
5: Measure the first two registers and determine $m$ with high probability by classical post-processing on the measured results.
The initial state of the third register is $|kP\rangle$ instead of $|0\rangle$ in order to satisfy the point addition rule on the elliptic curve. Whether we use $|0\rangle$ or $|kP\rangle$ has no effect on the measurement probabilities. The detailed proof is given in the Appendix.

Elliptic curves group law
Before designing the circuits of the extended Shor's quantum algorithm, we give the group law on an affine Weierstrass elliptic curve.
Let $P = (x_1, y_1) \neq \mathcal{O}$, $Q = (x_2, y_2)$ and $R = (x_3, y_3)$ be points on the curve with $P + Q = R$. The elliptic curve group law on Eq. (5) can be computed as follows:
$$x_3 = \lambda^2 - x_1 - x_2, \qquad y_3 = \lambda(x_1 - x_3) - y_1,$$
where $\lambda$ satisfies the following equation:
$$\lambda = \begin{cases} \dfrac{y_2 - y_1}{x_2 - x_1}, & P \neq \pm Q, \\[2mm] \dfrac{3x_1^2 + A}{2y_1}, & P = Q. \end{cases}$$
Refs. [18,20,21] described the detailed steps of how to transform the coordinates from $(x_1, y_1)$ to $(x_3, y_3)$. Since the purpose of this paper is to reduce the number of CNOT gates, we improve the previous steps of the coordinate transformation in the Quantum circuits of point addition on elliptic curves groups section. This reduces the number of CNOT gates at the cost of increasing the number of qubits, which is not the focus of this paper.
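For reference, the affine group law can be written as a short classical routine. This is a sketch of the standard Weierstrass formulas, not of the quantum circuit; the function names and the toy curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$ are assumptions chosen only for illustration, and `None` is used here to stand for the point at infinity.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None stands for the point at infinity


def ec_add(P: Point, Q: Point, A: int, p: int) -> Point:
    """Add two points on y^2 = x^3 + A*x + B over F_p (affine coordinates)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                    # P = -Q
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % p, -1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)


def on_curve(P: Point, A: int, B: int, p: int) -> bool:
    """Check the curve equation for an affine point."""
    if P is None:
        return True
    x, y = P
    return (y * y - (x ** 3 + A * x + B)) % p == 0


# Toy curve (assumed for illustration): y^2 = x^3 + 2x + 2 over F_17, G = (5, 1).
G = (5, 1)
G2 = ec_add(G, G, 2, 17)
G3 = ec_add(G, G2, 2, 17)
```

The modular inversion inside the slope computation is the step that dominates the cost of the reversible circuits discussed later.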

Quantum circuits for algebraic problems
In the implementation of the extended Shor's algorithm for the ECDLP, the most important task is to design a quantum circuit that computes scalar multiplication on elliptic curve groups, i.e., $((a + k)P + bQ) \bmod p$, which involves a series of modular operations. In this section, we design the circuits of modular subtraction and direct modular multiplication. We also improve a series of basic operations, including modular inversion and windowed arithmetic. In the following, the black triangle symbol in the circuits indicates that the corresponding qubit register is modified and holds the result of the computation.

Modular subtraction
Modular subtraction is divided into four cases: controlled and non-controlled modular subtraction of a constant, and controlled and non-controlled modular subtraction of a quantum state. The difference between the constant and the quantum state is that the constant is known and can therefore be omitted from the quantum circuit.
Modular subtraction of a constant, $|(x - y) \bmod p\rangle$, where $y$ is a known constant and $p$ is the known $n$-bit modulus, can be constructed by the following steps:
1. Subtract $y$ from $|x\rangle$ to obtain $|x - y\rangle$ using the inverse circuit of the addition.
Ref. [26] presented an addition circuit, shown in Fig. 1, which we denote by 1-Add$_y$. Each MAJ block (i.e., compute the majority of three bits in place) and each UMA block (i.e., UnMajority and Add) contains two CNOT gates and one Toffoli gate. Thus, an $n$-qubit 1-Add$_y$ has a total of $n$ MAJs and $n$ UMAs plus one CNOT for the highest bit, that is, $4n + 1$ CNOT gates and $2n$ Toffoli gates. Based on the standard decomposition of the Toffoli gate into the Clifford+T set, one Toffoli gate contains six CNOT gates [7]. Therefore, the number of CNOT gates of an $n$-qubit 1-Add$_y$ is $4n + 1 + 6 \cdot 2n = 16n + 1$. Based on 1-Add$_y$, we further design its controlled form in Fig. 2, with a CNOT count of $26n + 6$.
②: 2-Add$_y$
Vedral et al. [27] proposed another quantum circuit for addition, shown in Fig. 3, where the CARRY and SUM blocks are shown in Fig. 4 and the circuit of CARRY$^{-1}$ applies the quantum gates of CARRY in reverse order. When the addend $y$ is known, Markov et al. [28] modified CARRY and SUM to the form shown in the last two rows of Fig. 4, in which $y$ is omitted. In this case, one CARRY (or CARRY$^{-1}$) contains on average 1 Toffoli gate and $\frac{1}{2}$ CNOT, and one SUM contains on average 1 CNOT. An $n$-qubit 2-Add$_y$ has a total of $n$ CARRYs, $n$ SUMs, $n - 1$ CARRY$^{-1}$s and 1 additional CNOT. Counting six CNOTs per Toffoli gate, we conclude that the number of CNOTs in 2-Add$_y$ is $6.5(2n - 1) + n + 1 = 14n - 5.5$ when $y$ is known. The left circuit in Fig. 5 is a common controlled form of 2-Add$_y$, while the right one, proposed in Ref. [21], is a simpler controlled form: the control bit ctrl uses CNOT gates to write the known addend $y$ into an $n$-qubit auxiliary register, so that the addend can no longer be omitted and 1-Add must be used to compute the sum. Finally, the storage operation is repeated to restore the auxiliary bits. Since encoding a known $n$-bit addend $y$ into the circuit requires on average $\frac{1}{2}$ CNOT per bit, combined with 1-Add we conclude that the controlled 2-Add requires $17n + 1$ CNOTs.
(ii) We also use two circuits, 1-Comp$_y$ and 2-Comp$_y$, to perform comparison. We compare $x$ and $y$ by checking whether the highest bit of $x - y$ is 0 or 1: if the highest bit is 0 then $x - y \ge 0$, otherwise $x - y < 0$. The difference between the two circuits is that 1-Comp$_y$ applies only when $y$ is a known constant, while 2-Comp$_y$ can be used whether $y$ is known or unknown. Although we use 2-Comp$_y$ for all the comparisons in this paper, both circuits are presented for completeness. Details of the two circuits are given below.
①: 1-Comp$_y$
1-Comp$_y$ in Fig. 6 is obtained by modifying 1-Add$_y$ so that it outputs only the highest bit of $|x - y\rangle$ [28]. This requires the input to be $-y + 2^n$ instead of $y$, so this approach only works if $y$ is a known constant rather than an unknown quantum state. When $y$ is known, MAJ can be simplified to the form in Fig. 7, in which one MAJ contains 1 Toffoli gate. Thus the number of CNOTs in 1-Comp$_y$ is $12n + 1$.
②: 2-Comp$_y$
2-Comp$_y$ is shown in Fig. 8. It applies not only when $y$ is a known constant but also when $y$ is an unknown quantum state. The number of CNOTs is $12n + 1$ in the former case, the same as for 1-Comp$_y$, and $16n + 1$ in the latter. Fig. 9 is the controlled form of 2-Comp$_y$; the corresponding CNOT counts are $12n + 7$ and $16n + 7$, respectively.
The quantum state modular addition circuit ModAdd can be obtained in a similar way and is shown in Fig. 11. Unlike the constant modular addition, ModAdd contains one 1-Add, two 2-Comp$_y$ with the known constant $y$, two CNOTs, one 1-Add$^{-1}$, and two circuits for encoding $p$. Thus, we conclude that the number of CNOTs in ModAdd is $61n + 6$. Furthermore, since ModAdd is reversible, we can use it to calculate the modular subtraction of a quantum state. The controlled forms of ModAdd$^{-1}(\cdot)$ and ModAdd are shown in Fig. 12 and Fig. 13, respectively. The corresponding CNOT counts are $46n + 11$ and $71n + 17$, respectively.
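The gate tally for 1-Add$_y$ stated in this section can be reproduced with a small classical bit-level simulation of a MAJ/UMA ripple-carry adder. The sketch below assumes the usual gate order inside MAJ (two CNOTs, then one Toffoli) and UMA (one Toffoli, then two CNOTs); the function name `ripple_carry_add` and the register layout are illustrative choices, not the paper's circuit description.

```python
def ripple_carry_add(x: int, y: int, n: int):
    """Simulate |x>|y>|0> -> |x + y>|y> on classical bits and count gates."""
    b = [(x >> i) & 1 for i in range(n)]          # target register, receives the sum
    a = [0] + [(y >> i) & 1 for i in range(n)]    # a[0] is the carry-in ancilla
    z = 0                                         # highest bit of the sum
    cnot = toffoli = 0

    for i in range(n):                            # MAJ blocks, low to high
        b[i] ^= a[i + 1]                          # CNOT
        a[i] ^= a[i + 1]                          # CNOT
        a[i + 1] ^= b[i] & a[i]                   # Toffoli: a[i+1] now holds the carry
        cnot += 2
        toffoli += 1

    z ^= a[n]                                     # copy the final carry out
    cnot += 1

    for i in reversed(range(n)):                  # UMA blocks, high to low
        a[i + 1] ^= b[i] & a[i]                   # Toffoli: restore the addend bit
        a[i] ^= a[i + 1]                          # CNOT: restore the incoming carry
        b[i] ^= a[i]                              # CNOT: write the sum bit
        cnot += 2
        toffoli += 1

    total = sum(bit << i for i, bit in enumerate(b)) | (z << n)
    restored = a == [0] + [(y >> i) & 1 for i in range(n)]
    return total, cnot, toffoli, restored


n = 4
total, cnot, toffoli, restored = ripple_carry_add(11, 7, n)
cnot_equivalent = cnot + 6 * toffoli              # six CNOTs per Toffoli, as in the text
```

For any $n$, the counters come out as $4n + 1$ CNOTs and $2n$ Toffoli gates, which gives the $16n + 1$ CNOT-equivalent figure once each Toffoli is charged six CNOTs.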

Negation
Given the value of $x \bmod p$, it is easy to calculate $-x \bmod p$ algebraically. Performing this calculation with a quantum circuit, however, is less direct. To solve this problem, Markov et al. [28] showed that it can be done by first flipping each bit of $x$ to obtain $(2^n - 1 - x)$, and then subtracting $(2^n - 1 - p)$ using 2-Add$^{-1}$ to obtain the result $p - x$. Following these two steps, Fig. 14 shows the circuit NegMod for calculating $-x \bmod p$, and Fig. 15 is its controlled form. The number of CNOTs in NegMod is equal to that of 2-Add$^{-1}$, i.e., $14n - 5.5$, while the controlled circuit requires $18n + 1$.
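The two-step negation can be checked classically. The following sketch mirrors the bit flip followed by a constant subtraction; the function name `neg_mod` is an assumption, and the routine is only meant for $0 < x < p < 2^n$ (for $x = 0$ it returns $p$ rather than $0$).

```python
def neg_mod(x: int, p: int, n: int) -> int:
    """Compute -x mod p for 0 < x < p < 2^n by bit flip plus constant subtraction."""
    mask = (1 << n) - 1
    flipped = x ^ mask              # flipping all n bits gives 2^n - 1 - x
    return flipped - (mask - p)     # subtract the constant 2^n - 1 - p, leaving p - x


# Small check with an illustrative modulus.
p, n = 13, 4
results = [neg_mod(x, p, n) for x in range(1, p)]
```

Because the subtracted value is a classical constant, only the subtraction step contributes CNOTs, which is consistent with the count being that of 2-Add$^{-1}$.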

Modular shift
To construct the circuit of the modular shift, i.e., $|x \bmod p\rangle \to |2x \bmod p\rangle$, we first show the circuits of the binary shift. The functions of the binary shift are as follows.
We denote the left shift by l-shift. Previously, the shift was implemented with SWAP gates, which is the second method below. However, we note that there is no need to swap two qubits with a SWAP operation if one qubit is known to be in the state $|0\rangle$. Hence, we reconstruct the shift circuit for an $n$-qubit quantum register to reduce the number of CNOTs, which is the first method below.
① The first shift method, shown in Fig. 16, requires $2n$ CNOTs. Its controlled form uses one control qubit to control each CNOT, which needs $2n$ Toffoli gates; thus, the controlled form needs $12n$ CNOTs.
Based on the above two methods of performing the shift, we can choose the appropriate circuit to minimize the number of CNOT gates: choose the second method when a controlled form is involved, and the first method otherwise.
As shown in Fig. 18, we improve the modular shift by replacing the subtraction of the constant $p$ with a comparison with the constant $p$. By selecting the appropriate binary shift method, the CNOT count of our modular doubling is $31n + 15$.

Modular multiplication
There are three kinds of modular multiplication methods: fast modular multiplication, Montgomery modular multiplication and direct modular multiplication. The first computes the product by repeated modular doublings and conditional modular additions. The second is often the most efficient choice for modular multiplication when the modulus $p$ is not close to a power of 2. The last computes the product in the most direct way, that is, by first performing the binary multiplication and then subtracting multiples of $p$.
Fast modular multiplication. In Ref. [18], fast modular multiplication is used to calculate the modular product, and the circuit of this method is designed in detail in Section 3.3 of Ref. [20]; it requires $104n^2 - 86.5n - 11.5$ CNOTs. The modular additions and modular shifts in the fast modular multiplication use the circuits presented earlier in this paper.
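As a classical reference for the fast modular multiplication, the product can be accumulated by one modular doubling and one conditional modular addition per bit of the multiplier. The sketch below processes the bits of $x$ from most to least significant; this bit order and the helper names are assumptions made for illustration and need not match the circuit layout of Ref. [20].

```python
def mod_double(v: int, p: int) -> int:
    """Modular shift: 2v mod p for 0 <= v < p."""
    v <<= 1
    return v - p if v >= p else v


def mod_add(u: int, v: int, p: int) -> int:
    """Modular addition for 0 <= u, v < p."""
    s = u + v
    return s - p if s >= p else s


def fast_mod_mul(x: int, y: int, p: int, n: int) -> int:
    """Compute x*y mod p with n modular doublings and conditional additions."""
    acc = 0
    for i in reversed(range(n)):        # most significant bit of x first
        acc = mod_double(acc, p)
        if (x >> i) & 1:                # the bit x_i controls the addition of y
            acc = mod_add(acc, y, p)
    return acc


product = fast_mod_mul(9, 11, 13, 4)
```

Each loop iteration corresponds to one modular shift and one controlled modular addition, which is why the CNOT count of this method is quadratic in $n$.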
Montgomery modular multiplication. According to the Montgomery algorithm [29], for inputs $x$, $y$ we can obtain $(x \cdot y \cdot 2^{-n} \bmod p)$, where $2^{n-1} < p < 2^n$; Roetteler et al. [20] gave a specific quantum circuit. Combining it with the basic arithmetic circuits improved earlier in this paper, and selecting the appropriate circuits, we obtain the following Montgomery modular multiplication circuit. The result of M-Mul is $(x \cdot y \cdot 2^{-n} \bmod p)$, where Add is 1-Add and Add$^{-1}$ is the constant subtraction 2-Add$^{-1}$. The inverse operation M-Mul$^{-1}$ is used to restore the auxiliary bits. The entire quantum circuit of Montgomery modular multiplication is a combination of M-Mul and M-Mul$^{-1}$, with a CNOT count of $90n^2 + 78n - 9$. To actually retain the value $(x \cdot y \cdot 2^{-n} \bmod p)$, we still need $n$ CNOTs to copy the value into an extra $n$-qubit auxiliary register before performing M-Mul$^{-1}$.
Direct modular multiplication. We now give a method for constructing the circuit of modular multiplication directly from its calculation. The main idea is that $x \cdot y = kp + (x \cdot y \bmod p)$ with $k = \lfloor x \cdot y / p \rfloor$, where $1 < x, y < p$, so $x \cdot y < p^2 < 2^n p$ and thus $k < 2^n$, i.e., $k$ has an $n$-bit binary expansion $k = \sum_{i=0}^{n-1} k_i 2^i$. The target result is then obtained by comparing $x \cdot y$ with the multiples $2^i p$. Since this method is constructed directly according to the calculation, we call it direct modular multiplication. More specifically, it is divided into the following four steps.
1. Calculate the value of x • y.

Calculate the value of
If the highest bit of the result is 1, then add p to the result.
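The idea behind the direct method can be sketched classically as a restoring reduction. Since the intermediate steps are only partly legible in the text above, the loop below is an assumption about their intent: it tries to subtract $2^i p$ for $i = n - 1, \ldots, 0$ and adds the multiple back whenever the result becomes negative (sign bit set). The function name `direct_mod_mul` is hypothetical.

```python
def direct_mod_mul(x: int, y: int, p: int, n: int) -> int:
    """Reduce x*y modulo p by trial subtraction of 2^i * p, for x, y < p < 2^n."""
    t = x * y                          # binary product, t < p^2 < 2^n * p
    for i in reversed(range(n)):
        t -= p << i                    # trial subtraction of 2^i * p
        if t < 0:                      # highest (sign) bit is 1: undo the subtraction
            t += p << i
    return t                           # now 0 <= t < p


value = direct_mod_mul(200, 123, 251, 8)
```

After the step for index $i$ the running value is below $2^i p$, so the final value lies in $[0, p)$ and equals $x \cdot y \bmod p$.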

Modular inversion
The most common method for computing a modular inverse is the extended Euclidean algorithm (EEA). Proos et al. [18] described the idea of using the EEA to calculate the modular inverse; it requires $O(n)$ divisions in total, each of which takes $O(n^2)$ time. However, implementing the EEA in a quantum circuit is very complicated, so we instead use the Montgomery inversion algorithm described in detail in Ref. [20], which repeats the Montgomery-Kaliski round function $2n$ times to obtain $x^{-1}R \bmod p$. Subsequently, Häner et al. [21] improved this algorithm so that its round function circuit uses fewer CNOTs, while keeping the same overall modular inversion circuit. In this paper, we choose the improved algorithm of Ref. [21] as the round function and redesign a simpler circuit to calculate the modular inverse.
For inputs $x$, $p$, $n$ with $p > x > 0$ and $2^{n-1} < p < 2^n$, the Montgomery-Kaliski algorithm consists of two steps. First, calculate $\gcd(x, p)$ and $x^{-1} \cdot 2^k \bmod p$. Second, calculate $x^{-1} \cdot 2^n \bmod p$. When the input quantum state is a superposition, the number of iterations $k$ in the first step depends on the integer $x$ corresponding to each basis state. Considering all possible basis states in the superposition, the first step requires $2n$ rounds of iteration. Before each round, however, it is necessary to judge whether the iteration for the corresponding basis state has already ended, by checking whether $v$ is 0, so as to determine whether this round actually performs an iteration. Since $k > n$, all basis states of the input superposition go through the first $n$ rounds of iteration, and it is only necessary to check whether $v$ is 0 before each of the last $n$ rounds. In the second step, the intermediate result $x^{-1} \cdot 2^k \bmod p$ is shifted to the right by $k - n$ bits. During the last $n$ rounds of the first step, the results of the subsequent ShiftMod operations of the second step are stored in the auxiliary qubits, and $x^{-1} \cdot 2^n \bmod p$ is obtained.
Combining the round function circuit in Figure 6 (b) of Ref. [21] with the above algorithm steps, we obtain the quantum circuit of the modular inversion Inv in Fig. 21. The quantum circuit for restoring the auxiliary bits is Inv$^{-1}$, i.e., the inverse operation of Inv, and the complete quantum circuit is a combination of Inv and Inv$^{-1}$. From Inv, we find that the whole quantum circuit of modular inversion needs $578n^2 + 283n - 13$ CNOTs.
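The two steps of the Montgomery-Kaliski algorithm can be followed in a classical sketch. This is the textbook form of the algorithm for an odd prime $p$ with a data-dependent loop length, not the fixed $2n$-round reversible circuit Inv; the function names are assumptions, and $n$ is taken to be the bit length of $p$.

```python
def almost_inverse(x: int, p: int):
    """Step 1: return (x^{-1} * 2^k mod p, k) for odd prime p and 0 < x < p."""
    u, v, r, s, k = p, x, 0, 1, 0
    while v > 0:
        if u % 2 == 0:
            u, s = u // 2, 2 * s
        elif v % 2 == 0:
            v, r = v // 2, 2 * r
        elif u > v:
            u, r, s = (u - v) // 2, r + s, 2 * s
        else:
            v, s, r = (v - u) // 2, s + r, 2 * r
        k += 1                          # one round of the Kaliski iteration
    if r >= p:
        r -= p
    return p - r, k


def montgomery_inverse(x: int, p: int) -> int:
    """Step 2: halve modulo p exactly k - n times to obtain x^{-1} * 2^n mod p."""
    n = p.bit_length()
    r, k = almost_inverse(x, p)
    for _ in range(k - n):
        r = r // 2 if r % 2 == 0 else (r + p) // 2
    return r


inv = montgomery_inverse(3, 7)
```

In the quantum setting the loop length cannot depend on the data, which is why the circuit always runs $2n$ rounds and uses the test on $v$ to decide whether a round acts.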

Windowed arithmetic
In this section, we use the windowed form described in Ref. [22] to design the quantum circuit for attacking the ECDLP, reducing the CNOT count $N$ from $O(n^3)$ to $O(n^2) < N < O(n^3)$.
The general method for calculating $aP$ with a quantum circuit is to expand $a$ in binary and use each bit of $a$ to control an operation on $P$, i.e.,
$$aP = (2^{n-1}a_{n-1} + 2^{n-2}a_{n-2} + \cdots + 2a_1 + a_0)P = 2^{n-1}a_{n-1}P + 2^{n-2}a_{n-2}P + \cdots + 2a_1P + a_0P;$$
the circuit is shown in Fig. 22. The circuit designed by this method consists of $n$ controlled point additions. Ref. [21] points out that $m$ bits $a_i$ can be selected first, and the $2^m$ possible values of $a'P$ determined by these $m$ bits can be calculated and stored in an $n$-qubit register, where $a' = \sum_{j=1}^{m} 2^{i_j} a_{i_j}$. Then $a'P$ is used to perform the point addition on the elliptic curve group. The left circuit in Fig. 23 shows the situation for $m = 2$, where the $T_i$ represent the four cases of $a'$. Only the abscissa of the point $P(x, y)$ is shown in the figure, and each case requires $\frac{n}{2}$ CNOTs on average. Therefore, it is estimated that a total of 8 Toffoli gates and $4n$ CNOTs are needed for the point $P(x, y)$, i.e., $(4n + 48)$ CNOTs.
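The effect of windowing on the number of group additions can be seen in a classical sketch. To keep the example short, the additive group $\mathbb{Z}_N$ stands in for the elliptic curve group; this substitution, the function name `windowed_scalar_mul` and the parameter values are all assumptions for illustration. The lookup table plays the role of the classically precomputed values $a'P$.

```python
from typing import Callable, Tuple, TypeVar

G = TypeVar("G")


def windowed_scalar_mul(a: int, P: G, n: int, m: int,
                        add: Callable[[G, G], G], zero: G) -> Tuple[G, int]:
    """Compute a*P for an n-bit scalar a using windows of m bits."""
    acc, step, additions = zero, P, 0
    for i in range(0, n, m):
        # Table for this window: table[w] = w * 2^i * P for w = 0 .. 2^m - 1.
        table = [zero]
        for _ in range((1 << m) - 1):
            table.append(add(table[-1], step))
        w = (a >> i) & ((1 << m) - 1)      # the m bits of a addressed by this window
        acc = add(acc, table[w])           # one group addition per window
        additions += 1
        for _ in range(m):                 # advance step to 2^(i+m) * P
            step = add(step, step)
    return acc, additions


N = 1009                                   # stand-in group Z_N (assumed)
result, additions = windowed_scalar_mul(201, 5, 8, 3, lambda u, v: (u + v) % N, 0)
```

With $n = 8$ and $m = 3$ only three group additions are performed instead of eight controlled ones; the price is a table of $2^m$ entries per window, which is the trade-off that the choice of $m$ has to balance.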

Quantum circuits of point addition on elliptic curves groups
In the previous section, we described the construction of the basic arithmetic used in point addition on elliptic curve groups. In this section, we design a new algorithm to calculate point addition reversibly out-of-place (storing the results in a new register), which reduces the number of CNOT gates of modular inversion and modular multiplication compared to the in-place version (replacing the input value by the sum) given in Ref. [20], while using $O(n^2)$ qubits. Based on the new approach to point addition, this section gives the schematic circuit of the overall extended Shor's algorithm for the ECDLP and then applies windowed arithmetic [22] to obtain the windowed scalar multiplication of a given point on the elliptic curve.

Controlled form of point addition
In the controlled form of point addition on the elliptic curve, the algorithm operates on a quantum register holding the point $P_1 = (x_1, y_1) \neq \mathcal{O}$, a control bit ctrl and ten auxiliary registers $c_i$. The second point $P_2 = (x_2, y_2) \neq \mathcal{O}$, $P_2 \neq \pm P_1$, is assumed to be a precomputed classical constant. If ctrl = 1, the algorithm correctly calculates $c_9 \leftarrow x_1 + x_2$, $c_{10} \leftarrow y_1 + y_2$; if ctrl = 0, $c_9 \leftarrow x_1$, $c_{10} \leftarrow y_1$.
Table 1 and Table 2 describe the process of calculating $P_1 + P_2$ and restoring the auxiliary bits, respectively.
Table 1: The steps from $(x_1, y_1)$ to $(x_3, y_3)$ by point addition. The symbols $|\cdot\rangle_1$ and $|\cdot\rangle_0$ represent the state when the control bit is 1 and 0, respectively. The states in Table 1 represent the change of the quantum register corresponding to each step, and the unwritten states are the same as the states in the previous step.

[Table 1 columns: process; the change in value]
Fig. 26 and Fig. 27 show the quantum circuits corresponding to Table 1 and Table 2. The quantum registers all consist of $n$ logical qubits, whereas $|ctrl\rangle$ is a single logical qubit. Thus the number of CNOTs is $896n^2 + 1064n + 14$. After each calculation of $P_1 + P_2$, the result is used as the next input $P_1$ for a new calculation and is then restored as an auxiliary register. However, the result of the last calculation should be kept in the auxiliary register without being restored. Therefore, the circuit of the last calculation is modified as shown in Fig. 28, and its number of CNOTs is $886n^2 + 783.5n - 18.5$. The schematic quantum circuit of the overall extended Shor's algorithm for the ECDLP can then be obtained by combining Fig. 26 to Fig. 28.

Windowed form of point addition
In the windowed form of point addition on the elliptic curve, the algorithm operates on a quantum register holding the points $P_1 = (x_1, y_1) \neq \mathcal{O}$ and $P_2 = (x_2, y_2) \neq \mathcal{O}$, $P_2 \neq \pm P_1$, and eight auxiliary registers. In this form, the second point $P_2$ is stored in the quantum register as a quantum state and cannot be precomputed as a classical constant.
Table 3 and Table 4 describe the process of calculating $P_1 + P_2$ by windowed arithmetic and restoring the auxiliary bits, respectively.
Table 3: The steps from $(x_1, y_1)$ to $(x_3, y_3)$ using windowed arithmetic by point addition. The states in the table represent the change of the quantum register corresponding to each step, and the unwritten states are the same as the states in the previous step.

[Table 3 columns: process; the change in value]
Fig. 30 and Fig. 31 show the quantum circuits corresponding to Table 3 and Table 4. The quantum registers all consist of $n$ logical qubits. Thus the number of CNOTs is $896n^2 + 1108n + 36$. After each calculation of $P_1 + P_2$, the result is used as the next input $P_1$ for a new calculation and is then restored as an auxiliary register. However, the result of the last calculation should be kept in the auxiliary register without being restored; moreover, since the coefficients of $P$ and $Q$ are different in the extended Shor's quantum algorithm, they cannot be mixed when applying windowed arithmetic. Therefore, the circuit of the last calculation is modified as shown in Fig. 32, with $886n^2 + 833.5n + 9.5$ CNOTs. Fig. 33 is the schematic quantum circuit for calculating the ECDLP by the extended Shor's algorithm using windowed arithmetic, where Lookup denotes the situation in which several controlled operations are merged into a single operation acting on a value produced by a small QROM lookup [22], and the point addition is the circuit introduced in Fig. 30 to Fig. 32.
According to the process of calculating the point addition, the number of CNOT gates for each of the first $2n - 1$ point additions is $896n^2 + 1064n + 14$ (including the circuits for recovering the auxiliary bits), and the $2n$-th has $886n^2 + 783.5n - 18.5$ CNOTs. Therefore, the number of CNOTs needed to calculate the point additions of the ECDLP using controlled point addition is $(2n - 1)(896n^2 + 1064n + 14) + 886n^2 + 783.5n - 18.5 = 1792n^3 + 2118n^2 - 252.5n - 32.5$.
When using the windowed form, the modular subtraction of a constant is replaced by ModAdd$^{-1}$ and its CNOT count increases from $43n - 2.5$ to $61n + 6$. At the same time, the CNOT count of the controlled circuit increases from $46n + 11$ to $71n + 17$. Thus the number of CNOT gates for each of the first $n - 1$ point additions is $896n^2 + 1108n + 36$, and the $n$-th has $886n^2 + 833.5n + 9.5$ CNOTs in calculating $(a + k)P \bmod p$.
Consequently, the number of CNOTs needed to calculate the point additions of the ECDLP using the windowed form is
$$N(n, m) = \frac{2n}{m}\left[(n + 13) \cdot 2^{m+1} + 896n^2 + 1108n + 4\right] - 20n^2 - 549n - 53.$$
We now analyze the whole circuit of the extended Shor's algorithm to obtain a specific CNOT count. It can be inferred from the form of the formula that $N$ is a polynomial function of $n$ only when $m = O(\log n)$. We calculate $\frac{\partial N(n, m)}{\partial m}$ and, for each $n_i \in (128, 521)$, use Matlab to approximate the zero $m_i$ of $\frac{\partial N(n_i, m)}{\partial m}$, obtaining a pair $(n_i, m_i)$. Because $m$ should be an integer, we round each $m_i$ up and down to get $\lceil m_i \rceil$ and $\lfloor m_i \rfloor$, respectively. Then, letting $N_i^{\min} = \min(N(n_i, \lceil m_i \rceil), N(n_i, \lfloor m_i \rfloor))$ for each $i$ and fitting $N$ with respect to $n$ based on all the pairs $(n_i, N_i^{\min})$, we obtain $N = 1237n^3/\log n$. Adding the $2n^2 + n$ CNOT gates used for the two QFT$_n$, the total number of CNOTs of the extended Shor's algorithm for the ECDLP is $N = 1237n^3/\log n + 2n^2 + n$. Ref. [15] gives a lower limit on the time for executing a CNOT gate in an ion trap quantum computer, which is about $2.85 \times 10^{-4}$ s. Combined with the number of CNOTs needed to run the extended Shor's algorithm, breaking the 512-bit ECDLP takes at least 51 years after three levels of coding.
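The counts in this section can be cross-checked numerically. The sketch below verifies the polynomial total for the controlled form and searches integer window sizes for the windowed form. It assumes that the garbled prefactor of the windowed formula reads $2n/m$; the function names are hypothetical, and the brute-force search over integer $m$ replaces the Matlab procedure described in the text.

```python
from fractions import Fraction


def controlled_total(n: int) -> Fraction:
    """(2n - 1) point additions with restoring, plus the final one without."""
    first = 896 * n**2 + 1064 * n + 14
    last = 886 * n**2 + Fraction(1567, 2) * n - Fraction(37, 2)   # 783.5n - 18.5
    return (2 * n - 1) * first + last


def stated_total(n: int) -> Fraction:
    """Closed form quoted in the text for the controlled form."""
    return 1792 * n**3 + 2118 * n**2 - Fraction(505, 2) * n - Fraction(65, 2)


def windowed_total(n: int, m: int) -> Fraction:
    """N(n, m) for the windowed form, assuming the prefactor is 2n/m."""
    bracket = (n + 13) * 2 ** (m + 1) + 896 * n**2 + 1108 * n + 4
    return Fraction(2 * n, m) * bracket - 20 * n**2 - 549 * n - 53


def best_window(n: int, m_max: int = 32) -> int:
    """Integer window size in 1..m_max that minimizes N(n, m)."""
    return min(range(1, m_max + 1), key=lambda m: windowed_total(n, m))


n = 512
m_opt = best_window(n)
saving = controlled_total(n) / windowed_total(n, m_opt)
```

The first two functions agree for every $n$, which confirms the expansion of the controlled total. The value of `saving` then indicates how much the windowed form reduces the CNOT count at a given size under the stated assumption about the formula.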

Discussion and conclusion
Although there have been many attempts to reduce the number of qubits or the circuit depth of the extended Shor's algorithm for the ECDLP, their focus has not been on optimizing the number of CNOT gates, which greatly affects the time needed to run the algorithm on an ion trap quantum computer. In this paper, we improve the quantum circuits of basic arithmetic operations, including modular subtraction, three different modular multiplications, modular inverse and windowed arithmetic. Table 5 summarizes the CNOT counts for the basic arithmetic. We further improve the quantum circuit of the extended Shor's algorithm. Based on this work, the quantum circuit that runs the extended …
(3) To prove that it does not matter whether the input is $|0\rangle$ or $|kP\rangle$, it suffices to prove that (5) = (6).
The number of CNOT gates in n-controlled-NOT
Fig. 34 is the quantum circuit of the n-controlled-NOT,

1. Prepare a 3(p − 1)-qubit initial state in three quantum registers $|0\rangle|0\rangle|0\rangle$.
2. Apply the Hadamard transform $H^{\otimes 2(p-1)}$ to the first two quantum registers.
3. Perform a unitary transformation $U_f$ such that $U_f|a\rangle|b\rangle|0\rangle \to |a\rangle|b\rangle|g^a x^{-b} \bmod p\rangle$, so that the third register holds $g^a x^{-b} \bmod p$.
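The function computed in step 3 can be checked classically on a toy instance. The sketch below is an illustration only: the prime, generator and secret exponent are hypothetical values, and the helper `f` is not part of the paper. It shows the structure that the subsequent quantum steps exploit: with $x = g^m$, the value $g^a x^{-b} \bmod p$ is unchanged under the shift $(a, b) \to (a + m, b + 1)$.

```python
def f(a: int, b: int, g: int, x: int, p: int) -> int:
    """Classical value of g^a * x^(-b) mod p (negative exponents in pow need Python 3.8+)."""
    return (pow(g, a, p) * pow(x, -b, p)) % p

# Toy instance (hypothetical numbers): 5 generates the multiplicative group mod 23.
p, g, m = 23, 5, 7
x = pow(g, m, p)  # public value whose discrete logarithm m is sought

# f is constant along the direction (m, 1).
periodic = all(
    f(a + m, b + 1, g, x, p) == f(a, b, g, x, p)
    for a in range(p - 1)
    for b in range(p - 1)
)
print(periodic)
```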

2. If the highest bit of $|x - y\rangle$ is 1, corresponding to $x - y < 0$, then add $p$ to $|x - y\rangle$. Otherwise, do nothing.
3. Compare the result of step 2 with $(p - y)$. Uncompute the auxiliary bit and get $|(x - y) \bmod p\rangle$.

Next we give the details of the quantum circuits for performing addition and comparison. (i) We use two circuits, 1-Add$_y$ and 2-Add$_y$, to perform addition, i.e., $|x\rangle|y\rangle|0\rangle \to |(x + y)_{0,\ldots,n-1}\rangle|y\rangle|(x + y)_n\rangle$. The first two quantum registers both have $n$ qubits and the third one has 1 qubit as the highest bit of the sum. The two circuits of addition are shown below. ① 1-Add$_y$
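The modular subtraction steps can be traced on classical integers. This is a minimal sketch under stated assumptions: step 1 (not reproduced in this section) is taken to compute $x - y$ in an $(n+1)$-bit two's-complement register, and the comparison in step 3 is taken to be "result $\geq p - y$", which holds exactly when $p$ was added. The function name `mod_sub` is hypothetical.

```python
def mod_sub(x: int, y: int, p: int, n: int) -> int:
    """Classical trace of the steps: subtract, conditionally add p, uncompute the flag."""
    assert 0 <= x < p and 0 <= y < p < 2 ** n
    mask = (1 << (n + 1)) - 1      # (n + 1)-bit two's-complement register
    diff = (x - y) & mask          # step 1 (assumed): compute x - y
    flag = (diff >> n) & 1         # auxiliary copy of the highest bit; 1 means x - y < 0
    if flag:                       # step 2: add p only when the difference is negative
        diff = (diff + p) & mask
    # step 3: the comparison diff >= p - y reproduces the flag, so the flag resets to 0
    flag ^= int(diff >= p - y)
    assert flag == 0
    return diff
```

For example, `mod_sub(3, 5, 13, 4)` follows the negative branch and returns 11, while `mod_sub(5, 3, 13, 4)` leaves the difference 2 untouched.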

Figure 1: The first quantum circuit of addition, 1-Add$_y$, is constructed from MAJ blocks and UMA blocks. A MAJ block and a UMA block each consist of two CNOT gates and one Toffoli gate.
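The action of the MAJ and UMA blocks can be simulated on classical bits. The sketch below assumes the standard ripple-carry gate ordering for these blocks (two CNOTs followed by a Toffoli for MAJ, a Toffoli followed by two CNOTs for UMA) and an extra carry-in ancilla on wire 0; neither detail can be read off the figure here, so both are assumptions, and the names `maj`, `uma` and `ripple_add` are hypothetical.

```python
def maj(bits, c, b, a):
    """MAJ block: two CNOTs then one Toffoli; leaves the carry-out on wire a."""
    bits[b] ^= bits[a]              # CNOT a -> b
    bits[c] ^= bits[a]              # CNOT a -> c
    bits[a] ^= bits[b] & bits[c]    # Toffoli (b, c) -> a

def uma(bits, c, b, a):
    """UMA block: one Toffoli then two CNOTs; restores a and c, leaves the sum bit on b."""
    bits[a] ^= bits[b] & bits[c]    # Toffoli (b, c) -> a
    bits[c] ^= bits[a]              # CNOT a -> c
    bits[b] ^= bits[c]              # CNOT c -> b

def ripple_add(x: int, y: int, n: int):
    """Return ((x + y) mod 2^n, y, highest bit of x + y) via the MAJ/UMA chain."""
    # wire layout: 0 = carry-in ancilla, 1..n = x bits, n+1..2n = y bits, 2n+1 = high bit
    bits = [0] + [(x >> i) & 1 for i in range(n)] + [(y >> i) & 1 for i in range(n)] + [0]
    xs = list(range(1, n + 1))
    ys = list(range(n + 1, 2 * n + 1))
    hi = 2 * n + 1
    carries = [0] + ys[:-1]         # carry-in wire of each bit position
    for i in range(n):
        maj(bits, carries[i], xs[i], ys[i])
    bits[hi] ^= bits[ys[-1]]        # CNOT: copy the final carry to the high bit
    for i in reversed(range(n)):
        uma(bits, carries[i], xs[i], ys[i])
    total = sum(bits[xs[i]] << i for i in range(n))
    y_out = sum(bits[ys[i]] << i for i in range(n))
    return total, y_out, bits[hi]
```

The sum overwrites the $x$ register and $y$ is restored, matching the mapping $|x\rangle|y\rangle|0\rangle \to |(x + y)_{0,\ldots,n-1}\rangle|y\rangle|(x + y)_n\rangle$.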

Figure 2: The controlled form of 1-Add$_y$.

Figure 3: The second quantum circuit of addition, 2-Add$_y$, where the CARRY and SUM blocks are shown in the first line of Fig. 4.

Figure 4: The form of CARRY and SUM, and the form of CARRY and SUM when $y$ is known.

Figure 5: The original controlled form of 2-Add$_y$ and the new controlled form of 2-Add$_y$ when $y$ is known.

Figure 7: The form of MAJ when $y$ is known.

Figure 9: The controlled form of 2-Comp$_y$.

Figure 10: Circuit of the constant modular subtraction ModAdd$^{-1}(\cdot)$. Since addition is the inverse operation of subtraction and $y$ is a known constant, we can use the reverse of the 2-Add$_y$ circuit to compute $|x - y\rangle$ and denote it 2-Add$_y^{-1}$.

Figure 11: Circuit of the quantum state modular addition ModAdd. Since addition is the inverse operation of subtraction, we can use the reverse of the 1-Add circuit to compute $|x - y\rangle$ and denote it 1-Add$^{-1}$.

Figure 13: The controlled form of ModAdd.

Figure 14: The circuit of negation NegMod.

Figure 15: The controlled form of NegMod.

Figure 16: First way to perform binary shift, l-shift and r-shift, respectively.

② The second shift method, shown in Fig. 17, requires $3n$ CNOTs. Unlike the first method, the controlled form of the second method only needs one qubit to control the middle CNOT in each SWAP gate. The controlled circuit then requires $2n$ CNOTs and $n$ Toffoli gates in total, that is, $8n$ CNOTs when each Toffoli gate is counted as 6 CNOTs.

Figure 17: Second way to perform binary shift, l-shift and r-shift, respectively.

3. Repeat step 2 until $i = 0$;
4. Repeat the reverse of steps 2 and 1 in sequence to recover the auxiliary qubits.

According to the first three steps, we obtain the partial quantum circuit D-Mul. The circuit of step 4, which restores the auxiliary bits, is denoted D-Mul$^{-1}$, i.e., the inverse operation of D-Mul, where Add and Add$_p$ in Fig. 20 are 1-Add and 2-Add, respectively. Thus the whole quantum circuit of direct modular multiplication needs $114n^2 + 5n$ CNOTs. Similar to the Montgomery modular multiplication, to obtain the value $(x \cdot y \bmod p)$ we still need $n$ CNOTs to copy the value into an extra $n$-qubit auxiliary register before performing D-Mul$^{-1}$.
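Since steps 1 and 2 are not reproduced in this section, the following sketch only illustrates the textbook shift-and-add scheme that a direct modular multiplication of this kind typically follows: the bits of $x$ are scanned from $i = n - 1$ down to $0$, the accumulator is doubled modulo $p$, and $y$ is added modulo $p$ when the bit is 1. It is an assumption that the paper's D-Mul follows this pattern, and the name `direct_mod_mul` is hypothetical.

```python
def direct_mod_mul(x: int, y: int, p: int, n: int) -> int:
    """Shift-and-add product x * y mod p, scanning the bits of x from i = n - 1 down to 0."""
    assert 0 <= x < p and 0 <= y < p < 2 ** n
    acc = 0
    for i in range(n - 1, -1, -1):
        acc = 2 * acc              # binary left shift of the accumulator
        if acc >= p:               # modular reduction after doubling
            acc -= p
        if (x >> i) & 1:           # bit x_i controls the modular addition of y
            acc += y
            if acc >= p:
                acc -= p
    return acc
```

Each loop iteration corresponds to one shift, one reduction and one controlled modular addition, which is consistent with a CNOT count that grows quadratically in $n$.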

Figure 20: The partial quantum circuit of direct modular multiplication D-Mul: $|x\rangle|y\rangle|0\rangle \to |x\rangle|y\rangle|x \cdot y \bmod p\rangle$.

Figure 22: The general method to calculate $aP$ by quantum circuit.

Figure 23: Quantum circuits of windowed arithmetic at $m = 2$. The circuit on the right is the simplified form of the one on the left.

Figure 27: The inverse operation of controlled point addition to restore the auxiliary bits.

Figure 28: The last calculation of controlled point addition.

Figure 29: Schematic quantum circuit of overall extended Shor's algorithm for ECDLP.

Figure 31: The inverse operation of windowed point addition to restore the auxiliary bits.

Figure 32: The last full round of windowed point addition.

Figure 33: Schematic quantum circuit of overall extended Shor's algorithm for ECDLP using windowed arithmetic.

Figure 35: The equivalent form of n-controlled-NOT.

[...] $4A^3 + 27B^2 \neq 0$, together with the point $O$ at infinity. The set of all the points on the elliptic curve is $E(\mathbb{F}_p) = \{(x, y) \mid y^2 = x^3 + Ax + B;\ A, B \in \mathbb{F}_p\} \cup \{O\}$. Then $E(\mathbb{F}_p)$ forms an Abelian group under the point addition operation, with $O$ as the neutral element. Let $P \in E(\mathbb{F}_p)$ be a generator of $\langle P \rangle$, which is a cyclic subgroup of $E(\mathbb{F}_p)$ of known order $\mathrm{ord}(P) = r$, i.e., $rP = O$. Similar to DLP, the goal of ECDLP is to find the unique integer $m \in \{1, \ldots, r\}$ such that $mP = Q$, where $r, m \in \mathbb{F}_p$ and $Q$ is a given point in $\langle P \rangle$. Hasse [25] pointed out that the number of all the points on the elliptic curve is $\#E(\mathbb{F}_p) = p + 1 - t$, $|t| \leq 2\sqrt{p}$. Thus the order of P
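The group law and the Hasse bound can be checked on a small curve. This is a minimal classical sketch using the usual affine chord-and-tangent formulas; the toy parameters ($p = 97$, $A = 2$, $B = 3$) and the helper names `ec_add`, `scalar_mul` and `curve_points` are hypothetical and chosen only for illustration.

```python
import math

def ec_add(P, Q, A, p):
    """Add two points on y^2 = x^3 + Ax + B over F_p; None stands for the point O at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                        # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    return x3, (lam * (x1 - x3) - y1) % p

def scalar_mul(k, P, A, p):
    """Compute kP by double-and-add."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, A, p)
        P = ec_add(P, P, A, p)
        k >>= 1
    return R

def curve_points(A, B, p):
    """Enumerate E(F_p) by brute force (feasible only for toy primes)."""
    pts = [None]
    for x in range(p):
        rhs = (x**3 + A * x + B) % p
        pts += [(x, y) for y in range(p) if y * y % p == rhs]
    return pts

if __name__ == "__main__":
    p, A, B = 97, 2, 3                      # toy curve with 4A^3 + 27B^2 != 0 mod p
    pts = curve_points(A, B, p)
    t = p + 1 - len(pts)
    print(len(pts), abs(t) <= 2 * math.sqrt(p))
```

Multiplying any point by the group order $\#E(\mathbb{F}_p)$ returns $O$, which gives a quick sanity check of the addition formulas.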

Table 2: The steps to restore the auxiliary bits. The symbols $|\cdot\rangle_1$ and $|\cdot\rangle_0$ respectively represent the state when the control bit is 1 and 0. The state in the table represents the change of the quantum register corresponding to each step, and the unwritten states are the same as the states in the previous step.
Ctrl-CNOT $c_9$, $c_{10}$, $c_1$

Table 4: The steps to restore the auxiliary bits. The state in the table represents the change of the quantum register corresponding to each step, and the unwritten states are the same as the states in the previous step.

Table 5: The CNOT counts for the basic arithmetic.