Generic, efficient and isochronous Gaussian sampling over the integers

Gaussian sampling over the integers is one of the fundamental building blocks of lattice-based cryptography. In the extensively used trapdoor sampling algorithms, it has so far been unavoidable. In the face of numerous side-channel attacks, it remains challenging to construct a Gaussian sampler that is generic, efficient, and resistant to timing attacks. In this paper, our contribution is three-fold. First, we propose a secure, efficient exponential Bernoulli sampling algorithm. It can be applied to Gaussian samplers based on rejection sampling. We apply it to FALCON, a candidate in round 3 of the NIST post-quantum cryptography standardization project, and reduce its signature generation time by 13–14%. Second, we develop an isochronous Gaussian sampler based on rejection sampling. Our algorithm can securely sample from Gaussian distributions with different standard deviations and arbitrary centers. We apply it to PALISADE (S&P 2018), an open-source lattice-based cryptography library. During the online phase of trapdoor sampling, the running time of the G-lattice sampling algorithm is reduced by 44.12% while resisting timing attacks. Third, we improve the efficiency of the COSAC sampler (PQC 2020). The new COSAC sampler is 1.46x–1.63x faster than the original and has the lowest expected number of trials among all Gaussian samplers based on rejection sampling, but it needs a more efficient algorithm for sampling from the normal distribution to improve its performance.


Introduction
Lattice-based cryptography has gained much attention due to its many attractive features. It allows us to build various powerful cryptographic primitives, such as fully homomorphic encryption (Gentry 2009), and has conjectured security against quantum computers (Shor 1997). Most lattice-based cryptographic schemes are based on two main average-case problems, the short integer solution (SIS) problem (Ajtai 1996) and the learning with errors (LWE) problem (Regev 2005), as well as their analogs over rings (Micciancio 2002; Lyubashevsky et al. 2010). Discrete Gaussian distributions are at the core of security reduction proofs from worst-case lattice problems to these average-case problems (Micciancio and Regev 2004; Brakerski et al. 2013). Therefore, they have great significance for the theoretical security of lattice-based cryptographic schemes.
In practice, it is notoriously difficult to sample from discrete Gaussian distributions effectively and securely, as demonstrated by numerous side-channel attacks (Groot Bruinderink et al. 2016; Espitau et al. 2017; Pessl et al. 2017; Bootle et al. 2018) against the Gaussian sampler in the BLISS signature (Ducas et al. 2013). For this reason, some schemes replace Gaussian distributions with other distributions (Schwabe et al. 2020), like uniform or binomial distributions, even though this ordinarily leads to performance and security degradation.
However, discrete Gaussian distributions are unavoidable in some situations. One prominent example is trapdoor sampling (Gentry et al. 2008; Peikert 2010; Micciancio and Peikert 2012). Trapdoor sampling can be used to construct many powerful cryptographic applications, including signatures (Gentry et al. 2008; Prest et al. 2020), (hierarchical) identity-based encryption ((H)IBE) (Gentry et al. 2008; Agrawal et al. 2010; Ducas et al. 2014), and attribute-based encryption (ABE) (Zhang and Zhang 2011), etc. There are two extensively used trapdoor sampling algorithms at present. The first one is proposed by Gentry et al. (2008) (GPV trapdoor sampler), while the second one is proposed by Micciancio and Peikert (2012) (MP trapdoor sampler). Both of them require sampling from Gaussian distributions over the integers with different standard deviations and arbitrary centers (Ducas et al. 2014; Prest et al. 2020; Genise and Micciancio 2018; Hu and Jia 2019). To resist timing attacks, the former requires that the Gaussian sampler over the integers not leak information about the standard deviation σ, center c, or output z, while the latter requires that the sampler not leak information about c and z. In practical implementations, the trapdoor samplers usually employ a generic Gaussian sampler over the integers whose precomputation cost does not vary with different σ and c. So it is important to propose a Gaussian sampler over the integers that is generic, efficient, and resistant to timing attacks.
There are two strategies for designing generic Gaussian samplers over the integers: one is based on the convolution theorems of discrete Gaussian distributions, and the other is based on rejection sampling. The convolution sampler (Micciancio and Walter 2017) belongs to the former. It is easy to make its running time independent of its inputs σ, c and output z. However, its efficiency relies heavily on online/offline techniques, and its overall running time is not competitive. For Gaussian samplers based on rejection sampling, the rejection step seems inherently costly to turn into a constant-time algorithm. FALCON's sampler (Howe et al. 2020) is not constant-time, but it is isochronous with respect to its inputs σ, c and output z. Isochrony is sufficient to resist timing attacks, but FALCON's sampler is only suitable for small σ. In practice, if we do not use NTRU lattices to instantiate the GPV trapdoor sampler, or if we employ the MP trapdoor sampler, we may need to sample from distributions with large σ. Karney's sampler (Karney 2016) and the COSAC sampler (Zhao et al. 2020a) are suitable for large σ. However, Karney's sampler does not consider how to resist timing attacks, while the COSAC sampler may reveal its input σ. One crucial step of Gaussian samplers based on rejection sampling is sampling an exponential Bernoulli variable B_p with parameter p = exp(−x) (x ≥ 0). It is a time-consuming module, and the mainstream solution is to approximate the exponential function by a polynomial (Zhao et al. 2020b; Barthe et al. 2019).
In lattice-based cryptography libraries such as PALISADE (Cousins et al. 2018; Dai et al. 2018; Gür et al. 2019), sampling from Gaussian distributions over the integers is one of the fundamental building blocks. To implement the MP trapdoor sampler, PALISADE (Cousins et al. 2018) employs the convolution sampler and Karney's sampler as its Gaussian samplers over the integers. The convolution sampler is only used in the offline phase, as it does not seem easy to obtain an efficient implementation of it. PALISADE employs Karney's sampler for a more efficient implementation, although Karney's sampler cannot resist timing attacks. Thus, it deserves more effort to design a Gaussian sampler over the integers that is generic, efficient, and resistant to timing attacks.

Our contribution
In this paper, we mainly focus on Gaussian samplers over the integers based on rejection sampling. We first propose an exponential Bernoulli sampling algorithm for sampling exponential Bernoulli variables; it is a basic tool for constructing Gaussian samplers based on rejection sampling. Then we use a discrete Gaussian distribution with a small standard deviation as the base distribution and design an isochronous Gaussian sampler based on rejection sampling. The COSAC sampler is a rejection sampling algorithm using the normal distribution as the base distribution; we reduce its expected number of trials to improve its efficiency. Our contribution is therefore three-fold.
The inspiration for our exponential Bernoulli sampling algorithm comes from von Neumann's algorithm for sampling from the exponential distribution. As far as we know, our work is the first to securely apply this idea to discrete Gaussian sampling to improve the efficiency of cryptographic schemes. Compared with the mainstream method of approximating the exponential function by a polynomial (Zhao et al. 2020b), our algorithm has the following advantages:

• Our algorithm is more efficient while resisting timing attacks in rejection sampling scenarios. On the one hand, we improve performance by significantly reducing floating-point operations. On the other hand, when our algorithm is applied to a rejection sampling algorithm that resists timing attacks, its output depends only on the public rejection rate and its running time is independent of its input, so our algorithm can resist timing attacks.
• We apply our algorithm to FALCON, a lattice-based signature scheme among the third-round candidates in the NIST post-quantum cryptography standardization project (NIST PQ project). When the LDL tree (a data structure that reduces the complexity of Gaussian sampling over rings during signature generation; it depends only on the secret key) has been built upon the secret key, the signature generation time of the new implementation decreases by 13–14%.
We derive the idea of our isochronous Gaussian sampler from Karney's sampler (Karney 2016) and FALCON's sampler (Howe et al. 2020). We employ the rejection sampling strategy of Karney's sampler (i.e., the way of sampling discrete Gaussian distributions with different σ and arbitrary c from the base sampler) to design the sampler, then apply a technique similar to FALCON's sampler to make it isochronous with respect to its inputs (standard deviation σ and center c) and output (sampling value z). There are four differences between the techniques in our work and Howe et al. (2020): (1) we use Algorithm 3 instead of polynomial approximation to sample the exponential Bernoulli variable; (2) we provide two base samplers, the cumulative distribution table (CDT) sampler and the binary sampler (Algorithm 5); (3) to make the rejection rate independent of σ, the parameter C(σ) in Algorithm 4 relies on the ratio σ/σ_0 rather than on σ itself (Lemma 8); (4) we need Algorithm 6 to sample integers uniformly from [0, ⌈k⌉). Our discrete Gaussian sampling algorithm has the following advantages:

• As shown in Table 1, Algorithm 4 has the best performance among the samplers that consider timing attacks.
• Compared with FALCON's sampler, the memory consumption and running time of Algorithm 4 are almost unaffected by the maximum standard deviation σ_max, so it is suitable for large σ. When we use the parameter C(σ) in Algorithm 4 to hide σ, we can adjust C(σ) to the minimum σ required by the cryptographic scheme, so that our sampler performs better.
• We replace Karney's sampler in PALISADE with our Gaussian sampler to speed up the online phase of the MP trapdoor sampler. The running time of the G-lattice sampling algorithm in the online phase decreases by 44.12%.
The core of our new COSAC sampler is to adapt the rejection sampling strategy to the center. The expected number of trials of the new COSAC sampler is half that of the original. As the standard deviation increases, its expected number of trials converges to 1, which is the theoretical optimum for rejection sampling algorithms. However, the new COSAC sampler is only 1.46x–1.63x faster than the original due to the additional operations needed to hide the output. As shown in Table 1, it is not faster than Algorithm 4 and needs a more efficient algorithm for sampling from the normal distribution to improve its performance.

Table 1 Comparison between our work and existing generic^a samplers at 3.6 GHz

a "Generic" means that the precomputation cost does not vary with different σ and c. The precomputation of FALCON's sampler is determined by the maximum standard deviation σ_max.
b In this column, "×" means the sampler cannot resist timing attacks; "Type I" means the sampler is isochronous with respect to σ, c and z; "Type II" means the sampler is isochronous with respect to c and z; "Type III" means the sampler is only isochronous with respect to z.
c We provide two base samplers. Here we use the CDT sampler as the base sampler.
d The reference implementation of the COSAC sampler batches the random number generation to hide its output z, but its rejection rate still seems to depend on c.

Related works
Several works (Forsythe 1972; Ahrens and Dieter 1973; Karney 2016) have employed the idea of von Neumann's algorithm to elegantly sample from continuous probability distributions whose probability density function is proportional to exp(−G(x)), where G(x) is some simple function; the normal distribution is one such distribution. However, it is hard to make these algorithms isochronous. To the best of our knowledge, no prior work has employed this idea to improve the efficiency of cryptographic schemes in practice. Du et al. (2019) proposed a Gaussian sampler over the integers (DWZ sampler) utilizing the rejection sampling strategy of Karney's sampler (Karney 2016). It is similar to Algorithm 4. However, they chose the insecure binary sampler as the base sampler and computed the exponential function directly, so its resistance to side-channel attacks is unclear. Although it is faster than Karney's sampler, it does not solve the crucial security problem of Karney's sampler. In addition, we employ the CDT sampler as the base sampler of Algorithm 4 and obtain better performance. There are also some efficient, though not generic, Gaussian samplers. The CDT sampler and the Knuth-Yao sampler are often used as base samplers. Karmakar et al. (2019) crafted a constant-time and efficient implementation of the Knuth-Yao sampler. The Twin-CDT sampler (Melchor and Ricosset 2018) can sample from discrete Gaussian distributions with fixed σ and arbitrary c, and it is constant-time.

Notations
Let R, Z, N be the sets of real numbers, integers, and non-negative integers respectively. For a distribution D, we use the notation x ←$ D to mean that x is chosen according to the distribution D. If S is a set, then x ←$ S means that x is sampled uniformly at random from S. The left arrow ← denotes the assignment operator. We use the notation B_p to denote the Bernoulli distribution with parameter p. A random variable with distribution B_p takes the value true with probability p and the value false with probability 1 − p. We denote the logarithm with base 2 by log and the one with base e by ln.

Gaussian distributions
For any σ, c ∈ R with σ > 0, the Gaussian function with parameters σ and c over R is defined as ρ_{σ,c}(x) = exp(−(x − c)²/(2σ²)). We denote the normal (or continuous Gaussian) distribution with parameters σ and c over R by N(c, σ²), which has the probability density function ρ_{σ,c}(x)/(σ√(2π)). For any countable set S ⊆ R, we denote by ρ_{σ,c}(S) the sum Σ_{x∈S} ρ_{σ,c}(x). If ρ_{σ,c}(S) is finite, we define the discrete Gaussian distribution with parameters σ and c over S as D_{S,σ,c}(x) = ρ_{σ,c}(x)/ρ_{σ,c}(S), and denote the distribution by D_{S,σ,c}. The parameter σ (resp. c) is often called the standard deviation (resp. center) of the distribution. Note that when c = 0, we omit it in the index notation, e.g. ρ_σ(x) = ρ_{σ,0}(x) and D_{S,σ}(x) = D_{S,σ,0}(x).
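As a concrete illustration of these definitions, the following Python snippet evaluates ρ_{σ,c} and a truncated approximation of D_{Z,σ,c}; the tail cut is ours, for illustration only, and is not part of the definitions above.

```python
import math

def rho(x, sigma, c):
    """Gaussian function rho_{sigma,c}(x) = exp(-(x - c)^2 / (2 sigma^2))."""
    return math.exp(-(x - c) ** 2 / (2 * sigma ** 2))

def discrete_gaussian_pmf(sigma, c, tail=40):
    """Approximate D_{Z,sigma,c} on a truncated support around c
    (illustrative; the tail cut is an assumption for this example)."""
    support = range(round(c) - tail, round(c) + tail + 1)
    total = sum(rho(x, sigma, c) for x in support)   # approximates rho_{sigma,c}(Z)
    return {x: rho(x, sigma, c) / total for x in support}
```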

Smoothing parameter
The smoothing parameter is a lattice quantity proposed by Micciancio and Regev (2004). For ε > 0, the smoothing parameter η_ε(Λ) of a lattice Λ is the smallest value σ > 0 such that ρ_{1/(σ√(2π))}(Λ* \ {0}) ≤ ε, where Λ* denotes the dual of Λ. The initial definition (Micciancio and Regev 2004) of the smoothing parameter is √(2π) times our definition, as it is defined in terms of the Gaussian parameter s = σ√(2π).
Lemma 1 (Micciancio and Regev 2004) For any positive real ε > 0, we have η_ε(Z) ≤ η⁺_ε(Z) := √(ln(2(1 + 1/ε))/2)/π.

The following lemma states that the total Gaussian measure over Z, i.e. ρ_{σ,c}(Z), is essentially the same for any center c when the standard deviation σ exceeds the smoothing parameter η_ε(Z).
Lemma 2 (Micciancio and Regev 2004; Gentry et al. 2008) For any ε ∈ (0, 1), σ > η_ε(Z) and c ∈ R, we have ρ_{σ,c}(Z) ∈ [(1 − ε)/(1 + ε), 1] · ρ_σ(Z).

We need the following lemma to make the running time of a discrete Gaussian sampling algorithm independent of σ. One may obtain a similar result from the proof of Lemma 4.4 in Micciancio and Regev (2004); the difference here is that σ is larger than η⁺_ε(Z) rather than η_ε(Z). We give a brief proof in the "Appendix".

Rényi divergence
We recall the definition of the Rényi divergence, which is now extensively used in cryptographic security proofs.
Definition 1 (Bai et al. 2015) Let P, Q be two distributions such that Supp(P) ⊆ Supp(Q). For a ∈ (1, +∞), we define the Rényi divergence of order a by

R_a(P, Q) = ( Σ_{x∈Supp(P)} P(x)^a / Q(x)^{a−1} )^{1/(a−1)}.

In addition, we define the Rényi divergence of order +∞ by

R_∞(P, Q) = max_{x∈Supp(P)} P(x)/Q(x).

The Rényi divergence is not a distance, but it does satisfy cryptographically useful properties. We recall some important properties of the Rényi divergence from Bai et al. (2015) and van Erven and Harremoës (2014).
Lemma 4 (Bai et al. 2015) Let a ∈ (1, +∞]. Let P and Q denote distributions with Supp(P) ⊆ Supp(Q). Then the following properties hold:

• Data processing inequality. For any function f, where f(P) (resp. f(Q)) denotes the distribution of f(y) induced by sampling y from P (resp. Q), we have R_a(f(P), f(Q)) ≤ R_a(P, Q).
• Multiplicativity. For two families of independent distributions (P_i)_i and (Q_i)_i, we have R_a(∏_i P_i, ∏_i Q_i) = ∏_i R_a(P_i, Q_i).
• Probability preservation. For any event E ⊆ Supp(Q) and a ∈ (1, +∞), Q(E) ≥ P(E)^{a/(a−1)} / R_a(P, Q), and Q(E) ≥ P(E)/R_∞(P, Q).

In practice, when we approximate the original distribution D_{Z,σ,c} by sampling from a finite set, we can use the following lemma to bound the Rényi divergence between the real and ideal distributions.

Lemma 5 (Prest 2017) Let
When we use the double type or 64-bit integers to approximate floating-point numbers, which results in the relative error between the real and ideal distributions, the following lemma gives a bound of the R ényi divergence between the distributions.
Lemma 6 (Prest 2017) Let P and Q be two distributions with the same support S. Suppose that the relative error between P and Q is bounded: there exists δ > 0 such that ∀x ∈ S, |P(x)/Q(x) − 1| ≤ δ. Then, for a ∈ (1, +∞), R_a(P, Q) ≤ (1 + a(a − 1)δ²/(2(1 − δ)^{a+1}))^{1/(a−1)} ≈ 1 + aδ²/2 for small δ.

Isochronous algorithm
When an algorithm is applied to different scenarios, its sensitive variables may differ. To resist timing attacks, we only need to keep the running time of the algorithm from leaking the sensitive parameters of the particular scenario. In this work, when we show that our algorithms are provably resistant to timing attacks, we use the following definition of an isochronous algorithm to capture this idea.
Definition 2 (Howe et al. 2020) Let A be a (probabilistic or deterministic) algorithm with a set of input variables I, a set of output variables O, and let S ⊆ I ∪ O be the set of sensitive variables. We say that A is perfectly isochronous with respect to S if its running time is independent of every variable in S.
In addition, we say that A is statistically isochronous with respect to S if there exists a distribution D independent of all the variables in S , such that the running time of A is statistically close (for a clearly identified divergence) to D.
In this paper, we follow the implementations of FALCON and previous Gaussian samplers (Howe et al. 2020; Prest et al. 2020) and assume that the four basic operations over integers, the addition, subtraction, and multiplication over floating-point numbers, and the rounding, ceiling, floor, and bitwise operations are isochronous. For lattice-based trapdoor sampling, the short lattice basis determines the σ of the Gaussian distribution over the integers, while c is usually variable. As floating-point division rarely offers constant-time execution guarantees (Zhao et al. 2020b), we precompute the parameters 1/σ, k, ⌈k⌉ and P_k in Algorithm 4 and Algorithm 6. We can also precompute S = ρ_{σ,c}(Z) in Algorithm 8 by Lemma 2. While the implementation of FALCON's sampler only precomputes 1/σ, Algorithm 4 may have to precompute a few more parameters, but its memory consumption is still tolerable (see Table 1).

Von Neumann's algorithm
Von Neumann's algorithm (Forsythe 1972) can sample a variable from the exponential distribution while avoiding floating-point operations. It is the foundation of our exponential Bernoulli sampling algorithm. We review it as Algorithm 1.

Lemma 7
The variable x output by Algorithm 1 obeys the exponential distribution.
Proof We first analyze the probability that n is odd in step 7 of Algorithm 1 for any x_2 ∈ [0, 1). If n = n_0 in step 7, there are n_0 + 1 variables u_1, u_2, ..., u_{n_0+1} uniformly sampled from [0, 1) in step 2 and step 5. These variables satisfy x_2 > u_1 > ... > u_{n_0} and u_{n_0} ≤ u_{n_0+1}. The probability that u_1, ..., u_{n_0} are all less than x_2 is x_2^{n_0}. In addition, the probability that they are in descending order, which is one of the n_0! possible permutations, is x_2^{n_0}/n_0!. As u_{n_0} ≤ u_{n_0+1}, the probability that n = n_0 in step 7 is x_2^{n_0}/n_0! − x_2^{n_0+1}/(n_0 + 1)!. For any x_2 ∈ [0, 1), the probability that n is even in step 7 is therefore Σ_{i≥0} (x_2^{2i}/(2i)! − x_2^{2i+1}/(2i + 1)!) = exp(−x_2). Because x_2 is sampled uniformly from [0, 1), the probability that n is odd averaged over x_2 is ∫_0^1 (1 − exp(−x_2)) dx_2 = exp(−1). Thus, the probability density that the algorithm terminates with a particular value of x_1 and x_2 is exp(−(x_1 + x_2)), as required.
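For concreteness, here is a minimal Python sketch of von Neumann's algorithm as described in the proof above; the loop structure is our reading of Algorithm 1, whose pseudocode is not reproduced in this section.

```python
import random

def von_neumann_exponential():
    """Sample from the exponential distribution Exp(1) using only
    uniform draws and comparisons, with no transcendental functions."""
    x1 = 0                            # integer part, incremented on rejection
    while True:
        x2 = random.random()          # step 2: x2 <- [0, 1)
        prev, n = x2, 0
        while True:
            u = random.random()       # step 5: next uniform u_{n+1}
            if u >= prev:             # descending run x2 > u1 > ... ends here
                break
            prev, n = u, n + 1
        if n % 2 == 0:                # step 7: accept iff n is even
            return x1 + x2            # density proportional to exp(-(x1 + x2))
        x1 += 1                       # n odd: reject x2 and retry
```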

A tool: exponential Bernoulli sampling algorithm
In this section, we propose a tool, the exponential Bernoulli sampling algorithm, for constructing Gaussian samplers based on rejection sampling. We first introduce the basic algorithm and analyze its relative error. Then we show how to make it isochronous in the particular application. Finally, we apply it to the FALCON signature.

Basic exponential Bernoulli sampling algorithm
When sampling a Bernoulli variable with parameter exp(−x), the polynomial approximation method needs 14 floating-point multiplications, of which 12 are used to compute the exponential function (Zhao et al. 2020b; Prest et al. 2020). To avoid these 12 floating-point multiplications, we can sample the Bernoulli variable using steps 2-6 of Algorithm 1 for any x ∈ [0, 1). For any x ≥ 0, write x = u_1 ln 2 + u_2 where u_1 is a non-negative integer and 0 ≤ u_2 < ln 2; then exp(−x) can be written as exp(−x) = 2^{−u_1} · exp(−u_2). The first factor 2^{−u_1} can be sampled efficiently in binary form, and the second factor exp(−u_2) can also be sampled efficiently since u_2 ∈ [0, 1). We present the basic exponential Bernoulli sampling algorithm in Algorithm 2.
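The following Python sketch illustrates this decomposition; it reflects our reading of Algorithm 2's structure and assumes ideal uniform reals rather than the m-bit approximations analyzed below.

```python
import math, random

LN2 = math.log(2.0)

def bernoulli_exp_basic(x):
    """Return True with probability exp(-x) for x >= 0, using
    exp(-x) = 2^(-u1) * exp(-u2) with u1 in N and u2 in [0, ln 2)."""
    u1 = int(x // LN2)
    u2 = x - u1 * LN2
    # b1 is true with probability 2^(-u1): u1 fair coin flips, all zero
    b1 = all(random.getrandbits(1) == 0 for _ in range(u1))
    # b2 is true with probability exp(-u2): von Neumann comparison chain
    # (steps 5-9 of Algorithm 2, mirroring steps 2-6 of Algorithm 1)
    r2, n = u2, 0
    while True:
        r3 = random.random()
        if r3 >= r2:
            break
        r2, n = r3, n + 1
    b2 = (n % 2 == 0)
    return b1 and b2
```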
Theorem 1 The output b of Algorithm 2 obeys the Bernoulli distribution with parameter exp(−x).
Proof The probability that the variable b_1 is true is 2^{−u_1}. Because steps 5-9 of Algorithm 2 are equivalent to steps 2-6 of Algorithm 1, the probability that the variable b_2 is true is exp(−u_2) according to the proof of Lemma 7. As x = u_1 ln 2 + u_2, the probability that the output b is true is 2^{−u_1} · exp(−u_2) = exp(−x).
In the next theorem, we estimate the expected number of samples of r_3, which is closely related to the efficiency of Algorithm 2.
Theorem 2 For any u_2 ∈ [0, ln 2), in one run of Algorithm 2, the expected number S of samples of r_3 is less than 2.
Proof The probability that n = n_0 in step 10 of Algorithm 2 is u_2^{n_0}/n_0! − u_2^{n_0+1}/(n_0 + 1)!. If n = n_0 in step 10, the number of samples of r_3 is n_0 + 1. The expected number S of samples of r_3 can therefore be written as S = Σ_{n_0≥0} (n_0 + 1)(u_2^{n_0}/n_0! − u_2^{n_0+1}/(n_0 + 1)!) = exp(u_2) + u_2 exp(u_2) − u_2 exp(u_2) = exp(u_2) < 2, where the last inequality is deduced from u_2 < ln 2.
When we prove the correctness of Algorithm 2 in Theorem 1, we treat it as an ideal algorithm that samples the variables b_1 and b_2 with the exact parameters 2^{−u_1} and exp(−u_2) = exp(−x + u_1 ln 2). In practice, it is necessary to bound the relative error of Algorithm 2 due to the approximations of u_2 and r_3.
Theorem 3 Let P^{(R)} be the probability that the output b of the real algorithm is true, where the floating-point numbers u_2, r_2, r_3 of Algorithm 2 keep m (m ≥ 5) significant bits. Let P^{(I)} be the probability that the output b of the ideal algorithm is true, where u_2, r_2, r_3 have infinitely many significant bits. Then the relative error between P^{(R)} and P^{(I)} can be bounded in terms of m.

Proof Let P_1^{(R)} and P_2^{(R)} be the probabilities that the variables b_1 and b_2 of the real algorithm are true respectively, and P_1^{(I)} and P_2^{(I)} the corresponding probabilities for the ideal algorithm. Because u_1 is an integer and the sampling process of b_1 (steps 2-3) only involves integer operations, P_1^{(R)} and P_1^{(I)} are exactly equal. Thus, the relative error between P^{(R)} and P^{(I)} is the same as that between P_2^{(R)} and P_2^{(I)}.

The difference between P_2^{(R)} and P_2^{(I)} stems from the approximations of u_2 and r_3. We define an intermediate algorithm where u_2 has m significant bits while r_2 and r_3 have infinitely many bits, and let P_2^{(IN)} be the probability that the variable b_2 in the intermediate algorithm is true. Then we have |P_2^{(R)} − P_2^{(I)}| ≤ |P_2^{(R)} − P_2^{(IN)}| + |P_2^{(IN)} − P_2^{(I)}|. As u_2 retains m significant bits and u_2 ∈ [0, ln 2), the difference between the approximate number u_2^{(R)} and the accurate number u_2^{(I)} is not greater than 2^{−m}. We next analyze the while loop (steps 6-9) of Algorithm 2. For the i-th iteration of the loop (i ≥ 1), we denote by A_i the event r_2 = r_3, by B_i the event r_2 > r_3, and by C_i the event r_2 < r_3. In the intermediate algorithm, the uniformly random floating-point number r_3^{(I)} has infinitely many significant bits, so event A_i never happens for any i ≥ 1. In the real algorithm, both r_2^{(R)} and r_3^{(R)} have m significant bits and r_3^{(R)} is sampled uniformly, so the probability of event A_i is exactly 2^{−m} for any i ≥ 1. Our goal is to bound the probability sum of all the A_i in the real algorithm, which upper-bounds |P_2^{(R)} − P_2^{(IN)}|; this sum is no more than u_2^{(R)} · 2^{−m}. As u_2^{(I)} ∈ [0, ln 2) and m ≥ 5, combining these constraints bounds the relative error between P^{(R)} and P^{(I)}, which concludes the proof.
In our implementation of Algorithm 2, we use the double type following the IEEE-754 standard for floating-point arithmetic and 64-bit integers to approximate the floating-point numbers u_2, r_2, r_3, so the number m of significant bits is at least 52. The relative error of our implementation is no more than 2^{−48}.

Isochronous exponential Bernoulli sampling algorithm
In this subsection, we focus on how to design an isochronous exponential Bernoulli sampling algorithm to resist timing attacks. It seems difficult to make the running time of Algorithm 2 independent of its input and output while retaining its efficiency, because the number of samples of r_3, which determines the value of b_2, is closely related to the running time of Algorithm 2. To overcome this obstacle, we restrict the application of our algorithm to rejection sampling algorithms. In these applications, the distribution of the output b relies only on the rejection probability of the rejection sampling algorithm. For a rejection sampling algorithm that resists timing attacks, the rejection probability cannot reveal information about the sensitive variables, so there is no need to hide the value of the output b. In Algorithm 3, we successfully make the running time independent of the input while retaining efficiency.
Theorem 4 The output of Algorithm 3 obeys the Bernoulli distribution with parameter exp(−x).
Proof After step 3, r_1 is a uniformly random integer in [0, 2^{u_1} − 1], so the probability that b_1 is true is 2^{−u_1}. There are two cases where b_2 is true: the first is u_2 < r_3 in step 7, and the second is u_2 > r_3 in step 7 and n even in step 12. As u_2 < t, their probabilities are 1 − u_2 and Σ_{i≥1} (u_2^{2i}/(2i)! − u_2^{2i+1}/(2i + 1)!) respectively. Adding the two probabilities, the probability that b_2 is true is Σ_{i≥0} (u_2^{2i}/(2i)! − u_2^{2i+1}/(2i + 1)!) = exp(−u_2), which concludes the proof.
Theorem 5 Algorithm 3 is perfectly isochronous with respect to its input x. Moreover, in one run of Algorithm 3, the expected number S of samples of r_3 is exp(t).
Proof We need to prove that in Algorithm 3, u_1 and u_2 are respectively independent of the running times of sampling b_1 and b_2. The sampling of b_1 consists of one uniform sampling, one AND operation, and one comparison, which is independent of u_1. The sampling of b_2 consists of the uniform samplings of r_3, the comparisons between r_2 and r_3, and one comparison between u_2 and r_3. The number of samples of r_3 equals the number of comparisons between r_2 and r_3, and its distribution depends on t rather than on u_2. As in the proof of Theorem 2, the expected number S of samples of r_3 is Σ_{n_0≥0} (n_0 + 1)(t^{n_0}/n_0! − t^{n_0+1}/(n_0 + 1)!) = exp(t).

In our implementation of Algorithm 3, we set t = 178 · 2^{−8} to avoid floating-point errors. So the relative error of Algorithm 3 stems from the approximations of u_2 and r_3, just as for Algorithm 2. With almost the same proof as Theorem 3, we obtain the same upper bound on the relative error of Algorithm 3 in practice.
Laziness technique. When we use 64-bit integers to represent r_1, r_2, and r_3, at most 64 · (1 + e) bits of randomness are required on average during each run of Algorithm 3. These integers are only involved in comparison operations, so we can employ the laziness technique (Ducas and Nguyen 2012; Howe et al. 2020) to speed up the implementation of Algorithm 3. When at least one of two compared 64-bit integers is sampled uniformly, the comparison can be decided by the first i significant bits, except with probability 2^{−i} (exactly when the first i bits of the two integers match). It is therefore unnecessary to sample all bits of the integers at once. In our implementation, each 64-bit integer is sampled 8 bits at a time until the comparison is decided or all 64 bits have been sampled. It is easy to estimate that each sampled integer needs fewer than 9 random bits on average in this way, so the expected number of random bits in one run of Algorithm 3 is less than 9 · (1 + e) with this technique. Note that the random number r_3 in step 6 of Algorithm 3 has to be compared with both t and u_2. We need to choose the value of the public parameter t carefully to ensure that these two comparisons with the laziness technique do not reveal information about u_2; this is another reason why we set t = 178 · 2^{−8}. The comparison between r_3 and 178 · 2^{−8} only needs the first 8 bits. If r_3 ≥ 178 · 2^{−8}, we can determine r_3 > ln 2 > u_2 from the first 8 bits. Otherwise, r_3 is a uniform random number in [0, 178 · 2^{−8}). In this case, for the first 8 bits of r_3 and u_2, the probability that they are equal is 1/178. For the i-th bits of r_3 and u_2 (i ≥ 9), the probability that they are equal is 1/2. Therefore the value of u_2 does not affect the running time of the two comparisons.
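To illustrate how running the von Neumann chain against the public constant t hides u_2, here is a Python sketch of our reading of Algorithm 3; the exact step structure of Algorithm 3 is assumed, and the laziness technique and 64-bit fixed-point representation are omitted for clarity.

```python
import math, random

T = 178 / 256.0          # public threshold t = 178 * 2^-8 > ln 2
LN2 = math.log(2.0)

def bernoulli_exp_iso(x):
    """Return True with probability exp(-x) for x in [0, 64*ln 2].
    The comparison chain always runs against T, so the number of
    uniform draws is independent of the secret input x."""
    u1 = int(x // LN2)
    u2 = x - u1 * LN2                     # u2 in [0, ln 2) < T
    # b1 is true with probability 2^(-u1)
    r1 = random.getrandbits(64) & ((1 << u1) - 1)
    b1 = (r1 == 0)
    # Descending chain below T: expected number of draws is exp(T).
    r2, n, first = T, 0, None
    while True:
        r3 = random.random()
        if first is None:
            first = r3                    # first draw, compared with u2 once
        if r3 >= r2:
            break
        r2 = r3
        n += 1
    # b2 is true with probability exp(-u2): either the first draw already
    # exceeds u2 (run of length 0 w.r.t. u2), or the descending run below
    # u2 (which then coincides with the run below T) has even length.
    b2 = (u2 < first) or (n % 2 == 0)
    return b1 and b2
```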

Applications and performances
Our exponential Bernoulli sampling algorithm can be applied to Gaussian samplers based on rejection sampling. In this section, we choose FALCON, a lattice-based signature scheme among the third-round candidates in the NIST PQ project, as an example application of Algorithm 3. The Gaussian sampler in FALCON is based on rejection sampling, so it needs to sample an exponential Bernoulli variable B_p with p = exp(−x) (x ≥ 0). Under the parameter sets of FALCON, we can assume x ∈ [0, 64 · ln 2]. To achieve the security goal of FALCON, the relative error between the ideal and real probabilities cannot be greater than 2^{−43} (Howe et al. 2020; Prest et al. 2020). The relative error of our algorithm is not greater than 2^{−48}, which satisfies the requirement of FALCON. The latest implementations of FALCON use a polynomial approximation P_{facct}(x) (Zhao et al. 2020b) of the exponential function exp(x) on [−ln 2, 0] to sample B_p.
In the round-3 NIST package, FALCON contains optimized implementations at two security levels, with three implementations at each level. Because we do not consider instruction-set-specific optimizations, we choose the ones relying on the IEEE-754 floating-point type for our benchmark tests. They work on x86, ARM, and POWER/PowerPC systems.
We apply Algorithm 3 to FALCON and compile the implementations with the compiler option -O3 enabled. The benchmark results are shown in Tables 2 and 3. The new discrete Gaussian sampling implementation is about 1.19x faster than the original. When the LDL tree has been built upon the secret key, the new signature generation implementation is about 1.15x-1.16x faster than the original. In this paper, all experiments are performed on a single Intel Core i7-4790 CPU core at 3.6 GHz.
The implementation of FALCON does not offer an interface for invoking the algorithm that samples the exponential Bernoulli variable, so we separate this module and compare it with Algorithm 3. We choose ChaCha20 from OpenSSL and the AES256 counter mode with hardware AES instructions (AES-NI) as the pseudorandom number generators (PRNG) and use g++ 7.5.0 to compile our implementations with the compiler options -O3 and -maes enabled. Table 4 lists the efficiencies of the different implementations. With ChaCha20 from OpenSSL as the PRNG, Algorithm 3 is 1.25x faster than the algorithm in FALCON; with AES-NI, it is 1.49x faster. The reason is that Algorithm 3 avoids a large number of floating-point multiplications but needs more random numbers, so the efficiency of the PRNG has more influence on Algorithm 3 than on the algorithm in FALCON.

Sampling integers using a discrete Gaussian distribution with a small standard deviation
In this section, we first introduce our new Gaussian sampler based on rejection sampling and analyze its correctness and requirements. Then we use our tool, Algorithm 3, to implement the sampler and make it isochronous with respect to its inputs and output. Finally, we compare our implementations with previous works and apply the sampler to the lattice-based cryptography library PALISADE.

New discrete Gaussian sampling algorithm
We build our Gaussian sampler by employing the rejection sampling strategy of Karney's sampler and present it as Algorithm 4. The rejection sampling strategy of Karney's sampler can be viewed as a generalization of that of the Gaussian sampler in the BLISS signature scheme (BLISS sampler) (Ducas et al. 2013): Karney's strategy is suitable for distributions with arbitrary σ > σ_0 and c, while the BLISS strategy is only suitable for distributions where σ is an integral multiple of σ_0 and c = 0. Du et al. (2019) proposed a similar algorithm based on the rejection sampling strategy of Karney's sampler; although it is faster than Karney's sampler, it does not solve the crucial security problem of Karney's sampler. To make Algorithm 4 resistant to timing attacks without affecting the security of cryptographic schemes, we need to craft the modules in Algorithm 4, especially the base sampler (step 3) and the algorithm sampling the exponential Bernoulli variable (step 9). In addition, we use C(σ) ∈ (0, 1] to make the rejection rate of Algorithm 4 independent of σ. These are the steps necessary to make Karney's sampler practical, and they are our main contributions in this section. To prove the correctness of Algorithm 4, two issues must be addressed: (1) after step 10 and step 11 in Algorithm 4, each integer z* can be produced by exactly one tuple (x*, y*, s*); (2) the probability of sampling z* must be exactly proportional to ρ_{Z,σ,c}(z*). Because our rejection sampling strategy is the same as that of Karney's sampler, the idea of the proof is implicit in Karney (2016) and Du et al. (2019). We provide the detailed proof in the "Appendix" for completeness.
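For intuition, the following Python sketch implements the Karney-style rejection strategy that Algorithm 4 builds on, reconstructed from Karney (2016). The candidate construction and acceptance probability are our reading of that strategy, and the sketch deliberately omits the paper's hardening: the C(σ) factor, the isochronous exponential Bernoulli sampler (Algorithm 3), and the isochronous base and uniform samplers (Algorithms 5 and 6).

```python
import math, random

def base_sample(sigma0, tail=12):
    """Toy base sampler for D_{N, sigma0} by table inversion
    (the paper uses an isochronous CDT or binary sampler instead)."""
    rho = [math.exp(-i * i / (2 * sigma0 ** 2)) for i in range(tail + 1)]
    r = random.random() * sum(rho)
    for i, p in enumerate(rho):
        r -= p
        if r < 0:
            return i
    return tail

def karney_style_sample(sigma, c, sigma0=1.0):
    """Sample z ~ D_{Z, sigma, c} for sigma > sigma0 via Karney's strategy:
    write s*(z - c)/k = x + frac with x ~ D_{N, sigma0} and frac in [0, 1),
    where k = sigma/sigma0, then accept with the leftover Gaussian factor."""
    k = sigma / sigma0
    ceil_k = math.ceil(k)
    while True:
        x = base_sample(sigma0)                      # x ~ D_{N, sigma0}
        s = 1 if random.getrandbits(1) else -1       # uniform sign
        y = random.randrange(ceil_k)                 # y uniform in [0, ceil(k))
        i0 = math.ceil(k * x + s * c)
        frac = (i0 + y - (k * x + s * c)) / k        # in [0, ceil(k)/k)
        if frac >= 1:
            continue                                 # candidate out of range
        if s == -1 and x == 0 and frac == 0:
            continue                                 # avoid double-counting z
        # Accept with probability exp(-frac*(2x + frac)/(2*sigma0^2)), so the
        # overall weight is proportional to exp(-(z - c)^2 / (2*sigma^2)).
        if random.random() < math.exp(-frac * (2 * x + frac) / (2 * sigma0 ** 2)):
            return s * (i0 + y)
```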

Theorem 6
The output z sampled by Algorithm 4 is distributed as D_{Z,σ,c}.
The floating-point error in Algorithm 4 causes the error of the Bernoulli variable B_{p_rej}, so the error of Algorithm 4 stems from the approximate distribution D^{(R)}_{N,σ_0} and the approximate Bernoulli variable B^{(R)}_{p_rej}, as for FALCON's sampler. We utilize the technique in Howe et al. (2020) to estimate the security loss of a cryptographic scheme invoking Algorithm 4 in practice.
Theorem 7 Let λ^{(I)} (resp. λ^{(R)}) be the security parameter of an implementation using the ideal distributions D^{(I)}_{N,σ_0} and B^{(I)}_{p_rej} (resp. the real distributions D^{(R)}_{N,σ_0} and B^{(R)}_{p_rej}). Let the numbers of samples from D_{N,σ_0} and B_{p_rej} be Q_bs and Q_exp. If the conditions below are respected, at most two bits of security are lost; in other words, λ^{(I)} − λ^{(R)} ≤ 2.

In the proof of Theorem 7, we first bound the Rényi divergence between B^{(R)}_{p_rej} and B^{(I)}_{p_rej} by Lemma 6 and the first condition of Theorem 7. Then, combining Lemma 4 with this upper bound, we can estimate the security loss. We provide the detailed proof in the "Appendix". As suggested by the NIST PQ project, an attacker can make no more than 2^64 signature or decryption queries, and the lattice dimension of a cryptographic scheme is usually less than 2^12. Therefore, it is reasonable to assume that a cryptographic scheme calls Algorithm 4 fewer than 2^76 times in most applications. According to the acceptance probability of Algorithm 4 in Lemma 8, if σ_0 > 1/√(2 ln 2), the expected number of trials of Algorithm 4 is no more than 3. Then we have Q_bs = Q_exp ≤ 3 · 2^76 ≤ 2^78. If a cryptographic scheme invoking Algorithm 4 has 256-bit security, the concrete numerical values of the conditions in Theorem 7 can be computed accordingly.

Isochronous implementations
In this subsection, we first analyze the conditions that make Algorithm 4 isochronous with respect to the standard deviation σ, the center c, and its output z, then introduce how to implement Algorithm 4 and select appropriate parameters to satisfy those conditions.
To obtain an isochronous implementation of Algorithm 4, there are five necessary conditions. An implementation isochronous with respect to z and c needs to satisfy the first three conditions; an implementation isochronous with respect to z, c, and σ needs to satisfy all five. For the first condition, we adopt two samplers to instantiate Algorithm 4. The first one is the CDT sampler. We precompute the approximate cumulative distribution table of D_{N,1} with 80-bit precision, shown in Table 5. To produce a sample, we generate a random value r in [0, 1) with the same precision and return the index of the last entry in the table that is greater than r. To make the running time isochronous with respect to the output, the CDT sampler has to read the entire table and compare r with each entry. For any a ≤ 512, our experiment shows that the Rényi divergence R_a between the real distribution D^{(R)}_{N,1} and the ideal distribution D^{(I)}_{N,1} is bounded by 1 + 2^{−80}, which satisfies the second condition in Theorem 7. The advantage of the CDT sampler is its very attractive performance.
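A minimal Python sketch of this full-scan CDT sampler follows; it is illustrative only, with a tail cut and float precision of our choosing standing in for the 80-bit fixed-point table of Table 5.

```python
import math, random

SIGMA0 = 1.0
TAIL = 10  # illustrative tail cut for the base distribution D_{N, SIGMA0}

# rho[i] proportional to exp(-i^2 / (2*SIGMA0^2)) for i = 0..TAIL
_rho = [math.exp(-i * i / (2 * SIGMA0 ** 2)) for i in range(TAIL + 1)]
_total = sum(_rho)
# TABLE[i] = Pr[X > i]: strictly decreasing tail probabilities
TABLE = [sum(_rho[i + 1:]) / _total for i in range(TAIL + 1)]

def cdt_sample():
    """Full-scan CDT sampler for D_{N, SIGMA0}: every call reads the whole
    table, so the memory access pattern and the number of comparisons are
    independent of the returned value z."""
    r = random.random()                  # the paper uses 80-bit fixed point
    z = 0
    for t in TABLE:
        z += int(t > r)                  # branchless-style accumulation
    return z
```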
The other is the binary sampler, introduced with the BLISS sampler (Ducas et al. 2013). It samples from the distribution D_{N,1/√(2 ln 2)}. In Algorithm 5, we show how to make the binary sampler isochronous with respect to its output: we cut the tail of the distribution and only sample non-negative integers no greater than 8, and the while loop in Algorithm 5 never terminates early even if the sample has already been determined. The Rényi divergence between the resulting real distribution and the ideal one is bounded by 1 + 2^{−80}, satisfying the second condition in Theorem 7. The advantage of the binary sampler is that it does not need any precomputation.
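For reference, here is the classic restart-based binary sampler that Algorithm 5 hardens; this is a sketch of the BLISS-style sampler, not of the isochronous Algorithm 5 itself, which additionally cuts the tail at 8 and always runs its loop to completion.

```python
import random

def binary_half_gaussian():
    """Sample i >= 0 with probability proportional to 2^(-i^2), i.e. from
    the half-Gaussian D_{N, 1/sqrt(2 ln 2)}, using only random bits.
    Uses i^2 = 1 + 3 + ... + (2i-1): advancing from stage i to stage i+1
    succeeds with probability 2^(-(2i+1))."""
    while True:                              # restart on failure
        i = 0
        while True:
            if random.getrandbits(1) == 0:
                return i                     # stop at stage i w.p. 1/2
            if i > 0 and random.getrandbits(2 * i) != 0:
                break                        # the 2i extra bits were not all 0
            i += 1
```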
For the second condition, since C(σ) ∈ (0, 1], it is straightforward to sample the exponential Bernoulli variable B_{p_rej} through Algorithm 3. The relative error of Algorithm 3 is no more than 2^{−48}, meeting the first condition in Theorem 7.
For the fourth condition, we use Algorithm 6 to sample y. In our implementation, the public parameter h equals 32. The acceptance probability of Algorithm 6 is 0.5, so its running time is independent of k.
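One plausible realization of such a rejection-based uniform sampler is sketched below. This is our reconstruction, not necessarily the paper's Algorithm 6; in particular, the interpretation of the precomputed value P_k is an assumption, chosen so that the acceptance probability is essentially 1/2 for every k.

```python
import random

H = 32  # public parameter h from the paper

def precompute_p_k(ceil_k):
    # Largest multiple of ceil_k not exceeding 2^(H-1), so that an
    # accepted r is uniform over a multiple of ceil_k values.
    return ceil_k * ((1 << (H - 1)) // ceil_k)

def uniform_below(ceil_k, p_k):
    """Sample y uniformly from [0, ceil_k) by rejection from [0, 2^H).
    p_k is precomputed once per distribution; each trial costs the same
    operations regardless of k, and the acceptance probability p_k / 2^H
    is within ceil_k / 2^H of 1/2."""
    while True:
        r = random.getrandbits(H)
        if r < p_k:
            return r % ceil_k
```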
Both the third and fifth conditions are related to the acceptance probability of Algorithm 4. We utilize the technique in FALCON's sampler to satisfy them. In the following lemma, we prove that σ ≥ η⁺_ε(Z) is sufficient for the independence between c and the acceptance probability of Algorithm 4. Moreover, if σ ≥ tσ_0, where t and σ_0 are public parameters, and C(σ) in Algorithm 4 is given an appropriate value, the acceptance probability does not rely on σ.
In the remainder of this subsection, we analyze the influence of the acceptance probability P_true(σ, c) on the running time of Algorithm 4 and on the adversary's advantage. We only consider scenarios where both σ and c are secret. If σ is public, we can ignore the last two conditions; this means we do not need Algorithm 6 or the parameter C(σ) to hide σ. In this case, we can use a similar method to analyze the success probability of the adversary via the first part of Lemma 8; we omit the details here. In the following Theorem 8, we show that Algorithm 4 is perfectly isochronous with respect to z and statistically isochronous, for the Rényi divergence, with respect to σ and c.
Theorem 8 Let ε ∈ (0, 1), c ∈ R, k = σ/σ_0, and let t be a positive integer. Let σ_0, σ be the standard deviations such that the associated constants (involving 2(t + 1)ρ_{σ_0}(N)) lie in (0, 1). Suppose that the implementation of Algorithm 4 satisfies the first, second, and fourth conditions. Then its running time follows a distribution T(σ, c) such that R_a(T(σ, c), T) is bounded, for some distribution T independent of σ, c, and z.
In the following Theorem 9, we leverage Theorem 8 to prove that the running time of Algorithm 4 does not help an adversary to break the cryptographic scheme. As is analyzed in Howe et al. (2020), we consider that the adversary has access to some function g(D Z,σ ,c ) as well as the running time of Algorithm 4. This is intended to capture the fact that in most applications, the output of Algorithm 4 is not given directly to the adversary, but processed by some function g before.

Theorem 9
Consider an adversary A making Q_s queries to g(D_{Z,σ,c}) for some randomized function g, and solving a search problem with success probability 2^{−λ} for some λ ≥ 1. With the notations of Theorem 8, suppose that ε ≤ 1/√(24 Q_s). Then learning the running time of each call to Algorithm 4 does not increase the success probability of A by more than a constant factor.
To prove Theorem 8, we need to analyze the running time T_0 of one iteration of the while loop in Algorithm 4 and the number I(σ, c) of iterations. Let I be the number of iterations in the ideal case. I(σ, c) (resp. I) is determined by the probability P_true(σ, c) (resp. p). As the floating-point operations are isochronous and the first, second, and fourth conditions are satisfied, T_0 follows a distribution independent of σ, c, and z. By Lemma 4, we have R_a(T(σ, c), T) = R_a(I(σ, c), I).
We can deduce the upper bound of R_a(I(σ, c), I) from the relation P_true(σ, c) ∈ p · [1 − 2ε, 1 + 2ε] and prove the first part of the inequality. The second part can be derived from p ∈ [0.34, 0.36]. The proof of Theorem 9 requires flexible application of the three properties in Lemma 4. We put the detailed deductions in the "Appendix".

Applications and performances
To simplify the description, we use Type I isochrony and Type II isochrony to indicate whether an implementation hides the information of σ. The two isochrony types satisfy the requirements of the GPV trapdoor sampler and the MP trapdoor sampler respectively. We offer four different implementations of Algorithm 4. The implementations with Type I isochrony are the same as those with Type II isochrony except that they use Algorithm 6 and the parameter C(σ) to hide σ. The base samplers are the CDT sampler and the binary sampler. The binary sampler does not need any precomputation, while the CDT sampler needs to store the cumulative distribution table in Table 5. Each 80-bit integer is represented by two 64-bit integers in our implementations, so the table consumes 20 · 8 = 160 bytes. All four implementations need at most several hundred bytes of memory, including other precomputed values such as k and 1/(2σ²). We choose the AES256 counter mode with the AES-NI instruction set as the PRNG and use g++ 7.5.0 to compile the four implementations of Algorithm 4 with the compiler options -O3 and -maes enabled. In each test, we produce 10^4 random centers c in [0, 1) for each σ, then generate 10^3 samples with the same σ and c and measure the consumed time. We calculate the benchmark results in Table 6 from the measured times. The implementations based on the CDT sampler are a little faster than those based on the binary sampler.
In the implementations with Type I isochrony, we set the parameter C(σ) equal to (2⌈k⌉)/(3k) for σ ≥ 2σ_0 (σ_0 in our implementations is 1 or 1/√(2 ln 2)). This causes a significant decline in performance. If σ is far greater than σ_0, we can adjust C(σ) to obtain better performance. For example, if σ ≥ 32σ_0, we can set C(σ) equal to (32⌈k⌉)/(33k); the implementations based on the binary sampler and the CDT sampler then generate 9.68 × 10^6 and 10.23 × 10^6 samples per second for σ = 32.
We summarize the performance of previous works in Table 7, scaling all numbers to a 3.6 GHz base. The Twin-CDT sampler (Melchor and Ricosset 2018) is isochronous with respect to c and z. It offers three different tradeoffs between running time and precomputation storage. Although it is at least two times faster than our implementations with Type II isochrony for σ ∈ [2, 32], it has two serious problems. First, it is not generic: distributions with different σ need different precomputations. This reduces the portability of the algorithm and requires precomputation for n distributions with different σ in the GPV trapdoor sampler, where n is the dimension of the lattice. Second, its precomputation storage increases significantly as σ increases, so it is not suitable for distributions with large σ. The convolution sampler (Micciancio and Walter 2017) satisfies Type I isochrony. Our implementations with Type I isochrony are at least four times faster than the convolution sampler. If we only measure the time of the online phase, our implementations generate the base samples during the offline phase. We can assume σ ≥ 13 > 4√2 · η_ε(Z) for reasonable ε, just as in the benchmark test of Micciancio and Walter (2017) (note that the smoothing parameter in Micciancio and Walter (2017) is √(2π) times that in this paper). With C(σ) = (13⌈k⌉)/(14k), our implementations with Type I isochrony generate 12.43 × 10^6 and 13.11 × 10^6 samples per second for σ = 2^15 during the online phase, which is faster than the convolution sampler.
The DWZ sampler (Du et al. 2019) and Karney's sampler cannot resist timing attacks, though the DWZ sampler is faster than our implementations. FALCON's sampler (Howe et al. 2020) satisfies Type I isochrony and is faster than our implementations with Type I isochrony. However, its storage and running time increase rapidly as the maximum standard deviation increases. In addition, if the minimum standard deviation is larger than 2, our implementations have better performance.
We apply Algorithm 4 to the lattice-based cryptography library PALISADE (Cousins et al. 2018) and use its latest version as of this writing to evaluate the performance. We replace Karney's sampler in PALISADE with our implementations based on the CDT sampler to sample from the Gaussian distributions over the integers. PALISADE uses the BLAKE2 hash function as the PRNG. In PALISADE, Karney's sampler generates 2.49 × 10^6 samples per second, while our implementations generate 5.92 × 10^6 and 3.21 × 10^6 samples per second. PALISADE contains implementations (Gür et al. 2019) of IBE (Micciancio and Peikert 2012) and CP-ABE (Zhang and Zhang 2011), both relying on the latest MP trapdoor sampler (Genise and Micciancio 2018). We mainly focus on the G-lattice sampling algorithm in the MP trapdoor sampler, which determines the running time of the online phase. All standard deviations are public in the G-lattice sampling algorithm, so we let it invoke the implementation with Type II isochrony. The parameters include the modulus q, the lattice dimension n, the base b for the gadget lattice G, and the standard deviation σ; let (q, n, b, σ) = (12289, 256, 2, 100). The benchmark results are shown in Table 8. Our implementation is 1.79x faster than the original.

Sampling integers using a normal distribution
In this section, we first introduce the rejection sampling strategy of the new COSAC sampler, then analyze its correctness and security requirements. Finally, we implement it to confirm the theoretical analysis and compare it with Algorithm 4.

New COSAC sampler
The COSAC sampler (Zhao et al. 2020a) uses the normal distribution N(0, σ²) and rejection sampling to sample from the distributions D_{Z\{0},σ,c_F} with c_F ∈ [−1/2, 1/2]. The original rejection sampling strategy is independent of c_F and requires about two trials on average to output a sample. Our new rejection sampling strategy varies with c_F, and the expected number of trials of the new sampler is about half that of the original.
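The following Python sketch shows our reading of the new strategy, reconstructed from the intervals given in the correctness proof of Theorem 10 in the "Appendix"; the real implementation additionally hides the output and batches its randomness.

```python
import math, random

def cosac_nonzero(sigma, c_F):
    """Sample z ~ D_{Z\{0}, sigma, c_F} for c_F in [-1/2, 1/2] by rejection
    from x ~ N(0, sigma^2): each x is mapped to the nonzero integer z whose
    interval (see the proof of Theorem 10) contains x, then accepted with
    probability rho_{sigma,c_F}(z) / rho_sigma(x) <= 1."""
    rho = lambda u: math.exp(-u * u / (2 * sigma ** 2))
    while True:
        x = random.gauss(0.0, sigma)
        # z > 0 comes from x in [z-1-c_F, z-c_F); z < 0 from [z-c_F, z+1-c_F)
        z = math.ceil(x + c_F) if x >= -c_F else math.floor(x + c_F)
        if z == 0:
            continue
        if random.random() < rho(z - c_F) / rho(x):
            return z
```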
Our sampler is presented as Algorithm 8. It invokes Algorithm 7 to sample from the distributions D_{Z\{0},σ,c_F} with c_F ∈ [−1/2, 1/2]. The correctness of Algorithm 8 and Algorithm 7 follows from the analysis above and some simple integral calculations. Theorem 10 indicates that the acceptance probability of Algorithm 7 converges to 1 as σ increases; in theory, Algorithm 8 is therefore almost two times faster than the original COSAC sampler.

Theorem 10 The outputs of Algorithm 7 and Algorithm 8 are distributed as D_{Z\{0},σ,c_F} and D_{Z,σ,c} respectively. Let ε ∈ (0, 1) and σ > η_ε(Z); then the acceptance probability P_true(σ, c_F) of Algorithm 7 converges to 1 as σ increases.

Due to the impossibility of achieving perfect distributions in practice, we need Theorem 11 to estimate the security loss of the cryptographic schemes invoking Algorithm 8. The error of Algorithm 8 stems from the approximate distribution N^{(R)}(0, 1) and the approximate Bernoulli variables B^{(R)}_{p_0} and B^{(R)}_{p_rej}. The proof of Theorem 11 is similar to that of Theorem 7, except for estimating the influence of the continuous distribution N(0, 1). We provide the detailed proofs of Theorems 10 and 11 in the "Appendix".
Theorem 11 Let λ^{(I)} (resp. λ^{(R)}) be the security parameter of an implementation using the ideal distributions N^{(I)}(0, 1) and B^{(I)}_p (resp. the real distributions N^{(R)}(0, 1) and B^{(R)}_p). Let the numbers of samples from N(0, 1) and B_p be Q_bs and Q_exp, and let the absolute error between the sample values of N^{(I)}(0, 1) and N^{(R)}(0, 1) be Δ_N. If the following conditions are respected, at most two bits of security are lost; in other words, λ^{(I)} − λ^{(R)} ≤ 2.

Performance of new COSAC sampler
We implement Algorithm 8 by modifying the implementation of the original COSAC sampler. The implementation utilizes AVX2 intrinsic instructions to improve performance and employs the Box-Muller continuous Gaussian sampler (Hülsing et al. 2018) to sample from N(0, 1). The AES256 counter mode with the AES-NI instructions is used as the PRNG. For comparability, we use g++ 7.5.0 to compile the implementations with the same compiler options -O3 -march=native enabled as Zhao et al. (2020a). The benchmark results are shown in Table 9.
In Table 9, the average numbers of trials are basically consistent with the theoretical results. However, the new COSAC sampler is only 1.46x-1.63x faster than the original; the main reason is the additional operations needed to hide z. As claimed in Zhao et al. (2020a), our implementation of Algorithm 8 may also reveal σ. Although the implementation involves AVX2 intrinsic instructions, we compare it with the implementations with Type II isochrony in Table 6: the new COSAC sampler is faster than the implementation based on the binary sampler but slower than the one based on the CDT sampler. It is noteworthy that the absolute error of the Box-Muller continuous Gaussian sampler is no more than 2^{−48}. For σ ∈ [2, 2^20] and λ^{(R)} = 256, provable security is attainable only for Q_bs ≤ 2^44 based on the second condition in Theorem 11.

Conclusions
In this work, we proposed three algorithms toward a generic, secure, and efficient Gaussian sampling algorithm over the integers: Algorithm 3, Algorithm 4, and Algorithm 8. Algorithm 3 samples exponential Bernoulli variables. Compared with the polynomial approximation method, it reduces floating-point multiplications significantly. We applied it to FALCON to decrease the signature generation time. One interesting phenomenon is that the polynomial approximation method can benefit from Intel's Haswell microarchitecture (the fourth-generation Core): if the compiler options include -march=haswell, the running time of the polynomial approximation method decreases by about 45% on a single Intel Core i7-4790 CPU at 3.6 GHz. How to improve the performance of our algorithm on specific architectures is an important open issue.
Algorithm 4 can sample from Gaussian distributions over the integers while hiding the standard deviation, center, and output. It can be used as a fundamental module in lattice-based cryptography libraries. We applied it to PALISADE and obtained a more secure and efficient implementation of the MP trapdoor sampler. We could use the constant-time implementation of the Knuth-Yao sampler (Karmakar et al. 2019) as the base sampler to further improve the performance of our implementations. Besides, Theorem 9 is only available for search problems; for decision problems, one may prove security using the techniques in Micciancio and Walter (2018).
Algorithm 8 also samples from Gaussian distributions over the integers. It is not as competitive as Algorithm 4 so far. However, it has the lowest expected number of trials among the algorithms based on rejection sampling. It is necessary to find a more efficient algorithm for sampling from the normal distribution to improve its performance. In addition, it deserves further research to make the acceptance probability of Algorithm 8 independent of the standard deviation and center in order to resist timing attacks.

Proof of Theorem 10
Proof To prove that the output of Algorithm 7 is distributed as D_{Z\{0},σ,c_F}, we need to calculate the probability that each non-zero integer z* is sampled. If z* > 0, then z* is computed from an x* in the interval [z* − 1 − c_F, z* − c_F), and we obtain Pr[z = z* | z* ∈ Z, z* > 0] ∝ D_{Z\{0},σ,c_F}(z*). If z* < 0, then z* is computed from an x* in the interval [z* − c_F, z* + 1 − c_F), and it is easy to check that Pr[z = z* | z* ∈ Z, z* < 0] ∝ D_{Z\{0},σ,c_F}(z*). So, for any non-zero integer z*, we conclude that Pr[z = z*] ∝ D_{Z\{0},σ,c_F}(z*).