 Research
 Open Access
 Published:
Generic, efficient and isochronous Gaussian sampling over the integers
Cybersecurity volume 5, Article number: 10 (2022)
Abstract
Gaussian sampling over the integers is one of the fundamental building blocks of latticebased cryptography. Among the extensively used trapdoor sampling algorithms, it is ineluctable until now. Under the influence of numerous sidechannel attacks, it is still challenging to construct a Gaussian sampler that is generic, efficient, and resistant to timing attacks. In this paper, our contribution is threefold. First, we propose a secure, efficient exponential Bernoulli sampling algorithm. It can be applied to Gaussian samplers based on rejection samplings. We apply it to FALCON, a candidate of round 3 of the NIST postquantum cryptography standardization project, and reduce its signature generation time by 13–14%. Second, we develop an isochronous Gaussian sampler based on rejection sampling. Our Algorithm can securely sample from Gaussian distributions with different standard deviations and arbitrary centers. We apply it to PALISADE (S&P 2018), an opensource latticebased cryptography library. During the online phase of trapdoor sampling, the running time of the Glattice sampling algorithm is reduced by 44.12% while resisting timing attacks. Third, we improve the efficiency of the COSAC sampler (PQC 2020). The new COSAC sampler is 1.46x–1.63x faster than the original and has the lowest expected number of trials among all Gaussian samplers based on rejection samplings. But it needs a more efficient algorithm sampling from the normal distribution to improve its performance.
Introduction
Latticebased cryptography has gained much attention due to its many attractive features. It allows us to build various powerful cryptographic primitives, such as fully homomorphic encryption (Gentry 2009), and has conjectured security against quantum computers (Shor 1997). Most latticebased cryptographic schemes are based on two main averagecase problems, the short integer solution (SIS) problem (Ajtai 1996), the learning with errors (LWE) problem (Regev 2005), and their analogs over rings (Micciancio 2002; Lyubashevsky et al. 2010). Discrete Gaussian distributions are at the core of security reduction proofs from the worstcase lattice problems to the averagecase problems (Micciancio and Regev 2004; Brakerski et al. 2013). Therefore, they have great significance to the theoretical security of latticebased cryptographic schemes.
In practice, it is notoriously difficult to sample from discrete Gaussian distributions effectively and securely, as demonstrated by numerous sidechannel attacks (Groot Bruinderink et al. 2016; Espitau et al. 2017; Pessl et al. 2017; Bootle et al. 2018) against the Gaussian sampler in the BLISS signature (Ducas et al. 2013). For this reason, some schemes replace Gaussian distributions with other distributions (Lyubashevsky et al. 2020; Schwabe et al. 2020), like uniform distributions or binomial distributions, even if this ordinarily leads to performance and security degradation.
However, discrete Gaussian distributions are ineluctable in some situations. One prominent example is trapdoor sampling (Gentry et al. 2008; Peikert 2010; Micciancio and Peikert 2012). Trapdoor sampling can be used to construct many powerful cryptographic applications, including signature (Gentry et al. 2008; Prest et al. 2020), (hierarchical) identitybased encryption ((H)IBE) (Gentry et al. 2008; Agrawal et al. 2010; Ducas et al. 2014), and attributebased encryption (ABE) (Zhang and Zhang 2011), etc. There are two extensively used trapdoor sampling algorithms at present. The first one is proposed by Gentry, Gentry et al. (2008) (GPV trapdoor sampler), while the second one is proposed by Micciancio and Peikert (2012) (MP trapdoor sampler). Both of them require to sample from the Gaussian distributions over the integers with different standard deviations and arbitrary centers (Ducas et al. 2014; Prest et al. 2020; Genise and Micciancio 2018; Hu and Jia 2019). To resist timing attacks, the former requires the Gaussian sampler over the integers does not leak the information of the standard deviation \(\sigma\), center c, and output z (Fouque et al. 2020), while the latter requires the sampler does not leak the information of c and z. In practical implementations, the trapdoor samplers usually employ a generic Gaussian sampler over the integers whose precomputation cost does not vary with different \(\sigma\) and c. So it is important to propose a Gaussian sampler over the integers which is generic, efficient, and resistant to timing attacks.
There are two strategies to design generic Gaussian samplers over the integers. One is based on the convolution theorems of discrete Gaussian distributions, and another is based on rejection samplings. The convolution sampler (Micciancio and Walter 2017) belongs to the former. It is easy to make its running time independent of its inputs \(\sigma\), c and output z. However, its efficiency highly relies on online/offline skills, and its whole running time is not competitive. For Gaussian samplers based on rejection samplings, the rejection sampling algorithm seems to be inherently costly to turn into a constanttime algorithm. FALCON’s sampler (Howe et al. 2020) is not constanttime, but isochronous concerning its inputs \(\sigma\), c, and output z. Isochrony is sufficient to resist timing attacks. But FALCON’s sampler is only suitable for small \(\sigma\). In practice, if we do not use NTRU lattices to instantiate GPV trapdoor sampler, or employ MP trapdoor sampler, we may need to sample the distributions with large \(\sigma\). Karney’s sampler (Karney 2016) and the COSAC sampler (Zhao et al. 2020a) are suitable for large \(\sigma\). However, Karney’s sampler does not consider how to resist timing attacks, while the COSAC sampler may reveal its input \(\sigma\). One crucial step of Gaussian samplers based on rejection samplings is sampling an exponential Bernoulli variable \({\mathcal {B}}_p\) with parameter \(p = \exp (x)\ (x \ge 0)\). It is a timeconsuming module, and the mainstream solution is approximating the exponential function by a polynomial (Zhao et al. 2020b; Barthe et al. 2019).
In some latticebased cryptography libraries, such as PALISADE (Cousins et al. 2018; Dai et al. 2018; Gür et al. 2019), it is one of the fundamental building blocks to sample from the Gaussian distributions over the integers. To implement MP trapdoor sampler, the latticebased cryptography library PALISADE (Cousins et al. 2018) employs the convolution sampler and Karney’s sampler as the Gaussian samplers over the integers. The convolution sampler is only used in the offline phase, as it does not seem easy to get an efficient implementation of the convolution sampler. PALISADE employs Karney’s sampler to obtain a more efficient implementation, although Karney’s sampler can not resist timing attacks. Thus, it deserves more effort to design a Gaussian sampler over the integers which is generic, efficient, and resistant to timing attacks.
Our contribution
In this paper, we mainly focus on Gaussian samplers over the integers based on rejection samplings. We first propose an exponential Bernoulli sampling algorithm sampling the exponential Bernoulli variables. It is a basic tool to construct the Gaussian samplers based on rejection samplings. Then we utilize the discrete Gaussian distribution with a small standard deviation as the base distribution and design an isochronous Gaussian sampler based on rejection sampling. The COSAC sampler is a rejection sampling algorithm using the normal distribution as the base distribution. We reduce its expected number of trials to improve the efficiency. Therefore our contribution is threefold.
The inspiration of our exponential Bernoulli sampling algorithm comes from von Neumann’s algorithm that samples from the exponential distribution. As far as we know, our work is the first one that securely applies this idea to the discrete Gaussian sampling to improve the efficiency of the cryptographic scheme. Compared with the mainstream method approximating the exponential function by a polynomial (Zhao et al. 2020b), our algorithm has the following advantages:

Our algorithm has higher efficiency while resist timing attacks in rejection sampling scenes. On the one hand, we improve the performance by reducing floatingpoint operations significantly. On the other hand, when our algorithm is applied to the rejection sampling algorithm resistant to timing attacks, its output only relies on the public rejection rate and its running time is independent of its input, therefore our algorithm can resist timing attacks.

We apply our algorithm to FALCON (Prest et al. 2020), a latticebased signature scheme of the thirdround candidates in the NIST postquantum cryptography standardization project (NIST PQ project). When the LDL tree^{Footnote 1} has been built upon the secret key, the signature generation time of the new implementation decreases by 13–14%.
We derive the idea of our isochronous Gaussian sampler from Karney’s sampler (Karney 2016) and FALCON’s sampler (Howe et al. 2020). We employ the rejection sampling strategy^{Footnote 2} of Karney’s sampler to design the sampler, then apply a similar technique as FALCON’s sampler to make it isochronous concerning its inputs (standard deviation \(\sigma\) and center c) and output (sampling value z). There are four differences between the techniques in our work and (Howe et al. 2020): (1) We use Algorithm 3 instead of the polynomial approximation to sample the exponential Bernoulli variable; (2) We provide two base samplers, the cumulative distribution table (CDT) sampler and the binary sampler (Algorithm 5); (3) To make the rejection rate independent of \(\sigma\), the parameter \(C(\sigma )\) in Algorithm 4 relies on the ratio \(\sigma /\sigma _0\) rather than \(\sigma\) (Lemma 8); (4) We need Algorithm 6 to sample the integers from \([0,\lceil k \rceil )\) uniformly. Our discrete Gaussian sampling algorithm has the following advantages:

As shown in Table 1, Algorithm 4 has the best performance while the samplers consider timing attacks.

Compared with FALCON’s sampler, the memory consumption and running time of Algorithm 4 are almost unaffected by the maximum standard deviation \(\sigma _{\mathrm{max}}\), so it is suitable for large \(\sigma\). When we use the parameter \(C(\sigma )\) in Algorithm 4 to hide \(\sigma\), we could adjust \(C(\sigma )\) to the minimum \(\sigma\) required by the cryptographic scheme, such that our sampler has the better performance.

We replace Karney’s sampler in PALISADE with our Gaussian sampler to speed up the online phase of the MP trapdoor sampler. The running time of the Glattice sampling algorithm in the online phase decreases by \(44.12\%\).
The core of our new COSAC sampler is to adjust the rejection sampling strategy to the center. The expected number of trials of the new COSAC sampler is half of that of the original. As the standard deviation increases, its expected number of trials converges to 1, which is the best in theory for the rejection sampling algorithms. However, the new COSAC sampler is only 1.46x–1.63x faster than the original due to the additional operations to hide the output. As shown in Table 1, it is not faster than Algorithm 4 and needs a more efficient algorithm sampling from the normal distribution to improve its performance.
Related works
Some works (Forsythe 1972; Ahrens and Dieter 1973; Karney 2016) have employed the idea of von Neumann’s algorithm to elegantly sample from a continuous probability distribution with the probability density function proportional to \(\exp (G(x))\), where G(x) is some simple function. The normal distribution is one of the continuous probability distributions. However, it is hard to make the algorithms isochronous. To the best of our knowledge, there is no work to employ the idea to improve the efficiency of cryptographic schemes in practice.
Du et al. (2019) proposed a Gaussian sampler over the integers (DWZ sampler) utilizing the rejection sampling strategy of Karney’s sampler (Karney 2016). It is similar to Algorithm 4. However, they chose the insecure binary sampler as the base sampler and computed the exponential function directly. Therefore, its sidechannel resistance perspective is unclear. Although it is faster than Karney’s sampler, it does not solve the crucial security problem of Karney’s Sampler. In addition, we employ the CDT sampler as the base sampler of Algorithm 4 and get better performance. There are some not generic, but efficient Gaussian samplers. The CDT sampler and KnuthYao sampler are often used as the base samplers. Karmakar et al. (2019) crafted a constanttime and efficient implementation of the KnuthYao sampler. The TwinCDT sampler (Melchor and Ricosset 2018) can sample from the discrete Gaussian distribution with the fixed \(\sigma\) and arbitrary c. It is constanttime.
Preliminaries
Notations
Let \({\mathbb {R}}\), \({\mathbb {Z}}\), \({\mathbb {N}}\) be the set of real numbers, integers and nonnegative integers respectively. For a distribution D, we use the notation \(x \xleftarrow {\$} D\) to mean that x is chosen according to the distribution D. If S is a set, then \(x \xleftarrow {\$} S\) means that x is sampled uniformly at random from S. The left arrow \(\leftarrow\) denotes the assignment operator. We use the notation \({\mathcal {B}}_p\) to denote the Bernoulli distribution with parameter p. A random variable with \({\mathcal {B}}_p\) takes the true value with probability p and the false value with probability \(1p\). We denote the logarithm with base 2 by log and the one with base e by ln.
Gaussian distributions
For any \(\sigma , c \in {\mathbb {R}}\) with \(\sigma > 0\), the Gaussian function with parameters \(\sigma\) and c over \({\mathbb {R}}\) is defined as \(\rho _{\sigma , c}(x) = \exp (\frac{(xc)^2}{2\sigma ^2})\). We denote the normal (or continuous Gaussian) distribution with parameters \(\sigma\) and c over \({\mathbb {R}}\) by \({\mathcal {N}}(c, \sigma ^2)\), which has the probability density function \(\rho _{\sigma , c}(x)/(\sigma \sqrt{2\pi })\). For any countable set \(S \subsetneq {\mathbb {R}}\), we denote \(\rho _{\sigma , c}(S)\) by the sum \(\sum _{x \in S}\rho _{\sigma , c}(x)\). If \(\rho _{\sigma , c}(S)\) is finite, we define the discrete Gaussian distribution with parameters \(\sigma\) and c over S as \(D_{S,\sigma ,c}(x) = \rho _{\sigma , c}(x)/\rho _{\sigma , c}(S)\), and denote the distribution by \(D_{S,\sigma ,c}\). The parameter \(\sigma\) (resp. c) is often called the standard deviation (resp. center) of the distribution. Note that when \(c = 0\), we omit it in index notation, e.g. \(\rho _{\sigma }(x) = \rho _{\sigma , 0}(x)\) and \(D_{S, \sigma }(x) = D_{S, \sigma , 0}(x)\).
Smoothing parameter
The smoothing parameter is a lattice quantity proposed by Micciancio and Regev (2004). For \(\epsilon > 0\), the smoothing parameter \(\eta _{\epsilon }(\Lambda )\) of a lattice \(\Lambda\) is the smallest value \(\sigma > 0\) such that \(\rho _{\frac{1}{\sigma \sqrt{2\pi }}}(\Lambda ^* \backslash \{0\}) \le \epsilon\), where \(\Lambda ^*\) denotes the dual of \(\Lambda\). The initial definition (Micciancio and Regev 2004) of the smoothing parameter is \(\sqrt{2\pi }\) times that our definition, as it is defined by the Gaussian parameter \(s = \sigma \sqrt{2\pi }\).
Lemma 1
(Micciancio and Regev 2004) For any positive real \(\epsilon > 0\), we have \(\eta _{\epsilon }({\mathbb {Z}}) \le \eta _{\epsilon }^+({\mathbb {Z}})\):
The following lemma states that the total Gaussian measure over \({\mathbb {Z}}\), i.e. \(\rho _{\sigma , c}({\mathbb {Z}})\), is essentially the same for any center c when the standard deviation \(\sigma\) exceeds the smoothing parameter \(\eta _{\epsilon }({\mathbb {Z}})\).
Lemma 2
(Micciancio and Regev 2004; Gentry et al. 2008) For any \(\epsilon \in (0, 1)\), \(\sigma > \eta _{\epsilon }({\mathbb {Z}})\) and \(c \in {\mathbb {R}}\), we have
We need the following lemma to make the running time of a discrete Gaussian sampling algorithm independent of \(\sigma\). One may get a similar result by the proof of Lemma 4.4 in Micciancio and Regev (2004). The difference here is that \(\sigma\) is larger than \(\eta _{\epsilon }^+({\mathbb {Z}})\) rather than \(\eta _{\epsilon }({\mathbb {Z}})\). We give a brief proof in “Appendix”.
Lemma 3
For any \(\epsilon > 0\), \(\sigma > \eta _{\epsilon }^+({\mathbb {Z}})\), we have:
R\(\acute{e}\)nyi divergence
We recall the definition of the R\(\acute{e}\)nyi divergence, which is extensively used in the cryptographic security proofs currently.
Definition 1
(Bai et al. 2015) Let \({\mathcal {P}}\), \({\mathcal {Q}}\) be two distributions such that Supp(\({\mathcal {P}}\)) \(\subseteq\) Supp(\({\mathcal {Q}}\)). For \(a \in (1, +\infty )\), we define the R\(\acute{e}\)nyi divergence of order a by
In addition, we define the R\(\acute{e}\)nyi divergence of \(+\infty\) by \(R_{\infty }({\mathcal {P}}, {\mathcal {Q}}) = \max \nolimits _{x\in Supp({\mathcal {P}})} \frac{{\mathcal {P}}(x)}{{\mathcal {Q}}(x)}\).
The R\(\acute{e}\)nyi divergence is not a distance, but it does verify cryptographically useful properties. We recall some important properties of the R\(\acute{e}\)nyi divergence from (Bai et al. 2015; van Erven and Harremoës 2014).
Lemma 4
(Bai et al. 2015) Let \(a \in (1, +\infty ]\). Let \({\mathcal {P}}\) and \({\mathcal {Q}}\) denote distributions with Supp(\({\mathcal {P}}\)) \(\subseteq\) Supp(\({\mathcal {Q}}\)). Then the following properties hold:

\(\varvec{Data\ Processing\ Inequality.}\) For any function f, where \(f({\mathcal {P}})\) (resp. \(f({\mathcal {Q}})\)) denotes the distribution of f(y) induced by sampling y from \({\mathcal {P}}\) (resp. \({\mathcal {Q}}\)), we have:
$$\begin{aligned} R_a\left( f({\mathcal {P}}), f({\mathcal {Q}})\right) \le R_a({\mathcal {P}}, {\mathcal {Q}}). \end{aligned}$$ 
\(\varvec{Multiplicativity.}\) For two families of distributions \(({\mathcal {P}}_i)_i, ({\mathcal {Q}}_i)_i\),
$$\begin{aligned}R_a(\prod \nolimits _i{\mathcal {P}}_i, \prod \nolimits _i{\mathcal {Q}}_i) = \prod \nolimits _i R_a({\mathcal {P}}_i, {\mathcal {Q}}_i).\end{aligned}$$ 
\(\varvec{Probability\ Preservation.}\) For any event \(E \subseteq Supp(Q)\) and \(a \in (1, +\infty )\),
$$\begin{aligned} {\mathcal {Q}}(E) \ge {\mathcal {P}}(E)^{\frac{a}{a1}}/R_a({\mathcal {P}}, {\mathcal {Q}})\ \ and\ \ {\mathcal {Q}}(E) \ge {\mathcal {P}}(E)/R_{\infty }({\mathcal {P}}, {\mathcal {Q}}). \end{aligned}$$
In practice, when we approximate the original distribution \(D_{{\mathbb {Z}}, \sigma , c}\) by sampling from a finite set, we can use the following lemma to bound the R\(\acute{e}\)nyi divergence between the real and ideal distributions.
Lemma 5
(Prest 2017) Let \({\mathcal {P}}\) and \({\mathcal {Q}}\) be two distributions such that Supp(\({\mathcal {P}}\)) \(\subseteq\) Supp(\({\mathcal {Q}}\)). For any \(x \in\) Supp(\({\mathcal {P}}\)), if there exists \(\delta >0\) such that \(\frac{{\mathcal {P}}(x)}{{\mathcal {Q}}(x)} \le 1 + \delta\) then, for \(a \in (1, +\infty )\): \(R_a({\mathcal {P}}, {\mathcal {Q}}) \le 1 + \delta\).
When we use the double type or 64bit integers to approximate floatingpoint numbers, which results in the relative error between the real and ideal distributions, the following lemma gives a bound of the R\(\acute{e}\)nyi divergence between the distributions.
Lemma 6
(Prest 2017) Let \({\mathcal {P}}\) and \({\mathcal {Q}}\) be two distributions of the same support S. Suppose that the relative error between \({\mathcal {P}}\) and \({\mathcal {Q}}\) is bounded: \(\forall x \in S\), \(\exists \delta > 0\), such that \(\left \frac{{\mathcal {P}}(x)}{{\mathcal {Q}}(x)}  1 \right \le \delta\). Then, for \(a \in (1, +\infty )\):
Isochronous algorithm
When an algorithm is applied to different scenarios, the sensitive variables of the algorithm may be different. To resist against timing attacks, we only need to keep the running time of the algorithm from leaking sensitive parameters for a particular scenario. In this work, when we show that our algorithms are provably resistant against timing attacks, we use the following definition of the isochronous algorithm to capture this idea.
Definition 2
(Howe et al. 2020) Let \({\mathcal {A}}\) be a (probabilistic or deterministic) algorithm with set of input variables \({\mathcal {I}}\), set of output variables \({\mathcal {O}}\), and let \({\mathcal {S}} \subseteq {\mathcal {I}} \cup {\mathcal {O}}\) be the set of sensitive variables. We say that \(\mathcal {}A\) is perfectly isochronous with respect to \({\mathcal {S}}\) if its running time is independent of any variables in \({\mathcal {S}}\).
In addition, we say that \({\mathcal {A}}\) is statistically isochronous with respect to \({\mathcal {S}}\) if there exists a distribution \({\mathcal {D}}\) independent of all the variables in \({\mathcal {S}}\), such that the running time of \({\mathcal {A}}\) is statistically close (for a clearly identified divergence) to \({\mathcal {D}}\).
In this paper, we refer to the implementations of FALCON and previous Gaussian samplers (Howe et al. 2020; Prest et al. 2020) and assume that the four basic operators over integers, the addition, subtraction, and multiplication over floatingpoint numbers, the rounding, ceiling, floor, and bitwise operations are isochronous. For the latticebased trapdoor sampling, the short lattice basis determines \(\sigma\) of the Gaussian distribution over the integers, while c is usually variable. As the floatingpoint division rarely offers constanttime execution guarantee (Zhao et al. 2020b), we precompute the parameters \(1/\sigma\), k, \(\lceil k \rceil\) and \(P_k\) in Algorithm 4 and Algorithm 6. We can also precompute \(S = \rho _{\sigma , c}({\mathbb {Z}})\) in Algorithm 8 by Lemma 2^{Footnote 3}. While the implementation of FALCON’s sampler only precomputes \(1/\sigma\)^{Footnote 4}, Algorithm 4 may have to precompute a few more parameters. But its memory consumption is also tolerable (see Table 1).
Von Neumann’s algorithm
Von Neumann’s algorithm (Forsythe 1972) can sample a variable from the exponential distribution and avoid floatingpoint operations. It is the foundation of our exponential Bernoulli sampling algorithm. We review it in Algorithm 1.
Lemma 7
The variable x output by Algorithm 1 obeys the exponential distribution.
Proof
We first analyze the probability that n is odd in step7 of Algorithm 1 for any \(x_2 \in [0,1)\). If \(n = n_0\) in step 7, there are \(n_0 + 1\) variables \(u_1, u_2, ..., u_{n_0 + 1}\) uniformly sampled from [0, 1) in step 2 and step 5. These variables satisfy that \(x_2> u_1> ... > u_{n_0}\) and \(u_{n_0} \le u_{n_0 + 1}\). The probability that \(u_1,...,u_{n_0}\) are all less than \(x_2\) is \(x_2^{n_0}\). In addition, the probability that they are in descending order, which is one of the possible \(n_0!\) permutations, is \(x_2^{n_0}/{n_0}!\). As \(u_{n_0} \le u_{n_0 + 1}\), the probability that \(n = n_0\) in step 7 is \(x_2^{n_0}/{n_0}!  x_2^{n_0 + 1}/(n_0 + 1)!\). For any \(x_2 \in [0,1)\), the probability that n is even in step 7 is
Because \(x_2\) is sampled from [0, 1) uniformly, the probability that n is odd averaged over \(x_2\) is \(\int _0^1(1  \exp (x_2))dx_2 = \exp (1)\). Thus, the probability density that the algorithm terminates with a particular value of \(x_1\) and \(x_2\) is \(\exp ((x_1 + x_2))\), as required. \(\square\)
A tool: exponential Bernoulli sampling algorithm
In this section, we propose a tool, the exponential Bernoulli sampling algorithm, to construct the Gaussian samplers based on rejection samplings. We first introduce the basic algorithm and analyze its relative error. Then we show how to make it isochronous in the particular application. Finally, we apply it to the FALCON signature.
Basic exponential Bernoulli sampling algorithm
When sampling a Bernoulli variable with parameter \(\exp (x)\), the polynomial approximation method needs 14 floatingpoint multiplications in which 12 floatingpoint multiplications are used to compute the exponential function (Zhao et al. 2020b; Prest et al. 2020). To avoid these 12 floatingpoint multiplications, we can sample the Bernoulli variable utilizing step 2 step 6 in Algorithm 1 for any \(x \in [0,1)\). For any \(x \ge 0\), let \(x = u_1\ln {2} + u_2\) where \(u_1\) is a nonnegative integer and \(0 \le u_2 < \ln {2}\), then \(\exp (x)\) can be written as \(\exp (x) = 2^{u_1} \exp (u_2)\). The first part \(2^{u_1}\) can be sampled efficiently in binary form, and the second part \(\exp (u_2)\) can be sampled efficiently too as \(u_2 \in [0,1)\). We present the basic exponential Bernoulli sampling algorithm in Algorithm 2.
Theorem 1
The output b of Algorithm 2 obeys the Bernoulli distribution with parameter \(\exp (x)\).
Proof
The probability that the variable \(b_1\) is true is \(2^{u_1}\). Because step 5–step 9 in Algorithm 2 is equivalent to step 2–step 6 in Algorithm 1, the probability that the variable \(b_2\) is true is \(\exp (u_2)\) according to the proof of Lemma 7. As \(x = u_1\ln {2} + u_2\), the probability that the output b is true is \(\exp (x)\). \(\square\)
In the next theorem, we estimate the expected sampling number of \(r_3\) which is closely related to the efficiency of Algorithm 2.
Theorem 2
For any \(u_2 \in [0,\ln {2})\), in one running process of Algorithm 2, the expected sampling number S of \(r_3\) is less than 2.
Proof
The probability that \(n = n_0\) in step 10 of Algorithm 2 is \(u_2^{n_0}/n_0!  u_2^{n_0 + 1}/(n_0 + 1)!\). If \(n = n_0\) in step 10, the sampling number of \(r_3\) is \(n_0 + 1\). The expected sampling number S of \(r_3\) can be written as
the last inequality is deduced from \(u_2 < \ln {2}.\) \(\square\)
When we prove the correctness of Algorithm 2 in Theorem 1, we treat it as an ideal algorithm that samples the variables \(b_1\) and \(b_2\) with the exact parameters \(2^{u_1}\) and \(\exp (x + u_1 * \ln {2})\). In practice, it is necessary to bound the relative error of Algorithm 2 due to the approximations of \(u_2\) and \(r_3\).
Theorem 3
Let \(P^{(R)}\) be the probability that the output b of the real algorithm is true where the floating numbers \(u_2\), \(r_2\), \(r_3\) of Algorithm 2 keep m (\(m \ge 5\)) significant bits. Let \(P^{(I)}\) be the probability that the output b of the ideal algorithm is true where \(u_2\), \(r_2\), \(r_3\) have infinite significant bits. Then the relative error between \(P^{(R)}\) and \(P^{(I)}\) satisfies that
Proof
Let \(P^{(R)}_1\) and \(P^{(R)}_2\) be the probabilities that the variables \(b_1\) and \(b_2\) of the real algorithm are true respectively, \(P^{(I)}_1\) and \(P^{(I)}_2\) be the probabilities that the variables \(b_1\) and \(b_2\) of the ideal algorithm are true respectively. Because \(u_1\) is an integer and the sampling process of \(b_1\) (step 2–step 3) only involves the integer operations, \(P^{(R)}_1\) and \(P^{(I)}_1\) should be the same exactly. Thus, the relative error between \(P^{(R)}\) and \(P^{(I)}\) is the same as that between \(P^{(R)}_2\) and \(P^{(I)}_2\)
The difference between \(P^{(R)}_2\) and \(P^{(I)}_2\) stems from the approximations of \(u_2\) and \(r_3\). We define an intermediate algorithm where \(u_2\) has m significant bits and \(r_2\) and \(r_3\) have infinite bits. Let \(P_2^{(IN)}\) be the probability that the variable \(b_2\) in the intermediate algorithm is true. Then we have
As \(u_2\) retains m significant bits and \(u_2 \in [0,\ln {2})\), the difference between the approximate number \(u^{(R)}_2\) and the accurate number \(u^{(I)}_2\) is not greater than \(2^{m}\). So we have
We will next analyze the process of the while loop (step 6  step 9) of Algorithm 2. For the ith while loop (\(i \ge 1\)), we denote \(A_i\) as the event \(r_2 = r_3\), \(B_i\) as the event \(r_2 > r_3\) and \(C_i\) as the event \(r_2 < r_3\). In the intermediate algorithm, the uniformly random floatingpoint number \(r^{(I)}_3\) has infinite significant bits, so event \(A_i\) will never happen for any \(i \ge 1\). In the real algorithm, both \(r^{(R)}_2\) and \(r^{(R)}_3\) have m significant bits and \(r^{(R)}_3\) is sampled uniformly, the probability of event \(A_i\) is exactly \(2^{m}\) for any \(i \ge 1\). Our goal is to calculate the probability sum of all \(A_i\) in the real algorithm which is the upper bound of \(P^{(R)}_2  P^{(IN)}_2\).
For \(i \ge 2\), event \(B_{i1}\) must have happened before the ith loop, which means \(\Pr \{B_i\} = \Pr \{B_{i1}\} \cdot r^{(R)}_2\) and \(\Pr \{A_i\} = \Pr \{B_{i1}\} \cdot 2^{m}\). Because \(r^{(R)}_2\) is always no more than \(u^{(R)}_2\) and \(u^{(R)}_2 \le u^{(I)}_2 + 2^{m}\), we have \(\Pr \{A_i\} \le \left( u^{(I)}_2 + 2^{m}\right) ^{i1} \cdot 2^{m}\) for any \(i \ge 1\). As \(u^{(I)}_2 \in [0,\ln {2})\) and \(m \ge 5\), we can give an upper bound:
Combining the constraints (1) (2) (3) (4), we can bound the relative error between \(P^{(R)}\) and \(P^{(I)}\),
which concludes the proof. \(\square\)
In our implementation of Algorithm 2, we use the double type meeting the IEEE754 standard in the floatingpoint arithmetics and 64bit integers to approximate the floating numbers \(u_2\), \(r_2, r_3\), so the number m of significant bits is at least 52. The relative error of our implementation is no more than
Isochronous exponential Bernoulli sampling algorithm
In this subsection, we focus on how to design an isochronous exponential Bernoulli sampling algorithm to resist timing attacks. It seems difficult to make the input and output of Algorithm 2 isochronous with respect to its running time while retaining its efficiency. Because the sampling number of \(r_3\) determining the value of \(b_2\) is closely related to the running time of Algorithm 2. To overcome the obstacle, we restrict the application of our algorithm to the rejection sampling algorithm. In these applications, the distribution of output b only relies on the rejection probability of the rejection sampling algorithm. For the rejection sampling algorithm resisting timing attacks, the rejection probability can not reveal the information of the sensitive variables. Therefore there is no need to hide the value of output b. In Algorithm 3, we successfully make the input independent of the running time while retaining the efficiency.
Theorem 4
The output of Algorithm 3 obeys the Bernoulli distribution with parameter \(\exp (x)\).
Proof
After step 3, \(r_1\) is a uniformly random integer in \([0, 2^{u_1}1]\). The probability that \(b_1\) is true is \(2^{u_1}\). There are two cases where \(b_2\) is true. The first one is \(u_2 < r_3\) in step 7, and the second one is that \(u_2 > r_3\) in step 7 and n is even in step 12. As \(u_2 < t\), their probabilities are \(1u_2\) and \(\sum _{i=1}^{\infty }\left( u_2^{2i}/(2i)!u_2^{2i+1}/(2i+1)!\right)\) respectively. We can get that the probability that \(b_2\) is true is \(\exp (u_2)\) by adding the two probabilities, which concludes the proof. \(\square\)
Theorem 5
Algorithm 3 is perfectly isochronous with respect to its input x. What’s more, in one running process of Algorithm 3, the expected sampling number S of \(r_3\) is \(\exp (t)\).
Proof
We need to prove that in Algorithm 3, \(u_1\) and \(u_2\) are respectively independent of the running times of sampling \(b_1\) and \(b_2\). The sample of \(b_1\) consists of one uniform sampling, one AND operation, and one comparison, which is independent of \(u_1\). The sample of \(b_2\) consists of the uniform samplings of \(r_3\), the comparisons of \(r_2\) and \(r_3\), one comparison of \(u_2\) and \(r_3\). The sampling number of \(r_3\) is the same as the number of comparisons between \(r_2\) and \(r_3\). The distribution of the sampling number of \(r_3\) depends on t rather than \(u_2\). The expected sampling number S of \(r_3\) is
\(\square\)
In our implementation of Algorithm 3, we set \(t = 178 * 2^{8}\) to avoid the error of the floatingpoint number. So the relative error of Algorithm 3 stems from the approximations of \(u_2\) and \(r_3\), which is the same as Algorithm 2. With the almost same proof of Theorem 3, we can get the same upper bound of the relative error of Algorithm 3 in practice.
Laziness Technique. When we use 64bit integers to represent \(r_1\), \(r_2\), and \(r_3\), there are at most \(64 \cdot (1+e)\) bits randomness required on average during each run of Algorithm 3. These integers are only involved in the comparison operations, so we can employ the laziness technique (Ducas and Nguyen 2012; Howe et al. 2020) to speed up the implementation of Algorithm 3. When at least one of the two compared 64bit integers is sampled uniformly, the comparison can be determined by the first i significant bits, except with probability \(2^{i}\) (exactly when the first i bits of the two integers match). Therefore it is unnecessary to sample all bits of the integers at one time. In our implementation, the 64bit integer is sampled 8 bits by 8 bits until the comparison is determined or all 64 bits are sampled. It is easy to estimate that each sampled integer needs less than 9 random bits on average in this way. So the expected number of random bits is less than \(9 \cdot (1+e)\) in one running process of Algorithm 3 with this technique.
It should be noted that the random number \(r_3\) in step 6 of Algorithm 3 has to be compared with t and \(u_2\). We need to carefully select a value of the public parameter t to ensure that these two comparisons with the laziness technique don’t reveal the information of \(u_2\). It is another reason why we set \(t = 178*2^{8}\). The comparison between \(r_3\) and \(178 * 2^{8}\) only needs the first 8 bits. If \(r_3 \ge 178 * 2^{8}\), we can determine \(r_3> \ln {2} > u_2\) through the first 8 bits. Otherwise, \(r_3\) is a uniform random number in \([0, 178 * 2^{8})\). In this case, for the first 8 bits of \(r_3\) and \(u_2\), the probability that they are equal is 1/178. For the ith bits of \(r_3\) and \(u_2\) (\(i \ge 9\)), the probability that they are equal is 1/2. Therefore the value of \(u_2\) does not impact the running time of the two comparisons.
Applications and performances
Our exponential Bernoulli sampling algorithm can be applied to Gaussian samplers based on rejection samplings. In this section, we choose FALCON, a latticebased signature scheme of the thirdround candidates in the NIST PQ project, as an example of the application of Algorithm 3.
The Gaussian sampler in FALCON is based on rejection sampling, so it needs to sample an exponential Bernoulli variable \({\mathcal {B}}_p\) with \(p = \exp (x)\ (x \ge 0)\). Under the parameter sets of FALCON, we can assume \(x \in [0,64 \cdot \ln {2}]\). To achieve the security goal of FALCON, the relative error between the ideal and real probabilities can not be greater than \(2^{43}\) (Howe et al. 2020; Prest et al. 2020). The relative error of our algorithm is not greater than \(2^{48}\), which satisfies the requirement of FALCON. In the latest implementations of FALCON, it uses a polynomial approximation \(P_{\mathrm{facct}}(x)\) (Zhao et al. 2020a) of the exponential function \(\exp (x)\) on \([\ln {2},0]\) to sample \({\mathcal {B}}_p\).
In the round 3 NIST package, FALCON contains the optimized implementations at two security levels. There are three implementations at each security level. Because we don’t consider the specific instruction set optimization, we choose the ones relying on the IEEE754 floatingpoint type to perform our benchmark test. They can work on x68, ARM, and POWER/PowerPC systems.
We apply Algorithm 3 to FALCON and compile the implementations with the compiler option O3 enabled. The benchmark results are showed in Tables 2 and 3. The new discrete Gaussian sampling implementation is about 1.19x faster than the original. While the LDL tree has been built upon the secret key, the new signature generation implementation is about 1.15x–1.16x faster than the original. In this paper, all our experiments are performed on a single Intel Core i74790 CPU core at 3.6 GHz.
The implementation of FALCON does not offer the interface to invoke the algorithm for sampling the exponential Bernoulli variable. So we separate this module and compare it with Algorithm 3. We choose the ChaCha20 in OpenSSL and the AES256 counter mode with hardware AES instructions (AESNI) as the pseudorandom number generators (PRNG) and use g++ 7.5.0 to compile our implementations with the compiler options O3 and maes enabled^{Footnote 5}. Table 4 lists the efficiencies of different implementations. If the ChaCha20 in OpenSSL is used as the PRNG, Algorithm 3 is 1.25x faster than the algorithm in FALCON. Algorithm 3 with AESNI is 1.49x faster than the algorithm in FALCON. The reason is that Algorithm 3 avoids a large number of floatingpoint multiplications and needs more random numbers. The efficiency of the PRNG has more influence on Algorithm 3 than the algorithm in FALCON.
Sampling integers using a discrete Gaussian distribution with a small standard deviation
In this section, we first introduce our new Gaussian sampler based on rejection sampling, and analyze its correctness and the requirements . Then we use our tool Algorithm 3 to implement our Gaussian sampler and make it isochronous concerning its inputs and output. Finally, we compare our implementations with previous works, and apply it to the latticebased cryptography library PALISADE.
New discrete Gaussian sampling algorithm
We build our Gaussian sampler by employing the rejection sampling strategy of Karney’s sampler and present it in Algorithm 4. The rejection sampling strategy of Karney’s sampler can be viewed as the generalization of that of the Gaussian sampler in the BLISS signature scheme (BLISS sampler) (Ducas et al. 2013). The strategy of Karney’s sampler is suitable for the distribution with arbitrary \(\sigma > \sigma _0\) and c, while the strategy of the BLISS sampler is only suitable for the distribution with \(\sigma\) being an integral multiple of \(\sigma _0\) and \(c = 0\). Du et al. (2019) proposed a similar algorithm based on the rejection sampling strategy of Karney’s sampler. Although it is faster than Karney’s sampler, it does not solve the crucial security problem of Karney’s Sampler. To make Algorithm 4 resistant to timing attacks and not affecting the security of cryptographic schemes, we need to craft the modules in Algorithm 4, especially the base sampler (step 3) and the algorithm sampling the exponential Bernoulli variable (step 9). In addition, we also use \(C(\sigma ) \in (0,1]\) to make \(\sigma\) independent from the rejection rate of Algorithm 4. These are the necessary steps for the practicality of Karney’s sampler, which are also our main contributions in this section.
To prove the correctness of Algorithm 4, there are two issues to be addressed: (1). After step 10 and step 11 in Algorithm 4, each integer \(z^*\) can be calculated by exactly one tuple \((x^*, y^*, s^*)\); (2). We need to show the probability of sampling \(z^*\) is exactly proportional to \(\rho _{{\mathbb {Z}}, \sigma , c}(z^*)\). Because our rejection sampling strategy is the same as that of Karney’s sampler, the idea of proof is implicit in Karney (2016), Du et al. (2019). We provide the detailed proof in “Appendix” for completeness.
Theorem 6
The output z sampled by Algorithm 4 is distributed as \(D_{{\mathbb {Z}}, \sigma , c}\).
The floatingpoint error in Algorithm 4 causes the error of the Bernoulli variable \({\mathcal {B}}_{p_{rej}}\), so the error of Algorithm 4 stems from the approximate distribution \(D^{(R)}_{{\mathbb {N}}, \sigma _0}\) and the approximate Bernoulli variable \({\mathcal {B}}^{(R)}_{p_{rej}}\), which is the same as FALCON’s sampler. We utilize the technique in Howe et al. (2020) to estimate the security loss of the cryptographic scheme invoking Algorithm 4 in practice.
Theorem 7
Let \(\lambda ^{(I)}\) (resp. \(\lambda ^{(R)}\)) be the security parameter of an implementation using the ideal distributions \(D^{(I)}_{{\mathbb {N}}, \sigma _0}\) and \({\mathcal {B}}^{(I)}_{p_{rej}}\) (resp. the real distributions \(D^{(R)}_{{\mathbb {N}}, \sigma _0}\) and \({\mathcal {B}}^{(R)}_{p_{rej}}\)). Let the numbers of sampling from \(D_{{\mathbb {N}}, \sigma _0}\) and \({\mathcal {B}}_{p_{rej}}\) be \(Q_{bs}\) and \(Q_{exp}\). If the following conditions are respected, at most two bits of security are lost. In other words, \(\lambda ^{(I)}  \lambda ^{(R)} \le 2\).
In the proof of Theorem 7, we can first bound the R\(\acute{e}\)nyi divergence between \({\mathcal {B}}^{(R)}_{p_{rej}}\) and \({\mathcal {B}}^{(I)}_{p_{rej}}\) by Lemma 6 and the first condition of Theorem 7. Then, combining Lemma 4 and the upper bound of \(R_{2\lambda ^{(R)}+1}\left( D^{(R)}_{{\mathbb {N}}, \sigma _0}, D^{(I)}_{{\mathbb {N}}, \sigma _0}\right)\), we can estimate the security loss. We provide the detailed proof in “Appendix”.
As suggested by the NIST PQ project, an attacker can make no more than \(2^{64}\) signature or decryption queries. The lattice dimension of a cryptographic scheme is usually less than \(2^{12}\). Therefore, it is reasonable to assume that the cryptographic scheme calls Algorithm 4 less than \(2^{76}\) in most applications. According to the acceptance probability of Algorithm 4 in Lemma 8, if \(\sigma _0 > 1 / \sqrt{2\ln {2}}\), the expected number of trials of Algorithm 4 is no more than 3. Then we have \(Q_{bs} = Q_{exp} \le 3 \cdot 2^{76} \le 2^{78}\). If a cryptographic scheme invoking algorithm 4 has 256bit security, the concrete numerical values of the conditions in Theorem 7 are
Isochronous implementations
In this subsection, we first analyze the conditions to make Algorithm 4 isochronous with respect to the standard deviation \(\sigma\), the center c, and its output z, then introduce how to implement Algorithm 4 and select the appropriate parameters to satisfy those conditions.
To get an isochronous implementation of Algorithm 4, there are five necessary conditions:

1.
The base sampler sampling from \(D_{{\mathbb {N}}, \sigma _0}\) does not leak any information of x.

2.
The sampling time of b is independent of \(d(d+2kx)/(2\sigma ^2)\).

3.
The acceptance probability of Algorithm 4 does not rely on c.

4.
The sampling time of y is independent of k.

5.
The acceptance probability of Algorithm 4 does not rely on \(\sigma\).
The implementation isochronous concerning z and c needs to satisfy the first three conditions. The implementation isochronous concerning z, c, and \(\sigma\) needs to satisfy all five conditions.
For the first condition, we adopt two samplers to instantiate Algorithm 4 respectively. The first one is the CDT sampler. We precompute the approximate cumulative distribution table of \(D_{{\mathbb {N}}, 1}\) with 80bit precision shown in Table 5. To produce a sample, we generate a random value r in [0, 1) with the same precision and return the index of the last entry in the table that is greater than r. To make the running time isochronous with respect to the output, the CDT sampler has to read the entire table and compare r with each entry. For any \(a \le 512\), our experiment shows that the R\(\acute{e}\)nyi Divergence \(R_a\left( D^{(R)}_{{\mathbb {N}}, 1}, D^{(I)}_{{\mathbb {N}}, 1}\right)\) between the real distribution \(D^{(R)}_{{\mathbb {N}}, 1}\) and the ideal distribution \(D^{(I)}_{{\mathbb {N}}, 1}\) is bounded by \(1 + 2^{80}\), which satisfies the second condition in Theorem 7. The advantage of the CDT sampler is that it has a very attractive performance.
Another one is the binary sampler, which is introduced in the BLISS sampler Ducas et al. (2013). It sample from the distribution \(D_{{\mathbb {N}}, 1/\sqrt{2\ln {2}}}\). In Algorithm 5, we show how to make the binary sampler isochronous concerning its output. We cut the tail of the distribution and only sample the nonnegative integers no more than 8. Moreover, the while loop in Algorithm 5 never terminates in advance even if the sample has been determined. As each probability of \(D_{{\mathbb {N}}, 1/\sqrt{2\ln {2}}}\) can be accurately represented in binary form, the output distribution \(D^{(R)}_{{\mathbb {N}}, 1/\sqrt{2\ln {2}}}\) does’t have a relative error. By some simple calculations and Lemma 5, for any \(a \in (1, +\infty )\), we have \(R_a\left( D^{(R)}_{{\mathbb {N}}, 1/\sqrt{2\ln {2}}}, D^{(I)}_{{\mathbb {N}}, 1/\sqrt{2\ln {2}}}\right) \le 1 + 2^{80}\) satisfying the second condition in Theorem 7. The advantage of the binary sampler is that it does not need any precomputation.
For the second condition, as \(C(\sigma ) \in (0,1]\), it is straightforward to sample the exponential Bernoulli variable \({\mathcal {B}}_{p_{rej}}\) through Algorithm 3. The relative error of Algorithm 3 is no more than \(2^{48}\) meeting the first condition in Theorem 7.
For the fourth condition, we use Algorithm 6 to sample y. In our implementation, the public parameter h is equal to 32. The acceptance probability of Algorithm 6 is 0.5, so its running time is independent of k.
Both third and fifth conditions are related to the acceptance probability of Algorithm 4. We utilize the technique in FALCON’s sampler to satisfy these two conditions. In the following lemma, we prove that \(\sigma \ge \eta ^+_{\epsilon }({\mathbb {Z}})\) is sufficient for the independence between c and the acceptance probability of Algorithm 4. Moreover, if \(\sigma \ge t\sigma _0\), where t, \(\sigma _0\) are public parameters, and \(C(\sigma )\) in Algorithm 4 has an appropriate value, the acceptance probability could not rely on \(\sigma\).
Lemma 8
Let \(\epsilon \in (0,1)\), \(c \in {\mathbb {R}}\), \(k = \frac{\sigma }{\sigma _0}\) and t be a positive integer. If the standard deviation \(\sigma\) meets the condition that \(\sigma \ge \eta _{\epsilon }^+({\mathbb {Z}})\), we can set \(C(\sigma ) = 1\) and \(p_{\sigma } = \frac{\rho _{\sigma }({\mathbb {Z}})}{2\lceil k \rceil \rho _{\sigma _0}({\mathbb {N}})}\). The acceptance probability \(P_\mathrm{true}(\sigma , c)\) of the while loop in Algorithm 4 satisfies
If the standard deviations \(\sigma _0\), \(\sigma\) meet the condition that \(\sigma \ge max\{\eta _{\epsilon }^+({\mathbb {Z}}), t\sigma _0\}\), we can set \(C(\sigma ) = \frac{t\lceil k \rceil }{(t+1)k} \le 1\) and \(p = \frac{t\sigma _0\sqrt{2\pi }}{2(t+1)\rho _{\sigma _0}({\mathbb {N}})}\). The acceptance probability \(P_\mathrm{true}(\sigma , c)\) of the while loop in Algorithm 4 satisfies
Proof
We use T to represent the set of all tuples \((x^*, y^*, s^*)\) that satisfies the conditions in step 10 and step 11 in Algorithm 4. According to the proof of Theorem 6, each integer \(z^*\) can be calculated by exactly one tuple \((x^*, y^*, s^*)\) in set T. Thus, the acceptance probability \(P_{true}(\sigma , c)\) of the while loop in Algorithm 4 is:
For any \(\epsilon \in (0, 1)\), if \(\sigma \ge \eta _{\epsilon }^+({\mathbb {Z}}) \ge \eta _{\epsilon }({\mathbb {Z}})\) and \(C(\sigma ) = 1\), it holds from Lemma 2 that
which concludes the first part of Lemma 8.
To make \(P_{true}(\sigma , c)\) independent of \(\sigma\), we need to bound \(\rho _{\sigma }({\mathbb {Z}})\). For any \(\epsilon \in (0, 1)\), if \(\sigma \ge \eta _{\epsilon }^+({\mathbb {Z}}) \ge \eta _{\epsilon }({\mathbb {Z}})\), the following relationship holds from Lemmas 2 and 3:
Now, we can use \(C(\sigma )\) to eliminate \(\sigma\) and \(\lceil k \rceil\). As \(k = \frac{\sigma }{\sigma _0}\), t is a positive integer and \(\sigma \ge t\sigma _0\), we can set \(C(\sigma ) = \frac{t\lceil k \rceil }{(t+1)k} \le 1\). Then, we have:
which concludes the second part of Lemma 8. \(\square\)
In the remaining part of this subsection, we will analyze the influence of the acceptance probability \(P_\mathrm{true}(\sigma , c)\) on the running time of Algorithm 4 and the adversary advantage. We only consider the scenarios where both \(\sigma\) and c are secret. If \(\sigma\) is public, we can ignore the last two conditions. This means we don’t need Algorithm 6 and parameter \(C(\sigma )\) to hide \(\sigma\). In this case, we can utilize a similar method to analyze the success probability of the adversary by the first part of Lemma 8. We omit the details here.
In the following Theorem 8, we show that Algorithm 4 is perfect isochronous with respect to z and statistically isochronous for the R\(\acute{e}\)nyi divergence with respect to \(\sigma\), c.
Theorem 8
Let \(\epsilon \in (0,1)\), \(c \in {\mathbb {R}}\), \(k = \frac{\sigma }{\sigma _0}\) and t be an positive integer. Let \(\sigma _0\), \(\sigma\) be the standard deviations such that \(\sigma _0 \in [\frac{1}{\sqrt{2\ln (2)}}, 1]\), \(\sigma \ge max\{\eta ^+_{\epsilon }({\mathbb {Z}}), t\sigma _0\}\). Let \(C(\sigma ) = \frac{t\lceil k \rceil }{(t+1)k} \le 1\), \(p = \frac{t\sigma _0\sqrt{2\pi }}{2(t+1)\rho _{\sigma _0}({\mathbb {N}})}\) be constants in (0, 1). Suppose that the implementation of Algorithm 4 satisfies the first, second, and fourth conditions. Its running time follows a distribution \(T(\sigma , c)\) such that:
for some distribution T independent of \(\sigma\), c, and z.
In the following Theorem 9, we leverage Theorem 8 to prove that the running time of Algorithm 4 does not help an adversary to break the cryptographic scheme. As is analyzed in Howe et al. (2020), we consider that the adversary has access to some function \(g(D_{{\mathbb {Z}}, \sigma , c})\) as well as the running time of Algorithm 4. This is intended to capture the fact that in most applications, the output of Algorithm 4 is not given directly to the adversary, but processed by some function g before.
Theorem 9
Consider an adversary \({\mathcal {A}}\) making \(Q_s\) queries to \(g(D_{{\mathbb {Z}}, \sigma , c})\) for some randomized function g, and solving a search problem with success probability \(2^{\lambda }\) for some \(\lambda \ge 1\). With the notations of Theorem 8, suppose that \(\epsilon \le \frac{1}{\sqrt{24\lambda Q_s}}\). Learning the running time of each call to Algorithm 4 does not increase the success probability of \({\mathcal {A}}\) by more than a constant factor.
To prove Theorem 8, we need to analyze the running time \(T_0\) of one iteration of the while loop in Algorithm 4 and the number \(I(\sigma , c)\) of iterations. Let I be the number of iterations in the ideal case. \(I(\sigma , c)\) (resp. I) is determined by the probability \(P_{true}(\sigma , c)\) (resp. p). As the floatingpoint operations are isochronous and the first, second, and fourth conditions are satisfied, \(T_0\) follows a distribution independent of \(\sigma\), c, and z. By Lemma 4, we have \(R_a(T(\sigma , c),T) = R_a(I(\sigma , c), I)\). We can deduce the upper bound of \(R_a(I(\sigma , c), I)\) from the relation \(P_{true}(\sigma , c) \in p \cdot [1  2\epsilon , 1 + 2\epsilon ]\) and prove the first part of the inequality. The second part can be derived from \(p \in [0.34, 0.36]\). The proof of Theorem 9 requires the flexible applications of the three properties in Lemma 4. We put the detailed deductions in “Appendix”.
Applications and performances
For simplifying the description, we use Type I isochrony and Type II isochrony to clarify whether the implementation hides the information of \(\sigma\). The two isochronous properties satisfy the requirements of the GPV trapdoor sampler and the MP trapdoor sampler. We offer four different implementations of Algorithm 4. The implementations of Type I isochrony are the same as those of Type II isochrony except utilizing Algorithm 6 and parameter \(C(\sigma )\) to hide \(\sigma\). The base samplers are the CDT sampler and the binary sampler. The binary sampler does not need any precomputation, while the CDT sampler need to store the cumulative distribution table in Table 5. Each 80bit integer is represented by two 64bit integers in our implementations, so the table consumes \(20 \cdot 8 = 160\) bytes. All four implementations need at most several hundred bytes of memory consumption containing other precomputed values such as k and \(1/(2\sigma ^2)\).
We choose the AES256 counter mode with the AESNI instruction set as the PRNG and use g++ 7.5.0 to compile the four implementations of Algorithm 4 with the compiler options O3 and maes enabled. In each test, we produce \(10^4\) random centers c in [0,1) for each \(\sigma\), then generate \(10^3\) samples with the same \(\sigma\) and c and measure the consumed time. We calculate the benchmark results in Table 6 based on the measured times. The implementations based on the CDT sampler are a little faster than those based on the binary sampler. In the implementations of Type I isochrony, we set the parameter \(C(\sigma )\) equal to \((2\lceil k \rceil )/(3k)\) for \(\sigma \ge 2\sigma _0\)^{Footnote 6}. It causes a significant decline in the performances. If \(\sigma\) is far greater than \(\sigma _0\), we can adjust \(C(\sigma )\) to get better performances. For example, if \(\sigma \ge 32\sigma _0\), we can set \(C(\sigma )\) equal to \((32\lceil k \rceil )/(33k)\). The implementations based on the binary sampler and the CDT sampler generate \(9.68 \times 10^6\) and \(10.23 \times 10^6\) samples per second for \(\sigma = 32\).
We summarize the performances of previous works in Table 7. We scale all the numbers to be based on 3.6 GHz. The TwinCDT sampler (Melchor and Ricosset 2018) is isochronous concerning c and z. It offers three different tradeoffs between the running time and the precomputation storage. Although it is at least two times faster than our implementations of Type II isochrony for \(\sigma \in [2, 32]\), it has two serious problems. First, it is not generic. For the distributions with different \(\sigma\), TwinCDT sampler needs different precomputations. This problem will reduce the portability of the algorithm and will require to precompute for n distributions with different \(\sigma\) in GPV trapdoor sampler, where n is the dimension of the lattice. Second, its precomputation storage increases significantly along with \(\sigma\) increasing, so it is not suitable for the distribution with large \(\sigma\). The convolution sampler (Micciancio and Walter 2017) satisfies Type I isochrony. Our implementations of Type I isochrony are at least four times faster than the convolution sampler. If we only measure the time during the online phase, our implementations will generate the base samples during the offline phase. We can assume \(\sigma \ge 13 > 4\sqrt{2}\eta _{\epsilon }({\mathbb {Z}})\) for reasonable \(\epsilon\) just as the benchmark test in Micciancio and Walter (2017)^{Footnote 7}. With \(C(\sigma ) = (13\lceil k \rceil )/(14k)\), our implementations of Type I isochrony generate \(12.43 \times 10^6\) and \(13.11 \times 10^6\) samples per second for \(\sigma = 2^{15}\) during the online phase, which are faster than the convolution sampler.
The DWZ sampler (Du et al. 2019) and Karney’s sampler could not resist timing attacks, though the DWZ sampler is faster than our implementations. FALCON’s sampler (Howe et al. 2020) satisfies Type I isochrony. It is faster than our implementations of Type I isochrony. However, its storage and running time increase rapidly along with the maximum standard deviation increasing. In addition, if the minimum standard deviation is larger than 2, our implementations will have better performances.
We apply Algorithm 4 to the latticebased cryptography library PALISADE (Cousins et al. 2018) and use its latest version as of this writing^{Footnote 8} to evaluate the performances. We replace Karney’s sampler in PALISADE with our implementations based on the CDT sampler to sample from the Gaussian distributions over the integers. PALISADE uses the BLAKE2 hash function^{Footnote 9} as the PRNG. In PALISADE, Karney’s sampler generates \(2.49 \times 10^6\) samples per second, while our implementations generate \(5.92 \times 10^6\) and \(3.21 \times 10^6\) samples.
PALISADE contains the implementations (Gür et al. 2019) of IBE (Micciancio and Peikert 2012) and CPABE (Zhang and Zhang 2011). Both rely on the latest MP trapdoor sampler (Genise and Micciancio 2018). We mainly focus on the Glattice sampling algorithm in the MP trapdoor sampler. It determines the running time during the online phase. All standard deviations are public in the Glattice sampling algorithm, so we make it invoke the implementation of Type II isochrony. The parameters include the modulus q, the dimension n of lattice, the base b for the gadget lattice G, and the standard deviation \(\sigma\). Let \((q, n, b, \sigma ) = (12289, 256, 2, 100)\). The benchmark results are shown in Table 8. Our implementation is 1.79x faster than the original.
Sampling integers using a normal distribution
In this section, we first introduce the rejection sampling strategy of the new COSAC sampler, then analyze its correctness and security requirements. Finally, we implement it to confirm the theoretical analysis and compare it with Algorithm 4.
New COSAC sampler
The COSAC sampler (Zhao et al. 2020a) uses the normal distribution \({\mathcal {N}}(0, \sigma )\) and the rejection sampling to sample from the distributions \(D_{{\mathbb {Z}} \backslash \{0\}, \sigma , c_F}\) with \(c_F \in [1/2, 1/2]\). The original rejection sampling strategy is independent of \(c_F\) and requires about two trials on average to output a sample. Our new rejection sampling strategy varies with \(c_F\). The expected number of trials of the new sampler is about half of that of the original.
The original COSAC sampler utilizes two normal distributions \({\mathcal {N}}_1(0, \sigma )\) and \({\mathcal {N}}_2(0, \sigma )\) to sample from the distributions \(D_{{\mathbb {Z}} \backslash \{0\}, \sigma , c_F}\) with \(c_F \in [1/2, 1/2]\). \({\mathcal {N}}_1(0, \sigma )\) and \({\mathcal {N}}_2(0, \sigma )\) are used to sample the positive and negative integers respectively. In each trial, the sampler chooses \({\mathcal {N}}_1(0, \sigma )\) or \({\mathcal {N}}_2(0, \sigma )\) with equal probability. Assume \({\mathcal {N}}_1(0, \sigma )\) is chosen and \(x^*\) is sampled from \({\mathcal {N}}_1(0, \sigma )\). If there exists a positive integer \(z^*\) such that \(x^* \in (z^*3/2, z^*1/2]\), the positive integer \(z^*\) is accepted with the probability \(\rho _{\sigma ,c_F}(z^*)/\rho _{\sigma }(x^*)\). Assume \({\mathcal {N}}_2(0, \sigma )\) is chosen and \(x^*\) is sampled from \({\mathcal {N}}_2(0, \sigma )\). If there exists a negative integer \(z^*\) such that \(x^* \in [z^*+1/2, z^*+3/2)\), the negative integer \(z^*\) is accepted with the probability \(\rho _{\sigma ,c_F}(z^*)/\rho _{\sigma }(x^*)\). No matter which normal distribution is chosen, the trial fails with a probability of about 0.5.
The new COSAC sampler only needs one normal distribution \({\mathcal {N}}(0, \sigma )\) to sample from the distributions \(D_{{\mathbb {Z}} \backslash \{0\}, \sigma , c_F}\) with \(c_F \in [1/2, 1/2]\). In each trial, the sampler sample \(x^*\) from \({\mathcal {N}}(0, \sigma )\). If \(x^* + c_F >= 0\) and \(x^* \in [z^*1c_F, z^*c_F)\), the positive integer \(z^*\) is accepted with the probability \(\rho _{\sigma ,c_F}(z^*)/\rho _{\sigma }(x^*)\); If \(x^* + c_F < 0\) and \(x^* \in [z^*c_F, z^*+1c_F)\), the negative integer \(z^*\) is accepted with the probability \(\rho _{\sigma ,c_F}(z^*)/\rho _{\sigma }(x^*)\).
Our sampler is represented in Algorithm 8. It invokes Algorithm 7 to sample from the distributions \(D_{{\mathbb {Z}} \backslash \{0\}, \sigma , c_F}\) with \(c_F \in [1/2, 1/2]\). The correctness of Algorithm 8 and Algorithm 7 is followed by our analysis above and some simple integral calculations. Theorem 10 indicates that the acceptance probability of Algorithm 7 converges to 1 as \(\sigma\) increases. In theory, Algorithm 8 is almost two times faster than the original COSAC sampler.
Theorem 10
The outputs of Algorithm 7 and Algorithm 8 are distributed as \(D_{{\mathbb {Z}} \backslash \{0\}, \sigma , c_F}\) and \(D_{{\mathbb {Z}}, \sigma , c}\) respectively. Let \(\epsilon \in (0, 1)\), and \(\sigma > \eta _{\epsilon }({\mathbb {Z}})\), the acceptance probability \(P_{true}(\sigma , c_F)\) of Algorithm 7 satisfies
Due to the impossibility of achieving perfect distributions in practice, we need Theorem 11 to estimate the security loss of the cryptographic schemes invoking Algorithm 8. The error of Algorithm 8 stems from the approximate distribution \({\mathcal {N}}^{(R)}(0,1)\) and the approximate Bernoulli variables \({\mathcal {B}}^{(R)}_{p_0}\), \({\mathcal {B}}^{(R)}_{p_{rej}}\). The proof of Theorem 11 is similar to that of Theorem 7, except estimating the influence of the continuous distribution \({\mathcal {N}}(0, 1)\). We provide the detailed proofs of Theorems 10 and 11 in “Appendix”.
Theorem 11
Let \(\lambda ^{(I)}\) (resp. \(\lambda ^{(R)}\)) be the security parameter of an implementation using the ideal distributions \({\mathcal {N}}^{(I)}(0,1)\) and \({\mathcal {B}}^{(I)}_{p}\) (resp. the real distributions \({\mathcal {N}}^{(R)}(0,1)\) and \({\mathcal {B}}^{(R)}_{p}\)). Let the numbers of sampling from \({\mathcal {N}}(0,1)\) and \({\mathcal {B}}_{p}\) be \(Q_{bs}\) and \(Q_{exp}\), the absolute error of the sample values between \({\mathcal {N}}^{(I)}(0,1)\) and \({\mathcal {N}}^{(R)}(0,1)\) be \(\Delta _{{\mathcal {N}}}\). If the following conditions are respected, at most two bits of security are lost. In other words, \(\lambda ^{(I)}  \lambda ^{(R)} \le 2\).
Performance of new COSAC sampler
We implement Algorithm 8 by modifying the implementation of the original COSAC sampler. The implementation utilizes AVX2 intrinsic instructions to improve the performance and employ the Box–Muller continuous Gaussian sampler (Hülsing et al. 2018) to sample from \({\mathcal {N}}(0, 1)\). The AES256 counter mode with the ASENI instructions is used as the PRNG. For convenience, we use g++ 7.5.0 to compile the implementations with the same compiler options O3 march=native enable as Zhao et al. (2020a). The benchmark results are shown in Table 9.
In Table 9, the average numbers of trials are basically consistent with the theoretical results. However, the new COSAC sampler is 1.46x–1.63x faster than the original. The main reason is the additional operations to hide z. As claimed in Zhao et al. (2020a), our implementation of Algorithm 8 may also reveal \(\sigma\). Although the implementation involves AVX2 intrinsic instructions, we compare it with the implementations of Type II isochrony in Table 6. The new COSAC sampler is faster than the implementation based on the binary sampler but slower than the implementation based on the CDT sampler. It is noteworthy that the absolute error of the Box–Muller continuous Gaussian sampler is no more than \(2^{48}\). For \(\sigma \in [2, 2^{20}]\) and \(\lambda ^{(R)} = 256\), the provable security is accessible only for \(Q_{bs} \le 2^{44}\) based on the second condition in Theorem 11.
Conclusions
In this work, we mainly proposed three algorithms to design a generic, secure and efficient Gaussian sampling algorithm over the integers. They are Algorithm 3, Algorithm 4, and Algorithm 8. Algorithm 3 is to sample the exponential Bernoulli variables. Compared with the polynomial approximation method, our algorithm reduces the floatingpoint multiplications significantly. We applied it to FALCON to decrease the signature generation time. One interesting phenomenon is that the polynomial approximation method can benefit from the Haswell microarchitecture developed by Intel as the fourthgeneration core. If the compiler options include march=haswell, the running time of the polynomial approximation method will decrease by about \(45\%\) on a single Intel Core i74790 CPU at 3.6 GHz. It is an important issue how to improve the performance of our algorithm on the specific architecture.
Algorithm 4 can sample from the Gaussian distributions over the integers while hiding the standard deviation, center, and output. It can be used as the fundamental operation modular in the latticebased cryptography library. We applied it to PALISADE and obtained a more secure and efficient implementation of the MP trapdoor sampler. We can utilize the constanttime implementation of the KnuthYao sampler (Karmakar et al. 2019) as the base sampler and improve the performances of our implementations further. Besides, Theorem 9 is only available for search problems. For decision problems, one may prove the security by the techniques in Micciancio and Walter (2018).
Algorithm 8 also samples from the Gaussian distributions over the integers. It is not as competitive as Algorithm 4 so far. However, it has the lowest expected number of trials among the algorithms based on rejection sampling. It is necessary to find a more efficient algorithm sampling from the normal distribution to improve the performance. In addition, it also deserves further research to make the acceptance probability of Algorithm 8 independent of the standard deviation and center to resist timing attacks.
Availability of data and material
Not applicable.
Notes
The LDL tree is a data structure to reduce the complexity of the Gaussian sampling over rings during the signature generation. It depends only on the secret key.
The rejection sampling strategy is the way of sampling the discrete Gaussian distributions with different \(\sigma\) and arbitrary c by the base sampler.
If the adversary makes no more than \(2^{64}\) queries, for a 256bit cryptographic scheme, \(\epsilon = 2^{40}\) and \(\sigma \ge 2 > \eta _{\epsilon }^+({\mathbb {Z}})\) can ensure security.
Assuming the floatingpoint division is isochronous (Howe et al. 2020), FALCON’s sampler only requires to precompute the cumulative distribution table.
maes is only used to compile AESNI. It can be removed while the implementation does not involve AESNI.
\(\sigma _0\) in our implementations is 1 or \(1/\sqrt{2\ln {2}}\).
The smoothing parameter in Micciancio and Walter (2017) is \(\sqrt{2\pi }\) times that in this paper.
\(D^{(IN)}_{{\mathbb {Z}} \backslash \{0\}, \sigma , c_F}\) is not the real output distribution of Algorithm 7, because the Bernoulli distribution is perfect in the Intermediate Case.
References
Agrawal S, Boneh D, Boyen X (2010) Efficient lattice (H)IBE in the standard model. In: Gilbert H (ed) EUROCRYPT, pp 553–572
Ahrens JH, Dieter U (1973) Extensions of Forsythe’s method for random sampling from the normal distribution. Math Comput 27(124):927–937
Ajtai M (1996) Generating hard instances of lattice problems. In: Miller GL (ed) STOC, pp 99–108
Bai S, Langlois A, Lepoint T, Stehlé D, Steinfeld R (2015) Improved security proofs in latticebased cryptography: Using the Rényi divergence rather than the statistical distance. In: Iwata T, Cheon JH (eds) ASIACRYPT, pp 3–24
Barthe G, Belaïd S, Espitau T, Fouque P, Rossi M, Tibouchi M (2019) Galactics: Gaussian sampling for latticebased constant time implementation of cryptographic signatures, revisited. In: Cavallaro L, Kinder J, Wang X, Katz J (eds) CCS, pp 2147–2164
Bootle J, Delaplace C, Espitau T, Fouque PA, Tibouchi M (2018) LWE without modular reduction and improved sidechannel attacks against BLISS. In: Peyrin T, Galbraith S (eds) ASIACRYPT, pp 494–524
Brakerski Z, Langlois A, Peikert C, Regev O, Stehlé D (2013) Classical hardness of learning with errors. In: Boneh D, Roughgarden T, Feigenbaum J (eds) STOC, pp 575–584
Cousins DB, Crescenzo GD, Gür KD, King K, Polyakov Y, Rohloff K, Ryan GW, Savas E (2018) Implementing conjunction obfuscation under entropic ring LWE. In: IEEE symposium on security and privacy, pp 354–371
Dai W, Doröz Y, Polyakov Y, Rohloff K, Sajjadpour H, Savas E, Sunar B (2018) Implementation and evaluation of a latticebased keypolicy ABE scheme. IEEE Trans Inf Forensics Secur 13(5):1169–1184
Du Y, Wei B, Zhang H (2019) A rejection sampling algorithm for offcentered discrete Gaussian distributions over the integers. Sci China Inf Sci 62(3):39103–1391033
Ducas L, Nguyen PQ (2012) Faster Gaussian lattice sampling using lazy floatingpoint arithmetic. In: Wang X, Sako K (eds) ASIACRYPT, pp 415–432
Ducas L, Durmus A, Lepoint T, Lyubashevsky V (2013) Lattice signatures and bimodal Gaussians. In: Canetti R, Garay JA (eds) CRYPTO, pp 40–56
Ducas L, Lyubashevsky V, Prest T (2014) Efficient identitybased encryption over NTRU lattices. In: Sarkar P, Iwata T (eds) ASIACRYPT, pp 22–41
Espitau T, Fouque PA, Gérard B, Tibouchi M (2017) Sidechannel attacks on BLISS latticebased signatures: exploiting branch tracing against strongswan and electromagnetic emanations in microcontrollers. In: Thuraisingham BM, Evans D, Malkin T, Xu D (eds) CCS, pp 1857–1874
Forsythe GE (1972) Von Neumann’s comparison method for random sampling from the normal and other distributions. Math Comput 26(120):817–826
Fouque P, Kirchner P, Tibouchi M, Wallet A, Yu Y (2020) Key recovery from Gram–Schmidt norm leakage in hashandsign signatures over NTRU lattices. In: Canteaut A, Ishai Y (eds) EUROCRYPT, pp 34–63
Genise N, Micciancio D (2018) Faster Gaussian sampling for trapdoor lattices with arbitrary modulus. In: Nielsen JB, Rijmen V (eds) EUROCRYPT, pp 174–203
Gentry C (2009) Fully homomorphic encryption using ideal lattices. In: Mitzenmacher M (ed) STOC, pp 169–178
Gentry C, Peikert C, Vaikuntanathan V (2008) Trapdoors for hard lattices and new cryptographic constructions. In: Dwork C (ed) STOC, pp 197–206
Groot Bruinderink L, Hülsing A, Lange T, Yarom Y (2016) Flush, gauss, and reload: a cache attack on the BLISS latticebased signature scheme. In: Gierlichs B, Poschmann AY (eds) CHES, pp 323–345
Gür KD, Polyakov Y, Rohloff K, Ryan GW, Sajjadpour H, Savas E (2019) Practical applications of improved Gaussian sampling for trapdoor lattices. IEEE Trans Comput 68(4):570–584
Howe J, Prest T, Ricosset T, Rossi M (2020) Isochronous Gaussian sampling: from inception to implementation. In: Ding J, Tillich JP (eds) PQC, pp 53–71
Hu Y, Jia H (2019) A new Gaussian sampling for trapdoor lattices with arbitrary modulus. Des Codes Cryptogr 87(11):2553–2570
Hülsing A, Lange T, Smeets K (2018) Rounded Gaussians—fast and secure constanttime sampling for latticebased crypto. In: Abdalla M, Dahab R (eds) PKC, pp 728–757
Karmakar A, Roy SS, Vercauteren F, Verbauwhede I (2019) Pushing the speed limit of constanttime discrete Gaussian sampling. A case study on the falcon signature scheme. In: DAC, p 88
Karney CFF (2016) Sampling exactly from the normal distribution. ACM Trans Math Softw 42(1):3–1314
Lyubashevsky V, Peikert C, Regev O (2010) On ideal lattices and learning with errors over rings. In: Gilbert, H. (ed.) EUROCRYPT, pp. 1–23
Lyubashevsky V, Ducas L, Kiltz E, Lepoint T, Schwabe P, Seiler G, Stehle D, Bai S (2020) CRYSTALSDILITHIUM. Technical report, National Institute of Standards and Technology. https://csrc.nist.gov/Projects/postquantumcryptography/round3submissions. Accessed 15 Jan 2022
Melchor CA, Ricosset T (2018) CDTbased Gaussian sampling: from multi to double precision. IEEE Trans Comput 67(11):1610–1621
Micciancio D (2002) Generalized compact knapsacks, cyclic lattices, and efficient oneway functions from worstcase complexity assumptions. In: FOCS, pp 356–365
Micciancio D, Peikert C (2012) Trapdoors for lattices: simpler, tighter, faster, smaller. In: Pointcheval D, Johansson T (eds) EUROCRYPT, pp 700–718
Micciancio D, Regev O (2004) Worstcase to averagecase reductions based on Gaussian measures. In: FOCS, pp 372–381
Micciancio D, Walter M (2017) Gaussian sampling over the integers: efficient, generic, constanttime. In: Katz J, Shacham H (eds) CRYPTO, pp 455–485
Micciancio D, Walter M (2018) On the bit security of cryptographic primitives. In: Nielsen JB, Rijmen V (eds) EUROCRYPT, pp 3–28
Peikert C (2010) An efficient and parallel Gaussian sampler for lattices. In: Rabin T (ed) CRYPTO, pp 80–97
Pessl P, Bruinderink LG, Yarom Y (2017) To BLISSB or not to be: attacking strongswan’s implementation of postquantum signatures. In: Thuraisingham BM, Evans D, Malkin T, Xu D (eds) CCS, pp 1843–1855
Prest T (2017) Sharper bounds in latticebased cryptography using the Rényi divergence. In: Takagi T, Peyrin T (eds) ASIACRYPT, pp 347–374
Prest T, Fouque PA, Hoffstein J, Kirchner P, Lyubashevsky V, Pornin T, Ricosset T, Seiler G, Whyte W, Zhang Z (2020) FALCON. Technical report, National Institute of Standards and Technology. https://csrc.nist.gov/Projects/postquantumcryptography/round3submissions. Accessed 15 Jan 2022
Regev O (2005) On lattices, learning with errors, random linear codes, and cryptography. In: Gabow HN, Fagin R (eds) STOC, pp 84–93
Schwabe P, Avanzi R, Bos J, Ducas L, Kiltz E, Lepoint T, Lyubashevsky V, Schanck JM, Seiler G, Stehle D (2020) CRYSTALSKYBER. Technical report, National Institute of Standards and Technology. https://csrc.nist.gov/Projects/postquantumcryptography/round3submissions. Accessed 15 Jan 2022
Shor PW (1997) Polynomialtime algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM J Comput 26(5):1484–1509
van Erven T, Harremoës P (2014) Rényi divergence and Kullback–Leibler divergence. IEEE Trans Inf Theory 60(7):3797–3820
Zhang J, Zhang Z (2011) A ciphertext policy attributebased encryption scheme without pairings. In: Wu C, Yung M, Lin D (eds) Inscrypt, pp 324–340
Zhao RK, Steinfeld R, Sakzad A (2020) COSAC: compact and scalable arbitrarycentered discrete Gaussian sampling over integers. In: Ding J, Tillich JP (eds) PQC, pp 284–303
Zhao RK, Steinfeld R, Sakzad A (2020) FACCT: fast, compact, and constanttime discrete Gaussian sampler over integers. IEEE Trans Comput 69(1):126–137
Acknowledgements
Not applicable.
Authors' information
Yongbin Zhou (Email: zhouyongbin@iie.ac.cn) is the corresponding author of this paper. He is currently a Fulltime Professor with the School of Cyber Science and Engineering, Nanjing University of Science and Technology. He is also an Adjunct Professor with Institute of Information Engineering, Chinese Academy of Sciences. His main research interests include theories and technologies of network and information security.
Funding
This work is supported in part by National Natural Science Foundation of China (No. U1936209 and No. 62002353), China Postdoctoral Science Foundation (No. 2021M701726) and Yunnan Provincial Major Science and Technology Special Plan Projects (No. 202103AA080015).
Author information
Authors and Affiliations
Contributions
SS and YZ proposed the exponential Bernoulli sampling algorithm and the Gaussian sampling algorithms over the integers and drafted the manuscript. YJ, RZ, and YT participated in problem discussions and improvements of the manuscript. SS and YJ implemented and benchmarked the algorithms proposed in this paper. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Appendix
Proof of Lemma 3
Proof
By the Fourier transform of \(\rho _{\sigma }(x)\) and the Poisson summation formula, we have: \(\rho _{\sigma }({\mathbb {Z}}) = \sigma \sqrt{2\pi } \cdot \left( 1 + 2\sum _{i \ge 1}\exp (2i^2\pi ^2\sigma ^2)\right)\). If \(\epsilon \in (0, 1)\) and \(\sigma > \eta ^+_{\epsilon }({\mathbb {Z}})\), the inequality holds from Lemma 1: \(\exp \left( 2\pi ^2\sigma ^2\right) \le \epsilon /(2(1+\epsilon ))\). Combining these two relations with \(\epsilon \in (0, 1)\), we can complete the proof
\(\square\)
Proof of Theorem 6
Proof
We first prove that each integer \(z^*\) is calculated by exactly one tuple \((x^*, y^*, s^*)\). We can write c as \(c = c_1 + c_2\) in which \(c_1 \in [0,1)\) and \(c_2 \in {\mathbb {Z}}\). Then we have
If \(c_1 \ne 0\), \(z \le c_2\) for \(s = 1\), and \(z > c_2\) for \(s = 1\). However, if \(c_1 = 0\), \(z = c_2\) can be calculated by two tuples (0, 0, 1) and (0, 0, 1). The step 11 makes z greater than \(c_2\) for \(s = 1\) no matter whether \(c_1\) is equal to 0, which solves the problem. Next, we analyze that each integer \(z^* > c_2\) can be calculated by exactly one tuple \((x^*, y^*, s^*)\) for \(s = 1\). For any \(x \ge 0\), \(\lceil k \rceil  1 \le \lceil k(x+1) + sc_1 \rceil  \lceil kx + sc_1 \rceil \le \lceil k \rceil\). As \(y \in \{0,...,\lceil k \rceil 1\}\), each integer \(z^* > c_2\) can be calculated by at least one tuple \((x^*, y^*, s^*)\). However, if some \(x_{bad}\) satisfies that \(\lceil k(x_{bad}+1) + sc_1 \rceil  \lceil kx_{bad} + sc_1 \rceil = \lceil k \rceil  1\), then \(z = \lceil k(x_{bad}+1) + sc_1 \rceil + c_2\) could be calculated by two tuples \((x_{bad}, \lceil k \rceil  1, 1)\) and \((x_{bad} + 1, 0, 1)\). Due to the fact that \(\lceil k(x+1) + sc_1 \rceil > \lceil kx + sc_1 \rceil + \lceil k \rceil  1\) is equivalent to \(k(x+1) + sc_1 > \lceil kx + sc_1 \rceil + \lceil k \rceil  1\), the tuple \((x_{bad}, \lceil k \rceil  1, 1)\) is rejected after the step 10. Then, each integer \(z^* > c_2\) is calculated by exactly one tuple. For \(s = 1\), it is similar to prove that each integer \(z^* <= c_2\) is also calculated by exactly one tuple.
Now, we calculate the probability that each integer \(z^*\) is sampled. Assume that \(z^*\) is calculated by the tuple \((x^*, y^*, s^*)\), i.e. \(z^* = s^* \cdot \lceil kx^* + y^* + s^*c \rceil\). Let \(z_0^* = \lceil kx^* + y^* + s^*c \rceil\) and \(d^* = z_0^*  (kx^* + s^*c)\), then we have:
which concludes the proof. \(\square\)
Proof of Theorem 7
Proof
We first define three different cases:

1.
(Ideal Case) The implementation uses two ideal distributions \(D^{(I)}_{{\mathbb {N}}, \sigma _0}\) and \({\mathcal {B}}^{(I)}_{p_{rej}}\).

2.
(Intermediate Case) The implementation uses a real distribution \(D^{(R)}_{{\mathbb {N}}, \sigma _0}\) and an ideal distribution \({\mathcal {B}}^{(I)}_{p_{rej}}\).

3.
(Real Case) The implementation uses two real distributions \(D^{(R)}_{{\mathbb {N}}, \sigma _0}\) and \({\mathcal {B}}^{(R)}_{p_{rej}}\).
We recall that \(\lambda ^{(I)}\) (resp. \(\lambda ^{(R)}\)) is the security parameter of the Ideal (resp. Real) Case. We aim at computing \(\Delta \lambda = \lambda ^{(I)}  \lambda ^{(R)}\).
We set the order \(a = 2\lambda ^{(R)}+1\). Let \(\epsilon _{I}\) (resp. \(\epsilon _{IN}\), \(\epsilon _{R}\)) be the probability that the adversary breaks the scheme in the use of the Ideal (resp. Intermediate, Real) Case. By data processing inequality and probability preservation of the R\(\acute{e}\)nyi divergence in Lemma 4:
By definition, \(\epsilon _R = 2^{\lambda ^{(R)}}\), so we can deduce the relationship between \(\epsilon _I\) and \(\epsilon _R\) using \(\epsilon _R^{\frac{a}{a1}} = \epsilon _R/\sqrt{2}\):
So we have:
Based on the first condition in Theorem 7 and \(a = 2\lambda ^{(R)}+1\), an application of Lemma 6 yields to
By combining (5) (6) and the second condition in Theorem 7, we get
where the second inequality is deduced from \(\left( 1 + \frac{x}{n}\right) ^n \le \exp (x)\) for \(x, n > 0\). \(\square\)
Proof of Theorem 8
The number of iterations of the while loop in Algorithm 4 follows a geometric distribution of its acceptance probability. When proving Theorem 8, we will need the following lemma (Howe et al. 2020) to bound the R\(\acute{e}\)nyi divergence between two geometric distributions.
Lemma 9
Let \({\mathcal {P}}\) and \({\mathcal {Q}}\) be geometric distributions of parameters \(p, q \in (0, 1)\). Suppose there exists \(\delta = o(1/(a+1))\) such that:
Then the R\(\acute{e}\)nyi divergence between \({\mathcal {P}}\) and \({\mathcal {Q}}\) is bounded as follows:
Now we can prove Theorem 8.
Proof
Let \(T_0\) denote the running time of one iteration of the while loop in Algorithm 4. If the floatingpoint operations are isochronous, the basic sampler is isochronous with respect to its output and the algorithms sampling b and y is isochronous with respect to their inputs and outputs, then \(T_0\) follows a distribution which is independent of \(\sigma\), c, z.
Let \(I(\sigma , c)\) (resp. I) denote the number of iterations of the while loop when each iteration accepts with probability \(P_{true}(\sigma , c)\) (resp. p). \(I(\sigma , c)\) (resp. I) is a geometric distribution of parameter \(P_{true}(\sigma , c)\) (resp. p). By Lemma 9, we have \(P_{true}(\sigma , c) \in p \cdot [1  \epsilon , 1 + 2\epsilon ]\). Through a few simple computations, we can get the following inequalities:
Let \(\delta = 2\epsilon \cdot \max (1, \frac{p}{1p})\). If \(\epsilon\) is small enough, it follows from Lemma 9 that:
The total running time T (resp. \(T(\sigma , c)\)) of Algorithm 4 is a function of \(T_0\) and I (resp. \(I(\sigma , c)\)) for some function f. This allows to apply the dataprocessing inequality:
Since \(\sigma _0 \in [\frac{1}{\sqrt{2\ln {2}}}, 1]\) and \(p = \frac{t\sigma _0\sqrt{2\pi }}{2(t+1)\rho _{\sigma _0}({\mathbb {N}})} \in [0.34, 0.36]\), we can get the final conclusion:
\(\square\)
Proof of Theorem 9
Proof
Let D denote the output distribution of \(g(D_{{\mathbb {Z}}, \sigma , c})\). In the real (resp. ideal) case, we can consider without loss of generality that the adversary can query the joint distribution \((D, T(\sigma , c))\) (resp. (D, T)), where \(T(\sigma , c)\) (resp. T) is the running time of Algorithm 4 in the real (resp. ideal) case. In the proof of Theorem 8, we have shown that both T and \(T(\sigma , c)\) are independent of the output z. Thus, both T and \(T(\sigma , c)\) are independent of the distribution D. Let \(a = \lambda\), and \(P_0\), \(P_1\) denote the success probabilities of \({\mathcal {A}}\) in the ideal and real cases, respectively. Since \(P_0 = 2^{\lambda }\) and \(\epsilon \le \frac{1}{\sqrt{24\lambda Q_s}}\), it holds from Lemma 4 and Theorem 8 that:
which concludes the proof. \(\square\)
Proof of Theorem 10
Proof
To prove that the output of Algorithm 7 is distributed as \(D_{{\mathbb {Z}}\backslash \{0\},\sigma ,c_F}\), we need to calculate the probability that each nonzero integer \(z^*\) is sampled. If \(z^* > 0\), \(z^*\) is calculated from \(x^*\) in the interval \([z^*1c_F, z^*c_F)\), then we can obtain the probability of \(z^*\)
If \(z^* < 0\), \(z^*\) is calculated from \(x^*\) in the interval \([z^*c_F, z^*+1c_F)\), then it is easy to check that \(\Pr [z = z^*  z^* \in {\mathbb {Z}}, z^* < 0] \propto D_{{\mathbb {Z}}\backslash \{0\},\sigma ,c_F}(z^*)\). So, for any nonzero integer \(z^*\), we can conclude that \(\Pr [z = z^*] \propto D_{{\mathbb {Z}}\backslash \{0\}, \sigma , c_F}(z^*)\).
In Algorithm 8, \(\Pr [z = 0] = \exp (c^2_F/(2\sigma ^2))/S = D_{{\mathbb {Z}}, \sigma , c_F}(0)\). Therefore, variable z is distributed as \(D_{{\mathbb {Z}}, \sigma , c_F}\). Since \(c = c_I + c_F\), \(z + c_I\) is distributed as \(D_{{\mathbb {Z}}, \sigma , c}\).
Now, let’s estimate the acceptance probability \(P_{true}(\sigma , c_F)\) of Algorithm 7:
where the last inequality is deduced from Lemma 2 and \(\rho _{\sigma ,c_F}(0) \le 1\). \(\square\)
Proof of Theorem 11
Proof
The proof of Theorem 11 is very similar to that of Theorem 7. The only difference is that we need to evaluate the impact of the error of the normal distribution \({\mathcal {N}}(0,1)\) on security. We first define three different cases:

1.
(Ideal Case) The implementation uses two ideal distributions \({\mathcal {N}}^{(I)}(0, 1)\) and \({\mathcal {B}}^{(I)}_p\).

2.
(Intermediate Case) The implementation uses a real distribution \({\mathcal {N}}^{(R)}(0, 1)\) and an ideal distribution \({\mathcal {B}}^{(I)}_p\).

3.
(Real Case) The implementation uses two real distributions \({\mathcal {N}}^{(R)}(0, 1)\) and \({\mathcal {B}}^{(I)}_p\).
Our goal is to compute \(\Delta \lambda = \lambda ^{(I)}  \lambda ^{(R)}\). Let the order a of the R\(\acute{e}\)nyi divergence be \(2\lambda ^{(R)} + 1\). Let \(\epsilon _I\) (resp. \(\epsilon _{IN}\), \(\epsilon _R\)) be the probability that the adversary breaks the scheme in the use of the Ideal (resp. Intermediate, Real) Case. Let \(D^{(I)}_{{\mathbb {Z}} \backslash \{0\}, \sigma , c_F}\) (resp. \(D^{(IN)}_{{\mathbb {Z}} \backslash \{0\}, \sigma , c_F}\)^{Footnote 10}) be the output distribution of Algorithm 7 in the use of the the Ideal (resp. Intermediate) Case. By data processing inequality and probability preservation of the R\(\acute{e}\)nyi divergence in Lemma 4:
By the definitions, we have \(\epsilon _R = 2^{\lambda ^{(R)}}\), \(\epsilon _I = 2^{\lambda ^{(I)}}\) and \(\epsilon ^{\frac{a}{a1}}_R = \epsilon _R/\sqrt{2}\), then
Based on the second condition of Theorem 11, we can bound the relative error between \(D^{(IN)}_{{\mathbb {Z}} \backslash \{0\},\sigma ,c_F}\) and \(D^{(I)}_{{\mathbb {Z}} \backslash \{0\},\sigma ,c_F}\). For any positive integer \(z^*\),
The same result holds for any negative integer. Therefore, the relative error between \(D^{(IN)}_{{\mathbb {Z}} \backslash \{0\},\sigma ,c_F}\) and \(D^{(I)}_{{\mathbb {Z}} \backslash \{0\},\sigma ,c_F}\) is no more than \(\frac{1}{\sqrt{(4\lambda ^{(R)}+2)Q_{bs}}}\).
By combining Lemma 6 and the bounds of the relative errors, we can get:
Thus, we have
concluding the proof. \(\square\)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sun, S., Zhou, Y., Ji, Y. et al. Generic, efficient and isochronous Gaussian sampling over the integers. Cybersecurity 5, 10 (2022). https://doi.org/10.1186/s42400022001130
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s42400022001130
Keywords
 Latticebased cryptography
 Gaussian sampler
 Rejection sampling
 Timing attacks
 Trapdoor