Improved homomorphic evaluation for hash function based on TFHE
Cybersecurity volume 7, Article number: 14 (2024)
Abstract
Homomorphic evaluation of hash functions offers a solution to the challenge of data integrity authentication in the context of homomorphic encryption. The earliest attempt to achieve homomorphic evaluation of the SHA256 hash function was proposed by Mella and Susella (in: Cryptography and coding: 14th IMA international conference, IMACC 2013. Lecture notes in computer science, vol 8308. Springer, Heidelberg, pp 28–44, 2013. https://doi.org/10.1007/9783642452390_3.) based on the BGV scheme. Unfortunately, their implementation faced significant limitations due to the exceedingly high multiplicative depth, rendering it impractical. Recently, a homomorphic implementation of SHA256 based on the TFHE scheme (Homomorphic evaluation of SHA256. https://github.com/zamaai/tfhers/tree/main/tfhe/examples/sha256_bool) brought it from theory to reality; however, its efficiency remains insufficient. In this paper, we revisit the homomorphic evaluation of the SHA256 hash function in the context of TFHE, further reducing the reliance on gate bootstrapping and lowering the evaluation latency. Specifically, we primarily utilize ternary gates to reduce the number of gate bootstrappings required for the logic functions in message expansion and for addition modulo \(2^{32}\) in iterative compression. Furthermore, we demonstrate that our optimization techniques are applicable to the Chinese commercial cryptographic hash SM3. Finally, we give specific comparative implementations based on the TFHE-rs library. Experiments demonstrate that our optimization techniques lead to an improvement of approximately 35–50% compared with the state-of-the-art result under different numbers of cores.
Introduction
Fully homomorphic encryption (FHE) is a cryptographic technique that allows arbitrary functions to be evaluated on ciphertexts without decryption. This remarkable property makes FHE an ideal solution for addressing security concerns in various domains such as machine learning, cloud computing, medical diagnostics and financial data analysis. Since Gentry (Gentry 2009) proposed the ingenious bootstrapping technique to construct the first fully homomorphic encryption scheme, extensive research spanning over a decade has resulted in significant advancements in both the theoretical understanding and the practical implementation of FHE. Some representative works include BGV (Brakerski et al. 2012), BFV (Brakerski 2012; Fan and Vercauteren 2012), CKKS (Cheon et al. 2017, 2018), FHEW (Ducas and Micciancio 2015), TFHE (Chillotti et al. 2020) and FINAL (Bonte et al. 2022).
One of the major challenges in FHE is the significant expansion in ciphertext size, which is generally three to six orders of magnitude larger than the plaintext size. Transciphering (Naehrig et al. 2011), which combines FHE with a symmetric encryption scheme, tackles this expansion and thereby reduces the communication cost between the client and the cloud. Specifically, instead of encrypting the data with a fully homomorphic encryption scheme, the client encrypts the data with a traditional symmetric encryption scheme. The encrypted data, in the form of symmetric ciphertexts, is then transmitted to the cloud. In this way, the ciphertext expansion ratio of the data (i.e., the ciphertext size divided by the plaintext size) is only 1. Some additional operations need to be performed on the server side: the symmetric ciphertext is converted to a homomorphic ciphertext by homomorphically evaluating the decryption circuit of the symmetric encryption scheme. Once the conversion is complete, the cloud can proceed to evaluate the desired function homomorphically. Therefore, optimizing the multiplicative depth of the decryption circuit is vital for achieving efficient execution within the transciphering framework.
Homomorphic evaluation of symmetric encryption schemes, including block ciphers and stream ciphers, has garnered significant attention in recent years. As early as 2012, Gentry et al. (2012) presented a homomorphic evaluation of AES-128 encryption using the BGV scheme, obtaining an execution time of more than 4 min in the leveled mode and a latency of 18 min in the bootstrapped mode (in the updated version of that paper). Since then, optimized evaluations of AES have been developed, and a recent work (Trama et al. 2023) claimed to reduce the evaluation time of an AES block to 30 s. In addition to optimizing AES, researchers have explored the use of lightweight block ciphers to achieve lower evaluation latency. On the other hand, researchers have also investigated specialized FHE-friendly block ciphers (Albrecht et al. 2015) and stream ciphers (Dobraunig et al. 2018; Cid et al. 2022) with lower multiplicative depth and complexity.
Motivation for homomorphic evaluation of hash function
Fully homomorphic encryption in combination with symmetric encryption solves the ciphertext size expansion problem. What about hash functions? A direct application is to verify the integrity of data in a homomorphic sense. The earliest evaluation of a hash function can be traced back to Mella and Susella (2013), who presented a homomorphic evaluation of the SHA256 hash algorithm based on the BGV scheme. However, the main challenge encountered in evaluating SHA256 homomorphically is the extremely high multiplicative depth caused by its large number of iteration rounds, and the authors did not provide a practical implementation time. Compared with the BGV scheme, TFHE has the advantage of not being limited by circuit depth; e.g., Lou and Jiang (2019) evaluated deep neural networks by means of TFHE. Recently, Bendoukha et al. (2022) evaluated hash functions constructed from lightweight block ciphers such as PRINCE, SIMON, and LowMC using the TFHE scheme. They also proposed several intriguing application scenarios for homomorphic evaluation of hash functions, such as Homomorphic Data Integrity Check, Single Secret Leader Election, Homomorphic Database Querying and Oblivious Authenticated (Homomorphic) Calculation, which highlight the need for homomorphic evaluation of hash functions. However, it is worth noting that their homomorphic evaluation of hash functions is directly derived from previous evaluations of lightweight block ciphers, and the resulting hash functions are not standardized, making them difficult to deploy in industry. In this paper, we focus on the well-studied and standardized hash algorithm SHA256 and the Chinese commercial cryptographic hash SM3 (https://oscca.gov.cn/sca/xxgk/201012/17/1002389/files/302a3ada057c4a73830536d03e683110.pdf).
We note that a homomorphic implementation of SHA256 (Homomorphic evaluation 2023) has been proposed based on the TFHE scheme, but there is still significant room for optimization.
Our contributions
In this paper, we revisit the evaluation of SHA256 in the context of TFHE homomorphic encryption and concentrate on improving the latency of SHA256 evaluation. We first discuss modifications to the SHA256 code that make it friendlier to the TFHE scheme. One significant improvement is the utilization of ternary gates, which effectively reduces the number of gate bootstrappings required for evaluating SHA256. Specifically, each of the logic functions \(\sigma _0\), \(\sigma _1\), \(s_0\), \(s_1\) and Maj required in message expansion and iterative compression can be evaluated with only a single bootstrapping per output bit. For the expensive addition modulo \(2^{32}\), we present a number of optimization techniques to further minimize the number of required gate bootstrappings. Moreover, we show that our optimization techniques are also applicable to the evaluation of the SM3 hash algorithm. Finally, we provide a concrete implementation based on the TFHE-rs library. Our experimental results show that our optimizations achieve about 35–50% efficiency gains compared with the state-of-the-art under different numbers of CPUs.
Related works
The transciphering framework was initially proposed in Naehrig et al. (2011), and early works mainly focused on some popular symmetric ciphers, such as AES (Gentry et al. 2012), SIMON (Lepoint and Naehrig 2014), SPECK (Togan et al. 2015) and PRINCE (Doröz et al. 2016). However, their evaluation efficiency is not satisfactory due to the high multiplicative depth. Two recent works (Stracovsky et al. 2022; Trama et al. 2023) based on TFHE’s programmable bootstrapping technique greatly improve the evaluation latency of AES.
There has been significant research on designing FHE-friendly symmetric cryptographic primitives, aiming to achieve lower multiplicative complexity and depth. LowMC (Albrecht et al. 2015) is the first FHE-friendly cipher; however, it has been found to be vulnerable to algebraic attacks (Dinur et al. 2015; Dobraunig et al. 2015; Rechberger et al. 2018). In 2022, an FHE-friendly block cipher called Chaghri (Ashur et al. 2022) with lower multiplicative depth was proposed, which is 63% faster than the evaluation of AES using the BGV scheme. Another line of research focuses on FHE-friendly stream cipher designs, which allow some expensive computations to be performed offline because their encryption and decryption are simple XORs. Canteaut et al. (2016) first evaluated the Trivium algorithm from the eSTREAM project and proposed Kreyvium with a 128-bit security level. Since then, numerous FHE-friendly stream cipher designs have emerged, such as FLIP-like (Méaux et al. 2016, 2019; Hoffmann et al. 2020; Cosseron et al. 2022) and Rasta-like (Dobraunig et al. 2018; Ha et al. 2020; Hebborn and Leander 2020; Dobraunig et al. 2023; Cid et al. 2022) ciphers. Mandal and Gong (2021) studied the gate complexity of boolean circuits of the NIST lightweight cryptography (LWC) round 2 candidates and gave their evaluation latency based on the TFHE scheme. Moreover, Cho et al. (2021) proposed a transciphering framework for approximate homomorphic encryption, called RtF, which consists of a stream cipher over a modular domain and a transformation from BFV to CKKS. They also proposed the stream cipher HERA as a building block of the RtF framework. Ha et al. (2022) proposed the faster Rubato cipher, which is suitable for the RtF framework and has lower multiplicative depth.
The first SHA256 evaluation based on the BGV scheme was given by Mella and Susella (2013). The required multiplicative depths for the word-sliced, packed and bit-sliced implementations are 2762.5, 3310.5 and 2634, respectively. Due to these extremely high multiplicative depths, a practical implementation of SHA256 could not be given. Bendoukha et al. (2022) homomorphically evaluated hash functions built from “FHE-friendly” block ciphers such as PRINCE, LowMC and SIMON. In Homomorphic evaluation (2023), the authors presented a practical implementation of SHA256 based on the TFHE scheme combined with a number of optimization techniques.
Paper organization
The paper is organized as follows. In “Preliminaries” section, we review the preliminary knowledge required for this paper, in particular the TFHE cryptosystem. “Specifications of SHA256 and SM3” section introduces the NIST standard hash SHA256 and the Chinese commercial cryptographic hash SM3. “Hash goes to homomorphic” section details how to convert these two hash algorithms into efficient homomorphic computations. “Implementation and experimental results” section presents the implementation and its performance. We conclude this paper in “Conclusion” section.
Preliminaries
Notations
Let \({\mathbb {T}} = {\mathbb {R}}/{\mathbb {Z}}\) be the real torus, i.e., the additive group of real numbers modulo 1. We use \({\mathbb {T}}_N[X]^{k}\) to denote the set of vectors of k polynomials that have coefficients in \({\mathbb {T}}\) and are taken modulo \((X^N+1)\), where N is usually a power of 2. \({\mathbb {B}}_N[X]\) denotes the polynomials with binary coefficients modulo \(X^{N}+1\). \(\langle \cdot , \cdot \rangle\) denotes the inner product. We use \(\ggg\) to denote right rotation and \(\gg\) to denote right shift; for example, \(x \gg n\) discards the rightmost n bits of x and pads n zeros on the left.
Hash function
A hash function maps a message (data) of arbitrary length to a hash value of fixed length (also known as a message digest). Hash functions are widely used in cryptography, typically for signatures, encryption, message authentication codes and other authentication mechanisms. A hash function needs to satisfy the following security properties:

Collision resistance: Finding two messages with the same hash value is computationally difficult.

Preimage Resistance: Given the value h, which is the output of some hash function H, finding the message m such that \(h = H(m)\) is computationally hard.

Second Preimage Resistance: Given a message m and its hash value h, i.e., \(H(m)=h\), finding another message \(m'\ne m\) such that \(H(m)=H(m')\) is computationally hard.
The TFHE cryptosystem
TFHE (Chillotti et al. 2020) is currently the fastest scheme to achieve bootstrapping, which builds on the FHEW scheme (Ducas and Micciancio 2015). There are three types of ciphertexts defined in the TFHE scheme, and they play different roles in fast bootstrapping.
TFHE ciphertexts

TLWE: \((a, b = \langle a, s \rangle +m+e) \in {\mathbb {T}}^{n+1}\), where a is uniformly sampled from \({\mathbb {T}}^{n}\), m is the encoded message, the secret key s is uniformly sampled from \({\mathbb {B}}^{n}\), and the error \(e \in {\mathbb {T}}\) is sampled from a Gaussian distribution with mean 0 and standard deviation \(\sigma\).

TRLWE: \((a, b = \langle a, s \rangle +m+e) \in {\mathbb {T}}_N[X]^{k+1}\), where a is uniformly sampled from \({\mathbb {T}}_N[X]^{k}\), m is the encoded phase polynomial, the secret key s is uniformly sampled from \({\mathbb {B}}_N[X]^k\) and the error \(e \in {\mathbb {T}}_N[X]\) is a polynomial whose coefficients are sampled from a Gaussian distribution with mean 0 and standard deviation \(\sigma\). Generally, \(k = 1\).

TRGSW: \(2\ell _{PBS}\) fresh TRLWE samples. In detail, TRGSW encrypts the message \(m \in {\mathbb {B}}\) into C as follows:
$$\begin{aligned} C = \begin{pmatrix} a_1(x) & b_1(x)\\ a_2(x) & b_2(x)\\ \vdots & \vdots \\ a_{\ell _{PBS}}(x) & b_{\ell _{PBS}}(x)\\ a_{\ell _{PBS}+1}(x) & b_{\ell _{PBS}+1}(x)\\ \vdots & \vdots \\ a_{2\ell _{PBS}}(x) & b_{2\ell _{PBS}}(x)\\ \end{pmatrix} + m \cdot \begin{pmatrix} 1/\beta _{PBS} & 0\\ 1/\beta _{PBS}^2 & 0\\ \vdots & \vdots \\ 1/\beta _{PBS}^{\ell _{PBS}} & 0\\ 0 & 1/\beta _{PBS}\\ \vdots & \vdots \\ 0 & 1/\beta _{PBS}^{\ell _{PBS}} \end{pmatrix} \end{aligned}$$where \((a_i(x), b_i(x))\), for \(1 \le i \le 2\ell _{PBS}\), are TRLWE ciphertexts encrypting 0 under the same secret key, \(\beta _{PBS}\) denotes the basis of the gadget decomposition and \(\ell _{PBS}\) is the length of the gadget decomposition.
Remark
In TFHE’s bootstrapping, the TLWE ciphertext is the input to be bootstrapped, and the TRLWE ciphertext encodes the test polynomial and serves as the intermediate ciphertext during bootstrapping. Each bit of the TLWE secret key is encrypted as a TRGSW ciphertext; together these ciphertexts form the bootstrapping key, which can be precomputed.
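To make the TLWE definition above concrete, the following is a minimal sketch on a discretized torus, where the integer t modulo \(2^{32}\) stands for \(t/2^{32} \in {\mathbb {T}}\). The dimension, the noise width and all function names are illustrative assumptions; they are far too small to be secure and are not parameters of this paper or of any library.

```python
import random

Q = 1 << 32        # discretized torus: integer t stands for t / Q
N_DIM = 16         # toy dimension (assumption, insecure)
SIGMA = 2 ** 10    # toy noise width in units of 1/Q (assumption)

def keygen(rng):
    """Binary secret key s in B^n."""
    return [rng.randrange(2) for _ in range(N_DIM)]

def encode(bit):
    """Encode 0 as -1/8 and 1 as +1/8 on the torus."""
    return (Q // 8) if bit else (-(Q // 8)) % Q

def encrypt(bit, s, rng):
    """TLWE sample (a, b) with b = <a, s> + m + e."""
    a = [rng.randrange(Q) for _ in range(N_DIM)]
    e = round(rng.gauss(0, SIGMA))
    b = (sum(ai * si for ai, si in zip(a, s)) + encode(bit) + e) % Q
    return a, b

def phase(ct, s):
    """Phase b - <a, s> = m + e."""
    a, b = ct
    return (b - sum(ai * si for ai, si in zip(a, s))) % Q

def decrypt(ct, s):
    """The positive half of the torus decodes to 1, the negative half to 0."""
    return 1 if phase(ct, s) < Q // 2 else 0

def add(ct1, ct2):
    """Homomorphic addition: add the two samples componentwise."""
    a = [(x + y) % Q for x, y in zip(ct1[0], ct2[0])]
    return a, (ct1[1] + ct2[1]) % Q
```

Adding two encryptions of 1 yields a phase close to 1/4, which is the kind of linear combination that gate bootstrapping later maps back to \(\pm 1/8\).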
TFHE bootstrapping
Bootstrapping allows refreshing ciphertext with large noise to support further homomorphic computation. The most important feature of the TFHE scheme is the efficient bootstrapping, which consists of three core algorithms: blind rotation, sample extraction and key switching, as shown in Algorithm 2.
Key Switching
Two kinds of key switching are proposed by Chillotti et al. (2020). The first one is Public Functional KeySwitching, which allows packing TLWE samples into a TRLWE sample or switching the secret key. It can also evaluate a public linear function f on the input TLWE samples. The second one is Private Functional KeySwitching, which evaluates a private linear function on the input TLWE samples by encoding the secret f into the KeySwitching key.
Blind Rotation
Blind rotation, as the name implies, rotates a polynomial encrypted as TRLWE ciphertext by an encrypted index, which is the core operation in bootstrapping. In fact, the blind rotation is mainly constructed by successive external products. Algorithm 1 presents the detailed blind rotation operation.
Sample Extraction
This operation extracts the TLWE ciphertext encrypting any coefficient \(m_i\) from the TRLWE ciphertext encrypting the message \(m(x) \in {\mathbb {T}}_N[X]\). For example, SampleExtract\(_0(a(x),b(x))\) is \((a_0, -a_{N-1}, \cdots , -a_1, b_0) \in {\mathbb {T}}^{N+1}\), which encrypts \(m_0\). This follows directly from the decryption of TRLWE, since the constant coefficient of \(a(x)s(x)\) modulo \(X^N+1\) is \(a_0s_0 - \sum _{j=1}^{N-1} a_{N-j}s_j\).
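The sign pattern of the extracted sample can be checked numerically. The sketch below assumes \(k = 1\), a toy ring degree and noise-free samples, so that the phase equals the message exactly; the helper names are hypothetical.

```python
import random

Q = 1 << 32   # discretized torus modulus (assumption)
N = 8         # toy ring degree (assumption)

def negacyclic_mul(a, s):
    """Multiply a(X) * s(X) modulo X^N + 1 and modulo Q."""
    out = [0] * N
    for i, ai in enumerate(a):
        for j, sj in enumerate(s):
            k = i + j
            if k < N:
                out[k] = (out[k] + ai * sj) % Q
            else:                      # X^N = -1 flips the sign
                out[k - N] = (out[k - N] - ai * sj) % Q
    return out

def trlwe_encrypt(m, s, rng):
    """Noise-free TRLWE sample (a(X), b(X)) with b = a*s + m."""
    a = [rng.randrange(Q) for _ in range(N)]
    b = [(x + y) % Q for x, y in zip(negacyclic_mul(a, s), m)]
    return a, b

def sample_extract0(a, b):
    """TLWE sample (a_0, -a_{N-1}, ..., -a_1, b_0) encrypting m_0."""
    a_vec = [a[0]] + [(-a[N - j]) % Q for j in range(1, N)]
    return a_vec, b[0]

def tlwe_phase(ct, s):
    """Phase b_0 - <a_vec, s> of the extracted sample."""
    a_vec, b0 = ct
    return (b0 - sum(x * y for x, y in zip(a_vec, s))) % Q
```

The coefficients of the secret polynomial are reused directly as the TLWE secret key, which is why no key switching is involved at this step.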
Specifications of SHA256 and SM3
SHA256 (Science 2012) is a hash function developed by the NSA and published by NIST in 2001, while SM3 (https://oscca.gov.cn/sca/xxgk/201012/17/1002389/files/302a3ada057c4a73830536d03e683110.pdf) is a Chinese commercial cryptographic hash algorithm standard published by the Chinese National Cryptography Administration in 2010. Both follow the Merkle-Damgård construction: they process the input message in 512-bit blocks and return a 256-bit hash value. SHA256 and SM3 operate on 32-bit variables, combining NOT, XOR, OR, AND, rotation and addition modulo \(2^{32}\).
Message padding
Assume that the message m has a length of \(\ell\) bits. First append the bit “1” to the end of the message, followed by k zeros, where k is the smallest nonnegative integer such that \(\ell + k + 1 \equiv 448 \pmod {512}\). Then append a 64-bit string equal to the binary expansion of \(\ell\). The bit length of the padded message M is then a multiple of 512.
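The padding rule can be sketched as follows, under the assumption that the message is byte-aligned (so the appended “1” bit and the first seven zeros form the byte 0x80) and that the length is encoded big-endian; the function name is hypothetical.

```python
def md_pad(message: bytes) -> bytes:
    """Pad a byte-aligned message to a multiple of 512 bits."""
    bit_len = 8 * len(message)
    padded = message + b"\x80"                      # "1" bit plus seven "0" bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)  # up to 448 mod 512 bits
    padded += bit_len.to_bytes(8, "big")            # 64-bit message length
    return padded
```

A 3-byte message fits in one block, while a 56-byte message spills into a second block because the length field no longer fits.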
Recall on SHA256 hash function
Some useful logical functions
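These useful functions will be used in the message schedule and the iterative compression function. For 32-bit words x, y and z, they are defined as
$$\begin{aligned} Ch(x, y, z)&= (x \wedge y) \oplus (\lnot x \wedge z), \\ Maj(x, y, z)&= (x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z), \\ s_0(x)&= (x \ggg 2) \oplus (x \ggg 13) \oplus (x \ggg 22), \\ s_1(x)&= (x \ggg 6) \oplus (x \ggg 11) \oplus (x \ggg 25), \\ \sigma _0(x)&= (x \ggg 7) \oplus (x \ggg 18) \oplus (x \gg 3), \\ \sigma _1(x)&= (x \ggg 17) \oplus (x \ggg 19) \oplus (x \gg 10). \end{aligned}$$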
SHA256 hash computation
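Then, each message block \(M^{(1)}, M^{(2)}, \ldots , M^{(N)}\) is processed with the following four steps, for i from 1 to N, where all additions are modulo \(2^{32}\) and \(H^{(0)}\) is the fixed initial hash value:
(1) Message schedule:
$$\begin{aligned} w_t = {\left\{ \begin{array}{ll} M^{(i)}_t, & 0 \le t \le 15, \\ \sigma _1(w_{t-2}) + w_{t-7} + \sigma _0(w_{t-15}) + w_{t-16}, & 16 \le t \le 63. \end{array}\right. } \end{aligned}$$
(2) Initialization:
$$\begin{aligned} (a, b, c, d, e, f, g, h) = (H^{(i-1)}_0, H^{(i-1)}_1, H^{(i-1)}_2, H^{(i-1)}_3, H^{(i-1)}_4, H^{(i-1)}_5, H^{(i-1)}_6, H^{(i-1)}_7) \end{aligned}$$
(3) Iterative compression:
for \(t=0 \text { to } 63:\)
\(\begin{aligned} T_1&= h + s_1(e) + Ch(e,f,g) + K_t +w_t, T_2 = s_0(a) + Maj(a,b,c) \\ h&= g, g = f, f = e, e = d + T_1, d = c, c = b, b = a, a = T_1 + T_2 \end{aligned}\)
(4) Compute the \(i\)th intermediate hash value \(H^{(i)}\):
$$\begin{aligned} H^{(i)}_0&= a + H^{(i-1)}_0, \quad H^{(i)}_1 = b + H^{(i-1)}_1, \quad \ldots , \quad H^{(i)}_7 = h + H^{(i-1)}_7 \end{aligned}$$
After repeating steps one through four a total of N times, the resulting 256-bit message digest is \((H^{(N)}_0 \Vert H^{(N)}_1 \Vert H^{(N)}_2 \Vert H^{(N)}_3 \Vert H^{(N)}_4 \Vert H^{(N)}_5 \Vert H^{(N)}_6 \Vert H^{(N)}_7)\). Figure 1 illustrates the state update step of SHA256.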
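As a plaintext reference for the four steps, the following sketch implements SHA256 as specified in FIPS 180-4 and compares it with Python's `hashlib`. It is only a cleartext model and not the homomorphic implementation; deriving the constants \(K_t\) and \(H^{(0)}\) from integer roots of the first primes is a convenience chosen here to avoid a hard-coded table.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def rotr(x, n): return ((x >> n) | (x << (32 - n))) & MASK
def sigma0(x): return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)
def sigma1(x): return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)
def s0(x): return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)
def s1(x): return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)
def ch(x, y, z): return (x & y) ^ (~x & z & MASK)
def maj(x, y, z): return (x & y) ^ (x & z) ^ (y & z)

def _primes(count):
    found, n = [], 2
    while len(found) < count:
        if all(n % p for p in found):
            found.append(n)
        n += 1
    return found

def _iroot(v, k):
    """Largest integer r with r**k <= v."""
    r = int(round(v ** (1.0 / k)))
    while r ** k > v:
        r -= 1
    while (r + 1) ** k <= v:
        r += 1
    return r

# First 32 fractional bits of the cube (resp. square) roots of the first primes.
K = [_iroot(p << 96, 3) & MASK for p in _primes(64)]
H_INIT = [_iroot(p << 64, 2) & MASK for p in _primes(8)]

def sha256(message: bytes) -> str:
    data = message + b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64) + struct.pack(">Q", 8 * len(message))
    state = list(H_INIT)
    for off in range(0, len(data), 64):
        w = list(struct.unpack(">16I", data[off:off + 64]))
        for t in range(16, 64):                      # (1) message schedule
            w.append((sigma1(w[t - 2]) + w[t - 7] + sigma0(w[t - 15]) + w[t - 16]) & MASK)
        a, b, c, d, e, f, g, h = state               # (2) initialization
        for t in range(64):                          # (3) iterative compression
            t1 = (h + s1(e) + ch(e, f, g) + K[t] + w[t]) & MASK
            t2 = (s0(a) + maj(a, b, c)) & MASK
            h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
        # (4) intermediate hash value
        state = [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]
    return "".join(f"{x:08x}" for x in state)

assert sha256(b"abc") == hashlib.sha256(b"abc").hexdigest()
```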
Recall on SM3 hash function
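SM3 consists of two parts: message expansion and state update transformation. Below, we describe these two parts. The auxiliary functions \(P_{0}\) and \(P_{1}\), which operate on 32-bit words, are defined as follows, where \(\lll\) denotes left rotation:
$$\begin{aligned} P_0(X)&= X \oplus (X \lll 9) \oplus (X \lll 17), \\ P_1(X)&= X \oplus (X \lll 15) \oplus (X \lll 23). \end{aligned}$$
Message expansion
The input here is the 512-bit message block, split into 16 32-bit words \(W_0, \ldots , W_{15}\), which is then expanded to 68 32-bit words \(W_i\):
$$\begin{aligned} W_i = P_1(W_{i-16} \oplus W_{i-9} \oplus (W_{i-3} \lll 15)) \oplus (W_{i-13} \lll 7) \oplus W_{i-6} \end{aligned}$$
for \(16 \le i < 68\) and 64 expanded words \(W_i^{'} = W_{i} \oplus W_{i+4}, \text {\ for } 0 \le i < 64\).
State update transformation
In SM3, the state update transformation starts with fixed initial values of eight 32-bit words and updates them in 64 rounds. Let A, B, C, D, E, \(F,G\ \text {and}\ H\) denote the inner state registers; the jth round transformation is given by
$$\begin{aligned} SS_1&= ((A \lll 12) + E + (T_j \lll j)) \lll 7, \quad SS_2 = SS_1 \oplus (A \lll 12), \\ TT_1&= FF_j(A, B, C) + D + SS_2 + W_j^{'}, \quad TT_2 = GG_j(E, F, G) + H + SS_1 + W_j, \\ D&= C, \quad C = B \lll 9, \quad B = A, \quad A = TT_1, \\ H&= G, \quad G = F \lll 19, \quad F = E, \quad E = P_0(TT_2), \end{aligned}$$
where the bitwise boolean functions \(FF_{j}\) and \(GG_{j}\) are defined by
$$\begin{aligned} FF_j(x, y, z)&= {\left\{ \begin{array}{ll} x \oplus y \oplus z, & 0 \le j \le 15, \\ (x \wedge y) \vee (x \wedge z) \vee (y \wedge z), & 16 \le j \le 63, \end{array}\right. } \\ GG_j(x, y, z)&= {\left\{ \begin{array}{ll} x \oplus y \oplus z, & 0 \le j \le 15, \\ (x \wedge y) \vee (\lnot x \wedge z), & 16 \le j \le 63. \end{array}\right. } \end{aligned}$$
Note that \(T_j = 0x79cc4519\) for \(0 \le j \le 15\) and \(T_j = 0x7a879d8a\) for \(16 \le j \le 63\). After the last step of the state update transformation, the initial values are combined with the output values of the last step by a bitwise XOR. The result is the final hash value or the initial value for the next message block, as in SHA256.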
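For reference, the SM3 description can be sketched in cleartext as below. This is a plain software model following the published SM3 standard, not the homomorphic evaluation; the initial value and the expected digest of the message "abc" are the ones given in the standard's example, and the function names are chosen here for illustration.

```python
import struct

MASK = 0xFFFFFFFF
IV = [0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E]

def rotl(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK

def p0(x): return x ^ rotl(x, 9) ^ rotl(x, 17)
def p1(x): return x ^ rotl(x, 15) ^ rotl(x, 23)
def ff(j, x, y, z): return x ^ y ^ z if j < 16 else (x & y) | (x & z) | (y & z)
def gg(j, x, y, z): return x ^ y ^ z if j < 16 else (x & y) | (~x & z & MASK)
def t_const(j): return 0x79CC4519 if j < 16 else 0x7A879D8A

def sm3(message: bytes) -> str:
    data = message + b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64) + struct.pack(">Q", 8 * len(message))
    v = list(IV)
    for off in range(0, len(data), 64):
        # Message expansion: 68 words W and 64 words W'.
        w = list(struct.unpack(">16I", data[off:off + 64]))
        for j in range(16, 68):
            w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl(w[j - 3], 15))
                     ^ rotl(w[j - 13], 7) ^ w[j - 6])
        w2 = [w[j] ^ w[j + 4] for j in range(64)]
        # State update transformation.
        a, b, c, d, e, f, g, h = v
        for j in range(64):
            ss1 = rotl((rotl(a, 12) + e + rotl(t_const(j), j)) & MASK, 7)
            ss2 = ss1 ^ rotl(a, 12)
            tt1 = (ff(j, a, b, c) + d + ss2 + w2[j]) & MASK
            tt2 = (gg(j, e, f, g) + h + ss1 + w[j]) & MASK
            a, b, c, d = tt1, a, rotl(b, 9), c
            e, f, g, h = p0(tt2), e, rotl(f, 19), g
        # Feed-forward with the previous chaining value (XOR in SM3).
        v = [x ^ y for x, y in zip(v, (a, b, c, d, e, f, g, h))]
    return "".join(f"{x:08x}" for x in v)
```

Each round contains only three-operand additions on top of the logic functions, which is the structural reason given later for the lower evaluation latency of SM3.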
Hash goes to homomorphic
When designing hash functions, it is crucial to ensure efficient computation on software platforms. As shown in “Specifications of SHA256 and SM3” section, the core computation units of hash functions typically involve basic instructions such as AND, OR, NOT, and ROTATION. The TFHE scheme offers efficient gate bootstrapping, so the evaluation of a function built from gates is more flexible under this scheme than under the BGV or BFV scheme and is not limited by the circuit depth. Therefore we present the homomorphic computation of SHA256 and SM3 by means of TFHE.
It is important to highlight that gate bootstrapping is computationally demanding when gates are used as the basic computational unit in the encrypted domain. To improve the overall computational performance, minimizing the number of gates consumed by the circuit becomes a crucial consideration. In particular, in SHA256 and SM3, the basic operation mainly consists of functions composed of logic gates and addition of modulo \(2^{32}\). In the following, we present our circuit optimization.
A short reminder of gate bootstrapping.
For ease of representation, in gate bootstrapping, binary messages 0 and 1 are encoded as \(-1/8\) and \(1/8\) over the torus, respectively, and Bootstrap returns a fresh encryption of \(1/8\) if the phase of its input is positive and of \(-1/8\) otherwise. Now assume two TLWE ciphertexts \(c_1\) and \(c_2\), then some basic homomorphic gate operations are as follows:

HomNOT(c) = \(-c\) (no bootstrapping);

HomAND(\(c_1, c_2\)) = Bootstrap(\((\textbf{0}, -1/8) + c_1 + c_2\));

HomXOR(\(c_1, c_2\)) = Bootstrap(\((\textbf{0},1/4)+ 2(c_1 + c_2)\));

HomOR(\(c_1, c_2\)) = Bootstrap(\((\textbf{0}, 1/8) + c_1+ c_2\));

HomMUX\((c,d_{0},d_{1})\) can evaluate \(c?d_{1}:d_{0}=(c \wedge d_{1})\oplus ((1-c)\wedge d_{0})\) using two gate bootstrappings and a public key switching.
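The gate formulas can be checked at the level of noise-free phases. The sketch below is a simplification that ignores ciphertexts and noise entirely: it models Bootstrap as a sign test on the phase and assumes the offset constants are applied before bootstrapping, as in the original TFHE gate constructions. The MUX shown adds the outputs of two bootstrapped AND-like terms, which is one possible arrangement and an assumption of this sketch.

```python
from fractions import Fraction as F
from itertools import product

COUNT = {"bootstraps": 0}

def wrap(p):
    """Reduce a phase to the interval [-1/2, 1/2)."""
    return (p + F(1, 2)) % 1 - F(1, 2)

def enc(bit):
    """Noise-free phase of an encrypted bit: 0 -> -1/8, 1 -> +1/8."""
    return F(1, 8) if bit else F(-1, 8)

def dec(p):
    return 1 if wrap(p) > 0 else 0

def bootstrap(p):
    """Model of gate bootstrapping: sign of the phase, re-encoded as +-1/8."""
    COUNT["bootstraps"] += 1
    return F(1, 8) if wrap(p) > 0 else F(-1, 8)

def hom_not(c): return -c
def hom_and(c1, c2): return bootstrap(F(-1, 8) + c1 + c2)
def hom_or(c1, c2): return bootstrap(F(1, 8) + c1 + c2)
def hom_xor(c1, c2): return bootstrap(F(1, 4) + 2 * (c1 + c2))

def hom_mux(c, d0, d1):
    """c ? d1 : d0 with two bootstrappings; at most one term is +1/8."""
    return bootstrap(F(-1, 8) + c + d1) + bootstrap(F(-1, 8) - c + d0) + F(1, 8)
```

Exhaustively decoding the outputs over all inputs reproduces the usual truth tables, with NOT costing no bootstrapping and MUX costing two.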
Trivial gate reduction in SHA256
In Homomorphic evaluation (2023), the authors proposed optimizations that reduce the number of logic gates in the Ch and Maj functions of the SHA256 algorithm, thereby reducing the number of gate bootstrappings required. Specifically, for the function \(Ch(x, y, z) = (x \wedge y) \oplus (\lnot x \wedge z)\), it can be easily inferred that the result is y when \(x=1\) and z when \(x=0\), so it behaves like a bitwise multiplexer. In this way, we can replace the 4 gates in the Ch function with a HomMUX gate in the encrypted domain. Thanks to the boolean distributive law
$$\begin{aligned} (x \wedge y) \oplus (x \wedge z) = x \wedge (y \oplus z), \end{aligned}$$
the function
$$\begin{aligned} Maj(x, y, z) = (x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z) \end{aligned}$$
can be simplified as
$$\begin{aligned} Maj(x, y, z) = (x \wedge (y \oplus z)) \oplus (y \wedge z). \end{aligned}$$
As a result, the number of gates required by Maj can be reduced from 5 to 4. While these optimizations do improve the overall evaluation efficiency of the SHA256 hash, they are still not sufficient for achieving optimal efficiency within the TFHE scheme.
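Both rewritings are small enough to verify exhaustively on single bits. In the sketch below the gate counts in the comments refer to binary gates only, and the function names are illustrative.

```python
from itertools import product

def ch(x, y, z):
    """Ch as specified: 2 AND, 1 NOT, 1 XOR."""
    return (x & y) ^ ((1 - x) & z)

def mux(c, d0, d1):
    """Multiplexer: d1 if c is set, d0 otherwise."""
    return d1 if c else d0

def maj_5_gates(x, y, z):
    """Maj as specified: 3 AND + 2 XOR."""
    return (x & y) ^ (x & z) ^ (y & z)

def maj_4_gates(x, y, z):
    """Maj after applying the distributive law: 2 AND + 2 XOR."""
    return (x & (y ^ z)) ^ (y & z)

def equivalent():
    """Check both identities over all eight input combinations."""
    return all(
        ch(x, y, z) == mux(x, z, y) and maj_5_gates(x, y, z) == maj_4_gates(x, y, z)
        for x, y, z in product((0, 1), repeat=3)
    )
```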
Further gate reduction of function in SHA256
In this subsection, we further reduce the number of gates needed to evaluate SHA256 in the encrypted domain. We observe that the \(\sigma _0, \sigma _1, s_0 \text { and } s_1\) functions involve different rotations or shifts of a 32-bit word, followed by two consecutive XOR operations. The rotation and shift operations are free under bitwise encryption, since they only reorder ciphertexts, and next we explain how to implement the XOR of the 3 inputs using one gate bootstrapping (i.e., one blind rotation). Moreover, the Maj function can also be implemented with only one gate bootstrapping.
Ternary gates were introduced into the TFHE scheme in Matsuoka et al. (2021), comprising the XOR3 and 2OF3 gates, where XOR3 is the XOR of 3 inputs and the 2OF3 gate outputs true if at least two inputs are true.
The implementation of the ternary gates in the encrypted domain is as follows:
$$\begin{aligned} \text {HomXOR3}(c_1, c_2, c_3)&= \text {Bootstrap}(-2(c_1 + c_2 + c_3)), \\ \text {Hom2OF3}(c_1, c_2, c_3)&= \text {Bootstrap}(c_1 + c_2 + c_3). \end{aligned}$$
Now we give a high-level explanation of their correctness. Note that the test (negacyclic) polynomial in the gate bootstrapping is set so that Bootstrap returns \(\frac{1}{8}\) when the phase of its input lies in the positive half of the torus and \(-\frac{1}{8}\) otherwise.
From another point of view, for the \(\textit{XOR3}\) function, the result is equal to the least significant bit of the sum of the 3 inputs. As we show in Fig. 2, when the three plain inputs are 000 or 110 (independent of the order), i.e., their encoded phase sum is \(-\frac{3}{8}\) or \(\frac{1}{8}\), the desired result is 0, i.e., \(-\frac{1}{8}\) on the torus; and when the input is 100 or 111 (independent of the order), i.e., the phase sum is \(-\frac{1}{8}\) or \(\frac{3}{8}\), the desired result is 1, i.e., \(\frac{1}{8}\) on the torus. Therefore, to match the test polynomial, we simply multiply the sum by \(-2\), so that the phases fall into two separate halves of the torus. For \(\textit{2OF3}\), the result is the most significant bit of the sum of the three inputs, which exactly matches the setting of the test polynomial.
In this way, the \(\sigma _0, \sigma _1, s_0, s_1 \text { and } Maj\) functions can be computed homomorphically by just one expensive blind rotation, while the Ch function needs to be implemented in the encrypted domain using HomMUX at the cost of about two blind rotations. One thing that must be noted is that the ternary gate requires the sum of 3 inputs, and it is better to use larger parameters in order not to affect the correctness of the decryption. In the experiment, we show that the parameter sets satisfy this requirement.
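The correctness argument and the remark on parameters can be illustrated at the phase level. The sketch below models Bootstrap as a sign test, which is a simplification, and uses the multiplier \(-2\) for XOR3 because it is the choice that maps the phase sums to the correct half of the torus under the \(\pm 1/8\) encoding. The helper `tolerates` is hypothetical and probes how much error per input both gates survive.

```python
from fractions import Fraction as F
from itertools import product

def wrap(p):
    """Reduce a phase to the interval [-1/2, 1/2)."""
    return (p + F(1, 2)) % 1 - F(1, 2)

def enc(bit):
    return F(1, 8) if bit else F(-1, 8)

def dec(p):
    return 1 if wrap(p) > 0 else 0

def bootstrap(p):
    """Model of gate bootstrapping: sign of the phase, re-encoded as +-1/8."""
    return F(1, 8) if wrap(p) > 0 else F(-1, 8)

def hom_xor3(c1, c2, c3):
    return bootstrap(-2 * (c1 + c2 + c3))

def hom_2of3(c1, c2, c3):
    return bootstrap(c1 + c2 + c3)

def tolerates(delta):
    """True if both gates stay correct when every input phase is off by +-delta."""
    for bits in product((0, 1), repeat=3):
        for sign in (1, -1):
            cts = [enc(b) + sign * delta for b in bits]
            if dec(hom_xor3(*cts)) != bits[0] ^ bits[1] ^ bits[2]:
                return False
            if dec(hom_2of3(*cts)) != int(sum(bits) >= 2):
                return False
    return True
```

In this model both ternary gates tolerate a worst-case error below 1/24 per input, whereas a binary gate such as HomAND tolerates up to 1/16, which is consistent with the recommendation to use larger parameters.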
Addition of modulo \(2^{32}\)
In addition to the logical functions, arithmetic addition modulo \(2^{32}\) is also widely used in SHA256 and is its most time-consuming operation. Integer arithmetic can be directly implemented in second-generation FHE schemes such as BGV and BFV, but bootstrapping in these schemes currently performs poorly, which makes them unfriendly to deep circuits. As mentioned in the previous section, we choose the efficient TFHE scheme to implement the hash function homomorphically. A natural question is how to efficiently evaluate the required homomorphic addition modulo \(2^{32}\) via TFHE.
For the addition of two \(n\)-bit integers, a naive method is to use a Ripple Carry Adder (RCA), which is constructed by cascading multiple full adder gates, as illustrated in Fig. 3. An \(n\)-bit adder requires n full adders. With input bits \(a_i\), \(b_i\) and carry-in \(C_i\), the output of the full adder can be obtained by the following equation:
$$\begin{aligned} S_i = a_i \oplus b_i \oplus C_i, \quad C_{i+1} = (a_i \wedge b_i) \vee (C_i \wedge (a_i \oplus b_i)). \end{aligned}$$
Indeed, \(C_{i+1} = 2\text {OF}3(a_i,b_i, C_i), S_i= \text {XOR}3(a_i, b_i, C_i)\). Klemsa and Önen (2022) also apply this to integer addition. Therefore, by utilizing ternary gates, we only need \(32 \times 2-1=63\) instead of \(32 \times 5-3 = 157\) gate bootstrappings to evaluate addition modulo \(2^{32}\), since the carry out of the most significant bit is discarded.
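The gate count can be reproduced with a cleartext model in which every call to a ternary gate stands for one gate bootstrapping. The sketch below works on plain bits, with hypothetical helper names, and skips the final carry because the addition is taken modulo \(2^{32}\).

```python
import random

GATES = {"count": 0}

def xor3(a, b, c):
    GATES["count"] += 1        # one gate bootstrapping in the encrypted domain
    return a ^ b ^ c

def two_of_three(a, b, c):
    GATES["count"] += 1        # one gate bootstrapping in the encrypted domain
    return 1 if a + b + c >= 2 else 0

def to_bits(x, n=32):
    """Little-endian bit list: index i has weight 2^i."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

def ripple_carry_add(a_bits, b_bits):
    """n-bit addition modulo 2^n with ternary gates only."""
    n = len(a_bits)
    carry, out = 0, []
    for i in range(n):
        out.append(xor3(a_bits[i], b_bits[i], carry))
        if i < n - 1:          # the carry out of the top bit is discarded
            carry = two_of_three(a_bits[i], b_bits[i], carry)
    return out
```

For 32-bit operands the counter ends at 63, matching the figure in the text, although the carry chain still forces the 31 carry gates to run one after another.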
Optimization of sequential addition
Note that we have
$$\begin{aligned} w_t = \sigma _1(w_{t-2}) + w_{t-7} + \sigma _0(w_{t-15}) + w_{t-16} \end{aligned}$$
in the message schedule and \(T_1 = h + s_1(e) + Ch(e,f,g) + K_t +w_t\) in the iterative compression. Both expressions involve successive additions, which can be optimized using the Carry Save Adder (CSA).
A CSA has a very small carry propagation delay when adding multiple numbers. The idea behind it is that the sum of three inputs is reduced to the sum of two values: the carry C and the sum S are computed separately and independently for each bit position, so no carry has to ripple through the word.
It is interesting to note that the carry save adder can be constructed by the full adder, so the optimizations we introduced previously for full adder can be extended to CSA as well.
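A word-level sketch of this reduction is given below. It operates on cleartext 32-bit integers, where the bitwise expressions stand for 32 independent XOR3 and 2OF3 gates; the final two-operand addition is left to an ordinary adder. The function names and the reduction order are assumptions of this illustration.

```python
import random

MASK = 0xFFFFFFFF

def csa(x, y, z):
    """Reduce three operands to a sum word and a carry word."""
    s = x ^ y ^ z                                     # per-bit XOR3
    c = (((x & y) | (x & z) | (y & z)) << 1) & MASK   # per-bit 2OF3, shifted left
    return s, c

def sum_with_csa(operands):
    """Add several words modulo 2^32 using CSA layers and one final addition."""
    ops = list(operands)
    layers = 0
    while len(ops) > 2:
        s, c = csa(ops[0], ops[1], ops[2])
        ops = [s, c] + ops[3:]
        layers += 1
    return (ops[0] + ops[1]) & MASK, layers
```

For the five-operand sum \(T_1\), three CSA layers followed by a single carry-propagating addition replace four carry-propagating additions.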
Parallel implementation
The disadvantage of the RCA is that the carry-in bit of each full adder is derived from the carry-out bit of the previous cascaded full adder, so the critical path of the adder circuit grows with the bit length of the input. The Carry Look-Ahead Adder (CLA) reduces the depth of the critical path by parallel computation. The CLA computes one or more carry bits before the sum, which reduces the waiting time for the carry bits, so it seems to be very friendly to the BGV scheme. Mella and Susella (2013) first used a CLA for the homomorphic computation of SHA256 based on the BGV scheme, and estimated that a 32-bit addition consumes a multiplicative depth of 10. The multiplicative depth of the CLA was further reduced from 10 to 5 for 32-bit addition in Togan et al. (2015). The idea is to use Equation (**) instead of Equation (*) to compute the carry bit, which eliminates the evaluation of the OR function. Specifically, let \(P_i = a_i \oplus b_i\) and \(G_i = a_i \wedge b_i\), then
$$\begin{aligned} C_{i+1}&= G_i \vee (P_i \wedge C_i), \quad (*) \\ C_{i+1}&= G_i \oplus (P_i \wedge C_i). \quad (**) \end{aligned}$$
The two forms agree because \(G_i\) and \(P_i\) are never 1 at the same time. \(P_i\) and \(G_i\) can be precomputed in parallel when more CPUs are available, independently of the carry bits. In this way, with \(C_0 = 0\), we can rewrite the carry bits of the 32-bit adder as follows:
$$\begin{aligned} C_1&= G_0, \\ C_2&= G_1 \oplus (P_1 \wedge G_0), \\ C_3&= G_2 \oplus (P_2 \wedge G_1) \oplus (P_2 \wedge P_1 \wedge G_0), \\&\ \ \vdots \\ C_{31}&= G_{30} \oplus (P_{30} \wedge G_{29}) \oplus \cdots \oplus (P_{30} \wedge \cdots \wedge P_1 \wedge G_0). \end{aligned}$$
Thus, the result of the 32-bit adder is \(S_i = P_i \oplus C_i, \text { for } 0 \le i \le 31\).
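The equivalence between the OR-based and the XOR-based carry recurrence, and the unrolled XOR form of the carries, can be checked on cleartext bits. The sketch below assumes \(C_0 = 0\) and uses hypothetical function names.

```python
import random

def carries_recurrence(p, g, use_xor):
    """Carries from C_{i+1} = G_i op (P_i AND C_i), with op = XOR or OR."""
    c = [0]
    for i in range(len(p) - 1):
        t = p[i] & c[i]
        c.append(g[i] ^ t if use_xor else g[i] | t)
    return c

def carries_unrolled(p, g):
    """C_i as an XOR of the terms P_{i-1} ... P_{j+1} G_j for j < i."""
    n = len(p)
    c = [0] * n
    for i in range(1, n):
        prod, acc = 1, 0
        for j in range(i - 1, -1, -1):
            acc ^= prod & g[j]
            prod &= p[j]
        c[i] = acc
    return c

def cla_add(a, b, n=32):
    """Addition modulo 2^n via S_i = P_i XOR C_i with unrolled carries."""
    p = [((a ^ b) >> i) & 1 for i in range(n)]
    g = [((a & b) >> i) & 1 for i in range(n)]
    c = carries_unrolled(p, g)
    return sum((p[i] ^ c[i]) << i for i in range(n))
```

Because at most one term of each unrolled carry is nonzero, replacing OR by XOR does not change the result, and the XOR terms can be grouped three at a time.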
Mella and Susella (2013) and Togan et al. (2015) exploited the batch packing capability of the BGV scheme. However, it is hard to give a practical time for the homomorphic computation of SHA256 and the SPECK cipher based on the BGV scheme, because the parameters of the leveled BGV scheme depend on the multiplicative depth of the circuit and the bootstrapping is not efficient enough.
In the context of the TFHE scheme, we similarly use Equation (**) rather than Equation (*), as in Togan et al. (2015). The reason is that successive XORs leave much room for optimization. For every carry bit \(C_i, \text { for } 2\le i \le 31\), we can still utilize the HomXOR3 gate to reduce the number of gates required in the encrypted domain.
Compared to the BGV scheme, the TFHE scheme does not support batch processing. Hence a natural solution for TFHE scheme is to do parallelization using multiple CPUs, which is reasonable for the cloud server with a large number of CPUs. Some advanced Parallel Prefix Adders (Payal et al. 2015) for CLA structures such as Brent–Kung adder, Kogge–Stone adder and Ladner–Fischer adder are proposed for high performance arithmetic structures in industry. In Homomorphic evaluation (2023), they utilized the Brent–Kung and the Ladner–Fischer Adder for optimization. See “Appendix A” for a more detailed description. For a fair experimental comparison, we also exploit these two optimization techniques.
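To illustrate how a parallel prefix adder organizes the carry computation, the following is a cleartext sketch of a Brent–Kung style prefix network on (generate, propagate) pairs. It is a generic textbook construction written for this illustration and is not taken from the referenced implementation; the combine operator uses the XOR form of the carry equation, and the word length is assumed to be a power of two.

```python
import random

def combine(hi, lo):
    """(G, P) of a block formed by an upper block `hi` and a lower block `lo`."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi ^ (p_hi & g_lo), p_hi & p_lo

def brent_kung_add(a, b, n=32):
    """Return (a + b) mod 2^n and the number of prefix layers used."""
    p = [((a ^ b) >> i) & 1 for i in range(n)]
    gp = [(((a & b) >> i) & 1, p[i]) for i in range(n)]
    levels = n.bit_length() - 1            # log2(n) for a power of two
    depth = 0
    for d in range(levels):                # up-sweep: blocks of size 2, 4, 8, ...
        step = 1 << (d + 1)
        for i in range(step - 1, n, step): # independent, hence parallelizable
            gp[i] = combine(gp[i], gp[i - (1 << d)])
        depth += 1
    for d in range(levels - 2, -1, -1):    # down-sweep: fill in remaining prefixes
        half = 1 << d
        for i in range(3 * half - 1, n, 2 * half):
            gp[i] = combine(gp[i], gp[i - half])
        depth += 1
    carries = [0] + [g for g, _ in gp[:-1]]   # C_{i+1} is the prefix generate
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, depth
```

For 32 bits the network has 9 layers, and the combine operations inside a layer do not depend on each other, which is what allows them to be spread over several CPU threads.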
Analysis of functions in SM3
In this subsection, we analyze the homomorphic evaluation of the hash algorithm SM3. The interesting observation is that for the \(GG_j\) function, the result is z if \(x=0\) and y otherwise, which is equivalent to the Ch function for \(16 \le j < 64\), i.e., the mux gate. For the \(FF_j\) function, it can be seen from Table 1 that it implements the same function as the Maj function for \(16 \le j < 64\). Thus, the \(FF, GG, P_0 \text { and } P_1\) functions can be implemented using only one bootstrapping. For addition modulo \(2^{32}\), it can be observed that SM3 uses fewer consecutive modular additions than SHA256 in the iterative compression function, which gives it a lower evaluation latency. We evaluate SM3 with the methods described in the sections above; the implementation results are given in the next section.
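These observations can be checked on cleartext bits. The sketch below verifies the truth tables of \(FF_j\) and \(GG_j\) for \(16 \le j < 64\) and shows that \(P_0\) on a bitwise-represented word needs one XOR3 per output bit, the rotations being pure index shifts. The bit ordering (index i has weight \(2^i\)) and the helper names are assumptions of this illustration.

```python
import random
from itertools import product

def xor3(a, b, c): return a ^ b ^ c
def two_of_three(a, b, c): return int(a + b + c >= 2)
def mux(c, d0, d1): return d1 if c else d0

def ff_late(x, y, z): return (x & y) | (x & z) | (y & z)
def gg_late(x, y, z): return (x & y) | ((1 - x) & z)

def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def p0_word(x):
    return x ^ rotl32(x, 9) ^ rotl32(x, 17)

def p0_bits(bits):
    """P0 on 32 individual bits: one XOR3 per output bit, rotations are free."""
    return [xor3(bits[i], bits[(i - 9) % 32], bits[(i - 17) % 32]) for i in range(32)]

def to_bits(x): return [(x >> i) & 1 for i in range(32)]
def from_bits(bits): return sum(b << i for i, b in enumerate(bits))

def tables_match():
    """FF_j equals 2OF3 and GG_j equals a multiplexer for 16 <= j < 64."""
    return all(
        ff_late(x, y, z) == two_of_three(x, y, z) and gg_late(x, y, z) == mux(x, z, y)
        for x, y, z in product((0, 1), repeat=3)
    )
```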
Implementation and experimental results
In this section we provide a detailed explanation of our implementation for evaluating the hash functions SHA256 and SM3 based on the TFHE scheme. To the best of our knowledge, the TFHE-rs library is the fastest public implementation of the TFHE scheme among the homomorphic cryptographic libraries (https://www.zama.ai/post/announcingtfhers). Therefore, we implement our evaluation method in the TFHE-rs library. All tests were conducted on a 12th Gen Intel(R) Core(TM) i5-12500 \(\times\) 12 with 15.3 GB RAM, running the Ubuntu 20.04 operating system.
Experimental parameter setting
Now we present our parameter settings in the TFHE scheme. We use two parameter sets from the TFHE-rs library, as shown in Table 2, both of which provide at least 128 bits of security. “DEFAULT_PARAMS” guarantees an error probability bound of \(2^{-40}\) and “TFHE_LIB_PARAMS” provides a lower decryption error rate of \(2^{-165}\), so the two sets can be used for different scenario requirements.
Performance result
In this subsection, we compare our experimental results with existing work. A straightforward implementation of SHA256 based on the TFHE-rs library is publicly available from Homomorphic evaluation (2023). For a fair experimental comparison, we run their code on our machine. Note that, in addition to bitwise encryption, a TFHE-rs implementation based on two-bit encryption is available from GitHub. However, this implementation takes up to 23 min because this encryption is not suitable for rotation operations, resulting in huge latency even when multiple CPUs are used. Therefore, we did not consider further optimization of this implementation.
As in their experiments, we use Rayon, a multithreaded crate of the Rust programming language, to parallelize the implementation when there are available CPUs. Specifically, we can control the number of CPUs used by calling the interface rayon::ThreadPoolBuilder::new().num_threads().build_global().unwrap(). We present the comparison of homomorphic evalaution of SHA256 and SM3 based on the parameter sets “DEFAULT PARAMS” and “TFHE_LIB_PARAMS” for different CPU cores in Figs. 4 and 5, respectively. More detailed data, please refer to Table 3 in “Appendix B”.
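The effect of limiting the number of worker threads can be illustrated without any external crate. The sketch below is a standard-library stand-in for a Rayon thread pool, not the implementation used in the experiments: the helper `par_gate3` and the plaintext `maj` gate are hypothetical names, and in a real evaluation each gate call would be a bootstrapped gate on ciphertexts rather than a Boolean operation.

```rust
use std::thread;

/// Plaintext placeholder for a bootstrapped ternary gate (here: majority).
fn maj(x: bool, y: bool, z: bool) -> bool {
    (x & y) | (x & z) | (y & z)
}

/// Applies a three-input gate at every bit position, splitting the positions
/// across at most `n_threads` scoped worker threads.
fn par_gate3(
    x: &[bool],
    y: &[bool],
    z: &[bool],
    n_threads: usize,
    gate: fn(bool, bool, bool) -> bool,
) -> Vec<bool> {
    assert!(x.len() == y.len() && y.len() == z.len());
    let mut out = vec![false; x.len()];
    if x.is_empty() {
        return out;
    }
    let n = n_threads.max(1);
    // Ceiling division, so that at most `n` chunks are created.
    let chunk = (x.len() + n - 1) / n;
    thread::scope(|s| {
        for (i, out_chunk) in out.chunks_mut(chunk).enumerate() {
            let start = i * chunk;
            s.spawn(move || {
                for (k, o) in out_chunk.iter_mut().enumerate() {
                    let j = start + k;
                    *o = gate(x[j], y[j], z[j]);
                }
            });
        }
    });
    out
}

fn main() {
    let x: Vec<bool> = (0..32).map(|i| i % 2 == 0).collect();
    let y: Vec<bool> = (0..32).map(|i| i % 3 == 0).collect();
    let z: Vec<bool> = (0..32).map(|i| i % 5 == 0).collect();
    for n in [1, 2, 6, 12] {
        let out = par_gate3(&x, &y, &z, n, maj);
        println!("{n} thread(s): {} ones", out.iter().filter(|b| **b).count());
    }
}
```

The 32 bit positions of a word-level logic function are independent, so they parallelize trivially; the carry chain of an addition does not, which is why the choice of adder structure matters more when few threads are available.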
Experimental results show that for the SHA256 and SM3 algorithm we achieve about 35%50% efficiency improvement compared to the stateoftheart work, especially up to 50% when only one CPU is used. We observed that the Brent–Kung adder outperforms the LadnerFishcher adder, particularly when fewer CPUs are used. The overall SM3 evaluation latency is lower than SHA256 due to its use of fewer additions. It is worth noting that when using the “TFHE_LIB_PARAMS” parameter, the evaluation latency tends to be higher. However, this parameter set offers the benefit of a lower decryption error rate, ensuring higher reliability in the evaluation results.
Conclusion
In this paper, we explore the application of ternary gates to the various logic functions required by hash functions and further reduce the number of gate bootstrappings required by SHA256 and SM3 in the context of TFHE, thereby improving efficiency. This advancement holds significant potential for various applications, including data integrity checking and private database retrieval, where hash functions play a vital role.
Further optimization directions for hash function evaluation include utilizing the fully homomorphic encryption scheme FINAL (Bonte et al. 2022), which is constructed from NTRU and achieves faster gate bootstrapping than TFHE. We believe this can directly reduce the overall runtime latency. Lower latency can also be obtained when a large number of processing units is available, for example on a GPU.
Availability of data and materials
Not applicable.
Notes
\(\sigma _0(x) = (x \ggg 7) \oplus (x \ggg 18) \oplus (x \gg 3)\).
\(\sigma _1(x) = (x \ggg 17) \oplus (x \ggg 19) \oplus (x \gg 10)\).
\(s_0(x) = (x \ggg 2) \oplus (x \ggg 13) \oplus (x \ggg 22)\).
\(s_1(x) = (x \ggg 6) \oplus (x \ggg 11) \oplus (x \ggg 25)\).
\(\textit{Maj}(x, y, z) =(x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z)\).
Generally, 2OF3 is also called Majority (Maj); that is, the output is the value taken by the majority of the 3 inputs.
References
Albrecht MR, Rechberger C, Schneider T, Tiessen T, Zohner M (2015) Ciphers for MPC and FHE. In: EUROCRYPT 2015, vol 9056. Springer, Heidelberg, pp 430–454. https://doi.org/10.1007/9783662468005_17
Ashur T, Mahzoun M, Toprakhisar D (2022) Chaghri – a FHE-friendly block cipher. In: Proceedings of the 2022 ACM SIGSAC conference on computer and communications security, CCS 2022. ACM, New York, pp 139–150. https://doi.org/10.1145/3548606.3559364
Bendoukha A, Stan O, Sirdey R, Quero N, Souza LF (2022) Practical homomorphic evaluation of block-cipher-based hash functions with applications. In: Foundations and practice of security—15th international symposium, FPS 2022. Lecture notes in computer science, vol 13877. Springer, Cham, pp 88–103. https://doi.org/10.1007/9783031301223_6
Bonte C, Iliashenko I, Park J, Pereira HVL, Smart NP (2022) FINAL: faster FHE instantiated with NTRU and LWE. In: ASIACRYPT 2022, vol 13792. Lecture notes in computer science. Springer, Cham, pp 188–215
Brakerski Z (2012) Fully homomorphic encryption without modulus switching from classical GapSVP. In: CRYPTO 2012. Springer, Heidelberg, pp 868–886
Brakerski Z, Gentry C, Vaikuntanathan V (2012) (leveled) fully homomorphic encryption without bootstrapping. In: Innovations in theoretical computer science 2012. ACM, New York, pp 309–325
Canteaut A, Carpov S, Fontaine C, Lepoint T, Naya-Plasencia M, Paillier P, Sirdey R (2016) Stream ciphers: a practical solution for efficient homomorphic-ciphertext compression. In: FSE 2016. Lecture notes in computer science, vol 9783. Springer, Heidelberg, pp 313–333. https://doi.org/10.1007/9783662529935_16
Cheon JH, Han K, Kim A, Kim M, Song Y (2018) Bootstrapping for approximate homomorphic encryption. In: EUROCRYPT 2018, vol 10820. Lecture notes in computer science. Springer, Cham, pp 360–384
Cheon JH, Kim A, Kim M, Song YS (2017) Homomorphic encryption for arithmetic of approximate numbers. In: ASIACRYPT 2017. Springer, Cham, pp 409–437
Chillotti I, Gama N, Georgieva M, Izabachène M (2020) TFHE: fast fully homomorphic encryption over the torus. J Cryptol 33(1):34–91
Cho J, Ha J, Kim S, Lee B, Lee J, Lee J, Moon D, Yoon H (2021) Transciphering framework for approximate homomorphic encryption. In: ASIACRYPT 2021. Lecture notes in computer science, vol 13092. Springer, Cham, pp 640–669. https://doi.org/10.1007/9783030920784_22
Cid C, Indrøy JP, Raddum H (2022) FASTA—a stream cipher for fast FHE evaluation. In: CT-RSA 2022, vol 13161. Lecture notes in computer science. Springer, Cham, pp 451–483
Cosseron O, Hoffmann C, Méaux P, Standaert F (2022) Towards globally optimized hybrid homomorphic encryption—featuring the Elisabeth stream cipher. IACR Cryptol ePrint Arch 180
Dinur I, Liu Y, Meier W, Wang Q (2015) Optimized interpolation attacks on LowMC. In: ASIACRYPT 2015. Lecture notes in computer science, vol 9453. Springer, Heidelberg, pp 535–560. https://doi.org/10.1007/9783662488003_22
Dobraunig C, Grassi L, Helminger L, Rechberger C, Schofnegger M, Walch R (2023) Pasta: a case for hybrid homomorphic encryption. IACR Trans Cryptogr Hardw Embed Syst 3:30–73. https://doi.org/10.46586/TCHES.V2023.I3.3073
Dobraunig C, Eichlseder M, Grassi L, Lallemand V, Leander G, List E, Mendel F, Rechberger C (2018) Rasta: a cipher with low ANDdepth and few ANDs per bit. In: CRYPTO 2018. Lecture notes in computer science, vol 10991. Springer, Cham, pp 662–692. https://doi.org/10.1007/9783319968841_22
Dobraunig C, Eichlseder M, Mendel F (2015) Higher-order cryptanalysis of LowMC. In: ICISC 2015, vol 9558. Lecture notes in computer science. Springer, Cham, pp 87–101
Doröz Y, Hu Y, Sunar B (2016) Homomorphic AES evaluation using the modified LTV scheme. Des Codes Cryptogr 80(2):333–358
Ducas L, Micciancio D (2015) FHEW: bootstrapping homomorphic encryption in less than a second. In: EUROCRYPT 2015. Springer, Heidelberg, pp 617–640
Fan J, Vercauteren F (2012) Somewhat practical fully homomorphic encryption. Cryptology ePrint Archive, Report 2012/144. https://eprint.iacr.org/2012/144
Gentry C (2009) A fully homomorphic encryption scheme
Gentry C, Halevi S, Smart NP (2012) Homomorphic evaluation of the AES circuit. In: CRYPTO 2012, vol 7417. Springer, Heidelberg, pp 850–867
Ha J, Kim S, Choi W, Lee J, Moon D, Yoon H, Cho J (2020) Masta: an HE-friendly cipher using modular arithmetic. IEEE Access 8:194741–194751. https://doi.org/10.1109/ACCESS.2020.3033564
Ha J, Kim S, Lee B, Lee J, Son M (2022) Rubato: noisy ciphers for approximate homomorphic encryption. In: EUROCRYPT 2022. Springer, Cham, pp 581–610. https://doi.org/10.1007/9783031069444_20
Hebborn P, Leander G (2020) Dasta—alternative linear layer for rasta. IACR Trans Symmetric Cryptol 2020(3):46–86. https://doi.org/10.13154/TOSC.V2020.I3.4686
Hoffmann C, Méaux P, Ricosset T (2020) Transciphering, using FiLIP and TFHE for an efficient delegation of computation. In: INDOCRYPT 2020, vol 12578. Lecture notes in computer science. Springer, Cham, pp 39–61
Homomorphic evaluation of SHA256 (2023) https://github.com/zamaai/tfhers/tree/main/tfhe/examples/sha256_bool
https://oscca.gov.cn/sca/xxgk/201012/17/1002389/files/302a3ada057c4a73830536d03e683110.pdf
Klemsa J, Önen M (2022) Parallel operations over TFHE-encrypted multi-digit integers. In: CODASPY ’22. ACM, New York, pp 288–299. https://doi.org/10.1145/3508398.3511527
Lepoint T, Naehrig M (2014) A comparison of the homomorphic encryption schemes FV and YASHE. In: AFRICACRYPT 2014, vol 8469. Lecture notes in computer science. Springer, Cham, pp 318–335
Lou Q, Jiang L (2019) SHE: a fast and accurate deep neural network for encrypted data. In: NeurIPS 2019, pp 10035–10043
Mandal K, Gong G (2021) Homomorphic evaluation of lightweight cipher Boolean circuits. In: FPS 2021. Springer, Cham, pp 63–74. https://doi.org/10.1007/9783031081477_5
Matsuoka K, Hoshizuki Y, Sato T, Bian S (2021) Towards better standard cell library: Optimizing compound logic gates for TFHE. In: WAHC ’21: proceedings of the 9th on workshop on encrypted computing & applied homomorphic cryptography. WAHC@ACM, New York, pp 63–68. https://doi.org/10.1145/3474366.3486927
Méaux P, Journault A, Standaert F (2019) Improved filter permutators for efficient FHE: better instances and implementations. In: INDOCRYPT 2019, vol 11898. Springer, Cham, pp 68–91. https://doi.org/10.1007/9783030354237_4
Méaux P, Journault A, Standaert F, Carlet C (2016) Towards stream ciphers for efficient FHE with low-noise ciphertexts. In: EUROCRYPT 2016. Lecture notes in computer science, vol 9665. Springer, Heidelberg, pp 311–343. https://doi.org/10.1007/9783662498903_13
Mella S, Susella R (2013) On the homomorphic computation of symmetric cryptographic primitives. In: Cryptography and coding—14th IMA international conference, IMACC 2013. Lecture notes in computer science, vol 8308. Springer, Heidelberg, pp 28–44. https://doi.org/10.1007/9783642452390_3
Naehrig M, Lauter KE, Vaikuntanathan V (2011) Can homomorphic encryption be practical? In: CCSW 2011. ACM, New York, pp 113–124
Payal R, Goel M, Manglik P (2015) Design and implementation of parallel prefix adder for improving the performance of carry lookahead adder. Int J Eng Tech Res 4:12
Rechberger C, Soleimany H, Tiessen T (2018) Cryptanalysis of low-data instances of full LowMCv2. IACR Trans Symmetric Cryptol 2018(3):163–181
National Institute of Standards and Technology (2012) Secure hash standard (SHS). http://csrc.nist.gov/publications/PubsFIPS.html
Stracovsky R, Mahdavi RA, Kerschbaum F (2022) Faster evaluation of AES using TFHE. In: Poster Session, FHE.Org—2022. https://rasoulam.github.io/data/posteraestfhe.pdf
Togan M, Lupascu C, Plesca C (2015) Homomorphic evaluation of speck cipher. Proc Roman Acad Ser A: Math Phys Tech Sci Inf Sci 16:375–384
Trama D, Clet P, Boudguiga A, Sirdey R (2023) A homomorphic AES evaluation in less than 30 seconds by means of TFHE. In: Proceedings of the 11th workshop on encrypted computing & applied homomorphic cryptography. ACM, New York, pp 79–90. https://doi.org/10.1145/3605759.3625260
Wei B, Lu X (2023) Improved homomorphic evaluation for hash function based on TFHE. In: Information security and cryptology—19th international conference, Inscrypt 2023
Acknowledgements
We would like to thank the anonymous reviewers and editors for detailed comments and useful feedback.
Funding
This work was supported by the Huawei Technologies Co., Ltd and CAS Project for Young Scientists in Basic Research Grant No. YSBR035.
Author information
Contributions
BW completed the major work on this paper, XL participated in problem discussions and all authors have read and agreed to contribute.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This is a full-text version of the paper published as a poster at Inscrypt 2023 (Wei and Lu 2023).
Appendices
Appendix A: Parallel prefix adder
A parallel prefix adder (PPA) can be designed in many different forms depending on the requirements. PPAs are fast adders and are used in industry for high-performance arithmetic structures. A parallel prefix addition is carried out in three stages: (1) a preprocessing stage, (2) a carry generation network and (3) a postprocessing stage.
Parallel prefix adders are mainly categorized into three types according to the carry generation network: the Kogge–Stone adder, the Brent–Kung adder and the Ladner–Fischer adder. The parallel prefix network of the Kogge–Stone structure is shown in Fig. 6. It is characterized by very low logic depth and fanout, but a very high number of nodes and long interconnecting wires. The Brent–Kung structure is shown in Fig. 7; it is characterized by very small fanout and fewer nodes, but maximum logic depth. It relieves the fanout pressure at the cost of additional logic depth. The Ladner–Fischer structure is shown in Fig. 8. It has low logic depth but large fanout.
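The three stages and the trade-off between node count and logic depth can be made concrete on plaintext bits. The following is a minimal sketch of the Kogge–Stone and Brent–Kung carry networks for addition modulo \(2^{32}\), written from the textbook definitions rather than taken from the evaluated implementation; the Ladner–Fischer network is omitted, and all names are illustrative. Each call to `combine` corresponds to one prefix node.

```rust
// Plaintext sketch of two parallel-prefix carry networks for addition modulo 2^32.
// Each `combine` call is one prefix node of the carry generation network.

const N: usize = 32;

/// (generate, propagate) pair for a group of bit positions.
type Gp = (bool, bool);

/// Prefix operator: merge a higher group `hi` with the adjacent lower group `lo`.
fn combine(hi: Gp, lo: Gp) -> Gp {
    (hi.0 | (hi.1 & lo.0), hi.1 & lo.1)
}

/// Stage 1 (preprocessing): per-bit generate and propagate signals, LSB first.
fn preprocess(a: u32, b: u32) -> [Gp; N] {
    let mut gp = [(false, false); N];
    for i in 0..N {
        let (x, y) = ((a >> i) & 1 == 1, (b >> i) & 1 == 1);
        gp[i] = (x & y, x ^ y);
    }
    gp
}

/// Stage 3 (postprocessing): sum bit i = propagate_i XOR carry into position i.
/// The carry out of bit 31 is discarded, which gives the reduction modulo 2^32.
fn postprocess(gp0: &[Gp; N], prefix: &[Gp; N]) -> u32 {
    let mut sum = 0u32;
    for i in 0..N {
        let carry_in = if i == 0 { false } else { prefix[i - 1].0 };
        if gp0[i].1 ^ carry_in {
            sum |= 1 << i;
        }
    }
    sum
}

/// Stage 2, Kogge-Stone network. Returns (sum, node count, depth).
fn kogge_stone_add(a: u32, b: u32) -> (u32, usize, usize) {
    let gp0 = preprocess(a, b);
    let (mut cur, mut nodes, mut depth) = (gp0, 0, 0);
    let mut d = 1;
    while d < N {
        let prev = cur; // every node of a level reads the previous level
        for i in d..N {
            cur[i] = combine(prev[i], prev[i - d]);
            nodes += 1;
        }
        depth += 1;
        d *= 2;
    }
    (postprocess(&gp0, &cur), nodes, depth)
}

/// Stage 2, Brent-Kung network. Returns (sum, node count, depth).
fn brent_kung_add(a: u32, b: u32) -> (u32, usize, usize) {
    let gp0 = preprocess(a, b);
    let (mut cur, mut nodes, mut depth) = (gp0, 0, 0);
    // Up-sweep: prefixes for aligned groups of size 2, 4, 8, 16, 32.
    let mut d = 1;
    while d < N {
        for i in ((2 * d - 1)..N).step_by(2 * d) {
            cur[i] = combine(cur[i], cur[i - d]);
            nodes += 1;
        }
        depth += 1;
        d *= 2;
    }
    // Down-sweep: complete the prefixes at the remaining positions.
    d = N / 4;
    while d >= 1 {
        for i in ((3 * d - 1)..N).step_by(2 * d) {
            cur[i] = combine(cur[i], cur[i - d]);
            nodes += 1;
        }
        depth += 1;
        d /= 2;
    }
    (postprocess(&gp0, &cur), nodes, depth)
}

fn main() {
    let (a, b) = (0xDEADBEEFu32, 0x12345678u32);
    println!("Kogge-Stone: {:?}", kogge_stone_add(a, b));
    println!("Brent-Kung:  {:?}", brent_kung_add(a, b));
    println!("reference:   {}", a.wrapping_add(b));
}
```

For 32-bit operands this sketch counts 129 prefix nodes at depth 5 for Kogge–Stone and 57 nodes at depth 9 for Brent–Kung. When each node costs encrypted gates, a smaller node count tends to help when few threads are available, while a smaller depth tends to help when many nodes can be processed in parallel.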
Appendix B: Detailed experimental results
Table 3 presents the detailed experimental results discussed in the “Performance result” section.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wei, B., Lu, X. Improved homomorphic evaluation for hash function based on TFHE. Cybersecurity 7, 14 (2024). https://doi.org/10.1186/s42400024002040
DOI: https://doi.org/10.1186/s42400024002040