Comparison and Sorting in HE
Constructing HE ciruits for comparison and sorting using BFV scheme
This blog post gives a comprehensive overview of constructing efficient comparisonbased circuits using Homomorphic Encryption Schemes, right from the basic mathematics that goes behind constructing such primitives to understanding how practical these schemes are. We have tried to make this article as selfcontained as possible. We also refer the readers to the following articles to get a better introduction to HE schemes in general:
 Introduction to HE Schemes: Link
 Homomorphic Encryption  Addition/Multiplication/Relinearization: Link
Homomorphic Encryption
Homomorphic encryption (HE) is a cryptographic primitive which allows computing on encrypted data. It allows one to compute a function over encrypted inputs, rather than over inputs in the clear, and get an encrypted version of the result. This makes HE a very strong primitive. It definitely plays a powerful and essential role in building privacypreserving systems.
Current HE schemes allow users to either add and/or multiply data, depending on what encryption scheme is used, as we explain below. Note that in order to compute any function over encrypted inputs, it is enough to have schemes that provide both addition and multiplication.
Want to know why only addition and multiplication is sufficient. Click to expand this bullet.
This is because one can always imagine computation as a binary circuit. All one would need is to actually be able to compute any of the gates in an encrypted fashion. That is, be able to provide encrypted inputs into a gate to receive the encryption of the gate's output. As the "NAND" gate is a universal gate and we can implement any other gate using NAND gates, it is enough to implement a NAND gate by using additions and multiplications, as follows:
$$\textrm{NAND }(a,b) = a \times b+1 \pmod{2}$$
Types of HE Schemes
HE schemes are divided into the following three categories:
 Partial Homomorphic Encryption (PHE): These schemes allow one to compute only either addition or multiplication, but not both. These schemes do not usually restrict the number of operations.
 Somewhat Homomorphic Encryption (SHE): These schemes do allow one to compute both addition and multiplication, but are restricted to a predefined maximum number of operations. For practical purposes, one can consider additions to be “free” and not worry about an imposed limit, whereas multiplications are capped to the established maximum. They are also called leveledHE schemes.
 Fully Homomorphic Encryption (FHE): These schemes allow both of best worlds, with as many additions and multiplications as needed. Furthermore, the number of operations does not need to be defined in advance.
:::info Noise Growth It is important to note that all SHE (and FHE) schemes require some noise in order to ensure semantic security of the ciphertext. This noise grows with the number of operations, which can lead to an overflow of the field modulus (or characteristic). As we will see below, if the noise is not reduced appropriately, the ciphertext will lose information and the message will not be recovered in full.
To understand more, please read this article.
:::
All HE schemes are composed with the following algorithms:
 Key Generation, which generates the public and secret key of the scheme
 Encryption, which allows one to encrypt a message homorphically
 Decryption, which allows one to decrypt a ciphertext correctly to the underlying message
 Evaluation, which enables the homomorphic computation of functions over the ciphertexts
While the first three algorithms are common to any encryption scheme, but the evaluation algorithm particular to homomorphic encryption schemes. It describes how to add and/or multiply the encrypted values homomorphically.
In this post, we focus on the BrakerskiFanVercauteren (BFV) HE scheme, which is a leveledHE Scheme, and explain how to build efficient integercomparison circuits.
We first look into some of the basic notation and mathematical preliminaries needed to understand the rest of the article.
Notation and Preliminaries
When we write $\mathbb{Z}_p$, we consider integers in the set $[0, p1]$. And when we write $\mathbb{F}_p$, we are considering integers in the set $[\frac{p1}{2}, \frac{p1}{2}]$. Note that the two sets are in fact the same, but encoded in different ranges, and as such we use the two notations interchangeably. The latter provides more efficient operations the evaluate an HE circuit.
We use $R_{p,n}$ to denote the ring of polynomials whose coefficients belong to the field $\mathbb{F}_p$, denoted $\mathbb{F}_p[X]$, and where operations are done modulo the polynomial $X^n + 1$. We refer the reader to these two blog posts ( Link 1, Link 2) for a thorough description of these rings and their basic operations. For the sake of this article, we fix $n$ and write $R_p$ to denote the ring $R_{p,n} = \mathbb{F}_p[X]/(X^n+1)$.
The BFV Encryption Scheme
We describe the four algorithms for the BFV HE scheme below.
Key Generation: $(pk, sk) = \mathsf{KeyGen}(\lambda)$
Using the security parameter, $\lambda$, we first generate the secret key $sk$ by choosing a random polynomial in $R_2$. We do this by choosing each of the coefficients randomly from ${0,1}$.
Having chosen $sk$, we then pick a random polynomial $a$ from $R_q$ and a small error polynomial $e$ from $R_q$ such that $pk = ([(a \cdot sk + e)]_q, a)$. We denote the value $[(a \cdot sk + e)]_q$ as $b$. Hence, we have the public key to be a tuple $pk = (b,a)$.
For the advanced reader, the polynomial $e$ can be sampled from a different probability distributions (Gaussian, discrete normal, and more) over $R_q$.
Encryption: $c = \mathsf{Enc}(pk, m)$
For encrypting a plaintext message $m \in R_{t}$, we use the public key $pk = (b,a)$ and sample $\delta = \lfloor \frac{q}{t} \rfloor$ and $e_1,e_2,u$ randomly over $R_q$. Then the ciphertext, $c$, is computed as follows:
$$ c = ([b \cdot u + e_1 + \delta \cdot m]_ q,[a \cdot u + e_2]_ q) $$
Decryption: $m = \mathsf{Dec}(sk, c)$
For decrypting a ciphertext $c = (ct_1, ct_2)$ with a given secret key $sk$, we have:
$$ m = [\lfloor \frac{1}{\delta} \cdot [ct_1 + ct_2 \cdot sk]_ q \rceil]_ t $$
It is important to note, that in order for the decryption algorithm to output the right message, we have an additional requirement on the noise threshold of the ciphertext:
\begin{equation} e’ \leq \frac{q}{2t} \end{equation}
where $e’ = (e \cdot u + e_1 + e_2 \cdot sk )$.
Evaluation: $c_{out}= \mathsf{Eval}(f, c_1,…,c_k)$
There are four different operations that we are interested in and can be used within the $\mathsf{Eval}$ algorithm to build applications. :
 $\mathsf{AddP}(c_1,m_2)\rightarrow c_3$, which adds a plaintext, $m_2$, to a ciphertext $c_1$, corresponding to plaintext $m_1$, where the result would yield a ciphertext $c_3$ that encrypts message $m_3 = m_1 + m_2$
 $\mathsf{AddC}(c_1,c_2) \rightarrow c_3$, which adds two ciphertexts $c_1$ and $c_2$,corresponding to plaintexts $m_1$ and $m_2$ respectively, where the result would yield ciphertext $c_3$, which corresponds to plaintext $m_3 = m_1 + m_2$.
 $\mathsf{MulP}(c_1,m_2)\rightarrow c_3$, which multiplies a plaintext, $m_2$, to a ciphertext $c_1$, corresponding to plaintext $m_1$, where the result would yield a ciphertext $c_3$ that encrypts message $m_3 = m_1 \cdot m_2$,
 $\mathsf{MulC}(c_1,c_2) \rightarrow c_3$, which multplies two ciphertexts $c_1$ and $c_2$,corresponding to plaintexts $m_1$ and $m_2$ respectively, where the result would yield ciphertext $c_3$, which corresponds to plaintext $m_3 = m_1 \cdot m_2$.
In this post, we will take these as blackbox for the purpose of readability, but if you are interested in the details, please see this article for the plaintextbased operations and this one for the ciphertextbased operations.
Important Considerations for Function Evaluation
Noise increase
Plaintext operations have a significantly lower impact on noise growth(plaintext addition in fact comes for free) compared to ciphertext operations. This is essentially because noise from each ciphertext compounds in ciphertext operations, whereas in plaintext operations, only one ciphertext is present.
Also, multiplications are more expensive because of their noise exponentiation. There are ways to get around this by using techniques like “modulus switching” and “relinearization”, which you can learn more about in this article.
One needs to keep in mind that every (function size) to be evaluated comes with a tradeoff between the size of the parameters and the noise increase / the amount of multiplications. We explain more about this later in the post
Boolean functions are expensive
As explained before, we know that BFV only allows one to compute over arithmetic circuits over fields, ie.. those which allow addition and multiplication only. Hence, when one wants to evaluate boolean functions, we will need to convert those into linear functions before evaluating them using BFV.
This implies that though one can easily evaluate circuits which are linear or polynomial functions of the input, we can’t easily evaluate boolean functions easily using BFV. Integer comparison circuits are primary examples falling into this category. In the next section, we explain about this in detail.
BFV Library Implementations
The BFV scheme is widely used and is also part of the HE standard, here are some of the libraries we tested and worked with and some of their features as listed below in the table. The table clearly shows that each library has it’s own advantages according to the setting you want to use HE in and gives a fair idea, as to what one could choose for a particular case, given they have a decent idea about the setting of the problem.
Libraries  SEAL  Palisade  Lattigo 

Language  C++  C++  Golang 
NTT/RNS Optimizations  Available, provided as an option  Available, provided as an option  Available, no other option 
Encoding Options  Packed Encoding/Normal Encoding  Various options, see documentation  Only packed encoding option 
Other Schemes  CKKS  BFV, CKKS, FHEW, TFHE  CKKS 
Bootstrapping  Yes, for CKKS only  Yes, for FHEW,TFHE only  Yes, for CKKS only 
Comparison Circuits
In this section, we aim to construct efficient integer comparison circuits. Comparison functions play a pivotal role in computing, as they are used in all areas ranging from sorting elements to computation heavy data analytic functions. Hence, implementing a HE based comparison circuit becomes essential.
Integer comparison circuits are nonlinear functions(boolean functions more particularly). As discussed previously, these circuits are tough to construct as the HE scheme only allows addition and multiplication over integers, which means we can only evaluate linear functions. Hence, we need to find a function equivalent to comparing integers but is also linear in nature.
To build these circuits, we represent comparison operations as boolean values. To compute these boolean values, one can just use an equality function and lessthan comparison function to cover all possible relations pertaining to comparing two integers. Say we have lessthan function $\mathtt{LT}$, representing $a<b$, and equality function $\mathtt{Eq}$, representing $a=b$, described as below:
 $\mathtt{LT}(a,b) = 1$ if $a<b$ and $0$ otherwise.
 $\mathtt{Eq}(a,b) = 1$ if $a=b$ and $0$ otherwise.
Then the following table shows how to obtain other relations from these functions:
Relation  $a \leq b$  $a > b$  $a \geq b$  $a \neq b$ 

Function  $\mathtt{LT}(a,b) + \mathtt{Eq}(a,b)$  $1 \mathtt{LT}(a,b) \mathtt{Eq}(a,b)$  $1  \mathtt{LT}(a,b)$  $1  \mathtt{Eq}(a,b)$ 
Hence, we only concentrate on implementing these two functions homomorphically.
Equality Circuit
Before we look into the equality function, we want to look into Fermat’s Little Theorem. It states that for a prime $p$, any number $a \in \mathbb{F}_p \setminus {0}$, we have $a^{p1} \equiv 1 \pmod p$.
We can use this to get a function for checking the equality of two integers modulo a prime. Let us analyse the following function description for Eq:
\begin{equation} \texttt{Eq}(x,y) = 1  (xy)^{p1} \end{equation}
Notice that if $x=y$, $\texttt{Eq}(x,y) = 1$ as $xy=0$. Otherwise, by Fermat’s little theorem we know that $(xy)^{p1} = 1$. Hence, $\texttt{Eq}(x,y) = 0$. This circuit works for all values of prime $p$'s.
Efficiency and Noise for computing the equality circuit
To see the noise growth in a circuit, we will identify the number of homomorphic multiplications done on a ciphertext. In this case, we have a subtraction with a constant and 1 exponentiation. As this is modular exponentiation, we will need $\mathcal{O}(\log p)$ multiplications. This is also the most efficient way to implement the circuit.
Note that the most efficient implementation might not be the one having the least noise. Hence, in multiple instances, we have to face problems with noiseefficiency tradeoffs. We will show at a later instance an example of this.
Lessthan Comparison Circuit
To determine if a given integer $x$ is less than $y$, we compute the variable $z = xy$ and check the sign of z. However, we will need to implement a sign function that can output the sign of a given value. As defined earlier, we know that $\texttt{LT}(a,b) = 1$ if $a<b$ and $0$ otherwise.
For this, we firstly split the corresponding field into two parts. We denote $\mathbb{F}_p^+$ as the set of nonnegative integers and $\mathbb{F}_p^$ as the set of negative integers. So, $\mathbb{F}_p^ = [\frac{p1}{2}, 1]$, $\mathbb{F}_p^+ = [0, \frac{p1}{2}]$.
Input Size
Coming back to the comparison circuit, we wanted to determine the sign of the value $z$. To compute the sign of the value z, the following should hold true:
\begin{equation}
z = xy \in \Bigg{
\begin{array}{ll}
\mathbb{F}_ p^{+} & \mbox{if } \mathtt{LT}(x,y) = 0 \
\mathbb{F}_ p^ & \mbox{if } \mathtt{LT}(x,y) = 1
\end{array}
\end{equation}
There are 2 possible cases with the values of $x,y$:
 $x,y \in \mathbb{F}_p^+$  Then the above equations hold naturally.
 $x$ or $y \in \mathbb{F}_p^$  The above condition doesn’t hold always.
One can easily verify the claims we make above. We hence state that for any $x,y \in \mathbb{F}_p^+$, their difference $z$ belongs to $\mathbb{F}_p^ \Leftrightarrow x<y$.
From the above paragraph, it is important to note that we need to take the field size to be 2 times the size of $x$ and $y$ to be able to compute the lessthan function this way.
Lessthan comparison as a polynomial function
We want a function that can output 1 if z is negative and 0 if z is positive. We can, from the equality function described above, construct a lessthan function in the following way:
If z is negative, then we know that $z \in \mathbb{F}_p^$. Hence, there exists a unique $a \in \mathbb{F}_p^$ such that $z = a$. Hence, we can describe the function as
\begin{equation} \texttt{LT}(z) = \sum_{a = \frac{p1}{2}}^{1} \texttt{Eq}(z,a) = \sum_{a = \frac{p1}{2}}^{1} 1(za)^{p1} \end{equation}
After further simplification, one can also denote the function in the following way:
\begin{equation} \texttt{LT}(z) = \frac{p+1}{2} \cdot z^{p1} + \sum_{i = 1,\textrm{odd}}^{p2} c_i \cdot z^i \end{equation}
where $c_i = \sum_{a = 1}^{\frac{p1}{2}} a^{p1i}$ and $z=xy$.
TODO: benchmarks table
Efficiency and Noise for computing the lessthan comparison circuit
This circuit acts as a perfect example to demonstrate noiseefficiency tradeoffs in homomorphic evaluations. Look at the equation above describing $\mathtt{LT}(z)$. This equation involves computing the values $z,z^3,\cdots,z^{p2}$ for the second term and $z^{p1}$ for the first term. Another important thing to notice here is that the values of $c_i$ can be precomputed for a given field. So, in addition to computing the powers of $z$ as described above, one will have to do $\frac{p+1}{2}$ additions and $\frac{p+1}{2}$ scalar multiplications to compute the second part and finally 1 addition and 1 scalar multiplication to compute $\texttt{LT}(z)$. While all these are pretty much standard, we can compute the powers of $z$ in two ways:

Computation Optimal Circuit: We can compute $z^2$ and then iteratively multiply $z$ with $z^2$ to generate $z^3,z^5, \cdots,z^{p2}$ and get $z^{p1}$ from multiplying $z$ with $z^{p1}$. While this seems to be the most efficient way to compute it in a standard setting, we need to also account for the increase in noise when we compute it this way. Notice that the ciphertext $z^{p2}$ is generated as a result of $\frac{p1}{2}$ ciphertext multiplications. To summarise:
 Advantage: Efficient way to compute the function, ie.. least number of operations.
 Problem: We know that we need to minimise the number of ciphertext multiplications to stay within the noise threshold. This method has $\mathcal{O}(p)$ homomorphic multiplications. Concretely,

Noise Optimal Circuit: We can compute each power separately using modular exponentiation. While this seems to be highly inefficient in computation, notice that we reduce the no. of multiplications needed to generate a particular value. We can generate $z_{p2}$ in just $\mathcal{O}(\log p)$ multiplications. Therefore, the overall complexity becomes $\mathcal{O}(p \log p)$. In this case:
 Advantage: We have reduced the number of multiplications needed to generate the powers almost logarithmically.
 Problem: We are calculating each power freshly without reusing previously computed power. This increases the evaluation time noticeably.
The no. of multiplications is an extremely limiting factor, and there are more than 1 instances where even 1 or 2 more multiplications in a circuit can affect its correctness. Hence, we stick to the second method to preserve correctness. Moreover, when we use the output of this circuit, $\mathtt{LT}(z)$, as an intermediate value in some other homomorphic circuit, it can get even messier, as we see in the next section when we discuss sorting circuits.
Sorting Circuits
A comparison circuit forms the basis of any sorting circuit one would want to construct. In this article, we construct a circuit to output the sorting index of a given array. It doesn’t give us an encryption of the sorted elements but instead the encryption of the sorting index(the positions of unsorted elements in the sorted array).
Say, there is an unsorted array $A = {a_0,\cdots,a_{n1}}$. We want to receive the sorting index $I = {i_0, \cdots, i_{n1}}$ where $i_j \in {0,\cdots,n1}$ such that if $A’ = {a’_0,\cdots,a’_{n1}}$ is a sorted version of $A$, then $a’_{i_j} = a_{j}$.
Note that it is indeed possible to get the sorted version $A'$ directly, but it reduces efficiency and increases noise significantly, and hence we avoid doing it.
Sorting index circuit  Descending Order
The most efficient way to compute the sorting index, is by performing $n^2$ less than comparisons in the following way:
 We first compute a comparison matrix $\mathtt{L}$ as described below:
\begin{equation}
\mathtt{L}_ {i,j} = \Bigg{
\begin{array}
0\texttt{LT}(a_i,a_j) & \mbox{if } \mathtt{i < j}\
0 & \mbox{if } \mathtt{i = j} \ 1  \mathtt{LT}(a_j,a_i) & \mbox{if } \mathtt{i > j} \end{array} \end{equation}
For example, let’s assume $A = {7,3,6,2,5}$. Then, the sorting index $I$ would be the array ${0,3,1,4,2}$. The matrix $\mathtt{L}$ for this array $A$ is:
$$
\mathtt{L} =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 \
1 & 0 & 1 & 0 & 1 \
1 & 0 & 0 & 0 & 0 \
1 & 1 & 1 & 0 & 1 \
1 & 0 & 1 & 0 & 0
\end{bmatrix}
$$
 We then construct vectors from the matrix L by taking each row as the vector elements. We compute the Hamming weight of this vector. If one notices the above example matrix $\mathtt{L}$, it is clear that the no. of 1’s in each row is unique and covers the set ${0,1,2,3,4}$. Similarly, for an nsized array, this generalizes to the Hamming weight being unique and covers the set ${0,1,\cdots,n1}$. Lets denote $\mathtt{L}[j]$ to be the $j$'th row and $\mathtt{wt}$ denote the hamming weight function. Then for the above example: $$ \mathtt{wt}(\mathtt{L}[0]) = 0, \mathtt{wt}(\mathtt{L}[1]) = 3, \mathtt{wt}(\mathtt{L}[2]) = 1, \mathtt{wt}(\mathtt{L}[3]) = 4, \mathtt{wt}(\mathtt{L}[4]) = 2 $$
It is clear that this indeed is the sorting index $I$ needed by us. One can compute the matrix $\mathtt{L}$ homomorphically(by evaluating all the above lessthan functions homomorphically). One can also compute the hamming weights for a particular row by adding the elements in the row in a homomorphic fashion.
Efficiency and noise for computing the sorting circuit
It is evident that the first step of the above circuit is expensive as one needs to compute $\frac{n+1}{2}$ many $\texttt{LT}$ circuits to compute the matrix $\mathtt{L}$. The second step, however, is only homomorphic addition and is cheaper. The noise growth here is the same as a single $\mathtt{LT}$ circuit(other than the increase due to homomorphic additions).
Complete sorting
One can also sort the array completely and give that as an output homomorphically. However, this needs a substantially higher number of operations and also increases noise. It involves manually checking if a particular element has a hamming weight needed using an equality circuit. We don’t implement this, but it is an important step if sorting is used as an intermediate operation in some circuit construction.
Parameter Selection and Implementation
In this section, we look into a few concrete details of the above circuits and also talk about their implementations. Here, in particular, I will be using the SEAL library implementation for the BFV HE Scheme.
Noise threshold
The noise threshold depends mainly on the degree $n$ of the polynomial you are working in and the ciphertext size $q$ (Note that ciphertext space is $R_q$). The higher the value of $n$, the more the threshold is. Though this seems like a good think to have, it is not in absolute terms as it slows down the homomorphic evaluation by a huge amount. Hence, setting an optimal $n$ value is essential for the following reasons:
 We always have a minimal n value needed based on the no of multiplications one needs to perform to evaluate their circuit. A lower $n$ value can always affect correctness of the circuit.
 Setting a higher $n$ slows down the circuit evaluation in a homomorphic setting and hence, we don’t want it to be higher than what is absolutely needed.
Field Size $p$ for encoding message
One of the most important parameters of a HE scheme is the field size used for encoding the message to be encryted as plaintext. We use the most simple encoding scheme for this scheme where we encrypt a message $m$, say an integer less than a chosen prime $p$, we can encode it into a polynomial $f(x)$ in $R_p$ by making $f$ a constant polynomial equal to $m$. However, there are various other things to consider when one encodes a message into a polynomial.
In the comparison circuit, we noticed that the input size can only be half the size of the field. In this case we can only encode integers having a value less than $\frac{p}{2}$. One will need to consider these things even when they construct circuits for other operations as it might break correctness of these algorithms otherwise.
Another important factor to consider here is the computational complexity of the circuit. As we looked into the efficiency of the circuits we described above, we noticed that we need $\mathcal{O}(p \log p)$ multiplications for the noise optimal version of the lessthan comparison circuit. In this case, we assume that the noise threshold is very low. Also, the no. of multiplications per ciphertext increases logarithmically with $p$.
Encoding messages outside the given set of values
If we want to encode a message $m$, whose value is greater than $p$ (in case of equality circuits) or $\frac{p}{2}$ (in case of lessthan comparison), we can always perform a base$p$ decomposition of the number and work on individual elements in the base decomposition. We show how to do it for numbers less than $p^2$. One can extend it to any $p^d$ and even beyond (refer paper).
Consider integers $x,y \in Z_{p^2}$ which are to be compared. Let $x = x_1 x_0$ and $y = y_1 y_0$ be the base$p$ decompositions of $x$ and $y$ respectively. Then, to check if $x=y$, one can compute the following:
\begin{equation} x = y \Leftrightarrow (x_1 = y_1) \wedge (x_0 = y_0) \Leftrightarrow \texttt{Eq}(x_1,y_1) \cdot \texttt{Eq}(x_0,y_0) = 1 \end{equation}
We can also compute $x<y$ (only if $x_i,y_i \leq \frac{p}{2}$, otherwise choose a prime greater than $2p$). We do the following to check the inequality:
\begin{equation} x < y \Leftrightarrow (x_1 < y_1) \vee (x_1 = y_1) \wedge (x_0 < y_0) \Leftrightarrow \texttt{LT}(x_1,y_1) + (\texttt{Eq}(x_1,y_1) \cdot \texttt{LT}(x_0,y_0)) = 1 \end{equation}
As we notice, this does increase the no of multiplications, but we can use recursion to slow the growth of the multiplicatiove depth to logaritmic. Moreover, basep decomposition for $p=17$ covers very big numbers(almost 1 million) just by decomposing it into 5 elements, ie.. considering numbers less than $17^5$.
Hence, it is advisable to choose prime $p$ from $17$ to $37$. We can cover a huge range of integers by the decomposition method as stated above with just these p values and still achieve good efficiency and low noise.
Usual Encoding Methods and why they won’t work
Usual encoding methods try packing many plaintexts into one ciphertext by using number theoretic methods like RNS(CRT)/NTT transforms. To use these, one faces with an additional constraint $2p \equiv 1 \pmod n$. We usually have $n$ values as high as $8192$ to $32768$ based on the noise threshold we want to set for the circuit. This constraint then forces us to set $p$ as high as $65537$ at times. This is not acceptable for the circuits described above and hence we cannot use these encodings.
In the next section we discuss implementing these things using the SEAL HE library. While one can use other libraries like Palisade which allows normal encodings ass suggested above and don’t use optimizations like NTT, we notice that we can’t use the Lattigo library as it inherently uses the optimizations and hence has an implicit requirement of the constraint listed above.