Skip to main content

Optimum noise mechanism for differentially private queries in discrete finite sets

Abstract

The differential privacy (DP) literature often centers on meeting privacy constraints by introducing noise to the query, typically using a pre-specified parametric distribution model with one or two degrees of freedom. However, this emphasis tends to neglect the crucial considerations of response accuracy and utility, especially in the context of categorical or discrete numerical database queries, where the parameters defining the noise distribution are finite and could be chosen optimally. This paper addresses this gap by introducing a novel framework for designing an optimal noise probability mass function (PMF) tailored to discrete and finite query sets. Our approach considers the modulo summation of random noise as the DP mechanism, aiming to present a tractable solution that not only satisfies privacy constraints but also minimizes query distortion. Unlike existing approaches focused solely on meeting privacy constraints, our framework seeks to optimize the noise distribution under an arbitrary \((\epsilon , \delta )\) constraint, thereby enhancing the accuracy and utility of the response. We demonstrate that the optimal PMF can be obtained through solving a mixed-integer linear program. Additionally, closed-form solutions for the optimal PMF are provided, minimizing the probability of error for two specific cases. Numerical experiments highlight the superior performance of our proposed optimal mechanisms compared to state-of-the-art methods. This paper contributes to the DP literature by presenting a clear and systematic approach to designing noise mechanisms that not only satisfy privacy requirements but also optimize query distortion. The framework introduced here opens avenues for improved privacy-preserving database queries, offering significant enhancements in response accuracy and utility.

Introduction

Differential Privacy (DP) is a technique used for publishing database queries that conceal confidential attributes. Some of its real-world applications are in the publication of the United States of America Census 2020 data (using disclosure avoidance system Census 2020), Google’s historical traffic statistics (Eland 2015), Microsoft’s telemetry data (Ding et al. 2017), LinkedIn user engagement information to third parties for advertisements (Rogers et al. 2020), etc.

DP hinges on a randomized mechanism wherein the data publisher, who owns the database, responds to analyst queries. The key principle is to generate similar distributions of query answers for data differing by a specific attribute, making it statistically challenging to discern whether data with that attribute were involved in the query computations.

In the existing literature, randomizing query answers commonly involves adding noise with a parametric distribution featuring one or two degrees of freedom. In contrast, this paper proposes a novel approach: optimizing all the parameters of the probability mass function (PMF) for queries with finite discrete answers. This ensures that the randomized query outcome meets DP constraints while minimizing expected distortion for a given database and discrete finite set of answers. Notably, existing literature does not undertake the optimization of noise to minimize distortion under an arbitrary \((\epsilon , \delta )\) constraint, as it typically limits the mechanism to a single parameter. Our approach stands out because it optimizes the entire distribution.

Our formulation, applicable to both discrete numerical data (for which perhaps mean squared error (MSE) is the best error metric) and categorical data (where error rate (ER) is preferable), finds resonance in real-world scenarios. For instance, queries related to discrete numerical data include: (a) The number of households in a census tract with at least one college-educated member (0 through n, where n is the total number of households in that census tract); (b) what is the most popular promotion on a website; or how many users in a certain set accepted a sales promotion on a website; (c) The hour of peak electricity usage in a neighborhood (00 through 23), etc. Similarly, queries related to categorical data are: (a) Type of consumer? (Subscriber or Trial user, Residential or Commercial, etc.); (b) What month of the year? (January: 1 through December: 12); (c) What gender is a person? (Male: 1, Female: 2, Other: 3), same idea for ethnicity, blood type, etc. For discrete numerical data, just perform the modulo addition of random noise of size \(n+1\), and for categorical data, we assign numerical values to categories and perform the modulo addition of random noise on them.

Before outlining our contributions, we review the relevant literature.

Literature review

In the literature, several papers studied the additive noise mechanisms for discrete query outputs (Geng and Viswanath 2015; Soria-Comas and Domingo-Ferrer 2013; Geng and Viswanath 2015). For discrete queries with infinite support, the additive noise mechanism for \(\epsilon -\)differential privacy that minimizes any convex function of the query error was found in Geng and Viswanath (2015); the optimum PMF is shown to have a specific decreasing staircase trend. The problem of finding the optimal data-independent noise mechanism for \(\epsilon -\)differential privacy is also addressed in Soria-Comas and Domingo-Ferrer (2013). Even though the authors focus on continuous query outputs, they claim one can easily extend the method to discrete queries. Neither paper (Geng and Viswanath 2015; Soria-Comas and Domingo-Ferrer 2013) explored the optimization of the \((\epsilon ,\delta )-\)differential privacy trade-off for \(\delta >0\). For integer query outputs, the optimal noise mechanism design for \((\epsilon ,\delta )-\)differential privacy is the subject of Geng and Viswanath (2015). Another approach to integer count queries is carried out in Ghosh et al. (2009), where a double-sided geometric distribution noise mechanism is used. A recent study on the count query DP problem is found in Cormode et al. (2019), in which the authors use a set of constrained mechanisms that achieve favorable DP properties. The related problem of publishing the number of users with sensitive attributes from a database is addressed in Sadeghi et al. (2020). In their proposed DP mechanism, they add an integer-valued noise before publishing it to protect the privacy of individuals. Though the randomized query response, produced by the proposed mechanism in Sadeghi et al. (2020), lies in the actual query support range, the additive noise PMF used depends on the query output, which requires storing several PMFs. In the context of discrete queries, an additive discrete Gaussian noise-based mechanism is proposed in Canonne et al. (2020). They show that the addition of discrete Gaussian noise provides the same privacy and accuracy guarantees as the addition of continuous Gaussian noise. Another recent study focuses on the mechanisms of discrete random noise addition (Qin et al. 2022). In this study, the basic DP conditions and properties of general discrete random mechanisms are investigated. In Ravi et al. (2022) a randomized mechanism for the labels obtained from K-means clustering is provided using the modulo addition-based mechanism.

In the literature, a joint DP mechanism is proposed for key-value data in Ye et al. (2019), where key uses categorical data and value uses numerical data. Two potential applications have been identified: video ad performance analysis and mobile app activity analysis. The key in the former is the ad identifier, and the value is the time a user has watched this video ad, whereas the key in the latter is the app identifier, and the value is the time or frequency this app appears in the foreground. In another work (Wang et al. 2019), local DP mechanisms for multidimensional data that contain both numeric and categorical attributes are proposed.

Paper contributions

This paper critically revisits the design of DP random mechanisms, specifically focusing on ensuring \((\epsilon ,\delta )-\)DP for queries with \(n+1\) possible answers, each mapped onto the integers 0 through n. The mechanism we study involves the modulo \(n+1\) addition of noise. The significant and novel contributions of this paper can be summarized as follows:

  • Optimal Noise PMF: In Section “Numerical optimization”, we demonstrate that the additive noise PMF minimizing a linear error metric under a given \((\epsilon , \delta )\) budget can be obtained as the solution of a Mixed Integer Linear Program (MILP). Notably, for the case when \(\delta =0\), the optimum PMF can be found using a Linear Program (LP), as established in previous literature (Geng et al. 2015), which is a special case of our general formulation.

  • Explicit PMF expressions for minimum error: Sections “PMF for single distance neighborhood” and “The BD neighborhood” delve into two special cases, providing explicit expressions for the optimum PMF that minimizes error for specific \((\epsilon ,\delta )\) pairs. This analysis extends and subsumes prior work (Geng et al. 2015).

  • Structure of Optimum PMF and Error Rate: We unveil the structural characteristics of the optimum PMF and error rate functions. Specifically, the derived error rate function exhibits a piece-wise linear nature. Our findings reveal that the optimum \((\epsilon ,\delta )\) trade-off curve, for a given error rate, experiences an exponential decrease as \(\delta\) increases. Moreover, a discrete set of discontinuities in the curve precludes any change in the exponential rate of decay as \(\epsilon\) increases.

  • Numerical Validation and Comparative Analysis: The contributions outlined above are corroborated by a thorough numerical analysis presented in Section “Numerical results”. Our simulations include comprehensive comparisons with prior methods, demonstrating the superiority of our proposed approach.

Notation

In this section, we present a summary of general notations used throughout the paper. A comprehensive list of notations and abbreviations can be found at the top of the paper in the Nomenclature section.

Let \(\mathbb {N}, \mathbb {N}^+, \mathbb {Z}, \mathbb {R}\) denote the sets of natural numbers including zero, natural numbers excluding zero, integers, and real numbers, respectively. The set integers \(\{0,1, \ldots , n\}\), \(n \in \mathbb {N}\), is referred to as [n], \([n]_{+}\) is, instead, \(\{1, \ldots , n\}\). The symbols \(\lfloor k \rfloor\) and \(\lceil k \rceil\) denote the floor and ceiling functions of k, respectively. The cardinality of set \(\mathcal {A}\) is denoted by \(|\mathcal {A}|\), and the set difference of sets \(\mathcal {A}\) and \(\mathcal {B}\) is denoted as \(\mathcal {A} {\setminus } \mathcal {B}\).

The query function applied to data X from a database, denoted by \(\mathcal X\), is represented as \(\mathbbm {Q}(X)\), and \(\mathcal {Q}\) denotes a discrete finite set of query answers. In this paper, the query domain is discrete and finite, mapped onto the set [n] of size \(n+1\). The numerical outcome of the query is denoted by the variable \(q\in [n]\), while \(\tilde{q}\) represents the outcome after the randomized publication, with distribution \(f(\tilde{q}|X)\).

For vector queries with outcomes, \(\varvec{q}\), in a finite discrete domain, one can map the result onto the set [n], and hence, the optimization we propose applies. We use \(f(\eta )\) to represent the noise distribution \(f_\eta (\eta )\) whenever possible without confusing the reader.

Preliminaries

Definition 1

(\((\epsilon ,\delta )\)-Differential Privacy (DP) Dwork et al. 2006) A randomized mechanism \(\tilde{q}:\mathcal {Q} \rightarrow \mathcal {Q}\) is \((\epsilon ,\delta )\)-differentially private if for all datasets X and \(X'\) differ by a unique data record, given any arbitrary event \(\mathcal {S} \subseteq \mathcal {Q}\) pertaining to the outcome of the query, the randomized mechanism satisfies the following inequality

$$\begin{aligned} Pr(\tilde{q}(X) \in \mathcal {S}) \le e^\epsilon Pr(\tilde{q}(X') \in \mathcal {S}) + \delta , \end{aligned}$$
(1)

where \(Pr(\mathcal {A})\) denotes the probability of the event \(\mathcal {A}\) and the PMF used to calculate the events probability is \(f(\tilde{q}|X)\).

Conventionally, given the random published answer \(\tilde{q}\) in the differential privacy literature, the privacy loss function name is a synonym for the log-likelihood ratio:

$$\begin{aligned} L_{{X\!X'}}(\tilde{q})\triangleq \ln \frac{f_{\tilde{q}}(\tilde{q}|X)}{f_{\tilde{q}}(\tilde{q}|X')}, \end{aligned}$$
(2)

where \(X \in \mathcal {X}\) is the set of data used to compute the query and \(X'\) is an alternative set with a unique attribute or data point that is different. For each X we denote by \({\mathcal X}^{(1)}_{X}\) the neighborhood of set of X which contains all data sets \(X'\in \mathcal {X}\) that differ from X by a predefined sensitive attribute we want to conceal. Note that, if the event \(L_{{X\!X'}}(\tilde{q})<0\) under the experiment with distribution \(f(\tilde{q}|X)\) then, in classical statistics, the observer of the outcomes \(\tilde{q}\) will choose erroneously the alternative hypothesis that \(X'\) was queried (where the emission probability is \(f(\tilde{q}|X')\)) rather than X. By looking at the tail of the distribution for \(L_{{X\!X'}}(\tilde{q})>0\) under the distribution \(f(\tilde{q}|X)\), one can gain insights into how frequently the mechanism allows to differentiate X from \(X'\) with great confidence, leaking private information to the observer.

We now introduce the definition of \((\epsilon , \delta )\)—probabilistic differential privacy (PDP) we consider in this paper, which applies to any random quantity \(\tilde{q}\) for any given X:

Definition 2

(\((\epsilon , \delta )\)-Probabilistic DP Machanavajjhala et al. 2008) Consider random data that can come from a set of emission probabilities \(\tilde{q} \sim f(\tilde{q}|X)\) that change depending on \(X\in \mathcal {X}\). The data \(\tilde{q}\) are \((\epsilon , \delta )\)- probabilistic differentially private \(\forall X \in \mathcal {X}\) and \(X^{'}\in \mathcal {X}^{(1)}_{X}\), iff:

$$\begin{aligned} \delta&\ge \delta ^{\epsilon }_{\tilde{q}} \triangleq \sup _{X\in \mathcal {X}} \sup _{X'\in \mathcal {X}_{X}^{(1)}} Pr \left( L_{{X\!X'}} (\tilde{q})>\epsilon \right) , \end{aligned}$$
(3)

and the PMF used to calculate the probability is \(f(\tilde{q}|X)\).

The following theorem guarantees that \((\epsilon ,\delta )\)-PDP is a strictly stronger condition than \((\epsilon ,\delta )\)-DP.

Theorem 1

(PDP implies DP McClure 2015) If a randomized mechanism is \((\epsilon ,\delta )\)-PDP, then it is also \((\epsilon ,\delta )\)-DP, i.e.,

$$\begin{aligned} (\epsilon ,\delta )-\text {PDP} \! \Rightarrow \! (\epsilon ,\delta )-\text {DP}, \text {but } (\epsilon ,\delta )-\text {DP} \! \nRightarrow \! (\epsilon ,\delta )-\text {PDP}. \end{aligned}$$

The proof of Theorem 1 is shown in McClure (2015), Triastcyn and Faltings (2020). This motivates us to use \((\epsilon ,\delta )\)—PDP throughout this paper. The \((\epsilon ,\delta )\)—PDP has applications in the contexts of location recommendations Zhang and Chow (2021), privacy protection from sampling and perturbations (Sholmo and Skinner 2012), to create a realistic framework for statistical agencies to distribute their goods (Machanavajjhala et al. 2008), etc.

In our setup \(\mathbbm {Q}(X)\) is a scalar value in \({\mathcal Q}\equiv [n]\). The query response \(\tilde{q}\) is obtained by adding a discrete noise \(\eta\), whose distribution is denoted by \(f(\eta )\), i.e.:

$$\begin{aligned} \tilde{q}=\mathbbm {Q}(X)+{\eta }\,\,\Rightarrow \,\,f_{\tilde{q}}(\tilde{q}|X)=f_{\eta }(\tilde{q}-\mathbbm {Q}(X)). \end{aligned}$$
(4)

The PMF associated with the privacy loss function, called privacy leakage probability, for the additive noise mechanism, can be derived from  (2), (3) and (4):

$$\begin{aligned} Pr(L_{{X\!X'}}(\tilde{q})>\epsilon )=Pr\left( \ln \frac{f_{\eta }(\tilde{q}-\mathbbm {Q}(X))}{f_{\eta }(\tilde{q}-\mathbbm {Q}(X'))}>\epsilon \right) . \end{aligned}$$
(5)

For the discrete query case, we denote the “distance one set” of \(X \in \mathcal {X}\) as \(\mathcal {X}_{X}^{(1)} \subset {\mathcal X} {\setminus } X\) and let:

$$\begin{aligned} \mu _{{X\!X'}}\triangleq \mathbbm {Q}(X)-\mathbbm {Q}(X'), \,\,\,\,\, \forall X \in \mathcal {X},\forall X'\in \mathcal {X}_{X}^{(1)}. \end{aligned}$$
(6)

where \(X'\) differs from X for one user data record or a sensitive user attribute. Let us define the indicator function \(u_{{X\!X'}} (\eta )\), \(\eta \in [n]\) such that \(i^{th}\) entry is one if \(L_{{X\!X'}}(\tilde{q}_i)>\epsilon\) and zero otherwise, i.e.:

$$\begin{aligned} u_{{X\!X'}} (\eta ) \triangleq&{\left\{ \begin{array}{ll} 1, &{} \,\,\,\,\,f(\eta ) > e^\epsilon f(\eta + \mu _{{X\!X'}})\\ 0, &{}\,\,\,\,\,\,\hbox {otherwise} \end{array}\right. } \end{aligned}$$
(7)

where we omitted the suffix \(\eta\) and used \(f(\eta )\) in lieu of \(f_{\eta }(\eta )\). It is easy to verify that we have:

$$\begin{aligned} Pr(L_{{X\!X'}}(\tilde{q})>\epsilon )=\sum _{\eta =0}^{n} u_{{X\!X'}} (\eta ) f(\eta ). \end{aligned}$$

Before describing our design framework in Section “Optimal additive noise”, a few considerations on \((\epsilon ,\delta )-PDP\) are in order.

Post-processing

A randomized DP mechanism maps a query output onto a distribution designed to meet Definition (1) or (3). If the randomized query answer generation requires multiple steps, it is important to ensure that the \((\epsilon ,\delta )-PDP\) (\((\epsilon ,\delta )-DP\)) are met after the very last step. In fact, in Meiser (2018) it was pointed out that, unless \(\delta =0\), in general \((\epsilon ,\delta )-PDP\) with \(\delta >0\) cannot be guaranteed after post-processing, where post-processing refers to processing steps that follow the noise addition prior to the release the query response. The objections in Meiser (2018) are valid for mechanisms that include post-processing like the popular “truncation” or “clamping” mechanisms that consists of first adding unbounded noise \(\eta\) and then projecting (clamping) the sum \(\mathbb {Q}(X)+\eta\) in the prescribed range to generate \(\tilde{q}\). In this case \((\epsilon , \delta )-DP\) are guaranteed before the post-processing step, but not after, unless \(\delta =0\). The proposition below provides guarantees for \((\epsilon ,\delta )-{PDP}\).

Proposition 1

Let \(\tilde{q}\in \mathcal {Q}\) be a randomized \((\epsilon ,\delta )-\text {PDP}\) response for query \(q \in {\mathcal Q}\). Let \(g: \mathcal {Q} \rightarrow \mathcal {Q}\) be an arbitrary invertible mapping. Then \(g \circ \tilde{q}= g(\tilde{q})\) is also a \((\epsilon ,\delta )-{PDP}\) response for any \(\delta \ge 0\). Furthermore, if \(\delta =0\), \(\epsilon -{PDP}\) is preserved, irrespective of g.

Proof

The proof is in Appendix 6. \(\square\)

Proposition 1 clarifies the importance of designing randomized responses whose domain is consistent with the query output, since it does not require post-processing to generate answers in the right set. Clamping is not bijective and changes the masses of probability in a way that alters \(\delta\) for a given \(\epsilon\).

Optimal additive noise

For queries \(q = \mathbbm {Q}(X) \in [n]\), a possible approach other than clamping is to assume that the noise addition is modulo \(n+1\) with \(\eta \in [n]\) so that the outcome \(\tilde{q}=q + \eta \,(\text {mod } n+1)\), is always in the appropriate range. In this paper, we seek to obtain the optimum noise distribution \(f(\eta )\) for such a mechanism. Since \((\epsilon , \delta )-PDP\) implies \((\epsilon , \delta )-DP\) and hence, it is a stronger notion, we use the definition of \((\epsilon , \delta )-PDP\) throughout.

Next, we omit the \((\text {mod } n+1)\) to streamline the notation, with the understanding that, from now on, sums and differences of query outcomes and noise values are always modulo \(n+1\). Observe that adding uniform noise would lead necessarily to a scalar query \({\tilde{q}}\) being uniform and thus, high privacy (i.e. \(\delta =0\) for any \(\epsilon >0\)) but poor accuracy since \(1-1/(n+1)\). This motivates the search for an optimal solution. Using (5) and (6):

$$\begin{aligned} Pr(L_{{X\!X'}}(\tilde{q})>\epsilon )&=Pr\left( \ln \frac{f(\eta )}{f(\eta + \mu _{{X\!X'}})}>\epsilon \right) . \end{aligned}$$
(8)

The reasons for using the modulo addition of noise are:

  • The randomized answers fall within the range expected for the query, which allows us to leverage Proposition 1.

  • The mechanisms require defining a single distribution rather than distinct distributions for all possible \(X\in \mathcal {X}\).

  • In the optimization, any pair with the same modulo difference results in a single \((\epsilon , \delta )-PDP\) constraint, simplifying the search for the optimum distribution.

  • The simplifications allow us to derive the optimum distribution in closed form for specific use cases.

From (8) it is evident that the \((\epsilon ,\delta )\) privacy curve is entirely defined by the noise distribution and its change due to a shift in the mean. As a result, the probability mass \(f(q+\eta )\) is obtained as a circular shift of the PMF \(f(\eta )\); therefore (8) can be used with the denominator \(f(\eta +\mu _{{X\!X'}})\) also representing a circular shift of \(f(\eta )\). This result motivates us to define the neighborhood sets, using only \(\mu _{{X\!X'}}\), in Section “Analytical solutions”.

Numerical optimization

In this section, we show that the problem of finding an optimal additive noise mechanism for a given pair \((\epsilon ,\delta )\) and expected distortion cost can be cast into a MILP formulation, i.e. an optimization problem with linear cost, linear equality and inequality constraints, and real as well as integer variables. While MILPs are non-convex, several stable solvers have convergence guarantees. Our MILP formulation (in (13a)–(13g)) finds optimum noise distributions minimizing a specified expected distortion cost:

$$\begin{aligned} \mathbb {E}[\rho _{\eta }]= \sum _{\eta =0}^n \rho _{\eta } f(\eta ), \end{aligned}$$
(9)

where \(\rho _{\eta }\) denotes the distortion caused by the noise value \(\eta \in [n]\). There are two typical metrics:

Definition 3

(Error Rate) For \(\tilde{q}=\mathbbm {Q}(X)+ \eta\), the error rate metric, denoted by \(\rho ^{\text {ER}}\), is the expected value of the function \(\rho _0=0\) and \(\rho _{\eta }=1, \eta >0\). Thus:

$$\begin{aligned} \rho ^{\text {ER}} \triangleq 1 - f(0). \end{aligned}$$
(10)

Definition 4

(Mean Squared Error) For \(\tilde{q}=\mathbbm {Q}(X) + \eta\), the MSE corresponds to \(\rho _{\eta }=\eta ^2\):

$$\begin{aligned} \rho ^{\text {MSE}} \triangleq \mathbb {E}[|\tilde{q}-\mathbbm {Q}(X) |^{2}] = \sum _{\eta =1}^{n} \eta ^{2} f(\eta ). \end{aligned}$$
(11)

Remark 1

In the case of an ‘ordered’ query domain, the MSE metric can be preferable over an error rate metric. That is why we are focusing on the minimization of any linear cost in the MILP formulation. However, the numerical results shown in Section “Numerical results” indicate that even when we target the error rate metric \(\rho ^{ER} = 1 - f(0)\), the optimal solution tends to have values that diminish as they move away from the actual query.

Because our analytical results in Section “Analytical solutions” consider the error rate metric, whenever \(\rho\) is mentioned without specification, this implies \(\rho ^{ER}\) is being discussed. Having established the cost, the constraints (see (13b)–(13h)) are derived as follows. From the databases, we calculate the set \(\{\mu _{{X\!X'}}\}\) using (6), which are the only database parameters needed in the formulation. As we know, the sum of probability masses is 1 (see (13b)). The indicator function \(u_{{X\!X'}}(\eta )\), defined in (7), can be mapped on the integrality constraint, \(u_{{X\!X'}} (\eta ) \in \{0,1\}, \forall X \in \mathcal {X},\forall X'\in \mathcal {X}_{X}^{(1)}\) (see 13h) and on two linear inequality constraints, \(f(\eta ) - e^\epsilon f(\eta +\mu _{{X\!X'}}) - u_{{X\!X'}} (\eta ) \le 0\) and \(e^\epsilon f(\eta +\mu _{{X\!X'}}) - f(\eta ) + e^{\epsilon }u_{{X\!X'}} (\eta ) \le e^{\epsilon }\) (see (13f) and (13g)).Footnote 1 We can rewrite (8) as \(Pr(L_{{X\!X'}}(\tilde{q})>\epsilon ) = Pr(f(\eta ) > e^\epsilon f(\eta + \mu _{{X\!X'}}))= \sum _{\eta =0}^n u_{{X\!X'}} (\eta ) f(\eta ) \le \delta\), \(\forall X \in \mathcal {X},\forall X'\in \mathcal {X}_{X}^{(1)}\). Since it is a bilinear constraint, and for a MILP formulation we need the constraints to be linear, we introduce the auxiliary variables \(y_{\eta }, \eta \in [n]\):

$$\begin{aligned} y_{\eta }&\triangleq u_{{X\!X'}} (\eta ) f(\eta ), \,\,\,\,\eta \in [n]. \end{aligned}$$
(12)

so that we can use \(\sum _{\eta =0}^n y_{\eta } \le \delta\) to constrain \(\delta\) instead; to enforce \(y_{\eta }=f(\eta )\) for \(u_{{X\!X'}}(\eta )=1\), and \(y_{\eta }=0\) for \(u_{{X\!X'}}(\eta )=0\), the trick is to use, respectively, the following linear constraints: \(u_{{X\!X'}} (\eta ) + f(\eta ) - y_{\eta } \le 1\), \(u_{{X\!X'}} (\eta ) -f(\eta ) + y_{\eta } \le 1\),Footnote 2 and \(y_{\eta } - u_{{X\!X'}} (\eta ) \le 0\).Footnote 3 This completes the explanation of the optimization constraints shown in (13c)–(13e).Footnote 4 Let

$$\begin{aligned} \mathcal{M}\!\triangleq \!\{\mu _{{X\!X'}}| \mu _{{X\!X'}}=\mathbb {Q}(X)-\mathbb {Q}(X'),\forall X\!\!\in \!\! \mathcal {X},\forall X'\!\!\in \!\! \mathcal {X}_{X}^{(1)}\} \end{aligned}$$

The form of the MILP is:

$${\min _{f(\eta ),u_{{X\!X'}}(\eta ), y_{\eta }} \,\,\sum _{\eta =0}^{n}\rho _{\eta }f(\eta )}$$
(13a)
$$\text {s.t.}\,\sum _{\eta =0}^n f(\eta )= 1,$$
(13b)
$$\sum _{\eta =0}^n y_{\eta } \le \delta , \,\,\,\,\,\,y_{\eta } - u_{{X\!X'}} (\eta ) \le 0,$$
(13c)
$$u_{{X\!X'}} (\eta ) + f(\eta ) - y_{\eta } \le 1,$$
(13d)
$$u_{{X\!X'}} (\eta ) -f(\eta ) + y_{\eta } \le 1,$$
(13e)
$$f(\eta ) - e^\epsilon f(\eta +\mu _{{X\!X'}}) - u_{{X\!X'}} (\eta ) \le 0,$$
(13f)
$$e^\epsilon f(\eta +\mu _{{X\!X'}}) - f(\eta ) + e^{\epsilon }u_{{X\!X'}} (\eta ) \le e^{\epsilon },$$
(13g)
$$\begin{aligned} & f(\eta ),\,y_{\eta } \in [0, 1], \,\eta \in [n], \nonumber \\&\,\,\,\,\,\,u_{{X\!X'}} (\eta ) \in \{0,1\}, \forall \mu _{{X\!X'}} \in \mathcal {M}. \end{aligned}$$
(13h)

The MILP has \(3(n+1)\) variables out of which \(f(\eta )\) and \(y_{\eta }\), \(\eta \in [n]\) are real numbers (\(2(n+1)\) in total) in [0, 1] and \(u_{{X\!X'}} (\eta )\) (also \((n+1)\) in number) are integers \(\{0,1\}\). The computational complexity of the MILP solution is determined by the algorithm used, which can be branch and bound, cutting plane, or branch and cut. A detailed study on the complexity of the MILP solution is found in Jiang et al. (2021). When \(\delta = 0\), it reduces to the following LP:

$$\min _{f(\eta ),u_{{X\!X'}}(\eta ), y_{\eta }} \sum _{\eta =0}^{n}\rho _{\eta }f(\eta )$$
(14a)
$$\text {s.t.}\,\sum _{\eta =0}^n f(\eta )= 1,$$
(14b)
$$f(\eta ) - e^\epsilon f(\eta +\mu _{{X\!X'}})\le 0,\,\,\eta \in [n],$$
(14c)
$$f(\eta ) \in [0,1], \,\forall \mu _{{X\!X'}} \in \mathcal {M}$$
(14d)

A possible useful variant of the optimization in () that will be explored in our numerical results is to minimize \(\delta\) instead, under a distortion constraint \(\overline{\rho }\), i.e.:

$$\min _{f(\eta ),u_{{X\!X'}}(\eta ), y_{\eta }} \delta$$
(15a)
$$\text {s.t.}\,\sum _{\eta =0}^{n}\rho _{\eta }f(\eta )\le \overline{\rho },$$
(15b)
$$(13{\text{b}}) - \cdots - (13{\text{h}})$$
(15c)

In the next sections, we derive analytical solutions of (13) that minimize the error probability \(\rho ^{ER} = 1 - f(0)\) for some special database structures and corroborate the results in Section “Numerical results” comparing the formulas with the MILP solutions obtained using  Gurobi (2021) as a solver.

Analytical solutions

In this section, we analytically study the solution that minimizes the error rate \(\mathbb {E}[\rho _{\eta }]=1-f(0)\). To give closed-form solutions for the optimum PMF, we focus on the following instances of possible \(\mu _{{X\!X'}}\):

Definition 5

(Single Distance (SD)) In this setting \(\forall X \in \mathcal {X},\forall X'\in {\mathcal X}_{X}^{(1)}\), the difference \(\mu _{{X\!X'}}\) is constant, i.e. \(\mu _{{X\!X'}} = \hat{\mu }\). Note that \(\hat{\mu } \le n\).

Definition 6

(Bounded Difference (BD)) In this setting \(\forall X \in \mathcal {X},\forall X'\in \mathcal {X}_{X}^{(1)}\), \(\mu _{{X\!X'}}\) take values in the set \([\bar{\mu }]_+\), \(\bar{\mu } \le n\), at least once.

The most general case is the following:

Definition 7

(Arbitrary) In this case \(\mu _{{X\!X'}}\) can take values from any subset of [n],\(\,\forall X \in \mathcal {X},\forall X'\in \mathcal {X}_{X}^{(1)}\).

The next lemma clarifies that the optimum PMF for the BD case for a given \((\epsilon ,\delta )\) is useful to attain the same guarantees for the case of an arbitrary neighborhood.

Lemma 1

Suppose \(\bar{\mu }=\sup _{\forall X \in \mathcal {X},\forall X'\in \mathcal {X}_{X}^{(1)}}\mu _{{X\!X'}}\). Then any noise PMF that provides \((\epsilon ,\delta )\) privacy for the BD neighborhood with parameter \(\bar{\mu }\) will give the same guarantees in terms of \((\epsilon ,\delta )\) and \(\rho\) for the case of arbitrary neighborhoods. However, lower distortion is achievable by solving the MILP in (13).

Proof

The proof is simple: the set of constraints that need to be met to satisfy \((\epsilon ,\delta )\) privacy for the arbitrary case is a subset of the BD neighborhood case with \(\bar{\mu }\) as a parameter. This also means, however, that the minimum \(\rho ^\star\) for the latter case is sub-optimum. \(\square\)

Note that the SD neighborhood setting is a simple case while the previous lemma indicates that the BD case is more useful in general.

Next, we find an explicit solution for the optimum noise PMF \(f^\star (\eta ), \eta \in [n]\) for the SD and BD neighborhood cases. In Section “Optimal noise mechanism for vector queries”, we discuss the case of discrete vector queries whose entries are independently subjected to the mechanism vs. the optimal solution.

PMF for single distance neighborhood

The natural way to express the optimum PMF for an SD neighborhood setting is by specifying the probability masses sorted in decreasing order of f(k), \(\forall k \in [n]\). They are referred to as \(\{f_{(k)}| 1 \ge f_{(0)} \ge f_{(1)} \ge \cdots \ge f_{(n)} \ge 0\}\).

Lemma 2

Considering the case in Definition 5 where \(\mu _{{X\!X'}} = \hat{\mu }\) is constant \(\,\forall X \in {\mathcal X},\forall X'\in \mathcal {X}_{X}^{(1)}\), the noise PMF minimizing the error rate is such that \(f^\star _{(h)} = f^\star (h\hat{\mu })\), \(\forall h \in [n]\) and the inequality in (14c), can be written in terms of the sorted PMF as follows:

$$f^\star _{(h)} - e^\epsilon f^\star _{(h+1)}\le 0,\,\,\forall h \in [n].$$
(16)

Proof

The proof is in Appendix 7. \(\square\)

To start, let us consider the case \(\delta = 0\):

Lemma 3

For the SD neighborhood and \(\delta = 0\), the optimal noise PMF for the modulo addition mechanism is:

Case 1: If \((\hat{\mu }, (n+1))\) are relatively prime.

$$f_{(k)}^\star= e^{-k \epsilon } f_{(0)}^\star , \,\,\,\,k \in [n]_+,$$
(17a)
$$f_{(0)}^\star= \frac{1 - e^{-\epsilon }}{1 - e^{-(n+1)\epsilon }}\equiv f^\star (0).$$
(17b)

Case 2: If \((\hat{\mu }, (n+1))\) are not relatively prime.

$$f_{(k)}^\star \,= e^{-k \epsilon } f_{(0)}^\star , \,\,k \in [N_{\hat{\mu }}-1]_+$$
(18a)
$$f^\star _{(k)}= 0, \quad \quad \,\,\,\,\,k \in [n] {\setminus } [i\hat{\mu }], \,\forall i \in [N_{\hat{\mu }}-1]$$
(18b)
$$f_{(0)}^\star= \frac{1 - e^{-\epsilon }}{1 - e^{-(N_{\hat{\mu }})\epsilon }}\equiv f^\star (0).$$
(18c)

where \(N_{\hat{\mu }} = \frac{(n+1)}{\gcd ((n+1),\hat{\mu })}\) and \(\rho ^{\star } = 1-f^\star (0)\).

Proof

The proof is in Appendix 8. \(\square\)

Lemma 3 is verified numerically in Section “Numerical results” in both Case 1 and 2 (see Fig. 9). We illustrate the two cases in Fig. 1.

Fig. 1
figure 1

In these examples \(f^\star {(\hat{\mu })}\) is single distance, \(\hat{\mu }\), away from \(f^\star {(0)}\) hence it is assigned \(e^{-\epsilon }f^\star {(0)}\), next \(f^\star {(2\hat{\mu })}\) is assigned \(e^{-\epsilon }f^\star {(\hat{\mu })}\) since it is \(\hat{\mu }\) away from \(f^\star {(\hat{\mu })}\) and so on and so forth. So the order of assignment of values for plot (a) example is: 0, 3, 6, 1, 4, 7, 2, 5 and the order of assignment of values for plot (b) example is: 0, 2, 4, 6. Since the values at 1, 3, 5, 7 are not \(\hat{\mu }\) away from \(f^\star {(2k\hat{\mu })}, k \in [N_{\hat{\mu }}-1=3]\) they are assigned 0 value to have a higher \(f^\star {(0)}\)

From (17b) we can observe that for case 1, the error rate \(\rho ^{\star }\) depends only on \(\epsilon\) and n, and for case 2 (see (18c)), \(\rho ^{\star }\) depends on both \(\epsilon\) and \(N_{\hat{\mu }}\).

Remark 2

It is notable that in this formulation where the cost is the error rate, positive probability masses corresponding to higher noise outcomes tend to be less likely than those having smaller outcomes. This is why these designs exhibit low MSE.

Mechanisms with better error rate (lower \(\rho\)) must allow for \(\delta > 0\). It can be proven (see Theorem 2) that the optimal error rate \(\rho ^{\star }(\delta ,\epsilon )\) vs. \(\delta\) curve is piece-wise linear, interleaving flat regions with intervals with linear negative slope, see Fig. 2.

Fig. 2
figure 2

The variation of \(\rho\) as a function of \(\delta\) for SD neighborhood showing the alternate flat and linear regions

We categorize them as “linear regions” and “flat regions”. Let \(\underline{\delta }^{\epsilon }_k\) and \(\overline{\delta }^{\epsilon }_k\) be the instances of \(\delta\) at which \(\rho ^{\star }(\delta ,\epsilon )\) changes from kth linear region to kth flat region and kth flat region to \((k+1)\)th linear region, respectively.

Remark 3

We state the following results of this section, i.e., Section “PMF for single distance neighborhood”, for the case \((n+1, \hat{\mu })\) that are relatively prime. For the case where they are not, the results are obtained by replacing n with \(N_{\hat{\mu }} = \frac{(n+1)}{\gcd ((n+1),\hat{\mu })}\) in all the expressions and explanations.

Let us define the following quantities:

$$\begin{aligned} \overline{\delta }^{\epsilon }_k&:= e^{-(n-k-1)\epsilon }\frac{1-e^{-\epsilon }}{1-e^{-(n-k+1)\epsilon }}, \,\hbox {for }k\in [n\!-\!1] \end{aligned}$$
(19a)
$$\begin{aligned} \overline{\delta }^{\epsilon }_n&:=1, \,\,\,\, \underline{\delta }^{\epsilon }_0:=0 \end{aligned}$$
(19b)
$$\begin{aligned} \underline{\delta }^{\epsilon }_k&:= e^{-(n-k)\epsilon }\frac{1-e^{-\epsilon }}{1-e^{-(n-k+1)\epsilon }}, \,\hbox {for }k\in [n]_+. \end{aligned}$$
(19c)

Theorem 2

\(k\in [n]_+\), in the kth section where \(\rho ^\star (\delta ,\epsilon )\) is flat within \(\underline{\delta }^{\epsilon }_{k} \le \delta \le \overline{\delta }^{\epsilon }_k\):

$$\begin{aligned} f^\star _{(h)}&= {\left\{ \begin{array}{ll} \underline{\delta }^{\epsilon }_{k} e^{(n-k-h)\epsilon }, &{}h \in [n-k],\\ 0, &{}h \in [n-k+1:n], \end{array}\right. } \end{aligned}$$
(20)

In the portion of the kth section where \(\rho ^\star (\delta ,\epsilon )\) decreases linearly with \(\delta\), which are within \(\, \overline{\delta }^{\epsilon }_{k-1}< \delta \le \underline{\delta }^{\epsilon }_k\):

$$\begin{aligned} f^\star _{(h)}= {\left\{ \begin{array}{ll} \delta e^{(n-k-h)\epsilon }, &{}\!\!\!\!h \!\in \! [n\!-\!k],\\ e^{(n-h)\epsilon }\frac{e^{\epsilon }-1}{e^{k\epsilon }-1}\left( 1-\frac{\delta }{\underline{\delta }^{\epsilon }_k} \right) ,&{} \!\!\!\!h\!\in \! [n\!-\!k+1\!:\!n]. \end{array}\right. } \end{aligned}$$
(21)

and \(\rho ^\star (\delta ,\epsilon ) =1-f^\star _{(0)}\).

Proof

The proof is in Appendix 9. \(\square\)

From Theorem 2 we note that the boundary point \(\underline{\delta }^{\epsilon }_k\) determines the value of \(f^\star _{(n-k)}\), which is the smallest non-zero probability mass in the kth flat region. Similarly, the other boundary point \(\overline{\delta }^{\epsilon }_k \equiv e^{\epsilon } \underline{\delta }^{\epsilon }_k\) indicates the value of \(f^\star _{(n-k-1)}\), which is the smallest non-zero probability mass in the \((k-1)^{th}\) flat region. Having calculated the optimal PMF for the SD neighborhood case in Theorem 2, the \((\epsilon ,\delta )\) curves correspond to \(f^{\star }(0)=1-\rho ^{\star }\) for all its \(n+1\) possible expressions or, better stated, they are the level curves \(\rho (\delta , \epsilon )=\rho ^{\star }\). The trend of \(\delta\) curves is monotonically decreasing with respect to \(\epsilon\) for a given \(\rho ^{\star }\). Let \(\epsilon _0^{\rho ^{\star }}\) be the solution obtained setting \(f^{\star }(0)\) in (17b) to be equal to \(1-\rho ^{\star }\):

$$\begin{aligned} \epsilon _0^{\rho ^{\star }}:\,\,\rho ^{\star }=1-\frac{1-e^{-\epsilon _0^{\rho ^{\star }}}}{1-e^{-(n+1)\epsilon _0^{\rho ^{\star }}}} \end{aligned}$$
(22)

Then we must have that \(\delta =0\) for \(\epsilon \ge \epsilon _0^{\rho ^{\star }}\). Because this corresponds to a flat region for \(f^{\star }(0)\), there has to be a discontinuity moving towards lower values \(\epsilon <\epsilon _0^{\rho ^{\star }}\), and \(\delta\) must immediately jump to \(\overline{\delta }^{\epsilon _0^{\rho ^{\star }}}_0\) when \(\epsilon\) is an infinitesimal amount below \(\epsilon _0^{\rho ^{\star }}\). This point is the edge of the linear region. For a range of values of \(\epsilon <\epsilon _0^{\rho ^{\star }}\), \(\delta\) with respect to \(\epsilon\) must have the negative exponential trend \(\delta =(1-\rho ^{\star })e^{-n\epsilon }\) obtained from Eq. (21) for \(h=0,k=1\) until the next jump occurs, for a value \(\epsilon _1^{\rho ^{\star }}\) which is obtained by setting \(f^{\star }(0)\) for \(h=0\) and \(k=1\) in (20) to be equal to \(1-\rho ^{\star }\):

$$\begin{aligned} \epsilon _1^{\rho ^{\star }}:\,\,\rho ^{\star }=1-\underline{\delta }_1^{\epsilon _1^{\rho ^{\star }}} e^{(n-1)\epsilon _1^{\rho ^{\star }}}=1-\frac{1-e^{-\epsilon _1^{\rho ^{\star }}}}{1-e^{-n\epsilon _1^{\rho ^{\star }}}}. \end{aligned}$$
(23)

Following this logic, one can prove that the optimum \((\epsilon , \delta )\) curve for a given error rate \(\rho ^{\star }\) is:

Corollary 1

For a given \(\epsilon >0\), the privacy loss for the SD neighborhood case with the optimal noise mechanism, is a discontinuous function of \(\epsilon\), where:

$$\begin{aligned} \delta ^\epsilon = e^{-(n-k)\epsilon }(1-\rho ^{\star }),\,\,\,\epsilon ^{\rho ^{\star }}_{k} \le \epsilon <\epsilon ^{\rho ^{\star }}_{k-1} \end{aligned}$$
(24)

when \(\rho ^\star = 1 - f^\star _{(0)}\) is in kth section, \(k \in [n]_+\) and \(\epsilon _k^{\rho ^{\star }}\) are the solutions of the following equations:

$$\begin{aligned} \epsilon _k^{\rho ^{\star }}\!\!:\!\rho ^{\star }=1-\underline{\delta }_k^{\epsilon _k^{\rho ^{\star }}} e^{(n-k)\epsilon _k^{\rho ^{\star }}}=1-\frac{1-e^{-\epsilon _k^{\rho ^{\star }}}}{1-e^{-(n+1-k)\epsilon _k^{\rho ^{\star }}}}. \end{aligned}$$
(25)

Proof

As discussed before the Corollary, for \(\delta =0\), the level curves of \(\rho ^{\star }(\delta ,\epsilon )=\rho ^{\star }\) as a function of \(\epsilon\) must be monotonically decreasing for \(\epsilon \ge \epsilon _0^{\rho ^{\star }}\) (see (22)) as \(\epsilon\) increases. Then the curve will have discontinuities that correspond to the flat regions and the trend between these discontinuities is obtained through the equation \((1-\rho ^{\star })=f^{\star }(0)=\delta e^{(n-k)\epsilon }\), which implies \(\delta =(1-\rho ^{\star })e^{-(n-k)\epsilon }\). The kth interval starts at the point \(\epsilon =\epsilon _k^{\rho ^{\star }}\) such that \(\delta =\underline{\delta }_k^{\epsilon =\epsilon _k^{\rho ^{\star }}}\) ensures that \(f^{\star }(0)=1-\rho ^{\star }\); thus, \(\epsilon =\epsilon _k^{\rho ^{\star }}\) must be the solution of (25). \(\square\)

The BD neighborhood

Also in this case we first start with the optimum noise for \(\delta = 0\). First, let us express n as:

$$\begin{aligned} n = b\bar{\mu }+r \end{aligned}$$
(26)

where \(b = \lfloor {n/\bar{\mu }}\rfloor\) and \(r \in [\bar{\mu }-1]\). For this case, the inequalities in (14c) can be written as the following:

$$\begin{aligned} f^\star {(h)} - e^\epsilon f^\star {(h+\mu )}\le 0,\,\,\forall h \in [n], \,\forall \mu \in [\bar{\mu }]_+. \end{aligned}$$
(27)
Fig. 3
figure 3

The PMF of the optimal noise mechanism for the BD neighborhood follows a staircase pattern. In the flat region, it has \(b+1\) steps and \(i^{th}\) step height is \(\phi ^k_i\). Similarly, in the linear region, the PMF has \(b+2\) steps and \(i^{th}\) step height is \(\psi ^k_i(\delta )\)

Lemma 4

In the case of a BD neighborhood of size \(\bar{\mu }\), the optimum PMF \(f^\star (\eta )\) has the following form for \(i\in [b+1]_+\):

$$\begin{aligned} f^\star (\eta )=f^\star (0)e^{-i\epsilon }\!\equiv \phi _i, \,\,(i\!-\!1)\bar{\mu }+1\le \eta \le \min (i\bar{\mu },n). \end{aligned}$$
(28)

and the mass at zero is:

$$\begin{aligned} f^\star (0)&=\left( 1+\bar{\mu }\sum _{i=1}^b e^{-i\epsilon }+ re^{-(b+1)\epsilon }\right) ^{-1}\equiv \phi _0. \end{aligned}$$
(29)

The PMF has, therefore, a staircase trend with steps of length \(\ell _i=\bar{\mu }\) for \(i\in [b]_+\) and \(\ell _{b+1}=r\) and \(\rho ^\star =1-\phi _0\).

Proof

The proof is similar to that of Lemma 3 because it recognizes that it is best to meet the inequalities in (27) as equalities, since that allows for the largest \(f^{\star }(0)\). The only difference is that the masses in the first group \([\bar{\mu }]_+\) are equal to \(\phi _1=e^{-\epsilon }f^{\star }(0)\), thus they constrain a second group to have value \(\phi _2=e^{-\epsilon }\phi _1\) and so on. There are b of them that contain \(\bar{\mu }\) masses of probability and only the last group includes the last r values. \(f^\star (0)\) is obtained normalizing the PMF to add up to 1. \(\square\)

The PMF for \(\delta >0\) has staircase pattern (see Fig. 3), similar to Lemma 4 and, also in this case, for \(\delta >0\) the \(\rho ^\star (\delta ,\epsilon )\) has a piece-wise linear trend that alternates between flat and linear regions. However, the BD neighborhood case has an intricate pattern in which the constraints become violations, as the privacy loss \(\delta \rightarrow 1\). Rather than having \(k=n\), the number of sections k is quadratic with respect to b. To explain the trend, it is best to divide the section of the \(f^{\star }(0) \equiv 1 - \rho ^\star (\delta ,\epsilon )\) curve vs. \(\delta\) in b segments, indexed by \(h \in [b]_+\) as shown in Fig. 4. Except for the last interval corresponding to \(h=b\), the other segments, indexed by \(h \in [b-1]_+\), are further divided into h segments, indexed by \(j \in [h]_+\) and this index refers to one of the alternating flat and linear regions within the \(h^{th}\) interval. This results in \(k=\frac{b(b-1)}{2}\) alternating flat and linear regions. Instead, in the segment indexed by \(h=b\), there is only one linear region i.e., \(f^{\star }(0) = \delta\). The optimum distribution is specified in the following theorem:

Fig. 4
figure 4

The variation of \(f^{\star }(0)\) as a function of \(\delta\) for BD neighborhood showing b segments with the alternate flat and linear regions

Theorem 3

Let \(b+r<\bar{\mu }\). For a given \(\epsilon >0\) and for \(\mu _{{X\!X'}} \in [\bar{\mu }]_+\), \(\,\forall X \in \mathcal {X},\forall X'\in \mathcal {X}_{X}^{(1)}\) (i.e. the BD neighborhood), \(f^{\star }(0)\) versus \(\delta\) features flat and linear regions as shown in Fig. 4. In the first \((b-1)\) segments, indexed by \(h\in [b-1]_+\), each alternating a pair of flat and linear regions indexed \(j \in [h]_+\), with respective boundaries \(\underline{\delta }^{\epsilon }_{h,j}\le \delta \le \overline{\delta }^{\epsilon }_{h,j}\) and \(\overline{\delta }^{\epsilon }_{h,j-1} < \delta \le \underline{\delta }^{\epsilon }_{h,j}\), with the convention \(\overline{\delta }^{\epsilon }_{h,0}=\overline{\delta }^{\epsilon }_{h-1,h-1}\). The following facts are true:

(a) In the kth flat region, \(k=\sum _{j'=1}^{h-1} j'+j=\frac{(h-1)h}{2}+j\), the optimum PMF (c.f. Fig. 3a) for \(i\in [b+1]_+\) is:

$$\begin{aligned} f^\star (\eta )= f^\star (0)e^{-i\epsilon }\equiv \phi _i^k,\, \underline{\eta }_{i-1}< \eta \le \underline{\eta }_{i},\underline{\eta }_{0}=0, \end{aligned}$$
(30)

where what distinguishes the distributions for each k are the intervals \(\underline{\eta }_{i-1}< \eta \le \underline{\eta }_{i},i \in [b+1]_+\) with equal probability mass \(\phi ^k_i\). More specifically, considering the kth flat region, corresponding to the pair hj with \(h\in [b-1]_+\), \(j\in [h]_+\), the intervals \(\underline{\eta }_{i-1}< \eta \le \underline{\eta }_{i}\) of the optimum PMF, for \(i\in [b+1]_+\), have lengths \(\ell _i=\underline{\eta }_{i}-\underline{\eta }_{i-1}\):

$$\begin{aligned} \ell _i&\!=\! {\left\{ \begin{array}{ll} 1,&{} \!\!\!\!\hbox {for}\, i=0\\ \bar{\mu },&{} \!\!\!\!\hbox {for} \,i \in [b\!-\!h]_+ \\ \bar{\mu }\!-\!1,&{} \!\!\!\!\hbox {for}\,i \in [b\!-\!h\!+\!1\!:\!b], \\ {} &{}\,\,\,i \! \ne \!b\!-\!h\!+\!j\!+\!1\\ \bar{\mu },&{} \!\!\!\!\hbox {for} \,i \!=\!b\!-\!h\!+\!j\!+\!1 \hbox { when } \!j\! \ne \!h \\ r+h-u_{hj},&{} \!\!\!\!\hbox {for}\,i=b+1. \end{array}\right. }\end{aligned}$$
(31)
$$\begin{aligned} u_{hj}&:= {\left\{ \begin{array}{ll} 1, \,&{} \hbox {for}\, j \ne h,\\ 0, \,&{} \hbox {for}\, j=h. \end{array}\right. } \end{aligned}$$
(32)

The corresponding indexes sets are obtained as:

$$\begin{aligned} \underline{\eta }_{0}=0, \,\,\underline{\eta }_{i}=\underline{\eta }_{i-1}+\ell _{i},\,\,i\in [b+1]_+. \end{aligned}$$
(33)

To normalize the distribution \(f^\star (0)= \phi ^k_0\) must be:

$$\begin{aligned} \phi ^k_0&=\frac{1}{\sum _{i=0}^{b+1}\ell _i e^{-i\epsilon }} =\left( \bar{\mu }\alpha _{hj}^{\epsilon } \!+\!\beta _{hj}^{\epsilon }\right) ^{-1}, \end{aligned}$$
(34)
$$\begin{aligned} \alpha ^{\epsilon }_{hj}&:=\! e^{-\epsilon } \xi _{b}^{\epsilon } \!-\! (1\!-\!u_{hj})e^{-\!(b-h+j+1)\epsilon }, \end{aligned}$$
(35)
$$\begin{aligned}&\,\,\,\,\,\,\text {where }\xi _{a}^{\epsilon } := \sum _{i'=0}^{a-1} e^{-i'\epsilon }, \forall a\in \mathbb {N}^+ \nonumber \\ \beta _{hj}^{\epsilon }&:=1+ e^{-(b-h+1)\epsilon }(e^{-j\epsilon } - \xi _{h}^{\epsilon }) \nonumber \\&\,\,\,\,\,\,+(r+h-u_{hj}) e^{-(b+1)\epsilon } \end{aligned}$$
(36)

and the PMF is valid within \(\underline{\delta }^{\epsilon }_{h,j}\le \delta \le \overline{\delta }^{\epsilon }_{h,j}\) where:

$$\begin{aligned} \overline{\delta }^{\epsilon }_{h,j}&= \phi _{0}^{k} e^{-(b-h)\epsilon } \sum \limits _{j'=0}^{j} e^{-j'\epsilon }\nonumber \\&=\phi _{0}^{k} e^{-(b-h)\epsilon }\xi ^{\epsilon }_{j+1},j\in [h-1] \end{aligned}$$
(37a)
$$\begin{aligned} \overline{\delta }^{\epsilon }_{h,h}&=\phi _{0}^{k} e^{-(b-h-1)\epsilon }, \,\overline{\delta }^{\epsilon }_{0,0}=\phi _{0} e^{-(b-1)\epsilon }, \end{aligned}$$
(37b)
$$\begin{aligned} \underline{\delta }^{\epsilon }_{h,j}&= \phi _{0}^{k}e^{-(b-h)\epsilon } \xi _{j}^{\epsilon } \equiv \left( \frac{\phi _{0}^{k}}{\phi _{0}^{k-1}}\right) \overline{\delta }^{\epsilon }_{h,j-1} \end{aligned}$$
(37c)

(b) In each of the linear regions, i.e., \(\overline{\delta }^{\epsilon }_{h,j-1} < \delta \le \underline{\delta }^{\epsilon }_{h,j}\), the PMF values vary linearly in groups with respect to \(\delta\). The group lengths are:

$$\begin{aligned} \ell _i&= {\left\{ \begin{array}{ll} 1,&{} \hbox {for}\,i=0\\ \bar{\mu },&{} \hbox {for}\,i \in [b-h]_+\\ \bar{\mu }-1,&{} \hbox {for}\,i \in [b-h+1:b], \\ &{}\,\,\,\,\,i \! \ne b-h+j+1\\ 1,&{} \hbox {for}\,i=b-h+j+1\\ r+h-1,&{} \hbox {for}\,i=b+2\\ \end{array}\right. } \end{aligned}$$
(38)

each with probability \(\psi _i^k(\delta )\) (as shown in Fig. 3b):

$$\begin{aligned}&\psi _i^k(\delta ) = {\left\{ \begin{array}{ll} {\delta e^{(b-h-i) \epsilon }}/{\xi _{j}^{\epsilon }}, &{}\!\!\!\!i \in [b-h+j] \\ {\delta e^{(b-h-i+1) \epsilon }}/{\xi _{j}^{\epsilon }}, &{}\!\!\!\!i \!\in \! [b\!-\!h\!+\!j\!+\!2:\!b+2], \end{array}\right. }\nonumber \\&\psi _i^{b-h+j+1}(\delta ) \!= 1 -\!\!\!\!\!\! \sum \limits _{\begin{array}{c} i=0\\ i \ne b-h+j+1 \end{array}}^{b+2}\!\!\! \ell _i \psi _i^k(\delta ). \end{aligned}$$
(39)

(c) In the \(b^{th}\) segment, i.e. \(\overline{\delta }^{\epsilon }_{b-1,b-1} < \delta \le 1\), the objective function maximum \(f^\star (0)=\delta\). So any set of \(f^\star (\eta ), \,\eta \in [n]\), that satisfy \(\sum \limits _{\eta =1}^{n} f^\star (\eta ) = 1 - \delta\) and \(u_{{X\!X'}} (0) = 1, u_{{X\!X'}} (\eta ) = 0, \,\eta \in [n]_+\), provides the optimal PMF. One of the possible solutions is:

$$\begin{aligned} f^\star (0)&= \delta , \end{aligned}$$
(40)
$$\begin{aligned} f^\star (\eta )&= \frac{1-\delta }{n}, \,\,\eta \in [n]_+. \end{aligned}$$
(41)

Note that, in the last segment, the optimal PMF no longer follows the staircase pattern.

Proof

The proof is in Appendix 10. \(\square\)

In Theorem 3 we cover the case \(b+r<\bar{\mu }\). When \(b+r \ge \bar{\mu }\) the result is not conceptually difficult, but the optimal PMF is hard to express in a readable form. We discuss the general case towards the end of Appendix 10.

Corollary 2

For a given \(\epsilon >0\), the privacy loss for the BD neighborhood case with the optimal noise mechanism is also a discontinuous function of \(\epsilon\), where:

$$\begin{aligned} \delta ^\epsilon = \xi _{j}^{\epsilon } e^{-(b-h)\epsilon }(1-\rho ^{\star }),\,\,\,\epsilon ^{\rho ^{\star }}_{h,j} \le \epsilon <\epsilon ^{\rho ^{\star }}_{h,j-1} \end{aligned}$$
(42)

when \(\rho ^\star = 1 - \psi _0^k(\delta )\) is in kth section, \(k \in \left[ \frac{b(b-1)}{2}\right] _+\) and \(\epsilon _{h,j}^{\rho ^{\star }}\), \(h \in [b-1]_+,\,j \in [h]_+\), are the solutions of:

$$\begin{aligned} \epsilon _{h,j}^{\rho ^{\star }}:\,\,\rho ^{\star }=1-\phi _0^k=1-\left( \bar{\mu }\alpha _b^{\epsilon ^{\rho ^{\star }}_{h,j}} \!+\!\beta _{hj}^{\epsilon ^{\rho ^{\star }}_{h,j}}\right) ^{-1}. \end{aligned}$$
(43)

Proof

The proof is a direct extension of that for Corollary 1 and is omitted for brevity. \(\square\)

Optimal error rates as \(n \rightarrow \infty\)

In this section, we study the limit for \(n\rightarrow \infty\) of the distributions for the two cases we studied, the SD and BD neighborhoods. First, we discuss the \(\delta =0\) case. The goal is to find the relationship between \(\epsilon\) and \(\rho\) when n \(\rightarrow \infty\). From Lemma 3 (17b), we see that for SD neighborhood case, \(f^{\star }(0) \equiv 1 - \rho ^{\star }(\epsilon ) \rightarrow 1 - e^{-\epsilon } \implies \rho ^{\star }(\epsilon ) \rightarrow e^{-\epsilon }\) as \(n \rightarrow \infty\). Since, \(\rho ^{\star }(\epsilon )\) is also constrained by 0.5, we have that the limit function \(\rho ^{\star }_{\infty }(\epsilon )\):

$$\begin{aligned} \rho ^{\star }_{\infty }(\epsilon )&= {\left\{ \begin{array}{ll} 0.5, \,\epsilon \in (0, \ln {2}] \\ e^{-\epsilon }, \,\epsilon \ge \ln {2}. \end{array}\right. } \end{aligned}$$
(44)

And the optimum PMF is zero for all \(\eta \ne h\hat{\mu }\), and:

$$\begin{aligned} f_{\infty }^\star (h\hat{\mu })=(1-\rho ^{\star }_{\infty }(\epsilon ))e^{-h\epsilon }, \,\,h\in \mathbb {N} \end{aligned}$$
(45)

Similarly, for the BD neighborhood case and \(\delta =0\) in Lemma 4 as \(n \rightarrow \infty \implies b \rightarrow \infty\), from (29) we get:

$$\begin{aligned} \phi _0 \equiv 1 - \rho ^{\star }(\epsilon )&\rightarrow \left( 1+\frac{\bar{\mu }e^{-\epsilon }}{ 1-e^{-\epsilon }}\right) ^{-1} \end{aligned}$$
(46)
$$\begin{aligned} \implies \rho ^{\star }(\epsilon )&\rightarrow \frac{\bar{\mu }e^{-\epsilon }}{\bar{\mu }e^{-\epsilon } + (1-e^{-\epsilon })} \end{aligned}$$
(47)

Hence, the expression for \(\rho ^{\star }(\epsilon )\) for any \(\epsilon >0\) is:

$$\begin{aligned} \rho ^{\star }_{\infty }(\epsilon )&= {\left\{ \begin{array}{ll} 0.5, &{}\,\epsilon \in (0, \ln {(1+\bar{\mu })}] \\ \frac{\bar{\mu }e^{-\epsilon }}{\bar{\mu }e^{-\epsilon } + (1-e^{-\epsilon })}, &{}\,\epsilon \ge \ln {(1+\bar{\mu })}. \end{array}\right. } \end{aligned}$$
(48)

Each of the PMF staircase steps becomes of size \(\bar{\mu }\) and the values have an exponentially decaying trend:

$$\begin{aligned} f_{\infty }^\star (0)&=1-\rho ^\star _{\infty }(\epsilon ) \end{aligned}$$
(49)
$$\begin{aligned} f^{\star }_{\infty }(\eta )&=f_{\infty }^\star (0)e^{-h\epsilon },\,(h\!-\!1)\bar{\mu }<\eta \le h\bar{\mu }, \,h\in \mathbb {N}^+. \end{aligned}$$
(50)

For \(0<\delta \le 1\), it is convenient to use the index \(i=n-k\) looking at the trend of the distortion from \(\delta =1\), where \(f^{\star }(0)=1\) backward. Because the discontinuities between flat and linear regions happen at the points where \(\delta =\underline{\delta }^{\epsilon }_{n-i},\,i\in [n-1]\) we can see from Theorem 2 the distortion for \(i\in [n-1]\):

$$\begin{aligned} f_i^\star (0)\le \frac{1\!-\!e^{-\epsilon }}{1\!-\!e^{-(i+1)\epsilon }}\Rightarrow \rho ^\star (\delta ,\epsilon )\ge 1- \frac{1\!-\!e^{-\epsilon }}{1\!-\!e^{-(i+1)\epsilon }}, \end{aligned}$$
(51)

and the size of the intervals shrinks like an \(o(e^{-i\epsilon })\), as \(i\rightarrow +\infty\) quickly leading to the same result as \(\delta \rightarrow 0\), where the distortion tends to \(e^{-\epsilon }\) as stated before.

Similarly, for the BD neighborhood case, to find the expressions for \(\phi _0^{\infty }\) and \(\phi _i^{\infty }\), it is convenient to use a new index \(c=b-h\), looking at the trends of the distortion from \(\delta =1\), where \(f^{\star }(0)=1\), going backwards towards \(\delta =0\). In the \(b^{th}\) segment (part (c) of Theorem 3, \(f_b^\star (\eta ) \rightarrow 0\) as \(b \rightarrow \infty\) and \(n \rightarrow \infty\) for \(\eta \in [n]_+\) and thus \(f_b^\star (0) \rightarrow 1\). For \(\delta \approx 1\), in the \(c^{th}\) region we get the following expressions by using (34):

$$\begin{aligned} \alpha ^{\epsilon }_{b-c,j}&\!\rightarrow \! \frac{e^{-\epsilon }}{1-e^{-\epsilon }}; \,\,\beta ^{\epsilon }_{b-c,j} \!\rightarrow \! 1 - \frac{e^{-(c+j+1)\epsilon }}{1-e^{-\epsilon }} \end{aligned}$$
(52)
$$\begin{aligned} \implies \!\! \phi _0^{\infty }(c,j)&\rightarrow \frac{1-e^{-\epsilon }}{\bar{\mu }e^{-\epsilon } \!+\! (1\!-\!e^{-\epsilon }(1\!+\!e^{-(c+j)\epsilon }))}, \end{aligned}$$
(53)
$$\begin{aligned} \phi _i^{\infty }&\rightarrow e^{-i\epsilon } \phi _0^{\infty }, \,\,i \in [b+1]_+. \end{aligned}$$
(54)

Now, as \(c \rightarrow \infty\), the expression of \(\phi _0^{\infty }(c,j)\) in (53) converges to \(\phi _0\) as shown in (46), i.e. the result for \(\delta \rightarrow 0\).

Optimal noise mechanism for vector queries

Next, we briefly discuss the optimal noise mechanism design for vector queries to explore what difference it makes to optimize after mapping each vector onto a number in [n] vs. adopting the mechanism on an entry-by-entry basis. In fact, let each entry of a vector query be in the set \(\mathcal {Q}\). In the first case, the MILP formulation of the problem defined in (13a)–(13g) can be applied directly to vectors of queries considering the masses of probabilities as a joint PMF, with arguments corresponding to all possible tuples in \(\mathcal {Q}^k\). In Section “Numerical results”, we provide two examples for 2D vector queries—one for the BD neighborhood and another for an arbitrary neighborhood (see Fig. 13). We observe that for the BD neighborhood case, the optimum noise mechanism follows a staircase pattern in 2D as well and for the arbitrary neighborhood, the optimum noise mechanism has \(e^{-\epsilon } f^{\star }(\varvec{0})\) values at \(\varvec{\eta }= \varvec{\mu }_{{X\!X'}}\), where boldface letters are vectors.

Remark 4

In general, the optimal multidimensional noise mechanism does not amount to adding independent random noise to each query entry. In Section “Numerical results” we corroborate this statement by showing a counter-example obtained using the MILP program for the vector case, considering a two-dimensional vector query.

Numerical results

Fig. 5
figure 5

Comparison of the proposed optimal mechanism in terms of the expected distortion costs with those proposed in Ghosh et al. (2009), Canonne et al. (2020), Sadeghi et al. (2020), McSherry and Talwar (2007), Ravi et al. (2022). In plot (a), the optimal noise mechanism is compared in terms of MSE, \(\rho ^{MSE}\), with the discrete geometric mechanism, discrete Gaussian mechanism, and Gumbel mechanism for a fixed a value of \(\delta =0.3\) and \(n=8\). In plot (b), the optimal noise mechanism is compared in terms of error rate, \(\rho ^{ER}\), with the discrete geometric mechanism, discrete count mechanism, and data independent mechanism for a fixed a value of \(\delta =0.5\) and \(n=7\)

First, we compare the performance of our proposed optimal noise mechanism with the discrete geometric mechanism (Ghosh et al. 2009), discrete Gaussian mechanism (Canonne et al. 2020), classical exponential mechanism (McSherry and Talwar 2007), discrete count mechanism (Sadeghi et al. 2020), and data independent mechanism (Ravi et al. 2022). In plot 5a, the performance is compared in terms of \(\rho ^{MSE}\) vs. \(\epsilon\) for a fixed value of \(\delta =0.3\). Similarly, in plot 5b, the performance is compared in terms of \(\rho ^{ER}\) vs. \(\epsilon\) for a fixed value of \(\delta =0.5\). From the plots, it is clear that the proposed optimal noise mechanism significantly outperforms all these mechanisms.

Popular mechanisms for discrete queries are adding a random variable from a geometric distribution (Ghosh et al. 2009; Balcer and Vadhan 2019) or a quantized Gaussian distribution (see e.g. Canonne et al. 2020). Clamping is an operation in which the query response \(\tilde{q}\) is projected onto the domain [n] for any \(\eta \in \mathbb {Z}\) (Ghosh et al. 2009), i.e.:

$$\begin{aligned} \tilde{q} = \min (\max (0, q+\eta ),n). \end{aligned}$$
(55)

Let \(F_{\eta }(\eta )\) denote the cumulative distribution function of \(\eta\); then the distribution of \(\tilde{{q}}\) in terms of the distribution of \(\eta\) after clamping is as follows:

$$\begin{aligned} f_{\tilde{q}}(k|q) =&{\left\{ \begin{array}{ll} F_{\eta }(-q), &{} \,\,\,\,\,\,k = 0\\ f_{\eta }(k-q), &{}\,\,\,\,\,\,k \in [n-1]_+\\ 1-F_{\eta }(n-q) &{} \,\,\,\,\,\,k = n. \end{array}\right. } \end{aligned}$$
(56)

From (56) one can compute the \((\epsilon , \delta )\) privacy curve using (2), (3) and (56). After clamping the \((\epsilon ,\delta )\) guarantees provided by the said DP mechanisms are different from the ones calculated for \(n=+\infty\) as shown in Fig. 6, reported for the same MSE = 3.38. From the figure, it is clear that clamping increases \(\delta\) for the same \(\epsilon\) budget, and this effect is particularly pronounced in the case of the Gaussian mechanism. Hence, it is not advisable to use infinite support-based noise mechanisms, such as discrete geometric and discrete Gaussian, in tandem with clamping operations to publish the discrete query response with finite support.Footnote 5

Fig. 6
figure 6

This plot shows the adverse effect of clamping operation on the \((\epsilon ,\delta )\) trade-off for discrete geometric and discrete Gaussian mechanisms. The following parameters are used: \(n=8, \alpha = 0.7\) and \(\sigma ^2 = 3.38\)

Next, we now test the modified MILP problem (15a) of minimizing \(\delta\), constraining the error rate \(\rho = \rho ^{ER}\) and solve the MILP numerically using Gurobi as the solver. For a fair comparison, we consider the measure of errors vs. the ER and MSE, and respective parameters of the noise mechanisms viz., \(\alpha\) in Ghosh et al. (2009), \(\sigma ^2\) in Canonne et al. (2020), \(\beta '\) in Adams (2013)Footnote 6, and \(\rho\) in Sadeghi et al. (2020), Ravi et al. (2022); the results are shown in Fig. 7. In plot 7a, we show the comparison of the proposed optimal noise mechanism with the discrete geometric, discrete Gaussian, and Gumbel mechanisms for a given MSE = 0.6101. Similarly, in plot 7b, we show the comparison of the proposed optimal noise mechanism with the discrete Gaussian mechanism, discrete count mechanism, and data independent mechanisms for a given ER = 0.3. From the plots, it is clear that the proposed optimal noise mechanism significantly outperforms all these mechanisms \(\forall \epsilon > 0\).

Fig. 7
figure 7

Comparison of the proposed optimal mechanism in terms of \((\epsilon , \delta )\) trade-offs with those proposed in Ghosh et al. (2009), Canonne et al. (2020), Sadeghi et al. (2020), McSherry and Talwar (2007). In plot (a), the optimal noise mechanism is compared with the discrete geometric mechanism, discrete Gaussian mechanism and Gumbel mechanism for a fixed MSE, i.e., \(\rho ^{MSE}_{Geo.} = \rho ^{MSE}_{Gau.} = \rho ^{MSE}_{Gum.} = \rho ^{MSE}_{Opt.} = 0.6101\) and \(n=8\). In plot (b), the optimal noise mechanism is compared with the discrete geometric mechanism, discrete count mechanism, and data independent mechanism, i.e. \(f(0) = 1-\rho ^{ER}_{Ind.}\), \(f(\eta ) = \rho ^{ER}_{Ind.}/n,\,\eta \in [n]_+\), for a fixed ER, i.e., \(\rho ^{ER}_{Geo.} = \rho ^{ER}_{Cnt.} = \rho ^{ER}_{Opt.} = \rho ^{ER}_{Ind.} = 0.3\) and \(n=7\)

Fig. 8
figure 8

This plot shows the PMF of the optimal noise mechanism compared with both the Gaussian and geometric mechanisms. The following parameter values are used: \(n=8, \alpha = 0.7, \epsilon =1\), \(\delta = 0\), and \(\sigma ^2 = 3.38\)

Fig. 9
figure 9

These plots show the PMF of the optimal noise mechanisms for the SD neighborhood case and the following parameter values are used for both \(n=7, \epsilon = 0.75\). \(\hat{\mu } = 3\) is used in plot (a), whereas \(\hat{\mu } = 2\) in used in plot (b)

Now, we provide the comparison between the PMFs of optimal noise distribution with regard to Gaussian and geometric distributions in Fig. 8 for the same MSE parameter for all the distributions. From the plot, we can observe that the probability mass at \(\eta =0\) is maximum for the proposed mechanism, which validates our claim of the least error rate among these mechanisms.

In the following figures, we show the structure of the PMF associated with the optimal noise mechanism. First, we consider the SD neighborhood case. More specifically, the plots in Fig. 9 show the PMF of the optimal noise mechanisms, \(f^\star {(\eta )}, \eta \in [n]\), for SD neighborhood case for \(\hat{\mu } = 3\) in Fig. 9a and for \(\hat{\mu } = 2\) in Fig. 9b. In the left plot, we see that \(f^\star {(\eta )}\) is non-zero for all \(\eta \in [n]\) since (\(n+1, \hat{\mu }\)) are relatively prime but in the right plot, we see that \(f^\star {(\eta )}\) is zero for \(\eta \in \{1, 3, 5, 7\}\), since (\(n+1, \hat{\mu }\)) are not relatively prime (see Fig. 1 for the reasoning). From the plots we observe that as \(\delta\) increases, \(f^\star {(0)}\) increases and since error rate, \(\rho ^{ER}\), is \(1 - f^\star {(0)}\) it decreases with increase in \(\delta\). And in the right plot, at some of the \(\eta \in [n]_+\) values, zero probability mass is assigned. Hence, we see a higher value of \(f^\star {(0)}\) compared to the corresponding values in the left plot, thus having a lower error rate in the right plot for a given \(\delta\) value. Recall Lemma 2 and Lemma 3, to see that \(f^\star {(0)} \ge f^\star {(3)} \ge f^\star {(6)} \ge f^\star {(1)} \ge f^\star {(4)} \ge f^\star {(7)} \ge f^\star {(2)} \ge f^\star {(5)}\) in Fig. 9a which are represented using \(f^\star _{(i)}, i \in [n]\), respectively. Similarly, we observe \(f^\star {(0)} \ge f^\star {(2)} \ge f^\star {(4)} \ge f^\star {(6)}\) and rest \(f^\star {(1)} = f^\star {(3)} = f^\star {(5)} = f^\star {(7)} = 0\) in Fig. 9b. Next, we consider the BD neighborhood case. The plots in Fig. 10 show the PMF of the optimal noise mechanisms, \(f^\star {(\eta )}, \eta \in [n]\), for BD neighborhood case for \(\delta = 0\) in Fig. 10a and for \(\delta > 0\) in Fig. 10b. In the left plot, we clearly see the staircase pattern with step sizes equal to \(\bar{\mu }\), except for the last step. In the right plot, we see that step widths change as \(\delta\) increases its values while the staircase structure is maintained. Note that the vertical height of each step is \(e^{\epsilon }\) times higher than the previous one, as can be seen in Fig. 10b and in Table 1. Also, from the plot in Fig. 10b and Table 1, one can observe as the value of \(f^\star {(0)}\) increases (thus the value of \(\rho ^{ER}\) decreases) as \(\delta\) increases, as it is expected.

Fig. 10
figure 10

These plots show the PMF of the optimal noise mechanisms for BD neighborhood case and the following parameter values are used: \(n=8, \bar{\mu } = 3, \epsilon = 1.5\). In plot (a) we see that the optimal noise mechanism follows staircase pattern starting from \(\eta = 1\) with \(b=2\) steps of length \(\bar{\mu } = 3\) and one last step of length \(r=2\). In plot (b) we show how staircase pattern and step lengths change with \(\delta\). It can be seen at \(\delta = \overline{\delta }^{\epsilon }_{0} = 0.1212\), \(\delta = \underline{\delta }^{\epsilon }_{1} = 0.1238\), and \(\delta = \underline{\delta }^{\epsilon }_{2} = 0.1522\) step lengths are: (1,3,3,2) (blue coloured bars), (1,3,2,3) (red coloured bars), and (1,3,2,2,1) (yellow coloured bars), respectively

Table 1 The PMF of the optimal noise mechanism for different values of \(\delta\) for \(n=8, \bar{\mu } = 3, \epsilon = 1.5\)

Next, we consider the vector query case and provide a counter example to support Remark 4 for a two-dimensional vector query. Let \(n=4, \epsilon _1 =1.5, \epsilon _2 =1.5, \epsilon = 3, \mu _{XX'} = \{0,1,2\}, \delta =0\). The optimal noise mechanism for \((\epsilon _1,0)\) and \((\epsilon _2,0)\) DP are: \(f^{\star }_1 (\eta )\) = \(f^{\star }_2 (\eta )\) = [0.6469, 0.1443, 0.1443, 0.0322, 0.0322]; the values of the optimal noise mechanism PMF \(f^{\star } (\eta _1, \eta _2)\) for \((\epsilon ,0)\) are in (57):

$$\begin{aligned} \begin{bmatrix} 0.6954 &{} 0.0346 &{} 0.0346 &{} 0.0017 &{} 0.0017\\ 0.0346 &{} 0.0346 &{} 0.0346 &{} 0.0017 &{} 0.0017 \\ 0.0346 &{} 0.0346 &{} 0.0346 &{} 0.0017 &{} 0.0017 \\ 0.0017 &{} 0.0017 &{} 0.0017 &{} 0.0017 &{} 0.0017 \\ 0.0017 &{} 0.0017 &{} 0.0017 &{} 0.0017 &{} 0.0017 \\ \end{bmatrix} \end{aligned}$$
(57)

The marginal distributions happen to be equal, which makes sense in terms of symmetry: \(f_1 (\eta _1) = \sum _{\eta _2=0}^n f^{\star } (\eta _1, \eta _2)\equiv f_2 (\eta _2)\) and they have masses [0.7681, 0.1073, 0.1073, 0.0086, 0.0086]. We can observe that \(f^{\star } (\eta _1, \eta _2) \ne f_1 (\eta _1)f_2 (\eta _2)\).

The plots in Figs. 11, 12, 13 and 14 are self explanatory.

Fig. 11
figure 11

These plots show the error rate for the BD neighborhood case v/s parameter \(\bar{\mu }\). In the left plot \(n=9, \,\delta =0.2\), and in the right plot \(n=9, \,\epsilon =2\) are used

Fig. 12
figure 12

Error rate \(\rho\) for the SD and BD neighborhood cases v/s parameter \(\delta\), respectively, confirming the trends predicted by Theorems 2 and 3. In plot (a) \(n=9, \,\hat{\mu }=3\) and in plot (b) \(n=9, \,\bar{\mu }=3\) are used

Fig. 13
figure 13

The optimal noise joint PMF for two-dimensional vector query case for a BD and arbitrary neighborhoods, respectively. In plot (a) the parameters are: \(n=4, \,\epsilon =3, \,\bar{\mu }_1=\bar{\mu }_2=2\). In this BD neighborhood example, the staircase pattern is similar to the scalar query case in Theorem 3. In plot (b) the following parameters are used: \(n=6, \,\epsilon =3, \,{\mu }_1= \{1, 3\}, \,{\mu }_2= \{2, 5\}\). In this arbitrary neighborhood example, the second largest peaks can be observed at the union of distance one set of each dimension, i.e., \([1,2], \,[1,5],\,[3,2], \,[3,5]\)

Fig. 14
figure 14

These plots show the PMF of the optimal noise mechanisms when the Advanced Metering Infrastructure (AMI) database is queried from 1416 houses that belong to 12 distribution circuits across California, USA. We use \(\tilde{\mu } = \bigcup _{\forall X \in \mathcal {X}} \bigcup _{\forall X'\in \mathcal {X}_{X}^{(1)}} \mu _{{X\!X'}}\). In the left plot, 11 quantization levels are used, hence \(n=10\). In this example, \(\tilde{\mu } = \{1, 2, 3, 4\}\) is observed. In the right plot, 17 quantization levels are used, hence \(n=16\). In this example, \(\tilde{\mu } = \{1, 3, 5,6\}\) is observed. From these figures we observe that \(f^\star (\eta )\) is considerably larger for \(\eta \in \{\tilde{\mu } \cup 0\}\) than those \(\eta \in [n] {\setminus } \{\tilde{\mu } \cup 0\}\)

Conclusions and future work

Considering queries whose domain is discrete and finite, in this paper we proposed a novel MILP formulation to determine what is the PMF for an additive noise mechanism that minimizes the error rate of the DP answer for any (\(\epsilon ,\,\delta\)) pair. The modulo addition between the noise and the queried data is modulo \(n+1\) equal to the size of the query domain. For two special cases, which we referred to as the SD neighborhood and bounded difference (BD) neighborhood, we have provided closed-form solutions for the optimal noise PMF and its probability of error versus \(\delta\) for a given \(\epsilon\) and studied the asymptotic case for \(n\rightarrow \infty\). We also compared the proposed optimal noise mechanism to state-of-the-art noise mechanisms and found that it significantly outperforms them for a given ER or MSE. In the future, we plan to leverage these results in several applications that have to do with labeling data as well as a building block to study theoretically queries with finite uncountable support as well as the case of vector queries, whose optimum PMF can be calculated with our MILP and does not appear to be the product of the optimum PMFs for each entry.

Availability of data and materials

Not applicable.

Notes

  1. The indicator function \(u_{{X\!X'}} (\eta )\) (in short u) can take two values (see (7)). In the first case, when \(u=1\), the Eqs. (13f) and (13g) become \(f(\eta ) - e^\epsilon f(\eta +\mu _{{X\!X'}}) \le 1\) and \(e^\epsilon f(\eta +\mu _{{X\!X'}}) - f(\eta ) \le 0\). In (13f), since \(\epsilon > 0\) and \(f(\eta ) \in [0,1]\), the maximum difference between \(f(\eta )\) and \(e^\epsilon f(\eta +\mu _{{X\!X'}})\) is 1 which happens when \(f(\eta ) = 1\) and \(f(\eta +\mu _{{X\!X'}}) = 0\), and for the rest of the cases difference is less than 1. The Eq. (13g) is true due to the definition of u (see (7)). Next, when \(u=0\), the Eqs. (13f) and (13g) become \(f(\eta ) - e^\epsilon f(\eta +\mu _{{X\!X'}}) \le 0\) and \(e^\epsilon f(\eta +\mu _{{X\!X'}}) - f(\eta ) \le e^\epsilon\). In this case, (13f) is true due to the definition of u (see (7)) and the other equation can be modified by multiplying \(e^{-\epsilon }\) both sides to \(f(\eta +\mu _{{X\!X'}}) - e^{-\epsilon } f(\eta ) \le 1\), which is true due to the similar arguments made earlier for the case \(u=1\).

  2. This inequality ensures that \(y_{\eta } = f(\eta )\) if \(u_{{X\!X'}} (\eta ) = 1\).

  3. This inequality ensures that \(y_{\eta } = 0\) when \(u_{{X\!X'}} (\eta ) = 0\).

  4. Note that other constraints can be added, for instance, that of having a zero mean distribution, or forcing some of the PMF values to be identical, etc.

  5. The clamped Geometric mechanism has only one predefined parameter, \(\alpha\), from which one can choose, whereas we choose all probability masses, so we do not have a single distribution, but we define a class of optimal distributions as the output of the optimization problem.

  6. To realize the classical exponential mechanism based DP (McSherry and Talwar 2007), we utilize the addition of \(Gumbel(\epsilon ,\beta ')\) distribution to the query inputs (Adams 2013), where \(\beta '\) is a parameter.

  7. Note that for \(f^\star (n\hat{\mu })\), there is no choice to assign any value since it will be automatically fixed once \(f^\star (k\hat{\mu })\), \(k \in [n-1]\) values are fixed and it is trivial to see that \(f^\star (n\hat{\mu }) = \min f^\star (k\hat{\mu }), \forall k \in [n]\).

Abbreviations

\(\mathbb {N}\) :

Set of natural numbers including zero

\(\mathbb {N}^{+}\) :

Set of natural numbers excluding zero

\(\mathbb {Z}\) :

Set of integers

\(\mathbb {R}\) :

Set of real numbers

[n]:

\(\{0, 1, \ldots , n\}\), \(n\in \mathbb {N}\)

\([n]_{+}\) :

\(\{1, \ldots , n\}\), \(n\in \mathbb {N}^{+}\)

\(\lfloor k \rfloor\) :

Floor function of k

\(\lceil k \rceil\) :

Ceil function of k

\(|\mathcal {A}|\) :

Cardinality of set \(\mathcal {A}\)

\(\mathcal {A}{\setminus }\mathcal {B}\) :

Set difference between sets \(\mathcal {A}\) and \(\mathcal {B}\)

\(\mathcal {X}\) :

Database

X :

Data point(s) in database \(\mathcal {X}\)

\(X'\) :

A neighboring dataset of X differing by a data record

\(Pr(\mathcal {A})\) :

The probability of occurrence of an event \(\mathcal {A}\)

\(\mathbb {Q}\) :

Query function

\(\mathcal {Q}\) :

Discrete set of query answers

q :

A query answer

\(\tilde{q}\) :

The randomized query answer

\(f_A\) :

The PMF of a random variable A

\(f^{\star }_A\) :

The optimum noise PMF of random variable A

\(f^{\star }_{\infty }\) :

The optimum noise PMF as \(n \rightarrow \infty\)

\(f_{(k)}\) :

The \((k+1)\)th largest probability mass

\(\epsilon\) :

DP budget

\(\delta\) :

Probability of information being leaked

\(L_{{X\!X'}}\) :

Privacy loss function (Eq. 2)

\(\mathcal {X}^{(1)}_{X}\) :

Neighborhood set of dataset X

\(\eta\) :

Discrete noise random variable

\(\mu _{{X\!X'}}\) :

Query distance between X and its distance one neighbor \(X'\)

\(u_{{X\!X'}}\) :

Indicator function which is active when the privacy leakage function is more than the privacy budget, \(\epsilon\)

\(\rho _{\eta }\) :

Distortion caused by noise value \(\eta\)

\(\mathbb {E}[A]\) :

Expectation of a random variable A

\(\rho ^{\text {ER}}\) :

Error rate

\(\rho ^{\star }\) :

Optimum error rate

\(\rho ^{\star }_{\infty }\) :

Optimum error rate as \(n \rightarrow \infty\)

\(\rho ^{\text {MSE}}\) :

Mean Squared Error

\(y_{\eta }\) :

Auxiliary variable equal to \(f(\eta )u_{{X\!X'}}(\eta )\)

\(\mathcal {M}\) :

Set of query distances, \(\forall\) \(X\in \mathcal {X}\) and \(X'\in \mathcal {X}_X^{(1)}\)

\(\hat{\mu }\) :

The constant query distance in the Single Distance case

\(\bar{\mu }\) :

The maximum query distance in the Bounded Difference case

\(\bar{\mu }\) :

The maximum query distance in the Bounded Difference case

\(\phi _i^k\) :

ith step height in the flat region of the PMF of the optimal noise mechanism for BD neighbourhood case

\(\psi _i^k(\delta )\) :

ith step height in the linear region of the PMF of the optimal noise mechanism for BD neighbourhood case

\(\underline{\delta }^{\epsilon }_k\) :

The instance of \(\delta\) at which \(\rho ^{\star }(\delta ,\epsilon )\) changes from kth linear region to kth flat region

\(\overline{\delta }^{\epsilon }_k\) :

The instance of \(\delta\) at which \(\rho ^{\star }(\delta ,\epsilon )\) changes from kth flat region to \((k+1)\)th linear region

r :

Reminder when n is divided by \(\overline{\mu }\)

b :

Quotient when n is divided by \(\overline{\mu }\)

\(N_{\mu }\) :

Ratio between \(n+1\) and gcd((\(n+1\)), \(\mu\))

\(F_A\) :

The cumulative distribution function (CDF) of random variable A

\(\alpha\) :

Parameter of geometric distribution used in the paper

\(\beta '\) :

Parameter of Gumbel distribution used in the paper

AMI:

Advanced metering infrastructure

BD:

Bounded distance

DP:

Differential privacy

ER:

Error rate

LP:

Linear program

MSE:

Mean squared error

PDP:

Probabilistic differential privacy

PMF:

Probability mass function

SD:

Single distance

GCD:

Greatest common divisor

MILP:

Mixed integer linear program

2D:

Two-dimensional

References

  • Adams R (2013) The gumbel-max trick for discrete distributions. https://lips.cs.princeton.edu/the-gumbel-max-trick-for-discrete-distributions/

  • Balcer V, Vadhan S (2019) Differential privacy on finite computers. arXiv:1709.05396

  • Canonne CL, Kamath G, Steinke T (2020) The discrete gaussian for differential privacy. Adv Neural Inf Process Syst 33:15676–15688

    Google Scholar 

  • Census U (2020) Decennial census: processing the count: disclosure avoidance modernization. https://www.census.gov/programs-surveys/decennial-census/decade/2020/planning-management/process/disclosure-avoidance.html

  • Cormode G, Kulkarni T, Srivastava D (2019) Constrained private mechanisms for count data. IEEE Trans Knowl Data Eng 33(2):415–430

    Google Scholar 

  • Ding B, Kulkarni J, Yekhanin S (2017) Collecting telemetry data privately. Adv Neural Inf Process Syst 30:3574–3583

    Google Scholar 

  • Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: Theory of cryptography conference. Springer, Berlin, pp 265–284

  • Eland A (2015) Tackling urban mobility with technology. https://europe.googleblog.com/2015/11/tackling-urban-mobility-with-technology.html

  • Geng Q, Viswanath P (2015) The optimal noise-adding mechanism in differential privacy. IEEE Trans Inf Theory 62(2):925–951

    Article  MathSciNet  Google Scholar 

  • Geng Q, Viswanath P (2015) Optimal noise adding mechanisms for approximate differential privacy. IEEE Trans Inf Theory 62(2):952–969

    Article  MathSciNet  Google Scholar 

  • Geng Q, Kairouz P, Oh S, Viswanath P (2015) The staircase mechanism in differential privacy. IEEE J Sel Topics Signal Process 9(7):1176–1184

    Article  Google Scholar 

  • Ghosh A, Roughgarden T, Sundararajan M (2009) Universally utility-maximizing privacy mechanisms. In: Proceedings of the forty-first annual ACM symposium on theory of computing. STOC ’09. Association for Computing Machinery, New York, NY, USA, pp 351–360. https://doi.org/10.1145/1536414.1536464

  • Jiang H et al (2021) On the complexity of mixed integer programming. PhD thesis, Johns Hopkins University

  • Machanavajjhala A, Kifer D, Abowd J, Gehrke J, Vilhuber L (2008) Privacy: theory meets practice on the map. In: 2008 IEEE 24th International Conference on Data Engineering. IEEE, Cancun, Mexico, pp 277–286

  • McClure D (2015) Relaxations of differential privacy and risk/utility evaluations of synthetic data and fidelity measures. PhD thesis, Duke University

  • McSherry F, Talwar K (2007) Mechanism design via differential privacy. In: 48th annual IEEE symposium on foundations of computer science (FOCS’07). IEEE, Providence, RI, USA, pp 94–103

  • Meiser S (2018) Approximate and probabilistic differential privacy definitions. Cryptology ePrint Archive. https://eprint.iacr.org/2018/277

  • Qin S, He J, Fang C, Lam J (2022) Differential private discrete noise adding mechanism: conditions and properties. In: 2022 American Control Conference (ACC). IEEE, pp 946–951

  • Ravi N, Scaglione A, Kadam S, Gentz R, Peisert S, Lunghino B, Levijarvi E, Shumavon A (2022) Differentially private K-means clustering applied to meter data analysis and synthesis. IEEE Trans Smart Grid 13(6):4801–4814. https://doi.org/10.1109/TSG.2022.3184252

    Article  Google Scholar 

  • Rogers R, Subramaniam S, Peng S, Durfee D, Lee S, Kancha SK, Sahay S, Ahammad P (2020) LinkedIn’s audience engagements API: a privacy preserving data analytics system at scale. arXiv:2002.05839

  • Sadeghi P, Asoodeh S, Calmon FdP (2020) Differentially private mechanisms for count queries. arXiv:2007.09374

  • Sholmo N, Skinner CJ (2012) Privacy protection from sampling and perturbation in survey microdata. J Privacy Confid 4(1):155–169

    Google Scholar 

  • Solver GO (2021) Gurobi. https://www.gurobi.com/

  • Soria-Comas J, Domingo-Ferrer J (2013) Optimal data-independent noise for differential privacy. Inf Sci 250:200–214

    Article  MathSciNet  Google Scholar 

  • Triastcyn A, Faltings B (2020) Bayesian differential privacy for machine learning. In: International conference on machine learning. PMLR, Proceedings of Machine Learning Research, pp 9583–9592

  • Wang N, Xiao X, Yang Y, Zhao J, Hui SC, Shin H, Shin J, Yu G (2019) Collecting and analyzing multidimensional data with local differential privacy. In: 2019 IEEE 35th international conference on data engineering (ICDE). IEEE, pp 638–649

  • Ye Q, Hu H, Meng X, Zheng H (2019) PrivKV: key-value data collection with local differential privacy. In: 2019 IEEE symposium on security and privacy (SP). IEEE, pp 317–331

  • Zhang J-D, Chow C-Y (2021) Enabling probabilistic differential privacy protection for location recommendations. IEEE Trans Serv Comput 14(2):426–440. https://doi.org/10.1109/TSC.2018.2810890

    Article  Google Scholar 

Download references

Acknowledgements

Not applicable.

Funding

This research was supported by the Director, Cybersecurity, Energy Security, and Emergency Response, Cybersecurity for Energy Delivery Systems program, of the U.S. Department of Energy, under contract DE-AC02-05CH11231. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsors of this work.

Author information

Authors and Affiliations

Authors

Contributions

SK Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing- Original draft preparation, Writing- Reviewing and Editing, Visualization; AS Conceptualization, Resources, Formal analysis, Investigation, Writing- Reviewing and Editing, Supervision; NR Data curation, Visualization, Investigation, Writing- Reviewing and Editing; SP Supervision, Funding acquisition, Writing- Reviewing and Editing; BL Project administration; AS Supervision, Funding acquisition.

Corresponding author

Correspondence to Sachin Kadam.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix

Proof of Proposition 1

Let us define the domains of \(q, \tilde{q},\) and \(g(\tilde{q})\) to be \(\mathcal {Q}_1, \mathcal {Q}_2\), and \(\mathcal {Q}_3\), respectively. The given function is \(g: \mathcal {Q}_2 \rightarrow \mathcal {Q}_3\). We can prove that \((\epsilon ,\delta )-\text {PDP}\) is preserved in general under post-processing if \(g(\tilde{q})\) is bijective. In fact, in this case \(\mathcal {Q}_2=\mathcal {Q}_3\) and the probability mass of \(g(\tilde{q})\) in \(\mathcal {Q}_3\) domain, i.e., \(f_{g(\tilde{q})}(g(\tilde{q}))\), is equal to the probability mass of the corresponding argument \(\tilde{q}\). So, \(\forall X\in {\mathcal X}, \forall {X'\in \mathcal {X}_{X}^{(1)}}\), it is trivial to see that:

$$\begin{aligned} L_{{X\!X'}} (g(\tilde{q})) = L_{{X\!X'}} (\tilde{q}) \end{aligned}$$
(58)

and that the probability \(\delta\) of the leakage event \(L_{{X\!X'}} (g(\tilde{q}))>\epsilon\) remains the same. If \(\delta =0\) then the \(\epsilon -PDP\) is always preserved. The case \(|\mathcal {Q}_2|\ge |\mathcal {Q}_3|\) is the interesting one (e.g. clamping): in this case multiple values of \(\tilde{q}\in \mathcal {V}_g\) map onto a single value \(g(\tilde{q})=g\). Next, we show that \(L_{{X\!X'}} (\tilde{q})\le \epsilon\) with probability one implies \(L_{{X\!X'}} (g(\tilde{q}))\le \epsilon\) with probability one. In fact, since for all \(\tilde{q}\), \(f(\tilde{q}|X)\le e^{\epsilon }f(\tilde{q}|X')\):

$$\begin{aligned} L_{{X\!X'}} (g(\tilde{q}))&=\log \frac{f(g|X)}{f(g|X')}= \log \frac{\sum _{\tilde{q}\in \mathcal {V}_g}f(\tilde{q}|X)}{ \sum _{\tilde{q}\in \mathcal {V}_g}f(\tilde{q}|X')}\nonumber \\&\le \log \frac{\sum _{\tilde{q}\in {\mathcal V}_g}e^{\epsilon }f(\tilde{q}|X')}{ \sum _{\tilde{q}\in {\mathcal V}_g}f(\tilde{q}|X')}\le \epsilon \end{aligned}$$
(59)

Proof of Lemma 2

When the objective is minimizing error rate, it is natural to introduce the inequality constraint \(f^\star _{(0)} = f^\star (0) \ge 0.5=1-\rho ^{ER}\), . Hence, \(f^\star _{(0)} = \sup _{\forall k \in [n]} f^\star _{(k)}\). Now, substitute \(\eta =0\) and \(\mu _{{X\!X'}} = \hat{\mu }\) in (14c) we see that \(f^\star (0) \le e^{\epsilon }f^\star (\hat{\mu })\). Since we are minimizing the sum of the mass away from zero, we assign \(f^\star (\hat{\mu })\) to the minimum possible value, which is \(f^\star (\hat{\mu }) = e^{-\epsilon }f^\star (0)\) in this case. Similarly, from (14c) we see that \(f^\star (k\hat{\mu }) \le e^{\epsilon }f^\star ((k+1)\hat{\mu })\), \(k \in [n-1]_+\), and we are minimizing the sum of the mass away from zero, we need to assign \(f^\star ((k+1)\hat{\mu })\) to the minimum possible value, which is \(f^\star ((k+1)\hat{\mu }) = e^{-\epsilon }f^\star (k\hat{\mu })\).Footnote 7 Since \(\epsilon > 0\), \(f^\star (k\hat{\mu }) \ge f^\star ((k+1)\hat{\mu })\), \(\forall k \in [n]\). Hence, we can write \(f^\star _{(k)} = f^\star (k\hat{\mu })\), \(\forall k \in [n]\).

Proof of Lemma 3

Case 1: \((\hat{\mu }, (n+1))\) are relatively prime.


The proof logic is as follows. Since minimizing the error rate is equivalent to having the maximum mass possible at \(\eta =0\), we expect \(f^\star (0) = \sup _{\eta \in [n]}f^\star (\eta )\). The constraint (16), implies \(f^\star (\hat{\mu }) \ge e^{-\epsilon } f^\star (0)\). Also that for any \(\eta = k\hat{\mu },\,k\in [2:n]\), \(f^\star (k\hat{\mu }) \ge e^{-\epsilon } f^\star ((k-1)\hat{\mu }) \ge e^{-k\epsilon } f^\star (0)\) and \(f^\star ((n+1)\hat{\mu }) \ge e^{-\epsilon } f^\star (n\hat{\mu })\). From all these inequalities and the constraint (14a), we conclude that what would allow having the largest mass of probability at \(\eta =0\) is meeting all constraints as equality, starting from the first. Since \(\hat{\mu }\) and n are prime, the multiples of \(\hat{\mu }\) eventually cover the entire range [n] and therefore: \(f^\star (k\hat{\mu }) = e^{-k\epsilon } f^\star (0),\,k \in [n]\). This result leads to the optimum distribution in Lemma 3. Now, \(f^\star (0)\) can be computed as follows:

$$\begin{aligned} f^\star (0)&+ e^{-\epsilon }f^\star (0) + \ldots + e^{-n\epsilon }f^\star (0) = 1 \end{aligned}$$
(60a)
$$\begin{aligned} \Rightarrow \quad f^\star (0)&= \frac{1 - e^{-\epsilon }}{1 - e^{-(n+1)\epsilon }}. \end{aligned}$$
(60b)

Case 2: \((\hat{\mu }, (n+1))\) are not relatively prime.


The same argument of Case 1 holds for \(k \in [N_{\hat{\mu }}-1]_+\), where \(N_{\hat{\mu }} = \frac{(n+1)}{\gcd ((n+1),\hat{\mu })}\), i.e., \(f^\star (k \hat{\mu }) = e^{-k \epsilon } f^\star (0), \,\,\,\,k \in [N_{\hat{\mu }}-1]_+\). However in this case, since for \(k = N_{\hat{\mu }}\), \(f^\star (N_{\hat{\mu }} \hat{\mu }) = f^\star (0)\) and the cycle repeats over the same exact values covered from zero up to \((N_{\hat{\mu }}-1)\) which does not include all PMF entries. Since we are minimizing the objective function to satisfy all the inequality constraint \(f^\star _{(0)} = f^\star (0) \ge 0.5=1-\rho ^{ER}\) and (16) for all k values that are not constraining \(f^\star (0)\), the best choice is to assign them zero, i.e., \(f^\star (k) = 0,\,k \in [n] {\setminus } [i\hat{\mu }], \,\forall i \in [N_{\hat{\mu }}-1]\).

Proof of Theorem 2

We focus on Case 1, as Case 2 follows from Remark 3. In Lemma 2, we clarified that it is best to deal with ordered values, and in Lemma 3, we specified \(f^\star _{(h)}\), \(h \in [n]\), as a function of \(\epsilon >0\) for \(\delta =0\). The best solution for \(\rho (\epsilon ,\delta )\) initially does not change until violating an inequality in (16) yields better accuracy. This happens as soon as the second smallest value \(f^{\star }_{(n-1)}\) corresponding to \(\delta =0\) is \(f^{\star }_{(n-1)}=\delta\), which is the upper-limit \(\overline{\delta }_0\) from (). The reason why it is \(f^{\star }_{(n-1)}\) and not \(f^{\star }_{(n)}\) matters because surely \(f^{\star }_{(n)}<e^{\epsilon }f^{\star }(0)\) which in the modulo n sum is the value that follows and that we aim at maximizing. At this point, for \(\overline{\delta }_0<\delta \le \underline{\delta }_1\) the value of \(f^{\star }_{(n-1)}=\delta\), all the values for \(0\le h<n-1\) meet the constraints with equality and thus \(f^{\star }_{(h)}=e^{(n-1-h)}\delta\), while the last value \(f^{\star }_{(n)}\) progressively diminishes until it becomes zero, as shown in equation (21), at the start of the next flat region. This pattern continues until eventually one by one all \(n-1\) masses become zero except for \(f^{\star }(0)=1=\underline{\delta }_n^{\epsilon }\).

Proof of Theorem 3

In Lemma 4, we have the expressions for \(\phi _i, \,i \in [b+1]_+\) for \(\epsilon >0\) and \(\delta =0\). The best solution for \(\rho ^{\star }(\delta ,\epsilon )\) does not change w.r.t \(\delta >0\) until violating an inequality in (27) to yield better accuracy. The key to the proof is understanding that the first inequality to be violated occurs when considering \(\delta\) greater or equal not to the smallest PMF value but to the PMF values of the third to the last group in \(\delta =0\) case, i.e., \(\phi _{b-1} = \delta\), whose value equals the first boundary point \(\overline{\delta }^{\epsilon }_{0,0}\) (see (37b)). The inequality violated is with respect to the PMF of the second to the last group, which is \(\phi _{b}\), which becomes \(=\delta\) and \(\phi _{b-1}>e^{\epsilon }\delta\). The reason why the PMF of the last group, i.e. \(\phi _{b+1}\), does not violate the inequality in (27) is that \(\phi _{b+1} < e^{\epsilon }\phi _0\) (we use \(\phi _0\) since we are doing modulo summation) is always true for all members of this group, due to the fact that \(\phi _0\) is the objective function which we are maximizing. Similarly, the PMF of the second last group, i.e. \(\phi _{b}\), does not violate the inequality in (27) because some members of this group do not violate the inequality \(\phi _{b} < e^{\epsilon }\phi _0\) for the same reason as stated above. At this point, for \(\overline{\delta }_{0,0}<\delta \le \underline{\delta }_{1,1}\) the value of \(\phi _{b-1}=\delta\), the PMF values for \(i \in [b-1]\) meet the constraints with equality and thus \(\psi _i^1(\delta )=\delta e^{(b-1-i)\epsilon }\), for \(i \in [b-1]\). At this point, the group with PMF \(\phi _b\), whose length is \(\ell _b=\bar{\mu }\), splits into two groups, one of length \(\ell _b=(\bar{\mu }-1)\) and the other with a singleton step \(\ell _{b+1}=1\). This split happens to assign more probability mass at \(\psi _0^1(\delta )\), which is our objective, while still violating the constraint in (27) between \(\delta = \psi _{b-1}^1(\delta )\) and \(\psi _{b+1}^1(\delta )\) in order to lower the error further. For \(i=b\) and \(i=b+2\), the PMF values satisfy the constraints with equality, hence \(\psi _{b}^1(\delta )=\delta e^{-b\epsilon }\) and \(\psi _{b+2}^1(\delta )=\delta e^{-(b+1)\epsilon }\) while the unconstrained singleton step PMF \(\psi _{b+1}^1(\delta )\) decreases until it joins the next group since it has matched its value, and becomes \(\phi ^1_{b+1}\), as shown in equation (30), at the start of the next flat region, i.e., for \(\underline{\delta }_{1,1} \le \delta \le \overline{\delta }_{1,1}\). In this region, the PMF of every group is \(e^{-\epsilon }\) times the PMF of previous group similar to Lemma 4; the only change here is the lengths of \(b^{th}\) and \((b+1)^{th}\) groups which are now \(\ell _b = (\bar{\mu }-1)\) and \(\ell _{b+1} = r+1\).

Now we provide the reason for splitting only groups of lengths \(\bar{\mu }\) using contradiction. Suppose there is a group i of length \(\ell < \bar{\mu }\) which splits at the beginning of a linear region \(k \in [b]_+\). As we know, for this group to split there must be a violation of the inequality in (27) between the PMF values \(\phi _{i-1}^{k-1} = \delta\) and \(\phi _i^{k-1}\). Now, \(\phi _i^{k-1}\) splits into \(\psi _{i}^k(\delta )\) of length \((\ell -1)\) and \(\psi _{i+1}^k(\delta )\) of length 1, which decreases as \(\delta\) grows. We see that \(\psi _{i+1}^k(\delta )\) is \(\le \bar{\mu }\) distance away from at least two members of the \(\psi _{i-1}^k(\delta )\) group leading to at least two violations in the inequalities in (27) which makes the actual privacy loss \(\ge 2\delta\), in violation of the inequality constraint in (13c), which is a contradiction.

From the discussion in the previous paragraph, for the next linear region, we now search for a group of length \(\bar{\mu }\) with the smallest possible PMF, which is found to be \((b-1)^{th}\) group, whose PMF is \(\phi ^1_{b-1}\). For this group to split and enter into the linear region, we should have \(\delta =\phi ^1_{b-2}\). As explained in the first paragraph of this proof, the same process follows in this linear region too, i.e., for \(\overline{\delta }_{1,1} < \delta \le \underline{\delta }_{2,1}\). In the next flat region, i.e., for \(\underline{\delta }_{2,1} \le \delta \le \overline{\delta }_{2,1}\) the step lengths, as compared to previous flat region, which have been altered are \(\ell _{b-1} = (\bar{\mu }-1)\) and \(\ell _{b} = \bar{\mu }\). Now, observe that \(b^{th}\) group has length \(\bar{\mu }\) and from the discussion in the previous paragraph, this group splits at the end of this flat region, i.e., when \(\delta =\phi ^2_{b-2} + \phi ^2_{b-1} = \phi ^2_{0} e^{-(b-2)\epsilon } (1+e^{-\epsilon }) \equiv \overline{\delta }_{2,1}\). In this case the inequality in (27) are violated by both \(\phi ^2_{b-2}\) and \(\phi ^2_{b-1}\), hence the summation and the reasoning for (37b) (this was missing in SD neighborhood case). For the same reason, in the next linear region i.e., for \(\overline{\delta }_{2,1} < \delta \le \underline{\delta }_{2,2}\) the normalizing factor \((1+e^{-\epsilon }) \equiv \xi _2^\epsilon\) is used while computing the PMF values in (39).

These alternate flat and linear intervals are formed as \(\delta\) increases and the process continues until all the remaining groups of lengths \(\bar{\mu }\) split into groups of lengths \((\bar{\mu }-1)\). The expression for \(\underline{\delta }_{h,j}\), \(h \in [b-1]_+, \,j \in [h]_+\) in (37c) is computed using \(f^\star (0)\) vs. \(\delta\) curve (see Fig. 4) in the linear region between \(\phi ^{k-1}_{0}\) and \(\phi ^{k}_{0}\), the slope corresponding to \(\psi _0^k(\delta )\) i.e. \(e^{(b-h)\epsilon }/ \xi _j^{\epsilon }\) from (39). i.e.,

$$\begin{aligned} \underline{\delta }_{h,j}&= \overline{\delta }^{\epsilon }_{h,j-1} + (\phi ^{k}_{0} - \phi ^{k-1}_{0}) e^{-(b-h)\epsilon } \xi _j^{\epsilon } \nonumber \\&= \phi ^{k-1}_{0} e^{-(b-h)\epsilon } \xi _j^{\epsilon } + (\phi ^{k}_{0} - \phi ^{k-1}_{0}) e^{-(b-h)\epsilon } \xi _j^{\epsilon } \nonumber \\&= \phi ^{k}_{0} e^{-(b-h)\epsilon } \xi _j^{\epsilon }. \end{aligned}$$
(61)

Now, we find the simplified expression for the value of \(\phi ^k_{0}\) from the fact that \(\phi ^k_{0} + \sum _{i=1}^{b+1} \ell _i \phi ^k_{i} = 1\) as following:

$$\begin{aligned}&\phi ^k_{0} \left( 1 + \bar{\mu } \sum _{i=1}^{b-h} e^{-i\epsilon } \right. + (\bar{\mu }-1)\!\!\!\sum _{\begin{array}{c} i=b-h+1\\ i \ne b-h+j+1 \end{array}}^{b} e^{-i\epsilon } \nonumber \\&\quad + \bar{\mu } u_{hj}e^{-(b-h+j+1)\epsilon } \left. + (r+h-u_{hj}) e^{-(b+1)\epsilon }\right) \!\!=\!\! 1 \end{aligned}$$
(62)

By further simplifying, we get:

$$\begin{aligned} \phi ^k_{0}&= \left( 1+\bar{\mu } \sum _{i=1}^{b} e^{-i\epsilon } -\bar{\mu }(1-u_{hj}) e^{-(b-h+j+1)\epsilon }\right. \\&\quad + e^{-(b-h+j+1)\epsilon }\!+\!(r\!+\!h\!-\!u_{hj}) e^{-(b+1)\epsilon } \!-\! \left. \!\!\!\!\!\!\sum _{i=b-h+1}^{b}\!\!\!\!\!\!\! e^{-i\epsilon } \right) ^{-1} \\ \phi ^k_{0}&= \left( \bar{\mu } \alpha ^{\epsilon }_{hj} + \beta ^{\epsilon }_{hj}\right) ^{-1}, \,\hbox {see (34)}. \end{aligned}$$

The last group has length \(b+r\) which is less than \(\bar{\mu }\) in our case. This assumption simplifies the analysis, hence tractable. Even though the pattern is same, the case \(b+r \ge \bar{\mu }\) complicates the analysis because, for \(j \in [h]_+\) and for every \(h \in [b]_+\), instead of increasing h when \(j =h\), j must be increased beyond h to accommodate the groups of lengths \(\bar{\mu }\) created by the last group whenever its length \(r+h\) exceeds \(\bar{\mu }\). Hence, we do not discuss the analysis of this case, but we do provide some numerical results in Section “Numerical results”.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Kadam, S., Scaglione, A., Ravi, N. et al. Optimum noise mechanism for differentially private queries in discrete finite sets. Cybersecurity 7, 54 (2024). https://doi.org/10.1186/s42400-024-00239-3

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s42400-024-00239-3

Keywords