Verify Many Signatures Efficiently
Introduction
In data processing applications like federated learning and confidential analytics, it is often necessary to prove that a function was correctly executed over inputs originating from specific sources. The authenticity of such data is typically attested to via a digital signature from the data source, such that any recipient can verify the signature and thereby the origin of the data.
In this setting, a Prover computes a function over the authenticated data and sends the output comp_output to a Verifier, along with a proof of correctness proof. Both the Prover and Verifier know the public keys of all Data Sources.
A straightforward approach to generating SNARKs over authenticated data is to execute the signature verification logic inside the SNARK circuit. However, the arithmetic operations involved in verifying digital signatures significantly increase the complexity of the circuit in terms of the number of constraints, a cost the prover has to bear. This impact is further amplified when a circuit deals with many signatures, such as in a federated learning scenario.
This Post
Incorporating data authentication into SNARK-based protocols faces the following challenges:
- Prover Efficiency: How can the prover attest to data provenance efficiently?
- Verifier Efficiency: How can this attestation be verified efficiently?
- Data Secrecy: Does optimizing for the prover or the verifier come at a cost to the secrecy of the data?
A SNARK circuit that encodes all required digital signature verifications enables the prover to use the ZK property and keep secret (part of) the data, the signatures, or both. But as shown above, this does not come for free: it carries a computational cost that the Prover has to pay. In this post, we will look at different approaches to proving functions over authenticated data, analyzing their efficiency and their implications for security.
Setting
Before getting into the different approaches we first establish the setting and notation that will be used.
Parties
In our target setting, there are 3 types of parties: DataSource, Prover, and Verifier.
Data Source. A Data Source with ID i holds a signing key pair (sk, pk) and produces
data with a signature sig.
Prover. Peggy is a Prover party that interacts with n data sources.
It maintains an array of public keys pubkeys[], an array of data dataset[],
and an array of signatures sigs[] of size n each, such that sigs[i]
is a signature over dataset[i] under public key pubkeys[i], where i
is the ID of the data source.
Verifier. Victor is a Verifier party that also knows the array pubkeys[] and
wishes to obtain the output of an arbitrary function over the entire dataset,
i.e., comp_output = f(dataset), where Peggy carries out this computation and
communicates the result.
High-level Goals
Victor requires a convincing argument that
- comp_output is the correct output of f(dataset)
- All data items in the input dataset[] originate from data sources that Victor expects
To fulfill the first requirement, Peggy will use a SNARK system to prove that
the execution of the arbitrary function f(dataset) produced the output
comp_output.
In the upcoming sections we will look at various approaches to fulfilling the second requirement.
Algorithms
In each proposed approach, we describe a DataSource, Prover, and a Verifier algorithm.
Therein, all data sources execute the DataSource algorithm, sending the output
to Peggy.
Peggy plugs those outputs into the Prover algorithm and sends the output to
Victor. Finally, Victor inputs the data from Peggy into the Verifier
algorithm and obtains a verification result.
Both Peggy and Victor are assumed to have pre-existing knowledge of the array
pubkeys[]; thus it is not communicated, but is made available to their
respective algorithms as part of the inputs.
For a concise description of the algorithms, we devise a pseudo-code
notation. As an example, the following pseudo-code describes the algorithm
DataSource.
BEGIN DataSource
IN: data, sk
OUT: data, sig
EXEC:
SET sig = Sign(sk, data)
END DataSource
The algorithm DataSource takes as input (denoted via IN:) a data item
data and a signing key sk.
Denoted via OUT:, it outputs the same data item data and a
signature sig, which is what gets sent to Peggy.
Following EXEC: is how the algorithm computes sig.
This DataSource algorithm is what is implicitly used in the following
approaches (unless stated otherwise).
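To make this concrete, the following is a minimal, runnable Python sketch of the DataSource algorithm. The choice of Ed25519 via the cryptography package is our own assumption for illustration; the protocols discussed here work with any signature scheme.

# Minimal sketch of the DataSource algorithm. Ed25519 (via the
# `cryptography` package) is an illustrative assumption; any
# signature scheme works.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def data_source(data: bytes, sk: Ed25519PrivateKey) -> tuple[bytes, bytes]:
    # IN: data, sk -- OUT: data, sig (mirrors the pseudo-code above)
    sig = sk.sign(data)
    return data, sig

# Usage: each data source generates its key pair once, then signs
# every data item it emits.
sk = Ed25519PrivateKey.generate()
pk = sk.public_key()
data, sig = data_source(b"example data item", sk)
pk.verify(sig, data)  # raises InvalidSignature if the check fails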
Baseline Approach: Signatures Inside the Circuit
To establish a baseline reference, we let the signature verification be encoded in the proving circuit, where in addition to proving correctness of the function, Peggy also needs to prove validity of all of the signatures.
BEGIN Prover
IN: dataset[], sigs[], pubkeys[]
OUT: comp_output, proof
EXEC:
SET comp_output = f(dataset[])
SET proof = BEGIN Proof
PRIVATE: dataset, sigs
PUBLIC: pubkeys, comp_output
CONSTRAINTS:
comp_output == f(dataset)
FOR ALL indices i in dataset:
Sig.verify(pubkeys[i], dataset[i], sigs[i]) == 1
END Proof
END Prover
Peggy executes the Prover algorithm that computes f(dataset[]).
The algorithm also produces a proof that:
- this output is correct
- the dataset input to f had correct signatures under the array of public keys pubkeys.
Peggy sends Victor proof and the computation output comp_output, nothing else.
Victor receives proof, comp_output from Peggy, and using his list of pubkeys, he
executes the Verifier algorithm:
BEGIN Verifier
IN: proof, comp_output, pubkeys[]
OUT: verification_result
EXEC:
SET verification_result = VerifyProof(proof, comp_output, pubkeys)
END Verifier
Efficiency Analysis
As conveyed in the introduction, this approach has a high cost for the Prover, who needs to encode all signature verification steps into the circuit. For the Verifier, however, it is a best-case scenario, as it only needs to process a succinct proof.
Security Analysis
In this setting, Peggy is able to hide all data as well as the signatures.
Thus, after execution, Victor effectively learns nothing about the data, except
what is leaked by the computation output comp_output.
This is essentially a best-case scenario for secrecy, since Peggy can rely entirely on the ZK property of SNARKs while delivering the authenticity guarantees that Victor requires.
Approach I: Signatures Outside the Circuit
The simplest approach to optimizing proofs over authenticated data is to not carry out verification of signatures inside the circuit, but to move it outside. In this setting, Peggy generates a proof of correctness of the desired function, but not of any signature validity.
BEGIN Prover
IN: dataset[], sigs[], pubkeys[]
OUT: comp_output, dataset[], sigs[], proof
EXEC:
SET comp_output = f(dataset)
SET proof = BEGIN Proof
PUBLIC: dataset[], comp_output
CONSTRAINTS:
comp_output == f(dataset)
END Proof
END Prover
In addition to sending Victor comp_output and proof, Peggy must also send
dataset and sigs such that Victor can verify authenticity of these
messages, and that they are the same messages used in generating the proof.
Thus Victor receives comp_output, dataset, sigs and proof, and verifies both
the proof and all signatures:
BEGIN Verifier
IN: proof, comp_output, pubkeys[], dataset[], sigs[]
OUT: verification_result
EXEC:
SET proof_result = VerifyProof(dataset, comp_output, proof)
SET sig_result = 1
FOR ALL indices i in dataset:
SET sig_result = sig_result AND Sig.verify(pubkeys[i], dataset[i], sigs[i])
SET verification_result = proof_result AND sig_result
END Verifier
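A minimal Python sketch of this Verifier follows, again assuming Ed25519 signatures. Since the interface of the proof verification algorithm depends on the chosen SNARK system, verify_proof is left as a hypothetical callback.

# Sketch of the Approach I Verifier: the proof check plus one
# signature check per data item, all outside the circuit.
from cryptography.exceptions import InvalidSignature

def verifier(proof, comp_output, pubkeys, dataset, sigs, verify_proof):
    # `verify_proof` is a hypothetical callback into the chosen SNARK
    # system, mirroring VerifyProof in the pseudo-code.
    proof_result = verify_proof(dataset, comp_output, proof)
    sig_result = True
    for pk, data, sig in zip(pubkeys, dataset, sigs):
        try:
            pk.verify(sig, data)  # Ed25519 verification
        except InvalidSignature:
            sig_result = False
    return proof_result and sig_result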
Efficiency Analysis
The approach strips away all the overhead of data authenticity checks from
the proof generation circuit.
In this setting, Prover Efficiency is maximized since only
the function f needs to be encoded in the circuit.
However, Verifier efficiency suffers. Victor is now required to receive and verify all data and signatures, in addition to verifying the proof itself, effectively making the protocol as a whole non-succinct.
Security Analysis
Signature checks are removed from the circuit and are carried out by the Verifier, external to the circuit. This step alone would be insufficient to guarantee authenticity: there must be a way to ensure that the data whose signatures have been verified are the same data used in the proof generation. To do that, the Verifier inputs the data into the proof verification algorithm. This mechanism prevents the prover from making use of the ZK property to hide any of the input data, since the verifier needs to learn them. Such a protocol would only make sense if Peggy has no secrecy requirements for the data, and if the computation is too expensive for Victor; otherwise, Victor could just carry out the computation himself.
In the next approach, we will see how we can bring back some level of confidentiality.
Approach II: Sign Commitment instead of Data
Moving verification of the signatures outside the circuit prevented Peggy from
hiding any data.
We can work around this limitation by making use of a commitment scheme, and
have the signature be over commitments of the data rather than the raw data.
To this end, we modify the DataSource algorithm as follows:
BEGIN DataSource
IN: data, sk
OUT: data, sig, comm, opening
EXEC:
SET comm, opening = Commit(data)
SET sig = Sign(sk, comm)
END DataSource
A data source then sends to the Prover its data, a commitment comm, a
random opening that opens the commitment to data, and a signature sig
over this commitment.
Peggy then executes the following Prover algorithm:
BEGIN Prover
IN: dataset[], sigs[], pubkeys[], comms[], openings[]
OUT: comp_output, comms[], sigs[], proof
EXEC:
SET comp_output = f(dataset)
SET proof = BEGIN Proof
PRIVATE: dataset[], openings[]
PUBLIC: comms[], comp_output
CONSTRAINTS:
comp_output == f(dataset)
FOR ALL indices i in dataset:
VerifyCommitment(comms[i], openings[i], dataset[i])
END Proof
END Prover
In addition to proving correctness of comp_output, Peggy proves that
all commitments for data in dataset used in producing comp_output are correct
with respect to the private openings.
Peggy then sends Victor comp_output, proof, as well as the set of commitments
comms and set of signatures sigs over those commitments (but not
openings or dataset).
Upon receiving proof, comp_output, comms, sigs, Victor computes:
BEGIN Verifier
IN: proof, comp_output, pubkeys[], comms[], sigs[]
OUT: verification_result
EXEC:
SET proof_result = VerifyProof(comms, comp_output, proof)
SET sig_result = 1
FOR ALL indices i in comms:
SET sig_result = sig_result AND Sig.verify(pubkeys[i], comms[i], sigs[i])
SET verification_result = proof_result AND sig_result
END Verifier
Thus in addition to verifying the proof, Victor verifies all signatures over the received commitments.
Efficiency Analysis
Let H be a cryptographic hash function; we can construct a commitment scheme
as follows:
BEGIN Commit
IN: data
OUT: comm, opening
EXEC:
SET opening = Sample a uniform value in H's input space
SET comm = H(data || opening)
END Commit
BEGIN VerifyCommitment
IN: comm, opening, data
OUT: vf_result
EXEC:
SET vf_result = comm == H(data || opening)
END VerifyCommitment
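A runnable Python sketch of this scheme, with H instantiated as SHA-256 purely for illustration (a real circuit would favor a circuit-friendly hash, as noted below):

# Hash-based commitment scheme from the pseudo-code above,
# instantiated with SHA-256 for illustration only.
import hashlib
import secrets

def commit(data: bytes) -> tuple[bytes, bytes]:
    # A uniformly random opening makes the commitment hiding.
    opening = secrets.token_bytes(32)
    comm = hashlib.sha256(data + opening).digest()
    return comm, opening

def verify_commitment(comm: bytes, opening: bytes, data: bytes) -> bool:
    # Binding follows from the collision resistance of the hash.
    return comm == hashlib.sha256(data + opening).digest()

# Usage: commit, then check the opening.
comm, opening = commit(b"example data item")
assert verify_commitment(comm, opening, b"example data item")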
The Prover Efficiency is now worse than in Approach I, since the proving circuit
needs to encode verification of all commitments, i.e., compute one hash
evaluation per message.
To optimize, we can instantiate H using a circuit-friendly hash function like
Poseidon.
The verifier may see a slight performance gain over Approach I, as he verifies signatures over fixed-size, small messages rather than arbitrary-size ones. Nevertheless, the succinctness property remains sacrificed.
Security Analysis
The protocol rests on the outlined commitment scheme being (1) Hiding: given
comm, it is intractable to obtain data, and (2) Binding: it is intractable
to find a pair (data2, opening2) with data2 != data for which VerifyCommitment
outputs 1 on the same comm.
Hiding is guaranteed by the randomness of the chosen opening, and Binding by the
collision resistance of H.
Authenticity. Instead of verifying signatures over the data directly, Victor verifies signatures over commitments and is thereby guaranteed the authenticity of those commitments. Because the proof guarantees that these commitments are bound to the input data, the authenticity of the commitments carries over transitively to the input data.
Confidentiality. While the protocol is a step up from Approach I, since Victor receives the authenticity guarantees without learning the data, Victor still learns a long-lived commitment and signature over the data. This makes usage of the same data linkable by Victor across different interactions with Peggy.
Approach III: Signature Aggregation
We can achieve some improvement in Verifier Efficiency if the data sources use a signature scheme that supports signature aggregation. Instead of verifying a digital signature per message, the Prover can aggregate all the signatures into a single short one, which is all that needs to be verified.
BEGIN Prover
IN: dataset[], sigs[], pubkeys[], comms[], openings[]
OUT: comp_output, comms[], aggsig, proof
EXEC:
SET comp_output = f(dataset)
SET aggsig = Aggregate(sigs)
SET proof = BEGIN Proof
PRIVATE: dataset[], openings[]
PUBLIC: comms[], comp_output
CONSTRAINTS:
comp_output == f(dataset)
FOR ALL indices i in dataset:
VerifyCommitment(comms[i], openings[i], dataset[i])
END Proof
END Prover
Instead of sending Victor the set of signatures, Peggy sends a short aggregation of those signatures that, together with the commitments, suffices to convince Victor of the authenticity of the data.
Upon receiving comp_output, comms[], aggsig, and proof, Victor computes:
BEGIN Verifier
IN: comp_output, comms[], aggsig, pubkeys[], proof
OUT: verification_result
EXEC:
SET proof_result = VerifyProof(comp_output, comms, proof)
SET sig_result = VerifyAggregateSignature(pubkeys, comms, aggsig)
SET verification_result = proof_result AND sig_result
END Verifier
Efficiency Analysis
The Prover is slightly more expensive than Approach II, since it additionally needs to aggregate the signatures (outside the proving circuit).
Compared to Approach II, Victor needs to verify a single aggregate signature instead of many. This is not inherently more efficient: an aggregatable signature scheme like BLS requires the Verifier to compute pairing operations, which may be more expensive than verifying many signatures of a standard signature scheme.
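For illustration, here is a hedged sketch of the aggregate-and-verify flow using py_ecc's implementation of the IETF BLS ciphersuites; the choice of library and ciphersuite is our own assumption, not part of the protocol.

# Sketch of BLS signature aggregation over commitments, assuming
# py_ecc's IETF-style interface (G2Basic ciphersuite).
import secrets
from py_ecc.bls import G2Basic as bls

# Data sources: generate keys and sign their commitments.
sks = [bls.KeyGen(secrets.token_bytes(32)) for _ in range(3)]
pubkeys = [bls.SkToPk(sk) for sk in sks]
comms = [secrets.token_bytes(32) for _ in range(3)]  # stand-ins for real commitments
sigs = [bls.Sign(sk, comm) for sk, comm in zip(sks, comms)]

# Peggy: compress n signatures into one (outside the circuit).
aggsig = bls.Aggregate(sigs)

# Victor: one aggregate check replaces n individual checks. G2Basic
# requires distinct messages; random commitments satisfy this.
assert bls.AggregateVerify(pubkeys, comms, aggsig)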
Security Analysis
This approach has similar security properties to Approach II.
Conclusion
We reviewed different approaches for generating SNARK proofs over authenticated data and analyzed their computational impact on the Prover and Verifier, as well as their implications for security. Although some have a positive impact on Prover Efficiency, they make sacrifices in Verifier Efficiency and/or Secrecy. The right approach thus depends on the functional and security requirements of the use case.