Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make verify_batch deterministic #256

Merged
merged 12 commits into from
Jan 15, 2023

Conversation

rozbb
Copy link
Contributor

@rozbb rozbb commented Jan 5, 2023

Previously we had batch and batch_deterministic features, which would do a different thing under the hood depending on what you picked. This PR removes all nondeterminism from batch verification, and simply uses Merlin for everything. This is for a few reasons:

  1. Having 2 mutually exclusive flags is a pain point in the API. And splitting them into two functions is similarly annoying.
  2. I couldn't find any arguments for the use of nondeterminism in batch verif. There's the old docs (which this PR deletes), which links to Merlin's reasoning and Trevor Perrin's reasoning, but they are talking about signing, not verifying. It is perfectly sound to use Merlin to apply the Fiat-Shamir transform to this protocol and use its randomness for verification.

I only have two arguments against this PR:

  1. If there is any part of the transcript that we did not properly hash into the context, it could allow an attacker to brute force a transcript that causes verify_batch to succeed on a bad signature. I think this is unlikely, given the transcript hash is done in 3 lines and consumes literally every input to the function in the process. Unfortunately, this claim requires some deep code checking. For example, the signatures that are hashed are actually the result of InternalSignature::into. Is this accidentally two-to-one? The answer is no but it requires you to check how scalars and points are deserialized. I don't think there's a problem here, but I can make the code more obviously correct by hashing in the un-converted input values. Update: I did this
  2. You could brute-force verif so that it succeeds on specific ill-formed pubkey-signature pairs (i.e., the low-order R and A values that pass our validation criteria). However, even a nondeterministic batch function will succeed on these batches with 1/8 probability (technically, (1/8)^n for n such items in a batch). How much do we care if the alternative has such a high likelihood of false positive as well? I think this if anything, this is a good argument to use the batched equation, which multiplies by the cofactor and accepts all of these 100% of the time.

@rozbb rozbb requested a review from tarcieri January 5, 2023 10:27
@rozbb
Copy link
Contributor Author

rozbb commented Jan 5, 2023

Just refactored verify_batch so that it's immediately obvious that it's hashing everything. There are no conversions. Everything is shoved into the transcript either directly or after a SHA-512. Two notes:

  1. This might be slightly less efficient, since we construct a vector of H(R || A || M), use it for the transcript, and then convert those digests into scalars.
  2. I don't think we need to hash the message lengths. Like I mentioned in the comment, H(R || A || M) != H(R' || A' || M'), no matter what lengths for M and M' you allow. Does that sound correct?
  3. I don't think we need to hash the index of transcript items either. If we just put 2n items through the transcript, then the interpretation is unambiguous: it's the hrams followed by the ss. Even if the two message types were the same bytelen (which they aren't), the only colliding inputs would be made by shifting where the hrams end and the ss start. Not only is that impossible because s and hram come from the same vector, but even if they didn't, it would be caught by the length-equality check at the beginning of batch_verify. So we can save a good amount of hashing here.

@pinkforest pinkforest mentioned this pull request Jan 5, 2023
@@ -170,55 +170,19 @@ transactions.
The scalar component of a signature is not the only source of signature
malleability, however. Both the public key used for signature verification and
the group element component of the signature are malleable, as they may contain
a small torsion component as a consquence of the curve25519 group not being of
a small torsion component as a consequence of the curve25519 group not being of
Copy link
Contributor

@pinkforest pinkforest Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do all README changes in #241 pretty pls - Merged all this there :)

pinkforest added a commit to pinkforest/ed25519-dalek that referenced this pull request Jan 5, 2023
@rozbb
Copy link
Contributor Author

rozbb commented Jan 7, 2023

Ready for review @tarcieri. I'm also gonna wait on at least one more set of eyes on the new verify_batch transcript stuff before merging. Gotta be sure.

@@ -160,7 +160,7 @@ impl InternalSignature {
///
/// However, by the time this was standardised, most libraries in use were
/// only checking the most significant three bits. (See also the
/// documentation for `PublicKey.verify_strict`.)
/// documentation for [`crate::VerifyingKey::verify_strict`].)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, guess this was missed in #242. Looks like there's another instance of it on L46

@rozbb
Copy link
Contributor Author

rozbb commented Jan 15, 2023

Just ran benchmarks. Looks like performance is unaffected. Documenting some design changes here:

  1. Previously, we were including message lengths in the transcript hash as well. This is not necessary. We get security through collision-resistance of SHA-512(R || A || M), where R and A are both fixed-length encodings of curve points. If the length of M changes, the digest changes, so there's no issue.
  2. Previously we also hashed the index of each item along with its value. This is unnecessary because we hash everything in the order we receive it, so reordering leads to a different digest.
  3. We now put H(R || A || M) into the transcript before it gets converted into a scalar. This was probably not an issue before, but it's better to feed the full hash in rather than the hash reduced mod p.

@rozbb rozbb merged commit b5dc40b into dalek-cryptography:release/2.0 Jan 15, 2023
@rozbb
Copy link
Contributor Author

rozbb commented Feb 4, 2023

Batch verif code was reviewed by @james-miller-93

Notes from James:

  1. If we plan on supporting multiple basepoints, we should include it in the transcript
  2. Ensure that success on empty input is intended

My conclusion: Since we will never support another basepoint, and success-on-empty is intended, no actions need to be taken.

cc @tarcieri

Thanks James!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants