Pieter Wuille [ARCHIVE] on Nostr
📅 Original date posted: 2020-08-12
📝 Original message:
Hello all,
The current BIP340 draft[1] uses two different tiebreakers for conveying the Y coordinate of points: squaredness for the R point inside signatures, and evenness for public keys. Originally both used squaredness, but it was changed[2] for public keys after observing that this results in additional complexity for compatibility with existing systems.
The reason for choosing squaredness as tiebreaker was performance: in non-batch signature validation, the recomputed R point must be verified to have the correct sign, to guarantee consistency with batch validation. Whether the Y coordinate is square can be computed directly in Jacobian coordinates, while determining evenness requires a conversion to affine coordinates first.
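To make this argument concrete, here is a minimal Python sketch of both checks on a point in Jacobian coordinates (X, Y, Z), where the affine coordinates are x = X/Z^2 and y = Y/Z^3. The names are illustrative and not taken from libsecp256k1, which uses dedicated field arithmetic rather than Python's `pow`; `pow(x, -1, p)` needs Python 3.8+.

```python
p = 2**256 - 2**32 - 977  # the secp256k1 field prime (p % 4 == 3)

def jacobi(a):
    # Euler's criterion; for a prime modulus this is the Legendre symbol:
    # 1 if a is a nonzero square mod p, -1 if a is a nonsquare, 0 if a == 0.
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def y_is_square(X, Y, Z):
    # Affine y = Y / Z^3, and jacobi(Y / Z^3) == jacobi(Y * Z), because
    # the two differ by the square factor Z^4. No field inversion needed.
    return jacobi(Y * Z % p) == 1

def y_is_even(X, Y, Z):
    # Evenness is a property of the canonical affine representative, so a
    # conversion, and hence one modular inverse, is required first.
    y_affine = Y * pow(pow(Z, 3, p), -1, p) % p
    return y_affine % 2 == 0
```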
This argument of course relies on the assumption that determining whether the Y coordinate is square can be done more efficiently than a conversion to affine coordinates. It now appears that this assumption is incorrect, and the justification for picking the squaredness tiebreaker doesn't really exist. As it comes with other trade-offs (it slows down signing, and is a less conventional choice), it would seem that we should reconsider the option of having the R point use the evenness tiebreaker (like public keys).
It is late in the process, but I feel I owe this explanation so that at least the possibility of changing can be discussed with all the information on the table. On the upside, this was discovered in the context of looking into a cool improvement to libsecp256k1[5], which makes things faster in general, but specifically benefits the evenness variant.
# 1. What happened?
Computing squaredness is done through the Jacobi symbol (same inventor, but unrelated to Jacobian coordinates). Computing evenness requires converting points to affine coordinates first, and that needs a modular inverse. The assumption that Jacobi symbols are faster to compute than inverses was based on:
* A (possibly) mistaken belief about the theory: fast algorithms for both Jacobi symbols and inverses are internally based on variants of the same extended GCD algorithm[3]. Since an inverse needs to extract a full big integer out of the transition steps made in the extgcd algorithm, while the Jacobi symbol just extracts a single bit, it had seemed that any advances applicable to one would be applicable to the other, but inverses would always need additional work on top. It appears, however, that a class of extgcd algorithms exists (LSB-based ones) that cannot be used for Jacobi calculations without losing efficiency; a toy version of the division step they are built on is sketched after this list. Recent developments[4] and a proposed implementation in libsecp256k1[5] by Peter Dettman show that, using this, inverses can in some cases in fact be faster than Jacobi symbols.
* A broken benchmark. This belief was incorrectly confirmed by a broken benchmark[6] in libsecp256k1 for the libgmp-based Jacobi symbol calculation and modular inverse. The benchmark was repeatedly testing the same constant input, which happened to run around 2.5x faster than the average input. The algorithm is variable-time, so a good variation of inputs matters (see the second sketch below). This mistake had me (and probably others) convinced for years that Jacobi symbols were amazingly fast, while in reality they were always very close in performance to inverses.
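To make the first point concrete, here is a toy Python version of the divstep iteration at the core of the safegcd approach[4]. This is only a sketch of the idea: the actual proposal[5] runs a fixed number of divsteps in constant time and tracks a transition matrix to extract the modular inverse, none of which is shown here.

```python
def divstep(delta, f, g):
    # One LSB-based division step (after [4]); f stays odd throughout,
    # and only the low bit of g is inspected to choose the branch.
    if delta > 0 and g & 1:
        return 1 - delta, g, (g - f) >> 1
    else:
        return 1 + delta, f, (g + (g & 1) * f) >> 1

def gcd_odd(f, g):
    # Iterate until g reaches 0; |f| is then gcd(f, g). The paper
    # bounds the number of iterations needed for given bit lengths.
    delta = 1
    while g != 0:
        delta, f, g = divstep(delta, f, g)
    return abs(f)

assert gcd_odd(21, 35) == 7
```

As for the second point, a hypothetical harness (not the actual libsecp256k1 benchmark) illustrates why input variation matters when timing a variable-time routine: a single fixed input can sit on an unrepresentatively fast path.

```python
import random
import time

def bench(func, inputs, iters=10000):
    # Average runtime of func over a cycling sample of inputs.
    start = time.perf_counter()
    for i in range(iters):
        func(inputs[i % len(inputs)])
    return (time.perf_counter() - start) / iters

p = 2**256 - 2**32 - 977
varied = [random.randrange(1, p) for _ in range(1024)]
constant = [varied[0]]

# For a variable-time algorithm these two figures can differ
# substantially; only the varied-input one reflects the average case.
t_varied = bench(lambda a: pow(a, -1, p), varied)      # Python 3.8+
t_constant = bench(lambda a: pow(a, -1, p), constant)
```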
# 2. What is the actual impact of picking evenness instead?
It is hard to make very general statements here, as BIP340 will hopefully be used for a long time, and hardware advancements and algorithmic improvements may change the balance. That said, performance on current hardware with optimized algorithms is the best approximation we have.
The numbers below give the expected performance change from squaredness to evenness, for single BIP340 validation and for signing. Positive numbers mean evenness is faster. Batch validation is not impacted at all.
In the short term, for block validation in Bitcoin Core, the numbers for master-nogmp are probably the most relevant (as Bitcoin Core uses libsecp256k1 without libgmp, to reduce consensus-critical dependencies). If/when [5] gets merged, safegcd-nogmp will be what matters. On a longer time scale, the gmp numbers may be more relevant, as the Jacobi implementation there is certainly closer to the state of the art.
| CPU        | Configuration  | Verify | Sign   |
|------------|----------------|--------|--------|
| i7-7820HQ  | master-nogmp   | -0.3%  | +16.1% |
| i7-7820HQ  | safegcd-nogmp  | +6.7%  | +17.1% |
| i7-7820HQ  | master-gmp     | +0.6%  | +7.7%  |
| i7-7820HQ  | safegcd-gmp    | +1.6%  | +8.6%  |
| Cortex-A53 | master-nogmp   | -0.3%  | +15.7% |
| Cortex-A53 | safegcd-nogmp  | +7.5%  | +16.9% |
| Cortex-A53 | master-gmp     | +0.3%  | +4.1%  |
| Cortex-A53 | safegcd-gmp    | 0.0%   | +3.5%  |
| EPYC 7742  | master-nogmp   | -0.3%  | +16.8% |
| EPYC 7742  | safegcd-nogmp  | +8.6%  | +18.4% |
| EPYC 7742  | master-gmp     | 0.0%   | +7.4%  |
| EPYC 7742  | safegcd-gmp    | +2.3%  | +7.8%  |
In well-optimized cryptographic code, speedups as large as a couple percent are difficult to come by, so we would usually consider changes of this magnitude relevant. Note however that while the percentages for signing speed are larger, they are not what is unexpected here: the squaredness tiebreaker was chosen to improve verification speed at the cost of signing speed. As it turns out not to benefit verification speed at all, this is a bad trade-off.
# 3. How big a change is it?
* In the BIP:
- Changing both invocations of `has_square_y` to `has_even_y`.
- Changing the `lift_x_square_y` invocation to `lift_x_even_y` (a sketch of both variants follows this list).
- Applying the same change to the test vector generation code, and the resulting test vectors.
* In libsecp256k1:
- An 8-line patch to the proposed BIP340 implementation[7]: see [8]
* In Bitcoin Core:
- Similarly small changes to the Python test reimplementation[9]
* Duplicating these changes in other draft implementations that may already exist.
* Review for all the above.
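For illustration, here is roughly what the two lifting variants look like, following the conventions of the draft's Python reference code (points as (x, y) pairs over the field of size p); consult the actual reference code and patch[8] for the precise change.

```python
p = 2**256 - 2**32 - 977  # secp256k1 field prime

def lift_x_square_y(x):
    # Solve y^2 = x^3 + 7. Since p % 4 == 3, a square root of a quadratic
    # residue is y_sq^((p+1)/4), and that root is itself a square, which
    # is exactly the tiebreak the current draft applies to R.
    y_sq = (pow(x, 3, p) + 7) % p
    y = pow(y_sq, (p + 1) // 4, p)
    if pow(y, 2, p) != y_sq:
        return None  # x is not the X coordinate of a curve point
    return (x, y)

def lift_x_even_y(x):
    # Same lifting, but pick the even member of {y, p - y}; exactly one
    # of the two is even, since p is odd.
    P = lift_x_square_y(x)
    if P is None:
        return None
    y = P[1]
    return (x, y) if y % 2 == 0 else (x, p - y)
```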
# 4. Conclusion
We discovered that the justification for using the squaredness tiebreaker in BIP340 is based on a misunderstanding, and recent developments show that it may in fact be a somewhat worse choice than the alternative. Addressing it is a relatively simple change, but that has to be weighed against the impact of changing the standard at this stage.
Thoughts?
# 5. References
[1] https://github.com/bitcoin/bips/blob/master/bip-0340.mediawiki#design
[2] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2020-February/017639.html
[3] https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm
[4] https://gcd.cr.yp.to/safegcd-20190413.pdf
[5] https://github.com/bitcoin-core/secp256k1/pull/767
[6] https://github.com/bitcoin-core/secp256k1/pull/797
[7] https://github.com/bitcoin-core/secp256k1/pull/558
[8] https://github.com/sipa/secp256k1/commit/822311ca230a48d2c373f3e48b91b2a59e1371d6
[9] https://github.com/bitcoin/bitcoin/pull/17977
Cheers,
--
Pieter