Pieter Wuille [ARCHIVE] on Nostr: 📅 Original date posted:2020-12-05 📝 Original message:> On Wednesday, October 7, ...
📅 Original date posted:2020-12-05
📝 Original message:> On Wednesday, October 7, 2020 5:21 PM, Rusty Russell via bitcoin-dev bitcoin-dev at lists.linuxfoundation.org wrote:
>
> > I propose an alternative to length restrictions suggested by
> > Russell in https://github.com/bitcoin/bips/pull/945: use the
> > https://gist.github.com/sipa/a9845b37c1b298a7301c33a04090b2eb variant,
> > unless the first byte is 0.
>
> Hi all,
>
> starting a slight side-thread here.
Another update, and a longer write-up.
Introduction
------------
Bech32's checksum algorithm was designed to be strong against substitution
errors, but it also provides some protection against more general classes of
errors. The final constant M that is XOR'ed into the checksum influences that
protection. BIP173 today uses M=1, but it is now known that this has a
weakness: if the final character is a "p", any number of "q" characters can be
inserted or erased right before it, without invalidating the checksum.
As it was recognized that other constants do not have this issue, the obvious
question is whether this is the only possible type of weakness, and if not, if
there is an optimal constant to use that minimizes the largest number of
weaknesses.
Since my last mail I've realized that it is actually possible to analyse the
behavior of these final constants under a wide variety of error classes
(substitutions, deletions, insertions, swaps, duplications) programatically.
Greg Maxwell and I have used this to perform an exhaustive analysis of certain
error patterns for all 2^30 possible M values, selected a number of criteria
to optimize for, and conclude that we should use as constant:
M = 0x2bc830a3
The code used to do this analysis, as well as the code used to verify some
expected properties of the final result, and more, can be found on
https://gist.github.com/sipa/14c248c288c3880a3b191f978a34508e
See results_final.txt to see how this constant compares with the previously
suggested constants 1, 0x3fffffff, and 0x3fefffff.
Properties
----------
If we define an error as a deletion of one position, a swap of adjacent
positions, a substitution in a specific position with a random character, an
insertion of one random character in a specific position, or a duplication of
the character in a specific position, then this M constant above gives us the
following properties:
* For generic HRPs and errors that don't affect the first 6 data characters,
or alternatively averaged over all HRPs (see details futher):
* Always detected:
* (P) Up to 4 substitution errors (true for any constant).
* (Q) Any valid code with the new constant, fed without modification to
software that uses the old constant 1 (true for any constant).
* Any error pattern has failure to detect probability <= 2^-30:
* (L) Any number of errors restricted to a single window of up to 4
characters.
* (B) Up to two errors restricted to a window of up to 68 characters.
* (D) Any one error made to a valid code with the new constant, and fed to
software that uses the old constant 1
* Most error patterns have probability <= 2^-30:
* (C) Up to two errors in general: out of 23926796 such error patterns,
0.0040% have probability 2^-25.
* (N) Up to three errors restricted to a window of up to 69 characters:
out of 284708444 such patterns, 0.033% have probability 2^-25.
* (O) Up to three errors in general: out of 295744442 such error patterns,
0.034% have probability 2^-25; 0.000065% have probability 2^-20.
* (G) Up to two errors made to a valid code with the new constant, and fed
to software that uses the old constant 1: out of 2831622 such error
patterns, 0.048% have probability 2^-25.
* Specifically for the bc1 HRP, with the BIP173 length restrictions:
* Always detected:
* (R) Up to 4 substitution errors (true for any constant).
* (A) Up to 3 substitution errors made to a valid code with the new
constant, and fed to software that uses the old constant 1.
* Any error pattern has failure to detect probability <= 2^-30:
* (E) Any one error.
* (F) Any one error made to a valid code with the new constant, and fed to
software that uses the old constant 1.
* (H) Up to two errors restricted to a window of 28 characters.
* Most error patterns have probability <= 2^-30:
* (J) Up to two errors in general: out of 455916 such error patterns,
0.039% have probability 2^-25; 0.0053% have 2^-20.
* (K) Any number of errors restricted to a window of 4 characters: out of
5813139 such error patterns, 0.0016% have probability 2^-25.
* (M) Up to three errors: out of 50713466 such error patterns, 0.078% have
probability 2^-25; 0.00063% have 2^-20.
* (I) Up to two errors made to a valid code with the new constant, and fed
to software that uses the old constant 1: out of 610683 such error
patterns, 0.092% have probability 2^-25; 0.00049% have probability
2^-20.
To give an idea of what these probabilities mean, consider the known BIP173
insertion issue. It admits an error pattern of 1 error (insertion in
penultimate position) that has a failure to detect probability of 2^-10:
it requires the final character to be 'p', and the inserted character to be
'q'. Assuming those are both random, we have a chance of 1 in 32*32 to hit it.
Note that the choice of *what* the error pattern is (whether it's insertion,
and where) isn't part of our probabilities: we try to make sure that *every*
pattern behaves well, not just randomly chosen ones, because presumably humans
make some kinds of errors more than others, and we don't know which ones.
All the analyzed patterns above are guaranteed to be detected with probability
2^-20 or better (and most are 2^-30). Of course, if we'd search for even
larger classes of errors, say any 4 independent errors of any type, we would
probably discover patterns with worse probabilities, but at that point the
probability of the pattern itself being hit should be taken into account.
The selection was made based on these same properties:
* Start with the set of all 2^30 constants.
* The generic properties (L), (B), (D), (C), (N), (O), and (G) were selected
for by rejecting all constants that left any worse error patterns (e.g.
all codes for which patterns matching (N) existed with failure probability
above 2^-25 were removed). All these restrictions are as strong as they
can be: making them over longer strings, wider windows, or more errors with
the same restrictions removes all remaining constants. This leaves us with
just 12054 acceptable constants.
* The same was then done for the bc1/BIP173 specific properties (A), (E), (J),
(F), (H), (K), (M), and (I). This reduces the set further to 79 acceptable
constants. The full analysis output for all of these can be found in
output.txt.
* Finally, the constant with the minimal number of worst-probability patterns
was chosen for the generic property (N). The single constant 0x2bc830a3
remains.
* This solution and a few of its expected properties were then validated using
a simple program that makes random errors (see the monte_carlo.py file).
Technical details
-----------------
For the purpose of this analysis, define an "error pattern" as a starting
length (of a valid string consisting of otherwise randomly chosen characters)
combined with a sequence of the following (in this order):
* 0 or more deletions of characters at specific positions (without
constraining what those characters are)
* 0 or more swaps of characters at specific positions with the character
following it
* 0 or more substitutions of characters at specific positions with a uniformly
randomly selected character
* 0 or more insertions of uniformly randomly selected characters at specific
positions
* 0 or more duplications of characters at specific positions (including
duplications of characters inserted/substituted in the previous steps)
Examples:
* "Start with a random valid 58 character string, remove the 17th character,
swap the 11th character with the 12th character, and insert a random
character in the 24th position" is an error pattern.
* "Replace the 17th through 24th characters in a 78 character string with
'aardvark'" is not an error pattern, because substituted characters have to
be random, and can't be specific values.
Given such a pattern, assign variable names to every input character, and to
every inserted/substituted character. For example, the pattern "Start with
a 6 character string, delete the 1st character, swap the 2nd and 3rd
character, and insert a random character between those" would be represented
as [v0 v1 v2 v3 v4 v5] and [v1 v3 v6 v2 v4 v5]. Treat these variables as
elements of GF(32), and write out the equations that both the first and second
list have a valid checksum. Due to the fact that BCH codes are linear, this is
just a linear set of equations over GF(32), and we can use Gaussian
elimination to find the size of the solution space. If the input and output
are the same length, we need to subtract the number of solutions for which the
input and output are exactly the same, which is easy to find with another set
of equations. Now compute the ratio of this number divided by (32^numvars /
32^6), where the 32^6 is due to the precondition that the input string is
valid. This gives us the probability of failure, assuming input and output are
random, apart from the known relation between the two, and the fact that both
are valid.
This technique has an important limitation: it can only reason about randomly-
chosen input strings, and the presence of the HRP and version numbers at the
start violates that assumption. These are not random, and we're forced to
make one of these concessions:
1) Ignore the problem, and treat the HRP as random. This lets us derive
properties that hold over all potential HRPs on average, but will thus fail
to account for the possibility that for a small numbers of potential HRPs
some error patterns may exist that behave worse. For technical reasons,
this averaging makes all constants behave identically for error patterns
that don't change the length of the string. Given that substitution/swap
only errors are already dealt with well due to the BCH design this is
perhaps not too important. One exception is frame-shifting errors (a
deletion in one place compensated with an insertion in another place).
2) Restrict analysis to error patterns that don't affect the first 6 actual
characters. Doing so "masks" the effect of the HRP completely.
3) Do analysis for specific HRPs only, allowing much more accurate statements,
but HRP-specific ones that may not hold for every HRP.
Our final selection primarily optimizes for 1) and 2) as those benefit all
potential uses of the encoding, but do optimize for 3) the "bc1" prefix
specifically (and the BIP173 length restriction) as a tiebreaker.
The code for this can be found under the link above, in const_analysis.cpp.
Cheers,
--
Pieter
📝 Original message:> On Wednesday, October 7, 2020 5:21 PM, Rusty Russell via bitcoin-dev bitcoin-dev at lists.linuxfoundation.org wrote:
>
> > I propose an alternative to length restrictions suggested by
> > Russell in https://github.com/bitcoin/bips/pull/945: use the
> > https://gist.github.com/sipa/a9845b37c1b298a7301c33a04090b2eb variant,
> > unless the first byte is 0.
>
> Hi all,
>
> starting a slight side-thread here.
Another update, and a longer write-up.
Introduction
------------
Bech32's checksum algorithm was designed to be strong against substitution
errors, but it also provides some protection against more general classes of
errors. The final constant M that is XOR'ed into the checksum influences that
protection. BIP173 today uses M=1, but it is now known that this has a
weakness: if the final character is a "p", any number of "q" characters can be
inserted or erased right before it, without invalidating the checksum.
As it was recognized that other constants do not have this issue, the obvious
question is whether this is the only possible type of weakness, and if not, if
there is an optimal constant to use that minimizes the largest number of
weaknesses.
Since my last mail I've realized that it is actually possible to analyse the
behavior of these final constants under a wide variety of error classes
(substitutions, deletions, insertions, swaps, duplications) programatically.
Greg Maxwell and I have used this to perform an exhaustive analysis of certain
error patterns for all 2^30 possible M values, selected a number of criteria
to optimize for, and conclude that we should use as constant:
M = 0x2bc830a3
The code used to do this analysis, as well as the code used to verify some
expected properties of the final result, and more, can be found on
https://gist.github.com/sipa/14c248c288c3880a3b191f978a34508e
See results_final.txt to see how this constant compares with the previously
suggested constants 1, 0x3fffffff, and 0x3fefffff.
Properties
----------
If we define an error as a deletion of one position, a swap of adjacent
positions, a substitution in a specific position with a random character, an
insertion of one random character in a specific position, or a duplication of
the character in a specific position, then this M constant above gives us the
following properties:
* For generic HRPs and errors that don't affect the first 6 data characters,
or alternatively averaged over all HRPs (see details futher):
* Always detected:
* (P) Up to 4 substitution errors (true for any constant).
* (Q) Any valid code with the new constant, fed without modification to
software that uses the old constant 1 (true for any constant).
* Any error pattern has failure to detect probability <= 2^-30:
* (L) Any number of errors restricted to a single window of up to 4
characters.
* (B) Up to two errors restricted to a window of up to 68 characters.
* (D) Any one error made to a valid code with the new constant, and fed to
software that uses the old constant 1
* Most error patterns have probability <= 2^-30:
* (C) Up to two errors in general: out of 23926796 such error patterns,
0.0040% have probability 2^-25.
* (N) Up to three errors restricted to a window of up to 69 characters:
out of 284708444 such patterns, 0.033% have probability 2^-25.
* (O) Up to three errors in general: out of 295744442 such error patterns,
0.034% have probability 2^-25; 0.000065% have probability 2^-20.
* (G) Up to two errors made to a valid code with the new constant, and fed
to software that uses the old constant 1: out of 2831622 such error
patterns, 0.048% have probability 2^-25.
* Specifically for the bc1 HRP, with the BIP173 length restrictions:
* Always detected:
* (R) Up to 4 substitution errors (true for any constant).
* (A) Up to 3 substitution errors made to a valid code with the new
constant, and fed to software that uses the old constant 1.
* Any error pattern has failure to detect probability <= 2^-30:
* (E) Any one error.
* (F) Any one error made to a valid code with the new constant, and fed to
software that uses the old constant 1.
* (H) Up to two errors restricted to a window of 28 characters.
* Most error patterns have probability <= 2^-30:
* (J) Up to two errors in general: out of 455916 such error patterns,
0.039% have probability 2^-25; 0.0053% have 2^-20.
* (K) Any number of errors restricted to a window of 4 characters: out of
5813139 such error patterns, 0.0016% have probability 2^-25.
* (M) Up to three errors: out of 50713466 such error patterns, 0.078% have
probability 2^-25; 0.00063% have 2^-20.
* (I) Up to two errors made to a valid code with the new constant, and fed
to software that uses the old constant 1: out of 610683 such error
patterns, 0.092% have probability 2^-25; 0.00049% have probability
2^-20.
To give an idea of what these probabilities mean, consider the known BIP173
insertion issue. It admits an error pattern of 1 error (insertion in
penultimate position) that has a failure to detect probability of 2^-10:
it requires the final character to be 'p', and the inserted character to be
'q'. Assuming those are both random, we have a chance of 1 in 32*32 to hit it.
Note that the choice of *what* the error pattern is (whether it's insertion,
and where) isn't part of our probabilities: we try to make sure that *every*
pattern behaves well, not just randomly chosen ones, because presumably humans
make some kinds of errors more than others, and we don't know which ones.
All the analyzed patterns above are guaranteed to be detected with probability
2^-20 or better (and most are 2^-30). Of course, if we'd search for even
larger classes of errors, say any 4 independent errors of any type, we would
probably discover patterns with worse probabilities, but at that point the
probability of the pattern itself being hit should be taken into account.
The selection was made based on these same properties:
* Start with the set of all 2^30 constants.
* The generic properties (L), (B), (D), (C), (N), (O), and (G) were selected
for by rejecting all constants that left any worse error patterns (e.g.
all codes for which patterns matching (N) existed with failure probability
above 2^-25 were removed). All these restrictions are as strong as they
can be: making them over longer strings, wider windows, or more errors with
the same restrictions removes all remaining constants. This leaves us with
just 12054 acceptable constants.
* The same was then done for the bc1/BIP173 specific properties (A), (E), (J),
(F), (H), (K), (M), and (I). This reduces the set further to 79 acceptable
constants. The full analysis output for all of these can be found in
output.txt.
* Finally, the constant with the minimal number of worst-probability patterns
was chosen for the generic property (N). The single constant 0x2bc830a3
remains.
* This solution and a few of its expected properties were then validated using
a simple program that makes random errors (see the monte_carlo.py file).
Technical details
-----------------
For the purpose of this analysis, define an "error pattern" as a starting
length (of a valid string consisting of otherwise randomly chosen characters)
combined with a sequence of the following (in this order):
* 0 or more deletions of characters at specific positions (without
constraining what those characters are)
* 0 or more swaps of characters at specific positions with the character
following it
* 0 or more substitutions of characters at specific positions with a uniformly
randomly selected character
* 0 or more insertions of uniformly randomly selected characters at specific
positions
* 0 or more duplications of characters at specific positions (including
duplications of characters inserted/substituted in the previous steps)
Examples:
* "Start with a random valid 58 character string, remove the 17th character,
swap the 11th character with the 12th character, and insert a random
character in the 24th position" is an error pattern.
* "Replace the 17th through 24th characters in a 78 character string with
'aardvark'" is not an error pattern, because substituted characters have to
be random, and can't be specific values.
Given such a pattern, assign variable names to every input character, and to
every inserted/substituted character. For example, the pattern "Start with
a 6 character string, delete the 1st character, swap the 2nd and 3rd
character, and insert a random character between those" would be represented
as [v0 v1 v2 v3 v4 v5] and [v1 v3 v6 v2 v4 v5]. Treat these variables as
elements of GF(32), and write out the equations that both the first and second
list have a valid checksum. Due to the fact that BCH codes are linear, this is
just a linear set of equations over GF(32), and we can use Gaussian
elimination to find the size of the solution space. If the input and output
are the same length, we need to subtract the number of solutions for which the
input and output are exactly the same, which is easy to find with another set
of equations. Now compute the ratio of this number divided by (32^numvars /
32^6), where the 32^6 is due to the precondition that the input string is
valid. This gives us the probability of failure, assuming input and output are
random, apart from the known relation between the two, and the fact that both
are valid.
This technique has an important limitation: it can only reason about randomly-
chosen input strings, and the presence of the HRP and version numbers at the
start violates that assumption. These are not random, and we're forced to
make one of these concessions:
1) Ignore the problem, and treat the HRP as random. This lets us derive
properties that hold over all potential HRPs on average, but will thus fail
to account for the possibility that for a small numbers of potential HRPs
some error patterns may exist that behave worse. For technical reasons,
this averaging makes all constants behave identically for error patterns
that don't change the length of the string. Given that substitution/swap
only errors are already dealt with well due to the BCH design this is
perhaps not too important. One exception is frame-shifting errors (a
deletion in one place compensated with an insertion in another place).
2) Restrict analysis to error patterns that don't affect the first 6 actual
characters. Doing so "masks" the effect of the HRP completely.
3) Do analysis for specific HRPs only, allowing much more accurate statements,
but HRP-specific ones that may not hold for every HRP.
Our final selection primarily optimizes for 1) and 2) as those benefit all
potential uses of the encoding, but do optimize for 3) the "bc1" prefix
specifically (and the BIP173 length restriction) as a tiebreaker.
The code for this can be found under the link above, in const_analysis.cpp.
Cheers,
--
Pieter