Hugo Nguyen [ARCHIVE] on Nostr
Original date posted: 2022-09-21
Original message:
Hello Craig,
Thank you for putting this proposal together. It is indeed another big
missing piece of the puzzle.
I would like to echo some of the comments already made by others (and you
yourself) on this thread, that this proposal seems to have some inherent
conflicts between the 2 goals it tries to achieve.
> *Allowing users to import and export their labels in a standardized way
ensures that they do not experience lock-in to a particular wallet
application. As a secondary goal, by using common formats this BIP seeks to
make manual or bulk management of labels accessible to users outside of
wallet applications and without specific technical expertise.*
IMHO, the reason these conflicts exist is that the first one is an
engineering requirement, while the second one is a UX / product requirement.
Engineering requirements typically prioritize data integrity,
reliability/robustness and performance. Do we want some sort of error
detection / correction codes? What data format would be the most robust and
least error-prone? Is CSV a good fit or not for this purpose? etc.
UX requirements, on the other hand, typically prioritize convenience and
ease of use.
When we don't separate these concerns it can backfire and we might end up
with a Frankenstein standard that is the worst of both worlds. That is: not
quite robust in engineering terms, but also not quite user-friendly in
product terms either.
SLIP-132 is one such example. It tries to solve what are inherently
engineering challenges (how to manage the complexities that arose due to
the evolution of keys and scripts) by sadly offloading those complexities
onto the end users. The end result is user confusion (what kind of [?]PUB
do I need here?) and a nightmare for engineers to maintain (the
complexities are better managed via a high level language such as Output
Descriptors).
Keeping this in mind, I also think having 2 separate BIPs for this is
better.
Cheers,
Hugo
On Mon, Aug 29, 2022 at 4:26 AM Craig Raw via bitcoin-dev <
bitcoin-dev at lists.linuxfoundation.org> wrote:
> Thanks for your feedback @Ali.
>
> I am attempting to achieve two goals with this proposal, primarily for the
> benefit of wallet users:
>
> Goal #1. Transfer labels between different wallet implementations
> Goal #2. Manage labels in applications outside of Bitcoin wallets (such as
> Excel)
>
> Much of the feedback so far has indicated the tension between these two
> goals - it may be that it is too difficult to achieve both, in which case
> Goal #1 is the most important. That said, I think further exploration is
> still necessary before abandoning Goal #2, because removing it would
> significantly reduce the value of this proposal and mean users need to rely
> on application-specific workarounds.
>
> > it is important that a version byte is defined
> If Goal #2 is to be achieved it's difficult to mandate this, particularly
> if one requires bit flags to be set. Should an importing wallet fail to
> import if the version byte is not present, even if all the data is
> otherwise correct? Although it is difficult to know in advance how a format
> may be extended, it is certainly possible to extend this format with
> additional types, since hashes by their nature serve as unique identifiers
> (more on this below).
>
> > Don't mandate the file extension... There is no way to enforce this on
> a BIP level.
> I'm not quite sure what you mean here - for example BIP174, which is
> widely used, states "Binary PSBT files should use the .psbt file
> extension." Also, this contradicts Goal #2 - Excel and Numbers register as
> handlers for .csv, and so make it clear that the file is editable outside
> of a wallet.
>
> > ZIP does not have good performance or compression ratio
> Indeed, but it is very widely available. That said, gzip is supported
> widely too these days. Unfortunately, gzip does not offer encryption (see
> next answer).
>
> > ZIP is an archiving format, that happens to have its own compression
> format.
> I agree this is not ideal. My main reason for choosing ZIP was that it
> supports encryption. It seems to me that without considering encryption, an
> application must create label export files that allow privacy-sensitive
> wallet information to be readable in plain text. Being able to transfer
> labels without risking privacy is IMO valuable. I considered other
> encryption formats such as PGP, but they are much more niche and so again
> contradict Goal #2.
>
> > I don't see the benefit of encrypting addresses and labels together...
> additionally, the password you propose is insecure - anybody with access to
> the wallet can unlock it
> I'm not sure I understand your question, but both wallet addresses and
> wallet labels contain privacy-sensitive information that should be
> protected. With regard to the password, there is actually a more fundamental
> problem with using the wallet xpub - there is no equivalent for multisig
> wallets. For this reason I'll remove that requirement in future iterations.
>
> > Why the need for input and output formats? There is no difference
> between them on the wallet level, because they are always identified with a
> txid and output index.
> The input refers to the txid and the input index (in the set of vin), so
> the difference is the context in which they are displayed. A wallet will
> not necessarily store the spent outputs for a funding transaction
> containing a UTXO coming into the wallet, but it will contain references to
> the inputs as part of that transaction.
>
> > Another important point is that practically nobody labels inputs or
> outputs
> To the contrary, UTXOs are very frequently labelled, as they link and
> reveal information when spent. Inputs are much less frequently labelled,
> but there is no particular reason to exclude them.
>
> > there is a net benefit for the addresses to be exported in ascending
> order
> Indeed, but it makes achieving Goal #2 much more difficult for marginal
> benefit.
>
> > It's better to mandate that they should always be double-quoted, since
> only wallets will generate label exports anyway.
> Rather, I think it's better to mandate that RFC4180 be followed, as per
> recommendations in other feedback.
>
> > The importing code is too naive... it should utilize a dedicated item
> type field that unambiguously identifies the item
> It's unclear to me what you mean here. As I've indicated it is currently
> possible to disambiguate between addresses/transactions/etc without the
> need for a 3rd column, but in any case the hash functions used ensure that
> labels will not be associated incorrectly. Even in the unlikely event of
> some future address type being indistinguishable from a txid, it will
> simply not match any txids in the wallet.
>
> Craig
>
>
>
> On Wed, Aug 24, 2022 at 9:10 PM Ali Sherief <ali at notatether.com> wrote:
>
>> Hi Craig,
>>
>> This is a really good proposal. I studied your BIP and I have some feedback
>> on some parts of it.
>>
>> > The first line in the file is a header, and should be ignored on import.
>>
>> From past experience and lessons, most notably BIP39, it is important
>> that a version byte is defined somewhere in case someone wants to extend
>> the format in the future; currently there is no version byte that can be
>> incremented when it is extended. In the unique case of CSV files,
>> you should make the header line mandatory (I see you have already implied
>> this, but you should make it explicit in the BIP), but instead of a line
>> with column names such as Reference,Label, I suggest you make the
>> format like this:
>>
>> BIP-wallet-labels,<version>
>>
>> Since there are two columns per record, this works out nicely. The first
>> column can be the name of the BIP - BIPxxxx where the x's are numbers, and
>> the second column can be an unsigned 32-bit integer (most significant 8
>> bits reserved for version, the remaining for flags, or perhaps the entirety
>> for version - but I recommend leaving at least some bits for flags, even if
>> they all end up being just "reserved").
>>
>> You should make importing fail if the header line is not exactly as
>> specified (or as appropriate, should you decide on a different format for
>> the header).
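A minimal Python sketch of the header check suggested above. It assumes the placeholder name `BIP-wallet-labels` and splits the 32-bit value into an 8-bit version (most significant bits) and 24 bits of flags. `make_header` and `parse_header` are hypothetical helpers and are not part of the draft.

```python
# Hypothetical header: "<name>,<u32>", where the most significant 8 bits of
# the integer carry the version and the remaining 24 bits carry flags.
EXPECTED_NAME = "BIP-wallet-labels"  # placeholder until a BIP number is assigned

def make_header(version: int, flags: int = 0) -> str:
    """Build the header line (without a line terminator)."""
    if not (0 <= version <= 0xFF and 0 <= flags <= 0xFFFFFF):
        raise ValueError("version or flags out of range")
    return f"{EXPECTED_NAME},{(version << 24) | flags}"

def parse_header(line: str) -> tuple[int, int]:
    """Return (version, flags); raise ValueError so the import fails on a bad header."""
    fields = line.rstrip("\r\n").split(",")
    if len(fields) != 2 or fields[0] != EXPECTED_NAME:
        raise ValueError(f"unrecognised header line: {line!r}")
    value = int(fields[1])  # raises ValueError if the field is not an integer
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("header value is not an unsigned 32-bit integer")
    return value >> 24, value & 0xFFFFFF
```

With this layout, `make_header(1)` yields `BIP-wallet-labels,16777216`. A legacy `Reference,Label` header is rejected, which is the strict behaviour suggested above.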
>>
>> > Files exported should use the <tt>.csv</tt> file extension.
>> Don't mandate the file extension (read below for why):
>>
>> > In order to reduce file size while retaining wide accessibility, the CSV
>> > file may be compressed using the ZIP file format, using the <tt>.zip</tt>
>> > file extension.
>> I see three problems with this. The first is more important than the
>> latter two because it makes them moot points, but I'll mention them anyway
>> so you get a background of the situation:
>> - The BIP tries to specify the file format in which the export is written
>> to the filesystem. There is no way to enforce this on a BIP
>> level (besides, Unix operating systems don't even consider the file
>> extension; they use its MIME type). Also, specifying this in the BIP will
>> prevent modular "Layer 2" protocols and schemes from encoding the exported
>> labels into another format, for example Base64 or with their own
>> compression algorithm.
>>
>> Now for the two "moot problems":
>> - ZIP does not have good performance or compression ratio, there are
>> better algorithms out there like gzip (which also happens to be more
>> ubiquitous; nearly all websites are serving HTML compressed with gzip
>> compression).
>> - ZIP is an archiving format, that happens to have its own compression
>> format. Archiving format parsers can have serious vulnerabilities in their
>> implementation that can allow malware to swipe private keys and passwords,
>> since the primary target for this BIP is wallets. For example, there was
>> Zip Slip[1] in 2018, which allows for remote code execution. So the malware
>> can even hide in memory until private keys or passwords are written to
>> memory, then send them across the network. Assuming it's targeting a
>> specific wallet software it's not hard to carry out at all.
>>
>> There are two solutions for all this:
>> 1. The duct-tape solution: Use some compression algorithm like gzip
>> instead of the ZIP archive format.
>> 2. The "throw it out and buy a new one" solution: Get rid of the optional
>> compression specs altogether, because users are responsible for supplying
>> the export labels in the first place, so all the compression stuff is
>> redundant and should be left to the user to apply if they desire.
>>
>> I prefer the second solution because it addresses the problem
>> directly instead of putting duct tape on it like the first one.
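For comparison, the first option needs only the standard library in most languages. The following is a minimal Python sketch, assuming the CSV export is already held in memory as a string. `compress_export` and `decompress_export` are hypothetical names. gzip provides compression only, not encryption.

```python
import gzip

def compress_export(csv_text: str) -> bytes:
    """Gzip-compress a UTF-8 encoded label export held in memory."""
    return gzip.compress(csv_text.encode("utf-8"))

def decompress_export(blob: bytes) -> str:
    # Decompression happens entirely in memory and writes no files, so
    # archive path-traversal bugs such as Zip Slip do not arise here.
    return gzip.decompress(blob).decode("utf-8")
```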
>>
>> > This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or
>> > AES-256 encryption, which is supported by numerous applications including
>> > Winzip and 7-zip.
>> > The textual representation of the wallet's extended public key (as defined
>> > by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>> Not specific to AES, but I don't see the benefit of encrypting addresses
>> and labels together. Can you please elaborate on why this would be desirable?
>>
>> Like I said though, it's better to leave it up to users to decide how to
>> store their exports, since BIPs can't enforce that anyway (additionally,
>> the password you propose is insecure - anybody with access to the wallet
>> can unlock it, which is not desirable to some users who want their own
>> security).
>>
>> > * Transaction ID (<tt>txid</tt>)
>> > * Address
>> > * Input (rendered as <tt>txid<index</tt>)
>> > * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>> Why the need for input and output formats? There is no difference between
>> them on the wallet level, because they are always identified with a txid
>> and output index. To distinguish between them and hence write them with the
>> correct format would require a UTXO set and thus access to a full node,
>> otherwise the CSV cannot be verified to be completely well-formed.
>>
>> Another important point is that practically nobody labels inputs or
>> outputs because most people do not know that those things even exist, and
>> the rest don't bother to label them.
>>
>> But the biggest downside to including them is related to the problem of
>> information leaking which you make reference to here:
>> > In both cases, care must be taken when spending to avoid undesirable leaks
>> > of private information.
>> A CSV dump that has inputs/outputs and addresses mixed together can be used
>> to infer the owner of all those items. In fact, a CSV label dump is basically
>> a personal information store, so everything in it can be correlated as coming
>> from the same wallet; it's important that unnecessary types are kept out
>> of the format. People are known to leave files they no longer need lying
>> around on their computers, so these files can find their way
>> via telemetry to surveillance entities. While we can't specify what users
>> can do with their exports, we can limit the information leak by
>> preventing certain types of items that we know most users will never use
>> from being exported in the first place.
>>
>> > The order in which these records appear is not defined.
>> Again, since the primary use case for this BIP is wallets, which likely
>> use hierarchical derivation schemes like BIP44, there is a net benefit for
>> the addresses to be exported in ascending order of their `address_type`. It
>> means that wallets can import them in O(n) time as opposed to O(n^2) time
>> spent serially checking at which index each address appears. Of course,
>> this implies that all addresses up to a certain index have to be exported
>> into the CSV as well, but most wallets I know of, such as Core and Electrum,
>> already store addresses that way.
>>
>> Also, if you do this, you will need to group all the transaction records
>> before the address records or vice versa. You can use lexicographical
>> sorting if you want (i.e. Addresses before Transactions). The benefit of
>> this separation of parts is that wallets can split the imported address
>> records from the transaction records internally, and feed them to separate
>> functions which set these labels internally.
>>
>> If you decide to do it this way, then you need a 3rd column to
>> identify the item type, and also you should quote the label (see below). I
>> strongly recommend using numbers for identification as opposed to character
>> strings, so you don't have to worry about localization or character case
>> issues. There is always one unique number, but there could be multiple
>> strings that reference the same type. This will complicate importing
>> functions.
>>
>> If you insist on including Input and Output types, then they can both be
>> specified as <txid>:<index> if you make this change. They won't be used to
>> determine the type anyway.
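A minimal Python sketch of the three-column layout suggested above, using only the standard `csv` and `enum` modules. The numeric codes in `ItemType` are hypothetical, since the draft defines none. Records are grouped by sorting on that code before writing, and `export_typed`/`import_typed` are illustrative names.

```python
import csv
import io
from enum import IntEnum

class ItemType(IntEnum):
    # Hypothetical numeric codes; the draft does not define any.
    ADDRESS = 1
    TRANSACTION = 2
    INPUT = 3
    OUTPUT = 4

def export_typed(records: list[tuple[ItemType, str, str]]) -> str:
    """Write (type, reference, label) rows, grouped by ascending type code."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    # sorted() is stable, so records keep their relative order within a type.
    for item_type, reference, label in sorted(records, key=lambda r: r[0]):
        writer.writerow([int(item_type), reference, label])
    return buf.getvalue()

def import_typed(text: str) -> list[tuple[ItemType, str, str]]:
    """Parse rows back; ItemType() raises ValueError on an unknown type code."""
    rows = []
    for type_code, reference, label in csv.reader(io.StringIO(text)):
        rows.append((ItemType(int(type_code)), reference, label))
    return rows
```

Because the type comes from an explicit column, the importer never has to guess it from the shape of the reference.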
>>
>> > The fields may be quoted, but this is unnecessary, as the first comma in
>> > the line will always be the delimiter.
>> Don't implement it like that, because that will break CSV parsers which
>> expect a fixed number of fields in each record (2 in the header, while
>> some rows would have more than 2). It's better to mandate that they should
>> always be double-quoted, since only wallets will generate label exports
>> anyway. If you plan to use headers, then the 3rd column can be blank for it
>> (or you can split the version and flags from each other).
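The field-count concern can be illustrated with a minimal Python sketch using the standard `csv` module (`to_csv_line` is a hypothetical helper). Under naive splitting, a label containing a comma yields three fields. The same record written with RFC 4180-style quoting parses back into exactly two.

```python
import csv
import io

def to_csv_line(reference: str, label: str) -> str:
    buf = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes; embedded quotes are doubled.
    csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow([reference, label])
    return buf.getvalue()

line = to_csv_line("1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg", 'Rent, "March"')
naive_fields = line.rstrip("\n").split(",")  # splits inside the label too
parsed_fields = next(csv.reader([line]))     # respects the quoting
```

Here `line` is `"1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg","Rent, ""March"""`. `naive_fields` has three entries, while `parsed_fields` recovers the reference and the original label.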
>>
>> > ==Importing==
>> >
>> > When importing, a naive algorithm may simply match against any reference,
>> > but it is possible to disambiguate between transactions, addresses, inputs
>> > and outputs.
>> > For example in the following pseudocode:
>> > <pre>
>> > if reference length < 64
>> >   Set address label
>> > else if reference length == 64
>> >   Set transaction label
>> > else if reference contains '<'
>> >   Set input label
>> > else
>> >   Set output label
>> > </pre>
>> The importing code is too naive and in its current form will prevent the
>> BIP from getting a number. It is perhaps the single most important part of
>> a BIP. When implementing an importer, it should utilize a dedicated item
>> type field that unambiguously identifies the item. So the naive importer is
>> not good; you need to use a 3rd column for that, as I explained above, so
>> that the importer becomes robust.
>>
>> In summary (exclamation marks indicate severity - one means low, two
>> means medium, and three means high):
>>
>> 1. Convert the header into a version line with optional flags, otherwise
>> nobody can extend this format without compatibility issues (!)
>> 2. Get rid of the specs related to file compression (!!!)
>> 3. Add a 3rd column for item type (address, transaction etc.), preferably
>> as numeric constants, and group items of one type after items of another
>> type; or if you insist on strings, then only recognize their Titlecase
>> ASCII versions (spreadsheet software like Excel always tries to titlecase
>> the words) (!!)
>> 4. Require double quotes around the label (or single quotes if you
>> prefer, as long as spreadsheet software doesn't choke on them) (!!)
>> 5. Require sorting the records according to the order they are stored in
>> the wallet implementation. (!)
>> 6. Consider getting rid of Input and Output item types. (!)
>> 7. And last and most importantly, please write a more robust importer
>> algorithm in the example given by the BIP, because code in BIPs is
>> frequently used as a reference for software. (!!!)
>>
>> I hope you will consider these points in future revisions of your BIP.
>>
>> - Ali
>>
>> [1] https://github.com/snyk/zip-slip-vulnerability
>>
>> On Wed, 24 Aug 2022 11:18:43 +0200, craigraw at gmail.com wrote:
>> > Hi all,
>> >
>> > I would like to propose a BIP that specifies a format for the export and
>> > import of labels from a wallet. While transferring access to funds across
>> > wallet applications has been made simple through standards such as BIP39,
>> > wallet labels remain siloed and difficult to extract despite their value,
>> > particularly in a privacy context.
>> >
>> > The proposed format is a simple two column CSV file, with the reference to
>> > a transaction, address, input or output in the first column, and the label
>> > in the second column. CSV was chosen for its wide accessibility, especially
>> > to users without specific technical expertise. Similarly, the CSV file may
>> > be compressed using the ZIP format, and optionally encrypted using AES.
>> >
>> > The full text of the BIP can be found at
>> > https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
>> > and also copied below.
>> >
>> > Feedback is appreciated.
>> >
>> > Thanks,
>> > Craig Raw
>> >
>> > ---
>> >
>> > <pre>
>> > BIP: wallet-labels
>> > Layer: Applications
>> > Title: Wallet Labels Export Format
>> > Author: Craig Raw <craig at sparrowwallet.com>
>> > Comments-Summary: No comments yet.
>> > Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
>> > Status: Draft
>> > Type: Informational
>> > Created: 2022-08-23
>> > License: BSD-2-Clause
>> > </pre>
>> >
>> > ==Abstract==
>> >
>> > This document specifies a format for the export of labels that may be
>> > attached to the transactions, addresses, inputs and outputs in a wallet.
>> >
>> > ==Copyright==
>> >
>> > This BIP is licensed under the BSD 2-clause license.
>> >
>> > ==Motivation==
>> >
>> > The export and import of funds across different Bitcoin wallet applications
>> > is well defined through standards such as BIP39, BIP32, BIP44 etc.
>> > These standards are well supported and allow users to move easily between
>> > different wallets.
>> > There is, however, no defined standard to transfer any labels the user may
>> > have applied to the transactions, addresses, inputs or outputs in their
>> > wallet.
>> > The UTXO model that Bitcoin uses makes these labels particularly valuable
>> > as they may indicate the source of funds, whether received externally or as
>> > a result of change from a prior transaction.
>> > In both cases, care must be taken when spending to avoid undesirable leaks
>> > of private information.
>> > Labels provide valuable guidance in this regard, and have even become
>> > mandatory when spending in several Bitcoin wallets.
>> > Allowing users to export their labels in a standardized way ensures that
>> > they do not experience lock-in to a particular wallet application.
>> > In addition, by using common formats, this BIP seeks to make manual or bulk
>> > management of labels accessible to users without specific technical
>> > expertise.
>> >
>> > ==Specification==
>> >
>> > In order to make the import and export of labels as widely accessible as
>> > possible, this BIP uses the comma separated values (CSV) format, which is
>> > widely supported by consumer, business, and scientific applications.
>> > Although the technical specification of CSV in RFC4180 is not always
>> > followed, the application of the format in this BIP is simple enough that
>> > compatibility should not present a problem.
>> > Moreover, the simplicity and forgiving nature of CSV (over for example
>> > JSON) lends itself well to bulk label editing using spreadsheet and text
>> > editing tools.
>> >
>> > A CSV export of labels from a wallet must be a UTF-8 encoded text file,
>> > containing one record per line, with records containing two fields
>> > delimited by a comma.
>> > The fields may be quoted, but this is unnecessary, as the first comma in
>> > the line will always be the delimiter.
>> > The first line in the file is a header, and should be ignored on import.
>> > Thereafter, each line represents a record that refers to a label applied in
>> > the wallet.
>> > The order in which these records appear is not defined.
>> >
>> > The first field in the record contains a reference to the transaction,
>> > address, input or output in the wallet.
>> > This is specified as one of the following:
>> > * Transaction ID (<tt>txid</tt>)
>> > * Address
>> > * Input (rendered as <tt>txid<index</tt>)
>> > * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
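A minimal Python sketch of rendering these references. It follows Craig's later reply in assuming that an input reference combines the txid of the transaction containing the input with that input's position in `vin`. An output reference combines the txid with the output's position in `vout`. The helper names are hypothetical.

```python
import string

def _require_txid(txid: str) -> None:
    # A txid is rendered as 64 hexadecimal characters.
    if len(txid) != 64 or any(c not in string.hexdigits for c in txid):
        raise ValueError(f"not a txid: {txid!r}")

def input_ref(txid: str, vin_index: int) -> str:
    """Render an input as txid<index, where index is the position in vin."""
    _require_txid(txid)
    if vin_index < 0:
        raise ValueError("index must be non-negative")
    return f"{txid}<{vin_index}"

def output_ref(txid: str, vout_index: int, colon: bool = False) -> str:
    """Render an output as txid>index, or txid:index when colon=True."""
    _require_txid(txid)
    if vout_index < 0:
        raise ValueError("index must be non-negative")
    separator = ":" if colon else ">"
    return f"{txid}{separator}{vout_index}"
```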
>> >
>> > The second field contains the label applied to the reference.
>> > Exporting applications may omit records with no labels or labels of zero
>> > length.
>> > Files exported should use the <tt>.csv</tt> file extension.
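A minimal Python sketch of an exporter following the rules above: UTF-8 output, a `Reference,Label` header, two fields per record, and empty labels omitted. It uses the standard `csv` module's default dialect, which quotes a field only when it contains a comma, a quote or a line break, and ends lines with CRLF. `export_labels` is a hypothetical name.

```python
import csv
import io

def export_labels(labels: dict[str, str]) -> bytes:
    """Serialize {reference: label} to UTF-8 CSV bytes with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # default dialect: minimal quoting, CRLF line endings
    writer.writerow(["Reference", "Label"])
    for reference, label in labels.items():
        if label:  # the draft allows omitting records with empty labels
            writer.writerow([reference, label])
    return buf.getvalue().encode("utf-8")
```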
>> >
>> > In order to reduce file size while retaining wide accessibility, the CSV
>> > file may be compressed using the ZIP file format, using the <tt>.zip</tt>
>> > file extension.
>> > This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or
>> > AES-256 encryption, which is supported by numerous applications including
>> > Winzip and 7-zip.
>> > In order to ensure that weak encryption does not proliferate, importers
>> > following this standard must refuse to import <tt>.zip</tt> files encrypted
>> > with the weaker Zip 2.0 standard.
>> > The textual representation of the wallet's extended public key (as defined
>> > by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>> >
>> > ==Importing==
>> >
>> > When importing, a naive algorithm may simply match against any reference,
>> > but it is possible to disambiguate between transactions, addresses, inputs
>> > and outputs.
>> > For example in the following pseudocode:
>> > <pre>
>> > if reference length < 64
>> >   Set address label
>> > else if reference length == 64
>> >   Set transaction label
>> > else if reference contains '<'
>> >   Set input label
>> > else
>> >   Set output label
>> > </pre>
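A minimal Python sketch of the pseudocode above. Unlike the pseudocode, it rejects malformed references instead of letting them fall through to the output branch. The regular expressions are assumptions, because the draft only specifies the length and `<` checks. Address syntax and checksums are not validated, and `classify` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns: a 64-hex-character txid, optionally followed by
# '<' (input), '>' or ':' (output) and a decimal index.
_TXID = re.compile(r"[0-9a-fA-F]{64}")
_IO_REF = re.compile(r"([0-9a-fA-F]{64})([<>:])([0-9]+)")

def classify(reference: str) -> Tuple[str, str, Optional[int]]:
    """Return (kind, txid_or_address, index) following the draft's length rules."""
    if len(reference) < 64:
        return ("address", reference, None)
    if len(reference) == 64:
        if not _TXID.fullmatch(reference):
            raise ValueError(f"not a txid: {reference!r}")
        return ("transaction", reference, None)
    match = _IO_REF.fullmatch(reference)
    if match is None:
        raise ValueError(f"unrecognised reference: {reference!r}")
    txid, separator, index = match.groups()
    kind = "input" if separator == "<" else "output"
    return (kind, txid, int(index))
```

A reference that passes these checks still has to match something in the wallet before a label is applied. That matching is the safeguard Craig describes in his reply.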
>> >
>> > Importing applications may truncate labels if necessary.
>> >
>> > ==Test Vectors==
>> >
>> > The following fragment represents a wallet label export:
>> > <pre>
>> > Reference,Label
>> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
>> > 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
>> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
>> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
>> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
>> > </pre>
>> >
>> > ==Reference Implementation==
>> >
>> > TBD
>>
>> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev at lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20220920/d85742d6/attachment-0001.html>
๐ Original message:Hello Craig,
Thank you for putting this proposal together. It is indeed another big
missing piece of the puzzle.
I would like to echo some of the comments already made by others (and you
yourself) on this thread, that this proposal seems to have some inherent
conflicts between the 2 goals it tries to achieve.
> *Allowing users to import and export their labels in a standardized way
ensures that they do not experience lock-in to a particular wallet
application. As a secondary goal, by using common formats this BIP seeks to
make manual or bulk management of labels accessible to users outside of
wallet applications and without specific technical expertise.*
IMHO, the reason these conflicts exist is because the first one is an
engineering requirement, while the second one is a UX / product requirement.
Engineering requirements typically prioritize data integrity,
reliability/robustness and performance. Do we want some sort of error
detection / correction codes? What data format would be the most robust and
least error-prone? Is CSV a good fit or not for this purpose? etc.
UX requirements, on the other hand, typically prioritize convenience and
ease of use.
When we donโt separate these concerns it can backfire and we might end up
with a Frankenstein standard that is the worst of both worlds. That is: not
quite robust in engineering terms, but also not quite user-friendly in
product terms either.
SLIP-132 is one such example. It tries to solve what are inherently
engineering challenges โ how to manage the complexities that arose due to
the evolution of keys and scripts โ by sadly offloading those complexities
onto the end users. The end result is user confusion (what kind of [?]PUB
do I need here?) and a nightmare for engineers to maintain (the
complexities are better managed via a high level language such as Output
Descriptors).
Keeping in this mind, I also think having 2 separate BIPs for this is
better.
Cheers,
Hugo
On Mon, Aug 29, 2022 at 4:26 AM Craig Raw via bitcoin-dev <
bitcoin-dev at lists.linuxfoundation.org> wrote:
> Thanks for your feedback @Ali.
>
> I am attempting to achieve two goals with this proposal, primarily for the
> benefit of wallet users:
>
> Goal #1. Transfer labels between different wallet implementations
> Goal #2. Manage labels in applications outside of Bitcoin wallets (such as
> Excel)
>
> Much of the feedback so far has indicated the tension between these two
> goals - it may be that it is too difficult to achieve both, in which case
> Goal #1 is the most important. That said, I think further exploration is
> still necessary before abandoning Goal #2, because removing it would
> significantly reduce the value of this proposal and mean users need to rely
> on application-specific workarounds.
>
> > it is important that a version byte is defined
> If Goal #2 is to be achieved it's difficult to mandate this, particularly
> if one requires bit flags to be set. Should an importing wallet fail to
> import if the version byte is not present, even if all the data is
> otherwise correct? Although it is difficult to know in advance how a format
> may be extended, it is certainly possible to extend this format with
> additional types where the nature of hashes serve as unique identifiers
>
> > Don't mandate the file extension... There is no way to enforce this on
> a BIP level.
> I'm not quite sure what you mean here - for example BIP174, which is
> widely used, states "Binary PSBT files should use the .psbt file
> extension." Also, this contradicts Goal #2 - Excel and Numbers register as
> handlers for .csv, and so make it clear that the file is editable outside
> of a wallet.
>
> > ZIP does not have good performance or compression ratio
> Indeed, but it is very widely available. That said, gzip is supported
> widely too these days. Unfortunately, gzip does not offer encryption (see
> next answer).
>
> > ZIP is an archiving format, that happens to have its own compression
> format.
> I agree this is not ideal. My main reason for choosing ZIP was that it
> supports encryption. It seems to me that without considering encryption, an
> application must create label export files that allow privacy-sensitive
> wallet information to be readable in plain text. Being able to transfer
> labels without risking privacy is IMO valuable. I considered other
> encryption formats such as PGP, but they are much more niche and so again
> contradict Goal #2.
>
> > I don't see the benefit of encrypting addresses and labels together...
> additionally, the password you propose is insecure - anybody with access to
> the wallet can unlock it
> I'm not sure I understand your question, but both wallet addresses and
> wallet labels contain privacy-sensitive information that should be
> protected. Wrt to the password, there is actually a more fundamental
> problem with using the wallet xpub - there is no equivalent for multisig
> wallets. For this reason I'll remove that requirement in future iterations.
>
> > Why the need for input and output formats? There is no difference
> between them on the wallet level, because they are always identified with a
> txid and output index.
> The input refers to the txid and the input index (in the set of vin), so
> the difference is the context in which they are displayed. A wallet will
> not necessarily store the spent outputs for a funding transaction
> containing a UTXO coming into the wallet, but it will contain references to
> the inputs as part of that transaction.
>
> > Another important point is that practically nobody labels inputs or
> outputs
> To the contrary, UTXOs are very frequently labelled, as they link and
> reveal information when spent. Inputs are much less frequently labelled,
> but there is no particular reason to exclude them.
>
> > there is a net benefit for the addresses to be exported in ascending
> order
> Indeed, but it makes achieving Goal #2 much more difficult for marginal
> benefit.
>
> > It's better to mandate that they should always be double-quoted, since
> only wallets will generate label exports anyway.
> Rather I think it's better to mandate RFC4180 is followed, as per
> recommendations in other feedback.
>
> > The importing code is too naive... it should utilize a dedicate item
> type field that unambiguously identifies the item
> It's unclear to me what you mean here. As I've indicated it is currently
> possible to disambiguate between addresses/transactions/etc without the
> need for a 3rd column, but in any case the hash functions used ensure that
> labels will not be associated incorrectly. Even in the unlikely event of
> some future address type being indistinguishable from a txid, it will
> simply not match any txids in the wallet.
>
> Craig
>
>
>
> On Wed, Aug 24, 2022 at 9:10 PM Ali Sherief <ali at notatether.com> wrote:
>
>> Hi Craig,
>>
>> This a really good proposal. I studied your BIP and I have some feedback
>> on some parts of it.
>>
>> > The first line in the file is a header, and should be ignored on import.
>>
>> From past experience, most notably with BIP39, it is important that a
>> version byte is defined somewhere in case someone wants to extend the
>> format in the future; currently there is no version field that can be
>> incremented when the format is extended. In the unique case of CSV files,
>> you should make the header line mandatory (I see you have already implied
>> this, but you should make it explicit in the BIP). However, instead of a
>> header listing the column names (Reference,Label), I suggest a format like
>> this:
>>
>> BIP-wallet-labels,<version>
>>
>> Since there are two columns per record, this works out nicely. The first
>> column can be the name of the BIP - BIPxxxx where the x's are numbers, and
>> the second column can be an unsigned 32-bit integer (most significant 8
>> bits reserved for version, the remaining for flags, or perhaps the entirety
>> for version - but I recommend leaving at least some bits for flags, even if
>> they all end up being just "reserved").
>>
>> You should make importing fail if the header line is not exactly as
>> specified (or as appropriate, should you decide on a different format for
>> the header).
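A minimal sketch of the versioned header suggested above, in Python. It assumes the layout described in the message: a literal tag, then an unsigned 32-bit integer whose most significant 8 bits hold the version and whose remaining 24 bits hold flags. `HEADER_TAG`, `SUPPORTED_VERSIONS`, `pack_header` and `parse_header` are hypothetical names, and the tag is a placeholder since no BIP number had been assigned.

```python
# Hypothetical sketch of the "BIP-wallet-labels,<version>" header line.
# Assumed layout: unsigned 32-bit value, top 8 bits = version, low 24 bits = flags.
HEADER_TAG = "BIP-wallet-labels"  # placeholder tag; no BIP number assigned yet
SUPPORTED_VERSIONS = {1}          # hypothetical set of versions this importer understands


def pack_header(version: int, flags: int = 0) -> str:
    """Build the header line from a version and a flags bitfield."""
    if not 0 <= version <= 0xFF:
        raise ValueError("version must fit in 8 bits")
    if not 0 <= flags <= 0xFFFFFF:
        raise ValueError("flags must fit in 24 bits")
    return f"{HEADER_TAG},{(version << 24) | flags}"


def parse_header(line: str) -> tuple[int, int]:
    """Return (version, flags); refuse anything that is not an exact, canonical match."""
    parts = line.rstrip("\r\n").split(",")
    if len(parts) != 2 or parts[0] != HEADER_TAG:
        raise ValueError("unexpected header: not a wallet label export")
    field = parts[1]
    # ASCII digits only: no sign, no whitespace, no leading zeros
    if not (field.isascii() and field.isdigit()) or str(int(field)) != field:
        raise ValueError("header value is not a canonical unsigned integer")
    value = int(field)
    if value > 0xFFFFFFFF:
        raise ValueError("header value exceeds 32 bits")
    version, flags = value >> 24, value & 0xFFFFFF
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported version {version}")
    return version, flags
```

With this layout, version 1 with no flags serializes as `BIP-wallet-labels,16777216`, and the old `Reference,Label` header is rejected outright rather than silently skipped.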
>>
>> > Files exported should use the <tt>.csv</tt> file extension.
>> Don't mandate the file extension (read below for why):
>>
>> > In order to reduce file size while retaining wide accessibility, the CSV
>> > file may be compressed using the ZIP file format, using the <tt>.zip</tt>
>> > file extension.
>> I see three problems with this. The first is more important than the
>> latter two because it makes them moot, but I'll mention them anyway so
>> you have some background on the situation:
>> - The BIP is trying to specify the file format in which the export may be
>> written to the filesystem. There is no way to enforce this at the BIP
>> level (besides, Unix operating systems don't even consider the file
>> extension; they use its MIME type). Specifying this in the BIP would also
>> prevent modular "Layer 2" protocols and schemes from encoding the exported
>> labels into another format, for example Base64 or their own compression
>> algorithm.
>>
>> Now for the two "moot problems":
>> - ZIP does not have good performance or compression ratio; there are
>> better algorithms out there like gzip (which also happens to be more
>> ubiquitous: nearly all websites serve HTML compressed with gzip).
>> - ZIP is an archiving format that happens to have its own compression
>> format. Archive format parsers can have serious vulnerabilities in their
>> implementation that can allow malware to steal private keys and passwords,
>> a real risk since the primary target for this BIP is wallets. For
>> example, there was Zip Slip[1] in 2018, which allows for remote code
>> execution. The malware can then hide in memory until private keys or
>> passwords are written to memory, and send them across the network. If it
>> targets a specific wallet software, it is not hard to carry out at all.
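Zip Slip is a path-traversal flaw: an archive entry named something like `../../x` is written outside the intended extraction directory. The Python sketch below (standard library only; `safe_member_paths` is a hypothetical name) shows the containment check that vulnerable extractors omitted. Python's own `ZipFile.extract` already strips `..` components from member names, so this illustrates the pattern rather than patching Python.

```python
import os
import zipfile


def safe_member_paths(archive_path: str, dest_dir: str) -> list[str]:
    """Resolve where each archive member would be extracted, refusing any entry
    that would land outside dest_dir (the path traversal behind Zip Slip)."""
    dest_root = os.path.realpath(dest_dir)
    paths = []
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            # Resolve "..", symlinks, etc. before comparing against the destination
            target = os.path.realpath(os.path.join(dest_root, name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"blocked path traversal entry: {name!r}")
            paths.append(target)
    return paths
```

Any importer that unpacks archives would need a check like this for every member, which is part of the attack surface that a plain CSV avoids.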
>>
>> There are two solutions for all this:
>> 1. The duct-tape solution: use a compression algorithm like gzip
>> instead of the ZIP archive format.
>> 2. The "throw it out and buy a new one" solution: get rid of the optional
>> compression specs altogether. Users are responsible for supplying the
>> exported labels in the first place, so all the compression details are
>> redundant and should be left to the user to apply if they wish.
>>
>> I prefer the second solution because it addresses the problem directly
>> instead of putting duct tape on it like the first one.
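Under option 1, a sketch using Python's standard `gzip` module: the export becomes a single compressed byte stream rather than an archive of named members, so there are no extraction paths for a Zip Slip-style bug to abuse. `compress_labels` and `decompress_labels` are hypothetical names.

```python
import gzip


def compress_labels(csv_text: str) -> bytes:
    """gzip a UTF-8 label export: one compressed stream, no archive directory."""
    return gzip.compress(csv_text.encode("utf-8"))


def decompress_labels(blob: bytes) -> str:
    """Inverse of compress_labels; raises if the data is not valid gzip."""
    return gzip.decompress(blob).decode("utf-8")
```

For example, `decompress_labels(compress_labels(text)) == text` holds for any string, including non-ASCII labels.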
>>
>> > This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or
>> > AES-256 encryption, which is supported by numerous applications including
>> > Winzip and 7-zip.
>> > The textual representation of the wallet's extended public key (as defined
>> > by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>> Not specific to AES, but I don't see the benefit of encrypting addresses
>> and labels together. Can you please elaborate on why this would be
>> desirable?
>>
>> Like I said, though, it's better to leave it up to users to decide how to
>> store their exports, since BIPs can't enforce that anyway (additionally,
>> the password you propose is insecure: anybody with access to the wallet
>> can unlock it, which is not desirable for users who want to choose their
>> own password).
>>
>> > * Transaction ID (<tt>txid</tt>)
>> > * Address
>> > * Input (rendered as <tt>txid<index</tt>)
>> > * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>> Why the need for input and output formats? There is no difference between
>> them on the wallet level, because they are always identified with a txid
>> and output index. To distinguish between them and hence write them with the
>> correct format would require a UTXO set and thus access to a full node,
>> otherwise the CSV cannot be verified to be completely well-formed.
>>
>> Another important point is that practically nobody labels inputs or
>> outputs because most people do not know that those things even exist, and
>> the rest don't bother to label them.
>>
>> But the biggest downside to including them is related to the problem of
>> information leaking, which you refer to here:
>> > In both cases, care must be taken when spending to avoid undesirable leaks
>> > of private information.
>> A CSV dump that mixes inputs/outputs and addresses together can be used to
>> infer the owner of all those items. In fact, a CSV label dump is basically
>> a personal information store, so everything in it can be correlated as
>> coming from the same wallet; that is why it's important to keep
>> unnecessary types out of the format. People are known to leave files they
>> no longer need lying around on their computers, so these files can find
>> their way via telemetry to surveillance entities. While we can't specify
>> what users can do with their exports, we can limit the information leak
>> by preventing item types that we know most users will never use from
>> being exported in the first place.
>>
>> > The order in which these records appear is not defined.
>> Again, since the primary use case for this BIP is wallets, which likely
>> use hierarchical derivation schemes like BIP44, there is a net benefit for
>> the addresses to be exported in ascending order of their `address_index`.
>> It means that wallets can import them in O(n) time, as opposed to O(n^2)
>> time spent serially checking at which index each address appears. Of
>> course, this implies that all addresses up to a certain index have to be
>> exported into the CSV as well, but most wallets I know of, like Core and
>> Electrum, already store addresses that way.
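A sketch of the linear-time import this ordering would allow, assuming the wallet can list its addresses in derivation order and the export lists labelled addresses in that same order. The function name is hypothetical. A single pointer walks the wallet's list and never moves backwards, so the whole import is O(n + m).

```python
def import_sorted_labels(wallet_addresses: list[str],
                         records: list[tuple[str, str]]) -> dict[str, str]:
    """Merge (address, label) records into wallet addresses in one forward pass.
    Assumes both sequences are in ascending derivation-index order."""
    labels: dict[str, str] = {}
    i = 0
    for address, label in records:
        # Advance until this address is found; the pointer never steps back
        while i < len(wallet_addresses) and wallet_addresses[i] != address:
            i += 1
        if i == len(wallet_addresses):
            raise ValueError(f"{address} not found in order: export unsorted or foreign")
        labels[address] = label
        i += 1
    return labels
```

For comparison, a dict keyed by address gives expected linear-time import without requiring the export to be sorted at all; the ordered walk mainly adds a cheap check that the export matches the wallet's derivation order.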
>>
>> Also, if you do this, you will need to group all the transaction records
>> before the address records or vice versa; you can use lexicographical
>> sorting if you want (i.e. Addresses before Transactions). The benefit of
>> this separation is that wallets can split the imported address records
>> from the transaction records internally, and feed them to separate
>> functions which set these labels internally.
>>
>> If you decide to do it this way, then you need a 3rd column to identify
>> the item type, and you should also quote the label (see below). I
>> strongly recommend using numbers for identification as opposed to character
>> strings, so you don't have to worry about localization or character case
>> issues. There is always exactly one number per type, but there could be
>> multiple strings that refer to the same type, which would complicate
>> importing functions.
>>
>> If you insist on including Input and Output types, then with this change
>> they can both be specified as <txid>:<index>. They won't be used to
>> determine the type anyway.
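A sketch of the export side of this suggestion, assuming a hypothetical layout `reference,label,type` with numeric type codes 1 to 4 (the draft has no such column; the codes and `export_grouped` are illustrative). Inputs and outputs both use `txid:index`, every field is double-quoted, and a stable sort groups records by type while keeping wallet order within each group.

```python
import csv
import io

# Hypothetical numeric codes for an assumed third column (reference,label,type).
ADDRESS, TRANSACTION, INPUT, OUTPUT = 1, 2, 3, 4


def export_grouped(records: list[tuple[str, str, int]]) -> str:
    """Write (reference, label, type) records with all records of one type together."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)  # every field double-quoted
    # sorted() is stable: grouped by type, original wallet order kept within a type
    for reference, label, item_type in sorted(records, key=lambda r: r[2]):
        writer.writerow([reference, label, item_type])
    return buf.getvalue()
```

Because the type code drives grouping, an importer can hand each contiguous block to a type-specific setter without inspecting the reference string.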
>>
>> > The fields may be quoted, but this is unnecessary, as the first comma in
>> > the line will always be the delimiter.
>> Don't implement it like that, because that will break CSV parsers which
>> expect a fixed number of fields in each record (2 in the header, while
>> some records would have more than 2). It's better to mandate that the
>> fields should always be double-quoted, since only wallets will generate
>> label exports anyway. If you plan to use headers, then the 3rd column can
>> be blank for the header (or you can split the version and flags from each
>> other).
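The parser breakage is easy to reproduce with Python's standard `csv` module, which follows RFC 4180-style quoting. An unquoted label containing a comma comes back as three fields, while writing with `csv.QUOTE_ALL` round-trips the same kind of label (with embedded quotes doubled) as exactly two. `parse_rows` and `write_rows` are illustrative helper names.

```python
import csv
import io


def parse_rows(text: str) -> list[list[str]]:
    """Parse CSV text into rows of fields using the standard csv reader."""
    return list(csv.reader(io.StringIO(text)))


def write_rows(rows: list[list[str]]) -> str:
    """Serialize rows with every field double-quoted (embedded quotes are doubled)."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerows(rows)
    return buf.getvalue()
```

`parse_rows("addr,Coffee, with Bob\n")` yields `[["addr", "Coffee", " with Bob"]]`, three fields, whereas `write_rows([["addr", 'Coffee, "Bob"']])` produces `"addr","Coffee, ""Bob"""` followed by CRLF, which parses back to the original two fields.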
>>
>> > ==Importing==
>> >
>> > When importing, a naive algorithm may simply match against any reference,
>> > but it is possible to disambiguate between transactions, addresses, inputs
>> > and outputs.
>> > For example in the following pseudocode:
>> > <pre>
>> > if reference length < 64
>> > Set address label
>> > else if reference length == 64
>> > Set transaction label
>> > else if reference contains '<'
>> > Set input label
>> > else
>> > Set output label
>> > </pre>
>> The importing code is too naive, and in its current form it will prevent
>> the BIP from getting a number. It is perhaps the single most important
>> part of a BIP. An importer should use a dedicated item type field that
>> unambiguously identifies the item. So the naive importer is not good
>> enough; you need to use a 3rd column for that, as I explained above, so
>> that the importer becomes robust.
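A sketch of an importer driven by an explicit type column, assuming the same hypothetical `reference,label,type` layout with numeric codes 1 to 4. The header line is simply skipped here. Address syntax is not validated, since that would need a Base58Check/bech32 decoder. `import_typed` and the code table are illustrative, not part of the draft.

```python
import csv
import io
import re

# Hypothetical numeric codes for an assumed third column: reference,label,type
ADDRESS, TRANSACTION, INPUT, OUTPUT = 1, 2, 3, 4
TYPE_CODES = {str(code): code for code in (ADDRESS, TRANSACTION, INPUT, OUTPUT)}
TXID = re.compile(r"[0-9a-fA-F]{64}")
OUTPOINT = re.compile(r"[0-9a-fA-F]{64}:[0-9]+")  # inputs and outputs both as txid:index


def import_typed(text: str) -> dict[int, dict[str, str]]:
    """Group labels by their declared type, rejecting malformed records."""
    grouped: dict[int, dict[str, str]] = {code: {} for code in TYPE_CODES.values()}
    rows = csv.reader(io.StringIO(text))
    if next(rows, None) is None:  # header line: skipped, not validated, here
        raise ValueError("empty export")
    for record_no, row in enumerate(rows, start=1):
        if len(row) != 3:
            raise ValueError(f"record {record_no}: expected 3 fields, got {len(row)}")
        reference, label, type_field = row
        item_type = TYPE_CODES.get(type_field)
        if item_type is None:
            raise ValueError(f"record {record_no}: unknown type {type_field!r}")
        if item_type == TRANSACTION and not TXID.fullmatch(reference):
            raise ValueError(f"record {record_no}: not a txid")
        if item_type in (INPUT, OUTPUT) and not OUTPOINT.fullmatch(reference):
            raise ValueError(f"record {record_no}: not txid:index")
        # Address syntax is not checked; that would need a Base58Check/bech32 decoder.
        grouped[item_type][reference] = label
    return grouped
```

Each group can then be passed to a type-specific setter, and a record whose reference does not fit its declared type fails loudly instead of silently matching nothing.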
>>
>> In summary (exclamation marks indicate severity - one means low, two
>> means medium, and three means high):
>>
>> 1. Convert the header into a version line with optional flags, otherwise
>> nobody can extend this format without compatibility issues (!)
>> 2. Get rid of the specs related to file compression (!!!)
>> 3. Add a 3rd column for item type (address, transaction, etc.), preferably
>> as numeric constants, with items of one type grouped after items of
>> another type; or, if you insist on strings, then only recognize their
>> Titlecase ASCII versions (spreadsheet software like Excel always tries to
>> titlecase the words) (!!)
>> 4. Require double quotes around the label (or single quotes if you
>> prefer, as long as spreadsheet software doesn't choke on them) (!!)
>> 5. Require sorting the records according to the order they are stored in
>> the wallet implementation. (!)
>> 6. Consider getting rid of Input and Output item types. (!)
>> 7. And last and most importantly, please write a more robust importer
>> algorithm in the example given by the BIP, because code in BIPs is
>> frequently used as a reference for software. (!!!)
>>
>> I hope you will consider these points in future revisions of your BIP.
>>
>> - Ali
>>
>> [1] https://github.com/snyk/zip-slip-vulnerability
>>
>> On Wed, 24 Aug 2022 11:18:43 +0200, craigraw at gmail.com wrote:
>> > Hi all,
>> >
>> > I would like to propose a BIP that specifies a format for the export and
>> > import of labels from a wallet. While transferring access to funds across
>> > wallet applications has been made simple through standards such as BIP39,
>> > wallet labels remain siloed and difficult to extract despite their value,
>> > particularly in a privacy context.
>> >
>> > The proposed format is a simple two column CSV file, with the reference to
>> > a transaction, address, input or output in the first column, and the label
>> > in the second column. CSV was chosen for its wide accessibility, especially
>> > to users without specific technical expertise. Similarly, the CSV file may
>> > be compressed using the ZIP format, and optionally encrypted using AES.
>> >
>> > The full text of the BIP can be found at
>> > https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
>> > and also copied below.
>> >
>> > Feedback is appreciated.
>> >
>> > Thanks,
>> > Craig Raw
>> >
>> > ---
>> >
>> > <pre>
>> > BIP: wallet-labels
>> > Layer: Applications
>> > Title: Wallet Labels Export Format
>> > Author: Craig Raw <craig at sparrowwallet.com>
>> > Comments-Summary: No comments yet.
>> > Comments-URI:
>> > https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
>> > Status: Draft
>> > Type: Informational
>> > Created: 2022-08-23
>> > License: BSD-2-Clause
>> > </pre>
>> >
>> > ==Abstract==
>> >
>> > This document specifies a format for the export of labels that may be
>> > attached to the transactions, addresses, inputs and outputs in a wallet.
>> >
>> > ==Copyright==
>> >
>> > This BIP is licensed under the BSD 2-clause license.
>> >
>> > ==Motivation==
>> >
>> > The export and import of funds across different Bitcoin wallet applications
>> > is well defined through standards such as BIP39, BIP32, BIP44 etc.
>> > These standards are well supported and allow users to move easily between
>> > different wallets.
>> > There is, however, no defined standard to transfer any labels the user may
>> > have applied to the transactions, addresses, inputs or outputs in their
>> > wallet.
>> > The UTXO model that Bitcoin uses makes these labels particularly valuable
>> > as they may indicate the source of funds, whether received externally or as
>> > a result of change from a prior transaction.
>> > In both cases, care must be taken when spending to avoid undesirable leaks
>> > of private information.
>> > Labels provide valuable guidance in this regard, and have even become
>> > mandatory when spending in several Bitcoin wallets.
>> > Allowing users to export their labels in a standardized way ensures that
>> > they do not experience lock-in to a particular wallet application.
>> > In addition, by using common formats, this BIP seeks to make manual or bulk
>> > management of labels accessible to users without specific technical
>> > expertise.
>> >
>> > ==Specification==
>> >
>> > In order to make the import and export of labels as widely accessible as
>> > possible, this BIP uses the comma separated values (CSV) format, which is
>> > widely supported by consumer, business, and scientific applications.
>> > Although the technical specification of CSV in RFC4180 is not always
>> > followed, the application of the format in this BIP is simple enough that
>> > compatibility should not present a problem.
>> > Moreover, the simplicity and forgiving nature of CSV (over, for example,
>> > JSON) lend themselves well to bulk label editing using spreadsheet and text
>> > editing tools.
>> >
>> > A CSV export of labels from a wallet must be a UTF-8 encoded text file,
>> > containing one record per line, with records containing two fields
>> > delimited by a comma.
>> > The fields may be quoted, but this is unnecessary, as the first comma in
>> > the line will always be the delimiter.
>> > The first line in the file is a header, and should be ignored on import.
>> > Thereafter, each line represents a record that refers to a label applied in
>> > the wallet.
>> > The order in which these records appear is not defined.
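A sketch of a reader that follows the draft's rules as written: decode as UTF-8, ignore the first line, accept quoted or unquoted fields, and treat the first comma as the delimiter. Python's `csv` reader handles the optional quoting; because an unquoted label may itself contain commas, everything after the first field is rejoined with commas. `read_two_column` is a hypothetical name, and skipping empty labels is an assumption (the draft only says exporters may omit them).

```python
import csv
import io


def read_two_column(data: bytes) -> list[tuple[str, str]]:
    """Parse a two-column label export per the draft: header ignored,
    quoting optional, first comma separates reference from label."""
    rows = csv.reader(io.StringIO(data.decode("utf-8")))
    next(rows, None)  # the first line is a header and is ignored on import
    records = []
    for row in rows:
        if not row:
            continue  # tolerate blank lines
        # An unquoted label containing commas arrives as several fields; rejoin them
        reference, label = row[0], ",".join(row[1:])
        if label:  # assumption: records with empty labels are skipped
            records.append((reference, label))
    return records
```

An unquoted line such as `1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Coffee, with Bob` yields the label `Coffee, with Bob`, and a fully quoted line yields the same result, so both styles the draft allows are handled.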
>> >
>> > The first field in the record contains a reference to the transaction,
>> > address, input or output in the wallet.
>> > This is specified as one of the following:
>> > * Transaction ID (<tt>txid</tt>)
>> > * Address
>> > * Input (rendered as <tt>txid<index</tt>)
>> > * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>> >
>> > The second field contains the label applied to the reference.
>> > Exporting applications may omit records with no labels or labels of zero
>> > length.
>> > Files exported should use the <tt>.csv</tt> file extension.
>> >
>> > In order to reduce file size while retaining wide accessibility, the CSV
>> > file may be compressed using the ZIP file format, using the <tt>.zip</tt>
>> > file extension.
>> > This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or
>> > AES-256 encryption, which is supported by numerous applications including
>> > Winzip and 7-zip.
>> > In order to ensure that weak encryption does not proliferate, importers
>> > following this standard must refuse to import <tt>.zip</tt> files encrypted
>> > with the weaker Zip 2.0 standard.
>> > The textual representation of the wallet's extended public key (as defined
>> > by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>> >
>> > ==Importing==
>> >
>> > When importing, a naive algorithm may simply match against any reference,
>> > but it is possible to disambiguate between transactions, addresses, inputs
>> > and outputs.
>> > For example in the following pseudocode:
>> > <pre>
>> > if reference length < 64
>> > Set address label
>> > else if reference length == 64
>> > Set transaction label
>> > else if reference contains '<'
>> > Set input label
>> > else
>> > Set output label
>> > </pre>
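For reference, the draft's pseudocode translated directly into Python, plus a helper that splits input/output references into their parts. Both function names are hypothetical. The length test relies on every address being shorter than a 64-character txid; mainnet P2WSH and P2TR addresses, for example, are 62 characters.

```python
def classify_reference(reference: str) -> str:
    """Direct translation of the draft's length-based disambiguation."""
    if len(reference) < 64:
        return "address"
    if len(reference) == 64:
        return "transaction"
    if "<" in reference:
        return "input"
    return "output"  # txid>index or txid:index


def split_outpoint(reference: str) -> tuple[str, int]:
    """Split txid<index, txid>index or txid:index into (txid, index)."""
    txid, separator, index = reference[:64], reference[64:65], reference[65:]
    if separator not in ("<", ">", ":") or not (index.isascii() and index.isdigit()):
        raise ValueError(f"malformed input/output reference: {reference!r}")
    return txid, int(index)
```

Applied to the test vectors below, this classifies each reference as its label suggests.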
>> >
>> > Importing applications may truncate labels if necessary.
>> >
>> > ==Test Vectors==
>> >
>> > The following fragment represents a wallet label export:
>> > <pre>
>> > Reference,Label
>> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
>> > 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
>> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
>> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
>> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
>> > </pre>
>> >
>> > ==Reference Implementation==
>> >
>> > TBD
>>
>> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev at lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>