Ali Sherief [ARCHIVE] on Nostr
Original date posted:2022-08-26
Original message: I think these problems can be mitigated if the CSV format is strictly defined, as I specified in my previous message.
In particular, the parser should recognize exactly one specific header line containing a version number, and abort otherwise. I also still insist on quoting the labels with double quotes, introducing a third column with a specific string or numeric type, and replacing all the special characters in input/output references with ":".
Strictly defining the CSV version and, consequently, the fields, and then specifying which kinds of data must cause the import to fail, will limit the complexity of importers to N switch cases, where N is the number of circulating versions of the format (currently 1).
- Ali
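The strictly versioned importer described above can be sketched in Python as follows. Ali's previous message, which spells out the exact format, is not part of this thread. The header string `"Reference v1","Type","Label"`, the type names `tx`/`addr`/`input`/`output`, and every function name below are therefore hypothetical placeholders. The sketch is meant to show the structure. An unrecognized header aborts the import. Each known version maps to exactly one row parser. Any record that deviates from that version fails the whole import instead of being guessed at.

```python
import re
from typing import NamedTuple

# Hypothetical exact header line for each circulating version of the format.
KNOWN_HEADERS = {'"Reference v1","Type","Label"': 1}

# Hypothetical v1 record: unquoted reference, unquoted type, and a double-quoted
# label in which a literal quote is written as "" (RFC 4180 style).
ROW_V1 = re.compile(r'([0-9A-Za-z:]+),(tx|addr|input|output),"((?:[^"]|"")*)"')


class Record(NamedTuple):
    reference: str
    kind: str
    label: str


class LabelImportError(ValueError):
    """Raised whenever the input deviates from the declared version."""


def parse_v1_row(line: str, lineno: int) -> Record:
    match = ROW_V1.fullmatch(line)
    if match is None:
        raise LabelImportError(f"line {lineno}: not a valid v1 record")
    reference, kind, label = match.group(1), match.group(2), match.group(3)
    # Inputs and outputs both use "txid:index"; the Type column tells them apart.
    if (":" in reference) != (kind in ("input", "output")):
        raise LabelImportError(f"line {lineno}: reference does not fit type {kind!r}")
    return Record(reference, kind, label.replace('""', '"'))


# One "switch case" per circulating version (currently only v1).
ROW_PARSERS = {1: parse_v1_row}


def import_labels(text: str) -> list[Record]:
    lines = text.splitlines()
    if not lines or lines[0] not in KNOWN_HEADERS:
        raise LabelImportError("unknown or missing header line; aborting import")
    parse_row = ROW_PARSERS[KNOWN_HEADERS[lines[0]]]
    # Parse every line before returning, so one bad record aborts the whole import.
    return [parse_row(line, lineno) for lineno, line in enumerate(lines[1:], start=2)]
```

Supporting a future version would mean adding one header string and one row parser. Existing parsers would not change.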
On Thu, 25 Aug 2022 13:48:36 +0000, rhavar at protonmail.com wrote:
> > Not only is JSON limited to editing only through specific software or text editors, but (in the latter case) it is fragile enough that a single missing character can cause an entire file to fail parsing. CSV is more forgiving in this regard.
>
> I think quite simply: A forgiving format is not appropriate for a standard.
>
> It'd be hard to overstate how much extra and pointless effort it creates for everyone, and every implementation ends up creating its own de facto standard for what it produces and accepts. Even something as simple as adding an extra column will not be possible in the future, because it'll break compatibility with previous parsers.
>
> I've literally worked on projects where the CSV parser has evolved to scan ahead, using heuristics to understand the "rules" of a CSV file, and then apply line-by-line heuristics to override those rules in pathological cases. Makes a bit of sense when you're trying to achieve 30 years of backwards compatibility. Doesn't make sense for much else.
>
> If your application users really like csv, then introduce an application-specific import-from-csv and export-to-csv with your own rules.
> -Ryan
>
> ------- Original Message -------
> On Thursday, August 25th, 2022 at 1:59 AM, Craig Raw <craigraw at gmail.com> wrote:
>
> > Thanks for your thoughts Ryan.
> >
> > Without reference to the quality feedback on this proposal, I was aware when submitting it for review that it provides an excellent opportunity for bike shedding. As developers, we have all experienced frustration with data formats. One thing that I did not perhaps make clear enough is that this format is not solely intended for developers, but general users who are probably not well represented on this list.
> >
> > While doing research for this proposal I spoke to several professional users of Sparrow Wallet (who are not developers). They all expressed a desire for the format to integrate with their business processes, which are driven by business tools such as Excel. Labelling provides an important function in UTXO and address management in these scenarios, and needs to be accessible and manageable outside of wallet software.
> >
> > If this is to be achieved, it immediately rules out JSON as a data format. Not only is JSON limited to editing only through specific software or text editors, but (in the latter case) it is fragile enough that a single missing character can cause an entire file to fail parsing. CSV is more forgiving in this regard. With respect to your comments on escaping, my expectation would be that developers will be using a mature CSV library rather than handling character escaping themselves. I would rather propose a format that is generally usable, even if occasionally a label is escaped incorrectly.
> >
> > Finally, I'll note that CSV files are already common and uncontroversial in Bitcoin wallet software. Bitcoin Core, Electrum, Sparrow (and no doubt many others) already export addresses and/or transactions with their labels as CSV files. This proposal simply attempts to create a standard for importing and exporting all the labels in a wallet.
> >
> > Craig
> >
> > On Wed, Aug 24, 2022 at 9:01 PM <rhavar at protonmail.com> wrote:
> >
> > >> I'd strongly suggest not using CSV, especially for a standard. I've worked with it as an interchange format many a time, and it's always been a clusterfuck.
> >>
> >> Right off the bat, you have stuff like "The fields may be quoted, but this is unnecessary as the first comma in the line will always be the delimiter" which invariably leads to some implementations doing it, some implementations not doing it, and others that are intolerant of the other way.
> >>
> > >> And you have also made the classic mistake of not strictly defining escape rules. So everyone will pick their own (e.g. some will escape commas with a backslash (\,), others will not because the field is quoted and will escape quotes instead, and others will assume no escaping is required since it's the last column in the CSV).
> >>
> >> Over time it morphs into its own mini-monster that introduces so much pain.
> >>
> >> On a similar note, allowing alternatives (like: txid>index vs txid:index) provides no benefit, but creates additional work for implementations (who quite likely only test formats they produce) and future incompatibilities.
> >>
> >> I know everyone loves to hate on it, but really (line-separated?) json is the way to go.
> >>
> > >> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "label": "wow, such label" }
> > >> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "txout": 4, "label": "omg this is so easy to parse" }
> > >> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "txin": 0, "label": "wow this is going to be extensible as well" }
> >>
> >> -Ryan
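Here is a minimal Python sketch of a reader and writer for the line-separated JSON layout Ryan suggests. The field names `tx`, `txout`, `txin` and `label` come from his example lines. The function names and the validation rules are assumptions made for illustration and are not part of any proposal. Those rules are: `tx` and `label` must be strings, at most one non-negative index is allowed, and unknown keys are ignored.

```python
import json
from typing import NamedTuple, Optional


class LabelEntry(NamedTuple):
    tx: str
    label: str
    txout: Optional[int] = None
    txin: Optional[int] = None


def _check_index(value: object, key: str, lineno: int) -> Optional[int]:
    if value is None:
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError(f"line {lineno}: {key!r} must be a non-negative integer")
    return value


def parse_label_lines(text: str) -> list[LabelEntry]:
    # Split on "\n" only: str.splitlines() would also split on U+2028 and
    # similar characters, which may legally appear unescaped in JSON strings.
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # trailing newline after the last record
    entries = []
    for lineno, line in enumerate(lines, start=1):
        obj = json.loads(line)  # json.JSONDecodeError (a ValueError) on bad syntax
        if not isinstance(obj, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        tx, label = obj.get("tx"), obj.get("label")
        if not isinstance(tx, str) or not isinstance(label, str):
            raise ValueError(f"line {lineno}: 'tx' and 'label' must be strings")
        txout = _check_index(obj.get("txout"), "txout", lineno)
        txin = _check_index(obj.get("txin"), "txin", lineno)
        if txout is not None and txin is not None:
            raise ValueError(f"line {lineno}: 'txout' and 'txin' are mutually exclusive")
        # Unknown keys are ignored, so later versions can add fields
        # without breaking this reader.
        entries.append(LabelEntry(tx, label, txout, txin))
    return entries


def dump_label_lines(entries: list[LabelEntry]) -> str:
    out = []
    for entry in entries:
        obj: dict = {"tx": entry.tx}
        if entry.txout is not None:
            obj["txout"] = entry.txout
        if entry.txin is not None:
            obj["txin"] = entry.txin
        obj["label"] = entry.label
        # json.dumps escapes quotes, backslashes and newlines inside the label.
        out.append(json.dumps(obj, ensure_ascii=False))
    return "".join(line + "\n" for line in out)
```

A label containing commas, quotes or newlines round-trips without any format-specific escaping rules. A malformed line such as `{ "tx: "..." }` is rejected outright instead of being half-parsed.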
> >>
> >> ------- Original Message -------
> >> On Wednesday, August 24th, 2022 at 2:18 AM, Craig Raw via bitcoin-dev <bitcoin-dev at lists.linuxfoundation.org> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I would like to propose a BIP that specifies a format for the export and import of labels from a wallet. While transferring access to funds across wallet applications has been made simple through standards such as BIP39, wallet labels remain siloed and difficult to extract despite their value, particularly in a privacy context.
> >>>
> >>> The proposed format is a simple two column CSV file, with the reference to a transaction, address, input or output in the first column, and the label in the second column. CSV was chosen for its wide accessibility, especially to users without specific technical expertise. Similarly, the CSV file may be compressed using the ZIP format, and optionally encrypted using AES.
> >>>
> >>> The full text of the BIP can be found at https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki and also copied below.
> >>>
> >>> Feedback is appreciated.
> >>>
> >>> Thanks,
> >>> Craig Raw
> >>>
> >>> ---
> >>>
> >>> <pre>
> >>> BIP: wallet-labels
> >>> Layer: Applications
> >>> Title: Wallet Labels Export Format
> >>> Author: Craig Raw <craig at sparrowwallet.com>
> >>> Comments-Summary: No comments yet.
> >>> Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
> >>> Status: Draft
> >>> Type: Informational
> >>> Created: 2022-08-23
> >>> License: BSD-2-Clause
> >>> </pre>
> >>>
> >>> ==Abstract==
> >>>
> > >>> This document specifies a format for the export of labels that may be attached to the transactions, addresses, inputs and outputs in a wallet.
> >>>
> >>> ==Copyright==
> >>>
> >>> This BIP is licensed under the BSD 2-clause license.
> >>>
> >>> ==Motivation==
> >>>
> >>> The export and import of funds across different Bitcoin wallet applications is well defined through standards such as BIP39, BIP32, BIP44 etc.
> >>> These standards are well supported and allow users to move easily between different wallets.
> >>> There is, however, no defined standard to transfer any labels the user may have applied to the transactions, addresses, inputs or outputs in their wallet.
> >>> The UTXO model that Bitcoin uses makes these labels particularly valuable as they may indicate the source of funds, whether received externally or as a result of change from a prior transaction.
> >>> In both cases, care must be taken when spending to avoid undesirable leaks of private information.
> >>> Labels provide valuable guidance in this regard, and have even become mandatory when spending in several Bitcoin wallets.
> >>> Allowing users to export their labels in a standardized way ensures that they do not experience lock-in to a particular wallet application.
> >>> In addition, by using common formats, this BIP seeks to make manual or bulk management of labels accessible to users without specific technical expertise.
> >>>
> >>> ==Specification==
> >>>
> >>> In order to make the import and export of labels as widely accessible as possible, this BIP uses the comma separated values (CSV) format, which is widely supported by consumer, business, and scientific applications.
> >>> Although the technical specification of CSV in RFC4180 is not always followed, the application of the format in this BIP is simple enough that compatibility should not present a problem.
> > >>> Moreover, the simplicity and forgiving nature of CSV (compared to, for example, JSON) lend themselves well to bulk label editing using spreadsheet and text editing tools.
> >>>
> >>> A CSV export of labels from a wallet must be a UTF-8 encoded text file, containing one record per line, with records containing two fields delimited by a comma.
> >>> The fields may be quoted, but this is unnecessary, as the first comma in the line will always be the delimiter.
> >>> The first line in the file is a header, and should be ignored on import.
> >>> Thereafter, each line represents a record that refers to a label applied in the wallet.
> >>> The order in which these records appear is not defined.
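Ryan argues that optional quoting causes trouble, and a small Python sketch makes the point concrete. It contrasts two readers. The first follows the "first comma is the delimiter" rule literally. The second is built on Python's standard `csv` module, the kind of mature CSV library Craig expects developers to use. The function names are hypothetical.

```python
import csv
import io


def read_first_comma(line: str) -> tuple[str, str]:
    """Reader that follows the draft literally: the first comma is the delimiter."""
    reference, label = line.split(",", 1)
    return reference, label


def read_rfc4180(line: str) -> tuple[str, str]:
    """Reader built on Python's csv module (strips quotes, un-doubles "")."""
    row = next(csv.reader(io.StringIO(line)))
    if len(row) != 2:
        raise ValueError(f"expected 2 fields, got {len(row)}")
    return row[0], row[1]


def write_csv_line(reference: str, label: str) -> str:
    """Exporter using the csv module's default (QUOTE_MINIMAL) quoting."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([reference, label])
    return buf.getvalue().rstrip("\n")
```

The library-based exporter writes `txid,"wow, such label"`. The first reader keeps the surrounding quotes as part of that label, while the second strips them. Given an unquoted `txid,wow, such label`, the first reader succeeds and the second rejects the line. The two agree only when a label contains no commas or quotes.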
> >>>
> >>> The first field in the record contains a reference to the transaction, address, input or output in the wallet.
> >>> This is specified as one of the following:
> >>> * Transaction ID (<tt>txid</tt>)
> >>> * Address
> >>> * Input (rendered as <tt>txid<index</tt>)
> >>> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
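Below is one way an importer might turn the four reference forms into a structured value, folding the two output spellings into one. This is a sketch, not part of the draft. The `Reference` type, the lower-casing of txids, and the decision to treat anything that is not txid-shaped as an address (without validating it) are all assumptions.

```python
import re
from typing import NamedTuple, Optional

_TXID = r"[0-9a-fA-F]{64}"
_INDEXED = re.compile(rf"({_TXID})([<>:])([0-9]+)")


class Reference(NamedTuple):
    kind: str                    # "transaction", "address", "input" or "output"
    value: str                   # txid (lower-cased) or address as given
    index: Optional[int] = None  # input/output index, if any


def parse_reference(ref: str) -> Reference:
    if re.fullmatch(_TXID, ref):
        return Reference("transaction", ref.lower())
    match = _INDEXED.fullmatch(ref)
    if match:
        txid, sep, index = match.groups()
        # "<" marks an input; ">" and ":" are two spellings of an output.
        kind = "input" if sep == "<" else "output"
        return Reference(kind, txid.lower(), int(index))
    # Everything else is assumed to be an address; it is not validated here.
    return Reference("address", ref)
```

After parsing, `txid>0` and `txid:0` compare equal. An importer that stores this normalized form only ever has to handle one output representation internally.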
> >>>
> >>> The second field contains the label applied to the reference.
> >>> Exporting applications may omit records with no labels or labels of zero length.
> >>> Files exported should use the <tt>.csv</tt> file extension.
> >>>
> > >>> In order to reduce file size while retaining wide accessibility, the CSV file may be compressed using the ZIP file format, with the <tt>.zip</tt> file extension.
> > >>> This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or AES-256 encryption, which is supported by numerous applications including WinZip and 7-Zip.
> >>> In order to ensure that weak encryption does not proliferate, importers following this standard must refuse to import <tt>.zip</tt> files encrypted with the weaker Zip 2.0 standard.
> >>> The textual representation of the wallet's extended public key (as defined by BIP32, with an <tt>xpub</tt> header) should be used as the password.
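To refuse "Zip 2.0" archives, an importer has to tell the encryption schemes apart. The following Python sketch is hedged and relies on two things: the ZIP format's general purpose flag bits, and the WinZip AES convention of recording compression method 99 for AES-encrypted entries. The function names are assumptions, as is the choice to accept only unencrypted or AES entries (which also rejects PKWARE strong encryption). Python's standard `zipfile` module can only decrypt traditional ZIP encryption and cannot create encrypted archives. Decrypting AES entries with the xpub as password would therefore require a third-party library.

```python
import zipfile
from typing import BinaryIO, Union

ENCRYPTED = 0x0001          # general purpose flag bit 0: entry is encrypted
STRONG_ENCRYPTION = 0x0040  # flag bit 6: PKWARE "strong encryption"
WINZIP_AES = 99             # compression method recorded for WinZip AES entries


def encryption_scheme(flag_bits: int, compress_type: int) -> str:
    if not flag_bits & ENCRYPTED:
        return "none"
    if compress_type == WINZIP_AES:
        return "aes"        # WinZip AES format (AE-1/AE-2)
    if flag_bits & STRONG_ENCRYPTION:
        return "strong"
    return "zipcrypto"      # traditional PKWARE ("Zip 2.0") encryption


def check_label_archive(source: Union[str, BinaryIO]) -> None:
    """Raise ValueError unless every entry is unencrypted or AES-encrypted."""
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            scheme = encryption_scheme(info.flag_bits, info.compress_type)
            if scheme not in ("none", "aes"):
                raise ValueError(f"{info.filename}: refusing {scheme} encryption")
```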
> >>>
> >>> ==Importing==
> >>>
> >>> When importing, a naive algorithm may simply match against any reference, but it is possible to disambiguate between transactions, addresses, inputs and outputs.
> >>> For example in the following pseudocode:
> >>> <pre>
> >>> if reference length < 64
> >>> Set address label
> >>> else if reference length == 64
> >>> Set transaction label
> >>> else if reference contains '<'
> >>> Set input label
> >>> else
> >>> Set output label
> >>> </pre>
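The pseudocode above translates directly into runnable Python. The sketch below applies it to the test vectors from the draft. The function and constant names are hypothetical.

```python
def classify_reference(reference: str) -> str:
    """Direct translation of the draft's length-based import heuristic."""
    if len(reference) < 64:
        return "address"
    if len(reference) == 64:
        return "transaction"
    if "<" in reference:
        return "input"
    return "output"


TEST_VECTORS = """\
Reference,Label
c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
"""


def classify_export(text: str) -> list[tuple[str, str]]:
    rows = []
    for line in text.splitlines()[1:]:  # the first line is a header and is skipped
        reference, label = line.split(",", 1)
        rows.append((classify_reference(reference), label))
    return rows
```

The heuristic looks only at length and the `<` character, so it performs no validation. Any string of 65 or more characters without `<` is classified as an output. A stricter importer would also check the txid and index syntax.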
> >>>
> >>> Importing applications may truncate labels if necessary.
> >>>
> >>> ==Test Vectors==
> >>>
> >>> The following fragment represents a wallet label export:
> >>> <pre>
> >>> Reference,Label
> > >>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> > >>> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> > >>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> > >>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> > >>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> >>> </pre>
> >>>
> >>> ==Reference Implementation==
> >>>
> >>> TBD