Elias Rohrer [ARCHIVE] on Nostr
Original date posted: 2023-08-03
Summary of this message: Long-term collection of proposed data could potentially re-identify anonymized channel counterparties, raising concerns about privacy and data storage.
Original message:
Hi Carla + Clara,
I want to preface this by saying that I'm very familiar with how limiting
the lack of available real-world datasets can be for conducting
significant simulations and empirical experiments on Lightning.
However, it may be noteworthy that long-term collection of the proposed
fields could potentially allow re-identification of the anonymized
channel counterparties through heuristics that correlate the data with
the public graph, especially when datasets from multiple (possibly
neighbouring) collection points end up being combined.
This, in turn, might allow further conclusions to be drawn about
transferred amounts, channel liquidities at particular times, and, as
HTLC settlement/failure timestamps are recorded in nanosecond
resolution, potentially even the payment destination's identity (cf.
[1]).
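To illustrate the kind of heuristic meant here, the following is a minimal sketch that is not taken from the proposal or from [1]. It assumes two neighbouring nodes A and B each export rows with randomized channel IDs, and it links A's `channel_out` aliases to B's `channel_in` aliases purely by how often their `ts_added_ns` values fall within a short window. The function name `link_channels`, the dict-based rows, and the 50 ms window are all assumptions chosen for illustration.

```python
from collections import Counter


def link_channels(rows_a, rows_b, max_delta_ns=50_000_000):
    """Count timestamp co-occurrences between A's outgoing and B's incoming aliases.

    rows_a: dicts with "channel_out" and "ts_added_ns" (exported by node A).
    rows_b: dicts with "channel_in" and "ts_added_ns" (exported by node B).
    max_delta_ns: hypothetical upper bound on the A -> B forwarding delay.
    """
    votes = Counter()
    # Quadratic scan for clarity; fine for a sketch, too slow for real datasets.
    for a in rows_a:
        for b in rows_b:
            delta = b["ts_added_ns"] - a["ts_added_ns"]
            # B can only add the HTLC after A has forwarded it.
            if 0 <= delta <= max_delta_ns:
                votes[(a["channel_out"], b["channel_in"])] += 1
    return votes


if __name__ == "__main__":
    rows_a = [
        {"channel_out": "a1", "ts_added_ns": 1_000_000_000},
        {"channel_out": "a1", "ts_added_ns": 5_000_000_000},
        {"channel_out": "a2", "ts_added_ns": 9_000_000_000},
    ]
    rows_b = [
        {"channel_in": "b7", "ts_added_ns": 1_020_000_000},
        {"channel_in": "b7", "ts_added_ns": 5_030_000_000},
        {"channel_in": "b9", "ts_added_ns": 20_000_000_000},
    ]
    # The alias pair with the most votes is the most likely shared channel.
    print(link_channels(rows_a, rows_b).most_common(1))
```

Even this naive vote count hints at why nanosecond timestamps, combined across collection points, can make randomized identifiers linkable.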
As surrendering this kind of data therefore requires a good level of
trust in the researchers, it might be helpful (and best practice) if you
could clarify upfront whether you intend to time-box the collection
period, where the data would be stored, and who would have access to it.
From my point of view, clearly defining the collection period would also
be mandatory, as we don't want to incentivise node operators to collect
and store HTLC data longer-term, especially at this degree of
detail.
Best,
Elias
[1]: https://arxiv.org/pdf/2006.12143.pdf
> ### 1. Collect Anonymized Data
> We're aware that we are dealing with sensitive and private
> information.
> For this reason, we propose defining a common data format that
> analysis tooling can be built around, so that node operators can run
> the analysis locally if desired. Fields marked with [P] *MUST* be
> randomized if exported to research teams.
>
> The proposed format is a CSV file with the following fields:
> * version (uint8): set to 1, included to future-proof ourselves
> against the need to change this format.
> * channel_in (uint64)[P]: the short channel ID of the incoming channel
> that forwarded the HTLC.
> * channel_out (uint64)[P]: the short channel ID of the outgoing
> channel that forwarded the HTLC.
> * peer_in (hex string)[P]: the hex encoded pubkey of the remote peer
> for the channel_in.
> * peer_out (hex string)[P]: the hex encoded pubkey of the remote peer
> for the channel_out.
> * fee_msat (uint64): the fee offered by the HTLC, expressed in msat.
> * outgoing_liquidity (float64): the portion of
> `max_htlc_value_in_flight` that is occupied on channel_out after the
> HTLC has been forwarded.
> * outgoing_slots (float64): the portion of `max_accepted_htlcs` that
> is occupied on channel_out after the HTLC has been forwarded.
> * ts_added_ns (uint64): the unix timestamp that the HTLC was added,
> expressed in nanoseconds.
> * ts_removed_ns (uint64): the unix timestamp that the HTLC was
> removed, expressed in nanoseconds.
> * htlc_settled (bool): set to 0 if the HTLC failed, and 1 if it was
> settled.
> * incoming_endorsed (int16): an integer indicating the endorsement
> status of the incoming HTLC (-1 if not present, otherwise set to the
> value in the incoming endorsement TLV).
> * outgoing_endorsed (int16): an integer indicating the endorsement
> status of the outgoing HTLC (-1 if not set, otherwise set to the
> value set in the outgoing endorsement TLV).
>
> Before we add endorsement signaling and setting via an experimental
> TLV, the last two values here will always be -1. The data is still
> incredibly useful in the meantime, and allows for an easy update once
> the TLV is propagated through the network.
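
As a rough sketch of how an operator might apply the [P] rule before exporting, the snippet below reads rows in the field order listed above and replaces each [P] value with a random alias. The proposal does not say how randomization should be done, so everything about the scheme here is an assumption: keeping an alias consistent within one export (same real value, same alias), sharing one alias namespace between `channel_in`/`channel_out` and one between `peer_in`/`peer_out`, reading a CSV without a header row, and the names `FIELDS`, `PRIVATE`, `read_rows`, and `randomize_private`.

```python
import csv
import io
import secrets

# Column order as listed in the proposal; a header-less CSV is an assumption.
FIELDS = [
    "version", "channel_in", "channel_out", "peer_in", "peer_out",
    "fee_msat", "outgoing_liquidity", "outgoing_slots",
    "ts_added_ns", "ts_removed_ns", "htlc_settled",
    "incoming_endorsed", "outgoing_endorsed",
]

# Fields marked [P], mapped to the alias namespace they share (assumption).
PRIVATE = {
    "channel_in": "chan", "channel_out": "chan",
    "peer_in": "peer", "peer_out": "peer",
}


def read_rows(text):
    """Parse CSV text into one dict per forwarded HTLC (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=FIELDS))


def randomize_private(rows):
    """Return copies of rows with every [P] value replaced by a random alias."""
    aliases = {}

    def alias(kind, value):
        key = (kind, value)
        if key not in aliases:
            if kind == "chan":
                # Random value that still fits the uint64 column.
                aliases[key] = str(secrets.randbits(64))
            else:
                # 33 random bytes, i.e. 66 hex chars like a compressed pubkey.
                aliases[key] = secrets.token_hex(33)
        return aliases[key]

    out = []
    for row in rows:
        new = dict(row)  # leave the operator's local data untouched
        for field, kind in PRIVATE.items():
            new[field] = alias(kind, row[field])
        out.append(new)
    return out
```

The alias table lives only inside one call, so it is discarded after the export. Consistent aliases keep per-channel structure available to the analysis, which is also why the correlation concerns raised in this thread still apply to randomized data.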