📅 Original date posted:2021-07-06 📝 Original message: Carla Kirk-Cohen ...

📅 Original date posted:2021-07-06
📝 Original message:
Carla Kirk-Cohen <kirkcohenc at gmail.com> writes:
> Hi all,

Hi Carla,

I apologize for not responding to this earlier, but it was
raised again in the recent spec meeting
(https://lightningd.github.io/meetings/ln_spec_meeting/2021/ln_spec_meeting.2021-07-05-20.06.log.html).

I love the idea of more specific error codes, BTW!

Feedback interleaved:

> Since we shouldn’t have non-ascii values in the error string itself,
> this can most easily be achieved by adding TLV fields after the
> data field. In terms of supporting nodes that have not upgraded,
> we could either include the error code in the data field to cover
> our bases, or introduce a feature bit so that we know whether
> to backfill the data field. This gives upgraded nodes an improved
> quality of life, while leaving older nodes unaffected.

Older nodes should definitely ignore extra fields; it's in the spec and
we've relied on this to extend messages in the past, so this part is
easy.

Technically, all defined types are now assumed to have an optional TLV
appended, since f068dd0d (Bolt 1: Specify that extensions to existing
messages must use TLV (#754)).

> While we can’t enumerate every possible error, there are quite
> a few cases in the spec where we can introduce explicit error
> codes. For the sake of the skim-readers, I’ve left that list at
> the end of the email.
>
> Taking the example of our node receiving an invalid signature for
> a htlc, a new error would look like this:

I think this is both too much, and not enough.

Too much:
- Many of these errors are "your implementation is broken", which is
really not something actionable by the recipient.
- A lot of work to fill in all these error cases, which will (because
they're usually impossible) will be untested and broken.

Not enough:
- Look at the proposal for channel_types, where you would object to the
channel_type if you don't like it. This would be grouped under
"Funding params unacceptable", which is actually 99% of errors at this
point and does not say what the problem is with specificity.

I took a different approach with onion messages[1], where you (optionally)
specify the field number, even an optional suggested value:

1. type: 1 (`erroneous_field`)
2. data:
* [`tu64`:`tlv_fieldnum`]
1. type: 3 (`suggested_value`)
2. data:
* [`...*byte`:`value`]
1. type: 5 (`error`)
2. data:
* [`...*utf8`:`msg`]

In our case, we need to refer to which message (if any) caused the
error, and we have non-tlv fields, so it can't simply use the tlv field
number.

Here's my straw proposal:

1. `tlv_stream`: `error_tlvs`
2. types:
1. type: 1 (`erroneous_message`)
2. data:
* [`...*byte`:`message`]
1. type: 3 (`erroneous_fieldnum`)
2. data:
* [`tu64`:`fieldnum`]
1. type: 5 (`suggested_value`)
2. data:
* [`...*byte`:`value`]

erroneous_message is the message we're complaining about (including
2-byte type), which may be truncated (but must be at least 2 bytes).

fieldnum is either the 0-based field number (for fixed fields), or the
number of fixed fields + the tlv type (for tlv fields).

suggested_value is the optional value if we have an idea if what we
expected / prefer.

> This new kind of error provides us with an error code that tells us
>
> exactly what has gone wrong, and metadata pointing to the htlc
>
> with an invalid sig. This information can be logged, or stored in a
>
> more permanent error store to help diagnose issues in the future.
>
> Right now, the spec is pretty strict on error handling [13], indicating
>
> that senders/recipients of errors `MUST` fail the channel referenced
>
> in the error.

> This isn’t very practical, and I believe that the majority
> of the impls don’t abide by this instruction.

This was inevitable eventually, but c-lightning deliberately treated
errors as fatal for a long time so people would *notice* and *report*
these issues.

To be fair, *LND* didn't treat them as fatal. As so naturally your
engineers didn't *think* of them as a big deal (and testing didn't show
it up), so it would send errors for cases which it clearly didn't want
to close the channel (e.g. peer too slow to respond!).

Hence this PR, which makes these less fatal, and adds warning support:

https://github.com/lightningnetwork/lightning-rfc/pull/834

(We should similarly add this TLV to warnings?)

> Candidates for error codes:

The vast majority of these are "contact your developer, peer says we did
something illegal". Which is always nice to have more information
about, but not critital.

The exceptions are:

> Funding Process:
>
> * Funding process timeout [2]
>
> * Fees out of range [3]
>
> * Funding tx spent [11]
>
> * Funding params unacceptable (eg, channel too small)
...
> Channel State Machine:
>
> * HTLC timeout [4]
...
> Fee Updates
>
> * Update fee to low/high [9]
...

And BTW this one is misguided:

> * Feature bit required

If Alice says a feature bit is even (compulsory), and Bob doesn't offer
it, that's on Bob! Alice's behavior here is undefined: she may do a
courtesy message to Bob, but she may also *assume* it's agreed.

But it's worth noting that with the exception of timeouts, these are all
expressable in form "problem is this message, this field". Perhaps it's
worth having a special TLV case for timeouts in the message?

Thanks for starting this ball rolling!

Cheers,
Rusty.

[1] https://github.com/lightningnetwork/lightning-rfc/pull/759

Rusty Russell [ARCHIVE] on Nostr: 📅 Original date posted:2021-07-06 📝 Original message: Carla Kirk-Cohen ...