What is Nostr?
Carla Kirk-Cohen [ARCHIVE] /
npub17xu…nw36
2023-06-09 13:03:01

Carla Kirk-Cohen [ARCHIVE] on Nostr: đź“… Original date posted:2021-07-28 đź“ť Original message: Hi Rusty, Thanks for ...

đź“… Original date posted:2021-07-28
đź“ť Original message:
Hi Rusty,

Thanks for taking a look!

I like the idea of being able to reference a specific message/field
that your node finds problematic. Definitely makes the job simpler
than trying to define every error evar.

> erroneous_message is the message we're complaining about (including
2-byte type), which may be truncated (but must be at least 2 bytes)

Do you think this could be reduced to just the 2-byte message type?
If the reason for including the whole message is so that the error
recipient can match it to the original message, we could possibly just
include the field that we had a problem with as another TLV? That way
we save on having to resend the whole message, and guarantee that we'd
be delivering the problematic field value (it could be cut off if we
truncate the message).

With this approach, there's a chance that we can't match errors back
to their original message if the same channel sends the same message
with the same incorrect field more than once, but I don't see much
practical need to exactly match error to original message in this case,
since we know which field is broken.

> (We should similarly add this TLV to warnings?)

Sounds like a good idea to me!

> But it's worth noting that with the exception of timeouts, these are
all expressible in form "problem is this message, this field". Perhaps
its worth having a special TLV case for timeouts in the message?

If we're confident that the only error that we're going to need that
doesn't follow the "problem/message" structure is timeouts, then a
`timeout` tlv sounds reasonable. If we're also going to add these
fields to warning messages, I'm wondering whether it could be useful
to define an `error_code` enum, where `timeout` is one of the cases?
Perhaps overkill if we're certain that the only value will ever be
`timeout`.

Putting that all together, an update on your proposed solution:
1. `tlv_stream`: `error_tlvs`
2. types:
1. type: 1 (`erroneous_msgtype`)
2. data:
* [`u16`:`type`]
1. type: 3 (`erroneous_fieldnum`)
2. data:
* [`tu64`:`fieldnum`]
1. type: 5 (`suggested_value`)
2. data:
* [`...*byte`:`value`]
1. type: 7 (`erroneous_value`)
2. data:
* [`...*byte`:`value`]
1. type: 9 (`error_code`)
2. data:
* [`u16`:`code`]

Cheers,
- Carla

On Wed, Jul 7, 2021 at 2:36 AM Rusty Russell <rusty at rustcorp.com.au> wrote:

> Carla Kirk-Cohen <kirkcohenc at gmail.com> writes:
> > Hi all,
>
> Hi Carla,
>
> I apologize for not responding to this earlier, but it was
> raised again in the recent spec meeting
> (
> https://lightningd.github.io/meetings/ln_spec_meeting/2021/ln_spec_meeting.2021-07-05-20.06.log.html
> ).
>
> I love the idea of more specific error codes, BTW!
>
> Feedback interleaved:
>
> > Since we shouldn’t have non-ascii values in the error string itself,
> > this can most easily be achieved by adding TLV fields after the
> > data field. In terms of supporting nodes that have not upgraded,
> > we could either include the error code in the data field to cover
> > our bases, or introduce a feature bit so that we know whether
> > to backfill the data field. This gives upgraded nodes an improved
> > quality of life, while leaving older nodes unaffected.
>
> Older nodes should definitely ignore extra fields; it's in the spec and
> we've relied on this to extend messages in the past, so this part is
> easy.
>
> Technically, all defined types are now assumed to have an optional TLV
> appended, since f068dd0d (Bolt 1: Specify that extensions to existing
> messages must use TLV (#754)).
>
> > While we can’t enumerate every possible error, there are quite
> > a few cases in the spec where we can introduce explicit error
> > codes. For the sake of the skim-readers, I’ve left that list at
> > the end of the email.
> >
> > Taking the example of our node receiving an invalid signature for
> > a htlc, a new error would look like this:
>
> I think this is both too much, and not enough.
>
> Too much:
> - Many of these errors are "your implementation is broken", which is
> really not something actionable by the recipient.
> - A lot of work to fill in all these error cases, which will (because
> they're usually impossible) will be untested and broken.
>
> Not enough:
> - Look at the proposal for channel_types, where you would object to the
> channel_type if you don't like it. This would be grouped under
> "Funding params unacceptable", which is actually 99% of errors at this
> point and does not say what the problem is with specificity.
>
> I took a different approach with onion messages[1], where you (optionally)
> specify the field number, even an optional suggested value:
>
> 1. type: 1 (`erroneous_field`)
> 2. data:
> * [`tu64`:`tlv_fieldnum`]
> 1. type: 3 (`suggested_value`)
> 2. data:
> * [`...*byte`:`value`]
> 1. type: 5 (`error`)
> 2. data:
> * [`...*utf8`:`msg`]
>
> In our case, we need to refer to which message (if any) caused the
> error, and we have non-tlv fields, so it can't simply use the tlv field
> number.
>
> Here's my straw proposal:
>
> 1. `tlv_stream`: `error_tlvs`
> 2. types:
> 1. type: 1 (`erroneous_message`)
> 2. data:
> * [`...*byte`:`message`]
> 1. type: 3 (`erroneous_fieldnum`)
> 2. data:
> * [`tu64`:`fieldnum`]
> 1. type: 5 (`suggested_value`)
> 2. data:
> * [`...*byte`:`value`]
>
> erroneous_message is the message we're complaining about (including
> 2-byte type), which may be truncated (but must be at least 2 bytes).
>
> fieldnum is either the 0-based field number (for fixed fields), or the
> number of fixed fields + the tlv type (for tlv fields).
>
> suggested_value is the optional value if we have an idea if what we
> expected / prefer.
>
> > This new kind of error provides us with an error code that tells us
> >
> > exactly what has gone wrong, and metadata pointing to the htlc
> >
> > with an invalid sig. This information can be logged, or stored in a
> >
> > more permanent error store to help diagnose issues in the future.
> >
> > Right now, the spec is pretty strict on error handling [13], indicating
> >
> > that senders/recipients of errors `MUST` fail the channel referenced
> >
> > in the error.
>
> > This isn’t very practical, and I believe that the majority
> > of the impls don’t abide by this instruction.
>
> This was inevitable eventually, but c-lightning deliberately treated
> errors as fatal for a long time so people would *notice* and *report*
> these issues.
>
> To be fair, *LND* didn't treat them as fatal. As so naturally your
> engineers didn't *think* of them as a big deal (and testing didn't show
> it up), so it would send errors for cases which it clearly didn't want
> to close the channel (e.g. peer too slow to respond!).
>
> Hence this PR, which makes these less fatal, and adds warning support:
>
> https://github.com/lightningnetwork/lightning-rfc/pull/834
>
> (We should similarly add this TLV to warnings?)
>
> > Candidates for error codes:
>
> The vast majority of these are "contact your developer, peer says we did
> something illegal". Which is always nice to have more information
> about, but not critital.
>
> The exceptions are:
>
> > Funding Process:
> >
> > * Funding process timeout [2]
> >
> > * Fees out of range [3]
> >
> > * Funding tx spent [11]
> >
> > * Funding params unacceptable (eg, channel too small)
> ...
> > Channel State Machine:
> >
> > * HTLC timeout [4]
> ...
> > Fee Updates
> >
> > * Update fee to low/high [9]
> ...
>
> And BTW this one is misguided:
>
> > * Feature bit required
>
> If Alice says a feature bit is even (compulsory), and Bob doesn't offer
> it, that's on Bob! Alice's behavior here is undefined: she may do a
> courtesy message to Bob, but she may also *assume* it's agreed.
>
> But it's worth noting that with the exception of timeouts, these are all
> expressable in form "problem is this message, this field". Perhaps it's
> worth having a special TLV case for timeouts in the message?
>
> Thanks for starting this ball rolling!
>
> Cheers,
> Rusty.
>
> [1] https://github.com/lightningnetwork/lightning-rfc/pull/759
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linuxfoundation.org/pipermail/lightning-dev/attachments/20210728/4b5ba70c/attachment-0001.html>;
Author Public Key
npub17xugd458km0nm8edu8u2efuqmxzft3tmu92j3tyc0fa4gxdk9mkqmanw36