ZmnSCPxj [ARCHIVE] on Nostr:
Original date posted: 2022-03-04
Original message:

Good morning aj,
> On Sun, Feb 27, 2022 at 04:34:31PM +0000, ZmnSCPxj via bitcoin-dev wrote:
>
> > In reaction to this, AJ Towns mailed me privately about some of his
> > thoughts on this insane `OP_EVICT` proposal.
> > He observed that we could generalize the `OP_EVICT` opcode by
> > decomposing it into smaller parts, including an operation congruent
> > to the Scheme/Haskell/Scala `map` operation.
>
> At much the same time Zman was thinking about OP_FOLD and in exactly the
> same context, I was wondering what the simplest possible language that
> had some sort of map construction was -- I mean simplest in a "practical
> engineering" sense; I think Simplicity already has the Euclidean/Peano
> "least axioms" sense covered.
>
> The thing that's most appealing to me about bitcoin script as it stands
> (beyond "it works") is that it's really pretty simple in an engineering
> sense: it's just a "forth" like system, where you put byte strings on a
> stack and have a few operators to manipulate them. The alt-stack, and
> supporting "IF" and "CODESEPARATOR" add a little additional complexity,
> but really not very much.
>
> To level-up from that, instead of putting byte strings on a stack, you
> could have some other data structure than a stack -- eg one that allows
> nesting. Simple ones that come to mind are lists of (lists of) byte
> strings, or a binary tree of byte strings [0]. Both those essentially
> give you a lisp-like language -- lisp is obviously all about lists,
> and a binary tree is just made of things or pairs of things, and pairs
> of things are just another way of saying "car" and "cdr".
>
> A particular advantage of lisp-like approaches is that they treat code
> and data exactly the same -- so if we're trying to leave the option open
> for a transaction to supply some unexpected code on the witness stack,
> then lisp handles that really naturally: you were going to include data
> on the stack anyway, and code and data are the same, so you don't have
> to do anything special at all. And while I've never really coded in
> lisp at all, my understanding is that its biggest problems are all about
> doing things efficiently at large scales -- but script's problem space
> is for very small scale things, so there's at least reason to hope that
> any problems lisp might have won't actually show up for this use case.
I heartily endorse LISP --- `eval` has a trivial implementation that is easy to write once you have defined a proper data type in your-preferred-language-here to represent LISP datums.
Combine it with your idea of committing to a max-number-of-operations (which increases the weight of the transaction) and you may very well have something viable.
(In particular, even though `eval` is traditionally (re-)implemented in LISP itself, the limit on max-number-of-operations means any `eval` implementation within the same language is also forcibly made total.)
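As a minimal sketch of how small such a budgeted `eval` can be --- in Python, with nested lists standing in for LISP datums; the budget mechanism and all names here are purely illustrative, not from any proposal:

    # A toy `eval` over nested-list datums, with a hard cap on the
    # number of operations.
    class OutOfBudget(Exception):
        pass

    def eval_datum(expr, env, budget):
        # `budget` is a one-element list so the remaining count is
        # shared across recursive calls; exhausting it aborts
        # evaluation, which is what forces the interpreter to be total.
        budget[0] -= 1
        if budget[0] < 0:
            raise OutOfBudget()
        if isinstance(expr, str):          # symbol: look it up
            return env[expr]
        if not isinstance(expr, list):     # other atom: self-evaluating
            return expr
        head, *args = expr
        if head == 'quote':                # code is data: return unevaluated
            return args[0]
        if head == 'if':
            cond, then_e, else_e = args
            return eval_datum(then_e if eval_datum(cond, env, budget)
                              else else_e, env, budget)
        fn = eval_datum(head, env, budget)               # application
        vals = [eval_datum(a, env, budget) for a in args]
        return fn(*vals)

    env = {'+': lambda a, b: a + b, 'x': 40}
    print(eval_datum(['+', 'x', 2], env, [100]))   # => 42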
Of note is that the supposed "problem at scale" of LISP is, as I understand it, due precisely to its homoiconicity, i.e. its code and its data sharing one and the same representation.
This homoiconicity greatly tempts LISP programmers to use macros, i.e. programs that generate other programs from some input syntax.
Homoiconicity means that one can manipulate code just as easily as data, and thus LISP macros are a trivial extension of the language.
This allows each LISP programmer to just code up a macro to expand common patterns.
However, each LISP programmer then ends up implementing macros that are *similar* to, but *different* from, everyone else's.
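To illustrate (in Python, over the same nested-list datums; `swap` is a hypothetical pattern, not from any real dialect): since code is just data, a "macro" is an ordinary function from datum to datum, run before evaluation.

    # Note the unhygienic capture of `tmp` --- exactly the kind of
    # subtle behavior that varies between private dialects.
    def expand_swap(expr):
        if isinstance(expr, list) and expr and expr[0] == 'swap':
            a, b = expr[1], expr[2]
            return ['begin', ['set!', 'tmp', a],
                             ['set!', a, b],
                             ['set!', b, 'tmp']]
        if isinstance(expr, list):
            return [expand_swap(e) for e in expr]
        return expr

    print(expand_swap(['swap', 'x', 'y']))
    # => ['begin', ['set!', 'tmp', 'x'], ['set!', 'x', 'y'],
    #     ['set!', 'y', 'tmp']]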
Unfortunately, programming at scale requires multiple programmers speaking the same language.
Programming at scale is thus hampered, because each LISP programmer ends up with a private dialect of LISP (the common language plus their own extensive set of private macros), and intercommunication between programmers is hindered by the fact that each one speaks their own private dialect.
Some LISP-like languages (e.g. Scheme) have classically targeted a "small" core of absolutely-necessary operations; each implementation of the language then immediately becomes a new dialect, since it provides slightly different forms for roughly the same convenience functions and macros, and *then* individual programmers build their own private dialects on top.
For Scheme specifically, R7RS also provides a "large" standard, as did R6RS (which only *had* a "large" standard), but individual Scheme implementations have not always been willing to implement *all* of the "large" standard.
Otherwise, per Greenspun's Tenth Rule, every big C program contains a half-assed implementation of half of Common LISP, so ----
> - I don't think execution costing takes into account how much memory
> is used at any one time, just how much was allocated in total; so
> the equivalent of (OP_DUP OP_DROP OP_DUP OP_DROP ..) only has the
> allocations accounted for, with no discount given for the immediate
> freeing, so it gets treated as having the same cost as (OP_DUP
> OP_DUP .. OP_DROP OP_DROP ..). Doing it that way would be worse
> than how bitcoin script is currently costed, but doing better might
> mean locking in an evaluation method at the consensus level. Seems
> worth looking into, at least.
This may depend on the language that the interpreter is written in.
For example, in a typical GC'd language, N * `OP_DUP OP_DROP` and N * `OP_DUP` + N * `OP_DROP` will behave similarly when allocating from the nursery.
Since the GC nursery acts as a large buffer of potential allocations, the amount of work done in both cases would be the same, at least until the number of allocs exceeds the nursery size.
Alternatively, the implementation may use immutable byte vectors, in which case `OP_DUP` is just a pointer copy.
Or the implementation may use copy-on-write byte vectors, in which case `OP_DUP` is just a pointer copy plus a refcount increment, `OP_DROP` is just a refcount decrement, and the amount of memory actually in use remains small.
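A small demonstration of the immutable-plus-refcounted case --- CPython's `bytes` already behave this way, so the host runtime does all the work:

    # `OP_DUP` really is just a pointer copy plus a refcount increment,
    # and `OP_DROP` just a refcount decrement: no byte data is copied,
    # and peak memory stays small regardless of operation ordering.
    import sys

    stack = [b'\x00' * 1_000_000]        # one large immutable byte vector

    def op_dup(stack):
        stack.append(stack[-1])          # new reference to the same object

    def op_drop(stack):
        stack.pop()                      # just drops a reference

    for _ in range(1000):
        op_dup(stack)                    # 1000 "copies", ~no extra memory

    assert all(s is stack[0] for s in stack)  # every slot shares one buffer
    print(sys.getrefcount(stack[0]))          # roughly one count per slot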
> There's two ways to think about upgradability here; if someday we want
> to add new opcodes to the language -- perhaps something to validate zero
> knowledge proofs or calculate sha3 or use a different ECC curve, or some
> way to support cross-input signature aggregation, or perhaps it's just
> that some snippets are very widely used and we'd like to code them in
> C++ directly so they validate quicker and don't use up as much block
> weight. One approach is to just define a new version of the language
> via the tapleaf version, defining new opcodes however we like.
>
> The other is to use the "softfork" opcode -- chia defines it as:
>
> (softfork cost code)
>
> though I think it would probably be better if it were
>
> (softfork cost version code)
>
> where the idea is that "code" will use the "x" opcode if there's a
> problem, and anyone supporting the "version" softfork can verify that
> there aren't any problems at a cost of "cost". However, whether you
> do or don't support that softfork, as far as the rest of the script is
> concerned, the expression will either fail entirely or evaluate as zero;
> so anyone who doesn't support the softfork can just replace it with zero
> and continue on, treating it as if it had costed "cost" units.
>
> One thing worth noting: "softfork" behaves more like OP_NOP than
> tapscript's OP_SUCCESS -- I think it's just not possible in general to
> have OP_SUCCESS-like behaviour if you're trying to allow accepting code
> from the witness data -- otherwise as soon as you reveal that your script
> does accept arbitrary code supplied by the spender, someone could stick
> in an OP_SUCCESS code, and remove all the restrictions on spending and
> steal your funds.
Oh no `1 RETURN`!
Well, the advantage of chialisp here is that it scopes the softfork to a *block* of code, so an opcode *inside* the block could return arbitrary data; it is just the entire block that is restricted to returning `0`.
So it would be something like an `OP_BEGINSOFTFORK .... OP_ENDSOFTFORK` bracket, where any reserved opcodes in the middle have arbitrary behavior; the entire block gets a *copy* of the current stack and alt stack, and any changes to the stack / alt stack disappear after the `OP_ENDSOFTFORK`.
Thus, the entire block as a whole would act as an `OP_NOP`.
(OG Bitcoin SCRIPT FTW!)
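A runnable sketch of that bracketing semantics, in Python; the opcode names are hypothetical and nesting is not handled:

    # The block executes against *copies* of the stack and alt stack,
    # an old node treats any opcode it does not recognise inside the
    # block as a no-op, and every effect is discarded at
    # OP_ENDSOFTFORK, so the whole block acts as an OP_NOP.
    KNOWN = {
        'OP_DUP':   lambda s, a: s.append(s[-1]),
        'OP_DROP':  lambda s, a: s.pop(),
        'OP_TOALT': lambda s, a: a.append(s.pop()),
    }

    def execute(ops, stack, alt):
        i = 0
        while i < len(ops):
            op = ops[i]
            if op == 'OP_BEGINSOFTFORK':
                end = ops.index('OP_ENDSOFTFORK', i)
                # run the body on copies; the results are thrown away,
                # so both stacks are left exactly as they were
                execute(ops[i + 1:end], list(stack), list(alt))
                i = end + 1
            elif op in KNOWN:
                KNOWN[op](stack, alt)
                i += 1
            else:
                i += 1   # old node: unknown opcode in the block is a no-op

    stack, alt = [b'\x01'], []
    execute(['OP_BEGINSOFTFORK', 'OP_DUP', 'OP_NEWFANCY', 'OP_TOALT',
             'OP_ENDSOFTFORK'], stack, alt)
    assert (stack, alt) == ([b'\x01'], [])   # unchanged: behaved as OP_NOP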
I think the `softfork` form would have to be a syntax, though, and not a procedure, as I think you want `cost` to be statically determined, and very likely `version` as well.
Regards,
ZmnSCPxj