[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[bug#51838] [PATCH v6 03/41] guix: node-build-system: Add JSON utilities
From: |
Philip McGrath |
Subject: |
[bug#51838] [PATCH v6 03/41] guix: node-build-system: Add JSON utilities. |
Date: |
Fri, 31 Dec 2021 00:22:48 -0500 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.3.1 |
Hi Liliana,
On 12/30/21 11:56, Liliana Marie Prikler wrote:
> Am Donnerstag, dem 30.12.2021 um 02:38 -0500 schrieb Philip McGrath:
>> This commit adds several utility functions for non-destructive
>> transformation of the JSON representation used by (guix build json),
>> particularly for purely functional update of JSON objects. They
>> should
>> eventually be exported, but most are private for now to allow for
>> more
>> experience and consideration before commiting to the API. The design
>> was largely inspired by the 'racket/dict' and 'racket/hash'
>> libraries.
>> Liliana Marie Prikler proposed 'with-atomic-json-file-replacement'.
> Given that this is a fair amount of procedures that you're proposing, I
> think a new file would be appropriate. Perhaps (guix build json-
> utils)? Adding that should IIUC not cause a world rebuild, so we could
> do that on master.
>
I agree that these functions ultimately belong in their own file, and
I'd even had the name (guix build json-utils) in mind.
I put them in (guix build node-build-system) for now because, if they
were in (guix build json-utils), they would have to be exported, at
which point their API would have to be relatively stable, and I didn't
want designing them to block, or to be rushed by, the rest of this
series. Now, maybe consensus on the json-utils will turn out to be the
easy part! But my high-level question is, in your view, do any of the
points I'm about to respond to block this patch series?
On 12/30/21 13:18, Liliana Marie Prikler wrote:
Having argued for these procedures to be moved into their own file in a
separate mail, now it's time to bikeshed stylistic choices.
Am Donnerstag, dem 30.12.2021 um 02:38 -0500 schrieb Philip McGrath:
+(define (jsobject-ref js key failure-result)
+ "Return the value assosciated with KEY in the json object JS. If
KEY is not
+found and FAILURE-RESULT is a procedure, it is called in tail position
with
+zero arguments. Otherwise, FAILURE-RESULT is returned."
+ ;; TODO: `failure-result` should be optional, but should the default
+ ;; `failure-result` be #f (like `assoc-ref`), a thunk raising an
exception,
+ ;; '(@), or something else? Keep it mandatory until we discuss and
decide.
+ (match js
+ (('@ . alist)
+ (match (assoc key alist)
+ (#f
+ (if (procedure? failure-result)
+ (failure-result)
+ failure-result))
+ ((_ . value)
+ value)))))
We can safely replace failure-result by Guile's DEFAULT and leave error
handling to the user.
I don't care whether we call it DEFAULT or FAILURE-RESULT.
I agree that it should not raise an exception by default. My current
thinking is that '(@) would be a good default DEFAULT: it is useful for
the common pattern of traversing or transforming nested objects, and, as
you point out at the end of this email, explicitly typing #f (the other
useful possibility) looks much more like normal Scheme than explicitly
typing '(@).
In my experience with [1] and [2] (the purely-functional dictionary
libraries I use most often), the special case for a procedure as DEFAULT
is useful. I feel less strongly about it because it's relatively easy to
work around for JSON, since you can pick some non-JSON signal value, but
it also seems to have especially little downside for JSON, since (guix
build json) will never have a procedure in a valid JSON object. In
addition to raising exceptions and other control flow, it's useful for
default values that are expensive to produce.
[1]: https://docs.racket-lang.org/reference/hashtables.html
[2]: https://docs.racket-lang.org/reference/dicts.html
+(define (alist-pop alist key)
+ "Return two values: the first pair in ALIST with the given KEY in
its
+'car' (or #f, if no such pair exists) and an assosciation list like
(and
+potentially sharing storage with) ALIST, but with no entry for KEY."
+ (match (assoc key alist)
+ ;; If key isn't present, we don't need to do any allocation
+ (#f
+ (values #f alist))
+ (found
+ (values found
+ ;; Because we have `found`, we can find it more
+ ;; efficiently this time with `eq?`. We avoid using
+ ;; `delq` because it would copy pairs in a shared
+ ;; tail. We assume a sufficiently smart compiler to
+ ;; handle "tail recursion modulo cons" (vid. e.g. Indiana
+ ;; University Technical Report No. 19, Friedman & Wise
+ ;; 1975) at least as efficiently as a hand-written
+ ;; tail-recursive implementation with an accumulator.
+ (let loop ((alist alist))
+ (match alist
+ ;; We know that `found` is present,
+ ;; so no need to check for '()
+ ((this . alist)
+ (if (eq? this found)
+ alist
+ (cons this (loop alist))))))))))
I think this can be more efficiently be done in a "single" loop.
(let loop ((rest alist)
(previous '()))
(match rest
(() (values #f alist))
((first . rest)
(if (eq? (car first) key)
(values first (reverse! previous rest))
(loop rest (cons first previous))))))
I'll admit to a Racket bias, but, having just eliminated the use of
'assoc-set!', I'm loathe to start mutating pairs (even correctly). To
quote a bit from the SRFI-1 spec for 'append-reverse!', "note that this
pattern of iterative computation followed by a reverse can frequently be
rewritten as a recursion, dispensing with the reverse and append-reverse
steps, and shifting temporary, intermediate storage from the heap to the
stack, which is typically a win for reasons of cache locality and eager
storage reclamation." (See how 'set-cdr!' can crash safe Chez Scheme!
<https://github.com/cisco/ChezScheme/issues/599>)
IIUC, using SRFI-1's 'span' would lead to the same situation.
Also, I don't think your version is tail-recursive. (loop alist) is
not in tail position from what I can tell.
Yes, "tail recursion modulo cons" refers to a compiler optimization for
functions which are _not_ tail recursive. For full details, see the
Friedman & Wise 1975 tech report I cited at
<https://legacy.cs.indiana.edu/ftp/techreports/TR19.pdf> (or various
other articles), but, as briefly as I can: The optimization rests on the
observation that many recursive functions, like the classic definition
of 'map':
(define (map f lst)
(match lst
(()
'())
((this . lst)
(cons (f this)
(map f lst)))))
are nearly tail-recursive, and the only real work remaining to be done
in the continuation of the recursive call is to fill in the cdr of the
pair. Thus, a compiler can safely transform this code into a truly
tail-recursive implementation:
(define (map f lst)
(match lst
(()
'())
((this . lst)
(define ret (list (f this)))
(let loop ((dest ret)
(lst lst))
(match lst
((this . lst)
(define new (list (f this)))
(set-cdr! dest new)
(loop new lst))
(()
ret))))))
Unlike the Proper Implementation of Tail Calls (so-called "tail-call
optimization"), handling "tail recursion modulo cons" truly is an
optimization: it does not change the space complexity of the function.
But it can allow the compiler to generate whatever code it thinks will
work best with its collector/allocator and continuation/"call stack"
implementation.
(The optimizations applies to constructors in general, not just 'cons',
and a compiler can safely apply it to values that are immutable from the
perspective of the source language.)
+;; Sadly, Guile's implementation of (@ (srfi srfi-1) alist-delete)
+;; performs unnecessary allocation, e.g. this currently evaluates to
#f:
+;;
+;; (let ((alist `(("a" . 1)("b" . 2)("c" . 3))))
+;; (eq? alist (alist-delete "x" alist)))
+;;
+;; These functions generally choose to allocate a new outer pair
(with the '@
+;; tag), even though in unusual cases the resulting object might not
have
+;; changed, for the sake of simplicity and to avoid retaining a
reference to
+;; the original alist longer than necessary. But that is O(1)
allocation that
+;; could only rarely be avoided: `alist-delete` would allocate O(n)
pairs,
+;; which would only be necessary in the worst case.
+(define (alist-delete* alist key)
+ "Return an assosciation list like (and potentially sharing storage
with)
+ALIST, but with no entry for KEY."
+ (define-values (_popped remaining)
+ (alist-pop alist key))
+ remaining)
That's a pretty long comment around something that could be done with
call-with-values or SRFI-71 let. I think one of these two should be
preferred.
Note that both our versions of alist-pop only pop the first key (as
they should). This means that alist-delete* should really be called
alist-delete-1 as in "remove the first pair in ALIST belonging to KEY".
For the larger JSON handling below, this makes no difference however.
Here I was using '*' to mean "a slightly altered version of", as with
'letrec' and 'letrec*', but, yes, since the other functions defined here
use '*' to mean "zero or more times", the name is confusing: I think I'd
just call it 'alist-delete' and not import (srfi srfi-1)'s version.
The comment may be unnecessarily long ... the essence of what I was
trying to explain is that, in all of these implementations, I've tried
to avoid unnecessary allocation. Being able to rely on 'alist-delete',
and more generally 'alist-pop', not to needlessly copy the "spine" of
the list lets later functions use them unconditionally.
Why would you prefer 'call-with-values' or SRFI-71 over 'define-values'?
The style guide against which I'm used to working [3] generally prefers
internal definitions, to avoid rightward drift.
[3]:
https://docs.racket-lang.org/style/Choosing_the_Right_Construct.html#%28part._.Definitions%29
+(define (alist-set alist key value)
+ "Return an assosciation list like ALIST, but with KEY mapped to
VALUE,
+replacing any existing mapping for KEY."
+ (acons key value (alist-delete* alist key)))
Is order relevant here? Because we could just as well reimplement our
alist-delete* loop and cons the replacement onto the rest. WDYT?
Relying on order for JSON objects is non-interoperable, per RFC 8259 §
4. I'm not intending for these alist procedures to be exported, so I'm
not trying to handle any more general case than that, as I explain in
the comments at the top of the file.
I'm not sure what the advantage would be to reimplementing the
'alist-delete' loop here.
+(define (jsobject-set js key value)
+ "Return a json object like JS, but with KEY mapped to VALUE,
replacing any
+existing mapping for KEY."
+ (cons '@ (match js
+ (('@ . alist)
+ (alist-set alist key value)))))
I think it'd be wiser to put the cons inside the match.
I don't care very much either way.
+(define jsobject-set*
+ (case-lambda
+ "Return a json object like JS, but functionally extended by
mapping each
+KEY to each VALUE, replacing any existing mapping for each KEY. The
update
+takes place from left to right, so later mappings overwrite earlier
mappings
+for the same KEY."
+ ((js)
+ js)
+ ((js key value)
+ (jsobject-set js key value))
+ ((js . args)
+ (cons '@ (match js
+ (('@ . alist)
+ (let loop ((alist alist)
+ (args args))
+ (match args
+ (()
+ alist)
+ ((key value . args)
+ (loop (alist-set alist key value)
+ args))))))))))
I'm not sure if I like this "syntax". I think I'd prefer
(jsobject-set* obj (FIELD1 VALUE1) (FIELD2 VALUE2) ...)
with FIELD1, FIELD2 being identifiers
WDYT?
So you would make 'jsobject-set*' a macro? When you say, "with FIELD1,
FIELD2 being identifiers", do you mean that the macro should convert
them to strings at compile-time? While, if I were designing a JSON
representation, I'd much prefer to use symbols for the object keys, I
think it would be confusing to use strings everywhere else but magic
symbols here.
I based this function on 'hash-set*' [4], 'dict-set*' [5], and numerous
similar functions in the Racket world, so I have a high degree of
confidence in the usability of the interface.
[4]:
https://docs.racket-lang.org/reference/hashtables.html#%28def._%28%28lib._racket%2Fprivate%2Fbase..rkt%29._hash-set%2A%29%29
[5]:
https://docs.racket-lang.org/reference/dicts.html#%28def._%28%28lib._racket%2Fdict..rkt%29._dict-set%2A%29%29
+(define (alist-update alist key failure-result updater)
+ "Return an assosciation list like ALIST, but with KEY mapped to
the result
+of applying UPDATER to the value to which KEY is mapped in ALIST.
When ALIST
+does not have an existing mapping for KEY, FAILURE-RESULT is used as
with
+'jsobject-ref' to obtain the argument for UPDATER."
+ ;; Often, `updater` will be a lambda expression, so making it the
last
+ ;; argument may help to makes the code legible, and the most
likely
+ ;; `failure-result` arguments are all shorter than the keyword
+ ;; `#:failure-result`. Plus, making `failure-result` mandatory
helps make
+ ;; `alist-update` consistent with `alist-update*`.
Which alist-update* are you referring to here? Either way, the
failure-result to default argument from above applies, but we could
keyword it.
Ah, I guess read that as, "Plus, making 'default' mandatory helps make
'jsobject-update' consistent with 'jsobject-update*'."
+ (define-values (popped tail-alist)
+ (alist-pop alist key))
+ (acons key
+ (updater (match popped
+ (#f
+ (if (procedure? failure-result)
+ (failure-result)
+ failure-result))
+ ((_ . value)
+ value)))
+ tail-alist))
SRFI-71 let says hi. Also the ordering question applies. I'm starting
to think we should implement alist-pop, alist-set and alist-update in
terms of a single more powerful function producing three values (or
SRFI-1 span).
Same question again re 'define-values'.
My intent in creating 'alist-pop' was to have a primitive that would
work for both 'alist-update' and 'alist-delete', and thereby
'alist-set'. Returning the prefix and the tail separately would involve
either extra allocation or mutating pairs. Since order never matters in
this context, why pay that price?
+(define* (jsobject-union #:key
+ (combine (lambda (a b) b))
+ (combine/key (lambda (k a b) (combine a
b)))
+ #:rest json-objects)
+ "Combine the given JSON-OBJECTS into a single json object. The
JSON-OBJECTS
+are merged from left to right by adding each key/value pair of each
object to
+the aggregate object in turn. When one of the JSON-OBJECTS contains
a mapping
+from some key KEY to a value VAL such that the aggregate object
already
+contains a mapping from KEY to a value VAL0, the aggregate object is
+functionally updated to instead map KEY to the value of (COMBINE/KEY
KEY VAL0
+VAL). The default COMBINE/KEY tail-calls (COMBINE VAL0 VAL), and
the default
+COMBINE simply returns its second argument, so, by default, mappings
in later
+JSON-OBJECTS supersede those in earlier ones."
+ (match (filter (lambda (v)
+ (not (or (keyword? v)
+ (procedure? v))))
+ json-objects)
+ (()
+ '(@))
+ (((and js0 ('@ . _)))
+ js0)
+ ((('@ . alist0) ('@ . alist*) ...)
+ (cons '@ (fold (lambda (alist1 alist0)
+ (if (null? alist0)
+ alist1
+ (fold (lambda (k+v alist0)
+ (match k+v
+ ((k . v)
+ (define-values (popped tail-
alist)
+ (alist-pop alist0 k))
+ (match popped
+ (#f
+ (cons k+v tail-alist))
+ ((_ . v0)
+ (acons k
+ (combine/key k v0 v)
+ tail-alist))))))
+ alist0
+ alist1)))
+ alist0
+ alist*)))))
Same default argument. Cons inside.
I think having a single combine function taking (k a b) would be less
confusing than having two. Is there a rationale for the form you
chose?
I based this function in particular on 'hash-union' from 'racket/hash'
[6], which uses these keywords. (But in 'hash-union', collisions trigger
an exception by default, and it requires at least one argument, because
otherwise it would be unclear what key-comparison function the result
should use.)
Having '#:combine' in addition to '#:combine/key' is ultimately just a
convenience, but it is quite a nice convenience in practice. It is quite
rare, in my experience, for a '#:combine' function to actually depend on
the key: it might depend on the type of the value, but, very often, it
unconditionally applies to all keys. Using '#:combine' is particularly
nice when using a combination function that already exists, like
'append' or '+'.
[6]:
https://docs.racket-lang.org/reference/hashtables.html#%28def._%28%28lib._racket%2Fhash..rkt%29._hash-union%29%29
+ (with-atomic-json-file-replacement "package.json"
+ (lambda (pkg-meta)
+ (jsobject-update*
+ pkg-meta
+ "devDependencies" '(@) resolve-dependencies
+ "dependencies" '(@) (lambda (deps)
+ (resolve-dependencies
+ (jsobject-union
+ (jsobject-ref pkg-meta
"peerDependencies" '(@))
+ deps))))))
#t)
We should probably add a function to our js utils that "generates an
empty object", because '(@) is quite confusing to see in these
circumstances. Otherwise LGTM with the aforementioned caveats.
I'm not sure what to call it: it would have to be short, or people (me,
at least) might end up writing '(@) anyway. Also, IIUC Guile doesn't
actually prevent you from mutating quoted constant pairs, so I function
would have to allocate a fresh pair to be robust.
It's a somewhat odd idea, but how about this?
(define-syntax |{}| (identifier-syntax '(@)))
It's as short as '(@), it looks like the JSON notation for the empty
object, and IIUC people could only use it to mess up occurrences of '(@)
within the same compilation unit, which we can't stop them from doing
anyway.
Alternatively, if we give up the thunk special case for 'default'
values, we could do:
(define jsobject-update
(case-lambda
((js key updater)
(jsobject-update js key '(@) updater))
((js key default updater)
...)))
(define jsobject-update*
(case-lambda
...
((js . args)
(match js
(('@ . alist)
(cons '@ (let loop ((alist alist)
(args args))
(match args
(()
alist)
((key (? procedure? updater) . args)
(loop (alist-update alist key '(@) updater)
args))
((key default updater . args)
(loop (alist-update alist key '(@) updater)
args))))))))))
-Philip
- [bug#51838] [PATCH v5 07/45] guix: node-build-system: Add #:absent-dependencies argument., (continued)
- [bug#51838] [PATCH v5 07/45] guix: node-build-system: Add #:absent-dependencies argument., Philip McGrath, 2021/12/23
- [bug#51838] [PATCH v5 07/45] guix: node-build-system: Add #:absent-dependencies argument., Liliana Marie Prikler, 2021/12/23
- [bug#51838] [PATCH v6 00/41] guix: node-build-system: Support compiling add-ons with node-gyp., Philip McGrath, 2021/12/30
- [bug#51838] [PATCH v6 02/41] guix: node-build-system: Add implicit libuv input., Philip McGrath, 2021/12/30
- [bug#51838] [PATCH v6 01/41] guix: node-build-system: Add delete-lockfiles phase., Philip McGrath, 2021/12/30
- [bug#51838] [PATCH v6 04/41] guix: node-build-system: Add avoid-node-gyp-rebuild phase., Philip McGrath, 2021/12/30
- [bug#51838] [PATCH v6 06/41] gnu: node-semver-bootstrap: Use 'delete-dependencies'., Philip McGrath, 2021/12/30
- [bug#51838] [PATCH v6 03/41] guix: node-build-system: Add JSON utilities., Philip McGrath, 2021/12/30
- [bug#51838] [PATCH v6 03/41] guix: node-build-system: Add JSON utilities., Liliana Marie Prikler, 2021/12/30
- [bug#51838] [PATCH v6 03/41] guix: node-build-system: Add JSON utilities., Liliana Marie Prikler, 2021/12/30
- [bug#51838] [PATCH v6 03/41] guix: node-build-system: Add JSON utilities.,
Philip McGrath <=
- [bug#51838] [PATCH v6 03/41] guix: node-build-system: Add JSON utilities., Liliana Marie Prikler, 2021/12/31
- [bug#51838] [PATCH v6 05/41] guix: node-build-system: Add 'delete-dependencies' helper function., Philip McGrath, 2021/12/30
- [bug#51838] [PATCH v6 05/41] guix: node-build-system: Add 'delete-dependencies' helper function., Liliana Marie Prikler, 2021/12/30
- [bug#51838] [PATCH v6 05/41] guix: node-build-system: Add 'delete-dependencies' helper function., Philip McGrath, 2021/12/30
- [bug#51838] [PATCH v6 05/41] guix: node-build-system: Add 'delete-dependencies' helper function., Liliana Marie Prikler, 2021/12/30
- [bug#51838] [PATCH v6 07/41] gnu: node-ms-bootstrap: Use 'delete-dependencies'., Philip McGrath, 2021/12/30
- [bug#51838] [PATCH v6 09/41] gnu: node-debug-bootstrap: Use 'delete-dependencies'., Philip McGrath, 2021/12/30
- [bug#51838] [PATCH v6 08/41] gnu: node-binary-search-bootstrap: Use 'delete-dependencies'., Philip McGrath, 2021/12/30
- [bug#51838] [PATCH v6 11/41] gnu: node-llparse-frontend-bootstrap: Use 'delete-dependencies'., Philip McGrath, 2021/12/30
- [bug#51838] [PATCH v6 13/41] gnu: node-semver: Use 'delete-dependencies'., Philip McGrath, 2021/12/30