Re: New rx implementation with extension constructs
From: Noam Postavsky
Subject: Re: New rx implementation with extension constructs
Date: Thu, 5 Sep 2019 11:38:23 -0400
> works just as expected. &rest arguments are permitted, and expand to
> implicit (seq ...) forms. No provision was made for macros able to
> execute arbitrary Lisp code; I just couldn't find a use for them, and
> decided to wait until someone would tell me otherwise. Thus, all
> parametrised forms work by plain substitution.
Do you mean that macros don't support (literal LISP-FORM) and (regexp
LISP-FORM)? Or something else?
> +;; The `rx--translate...' functions below return (REGEXP . PRECEDENCE),
> +;; where REGEXP is a list of string expressions that will be
> +;; concatenated into a regexp, and PRECEDENCE is one of
> +;;
> +;; t -- can be used as argument to postfix operators
> +;; seq -- can be concatenated in sequence with other seq or higher
> +;; lseq -- can be concatenated to the left of rseq or higher
> +;; rseq -- can be concatenated to the right of lseq or higher
> +;; nil -- can only be used in alternatives
> +;;
> +;; They form a lattice:
> +;;
> +;; t highest precedence
> +;; |
> +;; seq
> +;; / \
> +;; lseq rseq
> +;; \ /
> +;; nil lowest precedence
It would help to add some concrete examples (i.e., of things that
would count as `t', `seq', etc) to this abstract explanation.
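To illustrate the kind of examples I mean, here is a rough sketch. The mapping of regexp fragments to precedence levels in the comments is only my reading of the lattice, not something taken from the patch, so please correct it where it is wrong. The two `rx' calls show the visible effect: an argument of precedence `t' takes a postfix operator directly, while a `seq' argument has to be bracketed first.

```elisp
(require 'rx)

;; Assumed classification (my guess, to be confirmed):
;;   t    -- "a", "[a-z]", "\\(?:ab\\)"
;;   seq  -- "ab"
;;   lseq -- "^a"
;;   rseq -- "a$"
;;   nil  -- "a\\|b"

;; A single character has precedence t, so `*' applies directly.
(rx (* "a"))    ; ⇒ "a*"

;; A two-character string is only seq, so it is bracketed first.
(rx (* "ab"))   ; ⇒ "\\(?:ab\\)*"
```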
> +(defun rx--translate-symbol (sym)
> + "Translate an rx symbol. Return (REGEXP . PRECEDENCE)."
> + (pcase sym
> + ((or 'nonl 'not-newline 'any) (cons (list ".") t))
Is there a reason not to use '((".") . t) here (and similar for the rest
of the alternatives)? If yes, then it's probably worth mentioning in a
comment.
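The one difference I can think of is that a quoted constant is the same object on every call, whereas `cons' and `list' build fresh structure, which only matters if a caller destructively modifies the result. Whether that is the reason here is an assumption on my part. A minimal sketch, where `my-fresh' and `my-literal' are hypothetical helpers and not part of the patch:

```elisp
;; Hypothetical helpers for illustration only.
(defun my-fresh ()
  ;; Builds a new cons and a new list on every call.
  (cons (list ".") t))

(defun my-literal ()
  ;; Returns the same constant object on every call.
  '((".") . t))

(equal (my-fresh) (my-literal))   ; ⇒ t, same contents
(eq (my-fresh) (my-fresh))        ; ⇒ nil, distinct objects
(eq (my-literal) (my-literal))    ; ⇒ t, one shared constant
```

If nothing downstream uses `nconc' or similar on the returned lists, the literal form should be safe.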
> +(defun rx--string-to-intervals (str)
> + "Decode STR as intervals: A-Z becomes (?A . ?Z), and the single
> +character X becomes (?X . ?X). Return the intervals in a list."
> + ;; We could just do string-to-multibyte on the string and work with
> + ;; that instead of this `decode-char' workaround.
> (let ((decode-char
> - ;; Make sure raw bytes are decoded as such, to avoid confusion with
> - ;; U+0080..U+00FF.
> (if (multibyte-string-p str)
> #'identity
> (lambda (c) (if (<= #x80 c #xff)
> @@ -483,477 +280,657 @@ rx-check-any-string
> c))))
If not using string-to-multibyte, I think this lambda can be replaced
with #'unibyte-char-to-multibyte.
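For reference, a small sketch of what I would expect the two approaches to give for a raw byte. Note that `unibyte-char-to-multibyte' signals an error for arguments above #xff, which should not arise for characters taken from a unibyte string.

```elisp
;; ASCII is returned unchanged.
(unibyte-char-to-multibyte ?a)      ; ⇒ 97

;; Bytes #x80..#xff map to raw-byte characters, not U+0080..U+00FF.
(unibyte-char-to-multibyte #x80)    ; ⇒ #x3fff80
(unibyte-char-to-multibyte #xff)    ; ⇒ #x3fffff

;; Converting the whole string first yields the same character.
(= (aref (string-to-multibyte "\x80") 0)
   (unibyte-char-to-multibyte #x80)) ; ⇒ t
```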