guile-user

Re: Stupid module and pregexp questions


From: Tom Lord
Subject: Re: Stupid module and pregexp questions
Date: Fri, 24 Oct 2003 15:30:05 -0700 (PDT)


    > From: Thien-Thi Nguyen <address@hidden>

    >    From: Tom Lord <address@hidden>
    >    Date: Tue, 29 Apr 2003 23:31:41 -0700 (PDT)

Wow, challenging my memory, eh?

    >    Heck, they were actively excised -- apparently by virtue of some
    >    (sorry, folks) misguided reasoning about the cleanliness of their
    >    semantics.

That was a comment I made about the removal of shared substrings.

I was delighted when 1.6.4 gave me:

  guile> make-shared-substring
  #<primitive-procedure make-shared-substring>

though perturbed that the source code says:

  #if SCM_DEBUG_DEPRECATED == 0
  [...]
  SCM_DEFINE (scm_make_shared_substring, "make-shared-substring", 1, 2, 0,

I can try to write up a "case for shared substrings" if that would be
helpful.

    > in guile 1.4.1.96 you can do `(use-modules (lang librgx))' to try out
    > librx re-integration.  it's even documented to some extent in the manual.
    > below is some work-in-progress flex envy slated for 1.4.2 based on rx...

Yikes.  I'm scared to ask what version of Rx you are using.  You
_should_ (really) be using the latest and greatest in libhackerlab,
which is not currently in release.  However, by shocking
coincidence, I was just today semi-preparing to set up a Savannah
libhackerlab project and get it back out there (separately from arch,
in which it happens to be included).

A nice side effect of that: systas (which I'm not planning on
re-releasing anytime soon but which is trivially available in my
public archives) has a nice libsystas binding for the latest and
greatest rx.  It'd probably take like 2hrs at most to port it to
guile.

Nifty code sample from systas:

  (define-public sans-leading-blanks
    (structured-regexp->procedure `(^ (* ([] blank))) :pick-spec '>))


This defines a procedure that takes a string, matches it against the
given regexp, and returns a shared substring of that string.  The
`pick-spec' says _which_ shared substring to return.  The symbol `>'
means: return the shared substring that follows the match, i.e. the
one that begins at the first character after the match.
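
For comparison, here is a rough sketch of the same behavior without
Rx, assuming a Guile that provides the SRFI-13/SRFI-14 procedures
`string-skip', `char-set:blank' and `substring/shared'.  It is not the
systas implementation, only an approximation of what the procedure
above returns:

```scheme
(use-modules (srfi srfi-13)   ; string-skip, substring/shared
             (srfi srfi-14))  ; char-set:blank

;; Approximation of the systas example: skip the leading blanks and
;; return the rest of the string as a shared substring.
(define (sans-leading-blanks str)
  ;; string-skip returns the index of the first character that is NOT
  ;; in the set, or #f if every character is a blank.
  (let ((start (or (string-skip str char-set:blank)
                   (string-length str))))
    (substring/shared str start)))

(display (sans-leading-blanks "  \thello world"))
(newline)
```

The regexp version always matches (possibly the empty string at the
start), so both versions return the whole string when there are no
leading blanks.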

-t

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;  Structured Regular Expressions

[with apologies to Olin Shivers]

;;; 
;;; 
;;; A structured regexp is a recursively defined list structure.
;;; The general form is:
;;;
;;;     structured-regexp := (<operator> <parameter> ...)
;;;     parameter         := <integer>
;;;                       |  <character>
;;;                       |  <string>
;;;                       |  <keyword>
;;;                       |  <structured-regexp>
;;;
;;; The valid operators are:
;;;
;;;     operator          := const      ; a string constant
;;;                       |  any        ; any character
;;;                       |  []         ; character set
;;;                       |  [^]        ; negated character set
;;;                       |  ^          ; start anchor
;;;                       |  $          ; end anchor
;;;                       |  ?          ; optional sub-expression
;;;                       |  *          ; repeated sub-expression
;;;                       |  +          ; non-empty, repeated sub-expression
;;;                       |  {}         ; a counted sub-expression
;;;                       |  =          ; parenthesized subexpression
;;;                       |  &          ; sub-expression concatenation
;;;                       |  |          ; alternative sub-expressions
;;;                       |  @          ; parenthesized subexpression back-reference
;;;                       |  /          ; the "cut" operator
;;;                       |  !          ; the symbolically labeled "cut" operator
;;;
;;; As a short-hand, some structured regexps can be abbreviated:
;;;
;;;     (const "string") == "string"
;;;     (* any)          == *.
;;;     (^ ($ subexp))   == (^$ subexp)
;;; 
;;; Each operator has its own syntax, so the precise syntax of a structured
;;; regexp is:
;;;
;;;     structured-regexp :=    (const <string>)
;;;                       |     ([] <character-set-element> ...)
;;;                       |     ([^] <character-set-element> ...)
;;;                       |     (^ <structured-regexp> ...)
;;;                       |     ($ <structured-regexp> ...)
;;;                       |     (? <structured-regexp> ...)
;;;                       |     (* <structured-regexp> ...)
;;;                       |     (+ <structured-regexp> ...)
;;;                       |     ({} <integer> <integer> <structured-regexp> ...)
;;;                       |     (& <structured-regexp> ...)
;;;                       |     (| <structured-regexp> ...)
;;;                       |     (= [<subexpression-label>] <structured-regexp> ...)
;;;                       |     (@ <subexpression-label>)
;;;                       |     (/ <integer>)
;;;                       |     (! [<cut-label>] <structured-regexp> ...)
;;;
;;;     character-set-element   :=      string
;;;                             |       character
;;;                             |       (character . character) ; a range of characters
;;;                             |       <character-set> ; see the `(standard char-set-lib)' module
;;;
;;;     subexpression-label     :=      <keyword> ; (a keyword)
;;;     cut-label               :=      <keyword> ; (a keyword)
;;;
;;; A `pick-spec' specifies values to be returned from `regexec' or a
;;; procedure returned by `regexec-function'.  It has the form:
;;; 
;;;     pick-spec       :=      #f      ; return #t if a match is found, #f otherwise
;;; 
;;;                     |       #t      ; return #f or a list `(before match after)'
;;;                                     ; that is the partition of the string implied
;;;                                     ; by a successful match
;;; 
;;;                     |       <recursive-pick-spec>
;;; 
;;; 
;;; A `recursive-pick-spec' is:
;;; 
;;;     recursive-pick-spec :=  <rps-elt>       ; return only the value implied by `rps-elt'
;;;                         |   (<rps-elt> ...) ; return a list of values implied by
;;;                                             ; the list of `rps-elt's.
;;;
;;; An `rps-elt' is:
;;; 
;;;     rps-elt         :=      <part>  ; return the indicated part of the string
;;;                                     ; (see below)
;;; 
;;;                     |       (<start-point> <end-point>) ; return the substring starting
;;;                                     ; at `<start-point>' and ending immediately
;;;                                     ; before `<end-point>' (see below)
;;; 
;;; 
;;;                     |       state-label ; return the state label of the DFA ending
;;;                                     ; state.  If the match terminated at a `cut'
;;;                                     ; operator (`/' in sre notation), this is
;;;                                     ; the integer argument to that operator.
;;; 
;;;                     |       ?       ; the keyword of the terminating cut label or #f
;;; 
;;;                     |       <keyword> ; return the keyword literally.  This is useful
;;;                                     ; for labeling elements in a `recursive-pick-spec'
;;;                                     ; which is a list.
;;; 
;;; A `part' indicates the entire match, a parenthesized
;;; subexpression, or the substring that precedes a match, or the
;;; substring that follows a match:
;;; 
;;;     part            :=      0       ; the entire match
;;; 
;;;                     |       <n>     ; (an integer) the `nth' parenthesized subexpression
;;; 
;;;                     |       (@ <keyword>) ; the subexpression labeled by `<keyword>'
;;;
;;;                     |       <       ; (the symbol '<') the substring preceding the match
;;; 
;;;                     |       >       ; (the symbol '>') the substring following the match
;;; 
;;; A `point' indicates a specific position within the string.  There
;;; are two kinds of `point': a `start-point' and an `end-point' that together
;;; specify a substring of the string:
;;; 
;;;     start-point     :=      <part>          ; the beginning of the indicated match part.
;;;                     |       <any-point>     ; (see below)
;;; 
;;;     end-point       :=      <part>          ; the end of the indicated match part.
;;;                     |       <any-point>     ; (see below)
;;; 
;;;     any-point       :=      (<part> 0)      ; the beginning of the indicated match part
;;;                     |       (<part> 1)      ; the end of the indicated match part
;;; 
;;;
;;; An example pick spec that returns a list of substrings of the original string:
;;; 
;;;     (0              ; the entire match
;;; 
;;;      (< 0)          ; from the start of the string to the end of the match
;;; 
;;;      (2 >)          ; from the start of subexpression 2 to the end of the string
;;; 
;;;      (@ :username)  ; the subexpression labeled `:username'
;;; 
;;;      ((@ :username)   ; from the start of the subexpression labeled `:username'
;;;       (@ :directory)) ; ... to the end of the subexpression labeled `:directory'
;;;                       
;;;      ((2 1)                 ; from the end of subexpression 2 
;;;       ((@ :directory) 0)))  ; ... to the beginning of the subexpression labeled :directory
;;;                            
;;;     
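
To make the simplest pick-specs concrete, here is a small sketch that
imitates a few of them on top of Guile's stock POSIX regexps from
`(ice-9 regex)'.  The procedure `pick' is a hypothetical helper, not
part of systas, Rx or Guile, and it only covers `#f', `#t', `<', `>'
and integer parts.  Point ranges, labels and cut states are left out:

```scheme
(use-modules (ice-9 regex))   ; string-match, match:prefix, ...

;; Hypothetical helper: apply PATTERN (a POSIX regexp string) to STR
;; and return the value selected by a simplified pick-spec SPEC.
(define (pick pattern str spec)
  (let ((m (string-match pattern str)))
    (cond ((eq? spec #f) (if m #t #f))      ; just report success
          ((not m) #f)                      ; no match: nothing to pick
          ((eq? spec #t)                    ; (before match after)
           (list (match:prefix m)
                 (match:substring m)
                 (match:suffix m)))
          ((eq? spec '<) (match:prefix m))  ; text preceding the match
          ((eq? spec '>) (match:suffix m))  ; text following the match
          ((integer? spec)                  ; 0 = whole match, n = group n
           (match:substring m spec))
          (else (error "unsupported pick-spec in this sketch:" spec)))))

(write (pick "[0-9]+" "abc123def" #t))
(newline)
```

Unlike the shared substrings described above, the strings returned by
`match:prefix', `match:substring' and `match:suffix' are ordinary
copies.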



