Re: regexp and strings you don't want
From: Oliver Scholz
Subject: Re: regexp and strings you don't want
Date: Fri, 29 Aug 2003 20:30:05 +0200
User-agent: Gnus/5.1002 (Gnus v5.10.2) Emacs/21.3.50 (windows-nt)
[Yet another follow-up to myself ...]
Oliver Scholz <alkibiades@gmx.de> writes:
> kai.grossjohann@gmx.net (Kai Großjohann) writes:
>
>> chaz2@thedoghousemail.com (Chaz) writes:
>>
>>> For example, how can I search for a paragraph beginning with "The"
>>> that does NOT include the word "top"?
>>
>> It is possible to build a regexp that does this (disregarding the
>> paragraph problem at the moment), but it is not pretty.
>>
>> Some regexp implementations have the feature you're looking for to
>> make it convenient, but the Emacs implementation doesn't.
>>
>> Let me rephrase this in terms of lines instead of paragraphs.
>>
>> The idea is this: search for a line that begins with The and then
>> does not have top after it, as follows: after The, we allow any
>> characters that aren't t. We also allow a t followed by something
>> that's not o, and also a to that's followed by something that's not
>> p. And so on:
>>
>> "^The\\([^t]*\\($\\|t$\\|t[^o]\\|to$\\|to[^p]\\)\\)*$"
>
> Hmm. This is not really human readable. Would it be hard and/or bad
> to extend `rx' so that it allows for (not STRING)? À la:
>
> (looking-at (rx (and line-start
>                      "The "
>                      (not "top"))))
>
> Whereas `(not "top")' would compile to a normal regexp in the way you
> described it. WDYT?
[...]
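The hand-written regexp quoted above can be checked directly with `string-match'. The sketch below is only an illustration: the names `my-kai-regexp' and `my-kai-match-p' and the sample lines are made up here, not taken from the thread. It suggests that the pattern rejects the obvious cases but lets a line through when a "t" or "to" sits immediately before "top": `t[^o]' or `to[^p]' consumes the first letter of "top", and the remaining "op" is then matched by `[^t]*'.

```elisp
;; The regexp is copied verbatim from the quoted message.
;; The variable and function names are hypothetical.
(defvar my-kai-regexp
  "^The\\([^t]*\\($\\|t$\\|t[^o]\\|to$\\|to[^p]\\)\\)*$"
  "Regexp meant to match a line starting with \"The\" without \"top\".")

(defun my-kai-match-p (line)
  "Return t if LINE matches `my-kai-regexp', nil otherwise."
  (let ((case-fold-search nil))         ; match case-sensitively
    (and (string-match my-kai-regexp line) t)))

(my-kai-match-p "The cat sat")   ; => t
(my-kai-match-p "The top")       ; => nil
(my-kai-match-p "The rooftop")   ; => nil
(my-kai-match-p "The ttop")      ; => t, although the line contains "top"
(my-kai-match-p "The totop")     ; => t, same problem
```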
I've played a bit with this (patch below), but I think I am a bit
puzzled. With my patch, `(rx (not "top"))' translates to:
"\\(?:[^t]*\\|t[^o]*\\|to[^p]*\\)"
Is this actually correct?
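One way to probe this is to anchor the generated regexp at both ends with `\\`' and `\\'' and see which whole strings it accepts. This is a minimal sketch; the names `my-not-top-regexp' and `my-not-top-p' and the sample strings are assumptions made for illustration.

```elisp
;; The generated regexp from above, anchored to the whole string.
;; Names are hypothetical.
(defvar my-not-top-regexp
  (concat "\\`" "\\(?:[^t]*\\|t[^o]*\\|to[^p]*\\)" "\\'"))

(defun my-not-top-p (s)
  "Return t if the whole string S matches `my-not-top-regexp'."
  (let ((case-fold-search nil))
    (and (string-match my-not-top-regexp s) t)))

(my-not-top-p "")        ; => t   (first alternative matches nothing)
(my-not-top-p "to")      ; => t   (third alternative)
(my-not-top-p "tomato")  ; => t   (third alternative, no "p" anywhere)
(my-not-top-p "top")     ; => nil
(my-not-top-p "pot")     ; => nil, although "pot" is not "top"

;; Without the anchors the first alternative can match the empty
;; string, so the bare regexp matches at position 0 of any string:
(string-match "\\(?:[^t]*\\|t[^o]*\\|to[^p]*\\)" "top")  ; => 0
```

Each alternative excludes one letter of "top" from everything that follows its literal prefix, so a string containing "top" can never match as a whole, but neither can many harmless strings such as "pot".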
What does the concept of a regexp that matches a sequence of
characters that does _not_ contain a certain sequence of characters
actually mean?
Should it match any sequence of characters not identical to the
unwanted one (including the empty string) or should it match only
sequences of the same length? Or any non-empty sequence of characters
not identical with the unwanted one?
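To make the candidate readings concrete, they can be written as plain predicates on strings, leaving regexps aside for a moment. This is only a sketch: all function names are hypothetical, and the fourth predicate (substring containment, which is what the original question about "top" asked for) is added for comparison.

```elisp
(defun my-not-identical-p (unwanted s)
  "Reading 1: S is any string other than UNWANTED, including \"\"."
  (not (string= s unwanted)))

(defun my-same-length-not-identical-p (unwanted s)
  "Reading 2: S has the same length as UNWANTED but differs from it."
  (and (= (length s) (length unwanted))
       (not (string= s unwanted))))

(defun my-nonempty-not-identical-p (unwanted s)
  "Reading 3: S is non-empty and differs from UNWANTED."
  (and (> (length s) 0)
       (not (string= s unwanted))))

(defun my-not-containing-p (unwanted s)
  "For comparison: S does not contain UNWANTED as a substring."
  (let ((case-fold-search nil))
    (not (string-match (regexp-quote unwanted) s))))

;; Reading 1, 2 and 3 for a few sample strings:
(mapcar (lambda (s)
          (list s
                (my-not-identical-p "top" s)
                (my-same-length-not-identical-p "top" s)
                (my-nonempty-not-identical-p "top" s)))
        '("" "to" "pot" "top"))
;; => (("" t nil nil) ("to" t nil t) ("pot" t t t) ("top" nil nil nil))

(my-not-containing-p "top" "stop")  ; => nil
(my-not-containing-p "top" "pot")   ; => t
```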
With my patch:
(string-match (rx (and line-start
                       "The "
                       (not "top")
                       " lirum larum"))
              "The top lirum larum")
==> nil
(string-match (rx (and line-start
                       "The "
                       (not "top")
                       " lirum larum"))
              "The to lirum larum")
==> 0
(string-match (rx (and line-start
                       "The "
                       (not "top")
                       " lirum larum"))
              "The lirum larum")
==> nil
Is this good or bad?
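The third result may have less to do with `not' than with the literal spaces. Assuming the patched `rx' form expands to the regexp shown below (the `(not "top")' translation given earlier, placed between the two literals), the pattern demands a space after "The" and another one before "lirum", so a string with a single space cannot match even when the middle part is empty. The names `my-patched-regexp' and `my-patched-match' are hypothetical.

```elisp
;; Presumed expansion of the rx form used in the three tests above.
(defvar my-patched-regexp
  "^The \\(?:[^t]*\\|t[^o]*\\|to[^p]*\\) lirum larum")

(defun my-patched-match (s)
  "Return the match position of `my-patched-regexp' in S, or nil."
  (let ((case-fold-search nil))
    (string-match my-patched-regexp s)))

(my-patched-match "The top lirum larum")  ; => nil
(my-patched-match "The to lirum larum")   ; => 0
(my-patched-match "The lirum larum")      ; => nil (only one space)
(my-patched-match "The  lirum larum")     ; => 0   (two spaces, empty middle)
(my-patched-match "The pot lirum larum")  ; => nil, although "pot" is not "top"
```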
Oliver (puzzled)
cd ~/akt/lisp/
diff -u "c:/Programme/emacs/lisp/emacs-lisp/rx.el" "d:/egoge/akt/lisp/rx.el"
--- c:/Programme/emacs/lisp/emacs-lisp/rx.el 2003-08-29 19:21:19.000000000 +0200
+++ d:/egoge/akt/lisp/rx.el 2003-08-29 20:14:58.000000000 +0200
@@ -344,27 +344,44 @@
 (defun rx-not (form)
   "Parse and produce code from FORM. FORM is `(not ...)'."
   (rx-check form)
-  (let ((result (rx-to-string (cadr form) 'no-group)))
-    (cond ((string-match "\\`\\[^" result)
-           (if (= (length result) 4)
-               (substring result 2 3)
-             (concat "[" (substring result 2))))
-          ((string-match "\\`\\[" result)
-           (concat "[^" (substring result 1)))
-          ((string-match "\\`\\\\s." result)
-           (concat "\\S" (substring result 2)))
-          ((string-match "\\`\\\\S." result)
-           (concat "\\s" (substring result 2)))
-          ((string-match "\\`\\\\c." result)
-           (concat "\\C" (substring result 2)))
-          ((string-match "\\`\\\\C." result)
-           (concat "\\c" (substring result 2)))
-          ((string-match "\\`\\\\B" result)
-           (concat "\\b" (substring result 2)))
-          ((string-match "\\`\\\\b" result)
-           (concat "\\B" (substring result 2)))
-          (t
-           (concat "[^" result "]")))))
+  (if (stringp (cadr form))
+      (rx-reverse-string (cadr form))
+    (let ((result (rx-to-string (cadr form) 'no-group)))
+      (cond ((string-match "\\`\\[^" result)
+             (if (= (length result) 4)
+                 (substring result 2 3)
+               (concat "[" (substring result 2))))
+            ((string-match "\\`\\[" result)
+             (concat "[^" (substring result 1)))
+            ((string-match "\\`\\\\s." result)
+             (concat "\\S" (substring result 2)))
+            ((string-match "\\`\\\\S." result)
+             (concat "\\s" (substring result 2)))
+            ((string-match "\\`\\\\c." result)
+             (concat "\\C" (substring result 2)))
+            ((string-match "\\`\\\\C." result)
+             (concat "\\c" (substring result 2)))
+            ((string-match "\\`\\\\B" result)
+             (concat "\\b" (substring result 2)))
+            ((string-match "\\`\\\\b" result)
+             (concat "\\B" (substring result 2)))
+            (t
+             (concat "[^" result "]"))))))
+
+;; [^t]\|t[^o]\|to[^p]
+;; [^t]?\
+
+(defun rx-reverse-string (string)
+  (let ((list nil))
+    (dotimes (i (length string))
+      (push (rx-reverse-string-1 i string) list))
+    (concat "\\(?:"
+            (mapconcat 'identity (nreverse list) "\\|")
+            "\\)")))
+
+(defun rx-reverse-string-1 (n string)
+  (concat (substring string 0 n)
+          "[^" (string (aref string n)) "]*"))
 (defun rx-repeat (form)
Diff finished at Fri Aug 29 20:19:48
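The two helper functions from the patch can be tried on their own, without patching rx.el. The sketch below copies them under hypothetical `my-' names so that nothing in a running Emacs is redefined; apart from the names and the added docstrings and comments, the code is the same as in the diff.

```elisp
(defun my-rx-reverse-string-1 (n string)
  "Return the first N characters of STRING followed by a negated
character class for the character at index N, repeated with `*'."
  (concat (substring string 0 n)
          "[^" (string (aref string n)) "]*"))

(defun my-rx-reverse-string (string)
  "Return a regexp with one alternative per character of STRING."
  (let ((list nil))
    (dotimes (i (length string))
      (push (my-rx-reverse-string-1 i string) list))
    (concat "\\(?:"
            (mapconcat 'identity (nreverse list) "\\|")
            "\\)")))

(my-rx-reverse-string "top")
;; => "\\(?:[^t]*\\|t[^o]*\\|to[^p]*\\)"

;; The literal prefix is inserted unquoted, so a regexp special
;; character in STRING keeps its special meaning in the result:
(my-rx-reverse-string "a.b")
;; => "\\(?:[^a]*\\|a[^.]*\\|a.[^b]*\\)"
```

In the last result the "." in the third alternative matches any character. Passing the prefix through `regexp-quote' before concatenating would be one possible way to avoid that.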
--
12 Fructidor an 211 de la Révolution
Liberté, Egalité, Fraternité!