help-gnu-emacs

Re: why emacs lisp's regex has 2-steps escapes?


From: Kevin Rodgers
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Thu, 10 Jul 2008 02:17:36 -0600
User-agent: Thunderbird 2.0.0.14 (Macintosh/20080421)

Xah wrote:
> emacs regex has an odd peculiarity in that it needs a lot of backslashes.
> More specifically, a string first needs to be properly escaped, and then
> it is passed to the regex engine.
>
> For example, suppose you have this text “Sin[x] + Sin[y]” and you need
> to capture the x or y.
>
> In emacs I need to use
> “\\(\\[[a-z]\\]\\)”

If all you want to capture is the x or y (without the square brackets):

        "\\[\\([a-z]\\)\\]"

> for the actual regex
> “\(\[[a-z]\]\)”.

The enclosing double quotes are misleading in this context.  I would
simply write (again, capturing the letter but not the brackets):

        \[\([a-z]\)\]
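As a minimal sketch of how the two forms relate in Emacs Lisp (the helper name `my-bracketed-letter' is made up for illustration): the string literal is typed with doubled backslashes, but the regex engine receives the expression exactly as written above.

```elisp
;; `my-bracketed-letter' is a hypothetical helper, for illustration only.
(defun my-bracketed-letter (text)
  "Return the first lowercase letter in square brackets in TEXT, or nil."
  ;; The literal is typed with doubled backslashes; the Lisp reader
  ;; collapses each "\\" so the regex engine receives \[\([a-z]\)\]
  (when (string-match "\\[\\([a-z]\\)\\]" text)
    ;; Group 1 is the letter alone, without the brackets.
    (match-string 1 text)))

(my-bracketed-letter "Sin[x] + Sin[y]")   ; => "x"
```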

Could you show the corresponding syntax in Perl or Java, as both a
conceptual (unquoted) regular expression and as a string literal (for
comparison)?

> Here's a somewhat typical but long regex for matching an HTML image tag:
>
> (search-forward-regexp "<img +src=\"\\([^\"]+\\)\" +alt=\"\\([^\"]+\\)?\" +width=\"\\([0-9]+\\)\" +height=\"\\([0-9]+\\)\" ?>" nil t)
>
> The toothpick syndrome gets crazy, making an already difficult regex syntax
> impossible to read and hard to code.

One of the reasons Emacs regular expressions are hard to read in this
way is that parentheses are defined as ordinary characters that need to
be escaped when they are to be interpreted as grouping delimiters,
whereas other languages do the opposite (parentheses are metacharacters
that need to be escaped to be matched literally).
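A small sketch of that difference in Emacs Lisp (the two constant names are invented for this example): the same parentheses either match themselves or delimit a group, depending only on the backslashes.

```elisp
;; Hypothetical names, for illustration only.
(defconst my-literal-parens-regexp "(a)"
  "Unescaped parentheses: they match the characters ( and ) themselves.")
(defconst my-group-regexp "\\(a\\)"
  "Escaped parentheses (doubled in the literal): they delimit a group.")

;; In "f(a)" the characters are f=0, (=1, a=2, )=3.
(string-match my-literal-parens-regexp "f(a)")   ; => 1, matches "(a)"
(string-match my-group-regexp "f(a)")            ; => 2, matches just "a"
```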

> My question is, why does elisp's regex have this 2-step process? Is this
> a design decision, or did it just happen that way historically?

It is due to two separate layers: the distinction between a string and
the syntax for representing it in a program, and the interpretation of
the characters in the string itself (not its surface representation) as
a regular expression.

This is just like writing a shell command (using double quotes around
the regular expression) that calls the grep program (which never "sees"
the quotes).
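The two layers can be observed separately in Emacs Lisp. This is only a sketch, and the constant name `my-regexp-string' is made up for it: `length' and `princ' see the string's contents, while `prin1-to-string' reproduces the surface (read) syntax.

```elisp
;; Hypothetical name, for illustration only.
(defconst my-regexp-string "\\[\\([a-z]\\)\\]")

;; Contents of the string, as the regex engine sees them: 13 characters.
(length my-regexp-string)                     ; => 13
;; `princ' prints the contents without quotes or doubled backslashes.
(princ my-regexp-string)                      ; prints \[\([a-z]\)\]
;; The read syntax doubles the 6 backslashes and adds 2 quotes: 13 + 6 + 2.
(length (prin1-to-string my-regexp-string))   ; => 21
```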

> Second question: can't elisp create something like a “regex-string” wrapper
> function that automatically takes care of the quoting? I can't see how
> this might be difficult.

All you need to do is specify a regular expression syntax and a string
literal syntax that don't define meanings for the same character (here:
backslash).
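One existing notation along these lines is the `rx' macro that ships with Emacs, which describes a regexp with Lisp forms instead of backslashes. The following is a sketch rather than a recommendation, and the constant name `my-bracket-regexp' is invented for it:

```elisp
(require 'rx)

;; Hypothetical name, for illustration only.  `rx' quotes the literal
;; "[" and "]" itself and turns (group ...) into an escaped group.
(defconst my-bracket-regexp
  (rx "[" (group (any "a-z")) "]"))

(let ((text "Sin[x] + Sin[y]"))
  (when (string-match my-bracket-regexp text)
    (match-string 1 text)))   ; => "x"
```

Here the regular expression layer has no backslashes to type, so the string literal layer has nothing to double.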

--
Kevin Rodgers
Denver, Colorado, USA




