help-gnu-emacs

Re: How to grok a complicated regex?


From: Yuri Khan
Subject: Re: How to grok a complicated regex?
Date: Sat, 14 Mar 2015 11:14:26 +0600

On Sat, Mar 14, 2015 at 5:16 AM, Marcin Borkowski <mbork@wmi.amu.edu.pl> wrote:

>>> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
>>>
> It's not really /difficult/.
> Intimidating, yes.  Boring, possibly.  Laborious (and mechanical), yes.
> But not /difficult/.

I tried it and it’s not very intimidating or boring or laborious or
difficult. Here’s my thought process:

First I unescape all backslashes, by global-replacing “\\” with “\”.
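
A small sketch of why this step is safe, assuming the regexp was copied from Lisp source as a string literal: the Lisp reader performs exactly this unescaping, so printing the string shows the form the regexp engine receives. The variable name `my-raw-regexp` is only for illustration.

```elisp
;; The string literal as it appears in Lisp source (doubled backslashes).
(defconst my-raw-regexp
  "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
  "Illustrative copy of the regexp under discussion.")

;; `princ' prints the string's contents without Lisp escaping, so each
;; source-level \\ shows up as a single \ in the output.
(princ my-raw-regexp)
(terpri)
```

The printed line should be the same as the spaced-out version once its spaces are removed.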

Then I insert spaces at key points to separate the syntactic
constructs. (Any literal spaces in the regexp need to be made
explicit, e.g. by replacing them with <space>.)

    \` \(?: \\ [([] \| \$+ \)? \(.*?\) \(?: \\ [])] \| \$+ \)? \'

Imagining the parentheses and alternatives as nested boxes might help, too:

       ┌─────────┬─────┐  ╔═══╗ ┌─────────┬─────┐
    \` │ \\ [([] │ \$+ │? ║.*?║ │ \\ [])] │ \$+ │? \'
       └─────────┴─────┘  ╚═══╝ └─────────┴─────┘

(Here the nesting level is just 1, so I didn’t actually need to draw
the boxes; matching the delimiters by eye would have been enough.)

Now I can read it:

1. start-of-string
2. optionally followed by either
    * a backslash and either an opening parenthesis or bracket
    * or one or more dollar signs
3. followed by any string, which is extracted as group 1
4. optionally followed by either
    * a backslash and either a closing bracket or parenthesis
    * or one or more dollar signs
5. followed by end-of-string
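
The same reading can be written down almost word for word with the rx macro. This is a sketch, not the original author's code; the name `my-tex-math-regexp-rx` is made up for illustration, and the string rx generates may differ textually from the hand-written regexp while matching the same inputs.

```elisp
(require 'rx)

(defconst my-tex-math-regexp-rx
  (rx string-start                          ; 1. start-of-string
      (optional                             ; 2. optionally either
       (or (seq "\\" (any "(["))            ;    a backslash and ( or [
           (one-or-more "$")))              ;    or one or more dollar signs
      (group                                ; 3. group 1: shortest run of
       (minimal-match                       ;    characters other than
        (zero-or-more not-newline)))        ;    newline
      (optional                             ; 4. optionally either
       (or (seq "\\" (any "])"))            ;    a backslash and ] or )
           (one-or-more "$")))              ;    or one or more dollar signs
      string-end)                           ; 5. end-of-string
  "Illustrative rx transcription of the numbered reading above.")
```

Note that “any string” in step 3 is `.*?`, so it stops at a newline.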

I can further grok it as matching a valid (La)TeX math formula: $…$,
$$…$$, \(…\), \[…\]; as well as some invalid markup such as $$$$…$$$,
$…\], \(…\], $$…, etc.
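
A quick way to check this reading is to run the regexp on a few of those inputs and look at group 1. The helper name `my-tex-math-body` is hypothetical; `string-match` and `match-string` are the standard functions.

```elisp
(defconst my-tex-math-regexp
  "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
  "The regexp under discussion, copied verbatim from the thread.")

(defun my-tex-math-body (string)
  "Return group 1 of `my-tex-math-regexp' matched against STRING, or nil."
  (when (string-match my-tex-math-regexp string)
    (match-string 1 string)))

;; Valid markup:
(my-tex-math-body "$x^2$")        ; => "x^2"
(my-tex-math-body "$$a+b$$")      ; => "a+b"
(my-tex-math-body "\\(x\\)")      ; => "x"
(my-tex-math-body "\\[y\\]")      ; => "y"

;; Invalid markup that is accepted all the same:
(my-tex-math-body "$$$$x$$$")     ; => "x"
(my-tex-math-body "$x\\]")        ; => "x"
(my-tex-math-body "$$x")          ; => "x"
```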


As for the bigger picture, I think, if a regular expression ends up
difficult to read, it needs to be decomposed into small, easily
digestible chunks, each with a descriptive name. Elisp has the let*
form and the rx macro for this purpose.
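
As a rough sketch of the let* approach, here is one way the regexp above could be assembled from named pieces. The names (`dollars`, `opener`, `closer`, `body`, `my-tex-math-regexp-pieces`) are my own choices for illustration, not anything from the code the regexp came from.

```elisp
(defconst my-tex-math-regexp-pieces
  (let* ((dollars "\\$+")                   ; one or more dollar signs
         ;; backslash + ( or [, or a run of dollars
         (opener  (concat "\\(?:" "\\\\[([]" "\\|" dollars "\\)"))
         ;; backslash + ] or ), or a run of dollars
         (closer  (concat "\\(?:" "\\\\[])]" "\\|" dollars "\\)"))
         ;; group 1: the formula itself, matched lazily
         (body    "\\(.*?\\)"))
    (concat "\\`" opener "?" body closer "?" "\\'"))
  "Illustrative decomposition of the regexp into named chunks.")

;; The assembled string is identical to the original one-liner:
(string= my-tex-math-regexp-pieces
         "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'")
;; => t
```

Each piece can now be read, tested, and reused separately, and the final concat line reads much like the five-step description.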


