bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#23461: perl-mode: Displaying HERE-docs as strings instead of comment


From: Harald Jörg
Subject: bug#23461: perl-mode: Displaying HERE-docs as strings instead of comments [PATCH]
Date: Wed, 23 Dec 2020 19:46:19 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Stefan Monnier writes:

>> This looks like to be an improved variation of 2): HERE-docs remain
>> marked as c-style comments, and `font-lock-syntactic-face-function` is
>> used to display them as strings.
>>
>> A patch for this variation is attached.
>
> Looks good, thanks.
> See nitpicks below,

Many thanks for your review!

>>              (if (save-excursion (nth 8 (syntax-ppss (match-beginning 0))))
>> +                ;; '>>' occurred in a string, or in a comment.
>>                  ;; Leave the property of the newline unchanged.
>
> Is think this `>>` is a type for `<<`, or am I missing something?

Ouch.  That is a typo, of course.  *I* have missed something.

>> +              ;; Before changing the syntax to c-style comment, let's
>> +              ;; check whether we are in an end-of-line comment, and
>> +              ;; if so, cheat by shifting the comment markers one char
>> +              ;; to the left.
>
> I jump straight to reading the code before reading your email's text and
> it took me a bit of time to understand what this was about.
> I think part of the reason is the "we are in an end-of-line comment"
> since this is actually not about the case where the "<<" (which is
> where "we" are at this moment, in my mind) is inside a comment.
> So, I think the comment would be better if it just gave a straight
> example, like
>
>                  ;; Beware of `foo <<'BAR' #baz` because
>                  ;; the newline needs to close the comment
>                  ;; and can't be used to start the here-doc.

That's better, thank you!

> Also rather than "shifting the comment markers one char to the left"
> I'd just say that "we terminate the comment *just before* the newline".

>> +              (when (nth 4 (save-excursion (syntax-ppss eol)))
>> +                (when (equal (car (syntax-after (1- eol)))
>> +                             (car (string-to-syntax "<")))
>
> This is a pessimistic test, because it will misfire when you have
>
>      foo <<'BAR' #baz#
>
> I think we should compare (1- eol) with (nth 8 (syntax-ppss eol))
> instead.

You are right.  I'm still not very experienced with this syntax stuff,
as it turnes out.  In hindsight it is clear that a "#" will still be a
comment starter, even if it is already in a comment.

>> +                  ;; yet another edge case: "#" is the last character
>> +                  ;; in that line, so there's actually no comment.
>> +                  (put-text-property (- eol 2) (1- eol)
>> +                                     'syntax-table (string-to-syntax "<")))
>
> Indeed, terminating the comment just before the newline is a problem if
> "just before the newline" is the comment starter.  I see that in that
> case, you mark the char before the # but that can also be a problem with
> things like:
>
>     foo <<'BAR' "baz"#

...or, as I've discovered in the meantime, also bad,

      foo << BAR#

      foo << BAR;#

(The latter breaks indentation after the HERE-doc because the ";"
became a comment)

So, back to the drawing board.  If that edge case of an "empty comment"
is just ignored, then the single # loses its comment-face (that is, by
the way, the treatment of HERE-docs in sh-script.el), which is ugly but
as far as I can tell harmless.  Maybe this can be covered by assigning a
font-lock-face property to these single comment starters... I'll check
that.

> An alternative is to leave the comment alone and start the heredoc just
> after the newline instead (that approach suffers from the fact that we
> need to be careful we don't accidentally throw away that `syntax-table`
> text property when the next line is edited.  We can do that using the
> `syntax-multiline` text property).

This is similar to how CPerl mode does it, and the fact that text
properties can get lost when text is edited brought me here.

The bugs Bug#14343 and Bug#28962 in CPerl mode come from lost text
properties, and Bug#24101 is a consequence of starting a syntax property
with the first character.  So, I wanted to check how Perl mode covers
that.  I'll check whether `syntax-multiline` can add some more
robustness.

This points to the deeper problem: In Perl, we have some occasions where
one character has two different roles.  A line end ends a comment *and*
starts a here-doc, and in s/foo/bar/ge the middle slash ends the search
string *and* starts the replacement string.  The programming modes seem
to treat this by distributing the roles on two neighbouring characters,
which comes with some ... inaccuracies if the characters nearby have
roles of their own.

>>  (defun perl-font-lock-syntactic-face-function (state)
>>    (cond
>> +   ((and (eq 2 (nth 7 state)) ; c-style comment
>> +         (cdr-safe (get-text-property (nth 8 state) 'syntax-table))) ; HERE 
>> doc
>> +    'font-lock-string-face)
>
> I think some people won't like the string-face property for it.
> How 'bout we (require 'sh-script) and use the `sh-heredoc` face?

I'd hesitate to do that.  I think that in shell-script mode it is well
justified to use a different face for HERE-docs, but in Perl it isn't.
In Perl, a HERE-doc is just a string and can be used wherever a string
can be used.  So, string-face seems quite appropriate.  In Shell, a
HERE-doc is sort of an I/O redirection.  You can't, for example, assign
a HERE-doc to a shell variable.  So, a different face seems appropriate.

BTW: CPerl also mode uses string-face for HERE-docs.
-- 
Cheers,
haj





reply via email to

[Prev in Thread] Current Thread [Next in Thread]