bug-gnu-emacs

bug#68246: 30.0.50; Add non-TS mode as extra parent of TS modes


From: Dmitry Gutov
Subject: bug#68246: 30.0.50; Add non-TS mode as extra parent of TS modes
Date: Fri, 19 Jan 2024 03:28:16 +0200
User-agent: Mozilla Thunderbird

On 18/01/2024 23:24, Stefan Monnier wrote:
Still not sure how that would work.  I mean, we could have content-type
like text/html or application/json, but neither splits into two
languages, really.

Not sure what you mean by "split", but just as with major modes and
"languages", MIME content-types have inclusion properties, such as

     application/atom+xml ⊂ application/xml ⊂ text/plain

I meant to try to clarify your meaning when you said that content-types are to languages the same as what languages are to major modes.

That might mean that a content-type corresponds to a number of languages, just like a language corresponds to a number (open set) of major modes. But I don't see how. Please enlighten.

Speaking of the relation of inclusion, as I said before we might not want to make languages hierarchical (even though it might help for certain cases), because the relation might not be universal across uses.

And if you had the idea of expressing it through major mode inheritance as well, it's likely to have adverse effects, something like:

  (derived-mode-add-parents 'c++-ts-mode '(c++-lang))
  +
  (derived-mode-add-parents 'c++-lang '(c-lang))
  =
  (provided-mode-derived-p 'c++-ts-mode 'c-lang)

Which can lead some callers to decide that c++-ts-mode's language is C.
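
The transitivity concern can be checked in a scratch buffer. What follows is only a minimal sketch, assuming Emacs 30 (where `derived-mode-add-parents` takes a list of extra parents); the `my-` symbols are placeholders invented for the illustration, not existing modes or languages.

```elisp
;; Placeholder symbols only: nothing named my-* exists in Emacs.
;; Declare a "language" symbol as an extra parent of a mode ...
(derived-mode-add-parents 'my-cxx-ts-mode '(my-cxx-lang))
;; ... and make that language itself derive from another language.
(derived-mode-add-parents 'my-cxx-lang '(my-c-lang))

;; Extra parents are followed transitively, so this is non-nil:
(provided-mode-derived-p 'my-cxx-ts-mode 'my-c-lang)

;; The full ancestry, with the mode itself first:
(derived-mode-all-parents 'my-cxx-ts-mode)
```

A caller that asks "is this mode derived from my-c-lang?" therefore gets a positive answer for the C++ mode as well, which is the adverse effect described.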

OTOH, the major mode can only run the language hook, I think, if any major
mode can correspond only to one language.
Not so.  A major mode can easily do
      (run-mode-hooks (compute-the-hook))
I guess that would mean that the language hook is not run automatically,
that each major mode would need explicit code to compute it and run.

Not necessarily, e.g. you could specify the language to
`define-derived-mode` with something like

     :language (compute-the-language)

and then have `define-derived-mode` compute the hook name from that.
This said, I suspect that generic major modes supporting many languages
will not be very numerous (after all, that's the point of being generic,
no?), so it should be OK if they have to do some things manually.
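
As a rough illustration of the "do some things manually" case, a generic mode could compute and run its language hook by hand. This is a sketch under stated assumptions: the "-lang-hook" naming scheme, the extension-based guess, and every `my-` name are hypothetical, and no `:language` keyword exists in `define-derived-mode` today.

```elisp
;; Hypothetical helpers: neither these functions nor the
;; "<LANG>-lang-hook" naming scheme exist in Emacs.
(defun my-compute-language ()
  "Guess the buffer's language from its file extension (toy heuristic)."
  (let ((ext (and buffer-file-name
                  (file-name-extension buffer-file-name))))
    (if (member ext '(nil ""))
        'unknown
      (intern ext))))

(defun my-language-hook-name (language)
  "Return the hook symbol for LANGUAGE, e.g. `json-lang-hook'."
  (intern (format "%s-lang-hook" language)))

(define-derived-mode my-generic-mode prog-mode "Generic"
  "Hypothetical major mode serving several languages."
  ;; The body runs with mode hooks delayed, so this hook is queued
  ;; and run together with the mode's own hooks when the mode
  ;; function finishes.
  (run-mode-hooks (my-language-hook-name (my-compute-language))))
```

A `:language` keyword would let `define-derived-mode` generate the last form itself instead of each generic mode spelling it out.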

Okay.

The side-effect of this approach is that we basically declare a mode's language twice: once in the attribute above, and once in the major-mode-remap-alist which is put into autoloads. But it's probably minor enough.

And if languages are distinct from major modes in naming, the :language attribute in define-derived-mode could make it run the corresponding hook at the end. Which seems good.

Though I suppose if set-auto-mode-0 saves the currently "detected"
language somewhere, the major mode definitions could pick it up and
call the corresponding hook.
Major modes are not activated solely via `set-auto-mode-0`, so relying
on that is a crutch/hack, not something on which to base a design.
The major mode could compute which language it is for. But the algorithm
could be undecidable if the buffer is not visiting a file yet, doesn't have
an interpreter comment, etc. That's where the command set-buffer-language
was supposed to come in handy.

That still doesn't justify the major mode relying on `set-auto-mode-0`.

AFAICT you seem to want to standardize how the user controls the language of
language-generic major modes.  I'm not sure we need such a standard.
Do we even have such a generic major mode yet?

In my picture that was just the natural conclusion. What I was trying to do is put a level of control above the major modes (the mapping from languages to modes) and make it easier to control and configure.

It didn't seem that the presence of a major mode was required to detect the expected language, hence the addition of the new values. At that point it seemed natural to both allow the absence of a configured major mode (why not), and to run the language hook anyway, for reliability.

If we really don't need any of that, then the auto-mode-alist and the companion vars don't even have to change, and the only place where the language name could feature (aside from the code looking it up) is the mode definitions.

I'm not comfortable enshrining the "-ts-mode" convention here.
We can still go the "strict" approach, where when no language is assigned,
we don't try to guess it.
I think the `<LANG>-mode` heuristic is acceptable, because it's been
*the* convention used in Emacs.
We are now getting a whole set of new modes for which this heuristic isn't
going to work

Cue the patch I submitted when I opened this bug report 🙂
Now `<LANG>-mode` is again included in `derived-mode-all-parents` for
those new modes.

If the language is called <LANG>-lang instead (rather than having no suffix), then the major mode could also run the language-specific hook, which it cannot do with your patch.

Admittedly, it doesn't fully give a solution to the problem of computing
"the" language of a buffer.  But that gets us back to one of my recent
questions: beside Eglot, which other package needs that?  Is "the"
language always unique and always the same for all those packages?
Is it really the only thing those packages need?

In the case of Eglot, at least it doesn't seem to be the case: we don't
just need the language, but also the name of the language server to use.
And for some buffers there can be several applicable language servers,
and they don't necessarily all accept the same language.

So we need either the major mode to provide the name of the server, or
a central database that maps from language/type/mode to server name.
In both cases, adding the language info to the server name is
a non-issue.  And in neither case is it necessary to know "the" language
in order to find the server.  My patch makes the central database
work better.
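
To make the "central database" variant concrete, here is a minimal sketch of a lookup that walks a mode's ancestry, assuming Emacs 30 for `derived-mode-add-parents` and `derived-mode-all-parents`. The alist, the function, and the `my-` mode symbols are invented for the illustration; this is not how Eglot stores or resolves its server list.

```elisp
(require 'seq)

;; Hypothetical database mapping mode symbols to server programs.
(defvar my-server-alist
  '((my-c-mode . "clangd")
    (my-python-mode . "pylsp"))
  "Illustrative mapping from major modes to language server names.")

;; Declare the non-TS mode as an extra parent of the TS mode.
(derived-mode-add-parents 'my-c-ts-mode '(my-c-mode))

(defun my-find-server (mode)
  "Return the server registered for MODE or its nearest listed ancestor."
  (seq-some (lambda (parent) (alist-get parent my-server-alist))
            (derived-mode-all-parents mode)))

;; The TS mode has no entry of its own, but the lookup still finds
;; the one registered for its extra parent.
(my-find-server 'my-c-ts-mode)
```

With the extra parent declared, an entry written for the non-TS mode also covers the TS mode without the database having to name it.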

I think I've included some thoughts on this subject in my previous email. They don't seem to be quoted/commented on here.

The major-mode could be fundamental-mode. If the language were to be
specifiable through settings external to major modes, we could still do
useful things while in fundamental-mode (e.g. do some useful editing with
Eglot, provided it supports indentation and completion), or suggest which
major modes to install from ELPA.

I don't see the interest of using specifically `fundamental-mode` for
that.  In any case, this seems too hypothetical at this stage to have
a good idea of what we'd need in such circumstances.

The latter feature (suggest which major modes to install) has come up recently. It's not that difficult to implement (with a whitelist of packages), and fundamental-mode is most likely *the* major mode which would be used until the suitable major mode is installed.

Would we really care that an xml file inside an archive is applied both
archive-subfile-mode and xml-mode dir-locals settings?

No, I wasn't thinking of XML files inside archives, but about files
which are both archives and something else (e.g. ODT).  The same applies
for most other "generic" data containers, like XML and JSON.

Okay, ODT. Which we can view with either doc-view-mode or xml-mode. Languages :doc or :xml. We configure one of these languages to be used by default, and switch to another at will.

Not sure it's useful to consider both modes somehow active at the same time.

Although this example does underscore the problem of major modes needing to be able to specify/change the current language themselves. At least if we don't want such modes as doc-view to have to be rewritten.

On the third hand, external tools (LSP servers, ripgrep, etc.) will view such files as a certain type only: just ODT. That might do us a disservice if the currently detected language changes as we change the major mode. Hmm. And since xml-mode itself doesn't know ODT, it won't be able to "compute" that language value either (the same would likely be true for other "container" modes).

Anyway, as I mentioned elsewhere, I think this discussion of "languages"
is only tangentially related to my proposed patch.  There is some
overlap, but they serve different purposes, and they're not
mutually exclusive.

I think the "languages" feature seems to cover the same functionality as your patch, and then some. Although at the expense of the downstream callers having to use the new feature, rather than having things work "automagically" (as soon as they stop supporting Emacs 29.1, that is).




