bug#68246: 30.0.50; Add non-TS mode as extra parent of TS modes


From: Dmitry Gutov
Subject: bug#68246: 30.0.50; Add non-TS mode as extra parent of TS modes
Date: Thu, 18 Jan 2024 07:05:55 +0200
User-agent: Mozilla Thunderbird

On 17/01/2024 19:08, Stefan Monnier wrote:
@@ -3150,6 +3150,9 @@ auto-mode-alist
  Visiting a file whose name matches REGEXP specifies FUNCTION as the
  mode function to use.  FUNCTION will be called, unless it is nil.
+FUNCTION can also be a keyword denoting a language, to be looked
+up in `major-mode-remap-alist'.

Side note: the intention is OK but `major-mode-remap-alist' is
a defcustom and should remain nil by default.  It's there for the users
to express which major modes they prefer.  So if we want a mapping
between some new language/type concept and major modes, it should be
stored elsewhere (could be a plain alist that's handled as a kind of
"implicit tail of `major-mode-remap-alist`").

Good point. The user can customize it and lose the non-default modes configured for a language.

The way I introduced languages as keywords was an experiment, really. Mostly to save on typing, because the first plan was to have a parallel set of alists (since we can't deprecate the file -> major mode mappings right away). The language-specific version of major-mode-remap-alist looks necessary after all.
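
To make the "implicit tail" idea concrete, here is a minimal sketch, assuming a separate default alist that is consulted only when the user's `major-mode-remap-alist' has no entry for the language. The names `my-language-mode-alist' and `my-mode-for-language' are hypothetical and are not part of the patch under discussion.

```elisp
;; Hypothetical sketch: `my-language-mode-alist' and
;; `my-mode-for-language' are illustrative names, not existing Emacs APIs.

(defvar my-language-mode-alist
  '((:js . js-mode)
    (:python . python-mode))
  "Default mapping from language keywords to major mode functions.
Consulted only after `major-mode-remap-alist', like an implicit tail.")

(defun my-mode-for-language (language)
  "Return the major mode function to use for LANGUAGE.
User entries in `major-mode-remap-alist' win; the defaults act as
a fallback; unknown languages get `fundamental-mode'."
  ;; `bound-and-true-p' keeps the sketch loadable on Emacs versions
  ;; that predate `major-mode-remap-alist'.
  (or (alist-get language (bound-and-true-p major-mode-remap-alist))
      (alist-get language my-language-mode-alist)
      #'fundamental-mode))
```

With this split the defcustom can stay nil by default, and a user who customizes it only overrides the languages they list; the remaining defaults are still found in the second alist.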

@@ -3206,10 +3209,10 @@ interpreter-mode-alist
       ("emacs" . emacs-lisp-mode)))
    "Alist mapping interpreter names to major modes.
  This is used for files whose first lines match `auto-mode-interpreter-regexp'.
-Each element looks like (REGEXP . MODE).
+Each element looks like (REGEXP . MODE-OR-LANGUAGE).
  If REGEXP matches the entire name (minus any directory part) of
  the interpreter specified in the first line of a script, enable
-major mode MODE.
+MODE-OR-LANGUAGE.

There's a similar need for "content type" rather than "language".  If we
want to mention "language" we should also take the opportunity to
mention other related categorizations like "content type".

Are "content type" and "language" going to be different things? They seem the same to me.

-      (funcall (alist-get mode major-mode-remap-alist mode))
+      ;; XXX: When there's no mapping for `:<language>', we could also
+      ;; look for a function called `<language>-mode'.
+      (funcall (alist-get mode major-mode-remap-alist (if (keywordp mode)
+                                                          #'fundamental-mode
+                                                        mode)))
+      (when (keywordp mode)             ;Perhaps do that unconditionally.
+        (run-hooks (intern (format "%s-language-hook" (buffer-language)))))

That seems wrong:
- Why should this hook run when `auto-mode-alist` says `:js` but not
   when doing `M-x javascript-mode` (or other ways to enable this mode)?
- Why run this hook *after* the mode's `:after-hook` and after
   things like `after-change-major-mode-hook`?

I think it should remain the major mode's responsibility to decide which
hooks it runs.

On one hand, this is an artefact of not implementing the language-classification inside define-derived-mode.

OTOH, I think the major mode can only run the language hook by itself if every major mode corresponds to exactly one language. Though I suppose if set-auto-mode-0 saves the currently "detected" language somewhere, the major mode definitions could pick it up and call the corresponding hook.

Hmm, perhaps in that case the major modes won't need any special attribute in their definitions (to specify their language): any major mode would run <lang>-language-hook where <lang> is the language detected for the buffer or assigned explicitly.
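
A rough sketch of that arrangement, assuming the detected language is stored in a buffer-local variable that survives the mode switch. Everything prefixed with `my-' is hypothetical; in the proposal the variable would be set somewhere in set-auto-mode-0. The sketch also strips the leading colon from the keyword before building the hook name, which is a choice made here and not something the patch does.

```elisp
;; Hypothetical sketch: all `my-' names are illustrative only.

(defvar-local my-buffer-language nil
  "Language keyword detected for the current buffer, or nil.")
;; Major modes call `kill-all-local-variables', so the value must be
;; marked permanent to survive the mode switch.
(put 'my-buffer-language 'permanent-local t)

(defun my-run-language-hook ()
  "Run `<lang>-language-hook' for the current buffer's language, if any."
  (when (keywordp my-buffer-language)
    ;; `:js' -> "js" -> `js-language-hook'.
    (let ((name (substring (symbol-name my-buffer-language) 1)))
      (run-hooks (intern (format "%s-language-hook" name))))))

(define-derived-mode my-demo-mode prog-mode "Demo"
  "Demo mode that declares no language of its own.
It runs the hook for whatever language was recorded for the buffer."
  (my-run-language-hook))
```

Because the mode itself calls the hook, the hook runs the same way whether the mode was chosen through `auto-mode-alist' or enabled by hand, as long as the language variable has been set beforehand.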

+(defun buffer-language ()
+  "Return the language of the current buffer.
+A language is a lowercase keyword with the name of the language."
+  ;; Alternatively, we could go through all the matchers in
+  ;; auto-mode-alist, interpreter-mode-alist,
+  ;; magic-fallback-mode-alist here, possibly using a cache keyed on
+  ;; buffer-file-name.  But that's probably an overkill: if the user
+  ;; changes the settings, they can call `M-x revert-buffer' at the end.
+  (if (keywordp (car set-auto-mode--last))
+      (car set-auto-mode--last)
+    ;; Backward compatibility.
+    (intern (format ":%s" (replace-regexp-in-string "\\(?:-ts\\)?-mode\\'" ""
+                                                    (symbol-name major-mode))))))

I'm not comfortable enshrining the "-ts-mode" convention here.

We can still go the "strict" approach, where when no language is assigned, we don't try to guess it.

Also I think if we want a `buffer-language` function, it should not rely
on how the mode was installed (e.g. `set-auto-mode--last`) but only on
the major mode itself, i.e. something like

     (defun buffer-language ()
       (or buffer-language

Where would the buffer-language variable be set, if not inside set-auto-mode-*?

           (some heuristic based on major-mode and/or derived-modes)))

If we're sure we don't want several languages to be able to refer to the same major mode...

[ Of course, I already mentioned that I also suspect that there can/will
   be sometimes several languages (or none).  ]

I'm not clear on this. You mentioned complex cases, like an XML file inside an archive? But depending on the usage, only one of the languages might be "active" at a given time. Or a combination of languages would simply be another language, basically.

A more specific scenario might clarify this better.

+(defun set-buffer-language (language)
+  "Set the language of the current buffer.
+And switch the major mode appropriately."
+  (interactive
+   (list (let* ((ct (mapcan
+                     (lambda (pair) (and (keywordp (car pair))
+                                    (list (symbol-name (car pair)))))
+                     major-mode-remap-alist))
+                (lang (completing-read "Language: " ct)))
+           (and lang (intern lang)))))
+  (set-auto-mode-0 language))

I see several issues with this function (name and implementation), but
I wonder when we'd ever need such a thing.

It seemed like a missed opportunity not to provide a more high-level command to switch to a specific language for the buffer. E.g. how we sometimes use 'M-x foo-major-mode' when a file type's been misdetected, or the buffer is non-file-visiting (perhaps very temporary).

A command which does this with exhaustive completion across the configured languages seems handy. At least that's my impression from briefly testing it out.

Also, get-current-mode-for-language can be implemented in terms of set-buffer-language (see my earlier email to Joao).

  ;;;###autoload
  (dolist (name (list "node" "nodejs" "gjs" "rhino"))
-  (add-to-list 'interpreter-mode-alist (cons (purecopy name) 'js-mode)))
+  (add-to-list 'interpreter-mode-alist (cons (purecopy name) :js)))

BTW, my suggested patch basically proposes to use `<LANG>-mode` instead
of `:<LANG>` which saves us from those changes since that matches our
historical conventions.

<LANG>-mode is lexically indistinguishable from <NONLANG>-mode. If we used the names like <LANG>-lang, at least one could tell whether one of the parents of a given <foo>-mode is a language.

Another issue I see if we don't use something like
`derived-mode-add-parents` is that all the various places where we use
mode-indexing, such as `.dir-locals.el`, `ffap`, YASnippet, etc... will
need to be extended with a way to use "languages" as well, and then we
also need to define a sane precedence between settings that apply to
a given mode and settings that apply to a given language (setting for
`js-ts-mode` should presumably take precedence over settings for
`:js` which should take precedence over settings for `prog-mode`).

That's a good point: if "languages" as a separate notion gets added, it would make sense to use them in more places (not 100% necessary, but good for consistency). With the associated complexity that you mention.
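
To illustrate the precedence question, here is a small sketch, assuming a hypothetical mode-to-language alist and the `derived-mode-parent' symbol property that `define-derived-mode' sets. It computes the order in which a mode-indexed facility could consult its keys: the mode first, then its language, then the mode's parents. The names `my-mode-language-alist', `my-settings-lookup-order' and `my-js-ts-mode' are made up for the example.

```elisp
;; Hypothetical sketch: all `my-' names are illustrative only.

(defvar my-mode-language-alist
  '((my-js-ts-mode . :js))
  "Hypothetical mapping from major modes to language keywords.")

(define-derived-mode my-js-ts-mode prog-mode "MyJS"
  "Stand-in for a tree-sitter JavaScript mode.")

(defun my-settings-lookup-order (mode)
  "Return the keys to consult for MODE, most specific first.
The language keyword, if any, comes right after the first mode
that declares it and before that mode's parents."
  (let (keys seen-language)
    (while mode
      (push mode keys)
      (let ((lang (alist-get mode my-mode-language-alist)))
        (when (and lang (not seen-language))
          (push lang keys)
          (setq seen-language t)))
      ;; Walk up the chain recorded by `define-derived-mode'.
      (setq mode (get mode 'derived-mode-parent)))
    (nreverse keys)))
```

Calling (my-settings-lookup-order 'my-js-ts-mode) returns a list that begins with my-js-ts-mode, :js, prog-mode, so a setting for the specific mode would win over one for the language, which in turn would win over one for `prog-mode'.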




