Eli Zaretskii <address@hidden> wrote on Sun, 22 Nov 2015 at 20:51:
> > No matter what we expect or tolerate, we need to state that.
>
> No, we don't. When the callers violate the contract, they cannot
> expect to know in detail what will happen. If they want to know, they
> will have to read the source.
>
> So you want this to be unspecified or undefined behavior? That might be OK (we
> already have that in several places), but we still need to state what the
> contract is.
You can call it "undefined behavior" if you want. Personally, I don't
think that's accurate: "undefined" means anything can happen, whereas
Emacs at least promises to output the original bytes unchanged, as
long as the text modifications didn't touch them.
"Unspecified" would fit the bill better. Actually, for most interesting inputs (UTF-8 strings), the behavior is well-defined anyway.
> > An Emacs string is a sequence of integers.
>
> No, it's a sequence of bytes.
>
> From
> https://www.gnu.org/software/emacs/manual/html_node/elisp/String-Basics.html:
> "In Emacs Lisp, characters are simply integers ... A string is a fixed sequence
> of characters"
That's the _User_ manual; it simplifies things to avoid too much
complexity.
So where's the programmer's manual then? The source code? ;-)
> How a string is represented internally shouldn't be the concern of module
> authors.
Indeed. But it does concern us, the developers of Emacs internals.
> No, I will definitely fix it.
Thank you.
Attached is a patch that uses make_multibyte_string directly.