Re: Display of characters #xa0 and #xad in unibyte buffers
From: Kenichi Handa
Subject: Re: Display of characters #xa0 and #xad in unibyte buffers
Date: Mon, 28 Sep 2009 10:10:32 +0900
In article <address@hidden>, Eli Zaretskii <address@hidden> writes:
> > >> $ emacs -Q
> > >> M-x toggle-enable-multibyte-characters RET C-q 240 RET C-q 255 RET
> > >>
> > >> The characters are displayed as "_-" (approximately).
> > >>
> > >> Shouldn't they be displayed as "\240\255", considering that these are
> > >> raw bytes with no specific meaning?
> >
> > > There are no ``raw bytes'' in a unibyte buffer. Every byte there is
> > > interpreted as a character, and shown as such. This is the main
> > > feature of unibyte buffers; otherwise, who'd want them?
I think the main feature of unibyte buffers is to handle
raw bytes as is. For those who want to see a raw byte as a
character of their locale (language environment), we have
unibyte-display-via-language-environment.
> > Different question then: Why are all other characters in the range from
> > #x80 to #xff shown in the backslash-escaped notation, #xa0 and #xad
> > being the only exceptions?
> I don't know, but it sounds like a bug. Or maybe what I wrote above
> is just my pipe dream, not the reality.
> Handa-san, can you please comment on this?
The code for handling nobreak-char-display in
get_next_display_element should pay attention to
unibyte-display-via-language-environment. I've just
installed the attached change.
In article <address@hidden>, Stefan Monnier <address@hidden> writes:
> The patch below should help.
[...]
> --- xdisp.c.~1.1301.~ 2009-09-20 13:01:24.000000000 -0400
> +++ xdisp.c 2009-09-25 10:02:08.000000000 -0400
> @@ -5794,7 +5794,8 @@
> /* Handle non-break space in the mode where it only gets
> highlighting. */
> - if (EQ (Vnobreak_char_display, Qt)
> + if ((it->multibyte_p || unibyte_display_via_language_environment)
> + && EQ (Vnobreak_char_display, Qt)
> && it->c == 0xA0)
If unibyte_display_via_language_environment is nonzero, we
must compare DECODE_CHAR (unibyte, it->c), not the raw byte,
against 0xA0. Otherwise, for instance in a KOI8 locale, we
wrongly treat a box-drawing character of the KOI8 charset as
a no-break space.
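
To make the KOI8 case concrete, here is a minimal standalone sketch; it is not the Emacs code. `decode_koi8r` is a hypothetical stand-in for DECODE_CHAR (unibyte, c) that maps only the two KOI8-R bytes needed for the illustration, and `classify` is a hypothetical helper that returns the same 0/1/2 values the patch stores in nbsp_or_shy.

```c
#include <assert.h>

/* Hypothetical stand-in for DECODE_CHAR (unibyte, c) in a KOI8-R
   locale.  Only two bytes are mapped here; a real decoder covers
   the whole 0x80..0xFF range.  Returns -1 for bytes this sketch
   does not know about.  */
static int
decode_koi8r (int byte)
{
  assert (byte >= 0 && byte <= 0xFF);
  switch (byte)
    {
    case 0x9A: return 0x00A0;	/* NO-BREAK SPACE */
    case 0xA0: return 0x2550;	/* BOX DRAWINGS DOUBLE HORIZONTAL */
    default:   return -1;	/* not mapped in this sketch */
    }
}

/* Return 1 for NO-BREAK SPACE, 2 for SOFT HYPHEN, 0 otherwise,
   using the same encoding as nbsp_or_shy in the patch.  */
static int
classify (int c)
{
  return c == 0xA0 ? 1 : c == 0xAD ? 2 : 0;
}
```

With these definitions, classify (0xA0) on the raw byte reports a no-break space, while classify (decode_koi8r (0xA0)) does not, and classify (decode_koi8r (0x9A)) does. That is the difference between testing it->c and testing the decoded character.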
---
Kenichi Handa
address@hidden
Index: xdisp.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/xdisp.c,v
retrieving revision 1.1304
diff -u -r1.1304 xdisp.c
--- xdisp.c 27 Sep 2009 19:11:13 -0000 1.1304
+++ xdisp.c 28 Sep 2009 01:03:40 -0000
@@ -5684,6 +5684,10 @@
{
Lisp_Object dv;
struct charset *unibyte = CHARSET_FROM_ID (charset_unibyte);
+ int nbsp_or_shy = 0; /* 1:NO-BREAK SPACE, 2:SOFT HYPHEN, 0:ELSE */
+#define IS_NBSP (nbsp_or_shy == 1)
+#define IS_SHY (nbsp_or_shy == 2)
+ int decoded = it->c;
if (it->dp
&& (dv = DISP_CHAR_VECTOR (it->dp, it->c),
@@ -5712,6 +5716,18 @@
goto get_next;
}
+ if (unibyte_display_via_language_environment
+ && it->c >= 0x80)
+ decoded = DECODE_CHAR (unibyte, it->c);
+
+ if (it->c >= 0x80 && ! NILP (Vnobreak_char_display))
+ {
+ if (it->multibyte_p)
+ nbsp_or_shy = it->c == 0xA0 ? 1 : it->c == 0xAD ? 2 : 0;
+ else if (unibyte_display_via_language_environment)
+ nbsp_or_shy = decoded == 0xA0 ? 1 : decoded == 0xAD ? 2 : 0;
+ }
+
/* Translate control characters into `\003' or `^C' form.
Control characters coming from a display table entry are
currently not translated because we use IT->dpvec to hold
@@ -5724,21 +5740,19 @@
If it->multibyte_p is zero, eight-bit characters that
don't have corresponding multibyte char code are also
translated to octal form. */
- else if ((it->c < ' '
- ? (it->area != TEXT_AREA
- /* In mode line, treat \n, \t like other crl chars. */
- || (it->c != '\t'
- && it->glyph_row
- && (it->glyph_row->mode_line_p || it->avoid_cursor_p))
- || (it->c != '\n' && it->c != '\t'))
- : (it->multibyte_p
- ? (!CHAR_PRINTABLE_P (it->c)
- || (!NILP (Vnobreak_char_display)
- && (it->c == 0xA0 /* NO-BREAK SPACE */
- || it->c == 0xAD /* SOFT HYPHEN */)))
- : (it->c >= 127
- && (! unibyte_display_via_language_environment
- || (DECODE_CHAR (unibyte, it->c) <= 0xA0))))))
+ if ((it->c < ' '
+ ? (it->area != TEXT_AREA
+ /* In mode line, treat \n, \t like other crl chars. */
+ || (it->c != '\t'
+ && it->glyph_row
+ && (it->glyph_row->mode_line_p || it->avoid_cursor_p))
+ || (it->c != '\n' && it->c != '\t'))
+ : (nbsp_or_shy
+ || (it->multibyte_p
+ ? ! CHAR_PRINTABLE_P (it->c)
+ : (! unibyte_display_via_language_environment
+ ? it->c >= 0x80
+ : (decoded >= 0x80 && decoded < 0xA0))))))
{
/* IT->c is a control character which must be displayed
either as '\003' or as `^C' where the '\\' and '^'
@@ -5794,7 +5808,7 @@
highlighting. */
if (EQ (Vnobreak_char_display, Qt)
- && it->c == 0xA0)
+ && IS_NBSP)
{
/* Merge the no-break-space face into the current face. */
face_id = merge_faces (it->f, Qnobreak_space, 0,
@@ -5844,7 +5858,7 @@
highlighting. */
if (EQ (Vnobreak_char_display, Qt)
- && it->c == 0xAD)
+ && IS_SHY)
{
it->c = '-';
XSETINT (it->ctl_chars[0], '-');
@@ -5855,10 +5869,10 @@
/* Handle non-break space and soft hyphen
with the escape glyph. */
- if (it->c == 0xA0 || it->c == 0xAD)
+ if (nbsp_or_shy)
{
XSETINT (it->ctl_chars[0], escape_glyph);
- it->c = (it->c == 0xA0 ? ' ' : '-');
+ it->c = (IS_NBSP ? ' ' : '-');
XSETINT (it->ctl_chars[1], it->c);
ctl_len = 2;
goto display_control;
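
For readers who want to try the rewritten condition outside Emacs, the following is a simplified sketch of its non-control-character, unibyte branch (it->c >= ' ' and it->multibyte_p zero). The function name `takes_control_path` and its parameters are hypothetical; in xdisp.c the corresponding values come from `it->c`, `decoded`, `unibyte_display_via_language_environment` and `nbsp_or_shy`.

```c
#include <stdbool.h>

/* Simplified model of the unibyte branch of the new condition.
   C is the raw byte (assumed >= 0x20), DECODED is the result of
   decoding it through the unibyte charset, VIA_LANG_ENV models
   unibyte_display_via_language_environment, and NBSP_OR_SHY is
   0, 1 or 2 as in the patch.  Returns true when the character is
   routed to the control-character display code.  */
static bool
takes_control_path (int c, int decoded, bool via_lang_env, int nbsp_or_shy)
{
  if (nbsp_or_shy)
    return true;		/* NBSP and SHY get special display */
  if (!via_lang_env)
    return c >= 0x80;		/* every eight-bit byte is escaped */
  /* Only decoded characters in 0x80..0x9F are escaped.  */
  return decoded >= 0x80 && decoded < 0xA0;
}
```

Under this model, byte 0xE9 is escaped when VIA_LANG_ENV is false and displayed as a character when it is true and the byte decodes to itself, which matches the behaviour described earlier in this message.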