bug#56393: Actually fix the long lines display bug

bug-gnu-emacs
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#56393: Actually fix the long lines display bug

From:	Eli Zaretskii
Subject:	bug#56393: Actually fix the long lines display bug
Date:	Mon, 18 Jul 2022 15:20:20 +0300
> Date: Mon, 18 Jul 2022 10:19:28 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: larsi@gnus.org, 56393@debbugs.gnu.org
> 
> >> I can't reproduce either of these two bugs with a regular build, both 
> >> of these recipes produce the expected result.  Are these segfaults due 
> >> to --enable-checking?
> >
> > No, I don't think so.  --enable-checking produces SIGABRT, not SIGSEGV.
> 
> Yes, I know, but it's possible that some checking causes a segfault.

Not in this case.  I guess your OS/architecture don't (always)
segfault upon dereferencing a pointer that points one byte before
an allocated chunk of memory -- whether this is or isn't detected
depends on details of memory allocation and run-time diagnostics.

> > The segfault happens because it->bidi_it.bytepos is 1, and FETCH_BYTE 
> > cannot be called with a zero argument.  Here's a relevant portion of a 
> > debugging session:
> >
> 
> Thanks.  With that indication, and after adding assertions in the code, I 
> could make Emacs abort when that problem occurred.  I pushed a fix a few 
> minutes ago.

Thanks.  The crash is gone, of course, but I'm not sure this is the
correct fix.  See below.

> But I'm still unable to reproduce your second recipe.  I tried again and 
> again, both with a regular and a checking-enable build, with and without 
> the fix for the first recipe, and with various frame sizes... but without 
> success.  If you can still reproduce it with the fix, could you perhaps 
> send me a backtrace?

I can send you a backtrace from yesterday:

  Thread 1 received signal SIGSEGV, Segmentation fault.
  0x01055028 in get_visually_first_element (it=0x82af68) at xdisp.c:8691
  8691                   && (FETCH_BYTE (it->bidi_it.bytepos - 1) == '\n'
  (gdb) bt
  #0  0x01055028 in get_visually_first_element (it=0x82af68) at xdisp.c:8691
  #1  0x010563d4 in next_element_from_buffer (it=0x82af68) at xdisp.c:9149
  #2  0x01052bd3 in get_next_display_element (it=0x82af68) at xdisp.c:7786
  #3  0x0108a318 in display_line (it=0x82af68, cursor_vpos=0) at xdisp.c:24377
  #4  0x0107c178 in try_window (window=XIL(0xa000000007a89d98), pos=...,
      flags=1) at xdisp.c:20268
  #5  0x01078db7 in redisplay_window (window=XIL(0xa000000007a89d98),
      just_this_one_p=false) at xdisp.c:19675
  #6  0x010701d7 in redisplay_window_0 (window=XIL(0xa000000007a89d98))
      at xdisp.c:17236
  #7  0x0127074c in internal_condition_case_1 (
      bfun=0x107017f <redisplay_window_0>, arg=XIL(0xa000000007a89d98),
      handlers=XIL(0xc00000000612fdac), hfun=0x106fe41 <redisplay_window_error>)
      at eval.c:1509
  #8  0x0106fe03 in redisplay_windows (window=XIL(0xa000000007a89d98))
      at xdisp.c:17206
  #9  0x0106fd99 in redisplay_windows (window=XIL(0xa000000007a89bb8))
      at xdisp.c:17200
  #10 0x0106e5ca in redisplay_internal () at xdisp.c:16656
  #11 0x0106bd8c in redisplay () at xdisp.c:15860
  #12 0x01173882 in read_char (commandflag=1, map=XIL(0xc000000007b8e0d0),
      prev_event=XIL(0), used_mouse_menu=0x82f43f, end_time=0x0)
      at keyboard.c:2595
  #13 0x0118d771 in read_key_sequence (keybuf=0x82f728, prompt=XIL(0),
      dont_downcase_last=false, can_return_switch_frame=true,
      fix_current_buffer=true, prevent_redisplay=false) at keyboard.c:9947
  #14 0x0116f1eb in command_loop_1 () at keyboard.c:1391
  #15 0x01270662 in internal_condition_case (bfun=0x116eab0 <command_loop_1>,
      handlers=XIL(0x90), hfun=0x116da7e <cmd_error>) at eval.c:1485
  #16 0x0116e51d in command_loop_2 (handlers=XIL(0x90)) at keyboard.c:1132
  #17 0x0126f4e9 in internal_catch (tag=XIL(0xfe70),
      func=0x116e4e6 <command_loop_2>, arg=XIL(0x90)) at eval.c:1208
  #18 0x0116e488 in command_loop () at keyboard.c:1110
  #19 0x0116d4de in recursive_edit_1 () at keyboard.c:719
  #20 0x0116d77c in Frecursive_edit () at keyboard.c:802
  #21 0x011686e6 in main (argc=2, argv=0xa42a10) at emacs.c:2517

  Lisp Backtrace:
  "redisplay_internal (C function)" (0x0)
  (gdb) fr 4
  #4  0x0107c178 in try_window (window=XIL(0xa000000007a89d98), pos=...,
      flags=1) at xdisp.c:20268
  20268         if (display_line (&it, cursor_vpos))
  (gdb) p pos
  $1 = {
    charpos = 1,
    bytepos = 1
  }
  (gdb) fr 5
  #5  0x01078db7 in redisplay_window (window=XIL(0xa000000007a89d98),
      just_this_one_p=false) at xdisp.c:19675
  19675             if (try_window (window, startp, TRY_WINDOW_CHECK_MARGINS) < 
0)

  (gdb) p startp
  $2 = {
    charpos = 1,
    bytepos = 1
  }
  (gdb) p w->start
  $3 = XIL(0xa000000007a89f78)
  (gdb) xtype
  Lisp_Vectorlike
  PVEC_MARKER
  (gdb) xmarker
  $4 = (struct Lisp_Marker *) 0x7a89f78
  (gdb) p *$
  $5 = {
    header = {
      size = 1124081664
    },
    buffer = 0x7665a48,
    need_adjustment = 0,
    insertion_type = 0,
    next = 0x7a89fa8,
    charpos = 1,
    bytepos = 1
  }
  (gdb) p w->pointm
  $8 = XIL(0xa000000007a89f90)
  (gdb) xtype
  Lisp_Vectorlike
  PVEC_MARKER
  (gdb) xmarker
  $9 = (struct Lisp_Marker *) 0x7a89f90
  (gdb) p *$
  $10 = {
    header = {
      size = 1124081664
    },
    buffer = 0x7665a48,
    need_adjustment = 0,
    insertion_type = 0,
    next = 0x7a9c530,
    charpos = 15000,
    bytepos = 15000
  }

As you see, the cause of problem was the same one: try_window was
called to attempt displaying a window whose window-start position is
1, which is outside of the "narrowing".  This is because, when we
split the window with "C-x 2", the new window is given window-start of
its buffer's BEGV, in this case 1 -- because split-window runs outside
of redisplay, and so doesn't know about the "narrowing".  Then
redisplay is invoked, and it attempts to reuse the existing
window-start point.

Since you have now moved the iteration to begin at the "narrowed"
BEGV, this crash is also gone.

To tell the truth, I'm not sure this kind of fix is the correct
solution, because basically its success is a matter of luck (or lack
thereof).  For example, recall the higher-level context of the first
segfault:

  Thread 1 received signal SIGSEGV, Segmentation fault.
  0x01055028 in get_visually_first_element (it=0x82d250) at xdisp.c:8691
  8691                   && (FETCH_BYTE (it->bidi_it.bytepos - 1) == '\n'
  (gdb) bt
  #0  0x01055028 in get_visually_first_element (it=0x82d250) at xdisp.c:8691
  #1  0x010563d4 in next_element_from_buffer (it=0x82d250) at xdisp.c:9149
  #2  0x01052bd3 in get_next_display_element (it=0x82d250) at xdisp.c:7786
  #3  0x01057166 in move_it_in_display_line_to (it=0x82d250, to_charpos=306655,
      to_x=0, op=(MOVE_TO_X | MOVE_TO_POS)) at xdisp.c:9574
  #4  0x01059ce7 in move_it_to (it=0x82d250, to_charpos=306655, to_x=-1,
      to_y=543, to_vpos=-1, op=10) at xdisp.c:10245
  #5  0x01038364 in pos_visible_p (w=0x75fd640, charpos=306655, x=0x82e23c,
      y=0x82e238, rtop=0x82e24c, rbot=0x82e248, rowh=0x82e244, vpos=0x82e240)
      at xdisp.c:1730
  #6  0x010ca308 in Fpos_visible_in_window_p (pos=XIL(0), window=XIL(0),

  Lisp Backtrace:
  "pos-visible-in-window-p" (0x68b0248)
  "pos-visible-in-window-group-p" (0x68b01f0)
  "isearch-update" (0x68b01a0)
  "isearch-search-and-update" (0x68b0178)
  "isearch-process-search-string" (0x68b0138)
  "isearch-process-search-char" (0x68b00d8)
  "isearch-quote-char" (0x82ebd0)
  "funcall-interactively" (0x82ebc8)
  "call-interactively" (0x68b0078)
  "command-execute" (0x82f5a8)

This is isearch.el trying to establish whether the position of the
match is visible in the window.  To do that, it starts from the
current window-start position (which happens to be 1), and simulates
display of the buffer text portion up until point or until the end of
window, whichever comes first, to determine whether point is beyond
the end of the window.  Your change makes the search start from some
much later position instead, which could very well produce incorrect
results: pos-visible-in-window-p could decide that point _is_ visible,
when it really isn't.  (It doesn't happen in this particular case
because the newline is far away -- at the very end of the buffer -- so
it isn't visible in both cases.  But that's sheer luck.)

More generally, if you look at redisplay_window, you will see that
about 2/3 of its code tries very hard to reuse the previous
window-start position, before it gives up and looks for a new starting
position.  So in any situation where the previous window-start is far
enough before point, all that code will basically work with a buffer
position that is at risk of being before the "narrowed" BEGV.  Thus,
any code there which tries stuff like start_display+move_it_to will
risk hitting this kind of problems -- either FETCH_BYTE will crash, or
we risk producing the wrong result because we force the code to jump
to the "narrowed" BEGV before doing anything, while its caller expects
results relative to a different position.

I think this is because the display engine assumes that BEGV stays put
during the entire redisplay_window lifetime, i.e. that all of the
subroutines it calls see the same value of BEGV.  This is no longer so
on the branch, and I wonder whether and how we should handle this new
situation to keep the display code stable and reliable.
[Prev in Thread]
Current Thread
[Next in Thread]
bug#56393: Actually fix the long lines display bug, (continued)
Prev by Date: bug#56393: Soon also looking at your branch
Next by Date: bug#56527: 28.1; Crash on macOS with magit
Previous by thread: bug#56393: Actually fix the long lines display bug
Next by thread: bug#56393: Actually fix the long lines display bug
Index(es):
- Date
- Thread