Re: lynx-dev current_codepage in WIN_EX&&CJK

From:	Klaus Weide
Subject:	Re: lynx-dev current_codepage in WIN_EX&&CJK_EX (was: Lynx .IDE file for Borland C ++)
Date:	Mon, 10 Jan 2000 09:43:46 -0600 (CST)

On Mon, 10 Jan 2000, Hataguchi Takeshi wrote:

> I wrote a patch for dev18. The main changes are:
>   o wrap a long text which includes only CJK characters in source mode.
>   o avoid to write CJK characters at the 80th column.
> 
> A text which includes only CJK characters was never wrapped in 
> source mode before. So I changed "goto check_IgnoreExcess;" to 
> "goto check_Tab;". But I'm not sure this is an appropriate way.

This looks basically right to me.  I had always wondered whether it
was correct that the 'goto check_IgnoreExcess' should be skipping
so much.

The rest is details...

You are now jumping to
> +check_Tab:
>      if (ch == '\t') {
but is that really what you want?  I.e., do you really want the stuff
between there and
    } /* if tab */
    else if ( (text->source || dont_wrap_pre) && text == HTMainText) {
?

If yes, note that the line can be split in "that stuff".  So you may
have to add "kanji_state-preservation" (
> +         int save_kanji_buf = text->kanji_buf;
> +         int save_state = text->state;
etc. around new_line calls) in that code, too, if you want it to handle
'\t' always correctly.

If no, then note that the 'else' in
    else if ( (text->source || dont_wrap_pre) && text == HTMainText) {
is completely unnecessary.  It can be removed without changing the flow
of control.  If that were more obvious, maybe that is where you would have
put your new goto label?


> I've only tested long texts in PRE and in source mode. It seems 
> to work fine. I think the fragments, which includes current_codepage, 
> should be removed.
> 
> On Thu, 6 Jan 2000, Klaus Weide wrote:
> > It is HText_appendCharacter's responsibility to split lines early
> > enough (if lines are to be split).  It is display_line's
> > responsibility to not actually output something into the 80th column,
> > even if the line structure should be longer than displayable.  That
> 
> I changed only HText_appendCharacter, but didn't change display_line.
> Though display_line should be changed also, I don't know how to change it.

Before LY_SOFT_NEWLINE was introduced, in SOURCE mode lines would just
grow (without splitting at LYcols-1) up to MAX_LINE.  It was completely
up to display_line to suppress characters beyond LYcols-1.  Now it isn't
that important.  It will be if we return to the older handling (which would
be a useful option IMO), and it is possible that some of Vlad's changes
also effectively do something like that (dont_wrap_pre??).

In the display_line loop, i is the current display position for the
character, although with some odd offset applied (it already points
to the next position, or something like that; it's even possible that
it's broken).  So to make the right change to display_line, you'd probably
test i against LYcols(+something) under
                } else if (HTCJK != NOCJK && !isascii((unsigned char)buffer[0])) {
and break or continue if there isn't enough space.


> --- GridText.c.org    Fri Jan  7 12:02:22 2000
> +++ GridText.c        Mon Jan 10 09:09:36 2000
> @@ -3677,7 +3677,7 @@
>               }
>           }
>       } else {
> -         goto check_IgnoreExcess;
> +         goto check_Tab;
(see comments above)
>       }
>      } else if (ch == CH_ESC) {  /* S/390 -- gil -- 1587 */
>       return;
> @@ -3882,6 +3882,7 @@
>      /*
>       *  Tabs.
>       */
> +check_Tab:
(see comments above)
>      if (ch == '\t') {
>       CONST HTTabStop * Tab;
>       int target, target_cu;  /* Where to tab to */
> @@ -3959,13 +3960,21 @@
>        */
>       int target = (int)(line->offset + line->size) - ctrl_chars_on_this_line;
>       int target_cu = target + utfxtra_on_this_line;
> -     if (target >= (LYcols-1) - style->rightIndent ||
> +     if (target >= (LYcols-1) - style->rightIndent - 
> +         ((HTCJK != NOCJK) && text->kanji_buf) ? 1 : 0 ||
>           (text->T.output_utf8 &&
>            target_cu + UTF_XLEN(ch) >= (LYcols_cu-1))
>           ) {
> +         int save_kanji_buf = text->kanji_buf;
> +         int save_state = text->state;
> +
> +         text->kanji_buf = '\0';
> +         text->state = S_text;
>           new_line(text);
>           line = text->last_line;
>           HText_appendCharacter (text, LY_SOFT_NEWLINE);
> +         text->kanji_buf = save_kanji_buf;
> +         text->state = save_state;
>       }
>      }


This whole section where you applied most of your changes - between
    } /* if tab */
and
    if (ch == ' ') {
- wasn't there at all before the LY_SOFT_NEWLINE introduction.
Now it apparently (or so it seemed to you, at least?) does the main branch
of the line splitting logic.  It didn't use to, since it wasn't there at all
(compare some 2.7.1 source, for example).

The unplanned (it seems it "just happened") shifting of line splitting
logic from after check_IgnoreExcess to before it may be responsible for
some of the stuff we (you) now have to fix up.

Anyway, I suggest you try to imagine this section were not there at all
(or actually remove it).  The logic should still be right, with the exception
of LY_SOFT_NEWLINE insertion.

(But dont_wrap_pre has been spliced into the existing code in such a
strange way that it doesn't seem to make much sense.  I suggest ignoring it.)


> @@ -3998,6 +4007,7 @@
>       */
>      if (((indent + (int)line->offset + (int)line->size) +
>        (int)style->rightIndent - ctrl_chars_on_this_line +
> +      (((HTCJK != NOCJK) && text->kanji_buf) ? 1 : 0) +
>        ((line->size > 0) &&
>         (int)(line->data[line->size-1] ==
>                               LY_SOFT_HYPHEN ?

Shouldn't you do kanji_state-preservation in the new_line calls that
follow after this, too?

It seems you could do this by putting the equivalent of
> +         int save_kanji_buf = text->kanji_buf;
> +         int save_state = text->state;
> +
> +         text->kanji_buf = '\0';
> +         text->state = S_text;
  .....
> +         text->kanji_buf = save_kanji_buf;
> +         text->state = save_state;

*into* split_line, so you don't have to surround all new_line
occurrences with it.


   Klaus
