lynx-dev

lynx-dev megapatch to dev.10 available


From: Klaus Weide
Subject: lynx-dev megapatch to dev.10 available
Date: Wed, 13 Oct 1999 12:31:11 -0500 (CDT)

It's in <http://enteract.com/~kweide/lynx/>.

I just saw there is dev.10 now.  Tom, can you merge?
After getting everything (I hope) up to dev.10, I don't feel like
going through everything again...

  Klaus

* Changes to UCAuto.c (already sent? - this is updated to dev.10)
* HTFTP.c: (already sent w/o description - this is updated to dev.10)
  - interrupted_in_next_data_char was not being reset.  That could
    make all subsequent FTP directory listings fail (by showing
    an empty directory) after receipt of one directory listing had
    been interrupted.
  - Be nice, send quit before closing at least in the normal (non-
    interrupted and successful) case.  Some servers (wu-ftpd at
    least) otherwise complain with "You could at least say goodbye"
    which in turn causes unnecessary RST packets.  To minimize
    round-trip delays, the QUIT is sent before we start reading
    the returned data (but after the initial response to our
    retrieval command).
  - Always close data connection immediately after we are done reading
    from it, also for directory requests.  This was already the case
    for file requests.  Some servers (including recent wu-ftpd beta)
    wait for indication that we closed before proceeding.
  - Keep better track of closed sockets.  Some more trace messages.
    Some comments corrected.
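The first HTFTP.c item is a classic stale-flag bug: a flag set on interrupt is never cleared, so every later read fails at once. A minimal sketch of the pattern and its fix follows. It assumes a hypothetical in-memory data_source in place of the real data socket; only the name interrupted_in_next_data_char comes from the changelog, and next_data_char and read_listing here are illustrative, not lynx code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a data connection: a string and a read position. */
struct data_source {
    const char *buf;
    size_t pos;
    size_t interrupt_at;  /* simulate a user interrupt at this offset; (size_t)-1 = never */
};

/* Name taken from the changelog; here it is simply a file-scope flag. */
static int interrupted_in_next_data_char = 0;

/* Return the next byte, or -1 at end of data or after an interrupt. */
static int next_data_char(struct data_source *src)
{
    if (interrupted_in_next_data_char)
        return -1;                      /* a stale flag makes every read fail */
    if (src->pos == src->interrupt_at) {
        interrupted_in_next_data_char = 1;
        return -1;
    }
    if (src->buf[src->pos] == '\0')
        return -1;
    return (unsigned char) src->buf[src->pos++];
}

/* Read one directory listing into out (at most outsz-1 bytes).
 * The fix: clear the flag at the start of every transfer. */
static size_t read_listing(struct data_source *src, char *out, size_t outsz)
{
    size_t n = 0;
    int c;

    interrupted_in_next_data_char = 0;  /* reset per listing */
    while (n + 1 < outsz && (c = next_data_char(src)) != -1)
        out[n++] = (char) c;
    out[n] = '\0';
    return n;
}
```

Without the reset line, the second listing in a session would come back empty after the first one had been interrupted, which is the "empty directory" symptom described above.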

* Tabular representation for simple tables.  See included file
  README.TRST.

[ It works quite nicely for me.  I don't understand exactly how...
Internal code is still a mess.  Interface to rest-of-lynx is ok though.
Mess is happily contained in separate file.
There is no flag or option to turn it off (everyone wants this, right? :),
and it does work nicely enough that I didn't find YA option necessary.
It would be easy to disable though, all it takes is removing the one
call of HText_startStblTABLE() in HTML.c.
For systems not using ./configure and makefile.in, you have to manually
add "TRSTable.o" (or whatever the equivalent name is for you) to your
system's equivalent of a makefile.  Otherwise lynx won't compile. ]

* Made User-Agent warning more friendly, and more specific.  Tell
  the user what lynx expects in order to avoid the warning.  On
  the other hand, issue an equivalent warning when -useragent is
  used.  Change documentation accordingly.

[ The UA_COPYRIGHT_WARNING text was IMNSHO nonsense and has bugged me
for a long time.  It's not Lynx's job to convey some company's bogus
claims to users.  OTOH there are good reasons to encourage users to
identify lynx correctly.  Just don't give them bogus reasons.  For
consistency there needs to be a warning for -useragent if there is one
for an 'O'ptions Screen change. ]

* Don't send User-Agent header at all if it somehow would be blank.
* Indicate on forms 'O'ptions Screen which options are not saved
  to .lynxrc.
* Disable the form fields in the 'O'ptions Screen if the screen is
  generated when FORMS_OPTIONS code is compiled in but not actually
  active.

[ The following invocation causes that situation:
    lynx -forms_options=off LYNXOPTIONS:
It might actually be useful for initial reviewing of settings. ]

* LYPrint.c: In subject_translate8bit (see OUTGOING_MAIL_CHARSET
  option), use higher level function to charset-translate mail Subject
  line, rather than low-level UCTransCharStr.

[ It actually works better now, including for UTF-8.
OUTGOING_MAIL_CHARSET is awfully misnamed for what it does. ]

* UPPER8, UCForce8bitTOUPPER: were severely broken for UTF-8 display,
  making WHEREIS search for strings containing non-ASCII characters
  impossible (and probably with other bad effects).  Now case mapping
  may still be wrong, but at least identical strings compare as equal.
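The minimum guarantee mentioned above (identical strings compare equal, even if non-ASCII case mapping is imperfect) can be illustrated with a comparison that folds only ASCII letters and passes every byte >= 0x80 through unchanged. This is a hypothetical sketch (upper8_sketch is not the lynx function) and it deliberately does no non-ASCII case mapping at all.

```c
#include <ctype.h>

/* Fold only ASCII letters; bytes >= 0x80 (all UTF-8 lead and continuation
 * bytes) are left alone, so a multibyte sequence is never altered. */
static int fold_ascii_only(unsigned char c)
{
    return (c < 0x80) ? toupper(c) : c;
}

/* Case-insensitive compare in the style of strcasecmp; returns 0 if equal. */
static int upper8_sketch(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *) a;
    const unsigned char *q = (const unsigned char *) b;

    for (;; p++, q++) {
        int d = fold_ascii_only(*p) - fold_ascii_only(*q);

        if (d != 0 || *p == '\0')
            return d;
    }
}
```

With this approach a WHEREIS search for "café" finds "CAFé" but not "CAFÉ"; that is the "case mapping may still be wrong" trade-off.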
* LYHistory.c: Entification for saved statusline messages happened
  twice by mistake.
* HTFWriter.c: Made code for automatic decompression of bzip2 files
  conditional on BZIP2_PATH.  Such files should be treated as normal
  binary files on systems without bzip2.  The configure seems to
  always define BZIP2_PATH, but it could be undefined manually.
* HTFWriter.c: Use LYRemoveTemp instead of remove in some cases,
  to avoid keeping those files in the temp file list after they are
  long gone.
* HTTCP.c: Check whether port numbers in URLs are really numbers.
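A check of the kind described in the HTTCP.c item might look like the sketch below. It is hypothetical (parse_port_sketch is not the HTTCP.c code) and assumes it receives the text after the ':' in the host part of the URL.

```c
#include <ctype.h>
#include <stddef.h>

/* Parse the port text that follows ':' in a URL's host part.
 * Returns the port number, or -1 if the text is empty, contains a
 * non-digit, or exceeds 65535. Overflow is caught digit by digit. */
static long parse_port_sketch(const char *s)
{
    long port = 0;

    if (s == NULL || *s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        if (!isdigit((unsigned char) *s))
            return -1;
        port = port * 10 + (*s - '0');
        if (port > 65535)
            return -1;
    }
    return port;
}
```

Using isdigit rather than atoi is the point: atoi("80x") quietly returns 80, which is exactly the kind of input such a check should reject.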
* HTPlain.c:
  - Deal with backspace formatting as used in formatted man pages.
    (No highlighting, only avoid double output of characters)
  - Pass on 0xAD (soft hyphen) character in more cases.
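Formatted man pages use overstrike sequences: "c\bc" for bold and "_\bc" for underline. Showing them without highlighting, and without doubled characters, can be done by letting a backspace cancel the previously emitted byte. The sketch below is a hypothetical illustration of that idea, not the HTPlain.c logic, and it assumes single-byte characters.

```c
#include <stddef.h>

/* Strip nroff-style overstrike: "c\bc" (bold) and "_\bc" (underline)
 * both become "c". A backspace drops the previous output byte, and the
 * byte after it takes that place. dst must hold strlen(src)+1 bytes. */
static void strip_overstrike(const char *src, char *dst)
{
    size_t n = 0;

    for (; *src != '\0'; src++) {
        if (*src == '\b') {
            if (n > 0)
                n--;            /* discard the overstruck character */
        } else {
            dst[n++] = *src;
        }
    }
    dst[n] = '\0';
}
```

For example, "N\bNA\bAM\bME\bE" (a bold NAME heading) comes out as plain "NAME".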
* HTNews.c: Prevent some ways that could trick lynx into treating
  a regular "news:" or "nntp:" URL as something else, like snewspost.
  Extra check in LYNews.c whether posting is allowed.  Return with
  an error message in some cases of URLs that are too long, instead
  of silently truncating.  Make HEAD work again on news articles.
  Some memory leaks in error path removed.  A message tweak.
* HTFormat.c: HTStreamStack: avoid some unnecessary work, add a trace
  message to show what is returned.
* SGML.c: some cleanup of ugly ifdefs, and of unnecessary abuse of
  global variables.  (still a lot left!)
* Handle INCLUDE and CDATA marked sections: output the contents.
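To illustrate what "output the contents" means for a marked section such as "<![CDATA[a<b]]>", here is a minimal sketch. It is hypothetical (marked_section_sketch is not SGML.c code) and simplified: it only accepts the uppercase keywords INCLUDE and CDATA, while real SGML keywords are case-insensitive, may come from parameter entities, and include IGNORE, RCDATA and TEMP.

```c
#include <stddef.h>
#include <string.h>

/* Recognize "<![KEYWORD[ content ]]>" for KEYWORD INCLUDE or CDATA and copy
 * the content into out (outsz >= 1). Returns a pointer just past "]]>",
 * or NULL if s does not start with such a marked section. */
static const char *marked_section_sketch(const char *s, char *out, size_t outsz)
{
    const char *kw, *content, *end;
    size_t kwlen, len;

    if (strncmp(s, "<![", 3) != 0)
        return NULL;
    kw = s + 3;
    while (*kw == ' ')
        kw++;
    kwlen = strcspn(kw, " [");
    if (!((kwlen == 7 && strncmp(kw, "INCLUDE", 7) == 0) ||
          (kwlen == 5 && strncmp(kw, "CDATA", 5) == 0)))
        return NULL;
    content = kw + kwlen;
    while (*content == ' ')
        content++;
    if (*content != '[')
        return NULL;
    content++;
    end = strstr(content, "]]>");
    if (end == NULL)
        return NULL;
    len = (size_t) (end - content);
    if (len >= outsz)
        len = outsz - 1;        /* truncate to fit the caller's buffer */
    memcpy(out, content, len);
    out[len] = '\0';
    return end + 3;
}
```

A real parser would treat INCLUDE content as markup to be parsed further and CDATA content as plain text; the sketch only shows where the content begins and ends.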
* SGML.c etc.: Parse differently various elements that had/have special
  requirements or hacks.  Extend meaning of Tgf_strict for literal-like
  content modes.  Use SGML_CDATA in some cases (and treat it similarly to
  SGML_LITTERAL), use SGML_PCDATA for literal-like parsing (but if
  modified by Tgf_strict it's more like regular SGML_MIXED).  A '<'
  that would start a tag gets displayed (since no element content is
  allowed, that's just error recovery).  Comments now work in TEXTAREA
  instead of getting displayed as text (SortaSGML mode only).

[ Try comments within TEXTAREA, try also various invalid constructs
(including anchors) within TITLE, with or without SortaSGML, old
vs. new code. ]

* Minor tweak of sorta SGML handling for invalid end tag if start tag
  could be validly omitted.
* More consistent and correct recognition of element names.  The
  characters "_-.:" don't end tag names.

[ This alone should prevent <http://www.quantum.com/> showing as empty.
Changes for OBJECT handling (see below) should prevent the same effect
for most pages that really _do_ have unclosed OBJECT tags, not just
something that is misinterpreted by lynx as that. ]
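The name-recognition rule can be sketched as a scanner in which letters, digits and the characters "_-.:" continue an element name. This is a hypothetical helper (element_name_len is not the SGML.c function) and it does not check that the name starts with a letter.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the element name at the start of s. Letters, digits and
 * '_', '-', '.', ':' continue the name (so "o:p", "x-foo" and "my.tag"
 * are single names); any other character ends it. */
static size_t element_name_len(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0' &&
           (isalnum((unsigned char) s[n]) || strchr("_-.:", s[n]) != NULL))
        n++;
    return n;
}
```

With a rule that stopped at ':', a tag like "<o:p>" would be read as the element "o", which is how a page can end up misinterpreted as containing an unclosed element.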

* Improved handling of '/' after element name in a tag:
  "<foo/>" is treated as an empty element (as in XML).  If we recognize
  "foo" as an empty element, do nothing special; and if we recognize "foo"
  as a non-empty element, convert to "<foo></foo>".
  "<foo/bar/" is treated as a shortref construct, by converting to
  "<foo>bar</foo>" (for non-empty and recognized "foo").
  This is not general as it would have to be for a real SGML parser,
  in particular '/' is only treated this way if it directly follows the
  element name, and it may not even be quite right.  It is better than
  the recovery lynx previously did in these cases though.

[ Try <http://www.nyct.net/~aray/junk/hide.html> as a test, you should
see "Hello World!".  Also look at prettysrc rendering. ]
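The two rewrites for '/' directly after an element name can be sketched as below. Everything here is hypothetical: is_empty_element uses a tiny made-up table instead of lynx's element table, names are limited to letters and digits, every name is treated as recognized, and only the rewritten construct is produced (the rest of the input is ignored).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the element table: a few empty elements. */
static int is_empty_element(const char *name, size_t len)
{
    static const char *const empty[] = { "br", "hr", "img", "input" };
    size_t i;

    for (i = 0; i < sizeof empty / sizeof empty[0]; i++)
        if (strlen(empty[i]) == len && strncmp(empty[i], name, len) == 0)
            return 1;
    return 0;
}

/* "<foo/>": empty element stays "<foo>", otherwise becomes "<foo></foo>".
 * "<foo/bar/": non-empty foo becomes "<foo>bar</foo>".
 * Returns 1 and fills out on a rewrite, 0 if the input is left alone. */
static int rewrite_slash_sketch(const char *in, char *out, size_t outsz)
{
    const char *name = in + 1, *text, *close;
    size_t len = 0;

    if (in[0] != '<')
        return 0;
    while (isalnum((unsigned char) name[len]))
        len++;
    if (len == 0 || name[len] != '/')   /* '/' must directly follow the name */
        return 0;
    if (name[len + 1] == '>') {
        if (is_empty_element(name, len))
            snprintf(out, outsz, "<%.*s>", (int) len, name);
        else
            snprintf(out, outsz, "<%.*s></%.*s>", (int) len, name, (int) len, name);
        return 1;
    }
    text = name + len + 1;
    close = strchr(text, '/');
    if (close == NULL || is_empty_element(name, len))
        return 0;
    snprintf(out, outsz, "<%.*s>%.*s</%.*s>", (int) len, name,
             (int) (close - text), text, (int) len, name);
    return 1;
}
```

For example, "<b/bold/" yields "<b>bold</b>" and "<p/>" yields "<p></p>", while "<br/>" is simply read as "<br>".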

* Changed handling of include buffer which is used to pass back data
  from HTML.c to SGML.c.  Passing data upstream now works without strange
  reordering effects even when SGML_character was already parsing data from
  a previous include buffer.
  Character set translation would happen several times on data passed back
  to SGML_character in the include buffer for re-parsing.  This is now
  avoided, at least in most cases and for characters that *can* be
  translated; there are likely combinations of input and output character
  sets where the assumptions made are still wrong.
* The start_element and end_element methods of the structured stream class now
  return a status code.  Currently only used for the OBJECT stuff below.
* mostly HTML.c, SGML.c: Changed handling of OBJECT and MAP.
  - Avoid using the include buffer mechanism as much as possible.
    This involves introducing some new special handling in SGML.c
    to change parsing mode for element contents, and a way for
    HTML_{start,end}_element to signal to SGML_character what it
    should do.  In most cases when the OBJECT element content should
    be parsed and displayed, SGML_character now only needs one pass
    through the data.
  - Don't lose content when several OBJECTs are nested.
  - In HTML 4, an OBJECT with USEMAP attribute can refer to a MAP
    within the OBJECT's content, possibly within nested inner OBJECTs.
    Lynx would fail to find the MAP in that case, now it doesn't.
  - In HTML 4, MAP can contain arbitrary block elements in addition to
    AREA.  Lynx now shows such block content, even if it occurs
    within (possibly nested) OBJECTs with USEMAP whose contents we
    would otherwise skip.  Sometimes we may show too much now, by
    generating a LYNXIMGMAP link as well as showing block content
    or by showing more of the OBJECT content than what is within a
    MAP, but that is preferable to losing data.
  - Treat an A tag with COORDS attribute as equivalent to an AREA
    when it is within MAP, for the purpose of collecting links for
    LYNXIMGMAP.
  - As a fallback, internally redirect a LYNXIMGMAP request to
    the position of the MAP element in the normally rendered text
    of the document containing the MAP, if it is known that the MAP
    element exists and just doesn't contain any AREA (or equivalent
    A-with-COORDS) links.  It is assumed that in such a case there
    is some block content within the MAP that is rendered normally.

[ Try the example fragments in the HTML 4.0 (or 4.01) text from W3C,
especially those with OBJECT and USEMAP.  Also adding some more
content within OBJECT (maybe before, within, or after a nested
OBJECT) and see what happens. ]

* HTFile.c: new function LYGetFileInfo.
* HTAnchor.c: new function HTAnchor_findSimpleAddress.
* New function HTStartAnchor5.

* Modified the way link text is (re-)drawn by function highlight.
  The bulk of processing now happens in new function LYMoveToLink.
  The data of the containing line is now scanned from the beginning,
  using the same logic as in display_line to make sure that lynx
  and the display library have the same idea of where in the line
  the link starts.  In UTF-8 output mode, parts of the line preceding
  the link are also repainted if this is necessary.  Refreshing of
  the physical line is forced if necessary in UTF-8 mode.  For anchors
  split across lines, the new approach is currently only used for the
  first line.
  This change is not in effect for lynx with color style.  In that
  case highlighting is already sometimes done in a similar, but not
  quite the same, separate function.
* Modified WHEREIS target highlighting for hypertext links.
  Now this is done in the same pass as drawing the normal link
  text, in LYMoveToLink.  This avoids problems in UTF-8 display
  mode.  It also avoids a lot of complicated and extremely hard
  to understand older code in highlight(), but that code is still
  there for use by lynx with color style and for other remaining
  cases (non-hypertext anchors, second line highlighting).
* Modified WHEREIS target highlighting for general text.
  Instead of first writing each line's characters in display_line,
  then scanning again through the line's data for portions to
  highlight and repainting those parts after in display_page,
  this is now done in one pass within display_line.  However,
  this isn't (yet?) done for lynx with color style which still
  uses the old code.
* These last three changes reduce problems that occur when using
  UTF-8 display character set (in an appropriate terminal environment
  that understands it, of course).  Most of them don't apply with
  color style lynx, so it continues to have more UTF-8 problems.
  Pages with mostly ASCII characters should be more or less ok.
  Problems that otherwise are not visible become apparent in
  search highlighting, and after ^Z / fg.

[ As one example, visit <http://www.w3.org/>.  (In a correctly set-up
UTF-8 environment with display charset set accordingly - otherwise
all this doesn't apply).  Enter '/' (WHEREIS), enter search text "W3C".
Go down to "W3C Services", the first line after "CSS2 Package:" should
have a middle dot and a highlighted "W3C".  Do ^Z and fg, see whether
the highlighted string is still in the right place. ]

* GridText.c: More changes to deal with problems caused by using
  UTF-8 output with a display library that isn't aware of it.
  Break lines containing UTF-8 before curses does it.  This makes such
  lines shorter than necessary: effectively, the rightmost part of a line
  cannot be used if it contains UTF-8 encoded characters.  The alternative,
  letting curses wrap the line when it thinks it got too long, is
  worse, so do it in lynx code instead.
* Avoid memory overrun for very long lines in UTF-8 mode.
  Avoid splitting line in the middle of a multibyte UTF-8 character.
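The two UTF-8 line-breaking rules above rest on one property of the encoding: bytes 0x80..0xBF are continuation bytes, and every other byte starts a character. The sketch below is hypothetical (not GridText.c code) and assumes one display cell per character, which does not hold for wide CJK characters.

```c
#include <stddef.h>

/* In UTF-8, 10xxxxxx bytes continue a character; all others start one. */
static int is_utf8_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Characters (assumed = display cells) in the first len bytes of s. */
static size_t utf8_cells(const char *s, size_t len)
{
    size_t i, cells = 0;

    for (i = 0; i < len; i++)
        if (!is_utf8_continuation((unsigned char) s[i]))
            cells++;
    return cells;
}

/* Largest offset <= len where the line can be broken without cutting a
 * multibyte character. s[len] must be readable (len <= strlen(s)). */
static size_t utf8_safe_split(const char *s, size_t len)
{
    while (len > 0 && is_utf8_continuation((unsigned char) s[len]))
        len--;
    return len;
}
```

A byte-counting display library sees 6 bytes in "abécd" while the terminal shows 5 cells; counting with utf8_cells is what allows lynx to break the line itself instead of leaving it to curses.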
* Test for SHOW_WHEREIS_TARGETS instead of 'defined(FANCY_CURSES) ||
  defined(USE_SLANG)'.
* Initialize new textarea lines created by insert_new_textarea_anchor
  with current display character set for value_cs.  (The "cloned"
  value can be stale in some cases if the user changed the display
  character set after the document was last loaded - normally that
  should not happen).  For a file inserted into a textarea with
  INSERTFILE use new function LYGetFileInfo instead to determine
  the file content's charset.  Thus -assume_local_charset and
  conventions based on file suffix should be honored.

[ This is untested. ]

* For Unix, added more specific error message if calling external
  editor for textarea failed, based on the status returned by
  system().

[ The interpretation of system()'s return code may not be quite
right or portable? ]
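On POSIX systems, the value system() returns can be decoded with the <sys/wait.h> macros. A minimal sketch, assuming hypothetical helper names (describe_system_status, run_editor_sketch) and message texts that are not lynx's:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Turn a system() status into a message (POSIX only). Returns the exit
 * code (0..255) if the command exited normally, otherwise -1. */
static int describe_system_status(int status, char *buf, size_t bufsz)
{
    if (status == -1) {
        snprintf(buf, bufsz, "could not create a child process");
        return -1;
    }
    if (WIFEXITED(status)) {
        int code = WEXITSTATUS(status);

        if (code == 127)    /* shell could not run it, e.g. command not found */
            snprintf(buf, bufsz, "editor could not be executed (status 127)");
        else
            snprintf(buf, bufsz, "editor exited with status %d", code);
        return code;
    }
    if (WIFSIGNALED(status)) {
        snprintf(buf, bufsz, "editor was killed by signal %d", WTERMSIG(status));
        return -1;
    }
    snprintf(buf, bufsz, "editor returned unexpected status %d", status);
    return -1;
}

/* Hypothetical wrapper: run the editor command and describe the result. */
static int run_editor_sketch(const char *cmd, char *buf, size_t bufsz)
{
    return describe_system_status(system(cmd), buf, bufsz);
}
```

One portability caveat: status 127 is ambiguous, since an editor that really exits with 127 looks the same as a missing one, and on non-POSIX systems the macros may not exist at all.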

* It is possible to require an additional prompt before Enter in
  an input field causes form submission: define TEXT_SUBMIT_CONFIRM_WANTED,
  explained in userdefs.h.

[ Al Gilman brought this up some time ago, to avoid unexpected action
for the benefit of blind (and other?) users.  I doubt that lynx's
behavior actually caused a problem in this respect,
and no blind users have spoken up, so it is only a compile-time option
for now. ]

* Some small changes to prevent overstepping string boundary
  (HTParse.c).
* Extended SUFFIX option, added SUFFIX_ORDER option, see documentation
  in lynx.cfg.  The long list of built-in file suffix rules in HTInit.c
  can now be disabled, either at compile time - see userdefs.h - or at
  run time.  The equivalent functionality is now available in lynx.cfg
  for those who want it.  Added comments, see HTFileInit in HTInit.c.

[ A lot of those built-in rules were useless and/or outdated for most people,
and not being maintained.  They are also start-up overhead that could
not be disabled. ]

* Various tweaks of built-in file suffix rules.

[ But if you _do_ use them, let me know if I broke something... ]

* Allow XLOADIMAGE_COMMAND to be empty (in lynx.cfg) or NULL (in userdefs.h),
  just don't use a default X viewer for image types in that case.
* Moved UCGetUniFromUtf8String from LYCharUtils.c to UCAux.c.
* Renamed LYUCFullyTranslateString -> LYUCTranslateHTMLString, and
  LYUCFullyTranslateString_1 -> LYUCFullyTranslateString.
* Tweaks for special chars in (what is now) LYUCFullyTranslateString:
  in obscure cases (input fields of type password prefilled with
  unusual content) lynx would pass text back to the server with special
  characters (soft hyphen or non-break space) expressed as lynx-internal
  code values.
* Added some replacement characters or strings to various chartrans
  tables.
* Experimental command line option -convert_to, only compiled in if
  new MISC_EXP symbol is defined.  This takes a string in the form
  of a MIME type, which can also be combined with an appended ";charset="
  parameter.  (This needs shell quoting of course).  The charset
  value can be used to set the display character set from the command
  line.  The MIME type can be one of the non-official types used
  internally, for some interesting effects (crashing lynx not excluded).
  Try www/download, www/source, www/dump, or some unrecognized string.
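How an argument of the form "type/subtype;charset=NAME" divides into its two parts can be shown with a small parser. This is a hypothetical sketch, not the lynx option code: it compares the parameter name case-sensitively, although MIME parameter names are case-insensitive, and it ignores any other parameters.

```c
#include <ctype.h>
#include <string.h>

/* Split "type/subtype;charset=NAME" into type and charset. Whitespace
 * before ';' and after ';' is skipped; charset is "" if absent.
 * typesz and charsetsz must be at least 1. */
static void split_convert_to(const char *arg, char *type, size_t typesz,
                             char *charset, size_t charsetsz)
{
    size_t n = strcspn(arg, ";");
    const char *p;

    while (n > 0 && isspace((unsigned char) arg[n - 1]))
        n--;                            /* trim trailing blanks of the type */
    if (n >= typesz)
        n = typesz - 1;
    memcpy(type, arg, n);
    type[n] = '\0';

    charset[0] = '\0';
    p = strchr(arg, ';');
    if (p == NULL)
        return;
    p++;
    while (isspace((unsigned char) *p))
        p++;
    if (strncmp(p, "charset=", 8) != 0)
        return;
    p += 8;
    n = strcspn(p, "; \t");
    if (n >= charsetsz)
        n = charsetsz - 1;
    memcpy(charset, p, n);
    charset[n] = '\0';
}
```

The ';' in such an argument is also why it needs shell quoting on the command line: unquoted, the shell would treat it as a command separator.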
* Fixed HTMainText_Get_UCLYhndl, it was returning the wrong kind of
  charset handle (a "UChndl", which is different from a "LYhndl" or
  "UCLYhndl" etc. and shouldn't be directly accessed by arbitrary
  bits of lynx code - it should be regarded as private to the chartrans
  mechanism).
* Protect various printf-like calls against crashes from strings with
  '%': LYSyslog, exit_immediately_with_error_message.
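The usual fix for this class of crash is to never pass text that may contain '%' (a URL with "%20", for instance) as the format argument itself. A minimal sketch of the rule; the helper names are hypothetical and not the actual LYSyslog or exit_immediately_with_error_message code:

```c
#include <stdio.h>
#include <syslog.h>

/* Wrong:  syslog(LOG_INFO, msg);        a '%' in msg is read as a conversion.
 * Right:  syslog(LOG_INFO, "%s", msg);  msg is copied literally. */
static void log_url_sketch(const char *msg)
{
    syslog(LOG_INFO, "%s", msg);
}

/* The same rule when building an error message before exiting. */
static int format_error_sketch(char *buf, size_t bufsz, const char *msg)
{
    return snprintf(buf, bufsz, "Error: %s", msg);
}
```

With the "%s" form, a message such as "bad URL http://x/%s%n" is printed verbatim instead of making printf read nonexistent arguments.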
* LYDownload.c: made parsing of LYNXDOWNLOAD: URL slightly more robust.
* Disabled some broken pieces.

[ And some minor tweaks not mentioned specifically; the list is already long
enough...  the end ]


