
Re: lynx-dev tech. question: translating strings to different charsets


From: Klaus Weide
Subject: Re: lynx-dev tech. question: translating strings to different charsets
Date: Mon, 6 Sep 1999 07:14:08 -0500 (CDT)

On Sun, 5 Sep 1999, Vlad Harchev wrote:
> On Fri, 3 Sep 1999, Klaus Weide wrote:
> 
>  As for translation, here are my thoughts:
> * to avoid performance decrease due to LYUCFullyTranslateString_1, the
>   following thing can be used:
>     the translation of each character used in hydict chset (aka "human
>     letter")  to d.c.s. can be precalculated (since translation of even
>     unicode "characters" is zero-state machine) - so seems flexibility is

What does that remark in parentheses mean?  I don't understand it
at all.

>     regained - user will have to specify either in hydict (as comment) or in
>     lynx.cfg the chset used in hydict to make such translation.

It seems you are talking about translating the patterns at runtime (at
program start and/or each time the display character set (or something
else?) is changed?).  That will only be general enough if the pattern
input before translation is in a form that is general enough for the
languages to be covered, which in general means you have to provide
them as UCS (in whatever encoding, e.g. UTF-8) for some languages
(or possibly for combinations of languages, if those are supposed to be
covered too).

>                                                                  As for
>     Unicode, IMO even at the present state (without modification) libhnj is
>     suitable for this - simply there will be extra (that can be avoided with
>     cleverer approach - of using 'int' instead of 'char') states used by UTF
>     prefixes.

Again I don't understand.  Are you talking about a specific encoding,
UTF-8, when you write "Unicode"?  I don't know what kind of "states"
you mean.
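One possible reading of the "states" remark (an assumption, not confirmed in the thread): a pattern trie keyed on bytes (C 'char') walks UTF-8 one byte at a time, so each non-ASCII letter costs an extra intermediate step for its lead byte. This bash sketch, assuming GNU iconv and od, only prints the bytes involved for three Cyrillic letters.

```shell
#!/bin/bash
# Print the bytes a byte-oriented pattern matcher would walk through
# for a few Cyrillic letters, in UTF-8 and in the 8-bit charset KOI8-R.
hex() { od -An -tx1 | tr -s ' ' | sed 's/^ //'; }

for letter in а б я; do
    utf8=$(printf '%s' "$letter" | hex)
    koi8=$(printf '%s' "$letter" | iconv -f UTF-8 -t KOI8-R | hex)
    echo "$letter  UTF-8: $utf8  KOI8-R: $koi8"
done
# Each letter is two bytes in UTF-8, so a trie keyed on bytes needs an
# intermediate state after the lead byte (d0, d1); in KOI8-R one byte is
# one letter.
```

On a UTF-8 terminal this should print `а  UTF-8: d0 b0  KOI8-R: c1`, `б  UTF-8: d0 b1  KOI8-R: c2` and `я  UTF-8: d1 8f  KOI8-R: d1`.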

> * IMO we can turn lynx into a powerful charset translator with a very cheap
>     hack (I mean adding something like 'lynx -recode utf-8 koi8-r < in > out').
>     IMO this is worth it.

Lynx already is a "powerful charset translator" that one could use
in place of packages like "recode" etc., although one should expect
those specialized packages to be better (more correct / more general /
more flexible / more efficient) at the job they were written for.
Lynx just doesn't have a convenient syntax for invoking it as a filter
for this (maybe to encourage people to use "the right tool for the right
job").
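For comparison, this is roughly what the proposed 'lynx -recode utf-8 koi8-r < in > out' looks like with a dedicated converter. The sketch uses iconv(1) instead of recode (swapped in only because it is widely installed); the file names are hypothetical.

```shell
#!/bin/bash
# A dedicated converter doing the job of the proposed lynx filter.
# in.txt / out.txt are hypothetical file names.
printf 'пример\n' > in.txt
iconv -f UTF-8 -t KOI8-R < in.txt > out.txt

# Each Cyrillic letter takes 2 bytes in UTF-8 but 1 byte in KOI8-R.
wc -c < in.txt    # 6 letters * 2 + newline = 13 bytes
wc -c < out.txt   # 6 letters * 1 + newline = 7 bytes

# Converting back restores the original text.
iconv -f KOI8-R -t UTF-8 < out.txt
```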

But try the appended script.  It will only work right if there is
no ~/.lynxrc.  (It would probably be better to temporarily mess with
~/.lynxrc instead of messing with lynx.cfg, and just use -cfg=/dev/null
for speed.)  Yes, it requires bash; it won't work with just any
Bourne-like shell, because it uses process substitution, <(...).

 Klaus

--------------- lynx-recode.sh -----------------------------------
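#! /bin/bash
# lynx-recode.sh: use "lynx -dump" as a charset-converting filter.
# Needs bash for the process substitution <(...).
if [ $# -ne 3 -a $# -ne 2 ]; then
   echo "Usage: $0 cs_in cs_out [file]" >&2
   exit 1
fi
LYNX="${LYNX:-lynx}"
LYNX_CFG="${LYNX_CFG:-/usr/local/lib/lynx.cfg}"
file="${3:-/dev/stdin}"
if [ $# = 3 -a "$file" != "/dev/stdin" ]; then
   # Re-run this script with the named file on stdin.
   cat "$file" | "$0" "$1" "$2"
else
   # Read the input as cs_in.  cs_out becomes the display character set
   # through an edited copy of lynx.cfg in which CHARACTER_SET
   # (commented out or not) is replaced by cs_out.
   $LYNX -assume_charset="$1" -assume_local_charset="$1" \
        -cfg <(sed -e "s/^#\?CHARACTER_SET:.*/CHARACTER_SET:$2/" "$LYNX_CFG") \
        -dump "$file"
fi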
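The one non-obvious step in the script is the lynx.cfg rewrite. The sketch below runs the same sed expression on two sample lines (illustrative, not quoted from a real lynx.cfg) and shows process substitution turning command output into something a program can open as a file. Note that "\?" in a basic regular expression is a GNU sed extension, and a cs_out value containing "/" would break the substitution.

```shell
#!/bin/bash
# The lynx.cfg rewrite used by lynx-recode.sh, on two sample lines.
cs_out="koi8-r"
printf '%s\n' '#CHARACTER_SET:ISO Latin 1' 'ASSUME_CHARSET:iso-8859-1' |
    sed -e "s/^#\?CHARACTER_SET:.*/CHARACTER_SET:$cs_out/"
# prints:
# CHARACTER_SET:koi8-r
# ASSUME_CHARSET:iso-8859-1

# Process substitution hands the edited text to a program as a file name
# (something like /dev/fd/63), which is how the script feeds "lynx -cfg".
cat <(printf 'CHARACTER_SET:%s\n' "$cs_out")
```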


