Re: [Monotone-devel] iconv diffs [Was: Why is utf8...]

monotone-devel


From:	Patrick Georgi
Subject:	Re: [Monotone-devel] iconv diffs [Was: Why is utf8...]
Date:	Sat, 17 Feb 2007 08:45:46 +0100
User-agent:	Thunderbird 1.5.0.8 (X11/20061204)

Nathaniel Smith wrote:

have no idea what's going to happen on, say, OSX or *BSD or Solaris.

For Solaris: it will fail, as it can't find the table you refer to ("ASCII//whatever"), since that's non-standard. The same goes for BSD, unless they rebuilt the GNU extension (in which case you'd better look out for implementation differences).

One option is just to write our own "//IGNORE"-style iconv wrapper.
iconv's normal API is that it does as much work as it can, then it
tells you where it bombed out.  It's perfectly possible at that point
to skip ahead a byte or more on the input, stick a question mark in
the output string, and then try again from there.  Not the most
efficient thing in the world, but probably a lot easier than trying to
ship iconv conversion tables.
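
A rough sketch of the wrapper described above, assuming a POSIX iconv() and a target encoding in which '?' is a single byte. The name convert_lossy and its signature are made up for illustration and are not monotone code. It lets iconv run until it fails, steps over exactly one input byte, writes a question mark, and retries.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical wrapper (not monotone code): convert `in` from encoding
   `from` to encoding `to`, replacing every byte iconv rejects with '?'.
   Writes a NUL-terminated result into `out` and returns its length,
   or (size_t)-1 on failure. Assumes '?' is one byte in `to`. */
size_t convert_lossy(const char *from, const char *to,
                     const char *in, size_t inlen,
                     char *out, size_t outcap)
{
    if (outcap == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *ip = (char *)in;      /* iconv wants char **, input is not modified */
    char *op = out;
    size_t ileft = inlen;
    size_t oleft = outcap - 1;  /* keep room for the trailing NUL */
    int ok = 1;

    while (ileft > 0) {
        if (iconv(cd, &ip, &ileft, &op, &oleft) != (size_t)-1)
            break;              /* all remaining input was converted */

        /* iconv stopped: ip points at the input it could not handle. */
        if ((errno != EILSEQ && errno != EINVAL) || oleft == 0) {
            ok = 0;             /* output full (E2BIG) or unexpected error */
            break;
        }

        /* Skip one input byte, emit '?', then try again from there. */
        ip++;
        ileft--;
        *op++ = '?';
        oleft--;
    }

    iconv_close(cd);
    if (!ok)
        return (size_t)-1;

    *op = '\0';
    return (size_t)(op - out);
}
```

Called as convert_lossy("UTF-8", "ASCII", buf, len, out, sizeof out), a single stray byte in otherwise plain ASCII input becomes one '?'. A multibyte character that the target cannot represent would typically produce several question marks, since each of its bytes is skipped separately.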

"skip ahead a byte" is troublesome - if your illegal sequence is a multibyte character (or even some state machine changing sequence in some of the obscure encodings), your next character will be wrong or illegal, too.


but skipping a character should be possible:

- build another iconv state that translates input encoding into input encoding (unless that enables a fast-path, which I'm not sure of - alternative might be some encoding that is the ultimate superset, if such an encoding exists)
- push first unknown byte into it. if that creates a response already, discard (as it might be some header sequence) and restart with the same byte in the next step, otherwise start at the next byte

- until iconv emits a response, push byte after byte into it
- skip that many bytes in the input, replace with one "?"
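
The steps above could look roughly like the following, as a sketch under a few assumptions. iconv() does not hold on to a partial character between calls (it reports EINVAL and leaves the input pointer alone), so instead of pushing single bytes the probe retries with a prefix that grows by one byte until output appears. The "header sequence" case is not handled specially, '?' is assumed to be one byte in the target encoding, and probe_char_len and convert_skip_char are made-up names.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: length in bytes of the character starting at `p`
   in encoding `enc`, found with an enc -> enc descriptor. Returns 1 if
   the length cannot be determined (e.g. the bytes are plain garbage). */
size_t probe_char_len(const char *enc, const char *p, size_t avail)
{
    iconv_t cd = iconv_open(enc, enc);
    if (cd == (iconv_t)-1)
        return 1;

    size_t len = 1;
    for (size_t n = 1; n <= avail; n++) {
        char scratch[64];
        char *ip = (char *)p;
        char *op = scratch;
        size_t ileft = n;
        size_t oleft = sizeof scratch;

        iconv(cd, NULL, NULL, NULL, NULL);   /* back to the initial state */
        size_t r = iconv(cd, &ip, &ileft, &op, &oleft);

        if (op != scratch) {                 /* a response: one character */
            len = n - ileft;
            break;
        }
        if (r == (size_t)-1 && errno != EINVAL)
            break;                           /* not just incomplete: give up */
    }

    iconv_close(cd);
    return len;
}

/* Hypothetical wrapper: like a plain conversion, but every character
   iconv cannot handle is replaced by a single '?'. Returns the length
   of the NUL-terminated result, or (size_t)-1 on failure. */
size_t convert_skip_char(const char *from, const char *to,
                         const char *in, size_t inlen,
                         char *out, size_t outcap)
{
    if (outcap == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *ip = (char *)in;
    char *op = out;
    size_t ileft = inlen;
    size_t oleft = outcap - 1;               /* room for the trailing NUL */
    int ok = 1;

    while (ileft > 0) {
        if (iconv(cd, &ip, &ileft, &op, &oleft) != (size_t)-1)
            break;                           /* everything converted */
        if ((errno != EILSEQ && errno != EINVAL) || oleft == 0) {
            ok = 0;
            break;
        }

        /* Skip the whole offending character, replace it with one '?'. */
        size_t skip = probe_char_len(from, ip, ileft);
        ip += skip;
        ileft -= skip;
        *op++ = '?';
        oleft--;
    }

    iconv_close(cd);
    if (!ok)
        return (size_t)-1;

    *op = '\0';
    return (size_t)(op - out);
}
```

Whether an enc -> enc descriptor really validates its input or takes a copy-through fast path is implementation-specific, which is the uncertainty mentioned in the first step. On an implementation that reports unconvertible characters with EILSEQ, converting the UTF-8 bytes "caf\xc3\xa9!" to ASCII this way should give "caf?!".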

not so simple anymore, but imho still easier than integrating gnu iconv.


patrick georgi

