beaver-devel

Re: [Beaver-devel] Gettext and stuff


From: Leslie Polzer
Subject: Re: [Beaver-devel] Gettext and stuff
Date: Mon, 17 Mar 2003 09:54:14 +0100

On Mon, 17 Mar 2003 02:41:12 -0500
Michael Terry <address@hidden> wrote:

> Leslie Polzer wrote:
> >>I may be spending some time this week on Beaver after all -- probably 
> >>just the Search/Replace stuff still.  It's coming along nicely, but it's 
> >>taking me a bit to wrap my head around the incredibly complex 
> >>GtkTreeView family.
> > 
> > I am currently working on the 'Invalid UTF8' problem.
> > Try loading about/about.html in Beaver's WWW tree and you
> > will see that Beaver refuses to load it because of the
> > 'copyright' sign at the end of the file.
> > 
> > g_get_charset() is acting weird. This function is supposed to return
> > the current locale's charset, but it gives me 'ANSI_X3.4-1968',
> > a charset I have never heard of - and iconv() fails, too.
> > With ISO-8859-1 it works and the (C) sign is correctly converted,
> > but I guess we have to provide the user with a list where he can
> > select the charset he wants. Maybe also a 'Charsets' tab in the
> > prefs... or both (radio "ask me every time a conversion must be performed"
> > / radio "always use this charset: " -> selection list of charsets).
> > 
> > What do you think?
> 
> It really should be possible to autodetect this, right?  Any other 
> editor I've used always makes such details invisible to the user.  I 
> think we should too.
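
ANSI_X3.4-1968 is the canonical name glibc gives to plain US-ASCII, which is the codeset of the default "C"/POSIX locale. Getting it back from g_get_charset() most likely means Beaver was running with LANG/LC_ALL unset or set to C, or before the program had called setlocale(LC_ALL, ""). In that case iconv() fails simply because the (C) sign is not an ASCII character. The sketch below is an illustration, not Beaver code, and environment_codeset() is a hypothetical helper. It shows how a C program asks the C library for its locale's codeset, which is roughly the information g_get_charset() reports on Unix.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: select the locale named by the environment
 * (LC_ALL, LC_CTYPE, LANG) and return the name of its codeset.
 * glibc reports "ANSI_X3.4-1968" (its name for US-ASCII) for the
 * "C"/POSIX locale. */
static const char *environment_codeset(void)
{
    /* A C program starts in the "C" locale; until it calls
     * setlocale(LC_ALL, ""), the user's LANG setting is ignored.
     * If the requested locale is not installed, setlocale() returns
     * NULL, so fall back to "C" explicitly. */
    if (setlocale(LC_ALL, "") == NULL)
        setlocale(LC_ALL, "C");
    const char *codeset = nl_langinfo(CODESET);
    assert(codeset != NULL);  /* POSIX: never NULL, at worst "" */
    return codeset;
}
```

On glibc, calling environment_codeset() with LANG unset (or LANG=C) returns "ANSI_X3.4-1968"; under an installed locale such as en_US.UTF-8 it returns "UTF-8".
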
I must admit I've never really messed around with charsets before.
I did some research, and it seems there is no reliable method to
"auto-detect" a charset; a more appropriate word would be "guess".
Please compare

http://www.mail-archive.com/address@hidden/msg01273.html

for a discussion of a similar problem. Also, the fact that other editors
don't confront the user with this problem doesn't, of course, mean that they
have solved it perfectly.
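
What can be checked with certainty is the negative case: bytes that are not well-formed UTF-8 are certainly not UTF-8, whereas well-formed input is very probably UTF-8 (pure ASCII passes as well, which is harmless). Here is a minimal sketch of that test in plain C. GLib already provides it as g_utf8_validate(), so the hand-written is_valid_utf8() is only illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if buf[0..len) is well-formed UTF-8: no stray continuation
 * bytes, no truncated sequences, no overlong encodings, no UTF-16
 * surrogates (U+D800..U+DFFF) and nothing above U+10FFFF. */
static bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t need;        /* continuation bytes that must follow */
        unsigned long cp;   /* code point being decoded */

        if (c < 0x80) {     /* ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3; cp = c & 0x07;
        } else {
            /* 0x80..0xC1 and 0xF5..0xFF can never start a sequence
             * (0xC0/0xC1 would only produce overlong forms). */
            return false;
        }
        if (len - i <= need)
            return false;   /* sequence runs past the end of the buffer */
        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;   /* not a 10xxxxxx continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (need == 2 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)))
            return false;       /* overlong 3-byte form or surrogate */
        if (need == 3 && (cp < 0x10000 || cp > 0x10FFFF))
            return false;       /* overlong 4-byte form or beyond Unicode */
        i += need + 1;
    }
    return true;
}
```

A guesser could run this check first and fall back to a user-chosen or 8-bit charset only when it returns false. A lone 0xA9 byte (the (C) sign in ISO-8859-1), for instance, fails it.
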

In my opinion we should stick to my proposal but also add an
"auto-detect/guess" choice.
Sylpheed, my mail client (quite an excellent program IMHO), does it exactly
that way.
I've even found some code for it in a tool called asrecod (guess.c, attached
to this mail). I haven't fully understood the code, but it seems to look for
certain character patterns.
Sylpheed's code might be worth a look, too.
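
The following is not what guess.c does. Its exact patterns aren't described here, and this is a much simpler, swapped-in heuristic for the Western case only. Once a file has failed UTF-8 validation, bytes 0x80-0x9F hint at windows-1252 rather than ISO-8859-1. In ISO-8859-1 those values are C1 control codes that ordinary text practically never contains, whereas windows-1252 uses most of them for printable characters such as curly quotes and the euro sign. guess_western_8bit() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fallback guesser for text already known NOT to be valid
 * UTF-8. It only chooses between two Western 8-bit charsets and is not
 * the algorithm used by asrecod's guess.c. */
static const char *guess_western_8bit(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t c1 = 0;  /* number of bytes in the range 0x80..0x9F */
    for (size_t i = 0; i < len; i++)
        if (buf[i] >= 0x80 && buf[i] <= 0x9F)
            c1++;
    /* In ISO-8859-1 these bytes are C1 control codes, which real text
     * practically never contains; in windows-1252 most of them are
     * printable (curly quotes, the euro sign, dashes). */
    return c1 > 0 ? "WINDOWS-1252" : "ISO-8859-1";
}
```

Its answer is only a sensible default for the user to confirm. Other 8-bit charsets (ISO-8859-15, KOI8-R, ...) cannot be told apart from ISO-8859-1 this way.
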

> I didn't get too far into the problem, but I did notice that converting 
> to my locale's charset didn't help the problem at all.
I used g_convert_with_iconv() with ISO-8859-1 as the source charset,
and Beaver's TextView happily accepted the result as valid UTF-8.
Have a look at editor.c, around line 220; it's already there. Try it.
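
This fits both observations. In ISO-8859-1 every byte value 0x00-0xFF is the Unicode code point with the same number, so decoding from ISO-8859-1 can never fail and always yields valid UTF-8, which is what GtkTextBuffer requires. Converting *to* the locale's charset does not help unless that charset happens to be UTF-8, because the buffer wants UTF-8 regardless of the locale; what matters is the direction file charset -> UTF-8. Here is a minimal sketch of the Latin-1 decoding done by hand (g_convert_with_iconv() with "ISO-8859-1" as the source does the same job through iconv). latin1_to_utf8() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Decode ISO-8859-1 (Latin-1) into UTF-8. Each Latin-1 byte is the code
 * point U+0000..U+00FF with the same value, so this conversion can never
 * fail. The caller must supply an output buffer of at least 2 * len bytes.
 * Returns the number of bytes written. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    assert((in != NULL && out != NULL) || len == 0);
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[o++] = b;                                   /* ASCII: unchanged */
        } else {
            out[o++] = (unsigned char)(0xC0 | (b >> 6));    /* 110000xx */
            out[o++] = (unsigned char)(0x80 | (b & 0x3F));  /* 10xxxxxx */
        }
    }
    return o;
}
```

The flip side is that success proves nothing: a windows-1252 or KOI8-R file decoded this way also "works", just with the wrong characters.
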

> Ideally, we could load the file, discover the charset, make a note of 
> it, convert the text to UTF-8 so that GtkTextBuffer can handle it, and 
> when saving, convert back.
Yes, that's fine as a rough outline of the process.
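
Michael's outline maps onto a small load/save pair. The following is a minimal sketch using POSIX iconv directly, assuming the charset has already been chosen (by guessing or by asking the user). LoadedText, convert(), load_text() and save_text() are hypothetical names, and Beaver would more likely use GLib's g_convert(), which wraps iconv.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <iconv.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of a loaded file: UTF-8 text for the GtkTextBuffer
 * plus the charset it was read in, so that saving can reverse the step. */
typedef struct {
    char *utf8;         /* NUL-terminated UTF-8 text (malloc'd) */
    char charset[32];   /* on-disk charset, e.g. "ISO-8859-1" */
} LoadedText;

/* Converts len bytes from charset `from` to charset `to` with POSIX iconv.
 * Returns a malloc'd, NUL-terminated buffer and stores its length in
 * *out_len (if non-NULL); returns NULL if a charset is unknown, the input
 * is invalid, or a character has no equivalent in `to`. */
static char *convert(const char *in, size_t len, const char *to,
                     const char *from, size_t *out_len)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;
    /* Generous for common charsets; real code would grow it on E2BIG. */
    size_t cap = len * 4 + 8;
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }
    char *inp = (char *)in;     /* iconv() takes char **, not const char ** */
    char *outp = out;
    size_t inleft = len, outleft = cap - 1;   /* reserve a byte for the NUL */
    /* (size_t)-1 means an error; a positive count means characters were
     * replaced by a non-reversible substitute, which we also reject. */
    bool ok = iconv(cd, &inp, &inleft, &outp, &outleft) == 0;
    if (ok && iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        ok = false;             /* could not flush the final shift state */
    iconv_close(cd);
    if (!ok) {
        free(out);
        return NULL;
    }
    *outp = '\0';
    if (out_len != NULL)
        *out_len = (size_t)(outp - out);
    return out;
}

/* Load: decode raw file bytes from `charset` into UTF-8, remembering the charset. */
static bool load_text(LoadedText *t, const char *raw, size_t len,
                      const char *charset)
{
    assert(t != NULL && charset != NULL);
    if (strlen(charset) >= sizeof t->charset)
        return false;
    t->utf8 = convert(raw, len, "UTF-8", charset, NULL);
    if (t->utf8 == NULL)
        return false;
    strcpy(t->charset, charset);
    return true;
}

/* Save: encode the (possibly edited) UTF-8 text back into the original
 * charset. Returns NULL if the text now holds a character that charset
 * cannot represent, e.g. a euro sign in an ISO-8859-1 file. */
static char *save_text(const LoadedText *t, size_t *out_len)
{
    assert(t != NULL && t->utf8 != NULL);
    return convert(t->utf8, strlen(t->utf8), t->charset, "UTF-8", out_len);
}
```

Saving can fail when the user has typed a character the original charset cannot hold. At that point the editor would need to offer UTF-8 or another charset rather than silently drop data.
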

Leslie

Attachment: guess.c
Description: Binary data

