From: H. Langos
Subject: Re: [Bug-gnupod] Encoding of non-ascii characters in GNUtunesDB.xml
Date: Tue, 15 Apr 2008 01:26:18 +0200
User-agent: Mutt/1.5.13 (2006-08-11)

Here is a patch on top of the patch. Performance is better, but it could
still be improved, I guess.

Instead of four UTF-8 conversions and three substring operations per
character, we are now down to one ord() and one substr() per character.
Still bad, but far better than before.
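For comparison, a minimal single-pass sketch, assuming the input to xescaped() is a UTF-8 encoded byte string (as the existing Unicode::String::utf8($ret) call suggests). It is not the attached patch: it swaps Unicode::String for Perl's core Encode module plus one regex substitution, so the string is decoded once and only non-ASCII characters pass through ord(). The helper name xml_escape_non_ascii is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not the attached patch): turn every code point
# above 127 into a decimal numeric character reference in one pass.
sub xml_escape_non_ascii {
    my ($bytes) = @_;
    # Assumption: input is UTF-8 bytes. Decode once into a Perl character
    # string instead of re-wrapping it in Unicode::String per character.
    my $chars = decode('UTF-8', $bytes);
    # Replace each non-ASCII character with &#NNN; (NNN = its code point).
    $chars =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $chars;    # pure ASCII from here on
}

# "Kościół" as UTF-8 bytes; \xc5\x9b is U+015B (s with acute).
print xml_escape_non_ascii("Ko\xc5\x9bci\xc3\xb3\xc5\x82"), "\n";
# prints: Ko&#347;ci&#243;&#322;
```

ASCII characters are copied through untouched by the substitution, so the cost is concentrated on the (usually few) non-ASCII characters.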

-henrik

PS: Is anybody interested in getting complete, usable files instead of
patches?

On Mon, Apr 14, 2008 at 08:12:30PM +0200, H. Langos wrote:
> 
> Ok, here's the patch ...
> 
> It took longer than I thought because UTF-8 in Perl is a major pain.
> 
> cheers
> -henrik
> 
> PS: The line "$xutf =~ tr/\000-\037//d;" is not without problems. It
> deletes all control characters (0x00-0x1F), including TAB, LF, and CR,
> even though those three are valid XML characters.
> 
> Could somebody check how iTunes handles those? Does it also remove
> those characters, or does it convert them into &#9; and so on?
> 
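One possible fix for the tr/// line, sketched under the assumption that only the XML-illegal C0 controls should be dropped: XML 1.0 allows TAB (0x09), LF (0x0A) and CR (0x0D) but no other character below 0x20. Escaping the three allowed ones as &#9;, &#10; and &#13; may also be worth considering, because a conforming parser normalizes literal TAB, LF and CR inside attribute values to spaces, while the character references survive parsing. Both helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical replacement for tr/\000-\037//d: delete only the C0 controls
# that XML 1.0 forbids, keeping TAB (0x09), LF (0x0A) and CR (0x0D).
sub strip_illegal_controls {
    my ($s) = @_;
    $s =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F//d;
    return $s;
}

# Hypothetical variant: additionally write the allowed controls as numeric
# character references, so attribute-value normalization cannot turn them
# into plain spaces when the file is parsed.
sub escape_allowed_controls {
    my ($s) = @_;
    $s = strip_illegal_controls($s);
    $s =~ s/([\x09\x0A\x0D])/sprintf('&#%d;', ord($1))/ge;
    return $s;
}
```

Which variant fits depends on the open question above, i.e. what iTunes itself does with these characters.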
> 
> On Mon, Apr 14, 2008 at 02:14:18PM +0200, H. Langos wrote:
> > Hi there,
> > 
> > I wonder if anybody else has the occasional problem with editing
> > her/his GNUtunesDB.xml.
> > 
> > Since it is XML and the encoding is UTF-8, you don't have any problem
> > as long as your system is completely UTF-8 compliant. I, however, have
> > a mixed ISO-8859-1, ISO-8859-15, and UTF-8 mess, and some of the
> > editors that I like to use are not very smart about handling the
> > character encoding.
> > 
> > It would be very easy to convert everything outside the ASCII range to
> > its XML-escaped form (a numeric character reference). So, instead of
> > some garbage, you'd see "&#347;" where a "Latin Small Letter s with
> > Acute" (ś) is.
> > 
> > Pro: GNUtunesDB.xml becomes a pure ASCII file. No more editor/viewer
> >   issues.
> > 
> > Contra: GNUtunesDB.xml becomes slightly bigger, and for people with a
> >   clean UTF-8 toolchain it becomes a little less readable. (Note: You can
> >   still edit the file and insert native UTF-8 as you please.)
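The note in the Contra point holds because an XML parser resolves a numeric character reference to the same character that the native UTF-8 bytes encode. A simplified illustration with a hypothetical helper expand_char_refs; it is not a full XML parser and ignores &amp;, &lt; and the other predefined entities.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Simplified illustration, not a full XML parser: expand decimal (&#347;)
# and hexadecimal (&#x15B;) numeric character references in a single pass,
# so text produced by one expansion is never re-scanned.
sub expand_char_refs {
    my ($s) = @_;
    $s =~ s/&#(?:x([0-9A-Fa-f]+)|([0-9]+));/defined $1 ? chr(hex $1) : chr($2)/ge;
    return $s;
}

my $escaped = 'Ko&#347;ci&#243;&#322;';                         # pure-ASCII form
my $native  = decode('UTF-8', "Ko\xc5\x9bci\xc3\xb3\xc5\x82");  # native UTF-8, decoded
print expand_char_refs($escaped) eq $native ? "same text\n" : "different\n";
# prints: same text
```

Handling both reference forms in one substitution avoids double-decoding input such as "&#x26;#65;", which should stay "&#65;" rather than become "A".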
> > 
> > Any thoughts?
> > 
> > cheers
> > -henrik
> > 
> > 
> > 
> > _______________________________________________
> > Bug-gnupod mailing list
> > address@hidden
> > http://lists.nongnu.org/mailman/listinfo/bug-gnupod

> commit 5ce6a9e9173dce95287ff4b15deda67b569dd365
> Author: Heinrich Langos <address@hidden>
> Date:   Mon Apr 14 19:49:54 2008 +0200
> 
>     Changed encoding of unicode characters outside of ascii range to XML
>     notation.
>     
>     This change will make your GNUtunesDB.xml into a pure ascii file, making it
>     easier to view and manipulate on non-utf8 capable systems.
>     
>     Note: "xescaped()" is not only called for attribute values but also for
>     element names and attribute names. So if somebody comes up with non-ascii
>     element names or attribute names we would have to treat those differently.
> 
> diff --git a/src/ext/XMLhelper.pm b/src/ext/XMLhelper.pm
> index 5eaeb48..2a230a3 100755
> --- a/src/ext/XMLhelper.pm
> +++ b/src/ext/XMLhelper.pm
> @@ -124,8 +124,15 @@ sub xescaped {
>       my $xutf = Unicode::String::utf8($ret)->utf8;
>       #Remove 0x00 - 0x1f chars (we don't need them)
>       $xutf =~ tr/\000-\037//d;
> -     
> -     return $xutf;
> +     my $out = Unicode::String::utf8("")->utf8;
> +     for (my $i = 0 ; $i < Unicode::String::utf8($xutf)->length ; $i++) {
> +             if (Unicode::String::utf8($xutf)->substr($i,1)->ord > 127) {
> +                     $out .= '&#' . Unicode::String::utf8($xutf)->substr($i,1)->ord . ';';
> +             } else {
> +                     $out .= Unicode::String::utf8($xutf)->substr($i,1) ;
> +             }
> +     }
> +     return $out;
>  }
>  
>  
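The commit message's caveat about element and attribute names matters because XML 1.0 recognizes character references only in content and attribute values, not in names, so "&#347;" inside a tag name is not well-formed. A sketch of how the two cases could be split, assuming the input is an already-decoded Perl character string and that GNUtunesDB names stay plain ASCII; escape_attr_value and check_name are hypothetical helpers, not functions in XMLhelper.pm.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper for attribute values: escape markup characters,
# then write every non-ASCII character as a numeric character reference.
sub escape_attr_value {
    my ($v) = @_;
    $v =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $v =~ s/</&lt;/g;
    $v =~ s/>/&gt;/g;
    $v =~ s/"/&quot;/g;
    $v =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $v;
}

# Hypothetical helper for element/attribute names: references are not
# allowed there, so reject non-ASCII names instead of escaping them.
sub check_name {
    my ($n) = @_;
    croak 'non-ASCII XML name' if $n =~ /[^\x00-\x7F]/;
    return $n;
}
```

Failing loudly on a non-ASCII name is one design choice; writing such names as raw UTF-8 would be the other, at the cost of the file no longer being pure ASCII.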

Attachment: feat_1ace2709_improved_performance_of_utf8_to_ascii_encoding.patch
Description: Text Data