Re: locale for scanf backed out
From: Rik
Subject: Re: locale for scanf backed out
Date: Tue, 26 Feb 2013 14:07:36 -0800
On 02/26/2013 01:03 AM, Pascal Dupuis wrote:
> 2013/2/26 Rik <address@hidden>:
>> 2/25/13
>>
>> All,
>>
>> I backed out the locale changeset for scanf. Unfortunately, due to issues
>> in the build system you will need to manually remove
>> libinterp/interpfcn/file-io.cc-tst before tests will pass.
>>
>> --Rik
> Rik,
>
> I find this a bit unfair. There was an agreement that supporting
> locale on printf is not a good idea, nor is implementing something like
> the National Instruments extension to force the use of a decimal point.
I wouldn't have done it if I thought it unfair. As Jordi has suggested,
the locale patch was primarily to help you resolve issues with CSV. You
know how to build Octave and navigate Mercurial, so I would simply maintain
your patch for yourself locally until a better solution can be found.
Octave is really dedicated to matrix processing and not text processing.
Things like locale and character encoding, and even regular expressions,
are just not handled as well as in a more dedicated text-processing
language like Perl or Python.
>
> OTOH, I can assure you that changing the locale used in a CSV file is
> anything but easy. CSV files contain strings and numbers intermixed
> with separators. What about separators, points, and commas inside
> strings? What about string delimiters inside strings, like 'O''Reilly'?
> Some broken programs even quote every number!
It is really complicated, which is why I wouldn't bother to re-invent the
wheel. Perl programmers have been around a long time and have dealt with
most of these oddities. I would rely on the vast archive of code out
there, in particular the CPAN module Text::CSV.
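
To illustrate the point about relying on an existing parser, here is a minimal sketch using Python's standard csv module as a stand-in. The sample line, the ';' separator, and the single-quote string delimiter are assumptions chosen to mirror the oddities listed above, not data from the thread.

```python
import csv
import io

# Hypothetical sample: a doubled quote inside a string, a quoted number
# with a decimal comma, and a separator inside a string.
sample = "'O''Reilly';'3,14';'a;b'\n"

# doublequote=True tells the parser that '' inside a quoted field
# stands for one literal quote character.
reader = csv.reader(io.StringIO(sample), delimiter=";", quotechar="'",
                    doublequote=True)
rows = list(reader)
print(rows)  # [["O'Reilly", '3,14', 'a;b']]
```

The parser returns plain field strings, so the quoting rules are dealt with once, before any number conversion is attempted.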
>
> The only sane way I found in the database package was
> 1) use strsplit() to decompose a line into fields
> 2) apply sscanf() to each field. If it fails, try to detect a
> date-time pattern.
>
> As you can see, half the hard work is done inside strsplit, and we
> have the power of Octave to store each substring into a cell. Doing it
> from some external program means writing and debugging a full strsplit
> replacement.
Attached is a rough draft for a script which converts csv files between
fr_BE.utf8 and the 'C' locale. It can be customized to your particular
data format pretty easily by configuring the options to Text::CSV.
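
For readers without the attachments, the general idea can be sketched as follows. This is not the attached cvtcsv.pl; it is a rough Python illustration that assumes the source file uses ';' as the field separator and ',' as the decimal mark with no thousands separator. The names to_c_locale and convert are hypothetical.

```python
import csv
import io
import re

# Assumption: a numeric field looks like 3,14 or -2,5e3 (decimal comma,
# no thousands separator). Anything else is treated as text.
NUMBER_RE = re.compile(r"[+-]?(\d+(,\d*)?|,\d+)([eE][+-]?\d+)?")

def to_c_locale(field):
    """Swap the decimal comma for a point if the whole field is a number."""
    stripped = field.strip()
    if NUMBER_RE.fullmatch(stripped):
        return stripped.replace(",", ".")
    return field

def convert(src, dst):
    """Read ';'-separated rows from src, write ','-separated rows to dst."""
    reader = csv.reader(src, delimiter=";")
    writer = csv.writer(dst, delimiter=",", lineterminator="\n")
    for row in reader:
        writer.writerow([to_c_locale(f) for f in row])

src = io.StringIO('"Dupont, Jean";3,14;"1,5";2013-02-26\n')
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue())  # "Dupont, Jean",3.14,1.5,2013-02-26
```

The text field keeps its comma and is re-quoted on output because the comma is now the separator, while the quoted number "1,5" is converted like any other number.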
--Rik
Attachments:
cvtcsv.pl (Perl program)
locale_tst.pl (Perl program)