Re: Possible bug when field-separator used

From: Jeroen Hoek
Subject: Re: Possible bug when field-separator used
Date: Thu, 30 Nov 2023 14:52:30 +0100
User-agent: Mozilla Thunderbird

Hi Erik,

Good catch on the locale; that is indeed part of the reproduction case. I hadn't considered that.


echo -e "a,14,1\nb,1,14\na,2,1" | \
    LC_NUMERIC=en_GB.UTF-8 \
    datamash --field-separator=, -s groupby 1 sum 2,3


echo -e "a,14,1\nb,1,14\na,2,1" | \
    LC_NUMERIC=nl_NL.UTF-8 \
    datamash --field-separator=, -s groupby 1 sum 2,3

With ',' as the field separator, the en_GB run works and the nl_NL run fails: in Dutch, as in German, the decimal separator is the comma. Using '.' as the field separator reverses this: en_GB fails and nl_NL works.
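
For scripts that cannot know the caller's locale in advance, a guard along these lines might help. This is only a sketch: separator_collides and datamash_csv_sum are hypothetical helper names, not part of GNU Datamash, and it assumes the POSIX locale utility is available.

```shell
# Hypothetical helper (not part of GNU Datamash): succeeds when the given
# field separator equals the decimal point of the active numeric locale.
separator_collides() {
    # `locale decimal_point` prints the decimal point character of LC_NUMERIC.
    [ "$1" = "$(locale decimal_point)" ]
}

# Hypothetical wrapper: apply the LC_ALL=C workaround only on a collision.
# It is defined here but not called; it reads its input from stdin.
datamash_csv_sum() {
    if separator_collides ','; then
        LC_ALL=C datamash --field-separator=, -s groupby 1 sum 2,3
    else
        datamash --field-separator=, -s groupby 1 sum 2,3
    fi
}
```

Note that LC_ALL=C overrides every locale category, not only LC_NUMERIC, so it may also change how -s orders the groups.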

As an end user, I wouldn't mind if a character that matches the specified field separator were never interpreted as a decimal separator. I would consider such input faulty. (I suppose the same goes for periods in the relevant locales.)

Thanks for looking into this.

On 30-11-2023 14:22, Erik Auerswald wrote:
Hi Jeroen,

I think this is an interaction between the locale support of GNU Datamash
and the way GNU Datamash parses numbers.  You can work around it by
temporarily overriding the locale settings:

     echo -e "a,14,1\nb,1,14\na,2,1" | \
       LC_ALL=C datamash --field-separator=, -s groupby 1 sum 2,3
     --> a,16,2
     --> b,1,14

The problem occurs as soon as the second column is summed over:

     echo -e "a,14,1\nb,1,14\na,2,1" | \
       datamash --field-separator=, -s groupby 1 sum 2
     --> datamash: invalid numeric value in line 1 field 2: '14'

The root cause is that GNU Datamash uses the locale settings for parsing
its input, and thus treats ',' as decimal separator in some locales
(e.g., in the de_DE.UTF-8 locale).  This interacts with using ',' as
field separator.

I have not looked into the code and thus do not know how involved it
would be to fix this.  (I do think this is a bug.)

Best regards,
