Re: [Help-bash] unique


From: Assaf Gordon
Subject: Re: [Help-bash] unique
Date: Sun, 24 Apr 2016 18:09:49 -0400

Hello,

> On Sat, 23 Apr 2016 19:20:46 +0000 (UTC), Val Krem <address@hidden>
> wrote:
> 
>> Hi all,
>> 
>> 
>> I have a file with  several variables. Sample of data is below.
>> I want to count the unique occurrence of Name  (column 1) in column 2(V1)
>> and columns 3(V2).
>> 
>>  Name v1    v2 
>> ABX123  12  125 
>> ABX123  12  135
>> 
>> ABX123  13  113
>> AcX222  12  225
>> AcX222  12  235
>> AcX222  13  213
>> AcX222  13  313
>> 
>> AcX222  14  413
>> 
>> AdX222 14  512
>> 
>> The output should like 
>> 
>> ABX123  2   3
>> 
>> AcX123  2   5
>> AdX222 1    1

A new GNU program, datamash ( http://www.gnu.org/software/datamash/ ) can 
perform this operation without the need for a specialized script, including 
handling the header line.
Example:

    $ cat input.txt
    Name        v1      v2
    ABX123      12      125
    ABX123      12      135
    ABX123      13      113
    AcX222      12      225
    AcX222      12      235
    AcX222      13      213
    AcX222      13      313
    AcX222      14      413
    AdX222      14      512

    $ datamash --sort --headers groupby 1 countunique 2 countunique 3 < input.txt
    GroupBy(Name)       countunique(v1) countunique(v2)
    ABX123              2               3
    AcX222              3               5
    AdX222              1               1


The 'groupby 1' tells it to group the data by the first column,
then perform the rest of the operations (countunique on columns 2 and 3)
within each group.
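
To make explicit what 'groupby 1 countunique 2 countunique 3' computes, here is a rough awk sketch of the same per-group logic. It is only an illustration, not how datamash is implemented; the function name count_unique is hypothetical, and it assumes tab-separated input with one header line.

```shell
# Recreate the sample data from this message as a tab-separated file.
printf 'Name\tv1\tv2\n' > input.txt
printf 'ABX123\t12\t125\nABX123\t12\t135\nABX123\t13\t113\n' >> input.txt
printf 'AcX222\t12\t225\nAcX222\t12\t235\nAcX222\t13\t213\n' >> input.txt
printf 'AcX222\t13\t313\nAcX222\t14\t413\nAdX222\t14\t512\n' >> input.txt

# count_unique FILE (hypothetical helper): for each value in column 1,
# print the number of distinct values seen in column 2 and in column 3.
count_unique() {
  awk -F'\t' '
    NR == 1 { next }                    # skip the header line
    { names[$1] = 1 }                   # remember every group key
    !seen2[$1, $2]++ { c2[$1]++ }       # first time this (name, v1) pair appears
    !seen3[$1, $3]++ { c3[$1]++ }       # first time this (name, v2) pair appears
    END { for (n in names) printf "%s\t%d\t%d\n", n, c2[n], c3[n] }
  ' "$1" | sort                         # awk iteration order is unspecified
}

count_unique input.txt
```

On the sample data this gives the same counts as the datamash output above (2 and 3, 3 and 5, 1 and 1), without the header line.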

If the input data is not tab-separated, use '-W' to treat whitespace (one or 
more spaces or tabs) as the field delimiter.
If the data is already sorted by the grouping column, omit '--sort' to save time.
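
As a sketch of how these two notes combine: the file name spaces.txt and the wrapper function summarize_sorted below are made up for illustration, and the data is a shortened, space-separated copy of the sample that is already sorted by name. The command is only run if datamash is installed.

```shell
# Hypothetical space-separated sample, already sorted on column 1.
printf 'Name v1 v2\nABX123 12 125\nABX123 12 135\nABX123 13 113\nAdX222 14 512\n' > spaces.txt

# summarize_sorted FILE (hypothetical helper):
# -W splits fields on runs of whitespace; --sort is left out because
# the rows for each name are already adjacent.
summarize_sorted() {
  datamash -W --headers groupby 1 countunique 2 countunique 3 < "$1"
}

# Run the example only when datamash is available on this system.
if command -v datamash >/dev/null 2>&1; then
  summarize_sorted spaces.txt
fi
```

Without '--sort', datamash groups only consecutive lines with the same key, so this form is suitable only when the input really is sorted (or at least grouped) on column 1.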

regards,
 - assaf



