Re: [Help-bash] unique
From: Assaf Gordon
Subject: Re: [Help-bash] unique
Date: Sun, 24 Apr 2016 18:09:49 -0400
Hello,
> On Sat, 23 Apr 2016 19:20:46 +0000 (UTC), Val Krem <address@hidden>
> wrote:
>
>> Hi all,
>>
>>
>> I have a file with several variables. Sample of data is below.
>> I want to count the unique occurrence of Name (column 1) in column 2(V1)
>> and columns 3(V2).
>>
>> Name v1 v2
>> ABX123 12 125
>> ABX123 12 135
>>
>> ABX123 13 113
>> AcX222 12 225
>> AcX222 12 235
>> AcX222 13 213
>> AcX222 13 313
>>
>> AcX222 14 413
>>
>> AdX222 14 512
>>
>> The output should like
>>
>> ABX123 2 3
>>
>> AcX123 2 5
>> AdX222 1 1
A new GNU program, datamash ( http://www.gnu.org/software/datamash/ ) can
perform this operation without the need for a specialized script, including
handling the header line.
Example:
$ cat input.txt
Name v1 v2
ABX123 12 125
ABX123 12 135
ABX123 13 113
AcX222 12 225
AcX222 12 235
AcX222 13 213
AcX222 13 313
AcX222 14 413
AdX222 14 512
$ datamash --sort --headers groupby 1 countunique 2 countunique 3 < input.txt
GroupBy(Name) countunique(v1) countunique(v2)
ABX123 2 3
AcX222 3 5
AdX222 1 1
The 'groupby 1' tells it to group the data by the first column,
then perform the rest of the operations (countunique on columns 2 and 3)
within each group.
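For readers who do not have datamash installed, the same grouping can be approximated with awk. This is only a rough sketch and not how datamash works internally: the function name count_unique is hypothetical, and it assumes whitespace-separated fields and exactly one header line.

```shell
# Hypothetical helper (not part of datamash): for each value in column 1,
# count the distinct values seen in column 2 and in column 3.
# Assumes whitespace-separated input with a single header line.
count_unique() {
  awk 'NR > 1 && NF >= 3 {
         # Remember each (name, column, value) triple the first time it appears.
         if (!(($1, 2, $2) in seen)) { seen[$1, 2, $2] = 1; c2[$1]++ }
         if (!(($1, 3, $3) in seen)) { seen[$1, 3, $3] = 1; c3[$1]++ }
       }
       END {
         # awk does not guarantee iteration order, so the result is sorted below.
         for (name in c2) print name "\t" c2[name] "\t" c3[name]
       }' "$1" | LC_ALL=C sort
}
```

Running 'count_unique input.txt' on the sample file prints the same three data rows as the datamash command, without the header line.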
If the input data is not tab-separated, add '-W' to treat whitespace as the
field delimiter.
If the data is already sorted, omit '--sort' to save time.
regards,
- assaf