bug-coreutils

bug#15077: Clarification


From: Assaf Gordon
Subject: bug#15077: Clarification
Date: Mon, 12 Aug 2013 18:04:34 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130630 Icedove/17.0.7

Hello Federico Alves,

On 08/12/2013 12:31 PM, CDR wrote:
> I just found out that the "v" option does what I need. So in my opinion,
> the "a" option is useless, for it gives you no new information.

I'm glad to hear you found the combination of options that works for you.
I would humbly disagree that the "-a" option is useless; it simply does
something different from what you need.
Especially when combined with the output specifier ("-o"), the output of "join"
is indeed not what you wanted.

When used without "-o", the "-a 1" and "-a 2" options let you see which keys are
common to both files and which keys appear in only one file.

Example:
The following "join" shows which lines are common (they have 9 fields) and which
lines are only in the second file (5 fields, printed because of "-a 2"):
---
$ join -t, -1 1 -2 1 -a 2 today.txt yesterday.txt
2012067075,2013106025,6214,0,201,2019269533,6664,0,201
2012087388,8623689800,6214,2,201,2012320000,6006,0,201
2012088887,8623689800,6214,0,201,8624520081,6529,0,201
2012140209,2013700000,6006,0,201,9733360000,392A,0,201
2012204272,2019269533,6664,0,201
2012226151,2018209998,954F,0,201
2012299682,2018209998,954F,0,201
2012324322,9733360000,392A,0,201
2012334444,2017809469,6664,0,201
2012389608,2012320000,6006,0,201
---
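Since today.txt and yesterday.txt are not included here, below is a minimal,
self-contained sketch that reproduces the same behaviour with two tiny
hypothetical files (the keys 1001-1004 and the letters are made up). It also
shows one requirement worth keeping in mind: "join" expects both inputs to be
sorted on the join field, in the same locale that "join" itself uses.

```shell
#!/bin/sh
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# join requires its inputs sorted on the join field, in the same locale
# that join itself uses; fixing LC_ALL=C keeps sort and join consistent.
export LC_ALL=C

# Hypothetical stand-ins for today.txt and yesterday.txt (unsorted on purpose).
printf '%s\n' 1003,C 1001,A 1002,B | sort -t, -k1,1 > "$dir/today.txt"
printf '%s\n' 1004,Z 1002,X 1003,Y | sort -t, -k1,1 > "$dir/yesterday.txt"

# Paired keys print the fields of both files; "-a 2" also prints the
# lines whose key appears only in the second file.
join -t, -1 1 -2 1 -a 2 "$dir/today.txt" "$dir/yesterday.txt"
```

It should print 1002,B,X and 1003,C,Y (keys in both files), followed by
1004,Z (a key found only in the second file).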

If you don't care about the other fields and just want to see the keys, adding
"-o 0,1.1,2.1" gives:
---
$ join -t, -1 1 -2 1 -a 2 -o 0,1.1,2.1 today.txt yesterday.txt
2012067075,2012067075,2012067075
2012087388,2012087388,2012087388
2012088887,2012088887,2012088887
2012140209,2012140209,2012140209
2012204272,,2012204272
2012226151,,2012226151
2012299682,,2012299682
2012324322,,2012324322
2012334444,,2012334444
2012389608,,2012389608
---
This again quickly shows that keys with an empty second field exist only in the
second file.
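This is also where the "-v" option you mentioned fits in: "-v 2" prints only
the unpaired lines of the second file, while "-a 2" prints them in addition to
the paired lines. A small sketch, again on hypothetical data, showing that
filtering the "-a 2" output for an empty second column finds the same keys as
"-v 2":

```shell
#!/bin/sh
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
export LC_ALL=C

# Hypothetical sample data, already sorted on the key.
printf '%s\n' 1001,A 1002,B 1003,C > "$dir/today.txt"
printf '%s\n' 1002,X 1003,Y 1004,Z > "$dir/yesterday.txt"

# Keys only in the second file, taken from the "-a 2" output:
# an empty second column means the first file had no matching line.
only2_a=$(join -t, -1 1 -2 1 -a 2 -o 0,1.1,2.1 "$dir/today.txt" "$dir/yesterday.txt" |
          awk -F, '$2 == "" { print $1 }')

# The same keys via "-v 2", which prints only the unpaired lines of file 2;
# "-o 0" reduces each of those lines to its key.
only2_v=$(join -t, -1 1 -2 1 -v 2 -o 0 "$dir/today.txt" "$dir/yesterday.txt")

echo "via -a 2: $only2_a"
echo "via -v 2: $only2_v"
```

Both lines should report 1004, the only key missing from the first file.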

You can combine "-a 1" and "-a 2" to show every key from both files, whether it
appears in one file or in both:
---
$ join -t, -1 1 -2 1 -a 1 -a 2 -o 0,1.1,2.1 today.txt yesterday.txt
2012054455,2012054455,
2012067075,2012067075,2012067075
2012087388,2012087388,2012087388
2012088887,2012088887,2012088887
2012120319,2012120319,
2012121177,2012121177,
2012122869,2012122869,
2012140209,2012140209,2012140209
2012143002,2012143002,
2012149116,2012149116,
2012204272,,2012204272
2012226151,,2012226151
2012299682,,2012299682
2012324322,,2012324322
2012334444,,2012334444
2012389608,,2012389608
---
In this example, all lines have three fields:
The first field is the join key, and is always non-empty.
The second field is non-empty if the key exists in the first file.
The third field is non-empty if the key exists in the second file.
(Thus, if both the second and third fields are non-empty, the key is common to
both files.)
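Because this three-field layout is fixed, it can be summarized mechanically.
A minimal sketch on hypothetical data that counts the keys in each category
(the function name "summary" is only for illustration):

```shell
#!/bin/sh
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
export LC_ALL=C

# Hypothetical sample data, already sorted on the key.
printf '%s\n' 1001,A 1002,B 1003,C > "$dir/today.txt"
printf '%s\n' 1002,X 1003,Y 1004,Z > "$dir/yesterday.txt"

# Hypothetical helper: count keys by where they occur, using the
# key / first-file / second-file layout produced by "-o 0,1.1,2.1".
summary() {
    join -t, -1 1 -2 1 -a 1 -a 2 -o 0,1.1,2.1 "$dir/today.txt" "$dir/yesterday.txt" |
        awk -F, '
            $2 != "" && $3 == "" { first++ }
            $2 == "" && $3 != "" { second++ }
            $2 != "" && $3 != "" { both++ }
            END { printf "first-only=%d second-only=%d both=%d\n", first, second, both }'
}

summary
```

With this data it should print "first-only=1 second-only=1 both=2".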


> In terms of new functionality, the "-o" option, format, should allow to add
> arbitrary data, like ",A", "4", etc., in addition to the list of fields
> (2.1 1.1 etc.)

I would suggest using a different program (perhaps awk or sed) downstream of
"join" to add any additional information you need.
Consider combining it with "-o auto" (new in join version 8.10), which maintains
the column order of the combined input files and makes it easy to add
information.

Example with "-a 1 -a 2" AND "-o auto":
---
$ join -t, -1 1 -2 1 -a 1 -a 2 -o auto today.txt yesterday.txt 
2012054455,8624520081,6529,0,201,,,,
2012067075,2013106025,6214,0,201,2019269533,6664,0,201
2012087388,8623689800,6214,2,201,2012320000,6006,0,201
2012088887,8623689800,6214,0,201,8624520081,6529,0,201
2012120319,9739789996,392A,0,201,,,,
2012121177,9739789996,392A,0,201,,,,
2012122869,2013700000,6006,0,201,,,,
2012140209,2013700000,6006,0,201,9733360000,392A,0,201
2012143002,2012339982,6529,0,201,,,,
2012149116,2012339982,6529,0,201,,,,
2012204272,,,,,2019269533,6664,0,201
2012226151,,,,,2018209998,954F,0,201
2012299682,,,,,2018209998,954F,0,201
2012324322,,,,,9733360000,392A,0,201
2012334444,,,,,2017809469,6664,0,201
2012389608,,,,,2012320000,6006,0,201
---

In this example, all lines have nine fields and are easy to parse:
1   - The common key.
2-5 - The four fields from the first file (possibly empty).
6-9 - The four fields from the second file (possibly empty).
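Related to the "-o" feature request: "join" already has a "-e" option that
writes a fixed string into every output field that is missing. It only fills
missing fields and cannot add constant columns such as "AA" or "44", so awk is
still the tool for that. A minimal sketch on hypothetical data, assuming a
"join" that supports "-o auto":

```shell
#!/bin/sh
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
export LC_ALL=C

# Hypothetical sample data, already sorted on the key.
printf '%s\n' 1001,A 1002,B 1003,C > "$dir/today.txt"
printf '%s\n' 1002,X 1003,Y 1004,Z > "$dir/yesterday.txt"

# "-o auto" keeps a fixed column layout; "-e NA" writes NA into every
# field that is missing because the key is absent from one of the files.
join -t, -1 1 -2 1 -a 1 -a 2 -e NA -o auto "$dir/today.txt" "$dir/yesterday.txt"
```

It should print 1001,A,NA then 1002,B,X and 1003,C,Y, then 1004,NA,Z, so
every line keeps the same three columns and no field is left empty.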

Post-processing the output of "join" with awk is now easy, because the fields
are in a fixed order.
For example, adding "AA" as the first field and "44" as the last field:
---
$ join -t, -1 1 -2 1 -a 1 -a 2 -o auto today.txt yesterday.txt |
     awk -F, -v OFS=, '{print "AA", $0, "44"}'
AA,2012054455,8624520081,6529,0,201,,,,,44
AA,2012067075,2013106025,6214,0,201,2019269533,6664,0,201,44
AA,2012087388,8623689800,6214,2,201,2012320000,6006,0,201,44
AA,2012088887,8623689800,6214,0,201,8624520081,6529,0,201,44
AA,2012120319,9739789996,392A,0,201,,,,,44
AA,2012121177,9739789996,392A,0,201,,,,,44
AA,2012122869,2013700000,6006,0,201,,,,,44
AA,2012140209,2013700000,6006,0,201,9733360000,392A,0,201,44
AA,2012143002,2012339982,6529,0,201,,,,,44
AA,2012149116,2012339982,6529,0,201,,,,,44
AA,2012204272,,,,,2019269533,6664,0,201,44
AA,2012226151,,,,,2018209998,954F,0,201,44
AA,2012299682,,,,,2018209998,954F,0,201,44
AA,2012324322,,,,,9733360000,392A,0,201,44
AA,2012334444,,,,,2017809469,6664,0,201,44
AA,2012389608,,,,,2012320000,6006,0,201,44
---

Or something a little more informative:
---
$ join -t, -1 1 -2 1 -a 1 -a 2 -o auto today.txt yesterday.txt |
     awk -F, -v OFS=, '$2=="" && $6!="" { print $1, "Yesterday" }
                       $2!="" && $6=="" { print $1, "Today" }
                       $2!="" && $6!="" { print $1, "Both" }'
2012054455,Today
2012067075,Both
2012087388,Both
2012088887,Both
2012120319,Today
2012121177,Today
2012122869,Today
2012140209,Both
2012143002,Today
2012149116,Today
2012204272,Yesterday
2012226151,Yesterday
2012299682,Yesterday
2012324322,Yesterday
2012334444,Yesterday
2012389608,Yesterday
---


Hope this helps,
 -gordon





