help-bash

Re: [Help-bash] files


From: Val Krem
Subject: Re: [Help-bash] files
Date: Fri, 22 Apr 2016 23:54:41 +0000 (UTC)

Hi John, Bob and all,
Thank you very much for your help. Your suggestions work fine for a small
data set, but when I applied them to a bigger data set they did not work.
Is the array a limitation?
Here is what my files look like; the files are sorted.


f1
956KP9700234 14 1792
111LU8700245 21 1152
420MN5700252 31 1324
100JK3700296 14 1406
300RY2000731 49 1152


f2

956KP9700234 -200.717346095694742 311.25949043870489
111LU8700245 -211.413271898174370 423.77554923238302
420MN5700252 -101.817482577632098 525.97564879400684
100JK3700296 -201.538301283322073 663.03751565313559
300RY2000731 -209.159539234748780 782.81789556241458


1. I tried join
join f1 f2 gave me the following

956KP9700234 14 1792 -200.717346095694742 311.25949043870489
111LU8700245 21 1152 -211.413271898174370 423.77554923238302
420MN5700252 31 1324 -101.817482577632098 525.97564879400684
100JK3700296 14 1406 -201.538301283322073 663.03751565313559
300RY2000731 49 1152 -209.159539234748780 782.81789556241458


When I applied this to my big data set, it gave me only the last two columns.



John,

I also tried your awk script and got the same result.

f1
956KP9700234|14|1792
111LU8700245|21|1152
420MN5700252|31|1324
100JK3700296|14|1445
300RY2000731|49|1152


f2
956KP9700234 -200.717346095694742 311.25949043870489
111LU8700245 -211.413271898174370 423.77554923238302
420MN5700252 -101.817482577632098 525.97564879400684
100JK3700296 -201.538301283322073 663.03751565313559
300RY2000731 -209.159539234748780 782.81789556241458

gawk  -F '[| ]' -f jw.awk f1 f2 result

100JK3700296|14|1445|-201.538301283322073|663.03751565313559
111LU8700245|21|1152|-211.413271898174370|423.77554923238302
300RY2000731|49|1152|-209.159539234748780|782.81789556241458
420MN5700252|31|1324|-101.817482577632098|525.97564879400684
956KP9700234|14|1792|-200.717346095694742|311.25949043870489


When I applied it to my data (> 2000 records), it gave me output like this.
These are only the last two columns of the file:


|-1.18415352216211|1.77941377451768
|-2.51205277969501|2.41657492841508
|-4.22298749946797|2.41007785066459
|-0.449232016218862|1.37583886443860


In addition, if I change the order of the files, I get a different result:
gawk -F '[| ]' -f jw.awk f2 f1


Thank you in advance.





On Thursday, April 21, 2016 8:19 AM, John McKown <address@hidden> wrote:
On Wed, Apr 20, 2016 at 7:09 PM, Val Krem <address@hidden> wrote:

> Hi John and  all,
>
> I have two files the first file is pipe delimited  and the other file is
> space delimited. I want to combine the two files by the first column and
> the final result should be pipe delimited file

> file 1
>
> A123|24|315
> A125|63|450
>
> file 2
> A123 009 163
> A125 091 112
>
> i want the result
> A123|24|315|009|163
> A125|63|450|091|112
>
> I tried  join and awk but failed to work for me.
>

​Well, not a BASH solution, per say. But I have a gawk (not generic awk)
solution. Solution:

$ cat f1
A123|24|315
A125|63|450
$ cat f2
A123 009 163
A125 091 112
$ cat f1-f2.awk
#!/usr/bin/gawk -f
# Needs gawk (gensub, PROCINFO); pass -F '[| ]' on the command line.
{ b = $1; a[b] = (a[b] gensub(/^[^| ]+/, "", 1, $0)) }  # magic line
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"  # return sorted by index value
    for (b in a) {
        print b gensub(/ /, "|", "g", a[b])
    }
}
$ gawk  -F '[| ]' -f f1-f2.awk f1 f2

A123|24|315|009|163
A125|63|450|091|112

The "magic line" does the join work. The "b" variable just holds the
array index value. The "a" variable is an associative array which contains
the "built up" result for the value in "b". This works in AWK because if
the index value is not already in the array, it (1) has the value ""
if referenced and (2) is dynamically added to the array if assigned
to. The gensub() call removes the index value (the equivalent of $1)
from the value concatenated into the a[b] array element. The print command
later "re-adds" this value.
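
To see the accumulation step in isolation, here is a minimal sketch. It is
not the script above: it uses the standard sub() instead of gensub() so it
should run under any POSIX awk, it feeds two sample lines inline instead of
reading f1 and f2, and the names accumulate_demo, key and rest are made up
for the illustration.

```shell
# Sketch: accumulate the non-key part of each line under its key.
accumulate_demo() {
    printf '%s\n' 'A123|24|315' 'A123 009 163' |
    awk -F '[| ]' '
    {
        key = $1                    # first field is the join key
        rest = $0
        sub(/^[^| ]+/, "", rest)    # drop the key, keep the delimiter
        # a[key] reads as "" the first time, so no initialisation is needed
        a[key] = a[key] rest
    }
    END { for (key in a) print key a[key] }
    '
}
accumulate_demo
```

This prints A123|24|315 009 163. The space delimiters from the second line
are still there at this point; converting them to pipes is a separate step.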

In the for(...) command in the END{...} portion, the index values are
iterated over in ascending index order (the default order is unspecified).
The print then prints the index value followed by the array value
accumulated previously. The gensub(...) is used to change the " " (space)
delimiter to a "|" (pipe) delimiter. The -F '[| ]' sets the AWK field
separator to either a single pipe (|) or a single space character.
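
Because '[| ]' matches exactly one character, every single space or pipe
starts a new field. The sketch below shows what that means for $1, the join
key. The helper name show_key is hypothetical, and the last two input lines
are invented to show the edge cases; they are not taken from anyone's data.

```shell
# Sketch: show the field count and first field under -F '[| ]'.
show_key() {
    printf '%s\n' "$1" | awk -F '[| ]' '{ print NF ":" $1 }'
}
show_key 'A123|24|315'      # prints 3:A123
show_key 'A123 009 163'     # prints 3:A123
show_key ' A123 009 163'    # prints 4: (the leading space yields an empty $1)
show_key 'A123  009 163'    # prints 4:A123 (two spaces yield an empty field)
```

So a line with a leading space ends up under an empty key, which may be
worth checking if the real input is padded or column-aligned.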


-- 
"He must have a Teflon brain -- nothing sticks to it"
Phyllis Diller

Maranatha! <><
John McKown

