help-bash

Re: [Help-bash] split


From: Greg Wooledge
Subject: Re: [Help-bash] split
Date: Tue, 29 May 2018 08:57:44 -0400
User-agent: NeoMutt/20170113 (1.7.2)

On Sat, May 26, 2018 at 05:28:05PM +0000, Val Krem wrote:
> I wanted to split a big file based on the value of the first column of this
> file

You seem to be asking a lot of basic programming questions here.
Have you never programmed before?  This is elementary work.  Even a
first-year student should be able to do this.

> file 1 (atest.dat).1 ah1 251 ah2 26
> 4 ah5 354 ah6 362 ah2 54

Are you writing your email in a freakin' web browser?  Instead of a text
editor in a terminal like a normal programmer?

This looks like you've completely corrupted the input file by feeding
it to a web browser, or a Windows-based "word processor".  Where do the
lines begin and end?  How much whitespace is actually present, and
where?

> I want to split this file into three files based on the first column 

> The range of the first column could vary from 1 up to 100.

FIELD.

A COLUMN is a single character.

A FIELD is a "word" composed of one or more characters, terminated or
delimited in some way.  In your case there MIGHT be whitespace delimiters
between fields.  It's hard to be sure because your sample input has been
corrupted.
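
To make the distinction concrete, here is a minimal sketch. It assumes
whitespace-delimited fields, and the sample line is made up for
illustration, since the real input format is unknown. cut -c selects
columns (characters), while read splits a line into fields.

```bash
#!/usr/bin/env bash
# Hypothetical sample line; the real input format is unknown.
line='17 ah1 25'

# Column 1 is the first character only.
first_column=$(printf '%s\n' "$line" | cut -c1)

# Field 1 is the first whitespace-delimited word.
read -r first_field rest <<< "$line"

printf 'column 1: %s\n' "$first_column"
printf 'field 1:  %s\n' "$first_field"
```

With this sample line, column 1 is "1" but field 1 is "17", which is
why the difference matters once values go above 9.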

Anyway....

The basic algorithm here is extremely simple.

1) Open three output file descriptors.

2) Read the input file line by line.

2a) For each line, examine the first field, and use that to decide which
    output file to write to.

2b) Write the line to the appropriate file descriptor.

3) There is no 3.  Once you reach the end of the input, you're done.
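
The steps above can be sketched in bash roughly as follows. This is only
an illustration under stated assumptions: the input is one record per
line with whitespace-delimited fields, the sample data and the output
file names are made up, and the mapping from first field to output file
is a guess based on the three values in the example.

```bash
#!/usr/bin/env bash
# Work in a scratch directory so the sketch does not touch real files.
cd "$(mktemp -d)" || exit 1

# Made-up sample input: one record per line, whitespace-delimited.
printf '%s\n' '1 ah1 25' '1 ah2 26' '4 ah5 35' '4 ah6 36' '2 ah2 54' > atest.dat

# 1) Open three output file descriptors (file names are hypothetical).
exec 3>file1.txt 4>file2.txt 5>file3.txt

# 2) Read the input line by line.
while IFS= read -r line; do
    # 2a) Examine the first field.
    read -r first _ <<< "$line"
    # ASSUMED mapping, taken only from the three values in the example.
    case $first in
        1) fd=3 ;;
        4) fd=4 ;;
        2) fd=5 ;;
        *) continue ;;   # no rule was given for any other value
    esac
    # 2b) Write the line to the chosen file descriptor.
    printf '%s\n' "$line" >&"$fd"
done < atest.dat

# Close the output file descriptors.
exec 3>&- 4>&- 5>&-
```

The input is read exactly once, and the only part that depends on the
problem specification is the case statement.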

>  file 1 will be 
>      1 ah1 25     1 ah2 26
> file 2 would be    4 ah5 35    4 ah6 36
> file three would be     2 ah2 54

How does the first field value tell you which output FD to use?

You want "1" to map to file 1, and "4" to map to file 2, and "2" to map
to file 3?  This is lunacy.  There is no discernible pattern here.  Where
does 17 go?  Where does 42 go?  Where does 100 go?

Is it supposed to be RANDOM?!

> I tried the following script
> ################################################
> #! bin/bash
> numb=($(seq 1 1 10))

seq is not specified by POSIX, and you don't need it here.  If you want
to loop 10 times, simply write a for loop that counts to 10.
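
For instance, bash's arithmetic for loop counts without any external
command or array.  This is a minimal sketch of the loop construct only,
not a fix for the script:

```bash
#!/usr/bin/env bash
# Count from 1 to 10 using only shell built-ins.
for ((i = 1; i <= 10; i++)); do
    printf '%d\n' "$i"
done
```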

> for i in "${numb[@]}"

You don't need to store a list of the integers {1..10} in an array.
You could simply write

for i in {1..10}

Except, don't do that!  Why are you looping 10 times?  Where did you
get the number 10 from?  Which part of the problem specification does
this represent?  Which part of the algorithm that I described above
includes the number 10?

>    do
>      awk '{if($1=='"${i}"') print $0}' atest.dat   > numb${i}.txt

This would have been a code injection vulnerability if you didn't already
know that $i will be an integer.
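
One way to avoid splicing a shell variable into the awk program text is
to pass it with awk's -v option, so the value arrives as data.  A small
sketch, with made-up sample data and the variable name n chosen only
for illustration:

```bash
#!/usr/bin/env bash
# Work in a scratch directory so the sketch does not touch real files.
cd "$(mktemp -d)" || exit 1

# Made-up sample input.
printf '%s\n' '1 ah1 25' '4 ah5 35' '1 ah2 26' > atest.dat

i=1
# The value reaches awk in the variable n, not as part of the program.
awk -v n="$i" '$1 == n' atest.dat > "numb$i.txt"
```

Note that -v interprets backslash escapes in the value, so for
arbitrary strings it is safer to export the value and read it from
ENVIRON inside awk.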

Also, you're reading the input 10 times instead of 1 time.  Why?

Also also, you're only handling 10 of the possible 100 values
of the first input field.  You're putting "4" into file 4, and so on.
Where does 11 go?  Nowhere.  Where does 47 go?  Nowhere.

Also also also, your mapping does not match what you said you
wanted in each output file.  You said that "4" should map to output
file 2.  But you're putting "4" in output file 4.
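
If what is actually wanted is one output file per distinct first-field
value, a single pass is enough.  The sketch below assumes that
interpretation, assumes the first field is always an integer from 1 to
100 (it is used in a file name), and uses made-up sample data; the
numbN.txt names follow the original script.

```bash
#!/usr/bin/env bash
# Work in a scratch directory so the sketch does not touch real files.
cd "$(mktemp -d)" || exit 1

# Made-up sample input: one record per line, whitespace-delimited.
printf '%s\n' '1 ah1 25' '1 ah2 26' '4 ah5 35' '4 ah6 36' '2 ah2 54' > atest.dat

# The awk program appends, so remove stale output from earlier runs.
rm -f numb*.txt

# Read the input once; append each line to the file named after field 1.
# Closing after every write keeps the number of open files at one.
awk '{
    out = "numb" $1 ".txt"
    print >> out
    close(out)
}' atest.dat
```

With this sample input, only numb1.txt, numb2.txt and numb4.txt are
created: one file per value that actually occurs, with no loop count
involved.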

>    done
> #################################################

> The above script gave me 10 files while I was expecting only 3 files.

Because you ran awk 10 times!  If you looped 10 times, and you produced
a different output file each time, why are you surprised when there are
10 files?

Which part of your program had the number 3 in it?  NONE!

Why would you expect 3 output files when you loop 10 times instead of
3 times?


