help-bash

Re: [Help-bash] first part


From: Val Krem
Subject: Re: [Help-bash] first part
Date: Mon, 23 May 2016 23:27:07 +0000 (UTC)

Hi Greg and all,

I am sorry for the lack of clarity in my question.

I am reading a file which has more than 5 million records and more than
10 fields. One of the fields looks like what I described in my previous
email. That field (the 7th field in my file) is composed of three components
(area code plus telephone number). My goal is to find out how many unique
area codes there are in that file. So I used



awk -F '-' '{print $7}' myfile | sort | uniq -c  | wc -l 



Is there a better way of doing this? Does it have any pitfalls?
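
One thing worth checking in that pipeline: -F '-' makes awk split the whole
line on hyphens, so $7 is the seventh hyphen-separated piece of the line
rather than the 7th column. Below is a minimal sketch of an alternative. It
assumes the columns are separated by whitespace (the real delimiter is not
stated in this thread), and count_area_codes is a made-up name. It splits only
the 7th column on "-" and counts the distinct first parts in a single awk
pass, without sort or uniq.

```shell
# Hypothetical helper: count distinct first segments of the 7th column.
# $1 = input file
# $2 = column separator (ASSUMPTION: defaults to whitespace)
count_area_codes() {
    awk -v FS="${2:- }" '
        $7 != "" {                       # skip lines with no 7th column
            split($7, part, "-")         # part[1] = text before the first "-"
            if (!(part[1] in seen)) {    # first time this code appears
                seen[part[1]] = 1
                n++
            }
        }
        END { print n + 0 }              # n + 0 prints 0 for empty input
    ' "$1"
}
```

Calling count_area_codes myfile prints a single number. If the columns are
separated by something else, pass it as the second argument, for example
count_area_codes myfile ','. In the original pipeline, uniq -c is also more
than is needed just to count: sort -u | wc -l gives the same number.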


Many thanks for your suggestion.

Val

On Monday, May 23, 2016 8:34 AM, Greg Wooledge <address@hidden> wrote:
On Sat, May 21, 2016 at 12:44:48PM +0000, Val Krem wrote:
> I  have field  that look like the following
> 124-20-25
> 014-2012-26
> 1024-212-27
> 1-001-29

You have a "field"?  Or is it a "file"?  Is this the entire file?  Is
there additional stuff on each line?  Are there extra lines that have
to be skipped?  Is the input file 4 lines, or 400 lines, or 4 million
lines?


> I want extract the first part of the string,  prior to the first "-"
> 124
> 014
> 1024
> 1

And do what with it?  Store it in a variable?  Dump it all to stdout?

If you just want to dump it all to stdout:

sed "s/-.*//" "$yourfile"

awk is not a bad answer *if* you send the entire file through a single
invocation of awk.  Don't send each line through a separate invocation.

awk -F- '{print $1}' "$yourfile"

The sed version may be slightly faster.  It's basically a toss-up.
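
To see that the two commands agree, here is a small sketch that runs both on
the four sample values from the original question. The temporary file and the
extra cut command are additions for illustration only; cut is not mentioned
elsewhere in this thread.

```shell
# Sample values copied from the original question, written to a temp file.
sample=$(mktemp)
printf '%s\n' 124-20-25 014-2012-26 1024-212-27 1-001-29 > "$sample"

sed 's/-.*//' "$sample"            # delete from the first "-" to end of line
awk -F- '{print $1}' "$sample"     # print the first "-"-separated field
cut -d- -f1 "$sample"              # same result using cut
```

Each of the three commands prints 124, 014, 1024 and 1, one per line.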

If you want to process each line in a shell loop, doing god-knows-what
to it, then use the IFS/while read approach.

while IFS=- read -r foo _; do
   ... something with "$foo"
done < "$yourfile"

Please, please, please ask your full, real questions.  State what
the goals are.  Don't ask part of a question.  Don't skip details.
Don't make up fake data that look nothing at all like the real data.
(It's OK to mangle patient names to comply with HIPAA and so on, but
make sure the *format* of the data remains true.)

The answer to "I have a file full of hyphenated serial numbers.  How do
I dump all of the first segments to standard output?" is completely
different from the answer to "I have a file.... How do I loop over each
of the first segments and match it against filenames in the current
working directory?"  And so on.
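
As an illustration of how different that second case looks, here is a sketch
of the IFS/while read approach applied to it. It assumes one hyphenated value
per line and that "match" means "a file with exactly that name exists"; the
function name match_segments and the found/missing output format are made up
for this example.

```shell
# Hypothetical helper: for each line of the file given as $1, take the text
# before the first "-" and report whether a file of that name exists in the
# current working directory.
match_segments() {
    while IFS=- read -r first _; do
        [ -n "$first" ] || continue          # skip blank lines
        if [ -e "./$first" ]; then
            printf 'found: %s\n' "$first"
        else
            printf 'missing: %s\n' "$first"
        fi
    done < "$1"
}
```

Here the shell loop is reasonable because each segment needs a filesystem test,
which is not something a single sed or awk command over the file does as
directly.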

We need to understand the actual goals in order to give the right answer.
We need to understand the *full* syntax of the input file in order to
decide on a parsing approach.  Even the slightest change in the input
file's syntax may steer you toward a completely different approach.

