help-bash

Re: [Help-bash] bash suitable for parsing big files?


From: Matthew Cengia
Subject: Re: [Help-bash] bash suitable for parsing big files?
Date: Fri, 13 Sep 2013 14:03:29 +1000
User-agent: Mutt/1.5.21 (2010-09-15)

On 2013-09-13 02:55, adrelanos wrote:
> Dennis Williamson:
[...]
> line="firstone secondone thirdone"
> 
> How can I get "firstone" into variable "first"? I am using awk.
> 
> first="$(echo "$line" | awk '{print $1}')"

read -r first second third _ <<< "$line"

Or:

read -ra arr <<< "$line"
echo "${arr[0]}"
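
For illustration, here is a minimal sketch of both forms run on a longer
line. The sample value of "line" is made up for this example, and the
here-string (<<<) requires Bash rather than plain sh.

```bash
#!/usr/bin/env bash
# Hypothetical sample input with more fields than we have named variables.
line="firstone secondone thirdone extra words"

# Named variables: the trailing "_" absorbs every field after the third,
# so "third" does not end up holding the rest of the line.
read -r first second third _ <<< "$line"
printf 'first=%s second=%s third=%s\n' "$first" "$second" "$third"

# Array form: each whitespace-separated field becomes one array element.
read -ra arr <<< "$line"
printf 'fields=%d first=%s\n' "${#arr[@]}" "${arr[0]}"
```

Neither form starts an external process, which is what makes them cheaper
than piping each line through awk.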

> 
> The recommendation to use awk came from search engines. I wouldn't know
> how to do it without external utility, never found answers to do it in
> pure bash. Until now, it worked well, but if you have an idea how to do
> it pure bash, that'd be great.

Google is not your friend in this case; too many examples of bad code. I
strongly recommend dropping into the #bash channel on Freenode for this
sort of question.

> 
> > Consider that many things like awk and
> > grep iterate over the lines in a file for free.
> 
> I don't understand. Please elaborate.

When Awk receives input, and that input is multiple lines long, it'll
*automatically* iterate over each line sequentially by default:

address@hidden:tmp$ printf "%s\n" a b c | awk '{ printf("Line %s: %s\n", NR, $0); }'
Line 1: a
Line 2: b
Line 3: c

This means everything is done in a single Awk call, which eliminates
thousands of fork/exec calls and runs much faster than iterating with a
'while' or 'for' loop in Bash and then processing each line in Awk. Either
do it all in Bash, or do it all in Awk. Avoid mixing if at all possible.

> 
> > Ultimately, it comes down to "What are you really trying to do?"
> 
> Imagine you are using $linux-distribution on hdd and you want to check
> the integrity of your system. You're booting from USB or DVD, which
> you assume to be clean of backdoors, while you're not so sure whether
> your hdd contains a backdoor.
> 
> The script I am writing looks what files are installed, downloads the
> package from $linux-distribution's repositories and compares them with
> the ones on the disk. Finally reports which were modified and which ones
> could not be verified (because they are not in a package, auto generated
> files, etc.). [And more.] I am doing such a thing, just not to verify a
> hdd, but to verify a virtual machine image.
> 
> Code:
> https://github.com/Whonix/Whonix/blob/master/release/verify_build#L187
> 
> Function:
> parse_dpkg_status_file
> 

This is what debsums is for: http://packages.debian.org/search?keywords=debsums

-- 
Regards,
Matthew Cengia


