

Re: [Help-bash] bash suitable for parsing big files?

From: adrelanos
Subject: Re: [Help-bash] bash suitable for parsing big files?
Date: Fri, 13 Sep 2013 02:55:36 +0000

Dennis Williamson:
> On Sep 12, 2013 7:05 PM, "adrelanos" <address@hidden> wrote:
>> Hi,
>> I've been using:
>> mapfile -t lines < "/var/lib/dpkg/status"
>> for line in "${lines[@]}"; do
>> ... (parsing it with things like awk, ${var:0:6}, ${var,,} and
>> pkg_arch[$package]="$arch".) ...
>> For those who don't know /var/lib/dpkg/status, its size is roughly 2 MB
>> and it contains roughly 50,000 lines.
>> Parsing it with bash takes a long time.
>> Is there any way to speed it up or is bash not the right tool for
>> parsing such big files?
>> All the best,
>> adrelanos
> Reading the whole file into an array for a file that size is the wrong
> approach. Use while read instead.
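A minimal sketch of the "while read" pattern suggested above, assuming the per-line work is something simple such as counting package stanzas. count_package_lines is a hypothetical helper name. The file is read one line at a time, and the per-line test uses bash's own pattern matching instead of an external command:

```shell
#!/usr/bin/env bash
# Sketch: read a file one line at a time with "while read" instead of
# loading all of it into an array with mapfile first.
# count_package_lines is a hypothetical helper used only for illustration.
count_package_lines() {
  local file=$1 line count=0
  # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
  # The "|| [[ -n $line ]]" part also processes a last line without a newline.
  while IFS= read -r line || [[ -n $line ]]; do
    # Bash's own pattern matching, so no external command runs per line.
    if [[ $line == Package:* ]]; then
      count=$((count + 1))
    fi
  done < "$file"
  printf '%s\n' "$count"
}
```

Usage: count_package_lines /var/lib/dpkg/status prints the number of lines starting with "Package:".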


> Also, calling external utilities many
> times in a loop can be very slow.

line="firstone secondone thirdone"

How can I get "firstone" into the variable "first"? Currently I am using awk:

first="$(echo "$line" | awk '{print $1}')"

The recommendation to use awk came from search engines. I wouldn't know
how to do it without an external utility and never found an answer
showing how to do it in pure bash. Until now it has worked well, but if you
know how to do it in pure bash, that would be great.
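For the question above, bash can split the line itself without starting awk. A minimal sketch of two common pure-bash approaches; the extra variable names first2 and rest are only for illustration:

```shell
#!/usr/bin/env bash
# Two pure-bash ways to get the first word of a line,
# with no command substitution and no awk process.
line="firstone secondone thirdone"

# 1) Parameter expansion: remove the longest suffix that starts at a space.
#    Splits only on a literal space; a line without a space is left unchanged.
first=${line%% *}

# 2) read with word splitting: the first word goes into "first2",
#    everything after it into "rest". Splits on IFS (space, tab, newline by
#    default) and skips leading blanks, which is closer to awk's default.
read -r first2 rest <<< "$line"

printf '%s\n' "$first" "$first2" "$rest"
```

Both avoid the subshell and the separate awk process that first="$(echo "$line" | awk '{print $1}')" starts on every call, a cost that adds up inside a loop over tens of thousands of lines.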

> Consider that many things like awk and
> grep iterate over the lines in a file for free.

I don't understand. Please elaborate.
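To illustrate the point about awk and grep iterating over the lines of a file for free: rather than calling awk once per line from a bash loop, a single awk process can read the whole status file and emit only the data bash needs. A minimal sketch, assuming the usual dpkg status layout ("Field: value" lines, one blank-line-separated stanza per package) and filling the pkg_arch array from the original post; load_pkg_arch is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: one awk process walks every line of the status file and prints
# "package arch" pairs; bash only reads awk's much shorter output.
# Assumes stanzas separated by blank lines with "Package:" and
# "Architecture:" fields. load_pkg_arch is a hypothetical helper.
declare -A pkg_arch

load_pkg_arch() {
  local file=$1 package arch
  while read -r package arch; do
    pkg_arch[$package]=$arch
  done < <(awk '
    /^Package: /      { pkg = $2 }
    /^Architecture: / { arch = $2 }
    # A blank line ends a stanza: emit it and reset.
    /^$/              { if (pkg != "") print pkg, arch; pkg = ""; arch = "" }
    # The last stanza may not be followed by a blank line.
    END               { if (pkg != "") print pkg, arch }
  ' "$file")
}
```

Usage: after load_pkg_arch /var/lib/dpkg/status, "${pkg_arch[bash]}" holds the architecture recorded for the bash package. grep works the same way: grep '^Package: ' /var/lib/dpkg/status scans every line of the file in a single process.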

> Ultimately, it comes down to "What are you really trying to do?"

Imagine you are running $linux-distribution from an HDD and you want to check
the integrity of your system. You boot from a USB stick or DVD, which
you assume to be free of backdoors, whereas you are not sure whether your HDD
contains one.

The script I am writing looks up which files are installed, downloads the
packages from $linux-distribution's repositories and compares their files with
the ones on the disk. Finally, it reports which files were modified and which
could not be verified (because they do not belong to any package, are
auto-generated, etc.). [And more.] I am doing exactly that, just not to verify an
HDD but to verify a virtual machine image.
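The comparison step described above could be sketched roughly as follows. This is only an illustration under loose assumptions, not the script being discussed: it assumes a package has already been downloaded and unpacked (for example with dpkg-deb -x) into one directory, and that the disk or image being checked is mounted at another. compare_tree is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch of the comparison step only: walk the regular files of an unpacked
# package under $pkg_root and compare each one byte-for-byte with the same
# relative path under $target_root. Symlinks and directories are skipped.
# compare_tree is a hypothetical helper; pass directories without a trailing slash.
compare_tree() {
  local pkg_root=$1 target_root=$2 path rel
  # -print0 / read -d '' handle file names containing spaces or newlines.
  while IFS= read -r -d '' path; do
    rel=${path#"$pkg_root"/}
    if [[ ! -e $target_root/$rel ]]; then
      printf 'MISSING %s\n' "$rel"
    elif ! cmp -s -- "$pkg_root/$rel" "$target_root/$rel"; then
      printf 'MODIFIED %s\n' "$rel"
    fi
  done < <(find "$pkg_root" -type f -print0)
}
```

Byte-for-byte comparison with cmp -s keeps the sketch simple; files that are legitimately changed after installation (configuration files, for instance) will show up as MODIFIED and would need separate handling.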



