[Zutils-bug] zgrep performance long line


From: Walter Anema
Subject: [Zutils-bug] zgrep performance long line
Date: Wed, 15 Aug 2018 15:47:17 +0000

Hi Antonio,

You made a nice package of z utilities.

I am using it in a Docker container (Alpine) to analyse JSON logs.

I have a performance problem with one particular file. It contains logging in JSON format as a single long line, without any \n, not even at the end.
Because of that, I have to append an `echo` before `wc` reports a line count:

(zcat /logs/s3/2018/04/11/08/prod-kinesis-firehose-stream-1-2018-04-11-08-05-23-bcdf3841-52b5-47eb-bf85-c36dfa2d0d55;echo ) | wc

      1 2145643 37786248
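
The effect of the missing newline can be reproduced on a small scale. The sketch below is only an illustration: the sample file, its contents, and the variable names are made up, and `gzip -dc` is used as a portable stand-in for `zcat`.

```shell
# Hypothetical sample standing in for the real log file:
# three JSON objects on one line, with no trailing newline.
tmp=$(mktemp -d)
printf '%s' '{"msg":"connect"}{"msg":"connect"}{"msg":"other"}' | gzip > "$tmp/sample.gz"

# wc -l counts newline characters, so the raw stream has zero lines.
lines_raw=$(gzip -dc "$tmp/sample.gz" | wc -l | tr -d ' ')

# A bare echo appends the missing newline, so wc now sees one line.
lines_fixed=$( (gzip -dc "$tmp/sample.gz"; echo) | wc -l | tr -d ' ')

echo "without echo: $lines_raw, with echo: $lines_fixed"
rm -rf "$tmp"
```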

 

Somehow zgrep from zutils takes a long time on this file:

# /usr/bin/zgrep -V
zgrep (zutils) 1.7
Copyright (C) 2018 Antonio Diaz Diaz.
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

# time (/usr/bin/zgrep -o connect largefile_with_one_json_line | wc)
     97      97     776

real   0m19.320s
user   0m19.317s
sys    0m0.078s

When I use GNU zgrep, it is more than 20 times faster:

# zgrep -V
zgrep (gzip) 1.5
Copyright (C) 2010-2012 Free Software Foundation, Inc.
This is free software.  You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

Written by Jean-loup Gailly.

# time (/usr/bin/zgrep -o connect largefile_with_one_json_line | wc)
     97      97     776

real   0m0.830s
user   0m0.964s
sys    0m0.044s

 

Can you explain the difference?
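
One way to narrow this down, assuming plain `grep` is available in the container, might be to decompress in a separate process and pipe the result into `grep`. If that pipeline is fast, the time is probably being spent in how zgrep handles the very long line rather than in decompression. The file name and contents below are hypothetical stand-ins for the real log.

```shell
# Hypothetical single-line sample; substitute the real log file to compare timings.
tmp=$(mktemp -d)
printf '%s' '{"a":"connect"}{"b":"connect"}{"c":"connected"}' | gzip > "$tmp/oneline.gz"

# Decompress in one process and match in another, bypassing zgrep.
# grep -o prints each match on its own line, so wc -l counts matches.
matches=$(gzip -dc "$tmp/oneline.gz" | grep -o connect | wc -l | tr -d ' ')

echo "matches: $matches"
rm -rf "$tmp"
```

Wrapping the pipeline in `time ( ... )`, as in the measurements above, gives a figure that can be compared directly with both zgrep timings.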

 

Best regards,

Walter Anema
Technical Application Management

Portbase
Blaak 16, 3011 TA Rotterdam, The Netherlands
+31 (0)88 625 25 37 / +31 (0)6 54 32 76 70
portbase.com
