bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#20029: 'yes' surprisingly slow


From: Giuseppe Scrivano
Subject: bug#20029: 'yes' surprisingly slow
Date: Tue, 10 Mar 2015 01:31:36 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4 (gnu/linux)

Pádraig Brady <address@hidden> writes:

> The attached should make things more efficient here.
>
> thanks,
> Pádraig.
>
>
> From 7959bbf19307705e98f08cfa32a9dcf67672590c Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?P=C3=A1draig=20Brady?= <address@hidden>
> Date: Mon, 9 Mar 2015 19:27:32 +0000
> Subject: [PATCH] yes: output data more efficiently
>
> yes(1) may be used to generate repeating patterns of text
> for test inputs etc., so adjust to be more efficient.
>
> Profiling the case where yes(1) is outputting small items
> through stdio (which was the default case), shows the overhead
> continuously processing small items in main() and in stdio:
>
>     $ yes >/dev/null & perf top -p $!
>     31.02%  yes           [.] main
>     27.36%  libc-2.20.so  [.] _IO_file_xsputn@@GLIBC_2.2.5
>     14.51%  libc-2.20.so  [.] fputs_unlocked
>     13.50%  libc-2.20.so  [.] strlen
>     10.66%  libc-2.20.so  [.] __GI___mempcpy
>      1.98%  yes           [.] address@hidden
>
> Sending more data per stdio call improves the situation,
> but still, there is significant stdio overhead due to memory copies,
> and the repeated string length checking:
>
>     $ yes "`echo {1..1000}`" >/dev/null & perf top -p $!
>     42.26%  libc-2.20.so  [.] __GI___mempcpy
>     17.38%  libc-2.20.so  [.] strlen
>      5.21%  [kernel]      [k] __srcu_read_lock
>      4.58%  [kernel]      [k] __srcu_read_unlock
>      4.27%  libc-2.20.so  [.] _IO_file_xsputn@@GLIBC_2.2.5
>      2.50%  libc-2.20.so  [.] __GI___libc_write
>      2.45%  [kernel]      [k] system_call
>      2.40%  [kernel]      [k] system_call_after_swapgs
>      2.27%  [kernel]      [k] vfs_write
>      2.09%  libc-2.20.so  [.] _IO_do_write@@GLIBC_2.2.5
>      2.01%  [kernel]      [k] fsnotify
>      1.95%  libc-2.20.so  [.] _IO_file_write@@GLIBC_2.2.5
>      1.44%  yes           [.] main
>
> We can avoid all stdio overhead by building up the buffer
> _once_ and outputting that, and the profile below shows
> the bottleneck moved to the kernel:
>
>     $ src/yes >/dev/null & perf top -p $!
>     15.42%  [kernel]      [k] __srcu_read_lock
>     12.98%  [kernel]      [k] __srcu_read_unlock
>      9.41%  libc-2.20.so  [.] __GI___libc_write
>      9.11%  [kernel]      [k] vfs_write
>      8.35%  [kernel]      [k] fsnotify
>      8.02%  [kernel]      [k] system_call
>      5.84%  [kernel]      [k] system_call_after_swapgs
>      4.54%  [kernel]      [k] __fget_light
>      3.98%  [kernel]      [k] sys_write
>      3.65%  [kernel]      [k] selinux_file_permission
>      3.44%  [kernel]      [k] rw_verify_area
>      2.94%  [kernel]      [k] __fsnotify_parent
>      2.76%  [kernel]      [k] security_file_permission
>      2.39%  yes           [.] main
>      2.17%  [kernel]      [k] __fdget_pos
>      2.13%  [kernel]      [k] sysret_check
>      0.81%  [kernel]      [k] write_null
>      0.36%  yes           [.] address@hidden
>
> Note this change also ensures that yes will only write complete lines
> for lines softer than BUFSIZ.
>
> * src/yes.c (main): Build up a BUFSIZ buffer of lines,
> and output that, rather than having stdio process each item.
> * tests/misc/yes.sh: Add a new test for various buffer sizes.
> * tests/local.mk: Reference the new test.
> Fixes http://bugs.gnu.org/20029
> ---
>  src/yes.c         | 43 +++++++++++++++++++++++++++++++++++++++++--
>  tests/local.mk    |  1 +
>  tests/misc/yes.sh | 28 ++++++++++++++++++++++++++++
>  3 files changed, 70 insertions(+), 2 deletions(-)
>  create mode 100755 tests/misc/yes.sh
>
> diff --git a/src/yes.c b/src/yes.c
> index b35b13f..91dea11 100644
> --- a/src/yes.c
> +++ b/src/yes.c
> @@ -58,6 +58,10 @@ Repeatedly output a line with all specified STRING(s), or 
> 'y'.\n\
>  int
>  main (int argc, char **argv)
>  {
> +  char buf[BUFSIZ];
> +  char *pbuf = buf;
> +  int i;
> +
>    initialize_main (&argc, &argv);
>    set_program_name (argv[0]);
>    setlocale (LC_ALL, "");
> @@ -77,9 +81,44 @@ main (int argc, char **argv)
>        argv[argc++] = bad_cast ("y");
>      }
>  
> -  while (true)
> +  /* Buffer data locally once, rather than having the
> +     large overhead of stdio buffering each item.   */
> +  for (i = optind; i < argc; i++)
> +    {
> +      size_t len = strlen (argv[i]);
> +      if (BUFSIZ < len || BUFSIZ - len <= pbuf - buf)
> +        break;
> +      memcpy (pbuf, argv[i], len);
> +      pbuf += len;
> +      *pbuf++ = i == argc - 1 ? '\n' : ' ';
> +    }
> +  if (i < argc)
> +    pbuf = NULL;

since the buffer is partly filled, wouldn't be better to reuse it?

Something like this (barely tested):

diff --git a/src/yes.c b/src/yes.c
index 91dea11..ac690ce 100644
--- a/src/yes.c
+++ b/src/yes.c
@@ -92,9 +92,7 @@ main (int argc, char **argv)
       pbuf += len;
       *pbuf++ = i == argc - 1 ? '\n' : ' ';
     }
-  if (i < argc)
-    pbuf = NULL;
-  else
+  if (i == argc)
     {
       size_t line_len = pbuf - buf;
       size_t lines = BUFSIZ / line_len;
@@ -106,7 +104,7 @@ main (int argc, char **argv)
     }
 
   /* The normal case is to continuously output the local buffer.  */
-  while (pbuf)
+  while (i == argc)
     {
       if (write (STDOUT_FILENO, buf, pbuf - buf) == -1)
         {
@@ -117,11 +115,17 @@ main (int argc, char **argv)
 
   /* If the data doesn't fit in BUFSIZ then it's large
      and not too onerous to output using stdio in any case.  */
-  while (! pbuf)
+  while (i != argc)
     {
-      for (i = optind; i < argc; i++)
-        if (fputs (argv[i], stdout) == EOF
-            || putchar (i == argc - 1 ? '\n' : ' ') == EOF)
+      int j;
+      if ((pbuf - buf) && fwrite (buf, pbuf - buf, 1, stdout) == 0)
+        {
+          error (0, errno, _("standard output"));
+          return EXIT_FAILURE;
+        }
+      for (j = i; j < argc; j++)
+        if (fputs (argv[j], stdout) == EOF
+            || putchar (j == argc - 1 ? '\n' : ' ') == EOF)
           {
             error (0, errno, _("standard output"));
             return EXIT_FAILURE;


The difference seems to be (without the patch):

$ time src/yes `echo {1..2000}` | head -c 2000M | md5sum
55c293324aa6ecce14f0bf30da5a4686  -

real    0m7.994s
user    0m11.093s
sys     0m2.953s

versus (with the patch):

$ time src/yes `echo {1..2000}` | head -c 2000M | md5sum
55c293324aa6ecce14f0bf30da5a4686  -

real    0m3.534s
user    0m4.164s
sys     0m1.803s

Regards,
Giuseppe






reply via email to

[Prev in Thread] Current Thread [Next in Thread]