bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bug #19363] running any of the sum tools on a filename with gives s


From: Eric Blake
Subject: Re: [bug #19363] running any of the sum tools on a filename with gives superfluous in output
Date: Wed, 21 Mar 2007 07:25:58 -0600
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.10) Gecko/20070221 Thunderbird/1.5.0.10 Mnenhy/0.7.4.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Jim Meyering on 3/21/2007 3:52 AM:
>>
>> Closing.  The leading \ is indeed a feature, and has been documented in
>> coreutils md5sum and sha*sum for years when \\ or \n appear in a filename

It appears I am wrong.  I could find no mention of it in coreutils.texi.

>> (although I would like to see \r added to this list).  There is no more
>> elegant solution for handling arbitrary file names, and it is a security hole
>> if you can't handle arbitrary file names.
> 
> Thanks for handling that.
> I suppose the addition of \r-handling would be useful for Cygwin.
> On other types of systems, file names containing \r are sufficiently
> rare that the impact would be negligible.
> 
> Do you want to add the feature?

How about this?  A related change, but not added here until I get
feedback, would be that if a line ends in \r, but a filename cannot be
found that also ends in \r, then try stripping the \r rather than giving
up right away.  In fact, if the \r escape sequence is seen, then always
strip literal \r.  This would be more forgiving of the case where, even on
Linux, a user is parsing a DOS-ending checksum file copied from a network.
 Oh, and I guess I should submit a followup with testsuite changes, too
(using tests/md5sum/newline-1 as a pattern).

Also, I would really like to change the flag character for text vs. binary
mode.  Ever since md5sum was changed to default to binary calculations
(which was a good change, since you often run it on binaries, where a text
sum is worthless), it is annoying that on cygwin, I have to manually
delete the * flag character to make output match Linux.  I would much
rather have an alternate flag character that is used to denote text mode,
since I feel that is the exceptional case; but for backwards
compatibility, it can't be *.  Would you accept a patch to make md5sum
output ' ' for binary and '^' for text (instead of the current ' ' for
text and '*' for binary), so long as it can continue to parse the old
formats?  The only breakage will be people with a check file where
checksums were created in text mode by older versions, but trying to
verify using newer coreutils, and even then, it is only on platforms where
text vs. binary makes a difference; nor is the breakage that common, since
text-mode checksums have to be explicitly requested.

ChangeLog:
2007-03-21  Eric Blake  <address@hidden>

        Avoid ambiguity with checkfile in DOS text file format.
        * src/md5sum.c (split_3): Parse escaped \r in filename.
        (main): Escape \r on output.
        * NEWS: Document this change.

doc/ChangeLog:
2007-03-21  Eric Blake  <address@hidden>

        * coreutils.texi (md5sum invocation): Document escapes in output
        format.
        Reported by Armijn Hemel.

[Is there an easy way to make git or cogito commit the ChangeLog changes
alongside the patch, but to omit them from the patchfile that I mail,
since I am duplicating the changelog in the body of the mail?]

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGATJl84KuGfSFAYARAkqrAJ9RTTc95AfIV3C5IIhxHR+EvpZC+QCfZhCz
274NKHa+CIkmy0CLZmZrQGs=
=dwZ2
-----END PGP SIGNATURE-----
Avoid ambiguity with checkfile in DOS text file format.

---
commit ac3ca7f2f07c744c4ac0e3056d4c21fa50a14c41
tree ed550f2b0820ae1662542d02f20daea2f6049029
parent 373c95c45ae5ac72a585bbec1ffb6185da84f29d
author Eric Blake <address@hidden> Wed, 21 Mar 2007 06:57:29 -0600
committer Eric Blake <address@hidden> Wed, 21 Mar 2007 06:57:29 -0600

 ChangeLog          |    7 +++++++
 NEWS               |    7 +++++++
 doc/ChangeLog      |    6 ++++++
 doc/coreutils.texi |    4 ++++
 src/md5sum.c       |   18 ++++++++++++++----
 5 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index dea085f..8fcef23 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2007-03-21  Eric Blake  <address@hidden>
+
+       Avoid ambiguity with checkfile in DOS text file format.
+       * src/md5sum.c (split_3): Parse escaped \r in filename.
+       (main): Escape \r on output.
+       * NEWS: Document this change.
+
 2007-03-21  Jim Meyering  <address@hidden>
 
        * gl/lib/savewd.c: Remove this file, since the savewd_save change
diff --git a/NEWS b/NEWS
index 4efb18f..8349b98 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,13 @@ GNU coreutils NEWS                                    -*- 
outline -*-
   Using pr -m -s (i.e. merging files, with TAB as the output separator)
   no longer inserts extraneous spaces between output columns.
 
+** Improved robustness
+
+  md5sum, sha1sum, and the other similar checksumming utilities, now
+  escape any carriage return in the file name, in addition to newline
+  or backslash, in order to avoid ambiguity when the checksum file is
+  parsed with DOS line endings.
+
 
 * Noteworthy changes in release 6.8 (2007-02-24) [not-unstable]
 
diff --git a/doc/ChangeLog b/doc/ChangeLog
index c4171b8..390627a 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,9 @@
+2007-03-21  Eric Blake  <address@hidden>
+
+       * coreutils.texi (md5sum invocation): Document escapes in output
+       format.
+       Reported by Armijn Hemel.
+
 2007-03-15  Paul Eggert  <address@hidden>
 
        Fix manual in response to bug reports by Dan Jacobson.
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 7a4cc2d..81b1fc7 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -3203,6 +3203,10 @@ md5sum address@hidden@dots{} address@hidden@dots{}
 
 For each @var{file}, @samp{md5sum} outputs the MD5 checksum, a flag
 indicating a binary or text input file, and the file name.
+If @var{file} contains a backslash, newline, or carriage return, the
+line is started with a backslash, and the problematic characters in
+the file name are escaped with a backslash, making the output
+unambiguous even in the presence of arbitrary file names.
 If @var{file} is omitted or specified as @samp{-}, standard input is read.
 
 The program accepts the following options.  Also see @ref{Common options}.
diff --git a/src/md5sum.c b/src/md5sum.c
index a8ce1cf..22c17de 100644
--- a/src/md5sum.c
+++ b/src/md5sum.c
@@ -1,5 +1,5 @@
 /* Compute MD5, SHA1, SHA224, SHA256, SHA384 or SHA512 checksum of files or 
strings
-   Copyright (C) 1995-2006 Free Software Foundation, Inc.
+   Copyright (C) 1995-2007 Free Software Foundation, Inc.
 
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -299,6 +299,7 @@ split_3 (char *s, size_t s_len,
   if (escaped_filename)
     {
       /* Translate each `\n' string in the file name to a NEWLINE,
+        each '\r' to a CARRIAGE RETURN,
         and each `\\' string to a backslash.  */
 
       char *dst = &s[i];
@@ -319,11 +320,14 @@ split_3 (char *s, size_t s_len,
                case 'n':
                  *dst++ = '\n';
                  break;
+               case 'r':
+                 *dst++ = '\r';
+                 break;
                case '\\':
                  *dst++ = '\\';
                  break;
                default:
-                 /* Only `\' or `n' may follow a backslash.  */
+                 /* Only `\', '\n', or '\r' may follow a backslash.  */
                  return false;
                }
              break;
@@ -679,8 +683,9 @@ main (int argc, char **argv)
              size_t i;
 
              /* Output a leading backslash if the file name contains
-                a newline or backslash.  */
-             if (strchr (file, '\n') || strchr (file, '\\'))
+                a newline, carriage return, or backslash.  */
+             if (strchr (file, '\n') || strchr (file, '\\')
+                 || strchr (file, '\r'))
                putchar ('\\');
 
              for (i = 0; i < (digest_hex_bytes / 2); ++i)
@@ -693,6 +698,7 @@ main (int argc, char **argv)
                putchar (' ');
 
              /* Translate each NEWLINE byte to the string, "\\n",
+                each CARRIAGE RETURN byte to the string "\\r",
                 and each backslash to "\\\\".  */
              for (i = 0; i < strlen (file); ++i)
                {
@@ -702,6 +708,10 @@ main (int argc, char **argv)
                      fputs ("\\n", stdout);
                      break;
 
+                   case '\r':
+                     fputs ("\\r", stdout);
+                     break;
+
                    case '\\':
                      fputs ("\\\\", stdout);
                      break;

reply via email to

[Prev in Thread] Current Thread [Next in Thread]