From: James Youngman
Subject: [Findutils-patches] [PATCH] updatedb: run in the C locale, don't do case-folding.
Date: Sat, 9 Jan 2016 21:18:24 +0000

* locate/updatedb.sh: Set LC_ALL to C to avoid unexpected character
encodings in path names causing sort to fail (idea from Clarence
Risher).  Don't do case-folding, since the character set is now C,
which is likely inconsistent with the user's expectations anyway.
Honour $TMPDIR. Correct the error message you get if you specify
both --old-format and --dbformat.
* NEWS: Explain these changes.
---
 NEWS               |  7 +++++++
 locate/updatedb.sh | 33 ++++++++++++++++++++++++---------
 2 files changed, 31 insertions(+), 9 deletions(-)
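
Reviewer note on the ordering change (a minimal sketch, not part of
the patch): with LC_ALL=C, sort compares bytes, so upper-case ASCII
names sort before lower-case ones, whereas the old "sort -f" folded
case first.  The sample paths below are hypothetical and exist only
to show the difference.

```shell
#!/bin/sh
# Illustrative sketch only; not code from the patch.
LC_ALL=C
export LC_ALL

# Hypothetical sample paths, chosen only to show the ordering.
sample='/usr/b
/usr/A
/usr/a
/usr/B'

# Byte-wise ordering, as in the patched pipeline ("$sort" without -f).
plain=$(printf '%s\n' "$sample" | sort)

# Case-folded ordering, as in the pre-patch pipeline ("$sort -f").
folded=$(printf '%s\n' "$sample" | sort -f)

printf 'sort:\n%s\n\nsort -f:\n%s\n' "$plain" "$folded"
```

In the C locale the plain sort groups A and B ahead of a and b, while
the folded sort interleaves the upper-case and lower-case names.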

diff --git a/NEWS b/NEWS
index f72f021..8865b8e 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,13 @@ GNU findutils NEWS - User visible changes.      -*- outline -*- (allout)
 
 * Major changes in release 4.7.0-git, YYYY-MM-DD
 
+** Changes to locate / updatedb
+
+The updatedb script now operates in the C locale only.  This means
+that character encoding issues are now unlikely to cause sort to
+fail.  It also honours the TMPDIR environment variable if it is
+set, and no longer sorts file names case-insensitively.
+
 ** Translations
 
 Updated translations: Hungarian, Slovak, Dutch, German.
diff --git a/locate/updatedb.sh b/locate/updatedb.sh
index 9cb2811..3861915 100644
--- a/locate/updatedb.sh
+++ b/locate/updatedb.sh
@@ -31,6 +31,19 @@ There is NO WARRANTY, to the extent permitted by law.
 Written by Eric B. Decker, James Youngman, and Kevin Dalley.
 '
 
+# File path names are not actually text, anyway (since there is no
+# mechanism to enforce any constraint that the basename of a
+# subdirectory has the same character encoding as the basename of its
+# parent).  The practical effect is that, depending on the way a
+# particular system is configured and the content of its filesystem,
+# passing all the file names in the system through "sort" may generate
+# character encoding errors in text-based tools like "sort".  To avoid
+# this, we set LC_ALL=C.  This will, presumably, not work perfectly on
+# systems where LC_ALL is not the way to do locale configuration or
+# some other setting can override this.
+LC_ALL=C
+export LC_ALL
+
 
 usage="\
 Usage: $0 [--findoptions='-option1 -option2...']
@@ -75,7 +88,7 @@ done
 
 case "${dbformat:+yes}_${old}" in
     yes_yes)
-       echo "The --dbformat and --old cannot both be specified." >&2
+       echo "The --dbformat and --old-format cannot both be specified." >&2
        exit 1
        ;;
        *)
@@ -186,12 +199,14 @@ test -z "$PRUNEREGEX" &&
 : address@hidden@}
 
 # Directory to hold intermediate files.
-if test -d /var/tmp; then
-  : ${TMPDIR=/var/tmp}
-elif test -d /usr/tmp; then
-  : ${TMPDIR=/usr/tmp}
-else
-  : ${TMPDIR=/tmp}
+if test -z "$TMPDIR"; then
+  if test -d /var/tmp; then
+    : ${TMPDIR=/var/tmp}
+  elif test -d /usr/tmp; then
+    : ${TMPDIR=/usr/tmp}
+  else
+    : ${TMPDIR=/tmp}
+  fi
 fi
 export TMPDIR
 
@@ -320,7 +335,7 @@ if [ "$myuid" = 0 ]; then
     exit $?
   fi
 fi
-} | $sort -f | $frcode $frcode_options > $LOCATE_DB.n
+} | $sort | $frcode $frcode_options > $LOCATE_DB.n
 then
     : OK so far
     true
@@ -387,7 +402,7 @@ if test -n "$NETPATHS"; then
     exit $?
   fi
 fi
-} | tr / '\001' | $sort -f | tr '\001' / > "$filelist"
+} | tr / '\001' | $sort | tr '\001' / > "$filelist"
 
 # Compute the (at most 128) most common bigrams in the file list.
 $bigram $bigram_opts < $filelist | sort | uniq -c | sort -nr |
-- 
2.1.4
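
A note on the TMPDIR hunk, offered as a sketch rather than a
correction: "test -z" is true both when TMPDIR is unset and when it
is set to the empty string, but "${TMPDIR=default}" assigns only in
the unset case.  If that reading of the hunk is right, a set-but-empty
TMPDIR would stay empty after the fallback block.  The ":=" form
covers both cases.  The variable names after_plain, after_colon and
after_set, and the path /my/scratch, are hypothetical and used only
for illustration.

```shell
#!/bin/sh
# Illustrative sketch only; not code from the patch.

# With TMPDIR set but empty, "=" leaves the variable untouched.
TMPDIR=
: "${TMPDIR=/tmp}"
after_plain=$TMPDIR

# ":=" also replaces an empty value.
TMPDIR=
: "${TMPDIR:=/tmp}"
after_colon=$TMPDIR

# A non-empty TMPDIR supplied by the caller is honoured by either form.
TMPDIR=/my/scratch
: "${TMPDIR:=/tmp}"
after_set=$TMPDIR

printf 'plain=[%s] colon=[%s] set=[%s]\n' \
    "$after_plain" "$after_colon" "$after_set"
```

Run as written, this prints plain=[] colon=[/tmp] set=[/my/scratch].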