[Findutils-patches] [PATCH] updatedb: Remove support for the old pre-4.0


From:	James Youngman
Subject:	[Findutils-patches] [PATCH] updatedb: Remove support for the old pre-4.0 database format.
Date:	Sat, 9 Jan 2016 22:26:08 +0000
* locate/testsuite/Makefile.am (EXTRA_DIST_EXP): Remove
locate.gnu/old_prefix.exp and locate.gnu/oldformat.exp.
(EXTRA_DIST_XO): Remove locate.gnu/old_prefix.xo and
locate.gnu/oldformat.xo.
* doc/find.texi (Database Formats): Remove the warning about old
versions of locate failing to read the LOCATE02 database format.
Mention that the slocate database format is also supported.
(Old Database Format): Point out that updatedb will no longer
produce the old format.
(Invoking updatedb): Remove mention of the --old-format option.
Remove mention of --dbformat=old.
(Long File Name Bugs with Old-Format Databases): Remove this
section.
* locate/updatedb.sh: remove support for --dbformat=old and
--old-format.
(checkbinary): Don't look for the bigram and code binaries.
* locate/updatedb.1: Explain that support for the old database
format has been removed from updatedb and will shortly be removed
from locate also.  Remove the documentation for the removed
option --old-format and mention of --dbformat=old.
* locate/code.c: remove since this program was only used to
generate old-format databases.
* locate/bigram.c: remove since this program was only used to
generate old-format databases.
* po/POTFILES.in: Remove bigram.c and code.c.
* locate/word_io.c (putword): Remove this function, since it was
only needed for making old-format databases.
* find/find.1 (NON-BUGS): Don't mention bigram.c and code.c in the
example.
* locate/locatedb.h: Remove declaration of putword, which has been
deleted.
* locate/Makefile.am (libexec_PROGRAMS): Remove bigram and code
(since they were only used to generate old-format databases).
(updatedb): Don't substitute @bigram@ and @code@.
(code_SOURCES): Delete.
* locate/testsuite/locate.gnu/old_prefix.exp: delete test case for
the old database format.
* locate/testsuite/locate.gnu/old_prefix.xo: Likewise.
* locate/testsuite/locate.gnu/oldformat.exp: Likewise.
* locate/testsuite/locate.gnu/oldformat.xo: Likewise.
* TODO: manpages for bigram and code are no longer needed.
* NEWS: Mention these changes.
---
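Note for callers of updatedb: scripts that still pass --old-format or
--dbformat=old will need to change. The following is a minimal sketch of a
pre-flight check, assuming a wrapper script that holds the requested format
in a variable. The function name check_dbformat and its messages are
hypothetical and are not part of updatedb.sh; only the format names
LOCATE02, slocate and old come from this patch.

```shell
#!/bin/sh
# Hypothetical helper (not part of findutils): report whether a requested
# database format is still one that updatedb can generate after this change.
check_dbformat() {
    case "$1" in
        LOCATE02|slocate)
            # Formats that updatedb still produces.
            echo "supported: $1"
            ;;
        old)
            # Generation of the pre-4.0 format is removed by this patch.
            echo "removed: old (use LOCATE02 or slocate)"
            return 1
            ;;
        *)
            echo "unknown: $1"
            return 2
            ;;
    esac
}
```

A wrapper could then run something like
check_dbformat "$fmt" && updatedb --dbformat="$fmt"
so that a stale configuration is reported before updatedb is invoked.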
 NEWS                                       |   6 +
 TODO                                       |   2 +-
 doc/find.texi                              | 125 +++----------
 find/find.1                                |   2 +-
 locate/Makefile.am                         |  12 +-
 locate/bigram.c                            | 140 --------------
 locate/code.c                              | 285 -----------------------------
 locate/locatedb.h                          |   4 -
 locate/testsuite/Makefile.am               |   6 +-
 locate/testsuite/locate.gnu/old_prefix.exp |  13 --
 locate/testsuite/locate.gnu/old_prefix.xo  |   5 -
 locate/testsuite/locate.gnu/oldformat.exp  |  12 --
 locate/testsuite/locate.gnu/oldformat.xo   |   1 -
 locate/updatedb.1                          |  26 +--
 locate/updatedb.sh                         | 143 +++------------
 locate/word_io.c                           |  23 ---
 po/POTFILES.in                             |   2 -
 17 files changed, 70 insertions(+), 737 deletions(-)
 delete mode 100644 locate/bigram.c
 delete mode 100644 locate/code.c
 delete mode 100644 locate/testsuite/locate.gnu/old_prefix.exp
 delete mode 100644 locate/testsuite/locate.gnu/old_prefix.xo
 delete mode 100644 locate/testsuite/locate.gnu/oldformat.exp
 delete mode 100644 locate/testsuite/locate.gnu/oldformat.xo
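
For readers unfamiliar with the format whose generator is removed here, the
count-byte rule described in doc/find.texi (values 0 through 28 encode
offset-differential counts -14 through 14, and 0x1e marks a long count) can
be sketched as below. This is an illustration only, assuming the byte is
supplied as a decimal number; the function name old_count_byte is
hypothetical and this is not how locate itself reads databases.

```shell
#!/bin/sh
# Sketch only: classify the first byte of an old-format database entry.
# The byte is passed as a decimal value (0-255). The constants 14 and 30
# come from the format description; the function name is an assumption.
old_count_byte() {
    b=$1
    if [ "$b" -eq 30 ]; then
        # 0x1e: a long count follows, stored as a host-order machine word.
        echo "long"
    elif [ "$b" -ge 0 ] && [ "$b" -le 28 ]; then
        # Short count, stored with an offset of 14 to keep it nonnegative.
        echo $(( $b - 14 ))
    else
        # Not a count byte (for example ASCII text or a bigram code).
        echo "other"
    fi
}

old_count_byte 0     # prints -14
old_count_byte 14    # prints 0
old_count_byte 28    # prints 14
old_count_byte 30    # prints long
```

The long-count case is deliberately left undecoded, since its width and byte
order depend on the machine that wrote the database.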

diff --git a/NEWS b/NEWS
index 8865b8e..e50bd9d 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,12 @@ GNU findutils NEWS - User visible changes.      -*- outline -*- (allout)
 
 ** Changes to locate / updatedb
 
+Support for generating old-format databases (with updatedb
+--old-format or updatedb --dbformat=old) has been removed.  The old
+database format was deprecated in 2007 (and updatedb has warned about
+this since that time).  The locate program will still read old-format
+databases, though this support also will be removed.
+
 The updatedb script now operates in the C locale only.  This means
 that character encoding issues are now not likely to cause sort to
 fail.  It also honours the TMPDIR environment variable if that was
diff --git a/TODO b/TODO
index 50fe730..54ede6f 100644
--- a/TODO
+++ b/TODO
@@ -2,7 +2,7 @@
 * Internationalization
 ** updatedb.sh should be internationalized
 
-* man pages for frcode, bigram, and code
+* man page for frcode
 Perhaps a better description in texi pages as well.
 
 * Add option for find to sort output in lexical order for use for updatedb
diff --git a/doc/find.texi b/doc/find.texi
index c61bbd0..fdeaefa 100644
--- a/doc/find.texi
+++ b/doc/find.texi
@@ -2912,18 +2912,17 @@ directory trees when the databases were last updated.  The file name
 database format changed starting with GNU @code{locate} version 4.0 to
 allow machines with different byte orderings to share the databases.
 
-GNU @code{locate} can read both the old and new database formats.
-However, old versions of @code{locate} (on other Unix systems, or GNU
-@code{locate} before version 4.0) produce incorrect results if run
-against a database in something other than the old format.
-
-Support for the old database format will eventually be discontinued,
-first in @code{updatedb} and later in @code{locate}.
+GNU @code{locate} can read both the old pre-findutils-4.0 database
+format and the @samp{LOCATE02} database format.  Support for the old
+database format will shortly be removed from @code{locate}.  It has
+already been removed from @code{updatedb}.
 
 If you run @samp{locate --statistics}, the resulting summary indicates
 the type of each @code{locate} database.   You select which database
 format @code{updatedb} will use with the @samp{--dbformat} option.
 
+The @samp{slocate} database format is very similar to @samp{LOCATE02}
+and is also supported (in both @code{updatedb} and @code{locate}).
 
 @menu
 * LOCATE02 Database Format::
@@ -3024,21 +3023,20 @@ interpreted as for the GNU LOCATE02 format.
 @subsection Old Database Format
 
 The old database format is used by Unix @code{locate} and @code{find}
-programs and earlier releases of the GNU ones.  @code{updatedb}
-produces this format if given the @samp{--old-format} option.
-
-@code{updatedb} runs programs called @code{bigram} and @code{code} to
-produce old-format databases.  The old format differs from the new one
-in the following ways.  Instead of each entry starting with an
-offset-differential count byte and ending with a null, byte values
-from 0 through 28 indicate offset-differential counts from -14 through
-14.  The byte value indicating that a long offset-differential count
-follows is 0x1e (30), not 0x80.  The long counts are stored in host
-byte order, which is not necessarily network byte order, and host
-integer word size, which is usually 4 bytes.  They also represent a
-count 14 less than their value.  The database lines have no
-termination byte; the start of the next line is indicated by its first
-byte having a value <= 30.
+programs and pre-4.0 releases of GNU findutils.  @code{locate}
+understands this format, though @code{updatedb} will no longer produce
+it.
+
+The old format differs from @samp{LOCATE02} in the following ways.
+Instead of each entry starting with an offset-differential count byte
+and ending with a null, byte values from 0 through 28 indicate
+offset-differential counts from -14 through 14.  The byte value
+indicating that a long offset-differential count follows is 0x1e (30),
+not 0x80.  The long counts are stored in host byte order, which is not
+necessarily network byte order, and host integer word size, which is
+usually 4 bytes.  They also represent a count 14 less than their
+value.  The database lines have no termination byte; the start of the
+next line is indicated by its first byte having a value <= 30.
 
 In addition, instead of starting with a dummy entry, the old database
 format starts with a 256 byte table containing the 128 most common
@@ -3049,17 +3047,13 @@ offset-differential count coding makes these databases 20-25% smaller
 than the new format, but makes them not 8-bit clean.  Any byte in a
 file name that is in the ranges used for the special codes is replaced
 in the database by a question mark, which not coincidentally is the
-shell wildcard to match a single character.
+shell wildcard to match a single character. The old format therefore
+cannot faithfully store entries with non-ASCII characters.
 
-The old format therefore cannot faithfully store entries with
-non-ASCII characters. It therefore should not be used in
-internationalised environments.  That is, most installations should
-not use it.
-
-Because the long counts are stored by the @code{code} program as
+Because the long counts are stored as
 native-order machine words, the database format is not easily used in
 environments which differ in terms of byte order.  If locate databases
-are to be shared between machines, the LOCATE02 database format should
+are to be shared between machines, the @samp{LOCATE02} database format should
 be used.  This has other benefits as discussed above.  However, the
 length of the filename currently being processed can normally be used
 to place reasonable limits on the long counts and so this information
@@ -3098,16 +3092,6 @@ the newline character, meaning that parts of file names containing
 newlines will be incorrectly sorted.  This can result in both
 incorrect matches and incorrect failures to match.
 
-On the other hand, if you are using the old database format, file
-names with embedded newlines are not correctly handled.  There is no
-technical limitation which enforces this, it's just that the
address@hidden program has not been updated to support lists of file
-names separated by nulls.
-
-So, if you are using the new database format (this is the default) and
-your system uses GNU @code{sort}, newlines will be correctly handled
-at all times.  Otherwise, newlines may not be correctly handled.
-
 @node File Permissions
 @chapter File Permissions
 
@@ -3631,24 +3615,12 @@ The user to search network directories as, using @code{su}.  Default
 @code{user} is @code{daemon}.  You can also use the environment variable
 @code{NETUSER} to set this user.
 
-@item --old-format
-Generate a @code{locate} database in the old format, for compatibility
-with versions of @code{locate} other than GNU @code{locate}.  Using
-this option means that @code{locate} will not be able to properly
-handle non-ASCII characters in file names (that is, file names
-containing characters which have the eighth bit set, such as many of
-the characters from the ISO-8859-1 character set).  @xref{Database
-Formats}, for a detailed description of the supported database
-formats.
-
 @item address@hidden
 Generate the locate database in format @code{FORMAT}.  Supported
-database formats include @code{LOCATE02} (which is the default),
-@code{old} and @code{slocate}.  The @code{old} format exists for
-compatibility with implementations of @code{locate} on other Unix
-systems.  The @code{slocate} format exists for compatibility with
-@code{slocate}.  @xref{Database Formats}, for a detailed description
-of each format.
+database formats include @code{LOCATE02} (which is the default) and
+@code{slocate}.  The @code{slocate} format exists for compatibility
+with @code{slocate}. @xref{Database Formats}, for a detailed
+description of each format.
 
 @item --help
 Print a summary of the command line usage and exit.
@@ -5377,47 +5349,6 @@ resolved by using @code{locate}'s @samp{-0} option, this still leaves
 the race condition problems associated with @samp{find @dots{} -print0}.
 There is no way to avoid these problems in the case of @code{locate}.
 
address@hidden Long File Name Bugs with Old-Format Databases
-Old versions of @code{locate} have a bug in the way that old-format
-databases are read.  This bug affects the following versions of
-@code{locate}:
-
-@enumerate
-@item All releases prior to 4.2.31
-@item All 4.3.x releases prior to 4.3.7
-@end enumerate
-
-The affected versions of @code{locate} read file names into a
-fixed-length 1026 byte buffer, allocated on the heap.  This buffer is
-not extended if file names are too long to fit into the buffer.  No
-range checking on the length of the filename is performed.  This could
-in theory lead to a privilege escalation attack.  Findutils versions
-4.3.0 to 4.3.6 are also affected.
-
-On systems using the old database format and affected versions of
-@code{locate}, carefully-chosen long file names could in theory allow
-malicious users to run code of their choice as any user invoking
-locate.
-
-If remote users can choose the names of files stored on your system,
-and these files are indexed by @code{updatedb}, this may be a remote
-security vulnerability.  Findutils version 4.2.31 and findutils
-version 4.3.7 include fixes for this problem.  The @code{updatedb},
address@hidden and @code{code} programs do no appear to be affected.
-
-If you are also using GNU coreutils, you can use the following command
-to determine the length of the longest file name on a given system:
-
-@example
-find / -print0 | tr -c '\0' 'x' | tr '\0' '\n' | wc -L
-@end example
-
-Although this problem is significant, the old database format is not
-the default, and use of the old database format is not common.  Most
-installations and most users will not be affected by this problem.
-
-
-
 @node Security Summary
 @section Summary
 
diff --git a/find/find.1 b/find/find.1
index 7827aca..f4e8473 100644
--- a/find/find.1
+++ b/find/find.1
@@ -2212,7 +2212,7 @@ resulting in
 actually receiving a command line like this:
 .nf
 .
-.B find . \-name bigram.c code.c frcode.c locate.c \-print
+.B find . \-name frcode.c locate.c word_io.c \-print
 .
 .fi
 That command is of course not going to work.  Instead of doing things
diff --git a/locate/Makefile.am b/locate/Makefile.am
index ba30d01..7b9a8f6 100644
--- a/locate/Makefile.am
+++ b/locate/Makefile.am
@@ -4,12 +4,9 @@ AM_CFLAGS = $(WARN_CFLAGS)
 LOCATE_DB = $(localstatedir)/locatedb
 localedir = $(datadir)/locale
 
-AM_INSTALLCHECK_STD_OPTIONS_EXEMPT = \
-       frcode$(EXEEXT) \
-       code$(EXEEXT) \
-       bigram$(EXEEXT)
+AM_INSTALLCHECK_STD_OPTIONS_EXEMPT = frcode$(EXEEXT)
 bin_PROGRAMS = locate
-libexec_PROGRAMS = frcode code bigram
+libexec_PROGRAMS = frcode
 bin_SCRIPTS = updatedb
 man_MANS = locate.1 updatedb.1 locatedb.5
 BUILT_SOURCES = dblocation.texi
@@ -18,7 +15,6 @@ CLEANFILES = updatedb
 
 DISTCLEANFILES = dblocation.texi
 locate_SOURCES = locate.c word_io.c
-code_SOURCES = code.c word_io.c
 nodist_locate_TEXINFOS = dblocation.texi
 
 AM_CPPFLAGS = -I$(top_srcdir)/lib -I../gl/lib -I$(top_srcdir)/gl/lib -DLOCATE_DB=\"$(LOCATE_DB)\" -DLOCALEDIR=\"$(localedir)\"
@@ -34,8 +30,6 @@ updatedb: updatedb.sh Makefile
        rm -f $@
        find=`echo find|sed '$(transform)'`; \
        frcode=`echo frcode|sed '$(transform)'`; \
-       bigram=`echo bigram|sed '$(transform)'`; \
-       code=`echo code|sed '$(transform)'`; \
        sed \
        -e "s,@""bindir""@,$(bindir)," \
        -e "s,@""libexecdir""@,$(libexecdir)," \
@@ -44,8 +38,6 @@ updatedb: updatedb.sh Makefile
        -e "s,@""PACKAGE_NAME""@,$(PACKAGE_NAME)," \
        -e "s,@""find""@,$${find}," \
        -e "s,@""frcode""@,$${frcode}," \
-       -e "s,@""bigram""@,$${bigram}," \
-       -e "s,@""code""@,$${code}," \
        -e "s,@""SORT""@,$(SORT)," \
        -e "s,@""SORT_SUPPORTS_Z""@,$(SORT_SUPPORTS_Z)," \
        $(srcdir)/updatedb.sh > $@
diff --git a/locate/bigram.c b/locate/bigram.c
deleted file mode 100644
index 56df447..0000000
--- a/locate/bigram.c
+++ /dev/null
@@ -1,140 +0,0 @@
-/* bigram -- list bigrams for locate
-   Copyright (C) 1994, 2007, 2009-2011, 2016 Free Software Foundation,
-   Inc.
-
-   This program is free software: you can redistribute it and/or modify
-   it under the terms of the GNU General Public License as published by
-   the Free Software Foundation, either version 3 of the License, or
-   (at your option) any later version.
-
-   This program is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with this program.  If not, see <http://www.gnu.org/licenses/>.
-*/
-
-/* Usage: bigram < text > bigrams
-   Use `code' to encode a file using this output.
-
-   Read a file from stdin and write out the bigrams (pairs of
-   adjacent characters), one bigram per line, to stdout.  To reduce
-   needless duplication in the output, it starts finding the
-   bigrams on each input line at the character where that line
-   first differs from the previous line (i.e., in the ASCII
-   remainder).  Therefore, the input should be sorted in order to
-   get the least redundant output.
-
-   Written by James A. Woods <address@hidden>.
-   Modified by David MacKenzie <address@hidden>.  */
-
-/* config.h must always be included first. */
-#include <config.h>
-
-/* system headers. */
-#include <errno.h>
-#include <stdio.h>
-#include <locale.h>
-#include <string.h>
-#include <stdlib.h>
-#include <sys/types.h>
-
-/* gnulib headers. */
-#include "closeout.h"
-#include "gettext.h"
-#include "progname.h"
-#include "xalloc.h"
-#include "error.h"
-
-/* find headers would go here but we don't need any. */
-
-
-/* We use gettext because for example xmalloc may issue an error message. */
-#if ENABLE_NLS
-# include <libintl.h>
-# define _(Text) gettext (Text)
-#else
-# define _(Text) Text
-#define textdomain(Domain)
-#define bindtextdomain(Package, Directory)
-#endif
-
-
-/* Return the length of the longest common prefix of strings S1 and S2. */
-
-static int
-prefix_length (char *s1, char *s2)
-{
-  register char *start;
-
-  for (start = s1; *s1 == *s2 && *s1 != '\0'; s1++, s2++)
-    ;
-  return s1 - start;
-}
-
-int
-main (int argc, char **argv)
-{
-  char *path;                  /* The current input entry.  */
-  char *oldpath;               /* The previous input entry.  */
-  size_t pathsize, oldpathsize;        /* Amounts allocated for them.  */
-  int line_len;                        /* Length of input line.  */
-
-  if (argv[0])
-    set_program_name (argv[0]);
-  else
-    set_program_name ("bigram");
-
-#ifdef HAVE_SETLOCALE
-  setlocale (LC_ALL, "");
-#endif
-  bindtextdomain (PACKAGE, LOCALEDIR);
-  textdomain (PACKAGE);
-
-  (void) argc;
-  if (atexit (close_stdout))
-    {
-      error (EXIT_FAILURE, errno, _("The atexit library function failed"));
-    }
-
-  pathsize = oldpathsize = 1026; /* Increased as necessary by getline.  */
-  path = xmalloc (pathsize);
-  oldpath = xmalloc (oldpathsize);
-
-  /* Set to empty string, to force the first prefix count to 0.  */
-  oldpath[0] = '\0';
-
-  while ((line_len = getline (&path, &pathsize, stdin)) > 0)
-    {
-      register int count;      /* The prefix length.  */
-      register int j;          /* Index into input line.  */
-
-      path[line_len - 1] = '\0'; /* Remove the newline. */
-
-      /* Output bigrams in the remainder only. */
-      count = prefix_length (oldpath, path);
-      for (j = count; path[j] != '\0' && path[j + 1] != '\0'; j += 2)
-       {
-         putchar (path[j]);
-         putchar (path[j + 1]);
-         putchar ('\n');
-       }
-
-      {
-       /* Swap path and oldpath and their sizes.  */
-       char *tmppath = oldpath;
-       size_t tmppathsize = oldpathsize;
-       oldpath = path;
-       oldpathsize = pathsize;
-       path = tmppath;
-       pathsize = tmppathsize;
-      }
-    }
-
-  free (path);
-  free (oldpath);
-
-  return 0;
-}
diff --git a/locate/code.c b/locate/code.c
deleted file mode 100644
index 92f267c..0000000
--- a/locate/code.c
+++ /dev/null
@@ -1,285 +0,0 @@
-/* code -- bigram- and front-encode filenames for locate
-   Copyright (C) 1994, 2005, 2007-2008, 2010-2011, 2016 Free Software
-   Foundation, Inc.
-
-   This program is free software: you can redistribute it and/or modify
-   it under the terms of the GNU General Public License as published by
-   the Free Software Foundation, either version 3 of the License, or
-   (at your option) any later version.
-
-   This program is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with this program.  If not, see <http://www.gnu.org/licenses/>.
-*/
-
-/* Compress a sorted list.
-   Works with `find' to encode a filename database to save space
-   and search time.
-
-   Usage:
-
-   bigram < file_list > bigrams
-   process-bigrams > most_common_bigrams
-   code most_common_bigrams < file_list > squeezed_list
-
-   Uses `front compression' (see ";login:", March 1983, p. 8).
-   The output begins with the 128 most common bigrams.
-   After that, the output format is, for each line,
-   an offset (from the previous line) differential count byte
-   followed by a (partially bigram-encoded) ASCII remainder.
-   The output lines have no terminating byte; the start of the next line
-   is indicated by its first byte having a value <= 30.
-
-   The encoding of the output bytes is:
-
-   0-28                likeliest differential counts + offset (14) to make nonnegative
-   30          escape code for out-of-range count to follow in next halfword
-   128-255      bigram codes (the 128 most common, as determined by `updatedb')
-   32-127       single character (printable) ASCII remainder
-
-   Written by James A. Woods <address@hidden>.
-   Modified by David MacKenzie <address@hidden>.  */
-
-/* config.h should always be included first. */
-#include <config.h>
-
-/* system headers. */
-#include <errno.h>
-#include <stdbool.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <sys/types.h>
-
-/* gnulib headers. */
-#include "closeout.h"
-#include "error.h"
-#include "gettext.h"
-#include "progname.h"
-#include "xalloc.h"
-
-/* find headers. */
-#include "findutils-version.h"
-#include "locatedb.h"
-
-#if ENABLE_NLS
-# include <libintl.h>
-# define _(Text) gettext (Text)
-#else
-# define _(Text) Text
-#define textdomain(Domain)
-#define bindtextdomain(Package, Directory)
-#endif
-
-
-#ifndef ATTRIBUTE_NORETURN
-# define ATTRIBUTE_NORETURN __attribute__ ((__noreturn__))
-#endif
-
-
-/* The 128 most common bigrams in the file list, padded with NULs
-   if there are fewer.  */
-static char bigrams[257] = {0};
-
-/* Return the offset of PATTERN in STRING, or -1 if not found. */
-
-static int
-strindex (char *string, char *pattern)
-{
-  register char *s;
-
-  for (s = string; *s != '\0'; s++)
-    /* Fast first char check. */
-    if (*s == *pattern)
-      {
-       register char *p2 = pattern + 1, *s2 = s + 1;
-       while (*p2 != '\0' && *p2 == *s2)
-         p2++, s2++;
-       if (*p2 == '\0')
-         return s2 - strlen (pattern) - string;
-      }
-  return -1;
-}
-
-/* Return the length of the longest common prefix of strings S1 and S2. */
-
-static int
-prefix_length (char *s1, char *s2)
-{
-  register char *start;
-
-  for (start = s1; *s1 == *s2 && *s1 != '\0'; s1++, s2++)
-    ;
-  return s1 - start;
-}
-
-extern char *version_string;
-
-static void
-usage (FILE *stream)
-{
-  fprintf (stream, _("\
-Usage: %s [--version | --help]\n\
-or     %s most_common_bigrams < file-list > locate-database\n"),
-          program_name, program_name);
-  fputs (_("\nReport bugs to <address@hidden>.\n"), stream);
-}
-
-
-static void inerr (const char *filename) ATTRIBUTE_NORETURN;
-static void outerr (void)                 ATTRIBUTE_NORETURN;
-
-static void
-inerr (const char *filename)
-{
-  error (EXIT_FAILURE, errno, "%s", filename);
-  /*NOTREACHED*/
-  abort ();
-}
-
-static void
-outerr (void)
-{
-  error (EXIT_FAILURE, errno, _("write error"));
-  /*NOTREACHED*/
-  abort ();
-}
-
-
-int
-main (int argc, char **argv)
-{
-  char *path;                  /* The current input entry.  */
-  char *oldpath;               /* The previous input entry.  */
-  size_t pathsize, oldpathsize;        /* Amounts allocated for them.  */
-  int count, oldcount, diffcount; /* Their prefix lengths & the difference. */
-  char bigram[3];              /* Bigram to search for in table.  */
-  int code;                    /* Index of `bigram' in bigrams table.  */
-  FILE *fp;                    /* Most common bigrams file.  */
-  int line_len;                        /* Length of input line.  */
-
-  set_program_name (argv[0]);
-  if (atexit (close_stdout))
-    {
-      error (EXIT_FAILURE, errno, _("The atexit library function failed"));
-    }
-
-  bigram[2] = '\0';
-
-  if (argc != 2)
-    {
-      usage (stderr);
-      return 2;
-    }
-
-  if (0 == strcmp (argv[1], "--help"))
-    {
-      usage (stdout);
-      return 0;
-    }
-  else if (0 == strcmp (argv[1], "--version"))
-    {
-      display_findutils_version ("code");
-      return 0;
-    }
-
-  fp = fopen (argv[1], "r");
-  if (fp == NULL)
-    {
-      fprintf (stderr, "%s: ", argv[0]);
-      perror (argv[1]);
-      return 1;
-    }
-
-  pathsize = oldpathsize = 1026; /* Increased as necessary by getline.  */
-  path = xmalloc (pathsize);
-  oldpath = xmalloc (oldpathsize);
-
-  /* Set to empty string, to force the first prefix count to 0.  */
-  oldpath[0] = '\0';
-  oldcount = 0;
-
-  /* Copy the list of most common bigrams to the output,
-     padding with NULs if there are <128 of them.  */
-  if (NULL == fgets (bigrams, 257, fp))
-    inerr (argv[1]);
-
-  if (256 != fwrite (bigrams, 1, 256, stdout))
-     outerr ();
-
-  if (EOF == fclose (fp))
-     inerr (argv[1]);
-
-  while ((line_len = getline (&path, &pathsize, stdin)) > 0)
-    {
-      char *pp;
-
-      path[line_len - 1] = '\0'; /* Remove newline. */
-
-      /* Squelch unprintable chars in path so as not to botch decoding.  */
-      for (pp = path; *pp != '\0'; pp++)
-       {
-         if (!(*pp >= 040 && *pp < 0177))
-           *pp = '?';
-       }
-
-      count = prefix_length (oldpath, path);
-      diffcount = count - oldcount;
-      oldcount = count;
-      /* If the difference is small, it fits in one byte;
-        otherwise, two bytes plus a marker noting that fact.  */
-      if (diffcount < -LOCATEDB_OLD_OFFSET || diffcount > LOCATEDB_OLD_OFFSET)
-       {
-         if (EOF ==- putc (LOCATEDB_OLD_ESCAPE, stdout))
-           outerr ();
-
-         if (!putword (stdout,
-                       diffcount+LOCATEDB_OLD_OFFSET,
-                       GetwordEndianStateNative))
-           outerr ();
-       }
-      else
-       {
-         if (EOF == putc (diffcount + LOCATEDB_OLD_OFFSET, stdout))
-           outerr ();
-       }
-
-      /* Look for bigrams in the remainder of the path.  */
-      for (pp = path + count; *pp != '\0'; pp += 2)
-       {
-         if (pp[1] == '\0')
-           {
-             /* No bigram is possible; only one char is left.  */
-             putchar (*pp);
-             break;
-           }
-         bigram[0] = *pp;
-         bigram[1] = pp[1];
-         /* Linear search for specific bigram in string table. */
-         code = strindex (bigrams, bigram);
-         if (code % 2 == 0)
-           putchar ((code / 2) | 0200); /* It's a common bigram.  */
-         else
-           fputs (bigram, stdout); /* Write the text as printable ASCII.  */
-       }
-
-      {
-       /* Swap path and oldpath and their sizes.  */
-       char *tmppath = oldpath;
-       size_t tmppathsize = oldpathsize;
-       oldpath = path;
-       oldpathsize = pathsize;
-       path = tmppath;
-       pathsize = tmppathsize;
-      }
-    }
-
-  free (path);
-  free (oldpath);
-
-  return 0;
-}
diff --git a/locate/locatedb.h b/locate/locatedb.h
index 7fac71d..a46baa0 100644
--- a/locate/locatedb.h
+++ b/locate/locatedb.h
@@ -63,10 +63,6 @@ int getword (FILE *fp, const char *filename,
             size_t maxvalue,
             GetwordEndianState *endian_state_flag);
 
-bool putword (FILE *fp, int word,
-             GetwordEndianState endian_state_flag);
-
-
 #define SLOCATE_DB_MAGIC_LEN 2
 
 #endif /* !INC_LOCATEDB_H */
diff --git a/locate/testsuite/Makefile.am b/locate/testsuite/Makefile.am
index 4a4b001..1ce5284 100644
--- a/locate/testsuite/Makefile.am
+++ b/locate/testsuite/Makefile.am
@@ -41,8 +41,6 @@ locate.gnu/slocate.exp \
 locate.gnu/notexists1.exp \
 locate.gnu/notexists2.exp \
 locate.gnu/notexists3.exp \
-locate.gnu/old_prefix.exp \
-locate.gnu/oldformat.exp \
 locate.gnu/space1st.exp \
 locate.gnu/sv-bug-14535.exp \
 locate.gnu/exceedshort.exp
@@ -63,9 +61,7 @@ locate.gnu/exists3.xo \
 locate.gnu/slocate.xo \
 locate.gnu/notexists1.xo \
 locate.gnu/notexists2.xo \
-locate.gnu/notexists3.xo \
-locate.gnu/old_prefix.xo \
-locate.gnu/oldformat.xo
+locate.gnu/notexists3.xo
 
 EXTRA_DIST = $(EXTRA_DIST_EXP) $(EXTRA_DIST_XO) $(EXTRA_DIST_XI)
 
diff --git a/locate/testsuite/locate.gnu/old_prefix.exp b/locate/testsuite/locate.gnu/old_prefix.exp
deleted file mode 100644
index e21cc61..0000000
--- a/locate/testsuite/locate.gnu/old_prefix.exp
+++ /dev/null
@@ -1,13 +0,0 @@
-set tmp "tmp"
-exec rm -rf $tmp
-exec mkdir $tmp
-exec mkdir $tmp/subdir
-exec touch $tmp/subdir/________________________________________________________________________________fred1
-exec touch $tmp/subdir/________________________________________________________________________________fred2
-exec touch $tmp/subdir/________________________________________________________________________________fred3
-exec touch $tmp/subdir/________________________________________________________________________________fred4
-
-locate_start p "--changecwd=. --output=$tmp/locatedb --old-format  --localpaths=tmp/subdir 2>/dev/null" "--database=$tmp/locatedb tmp" {}
-
-
-exec rm -rf $tmp
diff --git a/locate/testsuite/locate.gnu/old_prefix.xo b/locate/testsuite/locate.gnu/old_prefix.xo
deleted file mode 100644
index 909b8e7..0000000
--- a/locate/testsuite/locate.gnu/old_prefix.xo
+++ /dev/null
@@ -1,5 +0,0 @@
-tmp/subdir
-tmp/subdir/________________________________________________________________________________fred1
-tmp/subdir/________________________________________________________________________________fred2
-tmp/subdir/________________________________________________________________________________fred3
-tmp/subdir/________________________________________________________________________________fred4
diff --git a/locate/testsuite/locate.gnu/oldformat.exp b/locate/testsuite/locate.gnu/oldformat.exp
deleted file mode 100644
index a85c8b9..0000000
--- a/locate/testsuite/locate.gnu/oldformat.exp
+++ /dev/null
@@ -1,12 +0,0 @@
-# A basic test for the old database format.  We need this test because (among
-# other reasons) the updatedb script only uses our mktemp replacement when
-# it needs to run bigram/code.
-set tmp "tmp"
-exec rm -rf $tmp
-exec mkdir $tmp
-exec mkdir $tmp/subdir
-exec touch $tmp/subdir/fred
-# Redirect stderr to /dev/null to throw away the warning message about using
-# the old format, because otherwise the presence of the error message would
-# cause locate_start to signal a test case failure.
-locate_start p "--changecwd=. --output=$tmp/locatedb --old-format  --localpaths=tmp/subdir/ 2>/dev/null" "--database=$tmp/locatedb -e fred" {}
diff --git a/locate/testsuite/locate.gnu/oldformat.xo b/locate/testsuite/locate.gnu/oldformat.xo
deleted file mode 100644
index fdda926..0000000
--- a/locate/testsuite/locate.gnu/oldformat.xo
+++ /dev/null
@@ -1 +0,0 @@
-tmp/subdir/fred
diff --git a/locate/updatedb.1 b/locate/updatedb.1
index ebba3da..518bfd8 100644
--- a/locate/updatedb.1
+++ b/locate/updatedb.1
@@ -26,19 +26,13 @@ Users can select which databases \fBlocate\fP searches using an
 environment variable or command line option; see \fBlocate\fP(1).
 Databases cannot be concatenated together.
 .P
-The file name database format changed starting with GNU
-.B find
-and
-.B locate
-version 4.0 to allow machines with different byte orderings to share
-the databases.  The new GNU
-.B locate
-can read both the old and new database formats.
-However, old versions of
+The @samp{LOCATE02} database format was introduced in GNU findutils
+version 4.0 in order to allow machines with different byte orderings
+to share the databases.  GNU
 .B locate
-and
-.B find
-produce incorrect results if given a new-format database.
+can read both the old and @samp{LOCATE02} database formats, though
+support for the old pre-4.0 database format will be removed shortly.
+
 .SH OPTIONS
 .TP
 .B \-\-findoptions='\fI\-option1 \-option2...\fP'
@@ -88,16 +82,8 @@ The user to search network directories as, using \fBsu\fP(1).
 Default is \fBdaemon\fP.
 You can also use the environment variable \fBNETUSER\fP to set this user.
 .TP
-.B \-\-old\-format
-Create the database in the old format.  This is a synonym for
-.BR \-\-dbformat=old .
-.TP
 .B \-\-dbformat=F
 Create the database in format F.  The default format is called LOCATE02.
-F can be
-.B old
-to select the old database format (this is the same as specifying
-.BR \-\-old\-format ).
 Alternatively the
 .B slocate
 format is also supported.  When the
diff --git a/locate/updatedb.sh b/locate/updatedb.sh
index 3861915..f8d50ee 100644
--- a/locate/updatedb.sh
+++ b/locate/updatedb.sh
@@ -50,11 +50,11 @@ Usage: $0 [--findoptions='-option1 -option2...']
        [--localpaths='dir1 dir2...'] [--netpaths='dir1 dir2...']
        [--prunepaths='dir1 dir2...'] [--prunefs='fs1 fs2...']
        [--output=dbfile] [--netuser=user] [--localuser=user]
-       [--old-format] [--dbformat] [--version] [--help]
+       [--dbformat] [--version] [--help]
 
 Report bugs to <address@hidden>."
 changeto=/
-old=no
+
 for arg
 do
   # If we are unable to fork, the back-tick operator will
@@ -72,7 +72,6 @@ do
     --output) LOCATE_DB="$val" ;;
     --netuser) NETUSER="$val" ;;
     --localuser) LOCALUSER="$val" ;;
-    --old-format) old=yes ;;
     --changecwd)  changeto="$val" ;;
     --dbformat)   dbformat="$val" ;;
     --version) fail=0; echo "$version" || fail=1; exit $fail ;;
@@ -83,51 +82,32 @@ $usage" >&2
   esac
 done
 
-
-
-
-case "${dbformat:+yes}_${old}" in
-    yes_yes)
-       echo "The --dbformat and --old-format cannot both be specified." >&2
-       exit 1
-       ;;
-       *)
-       ;;
+frcode_options=""
+case "$dbformat" in
+    "")
+        # Default, use LOCATE02
+        ;;
+    LOCATE02)
+        ;;
+    slocate)
+        frcode_options="$frcode_options -S 1"
+        ;;
+    *)
+        # The "old" database format is no longer supported.
+        echo "Unsupported locate database format ${dbformat}: Supported formats are:" >&2
+        echo "LOCATE02, slocate" >&2
+        exit 1
 esac
 
-if test "$old" = yes || test "$dbformat" = "old" ; then
-    echo "Warning: future versions of findutils will shortly discontinue support for the old locate database format." >&2
-    old=yes
+
+if @SORT_SUPPORTS_Z@
+then
+    sort="@SORT@ -z"
+    print_option="-print0"
+    frcode_options="$frcode_options -0"
+else
     sort="@SORT@"
     print_option="-print"
-    frcode_options=""
-else
-    frcode_options=""
-    case "$dbformat" in
-       "")
-               # Default, use LOCATE02
-           ;;
-       LOCATE02)
-           ;;
-       slocate)
-           frcode_options="$frcode_options -S 1"
-           ;;
-       *)
-           echo "Unsupported locate database format ${dbformat}: Supported formats are:" >&2
-           echo "LOCATE02, slocate, old" >&2
-           exit 1
-    esac
-
-
-    if @SORT_SUPPORTS_Z@
-    then
-        sort="@SORT@ -z"
-        print_option="-print0"
-        frcode_options="$frcode_options -0"
-    else
-        sort="@SORT@"
-        print_option="-print"
-    fi
 fi
 
 getuid() {
@@ -230,8 +210,6 @@ fi
 # The names of the utilities to run to build the database.
 : ${find:=${BINDIR}/@address@hidden
 : ${frcode:=${LIBEXECDIR}/@address@hidden
-: ${bigram:=${LIBEXECDIR}/@address@hidden
-: ${code:=${LIBEXECDIR}/@address@hidden
 
 make_tempdir () {
     # This implementation is adapted from the GNU Autoconf manual.
@@ -263,7 +241,7 @@ checkbinary () {
     fi
 }
 
-for binary in $find $frcode $bigram $code
+for binary in $find $frcode
 do
   checkbinary $binary
 done
@@ -303,8 +281,6 @@ fi
 rm -f $LOCATE_DB.n
 trap 'rm -f $LOCATE_DB.n; exit' HUP TERM
 
-if test $old = no; then
-# LOCATE02 or slocate format
 if {
 cd "$changeto"
 if test -n "$SEARCHPATHS"; then
@@ -356,73 +332,4 @@ else
   rm -f $LOCATE_DB.n
 fi
 
-else # old
-
-if temp_directory="`make_tempdir`"; then
-    bigrams="${temp_directory}"/bigrams
-    filelist="${temp_directory}"/filelist
-else
-    echo "failed to create temporary directory" >&2
-    exit 1
-fi
-
-rm -f $LOCATE_DB.n
-trap 'rm -f $LOCATE_DB.n; rm -rf "${temp_directory}"; exit' HUP TERM
-
-# Alphabetize subdirectories before file entries using tr.  James Woods says:
-# "to get everything in monotonic collating sequence, to avoid some
-# breakage i'll have to think about."
-{
-cd "$changeto"
-if test -n "$SEARCHPATHS"; then
-  if [ "$LOCALUSER" != "" ]; then
-    # : A5
-    su $LOCALUSER `select_shell $LOCALUSER` -c \
-    "$find $SEARCHPATHS $FINDOPTIONS \
-     \( $prunefs_exp \
-     -type d -regex '$PRUNEREGEX' \) -prune -o $print_option" || exit $?
-  else
-    # : A6
-    $find $SEARCHPATHS $FINDOPTIONS \
-     \( $prunefs_exp \
-     -type d -regex "$PRUNEREGEX" \) -prune -o $print_option || exit $?
-  fi
-fi
-
-if test -n "$NETPATHS"; then
-  myuid=`getuid`
-  if [ "$myuid" = 0 ]; then
-    # : A7
-    su $NETUSER `select_shell $NETUSER` -c \
-     "$find $NETPATHS $FINDOPTIONS \\( -type d -regex '$PRUNEREGEX' -prune \\) -o $print_option" ||
-    exit $?
-  else
-    # : A8
-    $find $NETPATHS $FINDOPTIONS \( -type d -regex "$PRUNEREGEX" -prune \) -o $print_option ||
-    exit $?
-  fi
-fi
-} | tr / '\001' | $sort | tr '\001' / > "$filelist"
-
-# Compute the (at most 128) most common bigrams in the file list.
-$bigram $bigram_opts < $filelist | sort | uniq -c | sort -nr |
-  awk '{ if (NR <= 128) print $2 }' | tr -d '\012' > "$bigrams"
-
-# Code the file list.
-$code "$bigrams" < "$filelist" > $LOCATE_DB.n
-
-rm -rf "${temp_directory}"
-
-# To reduce the chances of breaking locate while this script is running,
-# put the results in a temp file, then rename it atomically.
-if test -s $LOCATE_DB.n; then
-  chmod 644 ${LOCATE_DB}.n
-  mv ${LOCATE_DB}.n $LOCATE_DB
-else
-  echo "updatedb: new database would be empty" >&2
-  rm -f $LOCATE_DB.n
-fi
-
-fi
-
 exit 0
diff --git a/locate/word_io.c b/locate/word_io.c
index ff80fa0..4b64b48 100644
--- a/locate/word_io.c
+++ b/locate/word_io.c
@@ -140,26 +140,3 @@ getword (FILE *fp,
       return decode_value (data, maxvalue, endian_state_flag, filename);
     }
 }
-
-
-bool
-putword (FILE *fp, int word,
-        GetwordEndianState endian_state_flag)
-{
-  size_t items_written;
-
-  /* You must decide before calling this function which
-   * endianness you want to use.
-   */
-  assert (endian_state_flag != GetwordEndianStateInitial);
-  if (GetwordEndianStateSwab == endian_state_flag)
-    {
-      word = bswap_32(word);
-    }
-
-  items_written = fwrite (&word, sizeof (word), 1, fp);
-  if (1 == items_written)
-    return true;
-  else
-    return false;
-}
diff --git a/po/POTFILES.in b/po/POTFILES.in
index 6a1a7dd..f6b7aed 100644
--- a/po/POTFILES.in
+++ b/po/POTFILES.in
@@ -22,8 +22,6 @@ lib/findutils-version.c
 lib/listfile.c
 lib/regextype.c
 lib/safe-atoi.c
-locate/bigram.c
-locate/code.c
 locate/frcode.c
 locate/locate.c
 locate/word_io.c
-- 
2.1.4
Current Thread
[Findutils-patches] [PATCH] updatedb: Remove support for the old pre-4.0 database format., James Youngman <=
- Re: [Findutils-patches] [PATCH] updatedb: Remove support for the old pre-4.0 database format., Eric Blake, 2016/01/13
  - Re: [Findutils-patches] [PATCH] updatedb: Remove support for the old pre-4.0 database format., James Youngman, 2016/01/24