bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [patch #3596] Sort directories before files in "ls"


From: Eric Blake
Subject: Re: [patch #3596] Sort directories before files in "ls"
Date: Thu, 29 Dec 2005 07:37:27 -0700
User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Francesco Montorsi on 12/27/2005 11:42 AM:
> I attach the updated patch to this mail.
> It should address all previous points you told me...
> I've also created a separed changelog patch attached at:
> 
> http://savannah.gnu.org/patch/?func=detailitem&item_id=3596

Please just include the ChangeLog entry inline with the email that has the
attached patch; no need to use diff(1) on the ChangeLog, since we can't
use patch(1) to apply the changelog portion of the patch anyways (because
9 times out of 10 it would lead to a spurious conflict).  For a good
example of what a typical patch submission on this list looks like, see
http://lists.gnu.org/archive/html/bug-coreutils/2005-12/msg00246.html.

> 2005-12-27  Francesco Montorsi  <address@hidden>
>
>     * src/ls.c: Added the --directories-first option to group

Formatting is wrong.  If you use emacs, then use changelog-mode when
editing the ChangeLog.  Otherwise, be aware that every indented line
starts with a leading tab character and not spaces.

>     directories before files; the sort_files() function has
>     been redesigned to use a function pointer array since now
>     the sort function variants are 56 (previously there were 28).
>     Some macros (DIRFIRST_CHECK, DEFINE_SORT_FUNCTIONS) were added
>     to create these variants without cluttering the source and to
>     avoid copy-and-paste of the same snippets.
>     A new function, is_directory(), has been created to group
>     in a single place all those types of checks.

If it were me, I would have done something like:

2005-12-27  Francesco Montorsi  <address@hidden>

        Add a --directories-first option to ls to group directories
        before files.
        * NEWS: Document new option.
        * src/ls.c (sort_type): Rearrange to use as an array index when
        choosing sort function.
        (directories_first): New variable.
        (DIRECTORIES_FIRST_OPTION): New enum.
        (long_options): Add --directories-first.
        (main): Support new option.
        (is_directory): New function.
        (extract_dirs_from_files): Use it.
        (FILEINFO_CAST, DIRFIRST_CHECK, DEFINE_SORT_FUNCTIONS)
        (LIST_SORTFUNCTION_VARIANTS): New macros.
        (sort_functions): New variable.
        (sort_files): Use it.
        (usage): Document new option.

> 
> 
> The FSF staff have contacted me saying that they will soon email me all
> the paperwork required for copyright assignment.
> 
> 
> Probably there are some GNU coding conventions I haven't respected and
> maybe other small touches to do but I think that now the patch really
> looks good ;)

Yes, you still have some coding convention violations that need to be
fixed.  Comments below, in portions of the patch that caught my eye.

> +++ NEWS    27 Dec 2005 18:38:25 -0000
> @@ -24,4 +24,7 @@
>    attempts to have the default be the best of both worlds.
> 
> +  ls now supports the '--directories-first' option to group directories
> +  before files.
> +

Good, you got the NEWS entry, but where is the documentation patch?
Submit ALL your related changes in a single patch, so that we don't have
to go tracking down the previous doc patch that you attached to savannah.

> @@ -399,9 +399,11 @@
>  ARGMATCH_VERIFY (time_style_args, time_style_types);
> 
> -/* Type of time to print or sort by.  Controlled by -c and -u.  */
> +/* Type of time to print or sort by.  Controlled by -c and -u.
> +   NB: the values of each item of this enum are important since they are
> +       used as indexes in the sort functions array (see sort_files()).  */

s/indexes/indices/ (isn't English fun?)
Typical comments in coreutils are full sentences, so I would omit the NB:
and just start my sentence with "The values".

> 
>  enum time_type
>    {
> -    time_mtime,            /* default */
> +    time_mtime = 0,        /* default */

Redundant.  C guarantees that an enumeration starts at 0 unless specified
otherwise.

>      time_ctime,            /* -c */
>      time_atime         /* -u */
> @@ -410,14 +412,16 @@
>  static enum time_type time_type;
> 
> -/* The file characteristic to sort by.  Controlled by -t, -S, -U, -X, -v.  */
> +/* The file characteristic to sort by.  Controlled by -t, -S, -U, -X, -v.
> +   NB: the values of each item of this enum are important since they are
> +       used as indexes in the sort functions array (see sort_files()).  */

Same comment as before.

> 
>  enum sort_type
>    {
> -    sort_none,         /* -U */
> +    sort_none = -1,    /* -U */
>      sort_name,         /* default */
>      sort_extension,        /* -X */
> -    sort_time,         /* -t */
>      sort_size,         /* -S */
> -    sort_version       /* -v */
> +    sort_version,      /* -v */
> +    sort_time,         /* -t */

C89 prohibits trailing commas in an enumeration.  C99 allows it, but I'm
not sure coreutils is ready to require that C99 feature yet.

>    };
> 
> @@ -592,4 +596,9 @@
>  static bool immediate_dirs;
> 
> +/* True means that directories are grouped before files.
> +   --directories-first  */

The sentence was good enough, no need to name --directories-first in the
comment.

> +
> +static bool directories_first;
> +
>  /* Which files to ignore.  */
> 
> @@ -741,4 +750,5 @@
>    SI_OPTION,
>    SORT_OPTION,
> +  DIRECTORIES_FIRST_OPTION,

Alphabetize, please.

>    TIME_OPTION,
>    TIME_STYLE_OPTION
> @@ -2728,4 +2744,12 @@
>  }
> 
> +
> +/* Returns true if the given fileinfo refers to a directory */
> +static bool is_directory (const struct fileinfo *f)

Formatting.  GNU Coding Standards require that C function names start in
the first column, a la:

static bool
is_directory (...

> +{
> +  return f->filetype == directory || f->filetype == arg_directory;
> +}
> +
> +
>  #ifdef S_ISLNK
> 
> @@ -2810,6 +2834,5 @@
>       order.  */
>    for (i = files_index; i-- != 0; )
> -    if ((files[i].filetype == directory || files[i].filetype == 
> arg_directory)
> -   && (!ignore_dot_and_dot_dot
> +    if (is_directory(&files[i]) && (!ignore_dot_and_dot_dot
>         || !basename_is_dot_or_dotdot (files[i].name)))

Formatting.  && and || should be indented to the column after the ( that
groups them.  Also, coreutils code typically puts a space between ! and
its argument, although in this case, the original code did not do that, so
I don't blame you.  In other words,

    if (is_directory(&files[i])
        && (! ignore_dot_and_dot_dot
            || ! basename_is_dot_or_dotdot (files[i].name)))

>        {
> @@ -2868,4 +2891,47 @@
> 
>  typedef void const *V;
> +typedef int (*qsortFunc)(V a, V b);
> +
> +#define FILEINFO_CAST(x)                ((struct fileinfo const *)x)

This macro was probably overkill, I would just do the cast inline instead
of using this macro.

> +
> +/* Used below in DEFINE_SORT_FUNCTIONS for _df_ sort function variants */
> +#define DIRFIRST_CHECK(a,b)                                         \
> +    bool a_is_dir = is_directory(FILEINFO_CAST(a)),                 \
> +        b_is_dir = is_directory(FILEINFO_CAST(b));                  \

Declare separately, rather than using , between two declarations.

> +    if (a_is_dir && !b_is_dir)                                      \
> +        return -1;         /* a goes before b */                    \

Indenting only needs to be 2 spaces, not 4, when you don't have {}.

> +    if (!a_is_dir && b_is_dir)                                      \
> +        return 1;          /* b goes before a */

I would enclose the body of the macro in its own scope, so that the macro
could be used more like a statement without violating C89 rules:
#define DIRFIRST_CHECK(a, b) do { ... } while (0)

> +
> +/*
> +   Defines the 8 different sort function variants required for each sortkey.
> +   The 'basefunc' should be an inline function so that compiler will be able
> +   to optimize all these functions. The 'basename' argument will be prefixed
> +   with the '[rev_][x]str{cmp|coll}[_df]_' string to create the function 
> name.
> +*/
> +#define DEFINE_SORT_FUNCTIONS(basename, basefunc)                   \
> +    /* direct, non-dirfirst versions */                             \
> +    static int xstrcoll_##basename (V a, V b)                       \
> +    { return basefunc (a, b, xstrcoll); }                           \
> +    static int strcmp_##basename (V a, V b)                         \
> +    { return basefunc (a, b, strcmp); }                             \
> +                                                                    \
> +    /* reverse, non-dirfirst versions */                            \
> +    static int rev_xstrcoll_##basename (V a, V b)                   \
> +    { return basefunc (b, a, xstrcoll); }                           \
> +    static int rev_strcmp_##basename (V a, V b)                     \
> +    { return basefunc (b, a, strcmp); }                             \
> +                                                                    \
> +    /* direct, dirfirst versions */                                 \
> +    static int xstrcoll_df_##basename (V a, V b)                    \
> +    { DIRFIRST_CHECK(a,b); return basefunc (a, b, xstrcoll); }      \
> +    static int strcmp_df_##basename (V a, V b)                      \
> +    { DIRFIRST_CHECK(a,b); return basefunc (a, b, strcmp); }        \
> +                                                                    \
> +    /* reverse, dirfirst versions */                                \
> +    static int rev_xstrcoll_df_##basename (V a, V b)                \
> +    { DIRFIRST_CHECK(a,b); return basefunc (b, a, xstrcoll); }      \
> +    static int rev_strcmp_df_##basename (V a, V b)                  \
> +    { DIRFIRST_CHECK(a,b); return basefunc (b, a, strcmp); }

Here, use "DIRFIRST_CHECK (a, b)", rather than "DIRFIRST_CHECK(a,b)".

> +
> +
> +/* Compares file versions. Unlike all other compare functions above,
> +   this function does not have any 'cmp' argument since it does rely
> +   directly on the strverscmp() GNU extension and thus it is available
> +   only when xstrcoll is available. */

The comment is not accurate.  All the other sort options need an xstrcoll
and strcmp variants, because they all use a string comparison (either as
the primary or secondary sort key), and xstrcoll has the ability to to a
longjmp if strcoll fails for locale reasons.  But cmp_version only depends
on strverscmp, which does not fail (even for locale reasons), and does not
need a secondary sort key.  strverscmp is ALWAYS available in coreutils,
thanks to the gnulib library, and xstrcoll was defined earlier in ls.c.

> +static inline int
> +cmp_version (struct fileinfo const *a, struct fileinfo const *b)
> +{
> +  return strverscmp (a->name, b->name);
> +}
> +
> +
> +/*
> +   We have 8 different variants for each sortkey function, and we have 7
> +   different sortkeys.
> +   The function pointers stored in this array are organized as:

I would have made this a multi-dimensional array, rather than flattening
it into one dimension.  sort_variants[num_sorts][2][2][2], deferenced by
sort_variants[sort_key][use_strcmp][reverse][dirs_first].

> +
> +    | idx | with...  | sort order | directories first ? |
> +    +-----+----------+------------+---------------------+
> +    |  0  | xstrcoll | direct     |          no         |
> +    |  1  | strcmp   | direct     |          no         |
> +    |  2  | xstrcoll | reverse    |          no         |
> +    |  3  | strcmp   | reverse    |          no         |
> +    |  4  | xstrcoll | direct     |          yes        |
> +    |  5  | strcmp   | direct     |          yes        |
> +    |  6  | xstrcoll | reverse    |          yes        |
> +    |  7  | strcmp   | reverse    |          yes        |
> +    +-----+----------+------------+---------------------+
> +
> +   for each sortkey.
> +
> +   NB: the order defined in the table above is arbitrary but
> +       the order in which sortkeys are listed in the function pointer
> +       array is defined by the time_type and sort_type enums !
> +*/
> +
> +
> +#define LIST_SORTFUNCTION_VARIANTS(basename)                \
> +    xstrcoll_##basename, strcmp_##basename,                 \
> +    rev_xstrcoll_##basename, rev_strcmp_##basename,         \
> +    xstrcoll_df_##basename, strcmp_df_##basename,           \
> +    rev_xstrcoll_df_##basename, rev_strcmp_df_##basename    \
> +
> +qsortFunc sort_functions[8*7] = {

I would have declared the outer dimension as [], then added an
ARGMATCH_VERIFY to ensure that sort_functions is consistent with the size
of enum time_type and enum sort_type, rather than using a magic number
8*7.  Even if you did use a magic number, operators should have spaces
around them ("8 * 7" not "8*7").  That way, if we ever add another
sort_type, the compiler will remind us if this sort_functions array was
not also updated.

> +    LIST_SORTFUNCTION_VARIANTS(name),
> +    LIST_SORTFUNCTION_VARIANTS(extension),
> +    LIST_SORTFUNCTION_VARIANTS(size),
> +
> +    /* for version comparisons, we don't have
> +      xstrcoll/strcmp differences */
> +    xstrcoll_version, xstrcoll_version,
> +    rev_xstrcoll_version, rev_xstrcoll_version,
> +    xstrcoll_df_version, xstrcoll_df_version,
> +    rev_xstrcoll_df_version, rev_xstrcoll_df_version,

You could actually use NULL for the strcmp versions of version comparison
(since as I explained above, version comparison does not rely on xstrcoll,
so it will never longjmp, and never need to try the strcmp fallback), with
an abort if dereferencing the array ever gets a NULL function pointer.

> +
> +    /* last are time sort functions */
> +    LIST_SORTFUNCTION_VARIANTS(mtime),
> +    LIST_SORTFUNCTION_VARIANTS(ctime),
> +    LIST_SORTFUNCTION_VARIANTS(atime)
> +  };
> +
> +
> 
>  /* Sort the files now in the table.  */
> @@ -2961,5 +3078,8 @@
>  sort_files (void)
>  {
> -  int (*func) (V, V);
> +  bool use_strcmp;
> +
> +  if (sort_type == sort_none)
> +      return;
> 
>    /* Try strcoll.  If it fails, fall back on strcmp.  We can't safely
> @@ -2969,76 +3089,24 @@
> 
>    if (! setjmp (failed_strcoll))
> -    {
> -      switch (sort_type)
> -   {
> -   case sort_none:
> -     return;
> -   case sort_time:
> -     switch (time_type)
> -       {
> -       case time_ctime:
> -         func = sort_reverse ? rev_cmp_ctime : compare_ctime;
> -         break;
> -       case time_mtime:
> -         func = sort_reverse ? rev_cmp_mtime : compare_mtime;
> -         break;
> -       case time_atime:
> -         func = sort_reverse ? rev_cmp_atime : compare_atime;
> -         break;
> -       default:
> -         abort ();
> -       }
> -     break;
> -   case sort_name:
> -     func = sort_reverse ? rev_cmp_name : compare_name;
> -     break;
> -   case sort_extension:
> -     func = sort_reverse ? rev_cmp_extension : compare_extension;
> -     break;
> -   case sort_size:
> -     func = sort_reverse ? rev_cmp_size : compare_size;
> -     break;
> -   case sort_version:
> -     func = sort_reverse ? rev_cmp_version : compare_version;
> -     break;
> -   default:
> -     abort ();
> -   }
> -    }
> +        use_strcmp = false;      /* [x]strcoll() available ! */

Misleading comment.  [x]strcoll is always available (thanks to gnulib), it
was more an issue of whether strcoll failed to do a comparison due to
locale issues.

>    else
>      {
> -      switch (sort_type)
> -   {
> -   case sort_time:
> -     switch (time_type)
> -       {
> -       case time_ctime:
> -         func = sort_reverse ? rev_str_ctime : compstr_ctime;
> -         break;
> -       case time_mtime:
> -         func = sort_reverse ? rev_str_mtime : compstr_mtime;
> -         break;
> -       case time_atime:
> -         func = sort_reverse ? rev_str_atime : compstr_atime;
> -         break;
> -       default:
> -         abort ();
> -       }
> -     break;
> -   case sort_name:
> -     func = sort_reverse ? rev_str_name : compstr_name;
> -     break;
> -   case sort_extension:
> -     func = sort_reverse ? rev_str_extension : compstr_extension;
> -     break;
> -   case sort_size:
> -     func = sort_reverse ? rev_str_size : compstr_size;
> -     break;
> -   default:
> -     abort ();
> -   }
> +        use_strcmp = true;
> +        if (sort_type == sort_version)
> +            abort(); /* cannot do version comparisons without xstrcoll() */

Comment is not quite right.  Version comparisons don't rely on xstrcoll.
Indentation again - just two spaces per indentation level; and in lines of
code with 8 or more leading spaces, use TAB characters.

>      }
> 
> -  qsort (files, files_index, sizeof *files, func);
> +  /* when sort_type == sort_time we need to use time_type as subindex. */
> +  int timeoffset = 0;
> +  if (sort_type == sort_time)
> +     timeoffset = time_type;
> +
> +  /* compute the index in the sort function pointers array */
> +  int idx = (sort_type+timeoffset)*8 +     /* select sortkey block */
> +            ((int)directories_first)*4 +   /* select (non-)dirfirst mode */

No need to cast bool to int.  Multiplying and adding bools works just fine
 per C99 rules (and guaranteed even for C89 by gnulib's magic).  See my
earlier comment about doing a multi-dimension array reference, rather than
multiplying values yourself.

Grouping and spacing - this should look like:
int idx = ((sort_type + timeoffset) * 8
           + directories_first * 4
           + ...);

> +            ((int)sort_reverse)*2 +        /* select sort order */
> +            (int)use_strcmp;               /* select xstrcoll/strcmp version 
> */
> +
> +  qsort (files, files_index, sizeof *files, sort_functions[ idx ]);
>  }
> 
> @@ -4137,4 +4205,7 @@
>    -D, --dired                generate output designed for Emacs' dired 
> mode\n\
>  "), stdout);
> +      fputs(_("\
> +      --directories-first    group directories before files\n\

I'm leaning towards naming the option --group-directories-first.  That
way, you don't have an ambiguous prefix with --dired and --directory; so
you could simply use --g as an abbreviation (at least, until some other
option starting with --g is added).

> +"), stdout);
>        fputs (_("\
>    -f                         do not sort, enable -aU, disable -lst\n\
> 
> 

I hope you don't view these comments as criticism, so much as ideas to
make your patch more likely to be incorporated once your copyright status
is cleared.  Remember, ls already has so many options that the bar for
adding a new option is very high - you have to give good reasons why the
patch is a useful addition, and minimize other's efforts to incorporate
your patch.  Your efforts to provide a good patch for a feature that you
desire are commendable; too many people complain about a missing feature,
then do nothing about it.

- --
Life is short - so eat dessert first!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDs/Sl84KuGfSFAYARAtEwAJ9yl29TXmro+kSnkpmDhH7evgCYuACgigYY
CCPgicFvQzH94Q8E0eVUGYA=
=KXRO
-----END PGP SIGNATURE-----




reply via email to

[Prev in Thread] Current Thread [Next in Thread]