bug-glibc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: About version sorting of `ls'. -- strverscmp.c


From: ISONO Tomoatsu 磯野友厚
Subject: Re: About version sorting of `ls'. -- strverscmp.c
Date: Thu, 08 Mar 2001 04:21:34 +0900

Thank you for redirecting.

 About << Re: About version sorting of `ls'. >> 
   address@hidden said:

jim> Thank you for the report and patch.
jim> Since strverscmp.c is part of the GNU C library,
jim> I'll pass this on to them.

Now, I propose two new definitions of ordering for version sorting.

Below there are two patches:
  the first one is for fileutils-4.0.41/lib/strverscmp.c
  and the other one is to be applied after the first one.

The reason I made patches for fileutils is that strverscmp.c in
fileutils-4.0.41 is newer than that in glibc-2.2.2.

                                              ISONO Tomoatsu
-------------------------------------------------------------------
 ISONO Tomoatsu <address@hidden>
-------------------------------------------------------------------
#### The first patch ####

diff -rc fileutils-4.0.41/lib/strverscmp.c fileutils-4.0.41-mod/lib/strverscmp.c
*** fileutils-4.0.41/lib/strverscmp.c   Thu Nov 16 18:08:53 2000
--- fileutils-4.0.41-mod/lib/strverscmp.c       Thu Mar  8 03:45:45 2001
***************
*** 30,41 ****
  #define S_N    0x0
  #define S_I    0x4
  #define S_F    0x8
! #define S_Z    0xC
  
  /* result_type: CMP: return diff; LEN: compare using len_diff/diff */
  #define CMP    2
  #define LEN    3
  
  
  /* ISDIGIT differs from isdigit, as follows:
     - Its arg may be any int or unsigned int; it need not be an unsigned char.
--- 30,46 ----
  #define S_N    0x0
  #define S_I    0x4
  #define S_F    0x8
! /* #define S_Z    0xC */
! #define S_P    0xC
  
  /* result_type: CMP: return diff; LEN: compare using len_diff/diff */
  #define CMP    2
  #define LEN    3
+ #define DG1    4
+ #define DG2    5
  
+ /* Use DUM to fill spaces. */
+ #define DUM    0
  
  /* ISDIGIT differs from isdigit, as follows:
     - Its arg may be any int or unsigned int; it need not be an unsigned char.
***************
*** 59,64 ****
--- 64,154 ----
     equal to or greater than S2 (for more info, see the texinfo doc).
  */
  
+ /* MODIFIED VERSION:
+    The following is the ordering of the original version sorting.
+ 
+    a.x
+    a00.x
+    a01.x
+    a012.x
+    a01a.x
+    a0a.x
+    aa.x
+    abc-1.007.1.tgz
+    abc-1.007.tgz
+    abc-1.007a.tgz
+    abc-1.012b.tgz
+    abc-1.01a.tgz
+    xyz-4.0.41.tgz
+    xyz-4.0.tgz
+    xyz-4.0a.5.tgz
+    xyz-4.0a.41.tgz
+    xyz-4.0a.tgz
+    xyz.tgz
+ 
+    But I think this is not so good, and suggest a new clear definition
+    of order.  And, please note that the last three filenames are
+    examples in `doc/fileutils.texi' and that the order of them is
+    different from the description there.
+ 
+    1. Treat each locally maximal successive sequence of digits and
+       periods (\X[0-9]+(\.[0-9]+)*X) as a `character'.  We call it as
+       LISTDIGSEQ, and each sequence of digits in LISTDIGSEQ as DIGSEQ.
+ 
+    2. The numerical value of a DIGSEQ is considered as a fractional
+       one if it begins with `0'. But in fractional cases, there exists
+       ambiguity due to last successive `0's in DIGSEQ. So, when the
+       numerical values of them are equal, we regard the longer one as
+       the larger one.
+ 
+       (This is equivalent to consider as following: we define
+       numerical values of `0' (resp. `1',... ,`9') as 1 (resp. 2,... ,
+       10). Consider a DIGSEQ is an undecimal, or eleven-adic
+       expression (both in integer case and in fractional case).
+ 
+    3. The order between a DIGSEQ and a non-digit character CHAR is
+       defined to be the same order between '0' and CHAR.
+    
+    4. The order between two DIGSEQs is defined to be the same order as
+       numerical values of two DIGSEQs.
+ 
+    5. The order between two LISTDIGSEQs is defined as follows:
+       Compare each DIGSEQs in LISTDIGSEQs from the beginning.
+       When you meet first differnet DIGSEQs, then the larger is the
+       one which have the larger DIGSEQs. And when you find a
+       LISTDIGSEQ is a concatenated list of the other and some list, 
+       then the larger LISTDIGSEQ is the longer one.
+ 
+    6. Compare strings according to the order of `character's.
+ 
+    Original          Modified
+    version sorting.  version sorting.
+ 
+    a.x               a.x
+    a00.x             a0a.x
+    a01.x             a00.x
+    a012.x            a01.x
+    a01a.x            a01a.x
+    a0a.x             a012.x
+    aa.x              aa.x
+    abc-1.007.1.tgz   abc-1.007.tgz
+    abc-1.007.tgz     abc-1.007a.tgz
+    abc-1.007a.tgz    abc-1.007.1.tgz
+    abc-1.012b.tgz    abc-1.01a.tgz
+    abc-1.01a.tgz     abc-1.012b.tgz
+    xyz-4.0.41.tgz    xyz-4.0.tgz
+    xyz-4.0.tgz       xyz-4.0a.5.tgz
+    xyz-4.0a.5.tgz    xyz-4.0a.41.tgz
+    xyz-4.0a.41.tgz   xyz-4.0a.tgz
+    xyz-4.0a.tgz      xyz-4.0.41.tgz
+    xyz.tgz           xyz.tgz
+ 
+    I think this modified version does somewhat better, though not
+    completely.  But, there is still a problem, if you use version
+    numbering like `xyz-4.0a.5.tgz', `xyz-4.0a.41.tgz'...
+ 
+ */
+ 
  int
  __strverscmp (const char *s1, const char *s2)
  {
***************
*** 72,89 ****
       Transition   (10) 0  (01) d  (00) x  (11) -   */
    static const unsigned int next_state[] =
    {
!       /* state    x    d    0    - */
        /* S_N */  S_N, S_I, S_Z, S_N,
        /* S_I */  S_N, S_I, S_I, S_I,
        /* S_F */  S_N, S_F, S_F, S_F,
        /* S_Z */  S_N, S_F, S_Z, S_Z
    };
  
    static const int result_type[] =
    {
!       /* state   x/x  x/d  x/0  x/-  d/x  d/d  d/0  d/-
!                  0/x  0/d  0/0  0/-  -/x  -/d  -/0  -/- */
! 
        /* S_N */  CMP, CMP, CMP, CMP, CMP, LEN, CMP, CMP,
                   CMP, CMP, CMP, CMP, CMP, CMP, CMP, CMP,
        /* S_I */  CMP, -1,  -1,  CMP,  1,  LEN, LEN, CMP,
--- 162,201 ----
       Transition   (10) 0  (01) d  (00) x  (11) -   */
    static const unsigned int next_state[] =
    {
!       /* state    x    d    0    . */
!     /* The first zero digit means fractional state,
!        the first non-zero digit means integral state. */
!       /* S_N */  S_N, S_I, S_F, S_P,
!     /* The successive digit means integral in integral state. */
!     /* The period after integral should be remembered. */
!       /* S_I */  S_N, S_I, S_I, S_P,
!     /* The successive digit means fractional in fractional state. */
!     /* The period after fractional should be remembered. */
!       /* S_F */  S_N, S_F, S_F, S_P,
!     /* After the period, same as S_N. */
!       /* S_P */  S_N, S_I, S_F, S_P
! #if 0
        /* S_N */  S_N, S_I, S_Z, S_N,
        /* S_I */  S_N, S_I, S_I, S_I,
        /* S_F */  S_N, S_F, S_F, S_F,
        /* S_Z */  S_N, S_F, S_Z, S_Z
+ #endif
    };
  
    static const int result_type[] =
    {
!       /* state   x/x  x/d  x/0  x/.  d/x  d/d  d/0  d/.
!                  0/x  0/d  0/0  0/.  ./x  ./d  ./0  ./. */
!       /* Hint: Note we never refer to 0/0 and ./. */
!       /* S_N */  CMP, CMP, CMP, CMP, CMP, LEN, CMP, CMP, /* Hint: int > frac 
*/
!                  CMP, CMP, DUM, CMP, CMP, CMP, CMP, DUM,
!       /* S_I */  CMP, -1,  -1,  DG2,  1,  LEN, LEN,  1,
!                   1,  LEN, DUM,  1,  DG1, -1,  -1,  DUM,
!       /* S_F */  CMP, -1,  -1,  DG2,  1,  CMP, CMP,  1,  /* No need LEN cmp */
!                   1,  CMP, DUM,  1,  DG1, -1,  -1,  DUM, 
!       /* S_P */  CMP, -1,  -1,  DG2,  1,  LEN, CMP,  1,  /* DGx to check if */
!                   1,  CMP, DUM,  1,  DG1, -1,  -1,  DUM  /* a digit follows. 
*/
! #if 0
        /* S_N */  CMP, CMP, CMP, CMP, CMP, LEN, CMP, CMP,
                   CMP, CMP, CMP, CMP, CMP, CMP, CMP, CMP,
        /* S_I */  CMP, -1,  -1,  CMP,  1,  LEN, LEN, CMP,
***************
*** 92,97 ****
--- 204,210 ----
                   CMP, CMP, CMP, CMP, CMP, CMP, CMP, CMP,
        /* S_Z */  CMP,  1,   1,  CMP, -1,  CMP, CMP, CMP,
                   -1,  CMP, CMP, CMP
+ #endif
    };
  
    if (p1 == p2)
***************
*** 100,116 ****
    c1 = *p1++;
    c2 = *p2++;
    /* Hint: '0' is a digit too.  */
!   state = S_N | ((c1 == '0') + (ISDIGIT (c1) != 0));
  
    while ((diff = c1 - c2) == 0 && c1 != '\0')
      {
        state = next_state[state];
        c1 = *p1++;
        c2 = *p2++;
!       state |= (c1 == '0') + (ISDIGIT (c1) != 0);
      }
  
!   state = result_type[state << 2 | ((c2 == '0') + (ISDIGIT (c2) != 0))];
  
    switch (state)
      {
--- 213,230 ----
    c1 = *p1++;
    c2 = *p2++;
    /* Hint: '0' is a digit too.  */
!   state = S_N | ((c1 == '0') + (ISDIGIT (c1) != 0) + ((c1 == '.')? 3: 0));
  
    while ((diff = c1 - c2) == 0 && c1 != '\0')
      {
        state = next_state[state];
        c1 = *p1++;
        c2 = *p2++;
!       state |= (c1 == '0') + (ISDIGIT (c1) != 0) + ((c1 == '.')? 3: 0);
      }
  
!   state = result_type [state << 2 |
!               ((c2 == '0') + (ISDIGIT (c2) != 0) + ((c2 == '.')? 3: 0))];
  
    switch (state)
      {
***************
*** 123,128 ****
--- 237,247 ----
          return 1;
  
        return ISDIGIT (*p2) ? -1 : diff;
+ 
+     case DG1:
+       return (ISDIGIT (*p1))?  1: diff;
+     case DG2:
+       return (ISDIGIT (*p2))? -1: diff;
  
      default:
        return state;


#### The second patch ####

diff -rc fileutils-4.0.41-mod/lib/strverscmp.c 
fileutils-4.0.41-mod2/lib/strverscmp.c
*** fileutils-4.0.41-mod/lib/strverscmp.c       Thu Mar  8 03:45:45 2001
--- fileutils-4.0.41-mod2/lib/strverscmp.c      Thu Mar  8 03:48:43 2001
***************
*** 32,37 ****
--- 32,38 ----
  #define S_F    0x8
  /* #define S_Z    0xC */
  #define S_P    0xC
+ #define S_X    0x10  /* Leading zeros which has no period preceding */
  
  /* result_type: CMP: return diff; LEN: compare using len_diff/diff */
  #define CMP    2
***************
*** 64,70 ****
     equal to or greater than S2 (for more info, see the texinfo doc).
  */
  
! /* MODIFIED VERSION:
     The following is the ordering of the original version sorting.
  
     a.x
--- 65,71 ----
     equal to or greater than S2 (for more info, see the texinfo doc).
  */
  
! /* Second MODIFIED VERSION:
     The following is the ordering of the original version sorting.
  
     a.x
***************
*** 86,99 ****
     xyz-4.0a.tgz
     xyz.tgz
  
!    But I think this is not so good, and suggest a new clear definition
!    of order.  And, please note that the last three filenames are
!    examples in `doc/fileutils.texi' and that the order of them is
!    different from the description there.
  
     1. Treat each locally maximal successive sequence of digits and
!       periods (\X[0-9]+(\.[0-9]+)*X) as a `character'.  We call it as
!       LISTDIGSEQ, and each sequence of digits in LISTDIGSEQ as DIGSEQ.
  
     2. The numerical value of a DIGSEQ is considered as a fractional
        one if it begins with `0'. But in fractional cases, there exists
--- 87,99 ----
     xyz-4.0a.tgz
     xyz.tgz
  
!   I suggest a new somewhat complicated ordering which gives more
!   heuristic results.
  
     1. Treat each locally maximal successive sequence of digits and
!       periods \X((|\.)[0-9]+(\.[0-9]+)*)X as a `character'. 
!       We call it as LISTDIGSEQ, and each sequence of digits in
!       LISTDIGSEQ as DIGSEQ.
  
     2. The numerical value of a DIGSEQ is considered as a fractional
        one if it begins with `0'. But in fractional cases, there exists
***************
*** 106,152 ****
        10). Consider a DIGSEQ is an undecimal, or eleven-adic
        expression (both in integer case and in fractional case).
  
!    3. The order between a DIGSEQ and a non-digit character CHAR is
!       defined to be the same order between '0' and CHAR.
!    
!    4. The order between two DIGSEQs is defined to be the same order as
        numerical values of two DIGSEQs.
  
!    5. The order between two LISTDIGSEQs is defined as follows:
!       Compare each DIGSEQs in LISTDIGSEQs from the beginning.
!       When you meet first differnet DIGSEQs, then the larger is the
!       one which have the larger DIGSEQs. And when you find a
!       LISTDIGSEQ is a concatenated list of the other and some list, 
!       then the larger LISTDIGSEQ is the longer one.
  
     6. Compare strings according to the order of `character's.
  
!    Original          Modified
!    version sorting.  version sorting.
! 
!    a.x               a.x
!    a00.x             a0a.x
!    a01.x             a00.x
!    a012.x            a01.x
!    a01a.x            a01a.x
!    a0a.x             a012.x
!    aa.x              aa.x
!    abc-1.007.1.tgz   abc-1.007.tgz
!    abc-1.007.tgz     abc-1.007a.tgz
!    abc-1.007a.tgz    abc-1.007.1.tgz
!    abc-1.012b.tgz    abc-1.01a.tgz
!    abc-1.01a.tgz     abc-1.012b.tgz
!    xyz-4.0.41.tgz    xyz-4.0.tgz
!    xyz-4.0.tgz       xyz-4.0a.5.tgz
!    xyz-4.0a.5.tgz    xyz-4.0a.41.tgz
!    xyz-4.0a.41.tgz   xyz-4.0a.tgz
!    xyz-4.0a.tgz      xyz-4.0.41.tgz
!    xyz.tgz           xyz.tgz
! 
!    I think this modified version does somewhat better, though not
!    completely.  But, there is still a problem, if you use version
!    numbering like `xyz-4.0a.5.tgz', `xyz-4.0a.41.tgz'...
  
  */
  
  int
--- 106,185 ----
        10). Consider a DIGSEQ is an undecimal, or eleven-adic
        expression (both in integer case and in fractional case).
  
!    3. The order between two DIGSEQs is defined to be the same order as
        numerical values of two DIGSEQs.
  
!    4. The order between a LISTDIGSEQ and a non-digit character CHAR is
!       defined to be the same order between the first character of the
!       LISTDIGSEG and the CHAR except both ones are periods. When both
!       ones are periods, the LISTDIGSEQ is the larger.
!    
!    5. The order between two LISTDIGSEQs is defined as follows. If one
!       LISTDIGSEQ begins with a period and the other does not, the
!       larger is the second one (Note that '.' < '0').
!       If both LISTDIGSEQ do not begin with any period, and the first
!       DIGSEQ of one of them consists only of '0's and is a prefix of
!       the other one, then the larger one is the first one.
! 
!       Otherwise, compare each DIGSEQs in LISTDIGSEQs from the beginning.
!       When you meet different DIGSEQs, then the larger is the one which
!       have the larger DIGSEQs. And when you find a LISTDIGSEQ is a
!       concatenated list of the other and some list, then the larger
!       LISTDIGSEQ is the longer one.
  
     6. Compare strings according to the order of `character's.
  
!    Original          Modified          Second modified
!    version sorting.  version sorting.  version sorting.
  
+    000README         0README           000README       
+    00README          00README          00README        
+    01README          000README         01README        
+    0README           01README          0README         
+    a.x               a.x               a.x             
+    a000.1.tgz        a0a.x             a000.1.tgz      
+    a00.1.tgz         a00.x             a00.x           
+    a00.x             a00.1.tgz         a00.1.tgz       
+    a01.x             a000.1.tgz        a01.x           
+    a012.x            a01.x             a01a.x          
+    a01a.x            a01a.x            a012.x          
+    a0a.x             a012.x            a0a.x           
+    abc-1.007.1.tgz   abc-1.00.8.tgz    abc-1.00.8.tgz  
+    abc-1.007.tgz     abc-1.007.tgz     abc-1.007.tgz   
+    abc-1.007a.tgz    abc-1.007a.tgz    abc-1.007a.tgz  
+    abc-1.0081.tgz    abc-1.007.1.tgz   abc-1.007.1.tgz 
+    abc-1.00813.tgz   abc-1.0081.tgz    abc-1.0081.tgz  
+    abc-1.00.8.tgz    abc-1.00813.tgz   abc-1.00813.tgz 
+    abc-1.012b.tgz    abc-1.01a.tgz     abc-1.01a.tgz   
+    abc-1.01a.tgz     abc-1.012b.tgz    abc-1.012b.tgz  
+    xyz-4.0.          xyz-4.0.          xyz-4.0.        
+    xyz-4.0.41.tgz    xyz-4.0.tgz       xyz-4.0.tgz     
+    xyz-4.0.tgz       xyz-4.0a          xyz-4.0a        
+    xyz-4.0a          xyz-4.0a+09.tgz   xyz-4.0a+09.tgz 
+    xyz-4.0a+09.tgz   xyz-4.0a.tgz      xyz-4.0a.tgz    
+    xyz-4.0a.00.tgz   xyz-4.0a.00.tgz   xyz-4.0a.00.tgz 
+    xyz-4.0a.01.tgz   xyz-4.0a.01.tgz   xyz-4.0a.01.tgz 
+    xyz-4.0a.09.tgz   xyz-4.0a.09.tgz   xyz-4.0a.09.tgz 
+    xyz-4.0a.091.tgz  xyz-4.0a.09a.tgz  xyz-4.0a.09a.tgz
+    xyz-4.0a.09a.tgz  xyz-4.0a.091.tgz  xyz-4.0a.091.tgz
+    xyz-4.0a.5.tgz    xyz-4.0a.5.tgz    xyz-4.0a.5.tgz  
+    xyz-4.0a.41.tgz   xyz-4.0a.41.tgz   xyz-4.0a.41.tgz 
+    xyz-4.0a.tgz      xyz-4.0a0.tgz     xyz-4.0a00.tgz  
+    xyz-4.0a00.tgz    xyz-4.0a00.tgz    xyz-4.0a01.tgz  
+    xyz-4.0a01.tgz    xyz-4.0a01.tgz    xyz-4.0a09.tgz  
+    xyz-4.0a09.tgz    xyz-4.0a09.tgz    xyz-4.0a09a.tgz 
+    xyz-4.0a091.tgz   xyz-4.0a09a.tgz   xyz-4.0a091.tgz 
+    xyz-4.0a09a.tgz   xyz-4.0a091.tgz   xyz-4.0a0.tgz   
+    xyz-4.0a0.tgz     xyz-4.0a5.tgz     xyz-4.0a5.tgz   
+    xyz-4.0a5.tgz     xyz-4.0a13.tgz    xyz-4.0a13.tgz  
+    xyz-4.0a13.tgz    xyz-4.0.41.tgz    xyz-4.0.41.tgz  
+    xyz.tgz           xyz.tgz           xyz.tgz         
+ 
+    I feel this modified version does somewhat better though the rule
+    is complicated...  How do you think?
+    I guess that the special treatment around leading zeros in original
+    strverscmp is implemented to make ordering like 00README < 0README.
+    Now I take this into account.
  */
  
  int
***************
*** 165,179 ****
        /* state    x    d    0    . */
      /* The first zero digit means fractional state,
         the first non-zero digit means integral state. */
!       /* S_N */  S_N, S_I, S_F, S_P,
      /* The successive digit means integral in integral state. */
-     /* The period after integral should be remembered. */
        /* S_I */  S_N, S_I, S_I, S_P,
      /* The successive digit means fractional in fractional state. */
-     /* The period after fractional should be remembered. */
        /* S_F */  S_N, S_F, S_F, S_P,
!     /* After the period, same as S_N. */
!       /* S_P */  S_N, S_I, S_F, S_P
  #if 0
        /* S_N */  S_N, S_I, S_Z, S_N,
        /* S_I */  S_N, S_I, S_I, S_I,
--- 198,212 ----
        /* state    x    d    0    . */
      /* The first zero digit means fractional state,
         the first non-zero digit means integral state. */
!       /* S_N */  S_N, S_I, S_X, S_P,
      /* The successive digit means integral in integral state. */
        /* S_I */  S_N, S_I, S_I, S_P,
      /* The successive digit means fractional in fractional state. */
        /* S_F */  S_N, S_F, S_F, S_P,
!     /* After a period */
!       /* S_P */  S_N, S_I, S_F, S_P,
!     /* Leading zeros which has no period preceding */
!       /* S_X */  S_N, S_F, S_X, S_P
  #if 0
        /* S_N */  S_N, S_I, S_Z, S_N,
        /* S_I */  S_N, S_I, S_I, S_I,
***************
*** 194,200 ****
        /* S_F */  CMP, -1,  -1,  DG2,  1,  CMP, CMP,  1,  /* No need LEN cmp */
                    1,  CMP, DUM,  1,  DG1, -1,  -1,  DUM, 
        /* S_P */  CMP, -1,  -1,  DG2,  1,  LEN, CMP,  1,  /* DGx to check if */
!                   1,  CMP, DUM,  1,  DG1, -1,  -1,  DUM  /* a digit follows. 
*/
  #if 0
        /* S_N */  CMP, CMP, CMP, CMP, CMP, LEN, CMP, CMP,
                   CMP, CMP, CMP, CMP, CMP, CMP, CMP, CMP,
--- 227,235 ----
        /* S_F */  CMP, -1,  -1,  DG2,  1,  CMP, CMP,  1,  /* No need LEN cmp */
                    1,  CMP, DUM,  1,  DG1, -1,  -1,  DUM, 
        /* S_P */  CMP, -1,  -1,  DG2,  1,  LEN, CMP,  1,  /* DGx to check if */
!                   1,  CMP, DUM,  1,  DG1, -1,  -1,  DUM, /* a digit follows. 
*/
!       /* S_X */  CMP,  1,   1,  CMP, -1,  CMP, CMP, -1,
!                  -1,  CMP, DUM, -1,  CMP,  1,   1,  DUM
  #if 0
        /* S_N */  CMP, CMP, CMP, CMP, CMP, LEN, CMP, CMP,
                   CMP, CMP, CMP, CMP, CMP, CMP, CMP, CMP,

#### End ####



reply via email to

[Prev in Thread] Current Thread [Next in Thread]