[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: About version sorting of `ls'. -- strverscmp.c
From: |
ISONO Tomoatsu 磯野友厚 |
Subject: |
Re: About version sorting of `ls'. -- strverscmp.c |
Date: |
Thu, 08 Mar 2001 04:21:34 +0900 |
Thank you for redirecting.
About << Re: About version sorting of `ls'. >>
address@hidden said:
jim> Thank you for the report and patch.
jim> Since strverscmp.c is part of the GNU C library,
jim> I'll pass this on to them.
Now, I propose two new definitions of ordering for version sorting.
Below there are two patches:
the first one is for fileutils-4.0.41/lib/strverscmp.c
and the other one is to be applied after the first one.
The reason I made patches for fileutils is that strverscmp.c in
fileutils-4.0.41 is newer than that in glibc-2.2.2.
ISONO Tomoatsu
-------------------------------------------------------------------
ISONO Tomoatsu <address@hidden>
-------------------------------------------------------------------
#### The first patch ####
diff -rc fileutils-4.0.41/lib/strverscmp.c fileutils-4.0.41-mod/lib/strverscmp.c
*** fileutils-4.0.41/lib/strverscmp.c Thu Nov 16 18:08:53 2000
--- fileutils-4.0.41-mod/lib/strverscmp.c Thu Mar 8 03:45:45 2001
***************
*** 30,41 ****
#define S_N 0x0
#define S_I 0x4
#define S_F 0x8
! #define S_Z 0xC
/* result_type: CMP: return diff; LEN: compare using len_diff/diff */
#define CMP 2
#define LEN 3
/* ISDIGIT differs from isdigit, as follows:
- Its arg may be any int or unsigned int; it need not be an unsigned char.
--- 30,46 ----
#define S_N 0x0
#define S_I 0x4
#define S_F 0x8
! /* #define S_Z 0xC */
! #define S_P 0xC
/* result_type: CMP: return diff; LEN: compare using len_diff/diff */
#define CMP 2
#define LEN 3
+ #define DG1 4
+ #define DG2 5
+ /* Use DUM to fill spaces. */
+ #define DUM 0
/* ISDIGIT differs from isdigit, as follows:
- Its arg may be any int or unsigned int; it need not be an unsigned char.
***************
*** 59,64 ****
--- 64,154 ----
equal to or greater than S2 (for more info, see the texinfo doc).
*/
+ /* MODIFIED VERSION:
+ The following is the ordering of the original version sorting.
+
+ a.x
+ a00.x
+ a01.x
+ a012.x
+ a01a.x
+ a0a.x
+ aa.x
+ abc-1.007.1.tgz
+ abc-1.007.tgz
+ abc-1.007a.tgz
+ abc-1.012b.tgz
+ abc-1.01a.tgz
+ xyz-4.0.41.tgz
+ xyz-4.0.tgz
+ xyz-4.0a.5.tgz
+ xyz-4.0a.41.tgz
+ xyz-4.0a.tgz
+ xyz.tgz
+
+ But I think this is not so good, and suggest a new clear definition
+ of order. And, please note that the last three filenames are
+ examples in `doc/fileutils.texi' and that the order of them is
+ different from the description there.
+
+ 1. Treat each locally maximal successive sequence of digits and
+ periods (\X[0-9]+(\.[0-9]+)*X) as a `character'. We call it as
+ LISTDIGSEQ, and each sequence of digits in LISTDIGSEQ as DIGSEQ.
+
+ 2. The numerical value of a DIGSEQ is considered as a fractional
+ one if it begins with `0'. But in fractional cases, there exists
+ ambiguity due to last successive `0's in DIGSEQ. So, when the
+ numerical values of them are equal, we regard the longer one as
+ the larger one.
+
+ (This is equivalent to consider as following: we define
+ numerical values of `0' (resp. `1',... ,`9') as 1 (resp. 2,... ,
+ 10). Consider a DIGSEQ is an undecimal, or eleven-adic
+ expression (both in integer case and in fractional case).
+
+ 3. The order between a DIGSEQ and a non-digit character CHAR is
+ defined to be the same order between '0' and CHAR.
+
+ 4. The order between two DIGSEQs is defined to be the same order as
+ numerical values of two DIGSEQs.
+
+ 5. The order between two LISTDIGSEQs is defined as follows:
+ Compare each DIGSEQs in LISTDIGSEQs from the beginning.
+ When you meet first differnet DIGSEQs, then the larger is the
+ one which have the larger DIGSEQs. And when you find a
+ LISTDIGSEQ is a concatenated list of the other and some list,
+ then the larger LISTDIGSEQ is the longer one.
+
+ 6. Compare strings according to the order of `character's.
+
+ Original Modified
+ version sorting. version sorting.
+
+ a.x a.x
+ a00.x a0a.x
+ a01.x a00.x
+ a012.x a01.x
+ a01a.x a01a.x
+ a0a.x a012.x
+ aa.x aa.x
+ abc-1.007.1.tgz abc-1.007.tgz
+ abc-1.007.tgz abc-1.007a.tgz
+ abc-1.007a.tgz abc-1.007.1.tgz
+ abc-1.012b.tgz abc-1.01a.tgz
+ abc-1.01a.tgz abc-1.012b.tgz
+ xyz-4.0.41.tgz xyz-4.0.tgz
+ xyz-4.0.tgz xyz-4.0a.5.tgz
+ xyz-4.0a.5.tgz xyz-4.0a.41.tgz
+ xyz-4.0a.41.tgz xyz-4.0a.tgz
+ xyz-4.0a.tgz xyz-4.0.41.tgz
+ xyz.tgz xyz.tgz
+
+ I think this modified version does somewhat better, though not
+ completely. But, there is still a problem, if you use version
+ numbering like `xyz-4.0a.5.tgz', `xyz-4.0a.41.tgz'...
+
+ */
+
int
__strverscmp (const char *s1, const char *s2)
{
***************
*** 72,89 ****
Transition (10) 0 (01) d (00) x (11) - */
static const unsigned int next_state[] =
{
! /* state x d 0 - */
/* S_N */ S_N, S_I, S_Z, S_N,
/* S_I */ S_N, S_I, S_I, S_I,
/* S_F */ S_N, S_F, S_F, S_F,
/* S_Z */ S_N, S_F, S_Z, S_Z
};
static const int result_type[] =
{
! /* state x/x x/d x/0 x/- d/x d/d d/0 d/-
! 0/x 0/d 0/0 0/- -/x -/d -/0 -/- */
!
/* S_N */ CMP, CMP, CMP, CMP, CMP, LEN, CMP, CMP,
CMP, CMP, CMP, CMP, CMP, CMP, CMP, CMP,
/* S_I */ CMP, -1, -1, CMP, 1, LEN, LEN, CMP,
--- 162,201 ----
Transition (10) 0 (01) d (00) x (11) - */
static const unsigned int next_state[] =
{
! /* state x d 0 . */
! /* The first zero digit means fractional state,
! the first non-zero digit means integral state. */
! /* S_N */ S_N, S_I, S_F, S_P,
! /* The successive digit means integral in integral state. */
! /* The period after integral should be remembered. */
! /* S_I */ S_N, S_I, S_I, S_P,
! /* The successive digit means fractional in fractional state. */
! /* The period after fractional should be remembered. */
! /* S_F */ S_N, S_F, S_F, S_P,
! /* After the period, same as S_N. */
! /* S_P */ S_N, S_I, S_F, S_P
! #if 0
/* S_N */ S_N, S_I, S_Z, S_N,
/* S_I */ S_N, S_I, S_I, S_I,
/* S_F */ S_N, S_F, S_F, S_F,
/* S_Z */ S_N, S_F, S_Z, S_Z
+ #endif
};
static const int result_type[] =
{
! /* state x/x x/d x/0 x/. d/x d/d d/0 d/.
! 0/x 0/d 0/0 0/. ./x ./d ./0 ./. */
! /* Hint: Note we never refer to 0/0 and ./. */
! /* S_N */ CMP, CMP, CMP, CMP, CMP, LEN, CMP, CMP, /* Hint: int > frac
*/
! CMP, CMP, DUM, CMP, CMP, CMP, CMP, DUM,
! /* S_I */ CMP, -1, -1, DG2, 1, LEN, LEN, 1,
! 1, LEN, DUM, 1, DG1, -1, -1, DUM,
! /* S_F */ CMP, -1, -1, DG2, 1, CMP, CMP, 1, /* No need LEN cmp */
! 1, CMP, DUM, 1, DG1, -1, -1, DUM,
! /* S_P */ CMP, -1, -1, DG2, 1, LEN, CMP, 1, /* DGx to check if */
! 1, CMP, DUM, 1, DG1, -1, -1, DUM /* a digit follows.
*/
! #if 0
/* S_N */ CMP, CMP, CMP, CMP, CMP, LEN, CMP, CMP,
CMP, CMP, CMP, CMP, CMP, CMP, CMP, CMP,
/* S_I */ CMP, -1, -1, CMP, 1, LEN, LEN, CMP,
***************
*** 92,97 ****
--- 204,210 ----
CMP, CMP, CMP, CMP, CMP, CMP, CMP, CMP,
/* S_Z */ CMP, 1, 1, CMP, -1, CMP, CMP, CMP,
-1, CMP, CMP, CMP
+ #endif
};
if (p1 == p2)
***************
*** 100,116 ****
c1 = *p1++;
c2 = *p2++;
/* Hint: '0' is a digit too. */
! state = S_N | ((c1 == '0') + (ISDIGIT (c1) != 0));
while ((diff = c1 - c2) == 0 && c1 != '\0')
{
state = next_state[state];
c1 = *p1++;
c2 = *p2++;
! state |= (c1 == '0') + (ISDIGIT (c1) != 0);
}
! state = result_type[state << 2 | ((c2 == '0') + (ISDIGIT (c2) != 0))];
switch (state)
{
--- 213,230 ----
c1 = *p1++;
c2 = *p2++;
/* Hint: '0' is a digit too. */
! state = S_N | ((c1 == '0') + (ISDIGIT (c1) != 0) + ((c1 == '.')? 3: 0));
while ((diff = c1 - c2) == 0 && c1 != '\0')
{
state = next_state[state];
c1 = *p1++;
c2 = *p2++;
! state |= (c1 == '0') + (ISDIGIT (c1) != 0) + ((c1 == '.')? 3: 0);
}
! state = result_type [state << 2 |
! ((c2 == '0') + (ISDIGIT (c2) != 0) + ((c2 == '.')? 3: 0))];
switch (state)
{
***************
*** 123,128 ****
--- 237,247 ----
return 1;
return ISDIGIT (*p2) ? -1 : diff;
+
+ case DG1:
+ return (ISDIGIT (*p1))? 1: diff;
+ case DG2:
+ return (ISDIGIT (*p2))? -1: diff;
default:
return state;
#### The second patch ####
diff -rc fileutils-4.0.41-mod/lib/strverscmp.c
fileutils-4.0.41-mod2/lib/strverscmp.c
*** fileutils-4.0.41-mod/lib/strverscmp.c Thu Mar 8 03:45:45 2001
--- fileutils-4.0.41-mod2/lib/strverscmp.c Thu Mar 8 03:48:43 2001
***************
*** 32,37 ****
--- 32,38 ----
#define S_F 0x8
/* #define S_Z 0xC */
#define S_P 0xC
+ #define S_X 0x10 /* Leading zeros which has no period preceding */
/* result_type: CMP: return diff; LEN: compare using len_diff/diff */
#define CMP 2
***************
*** 64,70 ****
equal to or greater than S2 (for more info, see the texinfo doc).
*/
! /* MODIFIED VERSION:
The following is the ordering of the original version sorting.
a.x
--- 65,71 ----
equal to or greater than S2 (for more info, see the texinfo doc).
*/
! /* Second MODIFIED VERSION:
The following is the ordering of the original version sorting.
a.x
***************
*** 86,99 ****
xyz-4.0a.tgz
xyz.tgz
! But I think this is not so good, and suggest a new clear definition
! of order. And, please note that the last three filenames are
! examples in `doc/fileutils.texi' and that the order of them is
! different from the description there.
1. Treat each locally maximal successive sequence of digits and
! periods (\X[0-9]+(\.[0-9]+)*X) as a `character'. We call it as
! LISTDIGSEQ, and each sequence of digits in LISTDIGSEQ as DIGSEQ.
2. The numerical value of a DIGSEQ is considered as a fractional
one if it begins with `0'. But in fractional cases, there exists
--- 87,99 ----
xyz-4.0a.tgz
xyz.tgz
! I suggest a new somewhat complicated ordering which gives more
! heuristic results.
1. Treat each locally maximal successive sequence of digits and
! periods \X((|\.)[0-9]+(\.[0-9]+)*)X as a `character'.
! We call it as LISTDIGSEQ, and each sequence of digits in
! LISTDIGSEQ as DIGSEQ.
2. The numerical value of a DIGSEQ is considered as a fractional
one if it begins with `0'. But in fractional cases, there exists
***************
*** 106,152 ****
10). Consider a DIGSEQ is an undecimal, or eleven-adic
expression (both in integer case and in fractional case).
! 3. The order between a DIGSEQ and a non-digit character CHAR is
! defined to be the same order between '0' and CHAR.
!
! 4. The order between two DIGSEQs is defined to be the same order as
numerical values of two DIGSEQs.
! 5. The order between two LISTDIGSEQs is defined as follows:
! Compare each DIGSEQs in LISTDIGSEQs from the beginning.
! When you meet first differnet DIGSEQs, then the larger is the
! one which have the larger DIGSEQs. And when you find a
! LISTDIGSEQ is a concatenated list of the other and some list,
! then the larger LISTDIGSEQ is the longer one.
6. Compare strings according to the order of `character's.
! Original Modified
! version sorting. version sorting.
!
! a.x a.x
! a00.x a0a.x
! a01.x a00.x
! a012.x a01.x
! a01a.x a01a.x
! a0a.x a012.x
! aa.x aa.x
! abc-1.007.1.tgz abc-1.007.tgz
! abc-1.007.tgz abc-1.007a.tgz
! abc-1.007a.tgz abc-1.007.1.tgz
! abc-1.012b.tgz abc-1.01a.tgz
! abc-1.01a.tgz abc-1.012b.tgz
! xyz-4.0.41.tgz xyz-4.0.tgz
! xyz-4.0.tgz xyz-4.0a.5.tgz
! xyz-4.0a.5.tgz xyz-4.0a.41.tgz
! xyz-4.0a.41.tgz xyz-4.0a.tgz
! xyz-4.0a.tgz xyz-4.0.41.tgz
! xyz.tgz xyz.tgz
!
! I think this modified version does somewhat better, though not
! completely. But, there is still a problem, if you use version
! numbering like `xyz-4.0a.5.tgz', `xyz-4.0a.41.tgz'...
*/
int
--- 106,185 ----
10). Consider a DIGSEQ is an undecimal, or eleven-adic
expression (both in integer case and in fractional case).
! 3. The order between two DIGSEQs is defined to be the same order as
numerical values of two DIGSEQs.
! 4. The order between a LISTDIGSEQ and a non-digit character CHAR is
! defined to be the same order between the first character of the
! LISTDIGSEG and the CHAR except both ones are periods. When both
! ones are periods, the LISTDIGSEQ is the larger.
!
! 5. The order between two LISTDIGSEQs is defined as follows. If one
! LISTDIGSEQ begins with a period and the other does not, the
! larger is the second one (Note that '.' < '0').
! If both LISTDIGSEQ do not begin with any period, and the first
! DIGSEQ of one of them consists only of '0's and is a prefix of
! the other one, then the larger one is the first one.
!
! Otherwise, compare each DIGSEQs in LISTDIGSEQs from the beginning.
! When you meet different DIGSEQs, then the larger is the one which
! have the larger DIGSEQs. And when you find a LISTDIGSEQ is a
! concatenated list of the other and some list, then the larger
! LISTDIGSEQ is the longer one.
6. Compare strings according to the order of `character's.
! Original Modified Second modified
! version sorting. version sorting. version sorting.
+ 000README 0README 000README
+ 00README 00README 00README
+ 01README 000README 01README
+ 0README 01README 0README
+ a.x a.x a.x
+ a000.1.tgz a0a.x a000.1.tgz
+ a00.1.tgz a00.x a00.x
+ a00.x a00.1.tgz a00.1.tgz
+ a01.x a000.1.tgz a01.x
+ a012.x a01.x a01a.x
+ a01a.x a01a.x a012.x
+ a0a.x a012.x a0a.x
+ abc-1.007.1.tgz abc-1.00.8.tgz abc-1.00.8.tgz
+ abc-1.007.tgz abc-1.007.tgz abc-1.007.tgz
+ abc-1.007a.tgz abc-1.007a.tgz abc-1.007a.tgz
+ abc-1.0081.tgz abc-1.007.1.tgz abc-1.007.1.tgz
+ abc-1.00813.tgz abc-1.0081.tgz abc-1.0081.tgz
+ abc-1.00.8.tgz abc-1.00813.tgz abc-1.00813.tgz
+ abc-1.012b.tgz abc-1.01a.tgz abc-1.01a.tgz
+ abc-1.01a.tgz abc-1.012b.tgz abc-1.012b.tgz
+ xyz-4.0. xyz-4.0. xyz-4.0.
+ xyz-4.0.41.tgz xyz-4.0.tgz xyz-4.0.tgz
+ xyz-4.0.tgz xyz-4.0a xyz-4.0a
+ xyz-4.0a xyz-4.0a+09.tgz xyz-4.0a+09.tgz
+ xyz-4.0a+09.tgz xyz-4.0a.tgz xyz-4.0a.tgz
+ xyz-4.0a.00.tgz xyz-4.0a.00.tgz xyz-4.0a.00.tgz
+ xyz-4.0a.01.tgz xyz-4.0a.01.tgz xyz-4.0a.01.tgz
+ xyz-4.0a.09.tgz xyz-4.0a.09.tgz xyz-4.0a.09.tgz
+ xyz-4.0a.091.tgz xyz-4.0a.09a.tgz xyz-4.0a.09a.tgz
+ xyz-4.0a.09a.tgz xyz-4.0a.091.tgz xyz-4.0a.091.tgz
+ xyz-4.0a.5.tgz xyz-4.0a.5.tgz xyz-4.0a.5.tgz
+ xyz-4.0a.41.tgz xyz-4.0a.41.tgz xyz-4.0a.41.tgz
+ xyz-4.0a.tgz xyz-4.0a0.tgz xyz-4.0a00.tgz
+ xyz-4.0a00.tgz xyz-4.0a00.tgz xyz-4.0a01.tgz
+ xyz-4.0a01.tgz xyz-4.0a01.tgz xyz-4.0a09.tgz
+ xyz-4.0a09.tgz xyz-4.0a09.tgz xyz-4.0a09a.tgz
+ xyz-4.0a091.tgz xyz-4.0a09a.tgz xyz-4.0a091.tgz
+ xyz-4.0a09a.tgz xyz-4.0a091.tgz xyz-4.0a0.tgz
+ xyz-4.0a0.tgz xyz-4.0a5.tgz xyz-4.0a5.tgz
+ xyz-4.0a5.tgz xyz-4.0a13.tgz xyz-4.0a13.tgz
+ xyz-4.0a13.tgz xyz-4.0.41.tgz xyz-4.0.41.tgz
+ xyz.tgz xyz.tgz xyz.tgz
+
+ I feel this modified version does somewhat better though the rule
+ is complicated... How do you think?
+ I guess that the special treatment around leading zeros in original
+ strverscmp is implemented to make ordering like 00README < 0README.
+ Now I take this into account.
*/
int
***************
*** 165,179 ****
/* state x d 0 . */
/* The first zero digit means fractional state,
the first non-zero digit means integral state. */
! /* S_N */ S_N, S_I, S_F, S_P,
/* The successive digit means integral in integral state. */
- /* The period after integral should be remembered. */
/* S_I */ S_N, S_I, S_I, S_P,
/* The successive digit means fractional in fractional state. */
- /* The period after fractional should be remembered. */
/* S_F */ S_N, S_F, S_F, S_P,
! /* After the period, same as S_N. */
! /* S_P */ S_N, S_I, S_F, S_P
#if 0
/* S_N */ S_N, S_I, S_Z, S_N,
/* S_I */ S_N, S_I, S_I, S_I,
--- 198,212 ----
/* state x d 0 . */
/* The first zero digit means fractional state,
the first non-zero digit means integral state. */
! /* S_N */ S_N, S_I, S_X, S_P,
/* The successive digit means integral in integral state. */
/* S_I */ S_N, S_I, S_I, S_P,
/* The successive digit means fractional in fractional state. */
/* S_F */ S_N, S_F, S_F, S_P,
! /* After a period */
! /* S_P */ S_N, S_I, S_F, S_P,
! /* Leading zeros which has no period preceding */
! /* S_X */ S_N, S_F, S_X, S_P
#if 0
/* S_N */ S_N, S_I, S_Z, S_N,
/* S_I */ S_N, S_I, S_I, S_I,
***************
*** 194,200 ****
/* S_F */ CMP, -1, -1, DG2, 1, CMP, CMP, 1, /* No need LEN cmp */
1, CMP, DUM, 1, DG1, -1, -1, DUM,
/* S_P */ CMP, -1, -1, DG2, 1, LEN, CMP, 1, /* DGx to check if */
! 1, CMP, DUM, 1, DG1, -1, -1, DUM /* a digit follows.
*/
#if 0
/* S_N */ CMP, CMP, CMP, CMP, CMP, LEN, CMP, CMP,
CMP, CMP, CMP, CMP, CMP, CMP, CMP, CMP,
--- 227,235 ----
/* S_F */ CMP, -1, -1, DG2, 1, CMP, CMP, 1, /* No need LEN cmp */
1, CMP, DUM, 1, DG1, -1, -1, DUM,
/* S_P */ CMP, -1, -1, DG2, 1, LEN, CMP, 1, /* DGx to check if */
! 1, CMP, DUM, 1, DG1, -1, -1, DUM, /* a digit follows.
*/
! /* S_X */ CMP, 1, 1, CMP, -1, CMP, CMP, -1,
! -1, CMP, DUM, -1, CMP, 1, 1, DUM
#if 0
/* S_N */ CMP, CMP, CMP, CMP, CMP, LEN, CMP, CMP,
CMP, CMP, CMP, CMP, CMP, CMP, CMP, CMP,
#### End ####