From: Eric Blake
Subject: Re: [Findutils-patches] [PATCH] updatedb: run in the C locale, don't do case-folding.
Date: Wed, 13 Jan 2016 14:39:14 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0

On 01/09/2016 02:18 PM, James Youngman wrote:
> * locate/updatedb.sh: Set LC_ALL to C to avoid unexpected character
> encodings in path names causing sort to fail (idea from Clarence
> Risher).  Don't do case-folding, since the character set is now C,
> which is likely inconsistent with the user's expectations anyway.
> Honour $TMPDIR. Correct the error message you get if you specify
> both --old-format and --dbformat.
> * NEWS: Explain these changes.
> ---
>  NEWS               |  7 +++++++
>  locate/updatedb.sh | 33 ++++++++++++++++++++++++---------
>  2 files changed, 31 insertions(+), 9 deletions(-)
> 

> +++ b/locate/updatedb.sh
> @@ -31,6 +31,19 @@ There is NO WARRANTY, to the extent permitted by law.
>  Written by Eric B. Decker, James Youngman, and Kevin Dalley.
>  '
>  
> +# File path names are not actually text, anyway (since there is no
> +# mechanism to enforce any constraint that the basename of a
> +# subdirectory has the same character encoding as the basename of its
> +# parent).  The practical effect is that, depending on the way a
> +# particular system is configured and the content of its filesystem,
> +# passing all the file names in the system through "sort" may generate
> +# character encoding errors in text-based tools like "sort".  To avoid
> +# this, we set LC_ALL=C.  This will, presumably, not work perfectly on
> +# systems where LC_ALL is not the way to do locale configuration or
> +# some other setting can override this.
> +LC_ALL=C
> +export LC_ALL
> +
>  
>  usage="\
>  Usage: $0 [--findoptions='-option1 -option2...']
> @@ -75,7 +88,7 @@ done
>  
>  case "${dbformat:+yes}_${old}" in
>      yes_yes)
> -     echo "The --dbformat and --old cannot both be specified." >&2
> +     echo "The --dbformat and --old-format cannot both be specified." >&2

Do we ever want to allow translation of the error messages that
updatedb emits?  If so, we can't set LC_ALL=C globally; instead, we have
to set it piecemeal, only around the operations whose locale must not
leak to the user (such as sort), while honoring the user's locale for
operations that produce text for the user.  That is obviously trickier
to do.
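
To illustrate the piecemeal alternative, here is a minimal sketch, not
the actual updatedb.sh code.  The function names sort_names and
report_error are hypothetical, and the sketch assumes that sort is the
only step that needs the C locale.  A variable assignment placed in
front of a command applies to that one invocation only, so the rest of
the script keeps the user's locale.

```shell
#!/bin/sh
# Hypothetical sketch: none of these function names exist in updatedb.sh.

# Sort file names bytewise, independent of the caller's locale.
# The LC_ALL=C assignment is scoped to this single sort invocation.
sort_names() {
    LC_ALL=C sort
}

# Report an error under the user's own locale settings.  A translated
# updatedb could look the message up at this point, which only works
# if LC_ALL has not been forced to C for the whole script.
report_error() {
    printf '%s\n' "$1" >&2
}

# Bytewise order puts uppercase before lowercase: prints B, a, b.
printf '%s\n' b B a | sort_names
```

The cost is that every locale-sensitive tool in the pipeline has to be
wrapped individually, which is the trickier part.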

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
