Re: [groff] address@hidden: mom: PDF Author, pdfmom: needs C locale?]
From: Ralph Corderoy
Date: Fri, 09 Mar 2018 16:09:35 +0000
Hi Deri,
> I've got an example which is meant to show the problem (camus.mom),
> but unfortunately I can't make it generate the error which others are
> seeing. Camus.mom is a utf-8 file and I have used -k in a utf-8 user
> account (LC_CTYPE=en_GB.UTF-8) and with -Kutf8 in an old style account
> (LC_CTYPE=en_GB), neither produced an error from grep.
>
> This leads me to suspect there is something in my version of grep
> which "understands" that UTF-8 files are not binary data.
That seems unlikely. grep thinks files are binary if they contain ASCII
NUL, or have a byte sequence that's invalid for the locale, and it only
emits that `Binary file ... matches' if a line containing such data
matches the regexp.
Does your grep behave like this? I used a UTF-8 terminal.
$ od -tx1z <<<$'x\xa0\xa0y'
0000000 78 a0 a0 79 0a >x..y.<
0000005
$ LC_ALL=en_GB.utf8 grep z <<<$'x\xa0\xa0y'
$ LC_ALL=en_GB.utf8 grep z <<<$'x\xa0\xa0y\nz'
z
$ LC_ALL=en_GB.utf8 grep y <<<$'x\xa0\xa0y'
Binary file (standard input) matches
$ LC_ALL=en_GB.iso88591 grep y <<<$'x\xa0\xa0y'
x��y
$ LC_ALL=C grep y <<<$'x\xa0\xa0y'
x��y
$
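If it helps to check a file such as camus.mom by hand, here is a rough sketch of the two conditions described above. It is not grep's code: `looks_binary` is a hypothetical helper name, it assumes iconv(1) is installed, and it takes an encoding name rather than a locale. It also inspects the whole file, whereas grep only complains when a line containing the offending data matches.

```shell
# looks_binary FILE [ENCODING]
# Hypothetical helper, not part of grep. Returns 0 (true) if FILE
# contains an ASCII NUL, or a byte sequence that is invalid in
# ENCODING (default UTF-8). Assumes iconv(1) is available.
looks_binary() {
    file=$1
    enc=${2:-UTF-8}

    # Condition 1: an ASCII NUL somewhere in the data.
    # Compare the byte count before and after deleting NULs.
    total=$(wc -c <"$file")
    without_nul=$(LC_ALL=C tr -d '\000' <"$file" | wc -c)
    if [ "$total" -ne "$without_nul" ]; then
        return 0
    fi

    # Condition 2: a byte sequence that cannot be decoded in ENCODING.
    # iconv exits non-zero when it meets an illegal input sequence.
    if iconv -f "$enc" -t "$enc" <"$file" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

Something like `looks_binary camus.mom UTF-8 && echo suspect` would then flag a file that a UTF-8 locale grep is likely to treat as binary, while the same bytes checked as ISO-8859-1 should pass, matching the transcript above.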
--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy