From: Paul Eggert
Subject: bug#21604: grep doesn't match diacritical chars in ISO-8859 files
Date: Fri, 2 Oct 2015 13:01:04 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0
On 10/02/2015 02:43 AM, Santiago Ruano Rincón wrote:
> grep doesn't match characters with diacritical marks in ISO-8859 files, inside a Unicode environment
That is normal and expected behavior. In a UTF-8 locale, "á" is represented by the two bytes 0xC3 and 0xA1. In an ISO-8859 file, the same character is represented by the single byte 0xE1. The UTF-8 pattern won't match the ISO-8859 representation.
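To make the byte-level difference concrete, here is a small sketch that prints both encodings of "á" in hex. It assumes a POSIX shell with printf, od and tr, and it takes ISO-8859-1 as the specific ISO-8859 variant, which is an assumption on my part. The octal escapes stand in for the raw bytes so the snippet does not depend on the encoding of your terminal.

```shell
# "á" in UTF-8 is the two bytes 0xC3 0xA1 (octal 303 241).
utf8_hex=$(printf '\303\241' | od -An -tx1 | tr -d ' \n')

# "á" in ISO-8859-1 (assumed variant) is the single byte 0xE1 (octal 341).
latin1_hex=$(printf '\341' | od -An -tx1 | tr -d ' \n')

echo "UTF-8:      $utf8_hex"
echo "ISO-8859-1: $latin1_hex"
```

Since the two byte sequences share no bytes, a pattern typed in a UTF-8 environment has nothing to match against in the ISO-8859 file.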
To avoid this problem, switch to an ISO-8859 locale before using grep to read ISO-8859 text files. This applies to pretty much any standard utility, not just grep. Alternatively, you can translate the text files from ISO-8859 to UTF-8 before giving the resulting text to grep or to other utilities.
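As a rough illustration of the translation approach, the following sketch uses iconv to convert the text before grep sees it. The sample file, its contents, and the choice of ISO-8859-1 as the source encoding are all assumptions made for the example; substitute whichever ISO-8859 variant your files actually use.

```shell
# Hypothetical sample file: the word "más" encoded as ISO-8859-1 (á = octal 341).
sample=$(mktemp)
printf 'm\341s\n' > "$sample"

# The same word as a UTF-8 pattern (á = octal 303 241).
pattern=$(printf 'm\303\241s')

# Searching the raw file counts no matching lines: the byte sequences differ.
raw_count=$(grep -c -- "$pattern" "$sample" || true)

# Converting to UTF-8 first lets the same pattern match.
converted_count=$(iconv -f ISO-8859-1 -t UTF-8 "$sample" | grep -c -- "$pattern" || true)

echo "raw: $raw_count, converted: $converted_count"
rm -f "$sample"
```

The locale-switching approach would instead look something like `LC_ALL=en_US.ISO-8859-1 grep PATTERN FILE`, where the locale name is only an assumption: it must be one that `locale -a` lists on your system, and the pattern itself then has to be supplied in ISO-8859 as well.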