grep-commit

Changes to html_node/Matching-Non_002dASCII.html


From: Jim Meyering
Subject: Changes to html_node/Matching-Non_002dASCII.html
Date: Sun, 27 Sep 2020 23:36:54 -0400 (EDT)

CVSROOT:        /webcvs/grep
Module name:    grep
Changes by:     Jim Meyering <meyering> 20/09/27 23:36:49

Index: html_node/Matching-Non_002dASCII.html
===================================================================
RCS file: html_node/Matching-Non_002dASCII.html
diff -N html_node/Matching-Non_002dASCII.html
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ html_node/Matching-Non_002dASCII.html       28 Sep 2020 03:36:49 -0000      1.1
@@ -0,0 +1,116 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<!-- This manual is for grep, a pattern matching engine.
+
+Copyright (C) 1999-2002, 2005, 2008-2020 Free Software Foundation,
+Inc.
+
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
+Texts.  A copy of the license is included in the section entitled
+"GNU Free Documentation License". -->
+<!-- Created by GNU Texinfo 6.5, http://www.gnu.org/software/texinfo/ -->
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<title>Matching Non-ASCII (GNU Grep 3.5)</title>
+
+<meta name="description" content="Matching Non-ASCII (GNU Grep 3.5)">
+<meta name="keywords" content="Matching Non-ASCII (GNU Grep 3.5)">
+<meta name="resource-type" content="document">
+<meta name="distribution" content="global">
+<meta name="Generator" content="makeinfo">
+<link href="index.html#Top" rel="start" title="Top">
+<link href="Index.html#Index" rel="index" title="Index">
+<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
+<link href="Regular-Expressions.html#Regular-Expressions" rel="up" title="Regular Expressions">
+<link href="Usage.html#Usage" rel="next" title="Usage">
+<link href="Character-Encoding.html#Character-Encoding" rel="prev" title="Character Encoding">
+<style type="text/css">
+<!--
+a.summary-letter {text-decoration: none}
+blockquote.indentedblock {margin-right: 0em}
+blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
+blockquote.smallquotation {font-size: smaller}
+div.display {margin-left: 3.2em}
+div.example {margin-left: 3.2em}
+div.lisp {margin-left: 3.2em}
+div.smalldisplay {margin-left: 3.2em}
+div.smallexample {margin-left: 3.2em}
+div.smalllisp {margin-left: 3.2em}
+kbd {font-style: oblique}
+pre.display {font-family: inherit}
+pre.format {font-family: inherit}
+pre.menu-comment {font-family: serif}
+pre.menu-preformatted {font-family: serif}
+pre.smalldisplay {font-family: inherit; font-size: smaller}
+pre.smallexample {font-size: smaller}
+pre.smallformat {font-family: inherit; font-size: smaller}
+pre.smalllisp {font-size: smaller}
+span.nolinebreak {white-space: nowrap}
+span.roman {font-family: initial; font-weight: normal}
+span.sansserif {font-family: sans-serif; font-weight: normal}
+ul.no-bullet {list-style: none}
+-->
+</style>
+<link rel="stylesheet" type="text/css" href="/software/gnulib/manual.css">
+
+
+</head>
+
+<body lang="en">
+<a name="Matching-Non_002dASCII"></a>
+<div class="header">
+<p>
+Previous: <a href="Character-Encoding.html#Character-Encoding" accesskey="p" rel="prev">Character Encoding</a>, Up: <a href="Regular-Expressions.html#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
+</div>
+<hr>
+<a name="Matching-Non_002dASCII-and-Non_002dprintable-Characters"></a>
+<h3 class="section">3.8 Matching Non-ASCII and Non-printable Characters</h3>
+<a name="index-non_002dASCII-matching"></a>
+<a name="index-non_002dprintable-matching"></a>
+
+<p>In a regular expression, non-ASCII and non-printable characters other
+than newline are not special, and represent themselves.  For example,
+in a locale using UTF-8 the command &lsquo;<samp>grep 'Λ&nbsp;ω'</samp>&rsquo; (where the
+white space between &lsquo;<samp>Λ</samp>&rsquo; and the &lsquo;<samp>ω</samp>&rsquo; is a tab character)
+searches for &lsquo;<samp>Λ</samp>&rsquo; (Unicode character U+039B GREEK CAPITAL LETTER
+LAMBDA), followed by a tab (U+0009 TAB), followed by &lsquo;<samp>ω</samp>&rsquo; (U+03C9
+GREEK SMALL LETTER OMEGA).
+</p>
+<p>Suppose you want to limit your pattern to only printable characters
+(or even only printable ASCII characters) to keep your script readable
+or portable, but you also want to match specific non-ASCII or non-null
+non-printable characters.  If you are using the <samp>-P</samp>
+(<samp>--perl-regexp</samp>) option, PCREs give you several ways to do
+this.  Otherwise, if you are using Bash, the GNU project&rsquo;s shell, you
+can represent these characters via ANSI-C quoting.  For example, the
+Bash commands &lsquo;<samp>grep $'Λ\tω'</samp>&rsquo; and &lsquo;<samp>grep $'\u039B\t\u03C9'</samp>&rsquo;
+both search for the same three-character string &lsquo;<samp>Λ&nbsp;ω</samp>&rsquo;
+mentioned earlier.  However, because Bash translates ANSI-C quoting
+before <code>grep</code> sees the pattern, this technique should not be
+used to match printable ASCII characters; for example, &lsquo;<samp>grep
+$'\u005E'</samp>&rsquo; is equivalent to &lsquo;<samp>grep '^'</samp>&rsquo; and matches any line, not
+just lines containing the character &lsquo;<samp>^</samp>&rsquo; (U+005E CIRCUMFLEX
+ACCENT).
+</p>
+<p>Since PCREs and ANSI-C quoting are GNU extensions to POSIX, portable
+shell scripts written in ASCII should use other methods to match
+specific non-ASCII characters.  For example, in a UTF-8 locale the
+command &lsquo;<samp>grep &quot;$(printf '\316\233\t\317\211\n')&quot;</samp>&rsquo; is a portable
+albeit hard-to-read alternative to Bash&rsquo;s &lsquo;<samp>grep $'Λ\tω'</samp>&rsquo;.
+However, none of these techniques will let you put a null character
+directly into a command-line pattern; null characters can appear only
+in a pattern specified via the <samp>-f</samp> (<samp>--file</samp>) option.
+</p>
+<hr>
+<div class="header">
+<p>
+Previous: <a href="Character-Encoding.html#Character-Encoding" accesskey="p" rel="prev">Character Encoding</a>, Up: <a href="Regular-Expressions.html#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
+</div>
+
+
+
+</body>
+</html>
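
The caveat in the new section about translating a pattern before grep sees it can be reproduced without Bash. The sketch below is an analogous illustration, not the manual's own example: it uses printf's octal escape \136 in place of Bash's $'\u005E', and the sample input is hypothetical. The -F (fixed strings) option is shown only as one possible way to take the character literally; the section itself does not mention it.

```shell
#!/bin/sh
# Hypothetical sample input: only the first line contains a literal '^'.
sample=$(printf 'a^b\nabc\n')

# printf turns the octal escape \136 into '^' before grep runs,
# much as Bash expands $'\u005E' before grep sees the pattern.
caret=$(printf '\136')

# As a regular expression, '^' anchors at line start, so every line matches.
all_lines=$(printf '%s\n' "$sample" | grep -c "$caret")

# With -F the same character is treated as a literal string instead.
caret_lines=$(printf '%s\n' "$sample" | grep -cF "$caret")

echo "regex: $all_lines, fixed: $caret_lines"
```

With this input the regex form counts both lines, while the fixed-string form counts only the line that actually contains the circumflex.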

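The portable printf technique from the last paragraph of the new section can be tried with a script written entirely in printable ASCII. This is a minimal sketch: the octal escapes are the UTF-8 bytes of U+039B, tab, and U+03C9 as given in the section, while the variable names and the two-line sample input are assumptions made for illustration.

```shell
#!/bin/sh
# Build the three-character pattern (U+039B, tab, U+03C9) from ASCII only.
# Command substitution strips any trailing newline from printf's output.
pattern=$(printf '\316\233\t\317\211')

# Hypothetical sample input: line 1 has the tab-separated pair,
# line 2 has the same two letters separated by a space instead.
sample=$(printf '\316\233\t\317\211 here\n\316\233 \317\211\n')

# Count the lines that contain the pattern.
matches=$(printf '%s\n' "$sample" | grep -c "$pattern")

echo "$matches"
```

Only the first sample line should be counted, since the second has a space where the pattern requires a tab. As the section notes, the bytes are meaningful as those characters in a UTF-8 locale.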

