[SCM] gawk branch, gawk-5.1-stable, updated. gawk-4.1.0-4115-g696533f
From: Arnold Robbins
Subject: [SCM] gawk branch, gawk-5.1-stable, updated. gawk-4.1.0-4115-g696533f
Date: Sun, 13 Sep 2020 14:02:32 -0400 (EDT)
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gawk".
The branch, gawk-5.1-stable has been updated
via 696533f3f4a3b2dd36042eac5e1dce6f9bcec129 (commit)
from 7d08b2cd4dd4af16bac395bf1b93d7ed03cdca09 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
http://git.sv.gnu.org/cgit/gawk.git/commit/?id=696533f3f4a3b2dd36042eac5e1dce6f9bcec129
commit 696533f3f4a3b2dd36042eac5e1dce6f9bcec129
Author: Arnold D. Robbins <arnold@skeeve.com>
Date: Sun Sep 13 21:02:08 2020 +0300
Update egrep program to be POSIX compliant.
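
The '-x' handling added by this commit can be tried in isolation. The following is only a sketch and is not part of the commit: it assumes a POSIX 'awk' on the PATH, and 'full_line_grep' is a hypothetical helper name chosen for illustration. It applies the same match()/RSTART/RLENGTH test as the patch, so a line is accepted only when the match starts at position 1 and covers the whole record.

```shell
# Sketch of the -x (whole-line) test, using only POSIX awk features.
# full_line_grep is a hypothetical name; it is not defined in egrep.awk.
full_line_grep() {
    # $1 is the regular expression; input lines are read from stdin.
    awk -v pattern="$1" '
    {
        # match() returns the start of the match (0 if there is none)
        # and sets RSTART and RLENGTH as a side effect.
        matches = match($0, pattern)

        # Reject a match that does not span the entire record.
        if (matches && (RSTART != 1 || RLENGTH != length($0)))
            matches = 0

        if (matches)
            print
    }'
}

# Only the first line matches "abc" over its full length.
printf '%s\n' abc xabc abcabc | full_line_grep 'abc'
```

The sketch passes the pattern with 'awk -v', so backslashes in the pattern go through awk's string escape processing; the real program takes the pattern from its command-line arguments via getopt() instead.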
diff --git a/awklib/eg/prog/egrep.awk b/awklib/eg/prog/egrep.awk
index a4165a9..8ec5105 100644
--- a/awklib/eg/prog/egrep.awk
+++ b/awklib/eg/prog/egrep.awk
@@ -2,58 +2,67 @@
#
# Arnold Robbins, arnold@skeeve.com, Public Domain
# May 1993
+# Revised September 2020
# Options:
# -c count of lines
-# -s silent - use exit value
-# -v invert test, success if no match
+# -e argument is pattern
# -i ignore case
# -l print filenames only
-# -e argument is pattern
+# -n add line number to output
+# -q quiet - use exit value
+# -s silent - don't print errors
+# -v invert test, success if no match
+# -x the entire line must match
#
-# Requires getopt and file transition library functions
+# Requires getopt library function
+# Uses IGNORECASE, BEGINFILE and ENDFILE
+# Invoke using gawk -f egrep.awk -- options ...
BEGIN {
- while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) {
+ while ((c = getopt(ARGC, ARGV, "ce:ilnqsvx")) != -1) {
if (c == "c")
count_only++
- else if (c == "s")
- no_print++
- else if (c == "v")
- invert++
+ else if (c == "e")
+ pattern = Optarg
else if (c == "i")
IGNORECASE = 1
else if (c == "l")
filenames_only++
- else if (c == "e")
- pattern = Optarg
+ else if (c == "n")
+ line_numbers++
+ else if (c == "q")
+ no_print++
+ else if (c == "s")
+ no_errors++
+ else if (c == "v")
+ invert++
+ else if (c == "x")
+ full_line++
else
usage()
}
if (pattern == "")
pattern = ARGV[Optind++]
+ if (pattern == "")
+ usage()
+
for (i = 1; i < Optind; i++)
ARGV[i] = ""
+
if (Optind >= ARGC) {
ARGV[1] = "-"
ARGC = 2
} else if (ARGC - Optind > 1)
do_filenames++
-
-# if (IGNORECASE)
-# pattern = tolower(pattern)
}
-#{
-# if (IGNORECASE)
-# $0 = tolower($0)
-#}
-function beginfile(junk)
-{
+BEGINFILE {
fcount = 0
+ if (ERRNO && no_errors)
+ nextfile
}
-function endfile(file)
-{
+ENDFILE {
if (! no_print && count_only) {
if (do_filenames)
print file ":" fcount
@@ -64,7 +73,10 @@ function endfile(file)
total += fcount
}
{
- matches = ($0 ~ pattern)
+ matches = match($0, pattern)
+ if (matches && full_line && (RSTART != 1 || RLENGTH != length()))
+ matches = 0
+
if (invert)
matches = ! matches
@@ -83,7 +95,10 @@ function endfile(file)
}
if (do_filenames)
- print FILENAME ":" $0
+ if (line_numbers)
+ print FILENAME ":" FNR ":" $0
+ else
+ print FILENAME ":" $0
else
print
}
@@ -93,7 +108,7 @@ END {
}
function usage()
{
- print("Usage: egrep [-csvil] [-e pat] [files ...]") > "/dev/stderr"
- print("\n\tegrep [-csvil] pat [files ...]") > "/dev/stderr"
+ print("Usage:\tegrep [-cilnqsvx] [-e pat] [files ...]") > "/dev/stderr"
+ print("\tegrep [-cilnqsvx] pat [files ...]") > "/dev/stderr"
exit 1
}
diff --git a/doc/ChangeLog b/doc/ChangeLog
index 765ce40..8ecc89e 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,8 @@
+2020-09-13 Arnold D. Robbins <arnold@skeeve.com>
+
+ * gawktexi.in (Egrep Program): Improve to be POSIX compliant.
+ Update explanatory text as well.
+
2020-09-11 Arnold D. Robbins <arnold@skeeve.com>
* gawktexi.in (Id Program): Rewrite to be POSIX compliant.
diff --git a/doc/gawk.info b/doc/gawk.info
index 4e90d5b..3624b8c 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -17877,9 +17877,24 @@ File: gawk.info, Node: Egrep Program, Next: Id Program, Prev: Cut Program, U
11.2.2 Searching for Regular Expressions in Files
-------------------------------------------------
-The 'egrep' utility searches files for patterns. It uses regular
-expressions that are almost identical to those available in 'awk' (*note
-Regexp::). You invoke it as follows:
+The 'grep' family of programs searches files for patterns. These
+programs have an unusual history. Initially there was 'grep' (Global
+Regular Expression Print), which used what are now called Basic Regular
+Expressions (BREs). Later there was 'egrep' (Extended 'grep') which
+used what are now called Extended Regular Expressions (EREs). (These
+are almost identical to those available in 'awk'; *note Regexp::).
+There was also 'fgrep' (Fast 'grep'), which searched for matches of one
+or more fixed strings.
+
+ POSIX chose to combine these three programs into one, simply named
+'grep'. On a POSIX system, 'grep''s default behavior is to search using
+BREs. You use '-E' to specify the use of EREs, and '-F' to specify
+searching for fixed strings.
+
+ In practice, systems continue to come with separate 'egrep' and
+'fgrep' utilities, for backwards compatibility. This minor node provides
+an 'awk' implementation of 'egrep', which supports all of the
+POSIX-mandated options. You invoke it as follows:
'egrep' [OPTIONS] ''PATTERN'' FILES ...
@@ -17892,17 +17907,12 @@ line, each output line is preceded by the name of the file and a colon.
The options to 'egrep' are as follows:
'-c'
- Print out a count of the lines that matched the pattern, instead of
- the lines themselves.
+ Print a count of the lines that matched the pattern, instead of the
+ lines themselves.
-'-s'
- Be silent. No output is produced and the exit value indicates
- whether the pattern was matched.
-
-'-v'
- Invert the sense of the test. 'egrep' prints the lines that do
- _not_ match the pattern and exits successfully if the pattern is
- not matched.
+'-e PATTERN'
+ Use PATTERN as the regexp to match. The purpose of the '-e' option
+ is to allow patterns that start with a '-'.
'-i'
Ignore case distinctions in both the pattern and the input data.
@@ -17911,15 +17921,28 @@ line, each output line is preceded by the name of the file and a colon.
Only print (list) the names of the files that matched, not the
lines that matched.
-'-e PATTERN'
- Use PATTERN as the regexp to match. The purpose of the '-e' option
- is to allow patterns that start with a '-'.
+'-q'
+ Be quiet. No output is produced and the exit value indicates
+ whether the pattern was matched.
+
+'-s'
+ Be silent. Do not print error messages for files that could not be
+ opened.
+
+'-v'
+ Invert the sense of the test. 'egrep' prints the lines that do
+ _not_ match the pattern and exits successfully if the pattern is
+ not matched.
+
+'-x'
+ Match the entire input line in order to consider the match as
+ having succeeded.
This version uses the 'getopt()' library function (*note Getopt
-Function::) and the file transition library program (*note Filetrans
-Function::).
+Function::) and 'gawk''s 'BEGINFILE' and 'ENDFILE' special patterns
+(*note BEGINFILE/ENDFILE::).
- The program begins with a descriptive comment and then a 'BEGIN' rule
+ The program begins with descriptive comments and then a 'BEGIN' rule
that processes the command-line arguments with 'getopt()'. The '-i'
(ignore case) option is particularly easy with 'gawk'; we just use the
'IGNORECASE' predefined variable (*note Built-in Variables::):
@@ -17928,92 +17951,98 @@ that processes the command-line arguments with 'getopt()'. The '-i'
#
# Options:
# -c count of lines
- # -s silent - use exit value
- # -v invert test, success if no match
+ # -e argument is pattern
# -i ignore case
# -l print filenames only
- # -e argument is pattern
+ # -n add line number to output
+ # -q quiet - use exit value
+ # -s silent - don't print errors
+ # -v invert test, success if no match
+ # -x the entire line must match
#
- # Requires getopt and file transition library functions
+ # Requires getopt library function
+ # Uses IGNORECASE, BEGINFILE and ENDFILE
+ # Invoke using gawk -f egrep.awk -- options ...
BEGIN {
- while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) {
+ while ((c = getopt(ARGC, ARGV, "ce:ilnqsvx")) != -1) {
if (c == "c")
count_only++
- else if (c == "s")
- no_print++
- else if (c == "v")
- invert++
+ else if (c == "e")
+ pattern = Optarg
else if (c == "i")
IGNORECASE = 1
else if (c == "l")
filenames_only++
- else if (c == "e")
- pattern = Optarg
+ else if (c == "n")
+ line_numbers++
+ else if (c == "q")
+ no_print++
+ else if (c == "s")
+ no_errors++
+ else if (c == "v")
+ invert++
+ else if (c == "x")
+ full_line++
else
usage()
}
+Note the comment about invocation: Because several of the options
+overlap with 'gawk''s, a '--' is needed to tell 'gawk' to stop looking
+for options.
+
Next comes the code that handles the 'egrep'-specific behavior. If
no pattern is supplied with '-e', the first nonoption on the command
-line is used. The 'awk' command-line arguments up to 'ARGV[Optind]' are
-cleared, so that 'awk' won't try to process them as files. If no files
-are specified, the standard input is used, and if multiple files are
-specified, we make sure to note this so that the file names can precede
-the matched lines in the output:
+line is used. If the pattern is empty, that means no pattern was
+supplied, so it's necessary to print an error message and exit. The
+'awk' command-line arguments up to 'ARGV[Optind]' are cleared, so that
+'awk' won't try to process them as files. If no files are specified,
+the standard input is used, and if multiple files are specified, we make
+sure to note this so that the file names can precede the matched lines
+in the output:
if (pattern == "")
pattern = ARGV[Optind++]
+ if (pattern == "")
+ usage()
+
for (i = 1; i < Optind; i++)
ARGV[i] = ""
+
if (Optind >= ARGC) {
ARGV[1] = "-"
ARGC = 2
} else if (ARGC - Optind > 1)
do_filenames++
-
- # if (IGNORECASE)
- # pattern = tolower(pattern)
}
- The last two lines are commented out, as they are not needed in
-'gawk'. They should be uncommented if you have to use another version
-of 'awk'.
-
- The next set of lines should be uncommented if you are not using
-'gawk'. This rule translates all the characters in the input line into
-lowercase if the '-i' option is specified.(1) The rule is commented out
-as it is not necessary with 'gawk':
-
- #{
- # if (IGNORECASE)
- # $0 = tolower($0)
- #}
+ The 'BEGINFILE' rule executes when each new file is processed. In
+this case, it is fairly simple; it initializes a variable 'fcount' to
+zero. 'fcount' tracks how many lines in the current file matched the
+pattern.
- The 'beginfile()' function is called by the rule in 'ftrans.awk' when
-each new file is processed. In this case, it is very simple; all it
-does is initialize a variable 'fcount' to zero. 'fcount' tracks how
-many lines in the current file matched the pattern. Naming the
-parameter 'junk' shows we know that 'beginfile()' is called with a
-parameter, but that we're not interested in its value:
+ Here also is where we implement the '-s' option. We check if 'ERRNO'
+has been set, and if '-s' was supplied. In that case, it's necessary to
+move on to the next file. Otherwise 'gawk' would exit with an error:
- function beginfile(junk)
- {
+ BEGINFILE {
fcount = 0
+ if (ERRNO && no_errors)
+ nextfile
}
- The 'endfile()' function is called after each file has been
-processed. It affects the output only when the user wants a count of
-the number of lines that matched. 'no_print' is true only if the exit
-status is desired. 'count_only' is true if line counts are desired.
-'egrep' therefore only prints line counts if printing and counting are
-enabled. The output format must be adjusted depending upon the number
-of files to process. Finally, 'fcount' is added to 'total', so that we
-know the total number of lines that matched the pattern:
+ The 'ENDFILE' rule executes after each file has been processed. It
+affects the output only when the user wants a count of the number of
+lines that matched. 'no_print' is true only if the exit status is
+desired. 'count_only' is true if line counts are desired. 'egrep'
+therefore only prints line counts if printing and counting are enabled.
+The output format must be adjusted depending upon the number of files to
+process. Finally, 'fcount' is added to 'total', so that we know the
+total number of lines that matched the pattern:
- function endfile(file)
- {
+ ENDFILE {
if (! no_print && count_only) {
if (do_filenames)
print file ":" fcount
@@ -18024,18 +18053,18 @@ know the total number of lines that matched the pattern:
total += fcount
}
- The 'BEGINFILE' and 'ENDFILE' special patterns (*note
-BEGINFILE/ENDFILE::) could be used, but then the program would be
-'gawk'-specific. Additionally, this example was written before 'gawk'
-acquired 'BEGINFILE' and 'ENDFILE'.
-
The following rule does most of the work of matching lines. The
-variable 'matches' is true if the line matched the pattern. If the user
-wants lines that did not match, the sense of 'matches' is inverted using
-the '!' operator. 'fcount' is incremented with the value of 'matches',
-which is either one or zero, depending upon a successful or unsuccessful
-match. If the line does not match, the 'next' statement just moves on
-to the next record.
+variable 'matches' is true (non-zero) if the line matched the pattern.
+If the user specified that the entire line must match (with '-x'), the
+code checks this condition by looking at the values of 'RSTART' and
+'RLENGTH'. If those indicate that the match is not over the full line,
+'matches' is set to zero (false).
+
+ If the user wants lines that did not match, the sense of 'matches' is
+inverted using the '!' operator. 'fcount' is incremented with the value
+of 'matches', which is either one or zero, depending upon a successful
+or unsuccessful match. If the line does not match, the 'next' statement
+just moves on to the next input line.
A number of additional tests are made, but they are only done if we
are not counting lines. First, if the user only wants the exit status
@@ -18043,10 +18072,14 @@ are not counting lines. First, if the user only wants the exit status
file matched, and we can skip on to the next file with 'nextfile'.
Similarly, if we are only printing file names, we can print the file
name, and then skip to the next file with 'nextfile'. Finally, each
-line is printed, with a leading file name and colon if necessary:
+line is printed, with a leading file name, optional colon and line
+number, and the final colon if necessary:
{
- matches = ($0 ~ pattern)
+ matches = match($0, pattern)
+ if (matches && full_line && (RSTART != 1 || RLENGTH != length()))
+ matches = 0
+
if (invert)
matches = ! matches
@@ -18065,7 +18098,10 @@ line is printed, with a leading file name and colon if necessary:
}
if (do_filenames)
- print FILENAME ":" $0
+ if (line_numbers)
+ print FILENAME ":" FNR ":" $0
+ else
+ print FILENAME ":" $0
else
print
}
@@ -18083,16 +18119,11 @@ options, and then exits:
function usage()
{
- print("Usage: egrep [-csvil] [-e pat] [files ...]") > "/dev/stderr"
- print("\n\tegrep [-csvil] pat [files ...]") > "/dev/stderr"
+ print("Usage:\tegrep [-cilnqsvx] [-e pat] [files ...]") > "/dev/stderr"
+ print("\tegrep [-cilnqsvx] pat [files ...]") > "/dev/stderr"
exit 1
}
- ---------- Footnotes ----------
-
- (1) It also introduces a subtle bug; if a match happens, we output
-the translated line, not the original.
-
File: gawk.info, Node: Id Program, Next: Split Program, Prev: Egrep
Program, Up: Clones
@@ -34152,7 +34183,7 @@ Index
* ! (exclamation point), !~ operator <7>: Expression Patterns.
(line 24)
* ! (exclamation point), ! operator <2>: Ranges. (line 47)
-* ! (exclamation point), ! operator <3>: Egrep Program. (line 174)
+* ! (exclamation point), ! operator <3>: Egrep Program. (line 204)
* " (double quote), in shell commands: Quoting. (line 54)
* " (double quote), in regexp constants: Computed Regexps. (line 30)
* # (number sign), #! (executable scripts): Executable Scripts.
@@ -35464,7 +35495,7 @@ Index
* effective user ID of gawk user: Auto-set. (line 180)
* egrep utility: Bracket Expressions. (line 34)
* egrep utility <1>: Egrep Program. (line 6)
-* egrep.awk program: Egrep Program. (line 53)
+* egrep.awk program: Egrep Program. (line 76)
* elements in arrays: Reference to Elements.
(line 6)
* elements in arrays, assigning values: Assigning Elements. (line 6)
@@ -35567,7 +35598,7 @@ Index
* exclamation point (!), !~ operator <6>: Precedence. (line 79)
* exclamation point (!), !~ operator <7>: Expression Patterns.
(line 24)
-* exclamation point (!), ! operator <2>: Egrep Program. (line 174)
+* exclamation point (!), ! operator <2>: Egrep Program. (line 204)
* exit debugger command: Miscellaneous Debugger Commands.
(line 64)
* exit statement: Exit Statement. (line 6)
@@ -37925,273 +37956,272 @@ Node: Running Examples726294
Node: Clones727022
Node: Cut Program728246
Node: Egrep Program738175
-Ref: Egrep Program-Footnote-1745687
-Node: Id Program745797
-Node: Split Program755743
-Ref: Split Program-Footnote-1759201
-Node: Tee Program759330
-Node: Uniq Program762120
-Node: Wc Program769684
-Ref: Wc Program-Footnote-1773939
-Node: Miscellaneous Programs774033
-Node: Dupword Program775246
-Node: Alarm Program777276
-Node: Translate Program782131
-Ref: Translate Program-Footnote-1786696
-Node: Labels Program786966
-Ref: Labels Program-Footnote-1790317
-Node: Word Sorting790401
-Node: History Sorting794473
-Node: Extract Program796698
-Node: Simple Sed804752
-Node: Igawk Program807826
-Ref: Igawk Program-Footnote-1822157
-Ref: Igawk Program-Footnote-2822359
-Ref: Igawk Program-Footnote-3822481
-Node: Anagram Program822596
-Node: Signature Program825658
-Node: Programs Summary826905
-Node: Programs Exercises828119
-Ref: Programs Exercises-Footnote-1832249
-Node: Advanced Features832335
-Node: Nondecimal Data834325
-Node: Array Sorting835916
-Node: Controlling Array Traversal836616
-Ref: Controlling Array Traversal-Footnote-1844984
-Node: Array Sorting Functions845102
-Ref: Array Sorting Functions-Footnote-1850193
-Node: Two-way I/O850389
-Ref: Two-way I/O-Footnote-1858110
-Ref: Two-way I/O-Footnote-2858297
-Node: TCP/IP Networking858379
-Node: Profiling861497
-Node: Advanced Features Summary870811
-Node: Internationalization872655
-Node: I18N and L10N874135
-Node: Explaining gettext874822
-Ref: Explaining gettext-Footnote-1880714
-Ref: Explaining gettext-Footnote-2880899
-Node: Programmer i18n881064
-Ref: Programmer i18n-Footnote-1886013
-Node: Translator i18n886062
-Node: String Extraction886856
-Ref: String Extraction-Footnote-1887988
-Node: Printf Ordering888074
-Ref: Printf Ordering-Footnote-1890860
-Node: I18N Portability890924
-Ref: I18N Portability-Footnote-1893380
-Node: I18N Example893443
-Ref: I18N Example-Footnote-1896718
-Ref: I18N Example-Footnote-2896791
-Node: Gawk I18N896900
-Node: I18N Summary897549
-Node: Debugger898890
-Node: Debugging899890
-Node: Debugging Concepts900331
-Node: Debugging Terms902140
-Node: Awk Debugging904715
-Ref: Awk Debugging-Footnote-1905660
-Node: Sample Debugging Session905792
-Node: Debugger Invocation906326
-Node: Finding The Bug907712
-Node: List of Debugger Commands914186
-Node: Breakpoint Control915519
-Node: Debugger Execution Control919213
-Node: Viewing And Changing Data922575
-Node: Execution Stack926116
-Node: Debugger Info927753
-Node: Miscellaneous Debugger Commands931824
-Node: Readline Support936886
-Node: Limitations937782
-Node: Debugging Summary940336
-Node: Namespaces941615
-Node: Global Namespace942726
-Node: Qualified Names944124
-Node: Default Namespace945123
-Node: Changing The Namespace945864
-Node: Naming Rules947478
-Node: Internal Name Management949326
-Node: Namespace Example950368
-Node: Namespace And Features952930
-Node: Namespace Summary954365
-Node: Arbitrary Precision Arithmetic955842
-Node: Computer Arithmetic957329
-Ref: table-numeric-ranges961095
-Ref: table-floating-point-ranges961588
-Ref: Computer Arithmetic-Footnote-1962246
-Node: Math Definitions962303
-Ref: table-ieee-formats965619
-Ref: Math Definitions-Footnote-1966222
-Node: MPFR features966327
-Node: FP Math Caution968045
-Ref: FP Math Caution-Footnote-1969117
-Node: Inexactness of computations969486
-Node: Inexact representation970446
-Node: Comparing FP Values971806
-Node: Errors accumulate973047
-Node: Getting Accuracy974480
-Node: Try To Round977190
-Node: Setting precision978089
-Ref: table-predefined-precision-strings978786
-Node: Setting the rounding mode980616
-Ref: table-gawk-rounding-modes980990
-Ref: Setting the rounding mode-Footnote-1984921
-Node: Arbitrary Precision Integers985100
-Ref: Arbitrary Precision Integers-Footnote-1988275
-Node: Checking for MPFR988424
-Node: POSIX Floating Point Problems989898
-Ref: POSIX Floating Point Problems-Footnote-1994183
-Node: Floating point summary994221
-Node: Dynamic Extensions996411
-Node: Extension Intro997964
-Node: Plugin License999230
-Node: Extension Mechanism Outline1000027
-Ref: figure-load-extension1000466
-Ref: figure-register-new-function1002031
-Ref: figure-call-new-function1003123
-Node: Extension API Description1005185
-Node: Extension API Functions Introduction1006898
-Ref: table-api-std-headers1008734
-Node: General Data Types1012983
-Ref: General Data Types-Footnote-11021613
-Node: Memory Allocation Functions1021912
-Ref: Memory Allocation Functions-Footnote-11026413
-Node: Constructor Functions1026512
-Node: API Ownership of MPFR and GMP Values1029978
-Node: Registration Functions1031291
-Node: Extension Functions1031991
-Node: Exit Callback Functions1037313
-Node: Extension Version String1038563
-Node: Input Parsers1039226
-Node: Output Wrappers1051947
-Node: Two-way processors1056459
-Node: Printing Messages1058724
-Ref: Printing Messages-Footnote-11059895
-Node: Updating ERRNO1060048
-Node: Requesting Values1060787
-Ref: table-value-types-returned1061524
-Node: Accessing Parameters1062460
-Node: Symbol Table Access1063697
-Node: Symbol table by name1064209
-Ref: Symbol table by name-Footnote-11067233
-Node: Symbol table by cookie1067361
-Ref: Symbol table by cookie-Footnote-11071546
-Node: Cached values1071610
-Ref: Cached values-Footnote-11075146
-Node: Array Manipulation1075299
-Ref: Array Manipulation-Footnote-11076390
-Node: Array Data Types1076427
-Ref: Array Data Types-Footnote-11079085
-Node: Array Functions1079177
-Node: Flattening Arrays1083675
-Node: Creating Arrays1090651
-Node: Redirection API1095418
-Node: Extension API Variables1098251
-Node: Extension Versioning1098962
-Ref: gawk-api-version1099391
-Node: Extension GMP/MPFR Versioning1101122
-Node: Extension API Informational Variables1102750
-Node: Extension API Boilerplate1103823
-Node: Changes from API V11107797
-Node: Finding Extensions1109369
-Node: Extension Example1109928
-Node: Internal File Description1110726
-Node: Internal File Ops1114806
-Ref: Internal File Ops-Footnote-11126156
-Node: Using Internal File Ops1126296
-Ref: Using Internal File Ops-Footnote-11128679
-Node: Extension Samples1128953
-Node: Extension Sample File Functions1130482
-Node: Extension Sample Fnmatch1138131
-Node: Extension Sample Fork1139618
-Node: Extension Sample Inplace1140836
-Node: Extension Sample Ord1144461
-Node: Extension Sample Readdir1145297
-Ref: table-readdir-file-types1146186
-Node: Extension Sample Revout1147253
-Node: Extension Sample Rev2way1147842
-Node: Extension Sample Read write array1148582
-Node: Extension Sample Readfile1150524
-Node: Extension Sample Time1151619
-Node: Extension Sample API Tests1153371
-Node: gawkextlib1153863
-Node: Extension summary1156781
-Node: Extension Exercises1160483
-Node: Language History1161725
-Node: V7/SVR3.11163381
-Node: SVR41165533
-Node: POSIX1166967
-Node: BTL1168348
-Node: POSIX/GNU1169077
-Node: Feature History1174855
-Node: Common Extensions1191174
-Node: Ranges and Locales1192457
-Ref: Ranges and Locales-Footnote-11197073
-Ref: Ranges and Locales-Footnote-21197100
-Ref: Ranges and Locales-Footnote-31197335
-Node: Contributors1197558
-Node: History summary1203555
-Node: Installation1204935
-Node: Gawk Distribution1205879
-Node: Getting1206363
-Node: Extracting1207326
-Node: Distribution contents1208964
-Node: Unix Installation1215444
-Node: Quick Installation1216126
-Node: Shell Startup Files1218540
-Node: Additional Configuration Options1219629
-Node: Configuration Philosophy1221944
-Node: Non-Unix Installation1224313
-Node: PC Installation1224773
-Node: PC Binary Installation1225611
-Node: PC Compiling1226046
-Node: PC Using1227163
-Node: Cygwin1230716
-Node: MSYS1231940
-Node: VMS Installation1232542
-Node: VMS Compilation1233333
-Ref: VMS Compilation-Footnote-11234562
-Node: VMS Dynamic Extensions1234620
-Node: VMS Installation Details1236305
-Node: VMS Running1238558
-Node: VMS GNV1242837
-Node: VMS Old Gawk1243572
-Node: Bugs1244043
-Node: Bug address1244706
-Node: Usenet1247688
-Node: Maintainers1248692
-Node: Other Versions1249877
-Node: Installation summary1256965
-Node: Notes1258174
-Node: Compatibility Mode1258968
-Node: Additions1259750
-Node: Accessing The Source1260675
-Node: Adding Code1262112
-Node: New Ports1268331
-Node: Derived Files1272706
-Ref: Derived Files-Footnote-11278366
-Ref: Derived Files-Footnote-21278401
-Ref: Derived Files-Footnote-31278999
-Node: Future Extensions1279113
-Node: Implementation Limitations1279771
-Node: Extension Design1280981
-Node: Old Extension Problems1282125
-Ref: Old Extension Problems-Footnote-11283643
-Node: Extension New Mechanism Goals1283700
-Ref: Extension New Mechanism Goals-Footnote-11287064
-Node: Extension Other Design Decisions1287253
-Node: Extension Future Growth1289366
-Node: Notes summary1289972
-Node: Basic Concepts1291130
-Node: Basic High Level1291811
-Ref: figure-general-flow1292093
-Ref: figure-process-flow1292778
-Ref: Basic High Level-Footnote-11296079
-Node: Basic Data Typing1296264
-Node: Glossary1299592
-Node: Copying1331477
-Node: GNU Free Documentation License1369020
-Node: Index1394140
+Node: Id Program747185
+Node: Split Program757131
+Ref: Split Program-Footnote-1760589
+Node: Tee Program760718
+Node: Uniq Program763508
+Node: Wc Program771072
+Ref: Wc Program-Footnote-1775327
+Node: Miscellaneous Programs775421
+Node: Dupword Program776634
+Node: Alarm Program778664
+Node: Translate Program783519
+Ref: Translate Program-Footnote-1788084
+Node: Labels Program788354
+Ref: Labels Program-Footnote-1791705
+Node: Word Sorting791789
+Node: History Sorting795861
+Node: Extract Program798086
+Node: Simple Sed806140
+Node: Igawk Program809214
+Ref: Igawk Program-Footnote-1823545
+Ref: Igawk Program-Footnote-2823747
+Ref: Igawk Program-Footnote-3823869
+Node: Anagram Program823984
+Node: Signature Program827046
+Node: Programs Summary828293
+Node: Programs Exercises829507
+Ref: Programs Exercises-Footnote-1833637
+Node: Advanced Features833723
+Node: Nondecimal Data835713
+Node: Array Sorting837304
+Node: Controlling Array Traversal838004
+Ref: Controlling Array Traversal-Footnote-1846372
+Node: Array Sorting Functions846490
+Ref: Array Sorting Functions-Footnote-1851581
+Node: Two-way I/O851777
+Ref: Two-way I/O-Footnote-1859498
+Ref: Two-way I/O-Footnote-2859685
+Node: TCP/IP Networking859767
+Node: Profiling862885
+Node: Advanced Features Summary872199
+Node: Internationalization874043
+Node: I18N and L10N875523
+Node: Explaining gettext876210
+Ref: Explaining gettext-Footnote-1882102
+Ref: Explaining gettext-Footnote-2882287
+Node: Programmer i18n882452
+Ref: Programmer i18n-Footnote-1887401
+Node: Translator i18n887450
+Node: String Extraction888244
+Ref: String Extraction-Footnote-1889376
+Node: Printf Ordering889462
+Ref: Printf Ordering-Footnote-1892248
+Node: I18N Portability892312
+Ref: I18N Portability-Footnote-1894768
+Node: I18N Example894831
+Ref: I18N Example-Footnote-1898106
+Ref: I18N Example-Footnote-2898179
+Node: Gawk I18N898288
+Node: I18N Summary898937
+Node: Debugger900278
+Node: Debugging901278
+Node: Debugging Concepts901719
+Node: Debugging Terms903528
+Node: Awk Debugging906103
+Ref: Awk Debugging-Footnote-1907048
+Node: Sample Debugging Session907180
+Node: Debugger Invocation907714
+Node: Finding The Bug909100
+Node: List of Debugger Commands915574
+Node: Breakpoint Control916907
+Node: Debugger Execution Control920601
+Node: Viewing And Changing Data923963
+Node: Execution Stack927504
+Node: Debugger Info929141
+Node: Miscellaneous Debugger Commands933212
+Node: Readline Support938274
+Node: Limitations939170
+Node: Debugging Summary941724
+Node: Namespaces943003
+Node: Global Namespace944114
+Node: Qualified Names945512
+Node: Default Namespace946511
+Node: Changing The Namespace947252
+Node: Naming Rules948866
+Node: Internal Name Management950714
+Node: Namespace Example951756
+Node: Namespace And Features954318
+Node: Namespace Summary955753
+Node: Arbitrary Precision Arithmetic957230
+Node: Computer Arithmetic958717
+Ref: table-numeric-ranges962483
+Ref: table-floating-point-ranges962976
+Ref: Computer Arithmetic-Footnote-1963634
+Node: Math Definitions963691
+Ref: table-ieee-formats967007
+Ref: Math Definitions-Footnote-1967610
+Node: MPFR features967715
+Node: FP Math Caution969433
+Ref: FP Math Caution-Footnote-1970505
+Node: Inexactness of computations970874
+Node: Inexact representation971834
+Node: Comparing FP Values973194
+Node: Errors accumulate974435
+Node: Getting Accuracy975868
+Node: Try To Round978578
+Node: Setting precision979477
+Ref: table-predefined-precision-strings980174
+Node: Setting the rounding mode982004
+Ref: table-gawk-rounding-modes982378
+Ref: Setting the rounding mode-Footnote-1986309
+Node: Arbitrary Precision Integers986488
+Ref: Arbitrary Precision Integers-Footnote-1989663
+Node: Checking for MPFR989812
+Node: POSIX Floating Point Problems991286
+Ref: POSIX Floating Point Problems-Footnote-1995571
+Node: Floating point summary995609
+Node: Dynamic Extensions997799
+Node: Extension Intro999352
+Node: Plugin License1000618
+Node: Extension Mechanism Outline1001415
+Ref: figure-load-extension1001854
+Ref: figure-register-new-function1003419
+Ref: figure-call-new-function1004511
+Node: Extension API Description1006573
+Node: Extension API Functions Introduction1008286
+Ref: table-api-std-headers1010122
+Node: General Data Types1014371
+Ref: General Data Types-Footnote-11023001
+Node: Memory Allocation Functions1023300
+Ref: Memory Allocation Functions-Footnote-11027801
+Node: Constructor Functions1027900
+Node: API Ownership of MPFR and GMP Values1031366
+Node: Registration Functions1032679
+Node: Extension Functions1033379
+Node: Exit Callback Functions1038701
+Node: Extension Version String1039951
+Node: Input Parsers1040614
+Node: Output Wrappers1053335
+Node: Two-way processors1057847
+Node: Printing Messages1060112
+Ref: Printing Messages-Footnote-11061283
+Node: Updating ERRNO1061436
+Node: Requesting Values1062175
+Ref: table-value-types-returned1062912
+Node: Accessing Parameters1063848
+Node: Symbol Table Access1065085
+Node: Symbol table by name1065597
+Ref: Symbol table by name-Footnote-11068621
+Node: Symbol table by cookie1068749
+Ref: Symbol table by cookie-Footnote-11072934
+Node: Cached values1072998
+Ref: Cached values-Footnote-11076534
+Node: Array Manipulation1076687
+Ref: Array Manipulation-Footnote-11077778
+Node: Array Data Types1077815
+Ref: Array Data Types-Footnote-11080473
+Node: Array Functions1080565
+Node: Flattening Arrays1085063
+Node: Creating Arrays1092039
+Node: Redirection API1096806
+Node: Extension API Variables1099639
+Node: Extension Versioning1100350
+Ref: gawk-api-version1100779
+Node: Extension GMP/MPFR Versioning1102510
+Node: Extension API Informational Variables1104138
+Node: Extension API Boilerplate1105211
+Node: Changes from API V11109185
+Node: Finding Extensions1110757
+Node: Extension Example1111316
+Node: Internal File Description1112114
+Node: Internal File Ops1116194
+Ref: Internal File Ops-Footnote-11127544
+Node: Using Internal File Ops1127684
+Ref: Using Internal File Ops-Footnote-11130067
+Node: Extension Samples1130341
+Node: Extension Sample File Functions1131870
+Node: Extension Sample Fnmatch1139519
+Node: Extension Sample Fork1141006
+Node: Extension Sample Inplace1142224
+Node: Extension Sample Ord1145849
+Node: Extension Sample Readdir1146685
+Ref: table-readdir-file-types1147574
+Node: Extension Sample Revout1148641
+Node: Extension Sample Rev2way1149230
+Node: Extension Sample Read write array1149970
+Node: Extension Sample Readfile1151912
+Node: Extension Sample Time1153007
+Node: Extension Sample API Tests1154759
+Node: gawkextlib1155251
+Node: Extension summary1158169
+Node: Extension Exercises1161871
+Node: Language History1163113
+Node: V7/SVR3.11164769
+Node: SVR41166921
+Node: POSIX1168355
+Node: BTL1169736
+Node: POSIX/GNU1170465
+Node: Feature History1176243
+Node: Common Extensions1192562
+Node: Ranges and Locales1193845
+Ref: Ranges and Locales-Footnote-11198461
+Ref: Ranges and Locales-Footnote-21198488
+Ref: Ranges and Locales-Footnote-31198723
+Node: Contributors1198946
+Node: History summary1204943
+Node: Installation1206323
+Node: Gawk Distribution1207267
+Node: Getting1207751
+Node: Extracting1208714
+Node: Distribution contents1210352
+Node: Unix Installation1216832
+Node: Quick Installation1217514
+Node: Shell Startup Files1219928
+Node: Additional Configuration Options1221017
+Node: Configuration Philosophy1223332
+Node: Non-Unix Installation1225701
+Node: PC Installation1226161
+Node: PC Binary Installation1226999
+Node: PC Compiling1227434
+Node: PC Using1228551
+Node: Cygwin1232104
+Node: MSYS1233328
+Node: VMS Installation1233930
+Node: VMS Compilation1234721
+Ref: VMS Compilation-Footnote-11235950
+Node: VMS Dynamic Extensions1236008
+Node: VMS Installation Details1237693
+Node: VMS Running1239946
+Node: VMS GNV1244225
+Node: VMS Old Gawk1244960
+Node: Bugs1245431
+Node: Bug address1246094
+Node: Usenet1249076
+Node: Maintainers1250080
+Node: Other Versions1251265
+Node: Installation summary1258353
+Node: Notes1259562
+Node: Compatibility Mode1260356
+Node: Additions1261138
+Node: Accessing The Source1262063
+Node: Adding Code1263500
+Node: New Ports1269719
+Node: Derived Files1274094
+Ref: Derived Files-Footnote-11279754
+Ref: Derived Files-Footnote-21279789
+Ref: Derived Files-Footnote-31280387
+Node: Future Extensions1280501
+Node: Implementation Limitations1281159
+Node: Extension Design1282369
+Node: Old Extension Problems1283513
+Ref: Old Extension Problems-Footnote-11285031
+Node: Extension New Mechanism Goals1285088
+Ref: Extension New Mechanism Goals-Footnote-11288452
+Node: Extension Other Design Decisions1288641
+Node: Extension Future Growth1290754
+Node: Notes summary1291360
+Node: Basic Concepts1292518
+Node: Basic High Level1293199
+Ref: figure-general-flow1293481
+Ref: figure-process-flow1294166
+Ref: Basic High Level-Footnote-11297467
+Node: Basic Data Typing1297652
+Node: Glossary1300980
+Node: Copying1332865
+Node: GNU Free Documentation License1370408
+Node: Index1395528
End Tag Table
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 8057350..8a2917e 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -25321,9 +25321,25 @@ of picking the input line apart by characters.
@cindex searching @subentry files for regular expressions
@cindex files @subentry searching for regular expressions
@cindex @command{egrep} utility
-The @command{egrep} utility searches files for patterns. It uses regular
-expressions that are almost identical to those available in @command{awk}
-(@pxref{Regexp}).
+The @command{grep} family of programs searches files for patterns.
+These programs have an unusual history.
+Initially there was @command{grep} (Global Regular Expression Print),
+which used what are now called Basic Regular Expressions (BREs).
+Later there was @command{egrep} (Extended @command{grep}) which used
+what are now called Extended Regular Expressions (EREs). (These are almost
+identical to those available in @command{awk}; @pxref{Regexp}).
+There was also @command{fgrep} (Fast @command{grep}), which searched
+for matches of one or more fixed strings.
+
+POSIX chose to combine these three programs into one, simply named
+@command{grep}. On a POSIX system, @command{grep}'s default behavior
+is to search using BREs. You use @option{-E} to specify the use
+of EREs, and @option{-F} to specify searching for fixed strings.
+
+In practice, systems continue to come with separate @command{egrep}
+and @command{fgrep} utilities, for backwards compatibility. This
+@value{SECTION} provides an @command{awk} implementation of @command{egrep},
+which supports all of the POSIX-mandated options.
You invoke it as follows:
@display
@@ -25341,17 +25357,12 @@ The options to @command{egrep} are as follows:
@table @code
@item -c
-Print out a count of the lines that matched the pattern, instead of the
+Print a count of the lines that matched the pattern, instead of the
lines themselves.
-@item -s
-Be silent. No output is produced and the exit value indicates whether
-the pattern was matched.
-
-@item -v
-Invert the sense of the test. @command{egrep} prints the lines that do
-@emph{not} match the pattern and exits successfully if the pattern is not
-matched.
+@item -e @var{pattern}
+Use @var{pattern} as the regexp to match. The purpose of the @option{-e}
+option is to allow patterns that start with a @samp{-}.
@item -i
Ignore case distinctions in both the pattern and the input data.
@@ -25359,17 +25370,30 @@ Ignore case distinctions in both the pattern and the input data.
@item -l
Only print (list) the names of the files that matched, not the lines that
matched.
-@item -e @var{pattern}
-Use @var{pattern} as the regexp to match. The purpose of the @option{-e}
-option is to allow patterns that start with a @samp{-}.
+@item -q
+Be quiet. No output is produced and the exit value indicates whether
+the pattern was matched.
+
+@item -s
+Be silent. Do not print error messages for files that could
+not be opened.
+
+@item -v
+Invert the sense of the test. @command{egrep} prints the lines that do
+@emph{not} match the pattern and exits successfully if the pattern is not
+matched.
+
+@item -x
+Match the entire input line in order to consider the match as having
+succeeded.
@end table
This version uses the @code{getopt()} library function
-(@pxref{Getopt Function})
-and the file transition library program
-(@pxref{Filetrans Function}).
+(@pxref{Getopt Function}) and @command{gawk}'s
+@code{BEGINFILE} and @code{ENDFILE} special patterns
+(@pxref{BEGINFILE/ENDFILE}).
-The program begins with a descriptive comment and then a @code{BEGIN} rule
+The program begins with descriptive comments and then a @code{BEGIN} rule
that processes the command-line arguments with @code{getopt()}. The @option{-i}
(ignore case) option is particularly easy with @command{gawk}; we just use the
@code{IGNORECASE} predefined variable
@@ -25385,43 +25409,63 @@ that processes the command-line arguments with @code{getopt()}. The @option{-i}
@c file eg/prog/egrep.awk
# Arnold Robbins, arnold@@skeeve.com, Public Domain
# May 1993
+# Revised September 2020
@c endfile
@end ignore
@c file eg/prog/egrep.awk
# Options:
# -c count of lines
-# -s silent - use exit value
-# -v invert test, success if no match
+# -e argument is pattern
# -i ignore case
# -l print filenames only
-# -e argument is pattern
+# -n add line number to output
+# -q quiet - use exit value
+# -s silent - don't print errors
+# -v invert test, success if no match
+# -x the entire line must match
#
-# Requires getopt and file transition library functions
+# Requires getopt library function
+# Uses IGNORECASE, BEGINFILE and ENDFILE
+# Invoke using gawk -f egrep.awk -- options ...
BEGIN @{
- while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) @{
+ while ((c = getopt(ARGC, ARGV, "ce:ilnqsvx")) != -1) @{
if (c == "c")
count_only++
- else if (c == "s")
- no_print++
- else if (c == "v")
- invert++
+ else if (c == "e")
+ pattern = Optarg
else if (c == "i")
IGNORECASE = 1
else if (c == "l")
filenames_only++
- else if (c == "e")
- pattern = Optarg
+ else if (c == "n")
+ line_numbers++
+ else if (c == "q")
+ no_print++
+ else if (c == "s")
+ no_errors++
+ else if (c == "v")
+ invert++
+ else if (c == "x")
+ full_line++
else
usage()
@}
@c endfile
@end example
+@noindent
+Note the comment about invocation: Because several of the options overlap
+with @command{gawk}'s, a @option{--} is needed to tell @command{gawk}
+to stop looking for options.
+
Next comes the code that handles the @command{egrep}-specific behavior. If no
pattern is supplied with @option{-e}, the first nonoption on the
-command line is used. The @command{awk} command-line arguments up to @code{ARGV[Optind]}
+command line is used.
+If the pattern is empty, that means no pattern was supplied, so it's
+necessary to print an error message and exit.
+The @command{awk} command-line arguments up to @code{ARGV[Optind]}
are cleared, so that @command{awk} won't try to process them as files. If no
files are specified, the standard input is used, and if multiple files are
specified, we make sure to note this so that the @value{FN}s can precede the
@@ -25432,58 +25476,42 @@ matched lines in the output:
if (pattern == "")
pattern = ARGV[Optind++]
+ if (pattern == "")
+ usage()
+
for (i = 1; i < Optind; i++)
ARGV[i] = ""
+
if (Optind >= ARGC) @{
ARGV[1] = "-"
ARGC = 2
@} else if (ARGC - Optind > 1)
do_filenames++
-
-# if (IGNORECASE)
-# pattern = tolower(pattern)
@}
@c endfile
@end example
-The last two lines are commented out, as they are not needed in
-@command{gawk}. They should be uncommented if you have to use another version
-of @command{awk}.
-
-The next set of lines should be uncommented if you are not using
-@command{gawk}. This rule translates all the characters in the input line
-into lowercase if the @option{-i} option is specified.@footnote{It
-also introduces a subtle bug;
-if a match happens, we output the translated line, not the original.}
-The rule is
-commented out as it is not necessary with @command{gawk}:
-
-@example
-@c file eg/prog/egrep.awk
-#@{
-# if (IGNORECASE)
-# $0 = tolower($0)
-#@}
-@c endfile
-@end example
-
-The @code{beginfile()} function is called by the rule in @file{ftrans.awk}
-when each new file is processed. In this case, it is very simple; all it
-does is initialize a variable @code{fcount} to zero. @code{fcount} tracks
+The @code{BEGINFILE} rule executes
+when each new file is processed. In this case, it is fairly simple; it
+initializes a variable @code{fcount} to zero. @code{fcount} tracks
how many lines in the current file matched the pattern.
-Naming the parameter @code{junk} shows we know that @code{beginfile()}
-is called with a parameter, but that we're not interested in its value:
+
+Here also is where we implement the @option{-s} option. We check
+if @code{ERRNO} has been set, and if @option{-s} was supplied.
+In that case, it's necessary to move on to the next file. Otherwise
+@command{gawk} would exit with an error:
@example
@c file eg/prog/egrep.awk
-function beginfile(junk)
-@{
+BEGINFILE @{
fcount = 0
+ if (ERRNO && no_errors)
+ nextfile
@}
@c endfile
@end example
-The @code{endfile()} function is called after each file has been processed.
+The @code{ENDFILE} rule executes after each file has been processed.
It affects the output only when the user wants a count of the number of lines that
matched. @code{no_print} is true only if the exit status is desired.
@code{count_only} is true if line counts are desired. @command{egrep}
@@ -25494,8 +25522,7 @@ know the total number of lines that matched the pattern:
@example
@c file eg/prog/egrep.awk
-function endfile(file)
-@{
+ENDFILE @{
if (! no_print && count_only) @{
if (do_filenames)
print file ":" fcount
@@ -25510,18 +25537,19 @@ function endfile(file)
@c endfile
@end example
-The @code{BEGINFILE} and @code{ENDFILE} special patterns
-(@pxref{BEGINFILE/ENDFILE}) could be used, but then the program would be
-@command{gawk}-specific. Additionally, this example was written before
-@command{gawk} acquired @code{BEGINFILE} and @code{ENDFILE}.
-
The following rule does most of the work of matching lines. The variable
-@code{matches} is true if the line matched the pattern. If the user
+@code{matches} is true (non-zero) if the line matched the pattern.
+If the user specified that the entire line must match (with @option{-x}),
+the code checks this condition by looking at the values of
+@code{RSTART} and @code{RLENGTH}. If those indicate that the match
+is not over the full line, @code{matches} is set to zero (false).
+
+If the user
wants lines that did not match, the sense of @code{matches} is inverted
using the @samp{!} operator. @code{fcount} is incremented with the value of
@code{matches}, which is either one or zero, depending upon a
successful or unsuccessful match. If the line does not match, the
-@code{next} statement just moves on to the next record.
+@code{next} statement just moves on to the next input line.
A number of additional tests are made, but they are only done if we
are not counting lines. First, if the user only wants the exit status
@@ -25529,7 +25557,8 @@ are not counting lines. First, if the user only wants the exit status
line in this file matched, and we can skip on to the next file with
@code{nextfile}. Similarly, if we are only printing @value{FN}s, we can
print the @value{FN}, and then skip to the next file with @code{nextfile}.
-Finally, each line is printed, with a leading @value{FN} and colon
+Finally, each line is printed, with a leading @value{FN},
+optional colon and line number, and the final colon
if necessary:
@cindex @code{!} (exclamation point) @subentry @code{!} operator
@@ -25537,7 +25566,10 @@ if necessary:
@example
@c file eg/prog/egrep.awk
@{
- matches = ($0 ~ pattern)
+ matches = match($0, pattern)
+ if (matches && full_line && (RSTART != 1 || RLENGTH != length()))
+ matches = 0
+
if (invert)
matches = ! matches
@@ -25556,7 +25588,10 @@ if necessary:
@}
if (do_filenames)
- print FILENAME ":" $0
+ if (line_numbers)
+ print FILENAME ":" FNR ":" $0
+ else
+ print FILENAME ":" $0
else
print
@}
@@ -25582,14 +25617,13 @@ and then exits:
@c file eg/prog/egrep.awk
function usage()
@{
- print("Usage: egrep [-csvil] [-e pat] [files ...]") > "/dev/stderr"
- print("\n\tegrep [-csvil] pat [files ...]") > "/dev/stderr"
+ print("Usage:\tegrep [-cilnqsvx] [-e pat] [files ...]") > "/dev/stderr"
+ print("\tegrep [-cilnqsvx] pat [files ...]") > "/dev/stderr"
exit 1
@}
@c endfile
@end example
-
@node Id Program
@subsection Printing Out User Information
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index 5a11434..cc249fd 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -24331,9 +24331,25 @@ of picking the input line apart by characters.
@cindex searching @subentry files for regular expressions
@cindex files @subentry searching for regular expressions
@cindex @command{egrep} utility
-The @command{egrep} utility searches files for patterns. It uses regular
-expressions that are almost identical to those available in @command{awk}
-(@pxref{Regexp}).
+The @command{grep} family of programs searches files for patterns.
+These programs have an unusual history.
+Initially there was @command{grep} (Global Regular Expression Print),
+which used what are now called Basic Regular Expressions (BREs).
+Later there was @command{egrep} (Extended @command{grep}) which used
+what are now called Extended Regular Expressions (EREs). (These are almost
+identical to those available in @command{awk}; @pxref{Regexp}).
+There was also @command{fgrep} (Fast @command{grep}), which searched
+for matches of one or more fixed strings.
+
+POSIX chose to combine these three programs into one, simply named
+@command{grep}. On a POSIX system, @command{grep}'s default behavior
+is to search using BREs. You use @option{-E} to specify the use
+of EREs, and @option{-F} to specify searching for fixed strings.
+
+In practice, systems continue to come with separate @command{egrep}
+and @command{fgrep} utilities, for backwards compatibility. This
+@value{SECTION} provides an @command{awk} implementation of @command{egrep},
+which supports all of the POSIX-mandated options.
You invoke it as follows:
@display
@@ -24351,17 +24367,12 @@ The options to @command{egrep} are as follows:
@table @code
@item -c
-Print out a count of the lines that matched the pattern, instead of the
+Print a count of the lines that matched the pattern, instead of the
lines themselves.
-@item -s
-Be silent. No output is produced and the exit value indicates whether
-the pattern was matched.
-
-@item -v
-Invert the sense of the test. @command{egrep} prints the lines that do
-@emph{not} match the pattern and exits successfully if the pattern is not
-matched.
+@item -e @var{pattern}
+Use @var{pattern} as the regexp to match. The purpose of the @option{-e}
+option is to allow patterns that start with a @samp{-}.
@item -i
Ignore case distinctions in both the pattern and the input data.
@@ -24369,17 +24380,30 @@ Ignore case distinctions in both the pattern and the input data.
@item -l
Only print (list) the names of the files that matched, not the lines that
matched.
-@item -e @var{pattern}
-Use @var{pattern} as the regexp to match. The purpose of the @option{-e}
-option is to allow patterns that start with a @samp{-}.
+@item -q
+Be quiet. No output is produced and the exit value indicates whether
+the pattern was matched.
+
+@item -s
+Be silent. Do not print error messages for files that could
+not be opened.
+
+@item -v
+Invert the sense of the test. @command{egrep} prints the lines that do
+@emph{not} match the pattern and exits successfully if the pattern is not
+matched.
+
+@item -x
+Match the entire input line in order to consider the match as having
+succeeded.
@end table
This version uses the @code{getopt()} library function
-(@pxref{Getopt Function})
-and the file transition library program
-(@pxref{Filetrans Function}).
+(@pxref{Getopt Function}) and @command{gawk}'s
+@code{BEGINFILE} and @code{ENDFILE} special patterns
+(@pxref{BEGINFILE/ENDFILE}).
-The program begins with a descriptive comment and then a @code{BEGIN} rule
+The program begins with descriptive comments and then a @code{BEGIN} rule
that processes the command-line arguments with @code{getopt()}. The @option{-i}
(ignore case) option is particularly easy with @command{gawk}; we just use the
@code{IGNORECASE} predefined variable
@@ -24395,43 +24419,63 @@ that processes the command-line arguments with @code{getopt()}. The @option{-i}
@c file eg/prog/egrep.awk
# Arnold Robbins, arnold@@skeeve.com, Public Domain
# May 1993
+# Revised September 2020
@c endfile
@end ignore
@c file eg/prog/egrep.awk
# Options:
# -c count of lines
-# -s silent - use exit value
-# -v invert test, success if no match
+# -e argument is pattern
# -i ignore case
# -l print filenames only
-# -e argument is pattern
+# -n add line number to output
+# -q quiet - use exit value
+# -s silent - don't print errors
+# -v invert test, success if no match
+# -x the entire line must match
#
-# Requires getopt and file transition library functions
+# Requires getopt library function
+# Uses IGNORECASE, BEGINFILE and ENDFILE
+# Invoke using gawk -f egrep.awk -- options ...
BEGIN @{
- while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) @{
+ while ((c = getopt(ARGC, ARGV, "ce:ilnqsvx")) != -1) @{
if (c == "c")
count_only++
- else if (c == "s")
- no_print++
- else if (c == "v")
- invert++
+ else if (c == "e")
+ pattern = Optarg
else if (c == "i")
IGNORECASE = 1
else if (c == "l")
filenames_only++
- else if (c == "e")
- pattern = Optarg
+ else if (c == "n")
+ line_numbers++
+ else if (c == "q")
+ no_print++
+ else if (c == "s")
+ no_errors++
+ else if (c == "v")
+ invert++
+ else if (c == "x")
+ full_line++
else
usage()
@}
@c endfile
@end example
+@noindent
+Note the comment about invocation: Because several of the options overlap
+with @command{gawk}'s, a @option{--} is needed to tell @command{gawk}
+to stop looking for options.
+
Next comes the code that handles the @command{egrep}-specific behavior. If no
pattern is supplied with @option{-e}, the first nonoption on the
-command line is used. The @command{awk} command-line arguments up to @code{ARGV[Optind]}
+command line is used.
+If the pattern is empty, that means no pattern was supplied, so it's
+necessary to print an error message and exit.
+The @command{awk} command-line arguments up to @code{ARGV[Optind]}
are cleared, so that @command{awk} won't try to process them as files. If no
files are specified, the standard input is used, and if multiple files are
specified, we make sure to note this so that the @value{FN}s can precede the
@@ -24442,58 +24486,42 @@ matched lines in the output:
if (pattern == "")
pattern = ARGV[Optind++]
+ if (pattern == "")
+ usage()
+
for (i = 1; i < Optind; i++)
ARGV[i] = ""
+
if (Optind >= ARGC) @{
ARGV[1] = "-"
ARGC = 2
@} else if (ARGC - Optind > 1)
do_filenames++
-
-# if (IGNORECASE)
-# pattern = tolower(pattern)
@}
@c endfile
@end example
-The last two lines are commented out, as they are not needed in
-@command{gawk}. They should be uncommented if you have to use another version
-of @command{awk}.
-
-The next set of lines should be uncommented if you are not using
-@command{gawk}. This rule translates all the characters in the input line
-into lowercase if the @option{-i} option is specified.@footnote{It
-also introduces a subtle bug;
-if a match happens, we output the translated line, not the original.}
-The rule is
-commented out as it is not necessary with @command{gawk}:
-
-@example
-@c file eg/prog/egrep.awk
-#@{
-# if (IGNORECASE)
-# $0 = tolower($0)
-#@}
-@c endfile
-@end example
-
-The @code{beginfile()} function is called by the rule in @file{ftrans.awk}
-when each new file is processed. In this case, it is very simple; all it
-does is initialize a variable @code{fcount} to zero. @code{fcount} tracks
+The @code{BEGINFILE} rule executes
+when each new file is processed. In this case, it is fairly simple; it
+initializes a variable @code{fcount} to zero. @code{fcount} tracks
how many lines in the current file matched the pattern.
-Naming the parameter @code{junk} shows we know that @code{beginfile()}
-is called with a parameter, but that we're not interested in its value:
+
+Here also is where we implement the @option{-s} option. We check
+if @code{ERRNO} has been set, and if @option{-s} was supplied.
+In that case, it's necessary to move on to the next file. Otherwise
+@command{gawk} would exit with an error:
@example
@c file eg/prog/egrep.awk
-function beginfile(junk)
-@{
+BEGINFILE @{
fcount = 0
+ if (ERRNO && no_errors)
+ nextfile
@}
@c endfile
@end example
-The @code{endfile()} function is called after each file has been processed.
+The @code{ENDFILE} rule executes after each file has been processed.
It affects the output only when the user wants a count of the number of lines that
matched. @code{no_print} is true only if the exit status is desired.
@code{count_only} is true if line counts are desired. @command{egrep}
@@ -24504,8 +24532,7 @@ know the total number of lines that matched the pattern:
@example
@c file eg/prog/egrep.awk
-function endfile(file)
-@{
+ENDFILE @{
if (! no_print && count_only) @{
if (do_filenames)
print file ":" fcount
@@ -24520,18 +24547,19 @@ function endfile(file)
@c endfile
@end example
-The @code{BEGINFILE} and @code{ENDFILE} special patterns
-(@pxref{BEGINFILE/ENDFILE}) could be used, but then the program would be
-@command{gawk}-specific. Additionally, this example was written before
-@command{gawk} acquired @code{BEGINFILE} and @code{ENDFILE}.
-
The following rule does most of the work of matching lines. The variable
-@code{matches} is true if the line matched the pattern. If the user
+@code{matches} is true (non-zero) if the line matched the pattern.
+If the user specified that the entire line must match (with @option{-x}),
+the code checks this condition by looking at the values of
+@code{RSTART} and @code{RLENGTH}. If those indicate that the match
+is not over the full line, @code{matches} is set to zero (false).
+
+If the user
wants lines that did not match, the sense of @code{matches} is inverted
using the @samp{!} operator. @code{fcount} is incremented with the value of
@code{matches}, which is either one or zero, depending upon a
successful or unsuccessful match. If the line does not match, the
-@code{next} statement just moves on to the next record.
+@code{next} statement just moves on to the next input line.
A number of additional tests are made, but they are only done if we
are not counting lines. First, if the user only wants the exit status
@@ -24539,7 +24567,8 @@ are not counting lines. First, if the user only wants the exit status
line in this file matched, and we can skip on to the next file with
@code{nextfile}. Similarly, if we are only printing @value{FN}s, we can
print the @value{FN}, and then skip to the next file with @code{nextfile}.
-Finally, each line is printed, with a leading @value{FN} and colon
+Finally, each line is printed, with a leading @value{FN},
+optional colon and line number, and the final colon
if necessary:
@cindex @code{!} (exclamation point) @subentry @code{!} operator
@@ -24547,7 +24576,10 @@ if necessary:
@example
@c file eg/prog/egrep.awk
@{
- matches = ($0 ~ pattern)
+ matches = match($0, pattern)
+ if (matches && full_line && (RSTART != 1 || RLENGTH != length()))
+ matches = 0
+
if (invert)
matches = ! matches
@@ -24566,7 +24598,10 @@ if necessary:
@}
if (do_filenames)
- print FILENAME ":" $0
+ if (line_numbers)
+ print FILENAME ":" FNR ":" $0
+ else
+ print FILENAME ":" $0
else
print
@}
@@ -24592,14 +24627,13 @@ and then exits:
@c file eg/prog/egrep.awk
function usage()
@{
- print("Usage: egrep [-csvil] [-e pat] [files ...]") > "/dev/stderr"
- print("\n\tegrep [-csvil] pat [files ...]") > "/dev/stderr"
+ print("Usage:\tegrep [-cilnqsvx] [-e pat] [files ...]") > "/dev/stderr"
+ print("\tegrep [-cilnqsvx] pat [files ...]") > "/dev/stderr"
exit 1
@}
@c endfile
@end example
-
@node Id Program
@subsection Printing Out User Information
-----------------------------------------------------------------------
Summary of changes:
awklib/eg/prog/egrep.awk | 67 +++--
doc/ChangeLog | 5 +
doc/gawk.info | 750 ++++++++++++++++++++++++-----------------------
doc/gawk.texi | 190 +++++++-----
doc/gawktexi.in | 190 +++++++-----
5 files changed, 660 insertions(+), 542 deletions(-)
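
The full-line test that the revised documentation describes for -x can be
tried on its own, outside egrep.awk. The sketch below is illustrative only:
it assumes a POSIX awk on PATH, and the shell function name full_line_match
is hypothetical, not part of this commit. It applies the same RSTART and
RLENGTH check as the new matching rule and prints only the lines that the
pattern matches in their entirety:

```shell
# Hypothetical helper (not part of egrep.awk): read lines on standard
# input and print those that the regexp in $1 matches from the first
# character through the last.
full_line_match() {
    awk -v pattern="$1" '
    {
        # match() returns the match position (0 if none) and sets
        # RSTART and RLENGTH as a side effect.
        matches = match($0, pattern)

        # Same test as the -x code in the commit: reject a match that
        # does not start at column 1 or does not span the whole line.
        if (matches && (RSTART != 1 || RLENGTH != length($0)))
            matches = 0

        if (matches)
            print
    }'
}

# Only the first input line is matched in full by "abc".
printf 'abc\nabcd\nxabc\n' | full_line_match 'abc'
```

Because the pattern is passed with -v, awk processes backslash escapes in it
before it is used as a regexp, so a pattern containing backslashes may need
them doubled in this sketch.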
hooks/post-receive
--
gawk