[SCM] gawk branch, gawk-5.1-stable, updated. gawk-4.1.0-4108-g374a2c1
From: Arnold Robbins
Subject: [SCM] gawk branch, gawk-5.1-stable, updated. gawk-4.1.0-4108-g374a2c1
Date: Mon, 31 Aug 2020 04:39:07 -0400 (EDT)
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gawk".
The branch, gawk-5.1-stable has been updated
via 374a2c195c5f2b656e503c3cc297dbcd50067f33 (commit)
from 16f6f902300a60ea9cbb6bf0b328033d11cabf8b (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
http://git.sv.gnu.org/cgit/gawk.git/commit/?id=374a2c195c5f2b656e503c3cc297dbcd50067f33
commit 374a2c195c5f2b656e503c3cc297dbcd50067f33
Author: Arnold D. Robbins <arnold@skeeve.com>
Date: Mon Aug 31 11:38:49 2020 +0300
Update uniq.awk to current POSIX.
diff --git a/awklib/eg/prog/uniq.awk b/awklib/eg/prog/uniq.awk
index 7dd1609..57c98f2 100644
--- a/awklib/eg/prog/uniq.awk
+++ b/awklib/eg/prog/uniq.awk
@@ -4,23 +4,38 @@
#
# Arnold Robbins, arnold@skeeve.com, Public Domain
# May 1993
+# Updated August 2020 to current POSIX
function usage()
{
- print("Usage: uniq [-udc [-n]] [+n] [ in [ out ]]") > "/dev/stderr"
+ print("Usage: uniq [-udc [-f fields] [-s chars]] [ in [ out ]]") > "/dev/stderr"
exit 1
}
# -c count lines. overrides -d and -u
# -d only repeated lines
# -u only nonrepeated lines
-# -n skip n fields
-# +n skip n characters, skip fields first
+# -f n skip n fields
+# -s n skip n characters, skip fields first
+# As of 2020, '+' can be used as option character in addition to '-'
+# Previously allowed use of -N to skip fields and +N to skip
+# characters is no longer allowed, and not supported by this version.
BEGIN {
+ # Convert + to - so getopt can handle things
+ for (i = 1; i < ARGC; i++) {
+ first = substr(ARGV[i], 1, 1)
+ if (ARGV[i] == "--" || (first != "-" && first != "+"))
+ break
+ else if (first == "+")
+ # Replace "+" with "-"
+ ARGV[i] = "-" substr(ARGV[i], 2)
+ }
+}
+BEGIN {
count = 1
outputfile = "/dev/stdout"
- opts = "udc0:1:2:3:4:5:6:7:8:9:"
+ opts = "udcf:s:"
while ((c = getopt(ARGC, ARGV, opts)) != -1) {
if (c == "u")
non_repeated_only++
@@ -28,24 +43,14 @@ BEGIN {
repeated_only++
else if (c == "c")
do_count++
- else if (index("0123456789", c) != 0) {
- # getopt() requires args to options
- # this messes us up for things like -5
- if (Optarg ~ /^[[:digit:]]+$/)
- fcount = (c Optarg) + 0
- else {
- fcount = c + 0
- Optind--
- }
- } else
+ else if (c == "f")
+ fcount = Optarg + 0
+ else if (c == "s")
+ charcount = Optarg + 0
+ else
usage()
}
- if (ARGV[Optind] ~ /^\+[[:digit:]]+$/) {
- charcount = substr(ARGV[Optind], 2) + 0
- Optind++
- }
-
for (i = 1; i < Optind; i++)
ARGV[i] = ""
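The first new BEGIN rule above only rewrites a leading '+' to '-' so that getopt() sees, for example, '+s2' as '-s2'. The shell sketch below runs that same loop under a plain POSIX awk and prints the resulting argument list. The helper name convert_plus is hypothetical, and the sketch assumes an awk that places the words following the program text into ARGV unchanged. As written, the loop also stops at the first word that begins with neither '-' nor '+', so an option argument given as a separate word (as in '-f 1') ends the scan and any later '+' option is left as it is.

```shell
#!/bin/sh
# Sketch only: run the '+'-to-'-' loop from the first BEGIN rule of
# uniq.awk over the given arguments and print the rewritten list.
# convert_plus is a hypothetical helper, not part of uniq.awk.
convert_plus() {
    awk 'BEGIN {
        for (i = 1; i < ARGC; i++) {
            first = substr(ARGV[i], 1, 1)
            # Stop at "--" or at the first word that is not an option
            if (ARGV[i] == "--" || (first != "-" && first != "+"))
                break
            else if (first == "+")
                ARGV[i] = "-" substr(ARGV[i], 2)   # "+x" becomes "-x"
        }
        # Join ARGV[1..ARGC-1] with single spaces and print it
        out = ""
        for (i = 1; i < ARGC; i++)
            out = out (i > 1 ? " " : "") ARGV[i]
        print out
        exit
    }' "$@"
}

convert_plus +c -f1 +s2 input.txt
convert_plus -f 1 +s2 input.txt
```

Called as `convert_plus +c -f1 +s2 input.txt`, the list becomes `-c -f1 -s2 input.txt`; called as `convert_plus -f 1 +s2 input.txt`, the scan stops at `1` and `+s2` is passed through unchanged.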
diff --git a/doc/ChangeLog b/doc/ChangeLog
index dd40093..c70ca85 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2020-08-31 Arnold D. Robbins <arnold@skeeve.com>
+
+ * gawktexi.in (Uniq Program): Updated uniq.awk to follow 2020 POSIX.
+
2020-08-26 Arnold D. Robbins <arnold@skeeve.com>
* gawktexi.in: Fix some small mistakes / typos.
diff --git a/doc/gawk.info b/doc/gawk.info
index ad52db0..9d5d4ef 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -18399,7 +18399,7 @@ by default removes duplicate lines. In other words, it only prints
unique lines--hence the name. 'uniq' has a number of options. The
usage is as follows:
- 'uniq' ['-udc' ['-N']] ['+N'] [INPUTFILE [OUTPUTFILE]]
+ 'uniq' ['-udc' ['-f N'] ['-s N']] [INPUTFILE [OUTPUTFILE]]
The options for 'uniq' are:
@@ -18413,14 +18413,14 @@ usage is as follows:
Count lines. This option overrides '-d' and '-u'. Both repeated
and nonrepeated lines are counted.
-'-N'
+'-f N'
Skip N fields before comparing lines. The definition of fields is
similar to 'awk''s default: nonwhitespace characters separated by
runs of spaces and/or TABs.
-'+N'
+'-s N'
Skip N characters before comparing lines. Any fields specified
- with '-N' are skipped first.
+ with '-f' are skipped first.
'INPUTFILE'
Data is read from the input file named on the command line, instead
@@ -18437,21 +18437,7 @@ provided.
and the 'join()' library function (*note Join Function::).
The program begins with a 'usage()' function and then a brief outline
-of the options and their meanings in comments. The 'BEGIN' rule deals
-with the command-line arguments and options. It uses a trick to get
-'getopt()' to handle options of the form '-25', treating such an option
-as the option letter '2' with an argument of '5'. If indeed two or more
-digits are supplied ('Optarg' looks like a number), 'Optarg' is
-concatenated with the option digit and then the result is added to zero
-to make it into a number. If there is only one digit in the option,
-then 'Optarg' is not needed. In this case, 'Optind' must be decremented
-so that 'getopt()' processes it next time. This code is admittedly a
-bit tricky.
-
- If no options are supplied, then the default is taken, to print both
-repeated and nonrepeated lines. The output file, if provided, is
-assigned to 'outputfile'. Early on, 'outputfile' is initialized to the
-standard output, '/dev/stdout':
+of the options and their meanings in comments:
# uniq.awk --- do uniq in awk
#
@@ -18459,20 +18445,47 @@ standard output, '/dev/stdout':
function usage()
{
- print("Usage: uniq [-udc [-n]] [+n] [ in [ out ]]") > "/dev/stderr"
+ print("Usage: uniq [-udc [-f fields] [-s chars]] [ in [ out ]]") > "/dev/stderr"
exit 1
}
# -c count lines. overrides -d and -u
# -d only repeated lines
# -u only nonrepeated lines
- # -n skip n fields
- # +n skip n characters, skip fields first
+ # -f n skip n fields
+ # -s n skip n characters, skip fields first
+
+ The POSIX standard for 'uniq' allows options to start with '+' as
+well as with '-'. An initial 'BEGIN' rule traverses the arguments
+changing any leading '+' to '-' so that the 'getopt()' function can
+parse the options:
+
+ # As of 2020, '+' can be used as option character in addition to '-'
+ # Previously allowed use of -N to skip fields and +N to skip
+ # characters is no longer allowed, and not supported by this version.
+
+ BEGIN {
+ # Convert + to - so getopt can handle things
+ for (i = 1; i < ARGC; i++) {
+ first = substr(ARGV[i], 1, 1)
+ if (ARGV[i] == "--" || (first != "-" && first != "+"))
+ break
+ else if (first == "+")
+ # Replace "+" with "-"
+ ARGV[i] = "-" substr(ARGV[i], 2)
+ }
+ }
+
+ The next 'BEGIN' rule deals with the command-line arguments and
+options. If no options are supplied, then the default is taken, to
+print both repeated and nonrepeated lines. The output file, if
+provided, is assigned to 'outputfile'. Early on, 'outputfile' is
+initialized to the standard output, '/dev/stdout':
BEGIN {
count = 1
outputfile = "/dev/stdout"
- opts = "udc0:1:2:3:4:5:6:7:8:9:"
+ opts = "udcf:s:"
while ((c = getopt(ARGC, ARGV, opts)) != -1) {
if (c == "u")
non_repeated_only++
@@ -18480,24 +18493,14 @@ standard output, '/dev/stdout':
repeated_only++
else if (c == "c")
do_count++
- else if (index("0123456789", c) != 0) {
- # getopt() requires args to options
- # this messes us up for things like -5
- if (Optarg ~ /^[[:digit:]]+$/)
- fcount = (c Optarg) + 0
- else {
- fcount = c + 0
- Optind--
- }
- } else
+ else if (c == "f")
+ fcount = Optarg + 0
+ else if (c == "s")
+ charcount = Optarg + 0
+ else
usage()
}
- if (ARGV[Optind] ~ /^\+[[:digit:]]+$/) {
- charcount = substr(ARGV[Optind], 2) + 0
- Optind++
- }
-
for (i = 1; i < Optind; i++)
ARGV[i] = ""
@@ -18610,6 +18613,20 @@ line of input data:
convention of naming global variables with a leading capital letter.
Doing that would make the program a little easier to follow.
+ The logic for choosing which lines to print represents a "state
+machine", which is "a device which can be in one of a set number of
+stable conditions depending on its previous condition and on the present
+values of its inputs."(1) Brian Kernighan suggests that "an alternative
+approach to state machines is to just read the input into an array, then
+use indexing. It's almost always easier code, and for most inputs where
+you would use this, just as fast." Consider how to rewrite the logic to
+follow this suggestion.
+
+ ---------- Footnotes ----------
+
+ (1) This definition is from
+<https://www.lexico.com/en/definition/state_machine>.
+
File: gawk.info, Node: Wc Program, Prev: Uniq Program, Up: Clones
@@ -37247,7 +37264,7 @@ Index
* uninitialized variables, as array subscripts: Uninitialized Subscripts.
(line 6)
* uniq utility: Uniq Program. (line 6)
-* uniq.awk program: Uniq Program. (line 65)
+* uniq.awk program: Uniq Program. (line 51)
* Unix, awk scripts and: Executable Scripts. (line 6)
* Unix: Glossary. (line 747)
* Unix awk, backslashes in escape sequences: Escape Sequences.
@@ -37744,267 +37761,268 @@ Node: Split Program749505
Ref: Split Program-Footnote-1752963
Node: Tee Program753092
Node: Uniq Program755882
-Node: Wc Program763503
-Ref: Wc Program-Footnote-1767758
-Node: Miscellaneous Programs767852
-Node: Dupword Program769065
-Node: Alarm Program771095
-Node: Translate Program775950
-Ref: Translate Program-Footnote-1780515
-Node: Labels Program780785
-Ref: Labels Program-Footnote-1784136
-Node: Word Sorting784220
-Node: History Sorting788292
-Node: Extract Program790517
-Node: Simple Sed798571
-Node: Igawk Program801645
-Ref: Igawk Program-Footnote-1815976
-Ref: Igawk Program-Footnote-2816178
-Ref: Igawk Program-Footnote-3816300
-Node: Anagram Program816415
-Node: Signature Program819477
-Node: Programs Summary820724
-Node: Programs Exercises821938
-Ref: Programs Exercises-Footnote-1826067
-Node: Advanced Features826158
-Node: Nondecimal Data828148
-Node: Array Sorting829739
-Node: Controlling Array Traversal830439
-Ref: Controlling Array Traversal-Footnote-1838807
-Node: Array Sorting Functions838925
-Ref: Array Sorting Functions-Footnote-1844016
-Node: Two-way I/O844212
-Ref: Two-way I/O-Footnote-1851933
-Ref: Two-way I/O-Footnote-2852120
-Node: TCP/IP Networking852202
-Node: Profiling855320
-Node: Advanced Features Summary864634
-Node: Internationalization866478
-Node: I18N and L10N867958
-Node: Explaining gettext868645
-Ref: Explaining gettext-Footnote-1874537
-Ref: Explaining gettext-Footnote-2874722
-Node: Programmer i18n874887
-Ref: Programmer i18n-Footnote-1879836
-Node: Translator i18n879885
-Node: String Extraction880679
-Ref: String Extraction-Footnote-1881811
-Node: Printf Ordering881897
-Ref: Printf Ordering-Footnote-1884683
-Node: I18N Portability884747
-Ref: I18N Portability-Footnote-1887203
-Node: I18N Example887266
-Ref: I18N Example-Footnote-1890541
-Ref: I18N Example-Footnote-2890614
-Node: Gawk I18N890723
-Node: I18N Summary891372
-Node: Debugger892713
-Node: Debugging893713
-Node: Debugging Concepts894154
-Node: Debugging Terms895963
-Node: Awk Debugging898538
-Ref: Awk Debugging-Footnote-1899483
-Node: Sample Debugging Session899615
-Node: Debugger Invocation900149
-Node: Finding The Bug901535
-Node: List of Debugger Commands908009
-Node: Breakpoint Control909342
-Node: Debugger Execution Control913036
-Node: Viewing And Changing Data916398
-Node: Execution Stack919939
-Node: Debugger Info921576
-Node: Miscellaneous Debugger Commands925647
-Node: Readline Support930709
-Node: Limitations931605
-Node: Debugging Summary934159
-Node: Namespaces935438
-Node: Global Namespace936549
-Node: Qualified Names937947
-Node: Default Namespace938946
-Node: Changing The Namespace939687
-Node: Naming Rules941301
-Node: Internal Name Management943149
-Node: Namespace Example944191
-Node: Namespace And Features946753
-Node: Namespace Summary948188
-Node: Arbitrary Precision Arithmetic949665
-Node: Computer Arithmetic951152
-Ref: table-numeric-ranges954918
-Ref: table-floating-point-ranges955411
-Ref: Computer Arithmetic-Footnote-1956069
-Node: Math Definitions956126
-Ref: table-ieee-formats959442
-Ref: Math Definitions-Footnote-1960045
-Node: MPFR features960150
-Node: FP Math Caution961868
-Ref: FP Math Caution-Footnote-1962940
-Node: Inexactness of computations963309
-Node: Inexact representation964269
-Node: Comparing FP Values965629
-Node: Errors accumulate966870
-Node: Getting Accuracy968303
-Node: Try To Round971013
-Node: Setting precision971912
-Ref: table-predefined-precision-strings972609
-Node: Setting the rounding mode974439
-Ref: table-gawk-rounding-modes974813
-Ref: Setting the rounding mode-Footnote-1978744
-Node: Arbitrary Precision Integers978923
-Ref: Arbitrary Precision Integers-Footnote-1982098
-Node: Checking for MPFR982247
-Node: POSIX Floating Point Problems983721
-Ref: POSIX Floating Point Problems-Footnote-1988006
-Node: Floating point summary988044
-Node: Dynamic Extensions990234
-Node: Extension Intro991787
-Node: Plugin License993053
-Node: Extension Mechanism Outline993850
-Ref: figure-load-extension994289
-Ref: figure-register-new-function995854
-Ref: figure-call-new-function996946
-Node: Extension API Description999008
-Node: Extension API Functions Introduction1000721
-Ref: table-api-std-headers1002557
-Node: General Data Types1006806
-Ref: General Data Types-Footnote-11015436
-Node: Memory Allocation Functions1015735
-Ref: Memory Allocation Functions-Footnote-11020236
-Node: Constructor Functions1020335
-Node: API Ownership of MPFR and GMP Values1023801
-Node: Registration Functions1025114
-Node: Extension Functions1025814
-Node: Exit Callback Functions1031136
-Node: Extension Version String1032386
-Node: Input Parsers1033049
-Node: Output Wrappers1045770
-Node: Two-way processors1050282
-Node: Printing Messages1052547
-Ref: Printing Messages-Footnote-11053718
-Node: Updating ERRNO1053871
-Node: Requesting Values1054610
-Ref: table-value-types-returned1055347
-Node: Accessing Parameters1056283
-Node: Symbol Table Access1057520
-Node: Symbol table by name1058032
-Ref: Symbol table by name-Footnote-11061056
-Node: Symbol table by cookie1061184
-Ref: Symbol table by cookie-Footnote-11065369
-Node: Cached values1065433
-Ref: Cached values-Footnote-11068969
-Node: Array Manipulation1069122
-Ref: Array Manipulation-Footnote-11070213
-Node: Array Data Types1070250
-Ref: Array Data Types-Footnote-11072908
-Node: Array Functions1073000
-Node: Flattening Arrays1077498
-Node: Creating Arrays1084474
-Node: Redirection API1089241
-Node: Extension API Variables1092074
-Node: Extension Versioning1092785
-Ref: gawk-api-version1093214
-Node: Extension GMP/MPFR Versioning1094945
-Node: Extension API Informational Variables1096573
-Node: Extension API Boilerplate1097646
-Node: Changes from API V11101620
-Node: Finding Extensions1103192
-Node: Extension Example1103751
-Node: Internal File Description1104549
-Node: Internal File Ops1108629
-Ref: Internal File Ops-Footnote-11119979
-Node: Using Internal File Ops1120119
-Ref: Using Internal File Ops-Footnote-11122502
-Node: Extension Samples1122776
-Node: Extension Sample File Functions1124305
-Node: Extension Sample Fnmatch1131954
-Node: Extension Sample Fork1133441
-Node: Extension Sample Inplace1134659
-Node: Extension Sample Ord1138284
-Node: Extension Sample Readdir1139120
-Ref: table-readdir-file-types1140009
-Node: Extension Sample Revout1141076
-Node: Extension Sample Rev2way1141665
-Node: Extension Sample Read write array1142405
-Node: Extension Sample Readfile1144347
-Node: Extension Sample Time1145442
-Node: Extension Sample API Tests1147194
-Node: gawkextlib1147686
-Node: Extension summary1150604
-Node: Extension Exercises1154306
-Node: Language History1155548
-Node: V7/SVR3.11157204
-Node: SVR41159356
-Node: POSIX1160790
-Node: BTL1162171
-Node: POSIX/GNU1162900
-Node: Feature History1168678
-Node: Common Extensions1184997
-Node: Ranges and Locales1186280
-Ref: Ranges and Locales-Footnote-11190896
-Ref: Ranges and Locales-Footnote-21190923
-Ref: Ranges and Locales-Footnote-31191158
-Node: Contributors1191381
-Node: History summary1197378
-Node: Installation1198758
-Node: Gawk Distribution1199702
-Node: Getting1200186
-Node: Extracting1201149
-Node: Distribution contents1202787
-Node: Unix Installation1209267
-Node: Quick Installation1209949
-Node: Shell Startup Files1212363
-Node: Additional Configuration Options1213452
-Node: Configuration Philosophy1215767
-Node: Non-Unix Installation1218136
-Node: PC Installation1218596
-Node: PC Binary Installation1219434
-Node: PC Compiling1219869
-Node: PC Using1220986
-Node: Cygwin1224539
-Node: MSYS1225763
-Node: VMS Installation1226365
-Node: VMS Compilation1227156
-Ref: VMS Compilation-Footnote-11228385
-Node: VMS Dynamic Extensions1228443
-Node: VMS Installation Details1230128
-Node: VMS Running1232381
-Node: VMS GNV1236660
-Node: VMS Old Gawk1237395
-Node: Bugs1237866
-Node: Bug address1238529
-Node: Usenet1241511
-Node: Maintainers1242515
-Node: Other Versions1243700
-Node: Installation summary1250788
-Node: Notes1251997
-Node: Compatibility Mode1252791
-Node: Additions1253573
-Node: Accessing The Source1254498
-Node: Adding Code1255935
-Node: New Ports1262154
-Node: Derived Files1266529
-Ref: Derived Files-Footnote-11272189
-Ref: Derived Files-Footnote-21272224
-Ref: Derived Files-Footnote-31272822
-Node: Future Extensions1272936
-Node: Implementation Limitations1273594
-Node: Extension Design1274804
-Node: Old Extension Problems1275948
-Ref: Old Extension Problems-Footnote-11277466
-Node: Extension New Mechanism Goals1277523
-Ref: Extension New Mechanism Goals-Footnote-11280887
-Node: Extension Other Design Decisions1281076
-Node: Extension Future Growth1283189
-Node: Notes summary1283795
-Node: Basic Concepts1284953
-Node: Basic High Level1285634
-Ref: figure-general-flow1285916
-Ref: figure-process-flow1286601
-Ref: Basic High Level-Footnote-11289902
-Node: Basic Data Typing1290087
-Node: Glossary1293415
-Node: Copying1325300
-Node: GNU Free Documentation License1362843
-Node: Index1387963
+Ref: Uniq Program-Footnote-1764007
+Node: Wc Program764093
+Ref: Wc Program-Footnote-1768348
+Node: Miscellaneous Programs768442
+Node: Dupword Program769655
+Node: Alarm Program771685
+Node: Translate Program776540
+Ref: Translate Program-Footnote-1781105
+Node: Labels Program781375
+Ref: Labels Program-Footnote-1784726
+Node: Word Sorting784810
+Node: History Sorting788882
+Node: Extract Program791107
+Node: Simple Sed799161
+Node: Igawk Program802235
+Ref: Igawk Program-Footnote-1816566
+Ref: Igawk Program-Footnote-2816768
+Ref: Igawk Program-Footnote-3816890
+Node: Anagram Program817005
+Node: Signature Program820067
+Node: Programs Summary821314
+Node: Programs Exercises822528
+Ref: Programs Exercises-Footnote-1826657
+Node: Advanced Features826748
+Node: Nondecimal Data828738
+Node: Array Sorting830329
+Node: Controlling Array Traversal831029
+Ref: Controlling Array Traversal-Footnote-1839397
+Node: Array Sorting Functions839515
+Ref: Array Sorting Functions-Footnote-1844606
+Node: Two-way I/O844802
+Ref: Two-way I/O-Footnote-1852523
+Ref: Two-way I/O-Footnote-2852710
+Node: TCP/IP Networking852792
+Node: Profiling855910
+Node: Advanced Features Summary865224
+Node: Internationalization867068
+Node: I18N and L10N868548
+Node: Explaining gettext869235
+Ref: Explaining gettext-Footnote-1875127
+Ref: Explaining gettext-Footnote-2875312
+Node: Programmer i18n875477
+Ref: Programmer i18n-Footnote-1880426
+Node: Translator i18n880475
+Node: String Extraction881269
+Ref: String Extraction-Footnote-1882401
+Node: Printf Ordering882487
+Ref: Printf Ordering-Footnote-1885273
+Node: I18N Portability885337
+Ref: I18N Portability-Footnote-1887793
+Node: I18N Example887856
+Ref: I18N Example-Footnote-1891131
+Ref: I18N Example-Footnote-2891204
+Node: Gawk I18N891313
+Node: I18N Summary891962
+Node: Debugger893303
+Node: Debugging894303
+Node: Debugging Concepts894744
+Node: Debugging Terms896553
+Node: Awk Debugging899128
+Ref: Awk Debugging-Footnote-1900073
+Node: Sample Debugging Session900205
+Node: Debugger Invocation900739
+Node: Finding The Bug902125
+Node: List of Debugger Commands908599
+Node: Breakpoint Control909932
+Node: Debugger Execution Control913626
+Node: Viewing And Changing Data916988
+Node: Execution Stack920529
+Node: Debugger Info922166
+Node: Miscellaneous Debugger Commands926237
+Node: Readline Support931299
+Node: Limitations932195
+Node: Debugging Summary934749
+Node: Namespaces936028
+Node: Global Namespace937139
+Node: Qualified Names938537
+Node: Default Namespace939536
+Node: Changing The Namespace940277
+Node: Naming Rules941891
+Node: Internal Name Management943739
+Node: Namespace Example944781
+Node: Namespace And Features947343
+Node: Namespace Summary948778
+Node: Arbitrary Precision Arithmetic950255
+Node: Computer Arithmetic951742
+Ref: table-numeric-ranges955508
+Ref: table-floating-point-ranges956001
+Ref: Computer Arithmetic-Footnote-1956659
+Node: Math Definitions956716
+Ref: table-ieee-formats960032
+Ref: Math Definitions-Footnote-1960635
+Node: MPFR features960740
+Node: FP Math Caution962458
+Ref: FP Math Caution-Footnote-1963530
+Node: Inexactness of computations963899
+Node: Inexact representation964859
+Node: Comparing FP Values966219
+Node: Errors accumulate967460
+Node: Getting Accuracy968893
+Node: Try To Round971603
+Node: Setting precision972502
+Ref: table-predefined-precision-strings973199
+Node: Setting the rounding mode975029
+Ref: table-gawk-rounding-modes975403
+Ref: Setting the rounding mode-Footnote-1979334
+Node: Arbitrary Precision Integers979513
+Ref: Arbitrary Precision Integers-Footnote-1982688
+Node: Checking for MPFR982837
+Node: POSIX Floating Point Problems984311
+Ref: POSIX Floating Point Problems-Footnote-1988596
+Node: Floating point summary988634
+Node: Dynamic Extensions990824
+Node: Extension Intro992377
+Node: Plugin License993643
+Node: Extension Mechanism Outline994440
+Ref: figure-load-extension994879
+Ref: figure-register-new-function996444
+Ref: figure-call-new-function997536
+Node: Extension API Description999598
+Node: Extension API Functions Introduction1001311
+Ref: table-api-std-headers1003147
+Node: General Data Types1007396
+Ref: General Data Types-Footnote-11016026
+Node: Memory Allocation Functions1016325
+Ref: Memory Allocation Functions-Footnote-11020826
+Node: Constructor Functions1020925
+Node: API Ownership of MPFR and GMP Values1024391
+Node: Registration Functions1025704
+Node: Extension Functions1026404
+Node: Exit Callback Functions1031726
+Node: Extension Version String1032976
+Node: Input Parsers1033639
+Node: Output Wrappers1046360
+Node: Two-way processors1050872
+Node: Printing Messages1053137
+Ref: Printing Messages-Footnote-11054308
+Node: Updating ERRNO1054461
+Node: Requesting Values1055200
+Ref: table-value-types-returned1055937
+Node: Accessing Parameters1056873
+Node: Symbol Table Access1058110
+Node: Symbol table by name1058622
+Ref: Symbol table by name-Footnote-11061646
+Node: Symbol table by cookie1061774
+Ref: Symbol table by cookie-Footnote-11065959
+Node: Cached values1066023
+Ref: Cached values-Footnote-11069559
+Node: Array Manipulation1069712
+Ref: Array Manipulation-Footnote-11070803
+Node: Array Data Types1070840
+Ref: Array Data Types-Footnote-11073498
+Node: Array Functions1073590
+Node: Flattening Arrays1078088
+Node: Creating Arrays1085064
+Node: Redirection API1089831
+Node: Extension API Variables1092664
+Node: Extension Versioning1093375
+Ref: gawk-api-version1093804
+Node: Extension GMP/MPFR Versioning1095535
+Node: Extension API Informational Variables1097163
+Node: Extension API Boilerplate1098236
+Node: Changes from API V11102210
+Node: Finding Extensions1103782
+Node: Extension Example1104341
+Node: Internal File Description1105139
+Node: Internal File Ops1109219
+Ref: Internal File Ops-Footnote-11120569
+Node: Using Internal File Ops1120709
+Ref: Using Internal File Ops-Footnote-11123092
+Node: Extension Samples1123366
+Node: Extension Sample File Functions1124895
+Node: Extension Sample Fnmatch1132544
+Node: Extension Sample Fork1134031
+Node: Extension Sample Inplace1135249
+Node: Extension Sample Ord1138874
+Node: Extension Sample Readdir1139710
+Ref: table-readdir-file-types1140599
+Node: Extension Sample Revout1141666
+Node: Extension Sample Rev2way1142255
+Node: Extension Sample Read write array1142995
+Node: Extension Sample Readfile1144937
+Node: Extension Sample Time1146032
+Node: Extension Sample API Tests1147784
+Node: gawkextlib1148276
+Node: Extension summary1151194
+Node: Extension Exercises1154896
+Node: Language History1156138
+Node: V7/SVR3.11157794
+Node: SVR41159946
+Node: POSIX1161380
+Node: BTL1162761
+Node: POSIX/GNU1163490
+Node: Feature History1169268
+Node: Common Extensions1185587
+Node: Ranges and Locales1186870
+Ref: Ranges and Locales-Footnote-11191486
+Ref: Ranges and Locales-Footnote-21191513
+Ref: Ranges and Locales-Footnote-31191748
+Node: Contributors1191971
+Node: History summary1197968
+Node: Installation1199348
+Node: Gawk Distribution1200292
+Node: Getting1200776
+Node: Extracting1201739
+Node: Distribution contents1203377
+Node: Unix Installation1209857
+Node: Quick Installation1210539
+Node: Shell Startup Files1212953
+Node: Additional Configuration Options1214042
+Node: Configuration Philosophy1216357
+Node: Non-Unix Installation1218726
+Node: PC Installation1219186
+Node: PC Binary Installation1220024
+Node: PC Compiling1220459
+Node: PC Using1221576
+Node: Cygwin1225129
+Node: MSYS1226353
+Node: VMS Installation1226955
+Node: VMS Compilation1227746
+Ref: VMS Compilation-Footnote-11228975
+Node: VMS Dynamic Extensions1229033
+Node: VMS Installation Details1230718
+Node: VMS Running1232971
+Node: VMS GNV1237250
+Node: VMS Old Gawk1237985
+Node: Bugs1238456
+Node: Bug address1239119
+Node: Usenet1242101
+Node: Maintainers1243105
+Node: Other Versions1244290
+Node: Installation summary1251378
+Node: Notes1252587
+Node: Compatibility Mode1253381
+Node: Additions1254163
+Node: Accessing The Source1255088
+Node: Adding Code1256525
+Node: New Ports1262744
+Node: Derived Files1267119
+Ref: Derived Files-Footnote-11272779
+Ref: Derived Files-Footnote-21272814
+Ref: Derived Files-Footnote-31273412
+Node: Future Extensions1273526
+Node: Implementation Limitations1274184
+Node: Extension Design1275394
+Node: Old Extension Problems1276538
+Ref: Old Extension Problems-Footnote-11278056
+Node: Extension New Mechanism Goals1278113
+Ref: Extension New Mechanism Goals-Footnote-11281477
+Node: Extension Other Design Decisions1281666
+Node: Extension Future Growth1283779
+Node: Notes summary1284385
+Node: Basic Concepts1285543
+Node: Basic High Level1286224
+Ref: figure-general-flow1286506
+Ref: figure-process-flow1287191
+Ref: Basic High Level-Footnote-11290492
+Node: Basic Data Typing1290677
+Node: Glossary1294005
+Node: Copying1325890
+Node: GNU Free Documentation License1363433
+Node: Index1388553
End Tag Table
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 6d10e4e..60f129d 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -25998,8 +25998,6 @@ END @{
@node Uniq Program
@subsection Printing Nonduplicated Lines of Text
-@c FIXME: One day, update to current POSIX version of uniq
-
@cindex printing @subentry unduplicated lines of text
@cindex text, printing @subentry unduplicated lines of
@cindex @command{uniq} utility
@@ -26009,7 +26007,7 @@ prints unique lines---hence the name. @command{uniq} has a number of
options. The usage is as follows:
@display
-@command{uniq} [@option{-udc} [@code{-@var{n}}]] [@code{+@var{n}}] [@var{inputfile} [@var{outputfile}]]
+@command{uniq} [@option{-udc} [@code{-f @var{n}}] [@code{-s @var{n}}]] [@var{inputfile} [@var{outputfile}]]
@end display
The options for @command{uniq} are:
@@ -26025,14 +26023,14 @@ Print only nonrepeated (unique) lines.
Count lines. This option overrides @option{-d} and @option{-u}. Both repeated
and nonrepeated lines are counted.
-@item -@var{n}
+@item -f @var{n}
Skip @var{n} fields before comparing lines. The definition of fields
is similar to @command{awk}'s default: nonwhitespace characters separated
by runs of spaces and/or TABs.
-@item +@var{n}
+@item -s @var{n}
Skip @var{n} characters before comparing lines. Any fields specified with
-@samp{-@var{n}} are skipped first.
+@option{-f} are skipped first.
@item @var{inputfile}
Data is read from the input file named on the command line, instead of from
@@ -26053,22 +26051,7 @@ and the @code{join()} library function
(@pxref{Join Function}).
The program begins with a @code{usage()} function and then a brief outline of
-the options and their meanings in comments.
-The @code{BEGIN} rule deals with the command-line arguments and options. It
-uses a trick to get @code{getopt()} to handle options of the form @samp{-25},
-treating such an option as the option letter @samp{2} with an argument of
-@samp{5}. If indeed two or more digits are supplied (@code{Optarg} looks
-like a number), @code{Optarg} is
-concatenated with the option digit and then the result is added to zero to make
-it into a number. If there is only one digit in the option, then
-@code{Optarg} is not needed. In this case, @code{Optind} must be decremented so that
-@code{getopt()} processes it next time. This code is admittedly a bit
-tricky.
-
-If no options are supplied, then the default is taken, to print both
-repeated and nonrepeated lines. The output file, if provided, is assigned
-to @code{outputfile}. Early on, @code{outputfile} is initialized to the
-standard output, @file{/dev/stdout}:
+the options and their meanings in comments:
@cindex @code{uniq.awk} program
@example
@@ -26084,26 +26067,62 @@ standard output, @file{/dev/stdout}:
#
# Arnold Robbins, arnold@@skeeve.com, Public Domain
# May 1993
+# Updated August 2020 to current POSIX
@c endfile
@end ignore
@c file eg/prog/uniq.awk
function usage()
@{
- print("Usage: uniq [-udc [-n]] [+n] [ in [ out ]]") > "/dev/stderr"
+ print("Usage: uniq [-udc [-f fields] [-s chars]] [ in [ out ]]") > "/dev/stderr"
exit 1
@}
# -c count lines. overrides -d and -u
# -d only repeated lines
# -u only nonrepeated lines
-# -n skip n fields
-# +n skip n characters, skip fields first
+# -f n skip n fields
+# -s n skip n characters, skip fields first
+@c endfile
+@end example
+
+The POSIX standard for @command{uniq} allows options to start with
+@samp{+} as well as with @samp{-}. An initial @code{BEGIN} rule
+traverses the arguments changing any leading @samp{+} to @samp{-}
+so that the @code{getopt()} function can parse the options:
+
+@example
+@c file eg/prog/uniq.awk
+# As of 2020, '+' can be used as option character in addition to '-'
+# Previously allowed use of -N to skip fields and +N to skip
+# characters is no longer allowed, and not supported by this version.
+
+BEGIN @{
+ # Convert + to - so getopt can handle things
+ for (i = 1; i < ARGC; i++) @{
+ first = substr(ARGV[i], 1, 1)
+ if (ARGV[i] == "--" || (first != "-" && first != "+"))
+ break
+ else if (first == "+")
+ # Replace "+" with "-"
+ ARGV[i] = "-" substr(ARGV[i], 2)
+ @}
+@}
+@c endfile
+@end example
+
+The next @code{BEGIN} rule deals with the command-line arguments and options.
+If no options are supplied, then the default is taken, to print both
+repeated and nonrepeated lines. The output file, if provided, is assigned
+to @code{outputfile}. Early on, @code{outputfile} is initialized to the
+standard output, @file{/dev/stdout}:
+@example
+@c file eg/prog/uniq.awk
BEGIN @{
count = 1
outputfile = "/dev/stdout"
- opts = "udc0:1:2:3:4:5:6:7:8:9:"
+ opts = "udcf:s:"
while ((c = getopt(ARGC, ARGV, opts)) != -1) @{
if (c == "u")
non_repeated_only++
@@ -26111,26 +26130,14 @@ BEGIN @{
repeated_only++
else if (c == "c")
do_count++
- else if (index("0123456789", c) != 0) @{
- # getopt() requires args to options
- # this messes us up for things like -5
- if (Optarg ~ /^[[:digit:]]+$/)
- fcount = (c Optarg) + 0
- else @{
- fcount = c + 0
- Optind--
- @}
- @} else
+ else if (c == "f")
+ fcount = Optarg + 0
+ else if (c == "s")
+ charcount = Optarg + 0
+ else
usage()
@}
-@group
- if (ARGV[Optind] ~ /^\+[[:digit:]]+$/) @{
- charcount = substr(ARGV[Optind], 2) + 0
- Optind++
- @}
-@end group
-
for (i = 1; i < Optind; i++)
ARGV[i] = ""
@@ -26260,20 +26267,15 @@ As a side note, this program does not follow our recommended convention of namin
global variables with a leading capital letter. Doing that would
make the program a little easier to follow.
-@ifset FOR_PRINT
The logic for choosing which lines to print represents a @dfn{state
-machine}, which is ``a device that can be in one of a set number of stable
-conditions depending on its previous condition and on the present values
-of its inputs.''@footnote{This is the definition returned from entering
-@code{define: state machine} into Google.}
-Brian Kernighan suggests that
-``an alternative approach to state machines is to just read
-the input into an array, then use indexing. It's almost always
-easier code, and for most inputs where you would use this, just
-as fast.'' Consider how to rewrite the logic to follow this
-suggestion.
-@end ifset
-
+machine}, which is ``a device which can be in one of a set number
+of stable conditions depending on its previous condition and on the
+present values of its inputs.''@footnote{This definition is from
+@uref{https://www.lexico.com/en/definition/state_machine}.} Brian
+Kernighan suggests that ``an alternative approach to state machines is
+to just read the input into an array, then use indexing. It's almost
+always easier code, and for most inputs where you would use this, just
+as fast.'' Consider how to rewrite the logic to follow this suggestion.
@node Wc Program
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index 2d13f51..6a9dfe0 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -25008,8 +25008,6 @@ END @{
@node Uniq Program
@subsection Printing Nonduplicated Lines of Text
-@c FIXME: One day, update to current POSIX version of uniq
-
@cindex printing @subentry unduplicated lines of text
@cindex text, printing @subentry unduplicated lines of
@cindex @command{uniq} utility
@@ -25019,7 +25017,7 @@ prints unique lines---hence the name. @command{uniq} has a number of
options. The usage is as follows:
@display
-@command{uniq} [@option{-udc} [@code{-@var{n}}]] [@code{+@var{n}}] [@var{inputfile} [@var{outputfile}]]
+@command{uniq} [@option{-udc} [@code{-f @var{n}}] [@code{-s @var{n}}]] [@var{inputfile} [@var{outputfile}]]
@end display
The options for @command{uniq} are:
@@ -25035,14 +25033,14 @@ Print only nonrepeated (unique) lines.
Count lines. This option overrides @option{-d} and @option{-u}. Both repeated
and nonrepeated lines are counted.
-@item -@var{n}
+@item -f @var{n}
Skip @var{n} fields before comparing lines. The definition of fields
is similar to @command{awk}'s default: nonwhitespace characters separated
by runs of spaces and/or TABs.
-@item +@var{n}
+@item -s @var{n}
Skip @var{n} characters before comparing lines. Any fields specified with
-@samp{-@var{n}} are skipped first.
+@option{-f} are skipped first.
@item @var{inputfile}
Data is read from the input file named on the command line, instead of from
@@ -25063,22 +25061,7 @@ and the @code{join()} library function
(@pxref{Join Function}).
The program begins with a @code{usage()} function and then a brief outline of
-the options and their meanings in comments.
-The @code{BEGIN} rule deals with the command-line arguments and options. It
-uses a trick to get @code{getopt()} to handle options of the form @samp{-25},
-treating such an option as the option letter @samp{2} with an argument of
-@samp{5}. If indeed two or more digits are supplied (@code{Optarg} looks
-like a number), @code{Optarg} is
-concatenated with the option digit and then the result is added to zero to make
-it into a number. If there is only one digit in the option, then
-@code{Optarg} is not needed. In this case, @code{Optind} must be decremented so that
-@code{getopt()} processes it next time. This code is admittedly a bit
-tricky.
-
-If no options are supplied, then the default is taken, to print both
-repeated and nonrepeated lines. The output file, if provided, is assigned
-to @code{outputfile}. Early on, @code{outputfile} is initialized to the
-standard output, @file{/dev/stdout}:
+the options and their meanings in comments:
@cindex @code{uniq.awk} program
@example
@@ -25094,26 +25077,62 @@ standard output, @file{/dev/stdout}:
#
# Arnold Robbins, arnold@@skeeve.com, Public Domain
# May 1993
+# Updated August 2020 to current POSIX
@c endfile
@end ignore
@c file eg/prog/uniq.awk
function usage()
@{
- print("Usage: uniq [-udc [-n]] [+n] [ in [ out ]]") > "/dev/stderr"
+ print("Usage: uniq [-udc [-f fields] [-s chars]] [ in [ out ]]") > "/dev/stderr"
exit 1
@}
# -c count lines. overrides -d and -u
# -d only repeated lines
# -u only nonrepeated lines
-# -n skip n fields
-# +n skip n characters, skip fields first
+# -f n skip n fields
+# -s n skip n characters, skip fields first
+@c endfile
+@end example
+
+The POSIX standard for @command{uniq} allows options to start with
+@samp{+} as well as with @samp{-}. An initial @code{BEGIN} rule
+traverses the arguments changing any leading @samp{+} to @samp{-}
+so that the @code{getopt()} function can parse the options:
+
+@example
+@c file eg/prog/uniq.awk
+# As of 2020, '+' can be used as option character in addition to '-'
+# Previously allowed use of -N to skip fields and +N to skip
+# characters is no longer allowed, and not supported by this version.
+
+BEGIN @{
+ # Convert + to - so getopt can handle things
+ for (i = 1; i < ARGC; i++) @{
+ first = substr(ARGV[i], 1, 1)
+ if (ARGV[i] == "--" || (first != "-" && first != "+"))
+ break
+ else if (first == "+")
+ # Replace "+" with "-"
+ ARGV[i] = "-" substr(ARGV[i], 2)
+ @}
+@}
+@c endfile
+@end example
+
+The next @code{BEGIN} rule deals with the command-line arguments and options.
+If no options are supplied, then the default is taken, to print both
+repeated and nonrepeated lines. The output file, if provided, is assigned
+to @code{outputfile}. Early on, @code{outputfile} is initialized to the
+standard output, @file{/dev/stdout}:
+@example
+@c file eg/prog/uniq.awk
BEGIN @{
count = 1
outputfile = "/dev/stdout"
- opts = "udc0:1:2:3:4:5:6:7:8:9:"
+ opts = "udcf:s:"
while ((c = getopt(ARGC, ARGV, opts)) != -1) @{
if (c == "u")
non_repeated_only++
@@ -25121,26 +25140,14 @@ BEGIN @{
repeated_only++
else if (c == "c")
do_count++
- else if (index("0123456789", c) != 0) @{
- # getopt() requires args to options
- # this messes us up for things like -5
- if (Optarg ~ /^[[:digit:]]+$/)
- fcount = (c Optarg) + 0
- else @{
- fcount = c + 0
- Optind--
- @}
- @} else
+ else if (c == "f")
+ fcount = Optarg + 0
+ else if (c == "s")
+ charcount = Optarg + 0
+ else
usage()
@}
-@group
- if (ARGV[Optind] ~ /^\+[[:digit:]]+$/) @{
- charcount = substr(ARGV[Optind], 2) + 0
- Optind++
- @}
-@end group
-
for (i = 1; i < Optind; i++)
ARGV[i] = ""
@@ -25270,20 +25277,15 @@ As a side note, this program does not follow our recommended convention of namin
global variables with a leading capital letter. Doing that would
make the program a little easier to follow.
-@ifset FOR_PRINT
The logic for choosing which lines to print represents a @dfn{state
-machine}, which is ``a device that can be in one of a set number of stable
-conditions depending on its previous condition and on the present values
-of its inputs.''@footnote{This is the definition returned from entering
-@code{define: state machine} into Google.}
-Brian Kernighan suggests that
-``an alternative approach to state machines is to just read
-the input into an array, then use indexing. It's almost always
-easier code, and for most inputs where you would use this, just
-as fast.'' Consider how to rewrite the logic to follow this
-suggestion.
-@end ifset
-
+machine}, which is ``a device which can be in one of a set number
+of stable conditions depending on its previous condition and on the
+present values of its inputs.''@footnote{This definition is from
+@uref{https://www.lexico.com/en/definition/state_machine}.} Brian
+Kernighan suggests that ``an alternative approach to state machines is
+to just read the input into an array, then use indexing. It's almost
+always easier code, and for most inputs where you would use this, just
+as fast.'' Consider how to rewrite the logic to follow this suggestion.
@node Wc Program
-----------------------------------------------------------------------
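The first new BEGIN rule in this commit rewrites a leading + to - so that getopt() can parse the options. As a sketch for trying that rule out, the shell function below wraps a verbatim copy of the loop in a POSIX awk program that only prints the rewritten ARGV. The function name plus_to_minus and the printing code are hypothetical additions, not part of uniq.awk.

```shell
# plus_to_minus (hypothetical name): print the arguments as the first
# BEGIN rule of uniq.awk would leave them in ARGV.
plus_to_minus() {
    # The leading -- ends awk's own options, so words such as -f after
    # the program text reach the program as ARGV[1], ARGV[2], ...
    awk -- '
    BEGIN {
        # Loop copied from uniq.awk: convert + to - so getopt can handle things
        for (i = 1; i < ARGC; i++) {
            first = substr(ARGV[i], 1, 1)
            if (ARGV[i] == "--" || (first != "-" && first != "+"))
                break
            else if (first == "+")
                # Replace "+" with "-"
                ARGV[i] = "-" substr(ARGV[i], 2)
        }
        # Added for the sketch: print ARGV[1..ARGC-1] separated by spaces
        out = ""
        for (i = 1; i < ARGC; i++)
            out = out (i > 1 ? " " : "") ARGV[i]
        print out
    }' "$@"
}
```

For example, "plus_to_minus +c +f1 file" prints "-c -f1 file", while "plus_to_minus +f 1 +s 2" prints "-f 1 +s 2": the scan ends at the first word that begins with neither - nor +, and a separately given option argument such as the 1 in "+f 1" is such a word, so the later +s is left unconverted. Since the loop is copied unchanged, this appears to hold for uniq.awk as committed here as well; giving option arguments attached (+f1 +s2) avoids it.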
Summary of changes:
awklib/eg/prog/uniq.awk | 43 ++--
doc/ChangeLog | 4 +
doc/gawk.info | 618 +++++++++++++++++++++++++-----------------------
doc/gawk.texi | 114 ++++-----
doc/gawktexi.in | 114 ++++-----
5 files changed, 462 insertions(+), 431 deletions(-)
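The -f and -s options documented in this change select which part of each line is compared, with fields skipped before characters. As an illustration only, the shell function below is a minimal POSIX awk sketch of that rule. It is not the comparison code of uniq.awk (which this message does not include), and the name uniq_key_sketch and its two positional arguments are hypothetical. Fields are removed with the POSIX field pattern, blanks followed by non-blanks, spelled [ \t] here.

```shell
# uniq_key_sketch FIELDS CHARS (hypothetical name and arguments):
# print a line only when its comparison key differs from the previous one.
uniq_key_sketch() {
    awk -v fcount="$1" -v charcount="$2" '
    # Comparison key: remove fcount fields, then charcount characters.
    # A field is a run of blanks followed by a run of non-blanks (POSIX).
    function key(line,    i) {
        for (i = 0; i < fcount; i++)
            sub(/^[ \t]*[^ \t]*/, "", line)
        return substr(line, charcount + 1)
    }
    {
        k = key($0)
        if (NR == 1 || k != lastkey)   # first line, or a new key
            print
        lastkey = k
    }'
}
```

For example, with input lines "a 1x", "b 2x" and "c 3y", "uniq_key_sketch 1 2" prints "a 1x" and "c 3y". Removing one field from "a 1x" leaves " 1x" with its leading blank, so skipping two more characters leaves "x", the same key as for "b 2x".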
hooks/post-receive
--
gawk
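The new manual text ends by asking how to rewrite the line-selection logic along the lines Brian Kernighan suggests: read the input into an array, then use indexing. One possible sketch, in POSIX awk wrapped in a shell function, stores every line in the main rule and finds each run of equal adjacent lines by index in the END rule. The name uniq_array_sketch and its single MODE argument (empty for the default, or d, u, or c) are hypothetical; field and character skipping are left out, and the whole input is held in memory.

```shell
# uniq_array_sketch MODE (hypothetical name and argument):
# MODE is "" (print one copy of each line), d, u, or c.
uniq_array_sketch() {
    awk -v mode="$1" '
    # Main rule: remember every line; appending "" forces string comparison
    { line[NR] = $0 "" }

    END {
        i = 1
        while (i <= NR) {
            # Extend j to the last line of the run that starts at i
            j = i
            while (j < NR && line[j + 1] == line[i])
                j++
            n = j - i + 1          # number of copies in this run
            if (mode == "c")
                printf("%d %s\n", n, line[i])
            else if (mode == "" || (mode == "d" && n > 1) || (mode == "u" && n == 1))
                print line[i]
            i = j + 1              # continue with the next run
        }
    }'
}
```

For input lines a, a, b, c, c, c, mode d prints a and c, mode u prints b, and mode c prints "2 a", "1 b" and "3 c", using the "%d %s" count format that POSIX specifies for uniq -c.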