From: Arnold Robbins
Subject: [gawk-diffs] [SCM] gawk branch, feature/namespaces, updated. gawk-4.1.0-2608-gd4ec803
Date: Fri, 30 Jun 2017 05:47:44 -0400 (EDT)
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gawk".
The branch, feature/namespaces has been updated
via d4ec80371e61d6404b02541503b642ddb93c45cb (commit)
from 12c79b4b14717c2046d9c5863382b829f31324aa (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
http://git.sv.gnu.org/cgit/gawk.git/commit/?id=d4ec80371e61d6404b02541503b642ddb93c45cb
commit d4ec80371e61d6404b02541503b642ddb93c45cb
Author: Arnold D. Robbins <address@hidden>
Date: Fri Jun 30 12:47:26 2017 +0300
Move namespace chapter to later in the manual.
diff --git a/doc/ChangeLog b/doc/ChangeLog
index f7e839f..f7dbb8f 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2017-06-30 Arnold D. Robbins <address@hidden>
+
+ * gawktexi.in (Namespaces): Move to later in the book.
+
2017-06-23 Arnold D. Robbins <address@hidden>
* gawktexi.in (Namespaces): More minor doc edits.
diff --git a/doc/gawk.info b/doc/gawk.info
index 3ba9d06..c5aee74 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -83,12 +83,12 @@ in (a) below. A copy of the license is included in the section entitled
* Library Functions:: A Library of 'awk' Functions.
* Sample Programs:: Many 'awk' programs with complete
explanations.
-* Namespaces:: How namespaces work in 'gawk'.
* Advanced Features:: Stuff for advanced users, specific to
'gawk'.
* Internationalization:: Getting 'gawk' to speak your
language.
* Debugger:: The 'gawk' debugger.
+* Namespaces:: How namespaces work in 'gawk'.
* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with
'gawk'.
* Dynamic Extensions:: Adding new built-in functions to
@@ -480,12 +480,6 @@ in (a) below. A copy of the license is included in the section entitled
time on their hands.
* Programs Summary:: Summary of programs.
* Programs Exercises:: Exercises.
-* Global Namespace:: The global namespace in standard 'awk'.
-* Qualified Names:: How to qualify names with a namespace.
-* Default Namespace:: The default namespace.
-* Changing The Namespace:: How to change the namespace.
-* Namespace Example:: An example of code using a namespace.
-* Namespace Misc:: Namespace notes for developers.
* Nondecimal Data:: Allowing nondecimal input data.
* Array Sorting:: Facilities for controlling array
traversal and sorting arrays.
@@ -529,6 +523,12 @@ in (a) below. A copy of the license is included in the section entitled
* Readline Support:: Readline support.
* Limitations:: Limitations and future plans.
* Debugging Summary:: Debugging summary.
+* Global Namespace:: The global namespace in standard 'awk'.
+* Qualified Names:: How to qualify names with a namespace.
+* Default Namespace:: The default namespace.
+* Changing The Namespace:: How to change the namespace.
+* Namespace Example:: An example of code using a namespace.
+* Namespace Misc:: Namespace notes for developers.
* Computer Arithmetic:: A quick intro to computer math.
* Math Definitions:: Defining terms used.
* MPFR features:: The MPFR features in 'gawk'.
@@ -17037,7 +17037,7 @@ File: gawk.info, Node: Library Exercises, Prev: Library Functions Summary, Up
an intervening value in 'ARGV' is a variable assignment.
-File: gawk.info, Node: Sample Programs, Next: Namespaces, Prev: Library Functions, Up: Top
+File: gawk.info, Node: Sample Programs, Next: Advanced Features, Prev: Library Functions, Up: Top
11 Practical 'awk' Programs
***************************
@@ -19740,312 +19740,9 @@ File: gawk.info, Node: Programs Exercises, Prev: Programs Summary, Up: Sample
machine' into Google.
-File: gawk.info, Node: Namespaces, Next: Advanced Features, Prev: Sample Programs, Up: Top
-
-12 Namespaces in 'gawk'
-***********************
-
-This major node describes a feature that is specific to 'gawk'.
-
-* Menu:
-
-* Global Namespace:: The global namespace in standard 'awk'.
-* Qualified Names:: How to qualify names with a namespace.
-* Default Namespace:: The default namespace.
-* Changing The Namespace:: How to change the namespace.
-* Internal Name Management:: How names are stored internally.
-* Namespace Example:: An example of code using a namespace.
-* Namespace Misc:: Namespace notes for developers.
-
-
-File: gawk.info, Node: Global Namespace, Next: Qualified Names, Up: Namespaces
-
-12.1 Standard 'awk''s Single Namespace
-======================================
-
-In standard 'awk', there is a single, global, "namespace". This means
-that _all_ function names and global variable names must be unique. For
-example, two different 'awk' source files cannot both define a function
-named 'min()', or define an array named 'data'.
-
- This situation is okay when programs are small, say a few hundred
-lines, or even a few thousand, but it prevents the development of
-reusable libraries of 'awk' functions, and can inadvertently cause
-independently-developed library files to accidentally step on each
-other's "private" global variables (*note Library Names::).
-
- Most other programming languages solve this issue by providing some
-kind of namespace control: a way to say "this function is in namespace
-XXX, and that function is in namespace YYY." (Of course, there is then
-still a single namespace for the namespaces, but the hope is that there
-are much fewer namespaces in use by any given program, and thus much
-less chance for collisions.) These facilities are sometimes referred to
-as "packages" or "modules".
-
- Starting with version *FIXME* 5.0, 'gawk' provides a mechanism to put
-functions and global variables into separate namespaces.
-
-
-File: gawk.info, Node: Qualified Names, Next: Default Namespace, Prev: Global Namespace, Up: Namespaces
-
-12.2 Qualified Names
-====================
-
-A "qualified name" is an identifier that includes a namespace name and
-the namespace separator, '::'. For example, one might have a function
-named 'posix::getpid()'. Here, the namespace is 'posix' and the
-function name within the namespace is 'getpid()'. The namespace and
-variable or function name are separated by a double-colon. Only one
-such separator is allowed in a qualified name.
-
- NOTE: Unlike C++, the '::' is _not_ an operator. No spaces are
- allowed between the namespace name, the '::', and the rest of the
- name.
-
- You must use fully qualified names from one namespace to access
-variables and functions in another. This is especially important when
-using variable names to index the special 'SYMTAB' array (*note
-Auto-set::), and when making indirect function calls (*note Indirect
-Calls::).
-
- It is a syntax error to use any 'gawk' reserved word (such as 'if' or
-'for'), or the name of any built-in function (such as 'sin()' or
-'gsub()') as the second part of a fully qualified name. Using such an
-identifier as a namespace name (currently) _is_ allowed, but produces a
-lint warning.
-
- 'gawk' pre-defined variable names may be used: 'NF::NR' is valid, if
-possibly not all that useful.
-
-
-File: gawk.info, Node: Default Namespace, Next: Changing The Namespace, Prev: Qualified Names, Up: Namespaces
-
-12.3 The Default Namespace
-==========================
-
-The default namespace, not surprisingly, is 'awk'. All of the
-predefined 'awk' and 'gawk' variables are in this namespace, and thus
-have qualified names like 'awk::ARGC', 'awk::NF', and so on.
-
- Furthermore, even when you have changed the namespace for your
-current source file (*note Changing The Namespace::), 'gawk' forces
-unqualified identifiers whose names are all uppercase letters to be in
-the 'awk' namespace. This makes it possible for you to easily reference
-'gawk''s global variables from different namespaces.
-
- It is a syntax error to use qualified names for function parameter
-names.
-
-
-File: gawk.info, Node: Changing The Namespace, Next: Internal Name Management, Prev: Default Namespace, Up: Namespaces
-
-12.4 Changing The Namespace
-===========================
-
-In order to set the current namespace, use an '@namespace' directive at
-the top level of your program:
-
- @namespace "passwd"
-
- BEGIN { ... }
- ...
-
- After this directive, all simple non-completely-uppercase identifiers
-are placed into the 'passwd' namespace.
-
- You can change the namespace multiple times within a single source
-file, although this is likely to become confusing if you do it too much.
-
- NOTE: Association of unqualified identifiers to a namespace is
- handled while your program is being parsed by 'gawk' and before it
- starts to run. There is no concept of a "current" namespace once
- your program starts executing. Be sure you understand this.
-
- Each source file for '-i' and '-f' starts out with an implicit
-'@namespace "awk"'. Similarly, each chunk of command-line code supplied
-with '-e' has such an implicit initial statement (*note Options::).
-
- The use of '@namespace' has no influence upon the order of execution
-of 'BEGIN', 'BEGINFILE', 'END', and 'ENDFILE' rules.
-
-
-File: gawk.info, Node: Internal Name Management, Next: Namespace Example, Prev: Changing The Namespace, Up: Namespaces
-
-12.5 Internal Name Management
-=============================
-
-For backwards compatibility, all identifiers in the 'awk' namespace are
-stored internally as unadorned identifiers. This is mainly relevant
-when using such identifiers as indices for 'SYMTAB', 'FUNCTAB', and
-'PROCINFO["identifiers"]' (*note Auto-set::), and for use in indirect
-function calls (*note Indirect Calls::).
-
- In program code, to refer to variables and functions in the 'awk'
-namespace from another namespace, you must still use the 'awk::' prefix.
-For example:
-
- @namespace "awk" This is the default namespace
-
- BEGIN {
- Title = "My Report" Fully qualified name is awk::Title
- }
-
- @namespace "report" Now in report namespace
-
- function compute() This is really report::compute()
- {
- print awk::Title But would be SYMTAB["Title"]
- ...
- }
-
-
-File: gawk.info, Node: Namespace Example, Next: Namespace Misc, Prev: Internal Name Management, Up: Namespaces
-
-12.6 Namespace Example
-======================
-
- # FIXME: fix this up for real, dates etc
- #
- # passwd.awk --- access password file information
- #
- # Arnold Robbins, address@hidden, Public Domain
- # May 1993
- # Revised October 2000
- # Revised December 2010
- #
- # Reworked for namespaces May 2017
-
- @namespace "passwd"
-
- BEGIN {
- # tailor this to suit your system
- Awklib = "/usr/local/libexec/awk/"
- }
-
- function Init( oldfs, oldrs, olddol0, pwcat, using_fw, using_fpat)
- {
- if (Inited)
- return
-
- oldfs = FS
- oldrs = RS
- olddol0 = $0
- using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
- using_fpat = (PROCINFO["FS"] == "FPAT")
- FS = ":"
- RS = "\n"
-
- pwcat = Awklib "pwcat"
- while ((pwcat | getline) > 0) {
- Byname[$1] = $0
- Byuid[$3] = $0
- Bycount[++Total] = $0
- }
- close(pwcat)
- Count = 0
- Inited = 1
- FS = oldfs
- if (using_fw)
- FIELDWIDTHS = FIELDWIDTHS
- else if (using_fpat)
- FPAT = FPAT
- RS = oldrs
- $0 = olddol0
- }
-
- function Getpwnam(name)
- {
- Init()
- return Byname[name]
- }
-
- function Getpwuid(uid)
- {
- Init()
- return Byuid[uid]
- }
-
- function Getpwent()
- {
- Init()
- if (Count < Total)
- return Bycount[++Count]
- return ""
- }
-
- function Endpwent()
- {
- Count = 0
- }
-
- # Compatibility:
-
- @namespace "awk"
-
- function getpwnam(name)
- {
- return passwd::Getpwnam(name)
- }
-
- function getpwuid(uid)
- {
- return passwd::Getpwuid(uid)
- }
-
- function getpwent()
- {
- return passwd::Getpwent()
- }
-
- function endpwent()
- {
- passwd::Endpwent()
- }
-
-
-File: gawk.info, Node: Namespace Misc, Prev: Namespace Example, Up: Namespaces
-
-12.7 Miscellaneous Notes
-========================
-
-Other notes for reviewers:
-
-Profiler:
- When profiling, we can add an 'Op_Namespace' to the start of each
- rule and function definition. If this is different than the
- previous one, output an '@namespace' statement. For each
- identifier, if it starts with the current namespace, output only
- the simple part. For all 'awk::XXX' if 'XXX' is all uppercase,
- strip off the 'awk::' part.
-
-Debugger:
- Simply print fully qualified names all the time. Maybe allow a
- 'namespace XXX' command in the debugger to set the namespace and it
- will use that to create fully qualified names? Have to be careful
- about all uppercase names though.
-
-How does this affect '@include'?
- Basically '@include' should push and pop the namespace. Each
- '@include' saves the current namespace and starts over with
- namespace 'awk' until an '@namespace' is seen.
-
-Extension functions
- Revise the current macros to pass '"awk"' as the namespace argument
- and add new macros with '_ns' or some such in the name that pass
- the namespace of the extension. This preserves backwards
- compatibility at the source level while providing access to
- namespaces as needed.
-
- Actually, since we've decided that 'awk' namespace variables and
- function are stored unadorned, the current macros that pass '""'
- would continue to work. Internally, we need to recognize '"awk"'
- and _not_ fully qualify the name before storing it in the symbol
- table.
-
-
-File: gawk.info, Node: Advanced Features, Next: Internationalization, Prev: Namespaces, Up: Top
+File: gawk.info, Node: Advanced Features, Next: Internationalization, Prev: Sample Programs, Up: Top
-13 Advanced Features of 'gawk'
+12 Advanced Features of 'gawk'
******************************
Write documentation as if whoever reads it is a violent psychopath
@@ -20091,7 +19788,7 @@ their own:
File: gawk.info, Node: Nondecimal Data, Next: Array Sorting, Up: Advanced
Features
-13.1 Allowing Nondecimal Input Data
+12.1 Allowing Nondecimal Input Data
===================================
If you run 'gawk' with the '--non-decimal-data' option, you can have
@@ -20134,7 +19831,7 @@ request it.
File: gawk.info, Node: Array Sorting, Next: Two-way I/O, Prev: Nondecimal
Data, Up: Advanced Features
-13.2 Controlling Array Traversal and Array Sorting
+12.2 Controlling Array Traversal and Array Sorting
==================================================
'gawk' lets you control the order in which a 'for (INDX in ARRAY)' loop
@@ -20153,7 +19850,7 @@ to order the elements during sorting.
File: gawk.info, Node: Controlling Array Traversal, Next: Array Sorting
Functions, Up: Array Sorting
-13.2.1 Controlling Array Traversal
+12.2.1 Controlling Array Traversal
----------------------------------
By default, the order in which a 'for (INDX in ARRAY)' loop scans an
@@ -20391,7 +20088,7 @@ character, which cannot be part of an identifier.
File: gawk.info, Node: Array Sorting Functions, Prev: Controlling Array
Traversal, Up: Array Sorting
-13.2.2 Sorting Array Values and Indices with 'gawk'
+12.2.2 Sorting Array Values and Indices with 'gawk'
---------------------------------------------------
In most 'awk' implementations, sorting an array requires writing a
@@ -20531,7 +20228,7 @@ POSIX-compatibility mode, and because 'asort()' and 'asorti()' are
File: gawk.info, Node: Two-way I/O, Next: TCP/IP Networking, Prev: Array
Sorting, Up: Advanced Features
-13.3 Two-Way Communications with Another Process
+12.3 Two-Way Communications with Another Process
================================================
It is often useful to be able to send data to a separate program for
@@ -20686,7 +20383,7 @@ in Bash.
File: gawk.info, Node: TCP/IP Networking, Next: Profiling, Prev: Two-way
I/O, Up: Advanced Features
-13.4 Using 'gawk' for Network Programming
+12.4 Using 'gawk' for Network Programming
=========================================
'EMRED':
@@ -20766,7 +20463,7 @@ complete introduction and discussion, as well as extensive examples.
File: gawk.info, Node: Profiling, Next: Advanced Features Summary, Prev:
TCP/IP Networking, Up: Advanced Features
-13.5 Profiling Your 'awk' Programs
+12.5 Profiling Your 'awk' Programs
==================================
You may produce execution traces of your 'awk' programs. This is done
@@ -21025,7 +20722,7 @@ improve this in a subsequent release.
File: gawk.info, Node: Advanced Features Summary, Prev: Profiling, Up:
Advanced Features
-13.6 Summary
+12.6 Summary
============
* The '--non-decimal-data' option causes 'gawk' to treat octal- and
@@ -21066,7 +20763,7 @@ File: gawk.info, Node: Advanced Features Summary, Prev: Profiling, Up: Advanc
File: gawk.info, Node: Internationalization, Next: Debugger, Prev: Advanced
Features, Up: Top
-14 Internationalization with 'gawk'
+13 Internationalization with 'gawk'
***********************************
Once upon a time, computer makers wrote software that worked only in
@@ -21098,7 +20795,7 @@ requirement.
File: gawk.info, Node: I18N and L10N, Next: Explaining gettext, Up:
Internationalization
-14.1 Internationalization and Localization
+13.1 Internationalization and Localization
==========================================
"Internationalization" means writing (or modifying) a program once, in
@@ -21113,7 +20810,7 @@ read.
File: gawk.info, Node: Explaining gettext, Next: Programmer i18n, Prev:
I18N and L10N, Up: Internationalization
-14.2 GNU 'gettext'
+13.2 GNU 'gettext'
==================
'gawk' uses GNU 'gettext' to provide its internationalization features.
@@ -21260,7 +20957,7 @@ the decimal point, while many Europeans do exactly the opposite:
File: gawk.info, Node: Programmer i18n, Next: Translator i18n, Prev:
Explaining gettext, Up: Internationalization
-14.3 Internationalizing 'awk' Programs
+13.3 Internationalizing 'awk' Programs
======================================
'gawk' provides the following variables for internationalization:
@@ -21385,7 +21082,7 @@ create and use translations from 'awk'.
File: gawk.info, Node: Translator i18n, Next: I18N Example, Prev:
Programmer i18n, Up: Internationalization
-14.4 Translating 'awk' Programs
+13.4 Translating 'awk' Programs
===============================
Once a program's translatable strings have been marked, they must be
@@ -21406,7 +21103,7 @@ for 'printf' arguments at runtime is covered.
File: gawk.info, Node: String Extraction, Next: Printf Ordering, Up:
Translator i18n
-14.4.1 Extracting Marked Strings
+13.4.1 Extracting Marked Strings
--------------------------------
Once your 'awk' program is working, and all the strings have been marked
@@ -21435,7 +21132,7 @@ for 'guide'.
File: gawk.info, Node: Printf Ordering, Next: I18N Portability, Prev:
String Extraction, Up: Translator i18n
-14.4.2 Rearranging 'printf' Arguments
+13.4.2 Rearranging 'printf' Arguments
-------------------------------------
Format strings for 'printf' and 'sprintf()' (*note Printf::) present a
@@ -21512,7 +21209,7 @@ which the program is first written.
File: gawk.info, Node: I18N Portability, Prev: Printf Ordering, Up:
Translator i18n
-14.4.3 'awk' Portability Issues
+13.4.3 'awk' Portability Issues
-------------------------------
'gawk''s internationalization features were purposely chosen to have as
@@ -21577,7 +21274,7 @@ actually almost portable, requiring very little change:
File: gawk.info, Node: I18N Example, Next: Gawk I18N, Prev: Translator
i18n, Up: Internationalization
-14.5 A Simple Internationalization Example
+13.5 A Simple Internationalization Example
==========================================
Now let's look at a step-by-step example of how to internationalize and
@@ -21670,7 +21367,7 @@ and 'bindtextdomain()' (*note I18N Portability::) are in a file named
File: gawk.info, Node: Gawk I18N, Next: I18N Summary, Prev: I18N Example,
Up: Internationalization
-14.6 'gawk' Can Speak Your Language
+13.6 'gawk' Can Speak Your Language
===================================
'gawk' itself has been internationalized using the GNU 'gettext'
@@ -21685,7 +21382,7 @@ usage messages, warnings, and fatal errors in the local language.
File: gawk.info, Node: I18N Summary, Prev: Gawk I18N, Up:
Internationalization
-14.7 Summary
+13.7 Summary
============
* Internationalization means writing a program such that it can use
@@ -21715,9 +21412,9 @@ File: gawk.info, Node: I18N Summary, Prev: Gawk I18N, Up: Internationalizatio
translations for its messages.
-File: gawk.info, Node: Debugger, Next: Arbitrary Precision Arithmetic, Prev: Internationalization, Up: Top
+File: gawk.info, Node: Debugger, Next: Namespaces, Prev: Internationalization, Up: Top
-15 Debugging 'awk' Programs
+14 Debugging 'awk' Programs
***************************
It would be nice if computer programs worked perfectly the first time
@@ -21742,7 +21439,7 @@ is easy.
File: gawk.info, Node: Debugging, Next: Sample Debugging Session, Up:
Debugger
-15.1 Introduction to the 'gawk' Debugger
+14.1 Introduction to the 'gawk' Debugger
========================================
This minor node introduces debugging in general and begins the
@@ -21757,7 +21454,7 @@ discussion of debugging in 'gawk'.
File: gawk.info, Node: Debugging Concepts, Next: Debugging Terms, Up:
Debugging
-15.1.1 Debugging in General
+14.1.1 Debugging in General
---------------------------
(If you have used debuggers in other languages, you may want to skip
@@ -21796,7 +21493,7 @@ functional program that you or someone else wrote).
File: gawk.info, Node: Debugging Terms, Next: Awk Debugging, Prev:
Debugging Concepts, Up: Debugging
-15.1.2 Debugging Concepts
+14.1.2 Debugging Concepts
-------------------------
Before diving in to the details, we need to introduce several important
@@ -21848,7 +21545,7 @@ defines terms used throughout the rest of this major node:
File: gawk.info, Node: Awk Debugging, Prev: Debugging Terms, Up: Debugging
-15.1.3 'awk' Debugging
+14.1.3 'awk' Debugging
----------------------
Debugging an 'awk' program has some specific aspects that are not shared
@@ -21870,7 +21567,7 @@ commands.
File: gawk.info, Node: Sample Debugging Session, Next: List of Debugger
Commands, Prev: Debugging, Up: Debugger
-15.2 Sample 'gawk' Debugging Session
+14.2 Sample 'gawk' Debugging Session
====================================
In order to illustrate the use of 'gawk' as a debugger, let's look at a
@@ -21886,7 +21583,7 @@ example.
File: gawk.info, Node: Debugger Invocation, Next: Finding The Bug, Up:
Sample Debugging Session
-15.2.1 How to Start the Debugger
+14.2.1 How to Start the Debugger
--------------------------------
Starting the debugger is almost exactly like running 'gawk' normally,
@@ -21918,7 +21615,7 @@ code has been executed.
File: gawk.info, Node: Finding The Bug, Prev: Debugger Invocation, Up:
Sample Debugging Session
-15.2.2 Finding the Bug
+14.2.2 Finding the Bug
----------------------
Let's say that we are having a problem using (a faulty version of)
@@ -22112,7 +21809,7 @@ and problem solved!
File: gawk.info, Node: List of Debugger Commands, Next: Readline Support,
Prev: Sample Debugging Session, Up: Debugger
-15.3 Main Debugger Commands
+14.3 Main Debugger Commands
===========================
The 'gawk' debugger command set can be divided into the following
@@ -22151,7 +21848,7 @@ just by hitting 'Enter'. This works for the commands 'list', 'next',
File: gawk.info, Node: Breakpoint Control, Next: Debugger Execution Control,
Up: List of Debugger Commands
-15.3.1 Control of Breakpoints
+14.3.1 Control of Breakpoints
-----------------------------
As we saw earlier, the first thing you probably want to do in a
@@ -22246,7 +21943,7 @@ The commands for controlling breakpoints are:
File: gawk.info, Node: Debugger Execution Control, Next: Viewing And
Changing Data, Prev: Breakpoint Control, Up: List of Debugger Commands
-15.3.2 Control of Execution
+14.3.2 Control of Execution
---------------------------
Now that your breakpoints are ready, you can start running the program
@@ -22335,7 +22032,7 @@ execution of the program than we saw in our earlier example:
File: gawk.info, Node: Viewing And Changing Data, Next: Execution Stack,
Prev: Debugger Execution Control, Up: List of Debugger Commands
-15.3.3 Viewing and Changing Data
+14.3.3 Viewing and Changing Data
--------------------------------
The commands for viewing and changing variables inside of 'gawk' are:
@@ -22422,7 +22119,7 @@ AWK STATEMENTS
File: gawk.info, Node: Execution Stack, Next: Debugger Info, Prev: Viewing
And Changing Data, Up: List of Debugger Commands
-15.3.4 Working with the Stack
+14.3.4 Working with the Stack
-----------------------------
Whenever you run a program that contains any function calls, 'gawk'
@@ -22462,7 +22159,7 @@ are:
File: gawk.info, Node: Debugger Info, Next: Miscellaneous Debugger Commands,
Prev: Execution Stack, Up: List of Debugger Commands
-15.3.5 Obtaining Information About the Program and the Debugger State
+14.3.5 Obtaining Information About the Program and the Debugger State
---------------------------------------------------------------------
Besides looking at the values of variables, there is often a need to get
@@ -22572,7 +22269,7 @@ from a file. The commands are:
File: gawk.info, Node: Miscellaneous Debugger Commands, Prev: Debugger Info,
Up: List of Debugger Commands
-15.3.6 Miscellaneous Commands
+14.3.6 Miscellaneous Commands
-----------------------------
There are a few more commands that do not fit into the previous
@@ -22691,7 +22388,7 @@ categories, as follows:
File: gawk.info, Node: Readline Support, Next: Limitations, Prev: List of
Debugger Commands, Up: Debugger
-15.4 Readline Support
+14.4 Readline Support
=====================
If 'gawk' is compiled with the GNU Readline library
@@ -22718,7 +22415,7 @@ Variable name completion
File: gawk.info, Node: Limitations, Next: Debugging Summary, Prev: Readline
Support, Up: Debugger
-15.5 Limitations
+14.5 Limitations
================
We hope you find the 'gawk' debugger useful and enjoyable to work with,
@@ -22761,7 +22458,7 @@ some limitations. A few that it's worth being aware of are:
File: gawk.info, Node: Debugging Summary, Prev: Limitations, Up: Debugger
-15.6 Summary
+14.6 Summary
============
* Programs rarely work correctly the first time. Finding bugs is
@@ -22790,7 +22487,309 @@ File: gawk.info, Node: Debugging Summary, Prev: Limitations, Up: Debugger
debugged, but occasionally it can.
-File: gawk.info, Node: Arbitrary Precision Arithmetic, Next: Dynamic Extensions, Prev: Debugger, Up: Top
+File: gawk.info, Node: Namespaces, Next: Arbitrary Precision Arithmetic, Prev: Debugger, Up: Top
+
+15 Namespaces in 'gawk'
+***********************
+
+This major node describes a feature that is specific to 'gawk'.
+
+* Menu:
+
+* Global Namespace:: The global namespace in standard 'awk'.
+* Qualified Names:: How to qualify names with a namespace.
+* Default Namespace:: The default namespace.
+* Changing The Namespace:: How to change the namespace.
+* Internal Name Management:: How names are stored internally.
+* Namespace Example:: An example of code using a namespace.
+* Namespace Misc:: Namespace notes for developers.
+
+
+File: gawk.info, Node: Global Namespace, Next: Qualified Names, Up: Namespaces
+
+15.1 Standard 'awk''s Single Namespace
+======================================
+
+In standard 'awk', there is a single, global, "namespace". This means
+that _all_ function names and global variable names must be unique. For
+example, two different 'awk' source files cannot both define a function
+named 'min()', or define an array named 'data'.
+
+ This situation is okay when programs are small, say a few hundred
+lines, or even a few thousand, but it prevents the development of
+reusable libraries of 'awk' functions, and can inadvertently cause
+independently-developed library files to accidentally step on each
+other's "private" global variables (*note Library Names::).
+
+ Most other programming languages solve this issue by providing some
+kind of namespace control: a way to say "this function is in namespace
+XXX, and that function is in namespace YYY." (Of course, there is then
+still a single namespace for the namespaces, but the hope is that there
+are much fewer namespaces in use by any given program, and thus much
+less chance for collisions.) These facilities are sometimes referred to
+as "packages" or "modules".
+
+ Starting with version *FIXME* 5.0, 'gawk' provides a mechanism to put
+functions and global variables into separate namespaces.
+
+
+File: gawk.info, Node: Qualified Names, Next: Default Namespace, Prev: Global Namespace, Up: Namespaces
+
+15.2 Qualified Names
+====================
+
+A "qualified name" is an identifier that includes a namespace name and
+the namespace separator, '::'. For example, one might have a function
+named 'posix::getpid()'. Here, the namespace is 'posix' and the
+function name within the namespace is 'getpid()'. The namespace and
+variable or function name are separated by a double-colon. Only one
+such separator is allowed in a qualified name.
+
+ NOTE: Unlike C++, the '::' is _not_ an operator. No spaces are
+ allowed between the namespace name, the '::', and the rest of the
+ name.
+
+ You must use fully qualified names from one namespace to access
+variables and functions in another. This is especially important when
+using variable names to index the special 'SYMTAB' array (*note
+Auto-set::), and when making indirect function calls (*note Indirect
+Calls::).
+
+ It is a syntax error to use any 'gawk' reserved word (such as 'if' or
+'for'), or the name of any built-in function (such as 'sin()' or
+'gsub()') as the second part of a fully qualified name. Using such an
+identifier as a namespace name (currently) _is_ allowed, but produces a
+lint warning.
+
+ 'gawk' pre-defined variable names may be used: 'NF::NR' is valid, if
+possibly not all that useful.
+
+
+File: gawk.info, Node: Default Namespace, Next: Changing The Namespace, Prev: Qualified Names, Up: Namespaces
+
+15.3 The Default Namespace
+==========================
+
+The default namespace, not surprisingly, is 'awk'. All of the
+predefined 'awk' and 'gawk' variables are in this namespace, and thus
+have qualified names like 'awk::ARGC', 'awk::NF', and so on.
+
+ Furthermore, even when you have changed the namespace for your
+current source file (*note Changing The Namespace::), 'gawk' forces
+unqualified identifiers whose names are all uppercase letters to be in
+the 'awk' namespace. This makes it possible for you to easily reference
+'gawk''s global variables from different namespaces.
+
+ It is a syntax error to use qualified names for function parameter
+names.
+
+
+File: gawk.info, Node: Changing The Namespace, Next: Internal Name Management, Prev: Default Namespace, Up: Namespaces
+
+15.4 Changing The Namespace
+===========================
+
+In order to set the current namespace, use an '@namespace' directive at
+the top level of your program:
+
+ @namespace "passwd"
+
+ BEGIN { ... }
+ ...
+
+ After this directive, all simple non-completely-uppercase identifiers
+are placed into the 'passwd' namespace.
+
+ You can change the namespace multiple times within a single source
+file, although this is likely to become confusing if you do it too much.
+
+ NOTE: Association of unqualified identifiers to a namespace is
+ handled while your program is being parsed by 'gawk' and before it
+ starts to run. There is no concept of a "current" namespace once
+ your program starts executing. Be sure you understand this.
+
+ Each source file for '-i' and '-f' starts out with an implicit
+'@namespace "awk"'. Similarly, each chunk of command-line code supplied
+with '-e' has such an implicit initial statement (*note Options::).
+
+ The use of '@namespace' has no influence upon the order of execution
+of 'BEGIN', 'BEGINFILE', 'END', and 'ENDFILE' rules.
+
+
+File: gawk.info, Node: Internal Name Management, Next: Namespace Example, Prev: Changing The Namespace, Up: Namespaces
+
+15.5 Internal Name Management
+=============================
+
+For backwards compatibility, all identifiers in the 'awk' namespace are
+stored internally as unadorned identifiers. This is mainly relevant
+when using such identifiers as indices for 'SYMTAB', 'FUNCTAB', and
+'PROCINFO["identifiers"]' (*note Auto-set::), and for use in indirect
+function calls (*note Indirect Calls::).
+
+ In program code, to refer to variables and functions in the 'awk'
+namespace from another namespace, you must still use the 'awk::' prefix.
+For example:
+
+ # This is the default namespace
+ @namespace "awk"
+
+ BEGIN {
+     Title = "My Report"         # Fully qualified name is awk::Title
+ }
+
+ # Now in report namespace
+ @namespace "report"
+
+ function compute()              # This is really report::compute()
+ {
+     print awk::Title            # But would be SYMTAB["Title"]
+     # ...
+ }
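As a hedged sketch of how the stored names show up at run time, the fragment below indexes 'SYMTAB' and makes an indirect call. It assumes the storage rules described in this chapter (unadorned names for the 'awk' namespace, fully qualified names elsewhere); the names `report`, `banner`, `Pages`, and `fn` are invented for the example.

```awk
@namespace "report"

function banner()                   # really report::banner()
{
    return "== " awk::Title " =="
}

BEGIN {
    awk::Title = "My Report"
    Pages = 12                      # really report::Pages

    print SYMTAB["Title"]           # awk names are stored unadorned
    print SYMTAB["report::Pages"]   # other namespaces keep the prefix

    fn = "report::banner"           # indirect calls use the stored name
    print @fn()
}
```

Note that `SYMTAB` itself needs no prefix here, since its name is all uppercase.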
+
+
+File: gawk.info, Node: Namespace Example, Next: Namespace Misc, Prev: Internal Name Management, Up: Namespaces
+
+15.6 Namespace Example
+======================
+
+ # FIXME: fix this up for real, dates etc
+ #
+ # passwd.awk --- access password file information
+ #
+ # Arnold Robbins, address@hidden, Public Domain
+ # May 1993
+ # Revised October 2000
+ # Revised December 2010
+ #
+ # Reworked for namespaces May 2017
+
+ @namespace "passwd"
+
+ BEGIN {
+     # tailor this to suit your system
+     Awklib = "/usr/local/libexec/awk/"
+ }
+
+ function Init(    oldfs, oldrs, olddol0, pwcat, using_fw, using_fpat)
+ {
+     if (Inited)
+         return
+
+     oldfs = FS
+     oldrs = RS
+     olddol0 = $0
+     using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
+     using_fpat = (PROCINFO["FS"] == "FPAT")
+     FS = ":"
+     RS = "\n"
+
+     pwcat = Awklib "pwcat"
+     while ((pwcat | getline) > 0) {
+         Byname[$1] = $0
+         Byuid[$3] = $0
+         Bycount[++Total] = $0
+     }
+     close(pwcat)
+     Count = 0
+     Inited = 1
+     FS = oldfs
+     if (using_fw)
+         FIELDWIDTHS = FIELDWIDTHS
+     else if (using_fpat)
+         FPAT = FPAT
+     RS = oldrs
+     $0 = olddol0
+ }
+
+ function Getpwnam(name)
+ {
+     Init()
+     return Byname[name]
+ }
+
+ function Getpwuid(uid)
+ {
+     Init()
+     return Byuid[uid]
+ }
+
+ function Getpwent()
+ {
+     Init()
+     if (Count < Total)
+         return Bycount[++Count]
+     return ""
+ }
+
+ function Endpwent()
+ {
+     Count = 0
+ }
+
+ # Compatibility:
+
+ @namespace "awk"
+
+ function getpwnam(name)
+ {
+     return passwd::Getpwnam(name)
+ }
+
+ function getpwuid(uid)
+ {
+     return passwd::Getpwuid(uid)
+ }
+
+ function getpwent()
+ {
+     return passwd::Getpwent()
+ }
+
+ function endpwent()
+ {
+     passwd::Endpwent()
+ }
+
+
+File: gawk.info, Node: Namespace Misc, Prev: Namespace Example, Up: Namespaces
+
+15.7 Miscellaneous Notes
+========================
+
+Other notes for reviewers:
+
+Profiler:
+ When profiling, we include the namespace in the 'Op_Rule' and
+ 'Op_Func' instructions. If the namespace is different from the
+ previous one, output an '@namespace' statement. For each
+ identifier, if it starts with the current namespace, output only
+ the simple part.
+
+Debugger:
+ Simply print fully qualified names all the time. Maybe allow a
+ 'namespace XXX' command in the debugger to set the namespace and it
+ will use that to create fully qualified names? Have to be careful
+ about all uppercase names though.
+
+How does this affect '@include'?
+ Basically '@include' should push and pop the namespace. Each
+ '@include' saves the current namespace and starts over with
+ namespace 'awk' until an '@namespace' is seen.
+
+Extension functions
+ Revise the current macros to pass '"awk"' as the namespace argument
+ and add new macros with '_ns' or some such in the name that pass
+ the namespace of the extension. This preserves backwards
+ compatibility at the source level while providing access to
+ namespaces as needed.
+
+ Actually, since we've decided that 'awk' namespace variables and
+functions are stored unadorned, the current macros that pass '""'
+ would continue to work. Internally, we need to recognize '"awk"'
+ and _not_ fully qualify the name before storing it in the symbol
+ table.
+
+
+File: gawk.info, Node: Arbitrary Precision Arithmetic, Next: Dynamic Extensions, Prev: Namespaces, Up: Top
16 Arithmetic and Arbitrary-Precision Arithmetic with 'gawk'
************************************************************
@@ -36379,269 +36378,269 @@ Node: Walking Arrays700426
Node: Library Functions Summary703434
Node: Library Exercises704840
Node: Sample Programs705305
-Node: Running Examples706068
-Node: Clones706796
-Node: Cut Program708020
-Node: Egrep Program717949
-Ref: Egrep Program-Footnote-1725461
-Node: Id Program725571
-Node: Split Program729251
-Ref: Split Program-Footnote-1732710
-Node: Tee Program732839
-Node: Uniq Program735629
-Node: Wc Program743055
-Ref: Wc Program-Footnote-1747310
-Node: Miscellaneous Programs747404
-Node: Dupword Program748617
-Node: Alarm Program750647
-Node: Translate Program755502
-Ref: Translate Program-Footnote-1760067
-Node: Labels Program760337
-Ref: Labels Program-Footnote-1763688
-Node: Word Sorting763772
-Node: History Sorting767844
-Node: Extract Program769679
-Node: Simple Sed777208
-Node: Igawk Program780282
-Ref: Igawk Program-Footnote-1794613
-Ref: Igawk Program-Footnote-2794815
-Ref: Igawk Program-Footnote-3794937
-Node: Anagram Program795052
-Node: Signature Program798114
-Node: Programs Summary799361
-Node: Programs Exercises800575
-Ref: Programs Exercises-Footnote-1804704
-Node: Namespaces804795
-Node: Global Namespace805475
-Node: Qualified Names806822
-Node: Default Namespace808202
-Node: Changing The Namespace808979
-Node: Internal Name Management810191
-Node: Namespace Example811212
-Node: Namespace Misc813281
-Node: Advanced Features814923
-Node: Nondecimal Data816908
-Node: Array Sorting818499
-Node: Controlling Array Traversal819199
-Ref: Controlling Array Traversal-Footnote-1827566
-Node: Array Sorting Functions827684
-Ref: Array Sorting Functions-Footnote-1832775
-Node: Two-way I/O832971
-Ref: Two-way I/O-Footnote-1839522
-Ref: Two-way I/O-Footnote-2839709
-Node: TCP/IP Networking839791
-Node: Profiling842909
-Ref: Profiling-Footnote-1851581
-Node: Advanced Features Summary851904
-Node: Internationalization853748
-Node: I18N and L10N855228
-Node: Explaining gettext855915
-Ref: Explaining gettext-Footnote-1861807
-Ref: Explaining gettext-Footnote-2861992
-Node: Programmer i18n862157
-Ref: Programmer i18n-Footnote-1867106
-Node: Translator i18n867155
-Node: String Extraction867949
-Ref: String Extraction-Footnote-1869081
-Node: Printf Ordering869167
-Ref: Printf Ordering-Footnote-1871953
-Node: I18N Portability872017
-Ref: I18N Portability-Footnote-1874473
-Node: I18N Example874536
-Ref: I18N Example-Footnote-1877342
-Node: Gawk I18N877415
-Node: I18N Summary878060
-Node: Debugger879401
-Node: Debugging880423
-Node: Debugging Concepts880864
-Node: Debugging Terms882673
-Node: Awk Debugging885248
-Node: Sample Debugging Session886154
-Node: Debugger Invocation886688
-Node: Finding The Bug888074
-Node: List of Debugger Commands894552
-Node: Breakpoint Control895885
-Node: Debugger Execution Control899579
-Node: Viewing And Changing Data902941
-Node: Execution Stack906315
-Node: Debugger Info907952
-Node: Miscellaneous Debugger Commands912023
-Node: Readline Support917111
-Node: Limitations918007
-Node: Debugging Summary920116
-Node: Arbitrary Precision Arithmetic921395
-Node: Computer Arithmetic922880
-Ref: table-numeric-ranges926471
-Ref: Computer Arithmetic-Footnote-1927193
-Node: Math Definitions927250
-Ref: table-ieee-formats930564
-Ref: Math Definitions-Footnote-1931167
-Node: MPFR features931272
-Node: FP Math Caution932989
-Ref: FP Math Caution-Footnote-1934061
-Node: Inexactness of computations934430
-Node: Inexact representation935390
-Node: Comparing FP Values936750
-Node: Errors accumulate937832
-Node: Getting Accuracy939265
-Node: Try To Round941975
-Node: Setting precision942874
-Ref: table-predefined-precision-strings943571
-Node: Setting the rounding mode945401
-Ref: table-gawk-rounding-modes945775
-Ref: Setting the rounding mode-Footnote-1949183
-Node: Arbitrary Precision Integers949362
-Ref: Arbitrary Precision Integers-Footnote-1954267
-Node: Checking for MPFR954416
-Node: POSIX Floating Point Problems955713
-Ref: POSIX Floating Point Problems-Footnote-1959584
-Node: Floating point summary959622
-Node: Dynamic Extensions961812
-Node: Extension Intro963365
-Node: Plugin License964631
-Node: Extension Mechanism Outline965428
-Ref: figure-load-extension965867
-Ref: figure-register-new-function967432
-Ref: figure-call-new-function968524
-Node: Extension API Description970586
-Node: Extension API Functions Introduction972228
-Node: General Data Types977562
-Ref: General Data Types-Footnote-1984767
-Node: Memory Allocation Functions985066
-Ref: Memory Allocation Functions-Footnote-1988218
-Node: Constructor Functions988317
-Node: Registration Functions991316
-Node: Extension Functions992001
-Node: Exit Callback Functions997214
-Node: Extension Version String998464
-Node: Input Parsers999127
-Node: Output Wrappers1011834
-Node: Two-way processors1016346
-Node: Printing Messages1018611
-Ref: Printing Messages-Footnote-11019782
-Node: Updating ERRNO1019935
-Node: Requesting Values1020674
-Ref: table-value-types-returned1021411
-Node: Accessing Parameters1022347
-Node: Symbol Table Access1023582
-Node: Symbol table by name1024094
-Node: Symbol table by cookie1025883
-Ref: Symbol table by cookie-Footnote-11030068
-Node: Cached values1030132
-Ref: Cached values-Footnote-11033668
-Node: Array Manipulation1033759
-Ref: Array Manipulation-Footnote-11034850
-Node: Array Data Types1034887
-Ref: Array Data Types-Footnote-11037545
-Node: Array Functions1037637
-Node: Flattening Arrays1042036
-Node: Creating Arrays1048977
-Node: Redirection API1053746
-Node: Extension API Variables1056588
-Node: Extension Versioning1057221
-Ref: gawk-api-version1057658
-Node: Extension API Informational Variables1059386
-Node: Extension API Boilerplate1060450
-Node: Changes from API V11064312
-Node: Finding Extensions1064972
-Node: Extension Example1065531
-Node: Internal File Description1066329
-Node: Internal File Ops1070409
-Ref: Internal File Ops-Footnote-11081809
-Node: Using Internal File Ops1081949
-Ref: Using Internal File Ops-Footnote-11084332
-Node: Extension Samples1084606
-Node: Extension Sample File Functions1086135
-Node: Extension Sample Fnmatch1093784
-Node: Extension Sample Fork1095271
-Node: Extension Sample Inplace1096489
-Node: Extension Sample Ord1099706
-Node: Extension Sample Readdir1100542
-Ref: table-readdir-file-types1101431
-Node: Extension Sample Revout1102236
-Node: Extension Sample Rev2way1102825
-Node: Extension Sample Read write array1103565
-Node: Extension Sample Readfile1105507
-Node: Extension Sample Time1106602
-Node: Extension Sample API Tests1107950
-Node: gawkextlib1108442
-Node: Extension summary1110889
-Node: Extension Exercises1114591
-Node: Language History1116089
-Node: V7/SVR3.11117745
-Node: SVR41119897
-Node: POSIX1121331
-Node: BTL1122710
-Node: POSIX/GNU1123439
-Node: Feature History1129331
-Node: Common Extensions1143755
-Node: Ranges and Locales1145038
-Ref: Ranges and Locales-Footnote-11149654
-Ref: Ranges and Locales-Footnote-21149681
-Ref: Ranges and Locales-Footnote-31149916
-Node: Contributors1150137
-Node: History summary1155697
-Node: Installation1157077
-Node: Gawk Distribution1158021
-Node: Getting1158505
-Node: Extracting1159466
-Node: Distribution contents1161104
-Node: Unix Installation1167446
-Node: Quick Installation1168128
-Node: Shell Startup Files1170542
-Node: Additional Configuration Options1171631
-Node: Configuration Philosophy1173620
-Node: Non-Unix Installation1175989
-Node: PC Installation1176449
-Node: PC Binary Installation1177287
-Node: PC Compiling1177722
-Node: PC Using1178839
-Node: Cygwin1181884
-Node: MSYS1182654
-Node: VMS Installation1183155
-Node: VMS Compilation1183946
-Ref: VMS Compilation-Footnote-11185175
-Node: VMS Dynamic Extensions1185233
-Node: VMS Installation Details1186918
-Node: VMS Running1189171
-Node: VMS GNV1193450
-Node: VMS Old Gawk1194185
-Node: Bugs1194656
-Node: Bug address1195319
-Node: Usenet1197716
-Node: Maintainers1198493
-Node: Other Versions1199869
-Node: Installation summary1206453
-Node: Notes1207488
-Node: Compatibility Mode1208353
-Node: Additions1209135
-Node: Accessing The Source1210060
-Node: Adding Code1211495
-Node: New Ports1217713
-Node: Derived Files1222201
-Ref: Derived Files-Footnote-11227686
-Ref: Derived Files-Footnote-21227721
-Ref: Derived Files-Footnote-31228319
-Node: Future Extensions1228433
-Node: Implementation Limitations1229091
-Node: Extension Design1230274
-Node: Old Extension Problems1231428
-Ref: Old Extension Problems-Footnote-11232946
-Node: Extension New Mechanism Goals1233003
-Ref: Extension New Mechanism Goals-Footnote-11236367
-Node: Extension Other Design Decisions1236556
-Node: Extension Future Growth1238669
-Node: Old Extension Mechanism1239505
-Node: Notes summary1241268
-Node: Basic Concepts1242450
-Node: Basic High Level1243131
-Ref: figure-general-flow1243413
-Ref: figure-process-flow1244098
-Ref: Basic High Level-Footnote-11247399
-Node: Basic Data Typing1247584
-Node: Glossary1250912
-Node: Copying1282859
-Node: GNU Free Documentation License1320398
-Node: Index1345516
+Node: Running Examples706075
+Node: Clones706803
+Node: Cut Program708027
+Node: Egrep Program717956
+Ref: Egrep Program-Footnote-1725468
+Node: Id Program725578
+Node: Split Program729258
+Ref: Split Program-Footnote-1732717
+Node: Tee Program732846
+Node: Uniq Program735636
+Node: Wc Program743062
+Ref: Wc Program-Footnote-1747317
+Node: Miscellaneous Programs747411
+Node: Dupword Program748624
+Node: Alarm Program750654
+Node: Translate Program755509
+Ref: Translate Program-Footnote-1760074
+Node: Labels Program760344
+Ref: Labels Program-Footnote-1763695
+Node: Word Sorting763779
+Node: History Sorting767851
+Node: Extract Program769686
+Node: Simple Sed777215
+Node: Igawk Program780289
+Ref: Igawk Program-Footnote-1794620
+Ref: Igawk Program-Footnote-2794822
+Ref: Igawk Program-Footnote-3794944
+Node: Anagram Program795059
+Node: Signature Program798121
+Node: Programs Summary799368
+Node: Programs Exercises800582
+Ref: Programs Exercises-Footnote-1804711
+Node: Advanced Features804802
+Node: Nondecimal Data806792
+Node: Array Sorting808383
+Node: Controlling Array Traversal809083
+Ref: Controlling Array Traversal-Footnote-1817450
+Node: Array Sorting Functions817568
+Ref: Array Sorting Functions-Footnote-1822659
+Node: Two-way I/O822855
+Ref: Two-way I/O-Footnote-1829406
+Ref: Two-way I/O-Footnote-2829593
+Node: TCP/IP Networking829675
+Node: Profiling832793
+Ref: Profiling-Footnote-1841465
+Node: Advanced Features Summary841788
+Node: Internationalization843632
+Node: I18N and L10N845112
+Node: Explaining gettext845799
+Ref: Explaining gettext-Footnote-1851691
+Ref: Explaining gettext-Footnote-2851876
+Node: Programmer i18n852041
+Ref: Programmer i18n-Footnote-1856990
+Node: Translator i18n857039
+Node: String Extraction857833
+Ref: String Extraction-Footnote-1858965
+Node: Printf Ordering859051
+Ref: Printf Ordering-Footnote-1861837
+Node: I18N Portability861901
+Ref: I18N Portability-Footnote-1864357
+Node: I18N Example864420
+Ref: I18N Example-Footnote-1867226
+Node: Gawk I18N867299
+Node: I18N Summary867944
+Node: Debugger869285
+Node: Debugging870287
+Node: Debugging Concepts870728
+Node: Debugging Terms872537
+Node: Awk Debugging875112
+Node: Sample Debugging Session876018
+Node: Debugger Invocation876552
+Node: Finding The Bug877938
+Node: List of Debugger Commands884416
+Node: Breakpoint Control885749
+Node: Debugger Execution Control889443
+Node: Viewing And Changing Data892805
+Node: Execution Stack896179
+Node: Debugger Info897816
+Node: Miscellaneous Debugger Commands901887
+Node: Readline Support906975
+Node: Limitations907871
+Node: Debugging Summary909980
+Node: Namespaces911259
+Node: Global Namespace911945
+Node: Qualified Names913292
+Node: Default Namespace914672
+Node: Changing The Namespace915449
+Node: Internal Name Management916661
+Node: Namespace Example917682
+Node: Namespace Misc919751
+Node: Arbitrary Precision Arithmetic921312
+Node: Computer Arithmetic922799
+Ref: table-numeric-ranges926390
+Ref: Computer Arithmetic-Footnote-1927112
+Node: Math Definitions927169
+Ref: table-ieee-formats930483
+Ref: Math Definitions-Footnote-1931086
+Node: MPFR features931191
+Node: FP Math Caution932908
+Ref: FP Math Caution-Footnote-1933980
+Node: Inexactness of computations934349
+Node: Inexact representation935309
+Node: Comparing FP Values936669
+Node: Errors accumulate937751
+Node: Getting Accuracy939184
+Node: Try To Round941894
+Node: Setting precision942793
+Ref: table-predefined-precision-strings943490
+Node: Setting the rounding mode945320
+Ref: table-gawk-rounding-modes945694
+Ref: Setting the rounding mode-Footnote-1949102
+Node: Arbitrary Precision Integers949281
+Ref: Arbitrary Precision Integers-Footnote-1954186
+Node: Checking for MPFR954335
+Node: POSIX Floating Point Problems955632
+Ref: POSIX Floating Point Problems-Footnote-1959503
+Node: Floating point summary959541
+Node: Dynamic Extensions961731
+Node: Extension Intro963284
+Node: Plugin License964550
+Node: Extension Mechanism Outline965347
+Ref: figure-load-extension965786
+Ref: figure-register-new-function967351
+Ref: figure-call-new-function968443
+Node: Extension API Description970505
+Node: Extension API Functions Introduction972147
+Node: General Data Types977481
+Ref: General Data Types-Footnote-1984686
+Node: Memory Allocation Functions984985
+Ref: Memory Allocation Functions-Footnote-1988137
+Node: Constructor Functions988236
+Node: Registration Functions991235
+Node: Extension Functions991920
+Node: Exit Callback Functions997133
+Node: Extension Version String998383
+Node: Input Parsers999046
+Node: Output Wrappers1011753
+Node: Two-way processors1016265
+Node: Printing Messages1018530
+Ref: Printing Messages-Footnote-11019701
+Node: Updating ERRNO1019854
+Node: Requesting Values1020593
+Ref: table-value-types-returned1021330
+Node: Accessing Parameters1022266
+Node: Symbol Table Access1023501
+Node: Symbol table by name1024013
+Node: Symbol table by cookie1025802
+Ref: Symbol table by cookie-Footnote-11029987
+Node: Cached values1030051
+Ref: Cached values-Footnote-11033587
+Node: Array Manipulation1033678
+Ref: Array Manipulation-Footnote-11034769
+Node: Array Data Types1034806
+Ref: Array Data Types-Footnote-11037464
+Node: Array Functions1037556
+Node: Flattening Arrays1041955
+Node: Creating Arrays1048896
+Node: Redirection API1053665
+Node: Extension API Variables1056507
+Node: Extension Versioning1057140
+Ref: gawk-api-version1057577
+Node: Extension API Informational Variables1059305
+Node: Extension API Boilerplate1060369
+Node: Changes from API V11064231
+Node: Finding Extensions1064891
+Node: Extension Example1065450
+Node: Internal File Description1066248
+Node: Internal File Ops1070328
+Ref: Internal File Ops-Footnote-11081728
+Node: Using Internal File Ops1081868
+Ref: Using Internal File Ops-Footnote-11084251
+Node: Extension Samples1084525
+Node: Extension Sample File Functions1086054
+Node: Extension Sample Fnmatch1093703
+Node: Extension Sample Fork1095190
+Node: Extension Sample Inplace1096408
+Node: Extension Sample Ord1099625
+Node: Extension Sample Readdir1100461
+Ref: table-readdir-file-types1101350
+Node: Extension Sample Revout1102155
+Node: Extension Sample Rev2way1102744
+Node: Extension Sample Read write array1103484
+Node: Extension Sample Readfile1105426
+Node: Extension Sample Time1106521
+Node: Extension Sample API Tests1107869
+Node: gawkextlib1108361
+Node: Extension summary1110808
+Node: Extension Exercises1114510
+Node: Language History1116008
+Node: V7/SVR3.11117664
+Node: SVR41119816
+Node: POSIX1121250
+Node: BTL1122629
+Node: POSIX/GNU1123358
+Node: Feature History1129250
+Node: Common Extensions1143674
+Node: Ranges and Locales1144957
+Ref: Ranges and Locales-Footnote-11149573
+Ref: Ranges and Locales-Footnote-21149600
+Ref: Ranges and Locales-Footnote-31149835
+Node: Contributors1150056
+Node: History summary1155616
+Node: Installation1156996
+Node: Gawk Distribution1157940
+Node: Getting1158424
+Node: Extracting1159385
+Node: Distribution contents1161023
+Node: Unix Installation1167365
+Node: Quick Installation1168047
+Node: Shell Startup Files1170461
+Node: Additional Configuration Options1171550
+Node: Configuration Philosophy1173539
+Node: Non-Unix Installation1175908
+Node: PC Installation1176368
+Node: PC Binary Installation1177206
+Node: PC Compiling1177641
+Node: PC Using1178758
+Node: Cygwin1181803
+Node: MSYS1182573
+Node: VMS Installation1183074
+Node: VMS Compilation1183865
+Ref: VMS Compilation-Footnote-11185094
+Node: VMS Dynamic Extensions1185152
+Node: VMS Installation Details1186837
+Node: VMS Running1189090
+Node: VMS GNV1193369
+Node: VMS Old Gawk1194104
+Node: Bugs1194575
+Node: Bug address1195238
+Node: Usenet1197635
+Node: Maintainers1198412
+Node: Other Versions1199788
+Node: Installation summary1206372
+Node: Notes1207407
+Node: Compatibility Mode1208272
+Node: Additions1209054
+Node: Accessing The Source1209979
+Node: Adding Code1211414
+Node: New Ports1217632
+Node: Derived Files1222120
+Ref: Derived Files-Footnote-11227605
+Ref: Derived Files-Footnote-21227640
+Ref: Derived Files-Footnote-31228238
+Node: Future Extensions1228352
+Node: Implementation Limitations1229010
+Node: Extension Design1230193
+Node: Old Extension Problems1231347
+Ref: Old Extension Problems-Footnote-11232865
+Node: Extension New Mechanism Goals1232922
+Ref: Extension New Mechanism Goals-Footnote-11236286
+Node: Extension Other Design Decisions1236475
+Node: Extension Future Growth1238588
+Node: Old Extension Mechanism1239424
+Node: Notes summary1241187
+Node: Basic Concepts1242369
+Node: Basic High Level1243050
+Ref: figure-general-flow1243332
+Ref: figure-process-flow1244017
+Ref: Basic High Level-Footnote-11247318
+Node: Basic Data Typing1247503
+Node: Glossary1250831
+Node: Copying1282778
+Node: GNU Free Documentation License1320317
+Node: Index1345435
End Tag Table
diff --git a/doc/gawk.texi b/doc/gawk.texi
index 434e7ff..60ab6e0 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -454,12 +454,12 @@ particular records in a file and perform operations upon them.
* Library Functions:: A Library of @command{awk} Functions.
* Sample Programs:: Many @command{awk} programs with complete
explanations.
-* Namespaces:: How namespaces work in @command{gawk}.
* Advanced Features:: Stuff for advanced users, specific to
@command{gawk}.
* Internationalization:: Getting @command{gawk} to speak your
language.
* Debugger:: The @command{gawk} debugger.
+* Namespaces:: How namespaces work in @command{gawk}.
* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with
@command{gawk}.
* Dynamic Extensions:: Adding new built-in functions to
@@ -852,12 +852,6 @@ particular records in a file and perform operations upon them.
time on their hands.
* Programs Summary:: Summary of programs.
* Programs Exercises:: Exercises.
-* Global Namespace:: The global namespace in standard
@command{awk}.
-* Qualified Names:: How to qualify names with a namespace.
-* Default Namespace:: The default namespace.
-* Changing The Namespace:: How to change the namespace.
-* Namespace Example:: An example of code using a namespace.
-* Namespace Misc:: Namespace notes for developers.
* Nondecimal Data:: Allowing nondecimal input data.
* Array Sorting:: Facilities for controlling array
traversal and sorting arrays.
@@ -901,6 +895,12 @@ particular records in a file and perform operations upon them.
* Readline Support:: Readline support.
* Limitations:: Limitations and future plans.
* Debugging Summary:: Debugging summary.
+* Global Namespace:: The global namespace in standard
@command{awk}.
+* Qualified Names:: How to qualify names with a namespace.
+* Default Namespace:: The default namespace.
+* Changing The Namespace:: How to change the namespace.
+* Namespace Example:: An example of code using a namespace.
+* Namespace Misc:: Namespace notes for developers.
* Computer Arithmetic:: A quick intro to computer math.
* Math Definitions:: Defining terms used.
* MPFR features:: The MPFR features in @command{gawk}.
@@ -27607,1349 +27607,1055 @@ It contains the following chapters:
@end itemize
@end ifdocbook
address@hidden name-spaces Name-space Name-spaces}
-@node Namespaces
-@chapter Namespaces in @command{gawk}
-
-This @value{CHAPTER} describes a feature that is specific to @command{gawk}.
-
address@hidden
-* Global Namespace:: The global namespace in standard @command{awk}.
-* Qualified Names:: How to qualify names with a namespace.
-* Default Namespace:: The default namespace.
-* Changing The Namespace:: How to change the namespace.
-* Internal Name Management:: How names are stored internally.
-* Namespace Example:: An example of code using a namespace.
-* Namespace Misc:: Namespace notes for developers.
address@hidden menu
-
-@node Global Namespace
-@section Standard @command{awk}'s Single Namespace
-
-In standard @command{awk}, there is a single, global, @dfn{namespace}.
-This means that @emph{all} function names and global variable names must
-be unique. For example, two different @command{awk} source files cannot
-both define a function named @code{min()}, or define an array named @code{data}.
address@hidden Advanced Features
address@hidden Advanced Features of @command{gawk}
address@hidden @command{gawk}, features, advanced
address@hidden advanced features, @command{gawk}
address@hidden
+Contributed by: Peter Langston <address@hidden>
-This situation is okay when programs are small, say a few hundred
-lines, or even a few thousand, but it prevents the development of
-reusable libraries of @command{awk} functions, and can
-cause independently-developed library files to accidentally step on each
-other's ``private'' global variables
-(@pxref{Library Names}).
+ Found in Steve English's "signature" line:
-Most other programming languages solve this issue by providing some kind
-of namespace control: a way to say ``this function is in namespace @var{xxx},
-and that function is in namespace @var{yyy}.'' (Of course, there is then
-still a single namespace for the namespaces, but the hope is that there
-are far fewer namespaces in use by any given program, and thus much
-less chance for collisions.) These facilities are sometimes referred
-to as @dfn{packages} or @dfn{modules}.
+"Write documentation as if whoever reads it is a violent psychopath
+who knows where you live."
address@hidden ignore
address@hidden Langston, Peter
address@hidden English, Steve
address@hidden
address@hidden documentation as if whoever reads it is
+a violent psychopath who knows where you live.}
address@hidden Steve English, as quoted by Peter Langston
address@hidden quotation
-Starting with @value{PVERSION} @strong{FIXME} 5.0, @command{gawk} provides a
-mechanism to put functions and global variables into separate namespaces.
+This @value{CHAPTER} discusses advanced features in @command{gawk}.
+It's a bit of a ``grab bag'' of items that are otherwise unrelated
+to each other.
+First, we look at a command-line option that allows @command{gawk} to recognize
+nondecimal numbers in input data, not just in @command{awk}
+programs.
+Then, @command{gawk}'s special features for sorting arrays are presented.
+Next, two-way I/O, discussed briefly in earlier parts of this
address@hidden, is described in full detail, along with the basics
+of TCP/IP networking. Finally, we see how @command{gawk}
+can @dfn{profile} an @command{awk} program, making it possible to tune
+it for performance.
-@node Qualified Names
-@section Qualified Names
address@hidden FULLXREF ON
+Additional advanced features are discussed in separate @value{CHAPTER}s of their
+own:
-A @dfn{qualified name} is an identifier that includes a namespace
-name and the namespace separator, @code{::}. For example, one
-might have a function named @code{posix::getpid()}. Here, the
-namespace is @code{posix} and the function name within the namespace
-is @code{getpid()}. The namespace and variable or function name are
-separated by a double-colon. Only one such separator is allowed in a
-qualified name.
address@hidden @value{BULLET}
address@hidden
address@hidden, discusses how to internationalize
+your @command{awk} programs, so that they can speak multiple
+national languages.
-@quotation NOTE
-Unlike C++, the @code{::} is @emph{not} an operator. No spaces are
-allowed between the namespace name, the @code{::}, and the rest of
-the name.
-@end quotation
address@hidden
address@hidden, describes @command{gawk}'s built-in command-line
+debugger for debugging @command{awk} programs.
-You must use fully qualified names from one namespace to access variables
-and functions in another. This is especially important when using
-variable names to index the special @code{SYMTAB} array (@pxref{Auto-set}),
-and when making indirect function calls (@pxref{Indirect Calls}).
address@hidden
address@hidden Precision Arithmetic}, describes how you can use
address@hidden to perform arbitrary-precision arithmetic.
-It is a syntax error to use any @command{gawk} reserved word (such
-as @code{if} or @code{for}), or the name of any built-in function
-(such as @code{sin()} or @code{gsub()}) as the second part of a
-fully qualified name. Using such an identifier as a namespace
-name (currently) @emph{is} allowed, but produces a lint warning.
address@hidden
address@hidden Extensions},
+discusses the ability to dynamically add new built-in functions to
address@hidden
address@hidden itemize
address@hidden FULLXREF OFF
address@hidden pre-defined variable names may be used:
address@hidden::NR} is valid, if possibly not all that useful.
address@hidden
+* Nondecimal Data:: Allowing nondecimal input data.
+* Array Sorting:: Facilities for controlling array traversal and
+ sorting arrays.
+* Two-way I/O:: Two-way communications with another process.
+* TCP/IP Networking:: Using @command{gawk} for network programming.
+* Profiling:: Profiling your @command{awk} programs.
+* Advanced Features Summary:: Summary of advanced features.
address@hidden menu
-@node Default Namespace
-@section The Default Namespace
address@hidden Nondecimal Data
address@hidden Allowing Nondecimal Input Data
address@hidden @option{--non-decimal-data} option
address@hidden advanced features, nondecimal input data
address@hidden input, address@hidden nondecimal
address@hidden constants, nondecimal
-The default namespace, not surprisingly, is @samp{awk}.
-All of the predefined @command{awk} and @command{gawk} variables
-are in this namespace, and thus have qualified names like
-@code{awk::ARGC}, @code{awk::NF}, and so on.
+If you run @command{gawk} with the @option{--non-decimal-data} option,
+you can have nondecimal values in your input data:
-Furthermore, even when you have changed the namespace for your
-current source file (@pxref{Changing The Namespace}), @command{gawk}
-forces unqualified identifiers whose names are all uppercase letters
-to be in the @samp{awk} namespace. This makes it possible for you to easily
-reference @command{gawk}'s global variables from different namespaces.
+@example
+$ @kbd{echo 0123 123 0x123 |}
+> @kbd{gawk --non-decimal-data '@{ printf "%d, %d, %d\n", $1, $2, $3 @}'}
+@print{} 83, 123, 291
+@end example
-It is a syntax error to use qualified names for function parameter names.
+For this feature to work, write your program so that
+@command{gawk} treats your data as numeric:
-@node Changing The Namespace
-@section Changing The Namespace
+@example
+$ @kbd{echo 0123 123 0x123 | gawk '@{ print $1, $2, $3 @}'}
+@print{} 0123 123 0x123
+@end example
-In order to set the current namespace, use an @samp{@@namespace} directive
-at the top level of your program:
+@noindent
+The @code{print} statement treats its expressions as strings.
+Although the fields can act as numbers when necessary,
+they are still strings, so @code{print} does not try to treat them
+numerically. You need to add zero to a field to force it to
+be treated as a number. For example:
@example
-@@namespace "passwd"
-
-BEGIN @{ @dots{} @}
address@hidden
+$ @kbd{echo 0123 123 0x123 | gawk --non-decimal-data '}
+> @kbd{@{ print $1, $2, $3}
+> @kbd{print $1 + 0, $2 + 0, $3 + 0 @}'}
+@print{} 0123 123 0x123
+@print{} 83 123 291
@end example
-After this directive, all simple non-completely-uppercase identifiers are
-placed into the @code{passwd} namespace.
+Because it is common to have decimal data with leading zeros, and because
+using this facility could lead to surprising results, the default is to leave it
+disabled. If you want it, you must explicitly request it.
-You can change the namespace multiple times within a single
-source file, although this is likely to become confusing if you
-do it too much.
+@cindex programming conventions, @code{--non-decimal-data} option
+@cindex @option{--non-decimal-data} option, @code{strtonum()} function and
+@cindex @code{strtonum()} function (@command{gawk}), @code{--non-decimal-data} option and
+@quotation CAUTION
+@emph{Use of this option is not recommended.}
+It can break old programs very badly.
+Instead, use the @code{strtonum()} function to convert your data
+(@pxref{String Functions}).
+This makes your programs easier to write and easier to read, and
+leads to less surprising results.
-@quotation NOTE
-Association of unqualified identifiers to a namespace is handled while
-your program is being parsed by @command{gawk} and before it starts
-to run. There is no concept of a ``current'' namespace once your program
-starts executing. Be sure you understand this.
+This option may disappear in a future version of @command{gawk}.
@end quotation
-Each source file for @option{-i} and @option{-f} starts out with
-an implicit @samp{@@namespace "awk"}. Similarly, each chunk of
-command-line code supplied with @option{-e} has such an implicit
-initial statement (@pxref{Options}).
+@node Array Sorting
+@section Controlling Array Traversal and Array Sorting
-The use of @samp{@@namespace} has no influence upon the order of execution
-of @code{BEGIN}, @code{BEGINFILE}, @code{END}, and @code{ENDFILE} rules.
+@command{gawk} lets you control the order in which a
+@samp{for (@var{indx} in @var{array})}
+loop traverses an array.
-@node Internal Name Management
-@section Internal Name Management
+In addition, two built-in functions, @code{asort()} and @code{asorti()},
+let you sort arrays based on the array values and indices, respectively.
+These two functions also provide control over the sorting criteria used
+to order the elements during sorting.
-For backwards compatibility, all identifiers in the @samp{awk} namespace
-are stored internally as unadorned identifiers. This is mainly relevant
-when using such identifiers as indices for @code{SYMTAB}, @code{FUNCTAB},
-and @code{PROCINFO["identifiers"]} (@pxref{Auto-set}), and for use in
-indirect function calls (@pxref{Indirect Calls}).
+@menu
+* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
+* Array Sorting Functions:: How to use @code{asort()} and @code{asorti()}.
+@end menu
-In program code, to refer to variables and functions in the @samp{awk}
-namespace from another namespace, you must still use the @samp{awk::}
-prefix. For example:
+@node Controlling Array Traversal
+@subsection Controlling Array Traversal
-@example
-@@namespace "awk" @ii{This is the default namespace}
+By default, the order in which a @samp{for (@var{indx} in @var{array})} loop
+scans an array is not defined; it is generally based upon
+the internal implementation of arrays inside @command{awk}.
-BEGIN @{
- Title = "My Report" @ii{Fully qualified name is} awk::Title
-@}
+Often, though, it is desirable to be able to loop over the elements
+in a particular order that you, the programmer, choose. @command{gawk}
+lets you do this.
-@@namespace "report" @ii{Now in} report @ii{namespace}
+@ref{Controlling Scanning} describes how you can assign special,
+predefined values to @code{PROCINFO["sorted_in"]} in order to
+control the order in which @command{gawk} traverses an array
+during a @code{for} loop.
-function compute() @ii{This is really} report::compute()
+In addition, the value of @code{PROCINFO["sorted_in"]} can be a
+function name.@footnote{This is why the predefined sorting orders
+start with an @samp{@@} character, which cannot be part of an identifier.}
+This lets you traverse an array based on any custom criterion.
+The array elements are ordered according to the return value of this
+function. The comparison function should be defined with at least
+four arguments:
+
+@example
+function comp_func(i1, v1, i2, v2)
@{
- print awk::Title @ii{But would be} SYMTAB["Title"]
- @dots{}
+ @var{compare elements 1 and 2 in some fashion}
+ @var{return < 0; 0; or > 0}
@}
@end example
-@node Namespace Example
-@section Namespace Example
+Here, @code{i1} and @code{i2} are the indices, and @code{v1} and @code{v2}
+are the corresponding values of the two elements being compared.
+Either @code{v1} or @code{v2}, or both, can be arrays if the array being
+traversed contains subarrays as values.
+(@xref{Arrays of Arrays} for more information about subarrays.)
+The three possible return values are interpreted as follows:
+
+@table @code
+@item comp_func(i1, v1, i2, v2) < 0
+Index @code{i1} comes before index @code{i2} during loop traversal.
+
+@item comp_func(i1, v1, i2, v2) == 0
+Indices @code{i1} and @code{i2}
+come together, but the relative order with respect to each other is undefined.
+
+@item comp_func(i1, v1, i2, v2) > 0
+Index @code{i1} comes after index @code{i2} during loop traversal.
+@end table
+
+Our first comparison function can be used to scan an array in
+numerical order of the indices:
@example
-# FIXME: fix this up for real, dates etc
-#
-# passwd.awk --- access password file information
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# May 1993
-# Revised October 2000
-# Revised December 2010
-#
-# Reworked for namespaces May 2017
+function cmp_num_idx(i1, v1, i2, v2)
+@{
+ # numerical index comparison, ascending order
+ return (i1 - i2)
+@}
+@end example
-@@namespace "passwd"
+Our second function traverses an array based on the string order of
+the element values rather than by indices:
-BEGIN @{
- # tailor this to suit your system
- Awklib = "/usr/local/libexec/awk/"
+@example
+function cmp_str_val(i1, v1, i2, v2)
+@{
+ # string value comparison, ascending order
+ v1 = v1 ""
+ v2 = v2 ""
+ if (v1 < v2)
+ return -1
+ return (v1 != v2)
@}
+@end example
-function Init( oldfs, oldrs, olddol0, pwcat, using_fw, using_fpat)
+The third
+comparison function makes all numbers, and numeric strings without
+any leading or trailing spaces, come out first during loop traversal:
+
+@example
+function cmp_num_str_val(i1, v1, i2, v2, n1, n2)
@{
- if (Inited)
- return
+ # numbers before string value comparison, ascending order
+ n1 = v1 + 0
+ n2 = v2 + 0
+ if (n1 == v1)
+ return (n2 == v2) ? (n1 - n2) : -1
+ else if (n2 == v2)
+ return 1
+ return (v1 < v2) ? -1 : (v1 != v2)
+@}
+@end example
- oldfs = FS
- oldrs = RS
- olddol0 = $0
- using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
- using_fpat = (PROCINFO["FS"] == "FPAT")
- FS = ":"
- RS = "\n"
+Here is a main program to demonstrate how @command{gawk}
+behaves using each of the previous functions:
- pwcat = Awklib "pwcat"
- while ((pwcat | getline) > 0) @{
- Byname[$1] = $0
- Byuid[$3] = $0
- Bycount[++Total] = $0
+@example
+BEGIN @{
+ data["one"] = 10
+ data["two"] = 20
+ data[10] = "one"
+ data[100] = 100
+ data[20] = "two"
+
+ f[1] = "cmp_num_idx"
+ f[2] = "cmp_str_val"
+ f[3] = "cmp_num_str_val"
+ for (i = 1; i <= 3; i++) @{
+ printf("Sort function: %s\n", f[i])
+ PROCINFO["sorted_in"] = f[i]
+ for (j in data)
+ printf("\tdata[%s] = %s\n", j, data[j])
+ print ""
@}
- close(pwcat)
- Count = 0
- Inited = 1
- FS = oldfs
- if (using_fw)
- FIELDWIDTHS = FIELDWIDTHS
- else if (using_fpat)
- FPAT = FPAT
- RS = oldrs
- $0 = olddol0
@}
+@end example
-function Getpwnam(name)
-@{
- Init()
- return Byname[name]
-@}
+Here are the results when the program is run:
-function Getpwuid(uid)
+@example
+$ @kbd{gawk -f compdemo.awk}
+@print{} Sort function: cmp_num_idx @ii{Sort by numeric index}
+@print{} data[two] = 20
+@print{} data[one] = 10 @ii{Both strings are numerically zero}
+@print{} data[10] = one
+@print{} data[20] = two
+@print{} data[100] = 100
+@print{}
+@print{} Sort function: cmp_str_val @ii{Sort by element values as strings}
+@print{} data[one] = 10
+@print{} data[100] = 100 @ii{String 100 is less than string 20}
+@print{} data[two] = 20
+@print{} data[10] = one
+@print{} data[20] = two
+@print{}
+@print{} Sort function: cmp_num_str_val @ii{Sort all numeric values before all strings}
+@print{} data[one] = 10
+@print{} data[two] = 20
+@print{} data[100] = 100
+@print{} data[10] = one
+@print{} data[20] = two
+@end example
+
+Consider sorting the entries of a GNU/Linux system password file
+according to login name. The following program sorts records
+by a specific field position and can be used for this purpose:
+
+@example
+# passwd-sort.awk --- simple program to sort by field position
+# field position is specified by the global variable POS
+
+function cmp_field(i1, v1, i2, v2)
@{
- Init()
- return Byuid[uid]
+ # comparison by value, as string, and ascending order
+ return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS])
@}
-function Getpwent()
@{
- Init()
- if (Count < Total)
- return Bycount[++Count]
- return ""
+ for (i = 1; i <= NF; i++)
+ a[NR][i] = $i
@}
-function Endpwent()
-@{
- Count = 0
+END @{
+ PROCINFO["sorted_in"] = "cmp_field"
+ if (POS < 1 || POS > NF)
+ POS = 1
+ for (i in a) @{
+ for (j = 1; j <= NF; j++)
+ printf("%s%c", a[i][j], j < NF ? ":" : "")
+ print ""
+ @}
@}
+@end example
-# Compatibility:
+The first field in each entry of the password file is the user's login name,
+and the fields are separated by colons.
+Each record defines a subarray,
+with each field as an element in the subarray.
+Running the program produces the
+following output:
-@@namespace "awk"
+@example
+$ @kbd{gawk -v POS=1 -F: -f sort.awk /etc/passwd}
+@print{} adm:x:3:4:adm:/var/adm:/sbin/nologin
+@print{} apache:x:48:48:Apache:/var/www:/sbin/nologin
+@print{} avahi:x:70:70:Avahi daemon:/:/sbin/nologin
+@dots{}
+@end example
-function getpwnam(name)
-@{
- return passwd::Getpwnam(name)
-@}
+The comparison should normally always return the same value when given a
+specific pair of array elements as its arguments. If inconsistent
+results are returned, then the order is undefined. This behavior can be
+exploited to introduce random order into otherwise seemingly
+ordered data:
-function getpwuid(uid)
+@example
+function cmp_randomize(i1, v1, i2, v2)
@{
- return passwd::Getpwuid(uid)
+ # random order (caution: this may never terminate!)
+ return (2 - 4 * rand())
@}
+@end example
-function getpwent()
+As already mentioned, the order of the indices is arbitrary if two
+elements compare equal. This is usually not a problem, but letting
+the tied elements come out in arbitrary order can be an issue, especially
+when comparing item values. The partial ordering of the equal elements
+may change the next time the array is traversed, if other elements are added to or
+removed from the array. One way to resolve ties when comparing elements
+with otherwise equal values is to include the indices in the comparison
+rules. Note that doing this may make the loop traversal less efficient,
+so consider it only if necessary. The following comparison functions
+force a deterministic order, and are based on the fact that the
+(string) indices of two elements are never equal:
+
+@example
+function cmp_numeric(i1, v1, i2, v2)
@{
- return passwd::Getpwent()
+ # numerical value (and index) comparison, descending order
+ return (v1 != v2) ? (v2 - v1) : (i2 - i1)
@}
-function endpwent()
+function cmp_string(i1, v1, i2, v2)
@{
- passwd::Endpwent()
+ # string value (and index) comparison, descending order
+ v1 = v1 i1
+ v2 = v2 i2
+ return (v1 > v2) ? -1 : (v1 != v2)
@}
@end example
-@node Namespace Misc
-@section Miscellaneous Notes
-
-Other notes for reviewers:
-
-@table @asis
-@item Profiler:
-When profiling, we can add an @code{Op_Namespace} to the start of each
-rule and function definition. If this is different than the previous
-one, output an @samp{@@namespace} statement. For each identifier,
-if it starts with the current namespace, output only the simple part.
-For all @samp{awk::XXX} if @samp{XXX} is all uppercase, strip off the
-@samp{awk::} part.
-
-@item Debugger:
-Simply print fully qualified names all the time. Maybe allow a
address@hidden @var{xxx}} command in the debugger to set the
-namespace and it will use that to create fully qualified names?
-Have to be careful about all uppercase names though.
-
-@item How does this affect @code{@@include}?
-Basically @code{@@include} should push and pop the namespace. Each
-@code{@@include} saves the current namespace and starts over with
-namespace @samp{awk} until an @code{@@namespace} is seen.
-
-@item Extension functions
-Revise the current macros to pass @code{"awk"} as the namespace
-argument and add new macros with @samp{_ns} or some such in the name that
-pass the namespace of the extension. This preserves backwards
-compatibility at the source level while providing access to namespaces
-as needed.
-
-Actually, since we've decided that @code{awk} namespace variables and
-function are stored unadorned, the current macros that pass @code{""}
-would continue to work. Internally, we need to recognize @code{"awk"} and
address@hidden fully qualify the name before storing it in the symbol table.
-@end table
-
-@node Advanced Features
-@chapter Advanced Features of @command{gawk}
-@cindex @command{gawk}, features, advanced
-@cindex advanced features, @command{gawk}
-@ignore
-Contributed by: Peter Langston <address@hidden>
+@c Avoid using the term ``stable'' when describing the unpredictable behavior
+@c if two items compare equal. Usually, the goal of a "stable algorithm"
+@c is to maintain the original order of the items, which is a meaningless
+@c concept for a list constructed from a hash.
- Found in Steve English's "signature" line:
+A custom comparison function can often simplify ordered loop
+traversal, and the sky is really the limit when it comes to
+designing such a function.
-"Write documentation as if whoever reads it is a violent psychopath
-who knows where you live."
-@end ignore
-@cindex Langston, Peter
-@cindex English, Steve
-@quotation
-@i{Write documentation as if whoever reads it is
-a violent psychopath who knows where you live.}
-@author Steve English, as quoted by Peter Langston
-@end quotation
+When string comparisons are made during a sort, either for element
+values where one or both aren't numbers, or for element indices
+handled as strings, the value of @code{IGNORECASE}
+(@pxref{Built-in Variables}) controls whether
+the comparisons treat corresponding upper- and lowercase letters as
+equivalent or distinct.
-This @value{CHAPTER} discusses advanced features in @command{gawk}.
-It's a bit of a ``grab bag'' of items that are otherwise unrelated
-to each other.
-First, we look at a command-line option that allows @command{gawk} to recognize
-nondecimal numbers in input data, not just in @command{awk}
-programs.
-Then, @command{gawk}'s special features for sorting arrays are presented.
-Next, two-way I/O, discussed briefly in earlier parts of this
-@value{DOCUMENT}, is described in full detail, along with the basics
-of TCP/IP networking. Finally, we see how @command{gawk}
-can @dfn{profile} an @command{awk} program, making it possible to tune
-it for performance.
+Another point to keep in mind is that in the case of subarrays,
+the element values can themselves be arrays; a production comparison
+function should use the @code{isarray()} function
+(@pxref{Type Functions})
+to check for this, and choose a defined sorting order for subarrays.
-@c FULLXREF ON
-Additional advanced features are discussed in separate @value{CHAPTER}s of their
-own:
+All sorting based on @code{PROCINFO["sorted_in"]}
+is disabled in POSIX mode,
+because the @code{PROCINFO} array is not special in that case.
-@itemize @value{BULLET}
-@item
-@ref{Internationalization}, discusses how to internationalize
-your @command{awk} programs, so that they can speak multiple
-national languages.
+As a side note, sorting the array indices before traversing
+the array has been reported to add a 15% to 20% overhead to the
+execution time of @command{awk} programs. For this reason,
+sorted array traversal is not the default.
-@item
-@ref{Debugger}, describes @command{gawk}'s built-in command-line
-debugger for debugging @command{awk} programs.
+@c The @command{gawk}
+@c maintainers believe that only the people who wish to use a
+@c feature should have to pay for it.
-@item
-@ref{Arbitrary Precision Arithmetic}, describes how you can use
-@command{gawk} to perform arbitrary-precision arithmetic.
+@node Array Sorting Functions
+@subsection Sorting Array Values and Indices with @command{gawk}
-@item
-@ref{Dynamic Extensions},
-discusses the ability to dynamically add new built-in functions to
-@command{gawk}.
-@end itemize
-@c FULLXREF OFF
+@cindex arrays, sorting
address@hidden
+@cindex @code{asort()} function (@command{gawk}), arrays@comma{} sorting
address@hidden
+@cindex @code{asorti()} function (@command{gawk}), arrays@comma{} sorting
+@cindex sort function, arrays, sorting
+In most @command{awk} implementations, sorting an array requires writing
+a @code{sort()} function. This can be educational for exploring
+different sorting algorithms, but usually that's not the point of the program.
+@command{gawk} provides the built-in @code{asort()} and @code{asorti()}
+functions (@pxref{String Functions}) for sorting arrays. For example:
-@menu
-* Nondecimal Data:: Allowing nondecimal input data.
-* Array Sorting:: Facilities for controlling array traversal and
- sorting arrays.
-* Two-way I/O:: Two-way communications with another process.
-* TCP/IP Networking:: Using @command{gawk} for network programming.
-* Profiling:: Profiling your @command{awk} programs.
-* Advanced Features Summary:: Summary of advanced features.
-@end menu
+@example
+@var{populate the array} data
+n = asort(data)
+for (i = 1; i <= n; i++)
+ @var{do something with} data[i]
+@end example
-@node Nondecimal Data
-@section Allowing Nondecimal Input Data
-@cindex @option{--non-decimal-data} option
-@cindex advanced features, nondecimal input data
-@cindex input, data@comma{} nondecimal
-@cindex constants, nondecimal
+After the call to @code{asort()}, the array @code{data} is indexed from 1
+to some number @var{n}, the total number of elements in @code{data}.
+(This count is @code{asort()}'s return value.)
+@code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on.
+The default comparison is based on the type of the elements
+(@pxref{Typing and Comparison}).
+All numeric values come before all string values,
+which in turn come before all subarrays.
-If you run @command{gawk} with the @option{--non-decimal-data} option,
-you can have nondecimal values in your input data:
+@cindex side effects, @code{asort()} function
+An important side effect of calling @code{asort()} is that
+@emph{the array's original indices are irrevocably lost}.
+As this isn't always desirable, @code{asort()} accepts a
+second argument:
@example
-$ @kbd{echo 0123 123 0x123 |}
-> @kbd{gawk --non-decimal-data '@{ printf "%d, %d, %d\n", $1, $2, $3 @}'}
-@print{} 83, 123, 291
+@var{populate the array} source
+n = asort(source, dest)
+for (i = 1; i <= n; i++)
+ @var{do something with} dest[i]
@end example
-For this feature to work, write your program so that
-@command{gawk} treats your data as numeric:
+In this case, @command{gawk} copies the @code{source} array into the
+@code{dest} array and then sorts @code{dest}, destroying its indices.
+However, the @code{source} array is not affected.
+
+Often, what's needed is to sort on the values of the @emph{indices}
+instead of the values of the elements. To do that, use the
+@code{asorti()} function. The interface and behavior are identical to
+that of @code{asort()}, except that the index values are used for sorting
+and become the values of the result array:
@example
-$ @kbd{echo 0123 123 0x123 | gawk '@{ print $1, $2, $3 @}'}
-@print{} 0123 123 0x123
+@{ source[$0] = some_func($0) @}
+
+END @{
+ n = asorti(source, dest)
+ for (i = 1; i <= n; i++) @{
+ @ii{Work with sorted indices directly:}
+ @var{do something with} dest[i]
+ @dots{}
+ @ii{Access original array via sorted indices:}
+ @var{do something with} source[dest[i]]
+ @}
+@}
@end example
-@noindent
-The @code{print} statement treats its expressions as strings.
-Although the fields can act as numbers when necessary,
-they are still strings, so @code{print} does not try to treat them
-numerically. You need to add zero to a field to force it to
-be treated as a number. For example:
+So far, so good. Now it starts to get interesting. Both @code{asort()}
+and @code{asorti()} accept a third string argument to control comparison
+of array elements. When we introduced @code{asort()} and @code{asorti()}
+in @ref{String Functions}, we ignored this third argument; however,
+now is the time to describe how this argument affects these two functions.
-@example
-$ @kbd{echo 0123 123 0x123 | gawk --non-decimal-data '}
-> @kbd{@{ print $1, $2, $3}
-> @kbd{print $1 + 0, $2 + 0, $3 + 0 @}'}
-@print{} 0123 123 0x123
-@print{} 83 123 291
-@end example
+Basically, the third argument specifies how the array is to be sorted.
+There are two possibilities. As with @code{PROCINFO["sorted_in"]},
+this argument may be one of the predefined names that @command{gawk}
+provides (@pxref{Controlling Scanning}), or it may be the name of a
+user-defined function (@pxref{Controlling Array Traversal}).
-Because it is common to have decimal data with leading zeros, and because
-using this facility could lead to surprising results, the default is to leave it
-disabled. If you want it, you must explicitly request it.
+In the latter case, @emph{the function can compare elements in any way
+it chooses}, taking into account just the indices, just the values,
+or both. This is extremely powerful.
-@cindex programming conventions, @code{--non-decimal-data} option
-@cindex @option{--non-decimal-data} option, @code{strtonum()} function and
-@cindex @code{strtonum()} function (@command{gawk}), @code{--non-decimal-data} option and
-@quotation CAUTION
-@emph{Use of this option is not recommended.}
-It can break old programs very badly.
-Instead, use the @code{strtonum()} function to convert your data
-(@pxref{String Functions}).
-This makes your programs easier to write and easier to read, and
-leads to less surprising results.
+Once the array is sorted, @code{asort()} takes the @emph{values} in
+their final order and uses them to fill in the result array, whereas
+@code{asorti()} takes the @emph{indices} in their final order and uses
+them to fill in the result array.
-This option may disappear in a future version of @command{gawk}.
+@cindex reference counting, sorting arrays
+@quotation NOTE
+Copying array indices and elements isn't expensive in terms of memory.
+Internally, @command{gawk} maintains @dfn{reference counts} to data.
+For example, when @code{asort()} copies the first array to the second one,
+there is only one copy of the original array elements' data, even though
+both arrays use the values.
@end quotation
-@node Array Sorting
-@section Controlling Array Traversal and Array Sorting
-
-@command{gawk} lets you control the order in which a
-@samp{for (@var{indx} in @var{array})}
-loop traverses an array.
+@c Document It And Call It A Feature. Sigh.
+@cindex @command{gawk}, @code{IGNORECASE} variable in
+@cindex arrays, sorting, and @code{IGNORECASE} variable
+@cindex @code{IGNORECASE} variable, and array sorting functions
+Because @code{IGNORECASE} affects string comparisons, the value
+of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}.
+Note also that the locale's sorting order does @emph{not}
+come into play; comparisons are based on character values only.@footnote{This
+is true because locale-based comparison occurs only when in
+POSIX-compatibility mode, and because @code{asort()} and @code{asorti()} are
+@command{gawk} extensions, they are not available in that case.}
-In addition, two built-in functions, @code{asort()} and @code{asorti()},
-let you sort arrays based on the array values and indices, respectively.
-These two functions also provide control over the sorting criteria used
-to order the elements during sorting.
-
-@menu
-* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
-* Array Sorting Functions:: How to use @code{asort()} and @code{asorti()}.
-@end menu
-
-@node Controlling Array Traversal
-@subsection Controlling Array Traversal
-
-By default, the order in which a @samp{for (@var{indx} in @var{array})} loop
-scans an array is not defined; it is generally based upon
-the internal implementation of arrays inside @command{awk}.
-
-Often, though, it is desirable to be able to loop over the elements
-in a particular order that you, the programmer, choose. @command{gawk}
-lets you do this.
-
-@ref{Controlling Scanning} describes how you can assign special,
-predefined values to @code{PROCINFO["sorted_in"]} in order to
-control the order in which @command{gawk} traverses an array
-during a @code{for} loop.
-
-In addition, the value of @code{PROCINFO["sorted_in"]} can be a
-function name.@footnote{This is why the predefined sorting orders
-start with an @samp{@@} character, which cannot be part of an identifier.}
-This lets you traverse an array based on any custom criterion.
-The array elements are ordered according to the return value of this
-function. The comparison function should be defined with at least
-four arguments:
+The following example demonstrates the use of a comparison function with
+@code{asort()}. The comparison function, @code{case_fold_compare()}, maps
+both values to lowercase in order to compare them ignoring case.
@example
-function comp_func(i1, v1, i2, v2)
-@{
- @var{compare elements 1 and 2 in some fashion}
- @var{return < 0; 0; or > 0}
-@}
-@end example
-
-Here, @code{i1} and @code{i2} are the indices, and @code{v1} and @code{v2}
-are the corresponding values of the two elements being compared.
-Either @code{v1} or @code{v2}, or both, can be arrays if the array being
-traversed contains subarrays as values.
-(@xref{Arrays of Arrays} for more information about subarrays.)
-The three possible return values are interpreted as follows:
-
-@table @code
-@item comp_func(i1, v1, i2, v2) < 0
-Index @code{i1} comes before index @code{i2} during loop traversal.
-
-@item comp_func(i1, v1, i2, v2) == 0
-Indices @code{i1} and @code{i2}
-come together, but the relative order with respect to each other is undefined.
-
-@item comp_func(i1, v1, i2, v2) > 0
-Index @code{i1} comes after index @code{i2} during loop traversal.
-@end table
-
-Our first comparison function can be used to scan an array in
-numerical order of the indices:
+# case_fold_compare --- compare as strings, ignoring case
-@example
-function cmp_num_idx(i1, v1, i2, v2)
+function case_fold_compare(i1, v1, i2, v2, l, r)
@{
- # numerical index comparison, ascending order
- return (i1 - i2)
-@}
-@end example
-
-Our second function traverses an array based on the string order of
-the element values rather than by indices:
+ l = tolower(v1)
+ r = tolower(v2)
-@example
-function cmp_str_val(i1, v1, i2, v2)
-@{
- # string value comparison, ascending order
- v1 = v1 ""
- v2 = v2 ""
- if (v1 < v2)
+ if (l < r)
return -1
- return (v1 != v2)
+ else if (l == r)
+ return 0
+ else
+ return 1
@}
@end example
-The third
-comparison function makes all numbers, and numeric strings without
-any leading or trailing spaces, come out first during loop traversal:
+And here is the test program for it:
@example
-function cmp_num_str_val(i1, v1, i2, v2, n1, n2)
-@{
- # numbers before string value comparison, ascending order
- n1 = v1 + 0
- n2 = v2 + 0
- if (n1 == v1)
- return (n2 == v2) ? (n1 - n2) : -1
- else if (n2 == v2)
- return 1
- return (v1 < v2) ? -1 : (v1 != v2)
-@}
-@end example
-
-Here is a main program to demonstrate how @command{gawk}
-behaves using each of the previous functions:
+# Test program
-@example
BEGIN @{
- data["one"] = 10
- data["two"] = 20
- data[10] = "one"
- data[100] = 100
- data[20] = "two"
+ Letters = "abcdefghijklmnopqrstuvwxyz" \
+ "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+ split(Letters, data, "")
- f[1] = "cmp_num_idx"
- f[2] = "cmp_str_val"
- f[3] = "cmp_num_str_val"
- for (i = 1; i <= 3; i++) @{
- printf("Sort function: %s\n", f[i])
- PROCINFO["sorted_in"] = f[i]
- for (j in data)
- printf("\tdata[%s] = %s\n", j, data[j])
- print ""
+ asort(data, result, "case_fold_compare")
+
+ j = length(result)
+ for (i = 1; i <= j; i++) @{
+ printf("%s", result[i])
+ if (i % (j/2) == 0)
+ printf("\n")
+ else
+ printf(" ")
@}
@}
@end example
-Here are the results when the program is run:
+When run, we get the following:
@example
-$ @kbd{gawk -f compdemo.awk}
-@print{} Sort function: cmp_num_idx @ii{Sort by numeric index}
-@print{} data[two] = 20
-@print{} data[one] = 10 @ii{Both strings are numerically zero}
-@print{} data[10] = one
-@print{} data[20] = two
-@print{} data[100] = 100
-@print{}
-@print{} Sort function: cmp_str_val @ii{Sort by element values as strings}
-@print{} data[one] = 10
-@print{} data[100] = 100 @ii{String 100 is less than string 20}
-@print{} data[two] = 20
-@print{} data[10] = one
-@print{} data[20] = two
-@print{}
-@print{} Sort function: cmp_num_str_val @ii{Sort all numeric values before all strings}
-@print{} data[one] = 10
-@print{} data[two] = 20
-@print{} data[100] = 100
-@print{} data[10] = one
-@print{} data[20] = two
+$ @kbd{gawk -f case_fold_compare.awk}
+@print{} A a B b c C D d e E F f g G H h i I J j k K l L M m
+@print{} n N O o p P Q q r R S s t T u U V v w W X x y Y z Z
@end example
-Consider sorting the entries of a GNU/Linux system password file
-according to login name. The following program sorts records
-by a specific field position and can be used for this purpose:
-
-@example
-# passwd-sort.awk --- simple program to sort by field position
-# field position is specified by the global variable POS
+@node Two-way I/O
+@section Two-Way Communications with Another Process
-function cmp_field(i1, v1, i2, v2)
address@hidden
- # comparison by value, as string, and ascending order
- return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS])
address@hidden
address@hidden 8/2014. Neither Mike nor BWK saw this as relevant. Commenting it out.
address@hidden
address@hidden Brennan, Michael
address@hidden programmers, attractiveness of
address@hidden
address@hidden Path: cssun.mathcs.emory.edu!gatech!newsxfer3.itd.umich.edu!news-peer.sprintlink.net!news-sea-19.sprintlink.net!news-in-west.sprintlink.net!news.sprintlink.net!Sprint!204.94.52.5!news.whidbey.com!brennan
+From: brennan@@whidbey.com (Mike Brennan)
+Newsgroups: comp.lang.awk
+Subject: Re: Learn the SECRET to Attract Women Easily
+Date: 4 Aug 1997 17:34:46 GMT
address@hidden Organization: WhidbeyNet
address@hidden Lines: 12
+Message-ID: <5s53rm$eca@@news.whidbey.com>
address@hidden References: <address@hidden>
address@hidden Reply-To: address@hidden
address@hidden NNTP-Posting-Host: asn202.whidbey.com
address@hidden X-Newsreader: slrn (0.9.4.1 UNIX)
address@hidden Xref: cssun.mathcs.emory.edu comp.lang.awk:5403
address@hidden
- for (i = 1; i <= NF; i++)
- a[NR][i] = $i
address@hidden
+On 3 Aug 1997 13:17:43 GMT, Want More Dates???
+<tracy78@@kilgrona.com> wrote:
+>Learn the SECRET to Attract Women Easily
+>
+>The SCENT(tm) Pheromone Sex Attractant For Men to Attract Women
-END @{
- PROCINFO["sorted_in"] = "cmp_field"
- if (POS < 1 || POS > NF)
- POS = 1
- for (i in a) @{
- for (j = 1; j <= NF; j++)
- printf("%s%c", a[i][j], j < NF ? ":" : "")
- print ""
- @}
address@hidden
address@hidden example
+The scent of awk programmers is a lot more attractive to women than
+the scent of perl programmers.
+--
+Mike Brennan
address@hidden brennan@@whidbey.com
address@hidden smallexample
address@hidden ignore
-The first field in each entry of the password file is the user's login name,
-and the fields are separated by colons.
-Each record defines a subarray,
-with each field as an element in the subarray.
-Running the program produces the
-following output:
address@hidden advanced features, address@hidden communicating with
address@hidden processes, two-way communications with
+It is often useful to be able to
+send data to a separate program for
+processing and then read the result. This can always be
+done with temporary files:
@example
-$ @kbd{gawk -v POS=1 -F: -f sort.awk /etc/passwd}
address@hidden adm:x:3:4:adm:/var/adm:/sbin/nologin
address@hidden apache:x:48:48:Apache:/var/www:/sbin/nologin
address@hidden avahi:x:70:70:Avahi daemon:/:/sbin/nologin
address@hidden
address@hidden example
+# Write the data for processing
+tempfile = ("mydata." PROCINFO["pid"])
+while (@var{not done with data})
+ print @var{data} | ("subprogram > " tempfile)
+close("subprogram > " tempfile)
-The comparison should normally always return the same value when given a
-specific pair of array elements as its arguments. If inconsistent
-results are returned, then the order is undefined. This behavior can be
-exploited to introduce random order into otherwise seemingly
-ordered data:
+# Read the results, remove tempfile when done
+while ((getline newdata < tempfile) > 0)
+ @var{process} newdata @var{appropriately}
+close(tempfile)
+system("rm " tempfile)
address@hidden example
address@hidden
-function cmp_randomize(i1, v1, i2, v2)
address@hidden
- # random order (caution: this may never terminate!)
- return (2 - 4 * rand())
address@hidden
address@hidden example
address@hidden
+This works, but not elegantly. Among other things, it requires that
+the program be run in a directory that cannot be shared among users;
+for example, @file{/tmp} will not do, as another user might happen
+to be using a temporary file with the same address@hidden
+Brennan suggests the use of @command{rand()} to generate unique
address@hidden This is a valid point; nevertheless, temporary files
+remain more difficult to use than two-way pipes.} @c 8/2014
-As already mentioned, the order of the indices is arbitrary if two
-elements compare equal. This is usually not a problem, but letting
-the tied elements come out in arbitrary order can be an issue, especially
-when comparing item values. The partial ordering of the equal elements
-may change the next time the array is traversed, if other elements are added to or
-removed from the array. One way to resolve ties when comparing elements
-with otherwise equal values is to include the indices in the comparison
-rules. Note that doing this may make the loop traversal less efficient,
-so consider it only if necessary. The following comparison functions
-force a deterministic order, and are based on the fact that the
-(string) indices of two elements are never equal:
address@hidden coprocesses
address@hidden input/output, two-way
address@hidden @code{|} (vertical bar), @code{|&} operator (I/O)
address@hidden vertical bar (@code{|}), @code{|&} operator (I/O)
address@hidden @command{csh} utility, @code{|&} operator, comparison with
+However, with @command{gawk}, it is possible to
+open a @emph{two-way} pipe to another process. The second process is
+termed a @dfn{coprocess}, as it runs in parallel with @command{gawk}.
+The two-way connection is created using the @samp{|&} operator
+(borrowed from the Korn shell, @command{ksh}):@footnote{This is very
+different from the same operator in the C shell and in Bash.}
@example
-function cmp_numeric(i1, v1, i2, v2)
address@hidden
- # numerical value (and index) comparison, descending order
- return (v1 != v2) ? (v2 - v1) : (i2 - i1)
address@hidden
-
-function cmp_string(i1, v1, i2, v2)
address@hidden
- # string value (and index) comparison, descending order
- v1 = v1 i1
- v2 = v2 i2
- return (v1 > v2) ? -1 : (v1 != v2)
address@hidden
+do @{
+ print @var{data} |& "subprogram"
+ "subprogram" |& getline results
address@hidden while (@var{data left to process})
+close("subprogram")
@end example
address@hidden Avoid using the term ``stable'' when describing the unpredictable behavior
address@hidden if two items compare equal. Usually, the goal of a "stable algorithm"
address@hidden is to maintain the original order of the items, which is a meaningless
address@hidden concept for a list constructed from a hash.
+The first time an I/O operation is executed using the @samp{|&}
+operator, @command{gawk} creates a two-way pipeline to a child process
+that runs the other program. Output created with @code{print}
+or @code{printf} is written to the program's standard input, and
+output from the program's standard output can be read by the @command{gawk}
+program using @code{getline}.
+As is the case with processes started by @samp{|}, the subprogram
+can be any program, or pipeline of programs, that can be started by
+the shell.
-A custom comparison function can often simplify ordered loop
-traversal, and the sky is really the limit when it comes to
-designing such a function.
+There are some cautionary items to be aware of:
-When string comparisons are made during a sort, either for element
-values where one or both aren't numbers, or for element indices
-handled as strings, the value of @code{IGNORECASE}
-(@pxref{Built-in Variables}) controls whether
-the comparisons treat corresponding upper- and lowercase letters as
-equivalent or distinct.
address@hidden @value{BULLET}
address@hidden
+As the code inside @command{gawk} currently stands, the coprocess's
+standard error goes to the same place that the parent @command{gawk}'s
+standard error goes. It is not possible to read the child's
+standard error separately.
-Another point to keep in mind is that in the case of subarrays,
-the element values can themselves be arrays; a production comparison
-function should use the @code{isarray()} function
-(@pxref{Type Functions})
-to check for this, and choose a defined sorting order for subarrays.
address@hidden deadlocks
address@hidden buffering, input/output
address@hidden @code{getline} command, deadlock and
address@hidden
+I/O buffering may be a problem. @command{gawk} automatically
+flushes all output down the pipe to the coprocess.
+However, if the coprocess does not flush its output,
address@hidden may hang when doing a @code{getline} in order to read
+the coprocess's results. This could lead to a situation
+known as @dfn{deadlock}, where each process is waiting for the
+other one to do something.
address@hidden itemize
-All sorting based on @code{PROCINFO["sorted_in"]}
-is disabled in POSIX mode,
-because the @code{PROCINFO} array is not special in that case.
address@hidden @code{close()} function, two-way pipes and
+It is possible to close just one end of the two-way pipe to
+a coprocess, by supplying a second argument to the @code{close()}
+function of either @code{"to"} or @code{"from"}
+(@pxref{Close Files And Pipes}).
+These strings tell @command{gawk} to close the end of the pipe
+that sends data to the coprocess or the end that reads from it,
+respectively.
-As a side note, sorting the array indices before traversing
-the array has been reported to add a 15% to 20% overhead to the
-execution time of @command{awk} programs. For this reason,
-sorted array traversal is not the default.
address@hidden @command{sort} utility, coprocesses and
+This is particularly necessary in order to use
+the system @command{sort} utility as part of a coprocess;
address@hidden must read @emph{all} of its input
+data before it can produce any output.
+The @command{sort} program does not receive an end-of-file indication
+until @command{gawk} closes the write end of the pipe.
address@hidden The @command{gawk}
address@hidden maintainers believe that only the people who wish to use a
address@hidden feature should have to pay for it.
+When you have finished writing data to the @command{sort}
+utility, you can close the @code{"to"} end of the pipe, and
+then start reading sorted data via @code{getline}.
+For example:
address@hidden Array Sorting Functions
address@hidden Sorting Array Values and Indices with @command{gawk}
address@hidden
+BEGIN @{
+ command = "LC_ALL=C sort"
+ n = split("abcdefghijklmnopqrstuvwxyz", a, "")
address@hidden arrays, sorting
address@hidden
address@hidden @code{asort()} function (@command{gawk}), address@hidden sorting
address@hidden
address@hidden @code{asorti()} function (@command{gawk}), address@hidden sorting
address@hidden sort function, arrays, sorting
-In most @command{awk} implementations, sorting an array requires writing
-a @code{sort()} function. This can be educational for exploring
-different sorting algorithms, but usually that's not the point of the program.
address@hidden provides the built-in @code{asort()} and @code{asorti()}
-functions (@pxref{String Functions}) for sorting arrays. For example:
+ for (i = n; i > 0; i--)
+ print a[i] |& command
+ close(command, "to")
address@hidden
address@hidden the array} data
-n = asort(data)
-for (i = 1; i <= n; i++)
- @var{do something with} data[i]
+ while ((command |& getline line) > 0)
+ print "got", line
+ close(command)
address@hidden
@end example
-After the call to @code{asort()}, the array @code{data} is indexed from 1
-to some number @var{n}, the total number of elements in @code{data}.
-(This count is @code{asort()}'s return value.)
address@hidden @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on.
-The default comparison is based on the type of the elements
-(@pxref{Typing and Comparison}).
-All numeric values come before all string values,
-which in turn come before all subarrays.
+This program writes the letters of the alphabet in reverse order, one
+per line, down the two-way pipe to @command{sort}. It then closes the
+write end of the pipe, so that @command{sort} receives an end-of-file
+indication. This causes @command{sort} to sort the data and write the
+sorted data back to the @command{gawk} program. Once all of the data
+has been read, @command{gawk} terminates the coprocess and exits.
address@hidden side effects, @code{asort()} function
-An important side effect of calling @code{asort()} is that
address@hidden array's original indices are irrevocably lost}.
-As this isn't always desirable, @code{asort()} accepts a
-second argument:
+As a side note, the assignment @samp{LC_ALL=C} in the @command{sort}
+command ensures traditional Unix (ASCII) sorting from @command{sort}.
+This is not strictly necessary here, but it's good to know how to do this.
address@hidden
address@hidden the array} source
-n = asort(source, dest)
-for (i = 1; i <= n; i++)
- @var{do something with} dest[i]
address@hidden example
+Be careful when closing the @code{"from"} end of a two-way pipe; in this
+case @command{gawk} waits for the child process to exit, which may cause
+your program to hang. (Thus, this particular feature is of much less
+use in practice than being able to close the @code{"to"} end.)
-In this case, @command{gawk} copies the @code{source} array into the
address@hidden array and then sorts @code{dest}, destroying its indices.
-However, the @code{source} array is not affected.
address@hidden CAUTION
+Normally,
+it is a fatal error to write to the @code{"to"} end of a two-way
+pipe that has been closed, and it is also a fatal error to read
+from the @code{"from"} end of a two-way pipe that has been closed.
-Often, what's needed is to sort on the values of the @emph{indices}
-instead of the values of the elements. To do that, use the
address@hidden()} function. The interface and behavior are identical to
-that of @code{asort()}, except that the index values are used for sorting
-and become the values of the result array:
+You may set @code{PROCINFO["@var{command}", "NONFATAL"]} to
+make such operations become nonfatal. If you do so, you then need
+to check @code{ERRNO} after each @code{print}, @code{printf},
+or @code{getline}.
address@hidden, for more information.
address@hidden quotation
address@hidden
address@hidden source[$0] = some_func($0) @}
address@hidden @command{gawk}, @code{PROCINFO} array in
address@hidden @code{PROCINFO} array, and communications via ptys
+You may also use pseudo-ttys (ptys) for
+two-way communication instead of pipes, if your system supports them.
+This is done on a per-command basis, by setting a special element
+in the @code{PROCINFO} array
+(@pxref{Auto-set}),
+like so:
-END @{
- n = asorti(source, dest)
- for (i = 1; i <= n; i++) @{
- @ii{Work with sorted indices directly:}
- @var{do something with} dest[i]
- @dots{}
- @ii{Access original array via sorted indices:}
- @var{do something with} source[dest[i]]
- @}
address@hidden
address@hidden
+command = "sort -nr" # command, save in convenience variable
+PROCINFO[command, "pty"] = 1 # update PROCINFO
+print @dots{} |& command # start two-way pipe
address@hidden
@end example
-So far, so good. Now it starts to get interesting. Both @code{asort()}
-and @code{asorti()} accept a third string argument to control comparison
-of array elements. When we introduced @code{asort()} and @code{asorti()}
-in @ref{String Functions}, we ignored this third argument; however,
-now is the time to describe how this argument affects these two functions.
-
-Basically, the third argument specifies how the array is to be sorted.
-There are two possibilities. As with @code{PROCINFO["sorted_in"]},
-this argument may be one of the predefined names that @command{gawk}
-provides (@pxref{Controlling Scanning}), or it may be the name of a
-user-defined function (@pxref{Controlling Array Traversal}).
address@hidden
+If your system does not have ptys, or if all the system's ptys are in use,
address@hidden automatically falls back to using regular pipes.
-In the latter case, @emph{the function can compare elements in any way
-it chooses}, taking into account just the indices, just the values,
-or both. This is extremely powerful.
+Using ptys usually avoids the buffer deadlock issues described earlier,
+at some loss in performance. This is because the tty driver buffers
+and sends data line-by-line. On systems with the @command{stdbuf} utility
+(part of the @uref{http://www.gnu.org/software/coreutils/coreutils.html,
+GNU Coreutils package}), you can use that program instead of ptys.
-Once the array is sorted, @code{asort()} takes the @emph{values} in
-their final order and uses them to fill in the result array, whereas
address@hidden()} takes the @emph{indices} in their final order and uses
-them to fill in the result array.
+Note also that ptys are not fully transparent. Certain binary control
+codes, such as @kbd{Ctrl-d} for end-of-file, are interpreted by the tty
+driver and not passed through.
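As a hedged illustration of the @command{stdbuf} alternative mentioned
earlier, the following sketch asks the coprocess to line-buffer its
standard output. The choice of @command{tr} as the coprocess and the use
of the @option{-oL} option are assumptions made for illustration only;
the technique helps only with programs that rely on the C library's
standard I/O buffering:

@example
BEGIN @{
    # Assumption: GNU Coreutils stdbuf is installed; -oL requests
    # line-buffered standard output for the coprocess.
    command = "stdbuf -oL tr a-z A-Z"

    print "hello, world" |& command       # gawk flushes this itself
    if ((command |& getline line) > 0)    # read back one line
        print "got", line
    close(command)
@}
@end example

Without @command{stdbuf} (or a pty), a filter such as this one would
typically hold its output in a buffer until it saw end-of-file, and the
@code{getline} could wait indefinitely.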
address@hidden reference counting, sorting arrays
address@hidden NOTE
-Copying array indices and elements isn't expensive in terms of memory.
-Internally, @command{gawk} maintains @dfn{reference counts} to data.
-For example, when @code{asort()} copies the first array to the second one,
-there is only one copy of the original array elements' data, even though
-both arrays use the values.
address@hidden CAUTION
+Finally, coprocesses open up the possibility of @dfn{deadlock} between
address@hidden and the program running in the coprocess. This can occur
+if you send ``too much'' data to the coprocess before reading any back;
+each process is blocked writing data with no one available to read what
+they've already written. There is no workaround for deadlock; careful
+programming and knowledge of the behavior of the coprocess are required.
@end quotation
address@hidden Document It And Call It A Feature. Sigh.
address@hidden @command{gawk}, @code{IGNORECASE} variable in
address@hidden arrays, sorting, and @code{IGNORECASE} variable
address@hidden @code{IGNORECASE} variable, and array sorting functions
-Because @code{IGNORECASE} affects string comparisons, the value
-of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}.
-Note also that the locale's sorting order does @emph{not}
-come into play; comparisons are based on character values address@hidden
-is true because locale-based comparison occurs only when in
-POSIX-compatibility mode, and because @code{asort()} and @code{asorti()} are
address@hidden extensions, they are not available in that case.}
address@hidden TCP/IP Networking
address@hidden Using @command{gawk} for Network Programming
address@hidden advanced features, network programming
address@hidden networks, programming
address@hidden TCP/IP
address@hidden @code{/inet/@dots{}} special files (@command{gawk})
address@hidden files, @code{/inet/@dots{}} (@command{gawk})
address@hidden @code{/inet4/@dots{}} special files (@command{gawk})
address@hidden files, @code{/inet4/@dots{}} (@command{gawk})
address@hidden @code{/inet6/@dots{}} special files (@command{gawk})
address@hidden files, @code{/inet6/@dots{}} (@command{gawk})
address@hidden @code{EMRED}
address@hidden
address@hidden
address@hidden:@*
+@ @ @ @ @i{A host is a host from coast to coast,@*
+@ @ @ @ and nobody talks to a host that's close,@*
+@ @ @ @ unless the host that isn't address@hidden
+@ @ @ @ is busy, hung, or dead.}
address@hidden Mike O'Brien (aka Mr.@: Protocol)
address@hidden quotation
address@hidden ifnotdocbook
-The following example demonstrates the use of a comparison function with
address@hidden()}. The comparison function, @code{case_fold_compare()}, maps
-both values to lowercase in order to compare them ignoring case.
address@hidden
+<blockquote>
+<attribution>Mike O'Brien (aka Mr. Protocol)</attribution>
+<literallayout class="normal"><literal>EMRED</literal>:
+ <emphasis>A host is a host from coast to coast,</emphasis>
+ <emphasis>and no-one can talk to host that's close,</emphasis>
+ <emphasis>unless the host that isn't close</emphasis>
+ <emphasis>is busy, hung, or dead.</emphasis></literallayout>
+</blockquote>
address@hidden docbook
address@hidden
-# case_fold_compare --- compare as strings, ignoring case
+In addition to being able to open a two-way pipeline to a coprocess
+on the same system
+(@pxref{Two-way I/O}),
+it is possible to make a two-way connection to
+another process on another system across an IP network connection.
-function case_fold_compare(i1, v1, i2, v2, l, r)
address@hidden
- l = tolower(v1)
- r = tolower(v2)
+You can think of this as just a @emph{very long} two-way pipeline to
+a coprocess.
+The way @command{gawk} decides that you want to use TCP/IP networking is
+by recognizing special @value{FN}s that begin with one of @samp{/inet/},
address@hidden/inet4/}, or @samp{/inet6/}.
- if (l < r)
- return -1
- else if (l == r)
- return 0
- else
- return 1
address@hidden
address@hidden example
+The full syntax of the special @value{FN} is
address@hidden/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}.
+The components are:
-And here is the test program for it:
address@hidden @var
address@hidden net-type
+Specifies the kind of Internet connection to make.
+Use @samp{/inet4/} to force IPv4, and
address@hidden/inet6/} to force IPv6.
+Plain @samp{/inet/} (which used to be the only option) uses
+the system default, most likely IPv4.
address@hidden
-# Test program
address@hidden protocol
+The protocol to use over IP. This must be either @samp{tcp} or
address@hidden, for a TCP or UDP IP connection,
+respectively. TCP should be used for most applications.
-BEGIN @{
- Letters = "abcdefghijklmnopqrstuvwxyz" \
- "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
- split(Letters, data, "")
address@hidden local-port
address@hidden @code{getaddrinfo()} function (C library)
+The local TCP or UDP port number to use. Use a port number of @samp{0}
+when you want the system to pick a port. This is what you should do
+when writing a TCP or UDP client.
+You may also use a well-known service name, such as @samp{smtp}
+or @samp{http}, in which case @command{gawk} attempts to determine
+the predefined port number using the C @code{getaddrinfo()} function.
- asort(data, result, "case_fold_compare")
address@hidden remote-host
+The IP address or fully qualified domain name of the Internet
+host to which you want to connect.
- j = length(result)
- for (i = 1; i <= j; i++) @{
- printf("%s", result[i])
- if (i % (j/2) == 0)
- printf("\n")
- else
- printf(" ")
- @}
address@hidden
address@hidden example
address@hidden remote-port
+The TCP or UDP port number to use on the given @var{remote-host}.
+Again, use @samp{0} if you don't care, or else a well-known
+service name.
address@hidden table
-When run, we get the following:
address@hidden @command{gawk}, @code{ERRNO} variable in
address@hidden @code{ERRNO} variable
address@hidden NOTE
+Failure in opening a two-way socket will result in a nonfatal error
+being returned to the calling code. The value of @code{ERRNO} indicates
+the error (@pxref{Auto-set}).
address@hidden quotation
+
+Consider the following very simple example:
@example
-$ @kbd{gawk -f case_fold_compare.awk}
address@hidden A a B b c C D d e E F f g G H h i I J j k K l L M m
address@hidden n N O o p P Q q r R S s t T u U V v w W X x y Y z Z
+BEGIN @{
+ Service = "/inet/tcp/0/localhost/daytime"
+ Service |& getline
+ print $0
+ close(Service)
address@hidden
@end example
address@hidden Two-way I/O
address@hidden Two-Way Communications with Another Process
+This program reads the current date and time from the local system's
+TCP @code{daytime} server.
+It then prints the results and closes the connection.
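Because a failed open is nonfatal, a more defensive version of this
program can test the return value of @code{getline} and report
@code{ERRNO}. This is only a sketch: the variable name @code{line} and
the wording of the error message are illustrative, and it assumes that a
@code{daytime} server may not be running on the local host:

@example
BEGIN @{
    Service = "/inet/tcp/0/localhost/daytime"
    # getline returns a value greater than zero only if a record was read
    if ((Service |& getline line) > 0)
        print line
    else
        print "cannot read from " Service ": " ERRNO > "/dev/stderr"
    close(Service)
@}
@end example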
address@hidden 8/2014. Neither Mike nor BWK saw this as relevant. Commenting it out.
address@hidden
address@hidden Brennan, Michael
address@hidden programmers, attractiveness of
address@hidden
address@hidden Path: cssun.mathcs.emory.edu!gatech!newsxfer3.itd.umich.edu!news-peer.sprintlink.net!news-sea-19.sprintlink.net!news-in-west.sprintlink.net!news.sprintlink.net!Sprint!204.94.52.5!news.whidbey.com!brennan
-From: brennan@@whidbey.com (Mike Brennan)
-Newsgroups: comp.lang.awk
-Subject: Re: Learn the SECRET to Attract Women Easily
-Date: 4 Aug 1997 17:34:46 GMT
address@hidden Organization: WhidbeyNet
address@hidden Lines: 12
-Message-ID: <5s53rm$eca@@news.whidbey.com>
address@hidden References: <address@hidden>
address@hidden Reply-To: address@hidden
address@hidden NNTP-Posting-Host: asn202.whidbey.com
address@hidden X-Newsreader: slrn (0.9.4.1 UNIX)
address@hidden Xref: cssun.mathcs.emory.edu comp.lang.awk:5403
+Because this topic is extensive, the use of @command{gawk} for
+TCP/IP programming is documented separately.
address@hidden
+See
address@hidden, , General Introduction, gawkinet, @value{GAWKINETTITLE}},
address@hidden ifinfo
address@hidden
+See
address@hidden://www.gnu.org/software/gawk/manual/gawkinet/,
address@hidden@value{GAWKINETTITLE}}},
+which comes as part of the @command{gawk} distribution,
address@hidden ifnotinfo
+for a much more complete introduction and discussion, as well as
+extensive examples.
-On 3 Aug 1997 13:17:43 GMT, Want More Dates???
-<tracy78@@kilgrona.com> wrote:
->Learn the SECRET to Attract Women Easily
->
->The SCENT(tm) Pheromone Sex Attractant For Men to Attract Women
address@hidden NOTE
address@hidden can only open direct sockets. There is currently
+no way to access services available over Secure Socket Layer
+(SSL); this includes any web service whose URL starts with @samp{https://}.
address@hidden quotation
-The scent of awk programmers is a lot more attractive to women than
-the scent of perl programmers.
---
-Mike Brennan
address@hidden brennan@@whidbey.com
address@hidden smallexample
address@hidden ignore
address@hidden advanced features, address@hidden communicating with
address@hidden processes, two-way communications with
-It is often useful to be able to
-send data to a separate program for
-processing and then read the result. This can always be
-done with temporary files:
address@hidden Profiling
address@hidden Profiling Your @command{awk} Programs
address@hidden @command{awk} programs, profiling
address@hidden profiling @command{awk} programs
address@hidden @code{awkprof.out} file
address@hidden files, @code{awkprof.out}
address@hidden
-# Write the data for processing
-tempfile = ("mydata." PROCINFO["pid"])
-while (@var{not done with data})
- print @var{data} | ("subprogram > " tempfile)
-close("subprogram > " tempfile)
+You may produce execution traces of your @command{awk} programs.
+This is done by passing the option @option{--profile} to @command{gawk}.
+When @command{gawk} has finished running, it creates a profile of your program in a file
+named @file{awkprof.out}. Because it is profiling, it also executes up to 45% slower than
address@hidden normally does.
-# Read the results, remove tempfile when done
-while ((getline newdata < tempfile) > 0)
- @var{process} newdata @var{appropriately}
-close(tempfile)
-system("rm " tempfile)
address@hidden @option{--profile} option
+As shown in the following example,
+the @option{--profile} option can be used to change the name of the file
+where @command{gawk} will write the profile:
+
address@hidden
+gawk --profile=myprog.prof -f myprog.awk data1 data2
@end example
@noindent
-This works, but not elegantly. Among other things, it requires that
-the program be run in a directory that cannot be shared among users;
-for example, @file{/tmp} will not do, as another user might happen
-to be using a temporary file with the same address@hidden
-Brennan suggests the use of @command{rand()} to generate unique
address@hidden This is a valid point; nevertheless, temporary files
-remain more difficult to use than two-way pipes.} @c 8/2014
+In the preceding example, @command{gawk} places the profile in
address@hidden instead of in @file{awkprof.out}.
address@hidden coprocesses
address@hidden input/output, two-way
address@hidden @code{|} (vertical bar), @code{|&} operator (I/O)
address@hidden vertical bar (@code{|}), @code{|&} operator (I/O)
address@hidden @command{csh} utility, @code{|&} operator, comparison with
-However, with @command{gawk}, it is possible to
-open a @emph{two-way} pipe to another process. The second process is
-termed a @dfn{coprocess}, as it runs in parallel with @command{gawk}.
-The two-way connection is created using the @samp{|&} operator
-(borrowed from the Korn shell, @command{ksh}):@footnote{This is very
-different from the same operator in the C shell and in Bash.}
+Here is a sample session showing a simple @command{awk} program,
+its input data, and the results from running @command{gawk} with the
address@hidden option. First, the @command{awk} program:
@example
-do @{
- print @var{data} |& "subprogram"
- "subprogram" |& getline results
address@hidden while (@var{data left to process})
-close("subprogram")
address@hidden example
+BEGIN @{ print "First BEGIN rule" @}
-The first time an I/O operation is executed using the @samp{|&}
-operator, @command{gawk} creates a two-way pipeline to a child process
-that runs the other program. Output created with @code{print}
-or @code{printf} is written to the program's standard input, and
-output from the program's standard output can be read by the @command{gawk}
-program using @code{getline}.
-As is the case with processes started by @samp{|}, the subprogram
-can be any program, or pipeline of programs, that can be started by
-the shell.
+END @{ print "First END rule" @}
-There are some cautionary items to be aware of:
+/foo/ @{
+ print "matched /foo/, gosh"
+ for (i = 1; i <= 3; i++)
+ sing()
address@hidden
address@hidden @value{BULLET}
address@hidden
-As the code inside @command{gawk} currently stands, the coprocess's
-standard error goes to the same place that the parent @command{gawk}'s
-standard error goes. It is not possible to read the child's
-standard error separately.
address@hidden
+ if (/foo/)
+ print "if is true"
+ else
+ print "else is true"
address@hidden
address@hidden deadlocks
address@hidden buffering, input/output
address@hidden @code{getline} command, deadlock and
address@hidden
-I/O buffering may be a problem. @command{gawk} automatically
-flushes all output down the pipe to the coprocess.
-However, if the coprocess does not flush its output,
address@hidden may hang when doing a @code{getline} in order to read
-the coprocess's results. This could lead to a situation
-known as @dfn{deadlock}, where each process is waiting for the
-other one to do something.
address@hidden itemize
+BEGIN @{ print "Second BEGIN rule" @}
address@hidden @code{close()} function, two-way pipes and
-It is possible to close just one end of the two-way pipe to
-a coprocess, by supplying a second argument to the @code{close()}
-function of either @code{"to"} or @code{"from"}
-(@pxref{Close Files And Pipes}).
-These strings tell @command{gawk} to close the end of the pipe
-that sends data to the coprocess or the end that reads from it,
-respectively.
+END @{ print "Second END rule" @}
address@hidden @command{sort} utility, coprocesses and
-This is particularly necessary in order to use
-the system @command{sort} utility as part of a coprocess;
address@hidden must read @emph{all} of its input
-data before it can produce any output.
-The @command{sort} program does not receive an end-of-file indication
-until @command{gawk} closes the write end of the pipe.
+function sing( dummy)
address@hidden
+ print "I gotta be me!"
address@hidden
address@hidden example
-When you have finished writing data to the @command{sort}
-utility, you can close the @code{"to"} end of the pipe, and
-then start reading sorted data via @code{getline}.
-For example:
+Following is the input data:
@example
-BEGIN @{
- command = "LC_ALL=C sort"
- n = split("abcdefghijklmnopqrstuvwxyz", a, "")
-
- for (i = n; i > 0; i--)
- print a[i] |& command
- close(command, "to")
-
- while ((command |& getline line) > 0)
- print "got", line
- close(command)
address@hidden
+foo
+bar
+baz
+foo
+junk
@end example
-This program writes the letters of the alphabet in reverse order, one
-per line, down the two-way pipe to @command{sort}. It then closes the
-write end of the pipe, so that @command{sort} receives an end-of-file
-indication. This causes @command{sort} to sort the data and write the
-sorted data back to the @command{gawk} program. Once all of the data
-has been read, @command{gawk} terminates the coprocess and exits.
-
-As a side note, the assignment @samp{LC_ALL=C} in the @command{sort}
-command ensures traditional Unix (ASCII) sorting from @command{sort}.
-This is not strictly necessary here, but it's good to know how to do this.
+Here is the @file{awkprof.out} that results from running the
address@hidden profiler on this program and data (this example also
+illustrates that @command{awk} programmers sometimes get up very early
+in the morning to work):
-Be careful when closing the @code{"from"} end of a two-way pipe; in this
-case @command{gawk} waits for the child process to exit, which may cause
-your program to hang. (Thus, this particular feature is of much less
-use in practice than being able to close the @code{"to"} end.)
+@cindex @code{BEGIN} pattern, and profiling
+@cindex @code{END} pattern, and profiling
+@example
+ # gawk profile, created Mon Sep 29 05:16:21 2014
address@hidden CAUTION
-Normally,
-it is a fatal error to write to the @code{"to"} end of a two-way
-pipe which has been closed, and it is also a fatal error to read
-from the @code{"from"} end of a two-way pipe that has been closed.
+ # BEGIN rule(s)
-You may set @code{PROCINFO["@var{command}", "NONFATAL"]} to
-make such operations become nonfatal. If you do so, you then need
-to check @code{ERRNO} after each @code{print}, @code{printf},
-or @code{getline}.
address@hidden, for more information.
address@hidden quotation
+ BEGIN @{
+ 1 print "First BEGIN rule"
+ @}
address@hidden @command{gawk}, @code{PROCINFO} array in
address@hidden @code{PROCINFO} array, and communications via ptys
-You may also use pseudo-ttys (ptys) for
-two-way communication instead of pipes, if your system supports them.
-This is done on a per-command basis, by setting a special element
-in the @code{PROCINFO} array
-(@pxref{Auto-set}),
-like so:
+ BEGIN @{
+ 1 print "Second BEGIN rule"
+ @}
address@hidden
-command = "sort -nr" # command, save in convenience variable
-PROCINFO[command, "pty"] = 1 # update PROCINFO
-print @dots{} |& command # start two-way pipe
address@hidden
address@hidden example
+ # Rule(s)
address@hidden
-If your system does not have ptys, or if all the system's ptys are in use,
address@hidden automatically falls back to using regular pipes.
+ 5 /foo/ @{ # 2
+ 2 print "matched /foo/, gosh"
+ 6 for (i = 1; i <= 3; i++) @{
+ 6 sing()
+ @}
+ @}
-Using ptys usually avoids the buffer deadlock issues described earlier,
-at some loss in performance. This is because the tty driver buffers
-and sends data line-by-line. On systems with the @command{stdbuf}
-(part of the @uref{http://www.gnu.org/software/coreutils/coreutils.html,
-GNU Coreutils package}), you can use that program instead of ptys.
+ 5 @{
+ 5 if (/foo/) @{ # 2
+ 2 print "if is true"
+ 3 @} else @{
+ 3 print "else is true"
+ @}
+ @}
-Note also that ptys are not fully transparent. Certain binary control
-codes, such @kbd{Ctrl-d} for end-of-file, are interpreted by the tty
-driver and not passed through.
+ # END rule(s)
address@hidden CAUTION
-Finally, coprocesses open up the possibility of @dfn{deadlock} between
address@hidden and the program running in the coprocess. This can occur
-if you send ``too much'' data to the coprocess before reading any back;
-each process is blocked writing data with noone available to read what
-they've already written. There is no workaround for deadlock; careful
-programming and knowledge of the behavior of the coprocess are required.
address@hidden quotation
+ END @{
+ 1 print "First END rule"
+ @}
address@hidden TCP/IP Networking
address@hidden Using @command{gawk} for Network Programming
address@hidden advanced features, network programming
address@hidden networks, programming
address@hidden TCP/IP
address@hidden @code{/inet/@dots{}} special files (@command{gawk})
address@hidden files, @code{/inet/@dots{}} (@command{gawk})
address@hidden @code{/inet4/@dots{}} special files (@command{gawk})
address@hidden files, @code{/inet4/@dots{}} (@command{gawk})
address@hidden @code{/inet6/@dots{}} special files (@command{gawk})
address@hidden files, @code{/inet6/@dots{}} (@command{gawk})
address@hidden @code{EMRED}
address@hidden
address@hidden
address@hidden:@*
-@ @ @ @ @i{A host is a host from coast to coast,@*
-@ @ @ @ and nobody talks to a host that's close,@*
-@ @ @ @ unless the host that isn't address@hidden
-@ @ @ @ is busy, hung, or dead.}
address@hidden Mike O'Brien (aka Mr.@: Protocol)
address@hidden quotation
address@hidden ifnotdocbook
+ END @{
+ 1 print "Second END rule"
+ @}
address@hidden
-<blockquote>
-<attribution>Mike O'Brien (aka Mr. Protocol)</attribution>
-<literallayout class="normal"><literal>EMRED</literal>:
- <emphasis>A host is a host from coast to
coast,</emphasis>
- <emphasis>and no-one can talk to host that's
close,</emphasis>
- <emphasis>unless the host that isn't close</emphasis>
- <emphasis>is busy, hung, or
dead.</emphasis></literallayout>
-</blockquote>
address@hidden docbook
-In addition to being able to open a two-way pipeline to a coprocess
-on the same system
-(@pxref{Two-way I/O}),
-it is possible to make a two-way connection to
-another process on another system across an IP network connection.
+ # Functions, listed alphabetically
-You can think of this as just a @emph{very long} two-way pipeline to
-a coprocess.
-The way @command{gawk} decides that you want to use TCP/IP networking is
-by recognizing special @value{FN}s that begin with one of @samp{/inet/},
address@hidden/inet4/}, or @samp{/inet6/}.
+ 6 function sing(dummy)
+ @{
+ 6 print "I gotta be me!"
+ @}
+@end example
-The full syntax of the special @value{FN} is
address@hidden/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}.
-The components are:
-
address@hidden @var
address@hidden net-type
-Specifies the kind of Internet connection to make.
-Use @samp{/inet4/} to force IPv4, and
address@hidden/inet6/} to force IPv6.
-Plain @samp{/inet/} (which used to be the only option) uses
-the system default, most likely IPv4.
-
address@hidden protocol
-The protocol to use over IP. This must be either @samp{tcp}, or
address@hidden, for a TCP or UDP IP connection,
-respectively. TCP should be used for most applications.
-
address@hidden local-port
address@hidden @code{getaddrinfo()} function (C library)
-The local TCP or UDP port number to use. Use a port number of @samp{0}
-when you want the system to pick a port. This is what you should do
-when writing a TCP or UDP client.
-You may also use a well-known service name, such as @samp{smtp}
-or @samp{http}, in which case @command{gawk} attempts to determine
-the predefined port number using the C @code{getaddrinfo()} function.
-
address@hidden remote-host
-The IP address or fully qualified domain name of the Internet
-host to which you want to connect.
-
address@hidden remote-port
-The TCP or UDP port number to use on the given @var{remote-host}.
-Again, use @samp{0} if you don't care, or else a well-known
-service name.
address@hidden table
-
address@hidden @command{gawk}, @code{ERRNO} variable in
address@hidden @code{ERRNO} variable
address@hidden NOTE
-Failure in opening a two-way socket will result in a nonfatal error
-being returned to the calling code. The value of @code{ERRNO} indicates
-the error (@pxref{Auto-set}).
address@hidden quotation
-
-Consider the following very simple example:
-
address@hidden
-BEGIN @{
- Service = "/inet/tcp/0/localhost/daytime"
- Service |& getline
- print $0
- close(Service)
address@hidden
address@hidden example
-
-This program reads the current date and time from the local system's
-TCP @code{daytime} server.
-It then prints the results and closes the connection.
-
-Because this topic is extensive, the use of @command{gawk} for
-TCP/IP programming is documented separately.
address@hidden
-See
address@hidden, , General Introduction, gawkinet, @value{GAWKINETTITLE}},
address@hidden ifinfo
address@hidden
-See
address@hidden://www.gnu.org/software/gawk/manual/gawkinet/,
address@hidden@value{GAWKINETTITLE}}},
-which comes as part of the @command{gawk} distribution,
address@hidden ifnotinfo
-for a much more complete introduction and discussion, as well as
-extensive examples.
-
address@hidden NOTE
address@hidden can only open direct sockets. There is currently
-no way to access services available over Secure Socket Layer
-(SSL); this includes any web service whose URL starts with @samp{https://}.
address@hidden quotation
-
-
-@node Profiling
-@section Profiling Your @command{awk} Programs
-@cindex @command{awk} programs, profiling
-@cindex profiling @command{awk} programs
-@cindex @code{awkprof.out} file
-@cindex files, @code{awkprof.out}
-
-You may produce execution traces of your @command{awk} programs.
-This is done by passing the option @option{--profile} to @command{gawk}.
-When @command{gawk} has finished running, it creates a profile of your program in a file
-named @file{awkprof.out}. Because it is profiling, it also executes up to 45% slower than
-@command{gawk} normally does.
-
-@cindex @option{--profile} option
-As shown in the following example,
-the @option{--profile} option can be used to change the name of the file
-where @command{gawk} will write the profile:
-
-@example
-gawk --profile=myprog.prof -f myprog.awk data1 data2
-@end example
-
-@noindent
-In the preceding example, @command{gawk} places the profile in
-@file{myprog.prof} instead of in @file{awkprof.out}.
-
-Here is a sample session showing a simple @command{awk} program,
-its input data, and the results from running @command{gawk} with the
-@option{--profile} option. First, the @command{awk} program:
-
-@example
-BEGIN @{ print "First BEGIN rule" @}
-
-END @{ print "First END rule" @}
-
-/foo/ @{
-    print "matched /foo/, gosh"
-    for (i = 1; i <= 3; i++)
-        sing()
-@}
-
-@{
-    if (/foo/)
-        print "if is true"
-    else
-        print "else is true"
-@}
-
-BEGIN @{ print "Second BEGIN rule" @}
-
-END @{ print "Second END rule" @}
-
-function sing(    dummy)
-@{
-    print "I gotta be me!"
-@}
-@end example
-
-Following is the input data:
-
-@example
-foo
-bar
-baz
-foo
-junk
-@end example
-
-Here is the @file{awkprof.out} that results from running the
-@command{gawk} profiler on this program and data (this example also
-illustrates that @command{awk} programmers sometimes get up very early
-in the morning to work):
-
-@cindex @code{BEGIN} pattern, and profiling
-@cindex @code{END} pattern, and profiling
-@example
- # gawk profile, created Mon Sep 29 05:16:21 2014
-
- # BEGIN rule(s)
-
- BEGIN @{
- 1 print "First BEGIN rule"
- @}
-
- BEGIN @{
- 1 print "Second BEGIN rule"
- @}
-
- # Rule(s)
-
- 5 /foo/ @{ # 2
- 2 print "matched /foo/, gosh"
- 6 for (i = 1; i <= 3; i++) @{
- 6 sing()
- @}
- @}
-
- 5 @{
- 5 if (/foo/) @{ # 2
- 2 print "if is true"
- 3 @} else @{
- 3 print "else is true"
- @}
- @}
-
- # END rule(s)
-
- END @{
- 1 print "First END rule"
- @}
-
- END @{
- 1 print "Second END rule"
- @}
-
-
- # Functions, listed alphabetically
-
- 6 function sing(dummy)
- @{
- 6 print "I gotta be me!"
- @}
-@end example
-
-This example illustrates many of the basic features of profiling output.
-They are as follows:
+This example illustrates many of the basic features of profiling output.
+They are as follows:
@itemize @value{BULLET}
@item
@@ -31502,6 +31208,299 @@ program being debugged, but occasionally it can.
@end itemize
address@hidden name-spaces Name-space Name-spaces}
+@node Namespaces
+@chapter Namespaces in @command{gawk}
+
+This @value{CHAPTER} describes a feature that is specific to @command{gawk}.
+
+@menu
+* Global Namespace:: The global namespace in standard @command{awk}.
+* Qualified Names:: How to qualify names with a namespace.
+* Default Namespace:: The default namespace.
+* Changing The Namespace:: How to change the namespace.
+* Internal Name Management:: How names are stored internally.
+* Namespace Example:: An example of code using a namespace.
+* Namespace Misc:: Namespace notes for developers.
+@end menu
+
+@node Global Namespace
+@section Standard @command{awk}'s Single Namespace
+
+In standard @command{awk}, there is a single, global, @dfn{namespace}.
+This means that @emph{all} function names and global variable names must
+be unique. For example, two different @command{awk} source files cannot
+both define a function named @code{min()}, or define an array named @code{data}.
+
+This situation is okay when programs are small, say a few hundred
+lines, or even a few thousand, but it prevents the development of
+reusable libraries of @command{awk} functions, and can inadvertently
+cause independently-developed library files to accidentally step on each
+other's ``private'' global variables
+(@pxref{Library Names}).
+
+Most other programming languages solve this issue by providing some kind
+of namespace control: a way to say ``this function is in namespace @var{xxx},
+and that function is in namespace @var{yyy}.'' (Of course, there is then
+still a single namespace for the namespaces, but the hope is that there
+are much fewer namespaces in use by any given program, and thus much
+less chance for collisions.) These facilities are sometimes referred
+to as @dfn{packages} or @dfn{modules}.
+
+Starting with @value{PVERSION} @strong{FIXME} 5.0, @command{gawk} provides a
+mechanism to put functions and global variables into separate namespaces.
+
+@node Qualified Names
+@section Qualified Names
+
+A @dfn{qualified name} is an identifier that includes a namespace
+name and the namespace separator, @code{::}. For example, one
+might have a function named @code{posix::getpid()}. Here, the
+namespace is @code{posix} and the function name within the namespace
+is @code{getpid()}. The namespace and variable or function name are
+separated by a double-colon. Only one such separator is allowed in a
+qualified name.
+
+@quotation NOTE
+Unlike C++, the @code{::} is @emph{not} an operator. No spaces are
+allowed between the namespace name, the @code{::}, and the rest of
+the name.
+@end quotation
+
+You must use fully qualified names from one namespace to access variables
+and functions in another. This is especially important when using
+variable names to index the special @code{SYMTAB} array (@pxref{Auto-set}),
+and when making indirect function calls (@pxref{Indirect Calls}).
+
+It is a syntax error to use any @command{gawk} reserved word (such
+as @code{if} or @code{for}), or the name of any built-in function
+(such as @code{sin()} or @code{gsub()}) as the second part of a
+fully qualified name. Using such an identifier as a namespace
+name (currently) @emph{is} allowed, but produces a lint warning.
+
+@command{gawk} pre-defined variable names may be used:
+@code{NF::NR} is valid, if possibly not all that useful.
+
+@node Default Namespace
+@section The Default Namespace
+
+The default namespace, not surprisingly, is @samp{awk}.
+All of the predefined @command{awk} and @command{gawk} variables
+are in this namespace, and thus have qualified names like
+@code{awk::ARGC}, @code{awk::NF}, and so on.
+
+Furthermore, even when you have changed the namespace for your
+current source file (@pxref{Changing The Namespace}), @command{gawk}
+forces unqualified identifiers whose names are all uppercase letters
+to be in the @samp{awk} namespace. This makes it possible for you to easily
+reference @command{gawk}'s global variables from different namespaces.
+
+It is a syntax error to use qualified names for function parameter names.
+
+@node Changing The Namespace
+@section Changing The Namespace
+
+In order to set the current namespace, use an @samp{@@namespace} directive
+at the top level of your program:
+
+@example
+@@namespace "passwd"
+
+BEGIN @{ @dots{} @}
+@dots{}
+@end example
+
+After this directive, all simple non-completely-uppercase identifiers are
+placed into the @code{passwd} namespace.
+
+You can change the namespace multiple times within a single
+source file, although this is likely to become confusing if you
+do it too much.
+
+@quotation NOTE
+Association of unqualified identifiers to a namespace is handled while
+your program is being parsed by @command{gawk} and before it starts
+to run. There is no concept of a ``current'' namespace once your program
+starts executing. Be sure you understand this.
+@end quotation
+
+Each source file for @option{-i} and @option{-f} starts out with
+an implicit @samp{@@namespace "awk"}. Similarly, each chunk of
+command-line code supplied with @option{-e} has such an implicit
+initial statement (@pxref{Options}).
+
+The use of @samp{@@namespace} has no influence upon the order of execution
+of @code{BEGIN}, @code{BEGINFILE}, @code{END}, and @code{ENDFILE} rules.
+
+@node Internal Name Management
+@section Internal Name Management
+
+For backwards compatibility, all identifiers in the @samp{awk} namespace
+are stored internally as unadorned identifiers. This is mainly relevant
+when using such identifiers as indices for @code{SYMTAB}, @code{FUNCTAB},
+and @code{PROCINFO["identifiers"]} (@pxref{Auto-set}), and for use in
+indirect function calls (@pxref{Indirect Calls}).
+
+In program code, to refer to variables and functions in the @samp{awk}
+namespace from another namespace, you must still use the @samp{awk::}
+prefix. For example:
+
+@example
+@@namespace "awk"            @ii{This is the default namespace}
+
+BEGIN @{
+    Title = "My Report"      @ii{Fully qualified name is} awk::Title
+@}
+
+@@namespace "report"         @ii{Now in} report @ii{namespace}
+
+function compute()           @ii{This is really} report::compute()
+@{
+    print awk::Title         @ii{But would be} SYMTAB["Title"]
+    @dots{}
+@}
+@end example
+
+@node Namespace Example
+@section Namespace Example
+
+@example
+# FIXME: fix this up for real, dates etc
+#
+# passwd.awk --- access password file information
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# May 1993
+# Revised October 2000
+# Revised December 2010
+#
+# Reworked for namespaces May 2017
+
+@@namespace "passwd"
+
+BEGIN @{
+    # tailor this to suit your system
+    Awklib = "/usr/local/libexec/awk/"
+@}
+
+function Init(    oldfs, oldrs, olddol0, pwcat, using_fw, using_fpat)
+@{
+    if (Inited)
+        return
+
+    oldfs = FS
+    oldrs = RS
+    olddol0 = $0
+    using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
+    using_fpat = (PROCINFO["FS"] == "FPAT")
+    FS = ":"
+    RS = "\n"
+
+    pwcat = Awklib "pwcat"
+    while ((pwcat | getline) > 0) @{
+        Byname[$1] = $0
+        Byuid[$3] = $0
+        Bycount[++Total] = $0
+    @}
+    close(pwcat)
+    Count = 0
+    Inited = 1
+    FS = oldfs
+    if (using_fw)
+        FIELDWIDTHS = FIELDWIDTHS
+    else if (using_fpat)
+        FPAT = FPAT
+    RS = oldrs
+    $0 = olddol0
+@}
+
+function Getpwnam(name)
+@{
+    Init()
+    return Byname[name]
+@}
+
+function Getpwuid(uid)
+@{
+    Init()
+    return Byuid[uid]
+@}
+
+function Getpwent()
+@{
+    Init()
+    if (Count < Total)
+        return Bycount[++Count]
+    return ""
+@}
+
+function Endpwent()
+@{
+    Count = 0
+@}
+
+# Compatibility:
+
+@@namespace "awk"
+
+function getpwnam(name)
+@{
+    return passwd::Getpwnam(name)
+@}
+
+function getpwuid(uid)
+@{
+    return passwd::Getpwuid(uid)
+@}
+
+function getpwent()
+@{
+    return passwd::Getpwent()
+@}
+
+function endpwent()
+@{
+    passwd::Endpwent()
+@}
+@end example
+
+@node Namespace Misc
+@section Miscellaneous Notes
+
+Other notes for reviewers:
+
+@table @asis
+@item Profiler:
+When profiling, we include the namespace in the @code{Op_Rule}
+and @code{Op_Func} instructions. If the namespace
+is different from the previous
+one, output an @samp{@@namespace} statement. For each identifier,
+if it starts with the current namespace, output only the simple part.
+
+@item Debugger:
+Simply print fully qualified names all the time. Maybe allow a
address@hidden @var{xxx}} command in the debugger to set the
+namespace and it will use that to create fully qualified names?
+Have to be careful about all uppercase names though.
+
+@item How does this affect @code{@@include}?
+Basically @code{@@include} should push and pop the namespace. Each
+@code{@@include} saves the current namespace and starts over with
+namespace @samp{awk} until an @code{@@namespace} is seen.
+
+@item Extension functions
+Revise the current macros to pass @code{"awk"} as the namespace
+argument and add new macros with @samp{_ns} or some such in the name that
+pass the namespace of the extension. This preserves backwards
+compatibility at the source level while providing access to namespaces
+as needed.
+
+Actually, since we've decided that @code{awk} namespace variables and
+function are stored unadorned, the current macros that pass @code{""}
+would continue to work. Internally, we need to recognize @code{"awk"} and
+@emph{not} fully qualify the name before storing it in the symbol table.
+@end table
+
@node Arbitrary Precision Arithmetic
@chapter Arithmetic and Arbitrary-Precision Arithmetic with @command{gawk}
@cindex arbitrary precision
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index 61da6a9..60e7bcf 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -449,12 +449,12 @@ particular records in a file and perform operations upon them.
* Library Functions:: A Library of @command{awk} Functions.
* Sample Programs:: Many @command{awk} programs with complete
explanations.
-* Namespaces:: How namespaces work in @command{gawk}.
* Advanced Features:: Stuff for advanced users, specific to
@command{gawk}.
* Internationalization:: Getting @command{gawk} to speak your
language.
* Debugger:: The @command{gawk} debugger.
+* Namespaces:: How namespaces work in @command{gawk}.
* Arbitrary Precision Arithmetic:: Arbitrary precision arithmetic with
@command{gawk}.
* Dynamic Extensions:: Adding new built-in functions to
@@ -847,12 +847,6 @@ particular records in a file and perform operations upon them.
time on their hands.
* Programs Summary:: Summary of programs.
* Programs Exercises:: Exercises.
-* Global Namespace:: The global namespace in standard @command{awk}.
-* Qualified Names:: How to qualify names with a namespace.
-* Default Namespace:: The default namespace.
-* Changing The Namespace:: How to change the namespace.
-* Namespace Example:: An example of code using a namespace.
-* Namespace Misc:: Namespace notes for developers.
* Nondecimal Data:: Allowing nondecimal input data.
* Array Sorting:: Facilities for controlling array
traversal and sorting arrays.
@@ -896,6 +890,12 @@ particular records in a file and perform operations upon them.
* Readline Support:: Readline support.
* Limitations:: Limitations and future plans.
* Debugging Summary:: Debugging summary.
+* Global Namespace:: The global namespace in standard @command{awk}.
+* Qualified Names:: How to qualify names with a namespace.
+* Default Namespace:: The default namespace.
+* Changing The Namespace:: How to change the namespace.
+* Namespace Example:: An example of code using a namespace.
+* Namespace Misc:: Namespace notes for developers.
* Computer Arithmetic:: A quick intro to computer math.
* Math Definitions:: Defining terms used.
* MPFR features:: The MPFR features in @command{gawk}.
@@ -26621,1349 +26621,1055 @@ It contains the following chapters:
@end itemize
@end ifdocbook
address@hidden name-spaces Name-space Name-spaces}
address@hidden Namespaces
address@hidden Namespaces in @command{gawk}
-
-This @value{CHAPTER} describes a feature that is specific to @command{gawk}.
-
address@hidden
-* Global Namespace:: The global namespace in standard @command{awk}.
-* Qualified Names:: How to qualify names with a namespace.
-* Default Namespace:: The default namespace.
-* Changing The Namespace:: How to change the namespace.
-* Internal Name Management:: How names are stored internally.
-* Namespace Example:: An example of code using a namespace.
-* Namespace Misc:: Namespace notes for developers.
address@hidden menu
-
address@hidden Global Namespace
address@hidden Standard @command{awk}'s Single Namespace
-
-In standard @command{awk}, there is a single, global, @dfn{namespace}.
-This means that @emph{all} function names and global variable names must
-be unique. For example, two different @command{awk} source files cannot
-both define a function named @code{min()}, or define an array named @code{data}.
address@hidden Advanced Features
address@hidden Advanced Features of @command{gawk}
address@hidden @command{gawk}, features, advanced
address@hidden advanced features, @command{gawk}
address@hidden
+Contributed by: Peter Langston <address@hidden>
-This situation is okay when programs are small, say a few hundred
-lines, or even a few thousand, but it prevents the development of
-reusable libraries of @command{awk} functions, and can inadvertently
-cause independently-developed library files to accidentally step on each
-other's ``private'' global variables
-(@pxref{Library Names}).
+ Found in Steve English's "signature" line:
-Most other programming languages solve this issue by providing some kind
-of namespace control: a way to say ``this function is in namespace @var{xxx},
-and that function is in namespace @var{yyy}.'' (Of course, there is then
-still a single namespace for the namespaces, but the hope is that there
-are much fewer namespaces in use by any given program, and thus much
-less chance for collisions.) These facilities are sometimes referred
-to as @dfn{packages} or @dfn{modules}.
+"Write documentation as if whoever reads it is a violent psychopath
+who knows where you live."
address@hidden ignore
address@hidden Langston, Peter
address@hidden English, Steve
address@hidden
address@hidden documentation as if whoever reads it is
+a violent psychopath who knows where you live.}
address@hidden Steve English, as quoted by Peter Langston
address@hidden quotation
-Starting with @value{PVERSION} @strong{FIXME} 5.0, @command{gawk} provides a
-mechanism to put functions and global variables into separate namespaces.
+This @value{CHAPTER} discusses advanced features in @command{gawk}.
+It's a bit of a ``grab bag'' of items that are otherwise unrelated
+to each other.
+First, we look at a command-line option that allows @command{gawk} to recognize
+nondecimal numbers in input data, not just in @command{awk}
+programs.
+Then, @command{gawk}'s special features for sorting arrays are presented.
+Next, two-way I/O, discussed briefly in earlier parts of this
address@hidden, is described in full detail, along with the basics
+of TCP/IP networking. Finally, we see how @command{gawk}
+can @dfn{profile} an @command{awk} program, making it possible to tune
+it for performance.
address@hidden Qualified Names
address@hidden Qualified Names
address@hidden FULLXREF ON
+Additional advanced features are discussed in separate @value{CHAPTER}s of their
+own:
-A @dfn{qualified name} is an identifier that includes a namespace
-name and the namespace separator, @code{::}. For example, one
-might have a function named @code{posix::getpid()}. Here, the
-namespace is @code{posix} and the function name within the namespace
-is @code{getpid()}. The namespace and variable or function name are
-separated by a double-colon. Only one such separator is allowed in a
-qualified name.
address@hidden @value{BULLET}
address@hidden
address@hidden, discusses how to internationalize
+your @command{awk} programs, so that they can speak multiple
+national languages.
address@hidden NOTE
-Unlike C++, the @code{::} is @emph{not} an operator. No spaces are
-allowed between the namespace name, the @code{::}, and the rest of
-the name.
address@hidden quotation
address@hidden
address@hidden, describes @command{gawk}'s built-in command-line
+debugger for debugging @command{awk} programs.
-You must use fully qualified names from one namespace to access variables
-and functions in another. This is especially important when using
-variable names to index the special @code{SYMTAB} array (@pxref{Auto-set}),
-and when making indirect function calls (@pxref{Indirect Calls}).
address@hidden
address@hidden Precision Arithmetic}, describes how you can use
address@hidden to perform arbitrary-precision arithmetic.
-It is a syntax error to use any @command{gawk} reserved word (such
-as @code{if} or @code{for}), or the name of any built-in function
-(such as @code{sin()} or @code{gsub()}) as the second part of a
-fully qualified name. Using such an identifier as a namespace
-name (currently) @emph{is} allowed, but produces a lint warning.
address@hidden
address@hidden Extensions},
+discusses the ability to dynamically add new built-in functions to
address@hidden
address@hidden itemize
address@hidden FULLXREF OFF
address@hidden pre-defined variable names may be used:
address@hidden::NR} is valid, if possibly not all that useful.
address@hidden
+* Nondecimal Data:: Allowing nondecimal input data.
+* Array Sorting:: Facilities for controlling array traversal and
+ sorting arrays.
+* Two-way I/O:: Two-way communications with another process.
+* TCP/IP Networking:: Using @command{gawk} for network programming.
+* Profiling:: Profiling your @command{awk} programs.
+* Advanced Features Summary:: Summary of advanced features.
address@hidden menu
address@hidden Default Namespace
address@hidden The Default Namespace
address@hidden Nondecimal Data
address@hidden Allowing Nondecimal Input Data
address@hidden @option{--non-decimal-data} option
address@hidden advanced features, nondecimal input data
address@hidden input, address@hidden nondecimal
address@hidden constants, nondecimal
-The default namespace, not surprisingly, is @samp{awk}.
-All of the predefined @command{awk} and @command{gawk} variables
-are in this namespace, and thus have qualified names like
address@hidden::ARGC}, @code{awk::NF}, and so on.
+If you run @command{gawk} with the @option{--non-decimal-data} option,
+you can have nondecimal values in your input data:
-Furthermore, even when you have changed the namespace for your
-current source file (@pxref{Changing The Namespace}), @command{gawk}
-forces unqualified identifiers whose names are all uppercase letters
-to be in the @samp{awk} namespace. This makes it possible for you to easily
-reference @command{gawk}'s global variables from different namespaces.
address@hidden
+$ @kbd{echo 0123 123 0x123 |}
+> @kbd{gawk --non-decimal-data '@{ printf "%d, %d, %d\n", $1, $2, $3 @}'}
address@hidden 83, 123, 291
address@hidden example
-It is a syntax error to use qualified names for function parameter names.
+For this feature to work, write your program so that
address@hidden treats your data as numeric:
address@hidden Changing The Namespace
address@hidden Changing The Namespace
address@hidden
+$ @kbd{echo 0123 123 0x123 | gawk '@{ print $1, $2, $3 @}'}
address@hidden 0123 123 0x123
address@hidden example
-In order to set the current namespace, use an @samp{@@namespace} directive
-at the top level of your program:
address@hidden
+The @code{print} statement treats its expressions as strings.
+Although the fields can act as numbers when necessary,
+they are still strings, so @code{print} does not try to treat them
+numerically. You need to add zero to a field to force it to
+be treated as a number. For example:
@example
-@@namespace "passwd"
-
-BEGIN @{ @dots{} @}
address@hidden
+$ @kbd{echo 0123 123 0x123 | gawk --non-decimal-data '}
+> @address@hidden print $1, $2, $3}
+> @kbd{print $1 + 0, $2 + 0, $3 + 0 @}'}
address@hidden 0123 123 0x123
address@hidden 83 123 291
@end example
-After this directive, all simple non-completely-uppercase identifiers are
-placed into the @code{passwd} namespace.
+Because it is common to have decimal data with leading zeros, and because
+using this facility could lead to surprising results, the default is to leave it
+disabled. If you want it, you must explicitly request it.
-You can change the namespace multiple times within a single
-source file, although this is likely to become confusing if you
-do it too much.
+@cindex programming conventions, @code{--non-decimal-data} option
+@cindex @option{--non-decimal-data} option, @code{strtonum()} function and
+@cindex @code{strtonum()} function (@command{gawk}), @code{--non-decimal-data} option and
+@quotation CAUTION
+@emph{Use of this option is not recommended.}
+It can break old programs very badly.
+Instead, use the @code{strtonum()} function to convert your data
+(@pxref{String Functions}).
+This makes your programs easier to write and easier to read, and
+leads to less surprising results.
-@quotation NOTE
-Association of unqualified identifiers to a namespace is handled while
-your program is being parsed by @command{gawk} and before it starts
-to run. There is no concept of a ``current'' namespace once your program
-starts executing. Be sure you understand this.
+This option may disappear in a future version of @command{gawk}.
@end quotation
-Each source file for @option{-i} and @option{-f} starts out with
-an implicit @samp{@@namespace "awk"}. Similarly, each chunk of
-command-line code supplied with @option{-e} has such an implicit
-initial statement (@pxref{Options}).
+@node Array Sorting
+@section Controlling Array Traversal and Array Sorting
-The use of @samp{@@namespace} has no influence upon the order of execution
-of @code{BEGIN}, @code{BEGINFILE}, @code{END}, and @code{ENDFILE} rules.
+@command{gawk} lets you control the order in which a
+@samp{for (@var{indx} in @var{array})}
+loop traverses an array.
-@node Internal Name Management
-@section Internal Name Management
+In addition, two built-in functions, @code{asort()} and @code{asorti()},
+let you sort arrays based on the array values and indices, respectively.
+These two functions also provide control over the sorting criteria used
+to order the elements during sorting.
-For backwards compatibility, all identifiers in the @samp{awk} namespace
-are stored internally as unadorned identifiers. This is mainly relevant
-when using such identifiers as indices for @code{SYMTAB}, @code{FUNCTAB},
-and @code{PROCINFO["identifiers"]} (@pxref{Auto-set}), and for use in
-indirect function calls (@pxref{Indirect Calls}).
+@menu
+* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
+* Array Sorting Functions:: How to use @code{asort()} and @code{asorti()}.
+@end menu
-In program code, to refer to variables and functions in the @samp{awk}
-namespace from another namespace, you must still use the @samp{awk::}
-prefix. For example:
+@node Controlling Array Traversal
+@subsection Controlling Array Traversal
-@example
-@@namespace "awk" @ii{This is the default namespace}
+By default, the order in which a @samp{for (@var{indx} in @var{array})} loop
+scans an array is not defined; it is generally based upon
+the internal implementation of arrays inside @command{awk}.
-BEGIN @{
- Title = "My Report" @ii{Fully qualified name is} awk::Title
-@}
+Often, though, it is desirable to be able to loop over the elements
+in a particular order that you, the programmer, choose. @command{gawk}
+lets you do this.
-@@namespace "report" @ii{Now in} report @ii{namespace}
+@ref{Controlling Scanning} describes how you can assign special,
+predefined values to @code{PROCINFO["sorted_in"]} in order to
+control the order in which @command{gawk} traverses an array
+during a @code{for} loop.
-function compute() @ii{This is really} report::compute()
+In addition, the value of @code{PROCINFO["sorted_in"]} can be a
+function name.@footnote{This is why the predefined sorting orders
+start with an @samp{@@} character, which cannot be part of an identifier.}
+This lets you traverse an array based on any custom criterion.
+The array elements are ordered according to the return value of this
+function. The comparison function should be defined with at least
+four arguments:
+
+@example
+function comp_func(i1, v1, i2, v2)
@{
- print awk::Title @ii{But would be} SYMTAB["Title"]
- @dots{}
+ @var{compare elements 1 and 2 in some fashion}
+ @var{return < 0; 0; or > 0}
@}
@end example
-@node Namespace Example
-@section Namespace Example
+Here, @code{i1} and @code{i2} are the indices, and @code{v1} and @code{v2}
+are the corresponding values of the two elements being compared.
+Either @code{v1} or @code{v2}, or both, can be arrays if the array being
+traversed contains subarrays as values.
+(@xref{Arrays of Arrays} for more information about subarrays.)
+The three possible return values are interpreted as follows:
+
+@table @code
+@item comp_func(i1, v1, i2, v2) < 0
+Index @code{i1} comes before index @code{i2} during loop traversal.
+
+@item comp_func(i1, v1, i2, v2) == 0
+Indices @code{i1} and @code{i2}
+come together, but the relative order with respect to each other is undefined.
+
+@item comp_func(i1, v1, i2, v2) > 0
+Index @code{i1} comes after index @code{i2} during loop traversal.
+@end table
+
+Our first comparison function can be used to scan an array in
+numerical order of the indices:
@example
-# FIXME: fix this up for real, dates etc
-#
-# passwd.awk --- access password file information
-#
-# Arnold Robbins, arnold@@skeeve.com, Public Domain
-# May 1993
-# Revised October 2000
-# Revised December 2010
-#
-# Reworked for namespaces May 2017
+function cmp_num_idx(i1, v1, i2, v2)
+@{
+ # numerical index comparison, ascending order
+ return (i1 - i2)
+@}
+@end example
-@@namespace "passwd"
+Our second function traverses an array based on the string order of
+the element values rather than by indices:
-BEGIN @{
- # tailor this to suit your system
- Awklib = "/usr/local/libexec/awk/"
+@example
+function cmp_str_val(i1, v1, i2, v2)
+@{
+ # string value comparison, ascending order
+ v1 = v1 ""
+ v2 = v2 ""
+ if (v1 < v2)
+ return -1
+ return (v1 != v2)
@}
+@end example
-function Init( oldfs, oldrs, olddol0, pwcat, using_fw, using_fpat)
+The third
+comparison function makes all numbers, and numeric strings without
+any leading or trailing spaces, come out first during loop traversal:
+
+@example
+function cmp_num_str_val(i1, v1, i2, v2, n1, n2)
@{
- if (Inited)
- return
+ # numbers before string value comparison, ascending order
+ n1 = v1 + 0
+ n2 = v2 + 0
+ if (n1 == v1)
+ return (n2 == v2) ? (n1 - n2) : -1
+ else if (n2 == v2)
+ return 1
+ return (v1 < v2) ? -1 : (v1 != v2)
+@}
+@end example
- oldfs = FS
- oldrs = RS
- olddol0 = $0
- using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
- using_fpat = (PROCINFO["FS"] == "FPAT")
- FS = ":"
- RS = "\n"
+Here is a main program to demonstrate how @command{gawk}
+behaves using each of the previous functions:
- pwcat = Awklib "pwcat"
- while ((pwcat | getline) > 0) @{
- Byname[$1] = $0
- Byuid[$3] = $0
- Bycount[++Total] = $0
+@example
+BEGIN @{
+ data["one"] = 10
+ data["two"] = 20
+ data[10] = "one"
+ data[100] = 100
+ data[20] = "two"
+
+ f[1] = "cmp_num_idx"
+ f[2] = "cmp_str_val"
+ f[3] = "cmp_num_str_val"
+ for (i = 1; i <= 3; i++) @{
+ printf("Sort function: %s\n", f[i])
+ PROCINFO["sorted_in"] = f[i]
+ for (j in data)
+ printf("\tdata[%s] = %s\n", j, data[j])
+ print ""
@}
- close(pwcat)
- Count = 0
- Inited = 1
- FS = oldfs
- if (using_fw)
- FIELDWIDTHS = FIELDWIDTHS
- else if (using_fpat)
- FPAT = FPAT
- RS = oldrs
- $0 = olddol0
@}
+@end example
-function Getpwnam(name)
-@{
- Init()
- return Byname[name]
-@}
+Here are the results when the program is run:
-function Getpwuid(uid)
+@example
+$ @kbd{gawk -f compdemo.awk}
+@print{} Sort function: cmp_num_idx @ii{Sort by numeric index}
+@print{} data[two] = 20
+@print{} data[one] = 10 @ii{Both strings are numerically zero}
+@print{} data[10] = one
+@print{} data[20] = two
+@print{} data[100] = 100
+@print{}
+@print{} Sort function: cmp_str_val @ii{Sort by element values as strings}
+@print{} data[one] = 10
+@print{} data[100] = 100 @ii{String 100 is less than string 20}
+@print{} data[two] = 20
+@print{} data[10] = one
+@print{} data[20] = two
+@print{}
+@print{} Sort function: cmp_num_str_val @ii{Sort all numeric values before all strings}
+@print{} data[one] = 10
+@print{} data[two] = 20
+@print{} data[100] = 100
+@print{} data[10] = one
+@print{} data[20] = two
+@end example
+
+Consider sorting the entries of a GNU/Linux system password file
+according to login name. The following program sorts records
+by a specific field position and can be used for this purpose:
+
+@example
+# passwd-sort.awk --- simple program to sort by field position
+# field position is specified by the global variable POS
+
+function cmp_field(i1, v1, i2, v2)
@{
- Init()
- return Byuid[uid]
+ # comparison by value, as string, and ascending order
+ return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS])
@}
-function Getpwent()
@{
- Init()
- if (Count < Total)
- return Bycount[++Count]
- return ""
+ for (i = 1; i <= NF; i++)
+ a[NR][i] = $i
@}
-function Endpwent()
-@{
- Count = 0
+END @{
+ PROCINFO["sorted_in"] = "cmp_field"
+ if (POS < 1 || POS > NF)
+ POS = 1
+ for (i in a) @{
+ for (j = 1; j <= NF; j++)
+ printf("%s%c", a[i][j], j < NF ? ":" : "")
+ print ""
+ @}
@}
+@end example
-# Compatibility:
+The first field in each entry of the password file is the user's login name,
+and the fields are separated by colons.
+Each record defines a subarray,
+with each field as an element in the subarray.
+Running the program produces the
+following output:
-@@namespace "awk"
+@example
+$ @kbd{gawk -v POS=1 -F: -f sort.awk /etc/passwd}
+@print{} adm:x:3:4:adm:/var/adm:/sbin/nologin
+@print{} apache:x:48:48:Apache:/var/www:/sbin/nologin
+@print{} avahi:x:70:70:Avahi daemon:/:/sbin/nologin
+@dots{}
+@end example
-function getpwnam(name)
-@{
- return passwd::Getpwnam(name)
-@}
+The comparison should normally always return the same value when given a
+specific pair of array elements as its arguments. If inconsistent
+results are returned, then the order is undefined. This behavior can be
+exploited to introduce random order into otherwise seemingly
+ordered data:
-function getpwuid(uid)
+@example
+function cmp_randomize(i1, v1, i2, v2)
@{
- return passwd::Getpwuid(uid)
+ # random order (caution: this may never terminate!)
+ return (2 - 4 * rand())
@}
+@end example
-function getpwent()
+As already mentioned, the order of the indices is arbitrary if two
+elements compare equal. This is usually not a problem, but letting
+the tied elements come out in arbitrary order can be an issue, especially
+when comparing item values. The partial ordering of the equal elements
+may change the next time the array is traversed, if other elements are added to or
+removed from the array. One way to resolve ties when comparing elements
+with otherwise equal values is to include the indices in the comparison
+rules. Note that doing this may make the loop traversal less efficient,
+so consider it only if necessary. The following comparison functions
+force a deterministic order, and are based on the fact that the
+(string) indices of two elements are never equal:
+
+@example
+function cmp_numeric(i1, v1, i2, v2)
@{
- return passwd::Getpwent()
+ # numerical value (and index) comparison, descending order
+ return (v1 != v2) ? (v2 - v1) : (i2 - i1)
@}
-function endpwent()
+function cmp_string(i1, v1, i2, v2)
@{
- passwd::Endpwent()
+ # string value (and index) comparison, descending order
+ v1 = v1 i1
+ v2 = v2 i2
+ return (v1 > v2) ? -1 : (v1 != v2)
@}
@end example
-@node Namespace Misc
-@section Miscellaneous Notes
-
-Other notes for reviewers:
-
-@table @asis
-@item Profiler:
-When profiling, we can add an @code{Op_Namespace} to the start of each
-rule and function definition. If this is different than the previous
-one, output an @samp{@@namespace} statement. For each identifier,
-if it starts with the current namespace, output only the simple part.
-For all @samp{awk::XXX} if @samp{XXX} is all uppercase, strip off the
-@samp{awk::} part.
-
-@item Debugger:
-Simply print fully qualified names all the time. Maybe allow a
address@hidden @var{xxx}} command in the debugger to set the
-namespace and it will use that to create fully qualified names?
-Have to be careful about all uppercase names though.
-
-@item How does this affect @code{@@include}?
-Basically @code{@@include} should push and pop the namespace. Each
-@code{@@include} saves the current namespace and starts over with
-namespace @samp{awk} until an @code{@@namespace} is seen.
-
-@item Extension functions
-Revise the current macros to pass @code{"awk"} as the namespace
-argument and add new macros with @samp{_ns} or some such in the name that
-pass the namespace of the extension. This preserves backwards
-compatibility at the source level while providing access to namespaces
-as needed.
-
-Actually, since we've decided that @code{awk} namespace variables and
-function are stored unadorned, the current macros that pass @code{""}
-would continue to work. Internally, we need to recognize @code{"awk"} and
address@hidden fully qualify the name before storing it in the symbol table.
-@end table
-
-@node Advanced Features
-@chapter Advanced Features of @command{gawk}
-@cindex @command{gawk}, features, advanced
-@cindex advanced features, @command{gawk}
-@ignore
-Contributed by: Peter Langston <address@hidden>
+@c Avoid using the term ``stable'' when describing the unpredictable behavior
+@c if two items compare equal. Usually, the goal of a "stable algorithm"
+@c is to maintain the original order of the items, which is a meaningless
+@c concept for a list constructed from a hash.
- Found in Steve English's "signature" line:
+A custom comparison function can often simplify ordered loop
+traversal, and the sky is really the limit when it comes to
+designing such a function.
-"Write documentation as if whoever reads it is a violent psychopath
-who knows where you live."
-@end ignore
-@cindex Langston, Peter
-@cindex English, Steve
-@quotation
address@hidden documentation as if whoever reads it is
-a violent psychopath who knows where you live.}
-@author Steve English, as quoted by Peter Langston
-@end quotation
+When string comparisons are made during a sort, either for element
+values where one or both aren't numbers, or for element indices
+handled as strings, the value of @code{IGNORECASE}
+(@pxref{Built-in Variables}) controls whether
+the comparisons treat corresponding upper- and lowercase letters as
+equivalent or distinct.
-This @value{CHAPTER} discusses advanced features in @command{gawk}.
-It's a bit of a ``grab bag'' of items that are otherwise unrelated
-to each other.
-First, we look at a command-line option that allows @command{gawk} to recognize
-nondecimal numbers in input data, not just in @command{awk}
-programs.
-Then, @command{gawk}'s special features for sorting arrays are presented.
-Next, two-way I/O, discussed briefly in earlier parts of this
-@value{DOCUMENT}, is described in full detail, along with the basics
-of TCP/IP networking. Finally, we see how @command{gawk}
-can @dfn{profile} an @command{awk} program, making it possible to tune
-it for performance.
+Another point to keep in mind is that in the case of subarrays,
+the element values can themselves be arrays; a production comparison
+function should use the @code{isarray()} function
+(@pxref{Type Functions})
+to check for this, and choose a defined sorting order for subarrays.
-@c FULLXREF ON
-Additional advanced features are discussed in separate @value{CHAPTER}s of their
-own:
+All sorting based on @code{PROCINFO["sorted_in"]}
+is disabled in POSIX mode,
+because the @code{PROCINFO} array is not special in that case.
-@itemize @value{BULLET}
-@item
-@ref{Internationalization}, discusses how to internationalize
-your @command{awk} programs, so that they can speak multiple
-national languages.
+As a side note, sorting the array indices before traversing
+the array has been reported to add a 15% to 20% overhead to the
+execution time of @command{awk} programs. For this reason,
+sorted array traversal is not the default.
-@item
-@ref{Debugger}, describes @command{gawk}'s built-in command-line
-debugger for debugging @command{awk} programs.
+@c The @command{gawk}
+@c maintainers believe that only the people who wish to use a
+@c feature should have to pay for it.
-@item
-@ref{Arbitrary Precision Arithmetic}, describes how you can use
-@command{gawk} to perform arbitrary-precision arithmetic.
+@node Array Sorting Functions
+@subsection Sorting Array Values and Indices with @command{gawk}
-@item
-@ref{Dynamic Extensions},
-discusses the ability to dynamically add new built-in functions to
-@command{gawk}.
-@end itemize
-@c FULLXREF OFF
+@cindex arrays, sorting
address@hidden
address@hidden @code{asort()} function (@command{gawk}), address@hidden sorting
address@hidden
address@hidden @code{asorti()} function (@command{gawk}), address@hidden sorting
+@cindex sort function, arrays, sorting
+In most @command{awk} implementations, sorting an array requires writing
+a @code{sort()} function. This can be educational for exploring
+different sorting algorithms, but usually that's not the point of the program.
+@command{gawk} provides the built-in @code{asort()} and @code{asorti()}
+functions (@pxref{String Functions}) for sorting arrays. For example:
-@menu
-* Nondecimal Data:: Allowing nondecimal input data.
-* Array Sorting:: Facilities for controlling array traversal and
- sorting arrays.
-* Two-way I/O:: Two-way communications with another process.
-* TCP/IP Networking:: Using @command{gawk} for network programming.
-* Profiling:: Profiling your @command{awk} programs.
-* Advanced Features Summary:: Summary of advanced features.
-@end menu
+@example
+@var{populate the array} data
+n = asort(data)
+for (i = 1; i <= n; i++)
+ @var{do something with} data[i]
+@end example
-@node Nondecimal Data
-@section Allowing Nondecimal Input Data
-@cindex @option{--non-decimal-data} option
-@cindex advanced features, nondecimal input data
-@cindex input, data@comma{} nondecimal
-@cindex constants, nondecimal
+After the call to @code{asort()}, the array @code{data} is indexed from 1
+to some number @var{n}, the total number of elements in @code{data}.
+(This count is @code{asort()}'s return value.)
+@code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on.
+The default comparison is based on the type of the elements
+(@pxref{Typing and Comparison}).
+All numeric values come before all string values,
+which in turn come before all subarrays.
-If you run @command{gawk} with the @option{--non-decimal-data} option,
-you can have nondecimal values in your input data:
+@cindex side effects, @code{asort()} function
+An important side effect of calling @code{asort()} is that
+@emph{the array's original indices are irrevocably lost}.
+As this isn't always desirable, @code{asort()} accepts a
+second argument:
@example
-$ @kbd{echo 0123 123 0x123 |}
-> @kbd{gawk --non-decimal-data '@{ printf "%d, %d, %d\n", $1, $2, $3 @}'}
-@print{} 83, 123, 291
+@var{populate the array} source
+n = asort(source, dest)
+for (i = 1; i <= n; i++)
+ @var{do something with} dest[i]
@end example
-For this feature to work, write your program so that
-@command{gawk} treats your data as numeric:
+In this case, @command{gawk} copies the @code{source} array into the
+@code{dest} array and then sorts @code{dest}, destroying its indices.
+However, the @code{source} array is not affected.
+
+Often, what's needed is to sort on the values of the @emph{indices}
+instead of the values of the elements. To do that, use the
+@code{asorti()} function. The interface and behavior are identical to
+that of @code{asort()}, except that the index values are used for sorting
+and become the values of the result array:
@example
-$ @kbd{echo 0123 123 0x123 | gawk '@{ print $1, $2, $3 @}'}
-@print{} 0123 123 0x123
+@{ source[$0] = some_func($0) @}
+
+END @{
+ n = asorti(source, dest)
+ for (i = 1; i <= n; i++) @{
+ @ii{Work with sorted indices directly:}
+ @var{do something with} dest[i]
+ @dots{}
+ @ii{Access original array via sorted indices:}
+ @var{do something with} source[dest[i]]
+ @}
+@}
@end example
-@noindent
-The @code{print} statement treats its expressions as strings.
-Although the fields can act as numbers when necessary,
-they are still strings, so @code{print} does not try to treat them
-numerically. You need to add zero to a field to force it to
-be treated as a number. For example:
+So far, so good. Now it starts to get interesting. Both @code{asort()}
+and @code{asorti()} accept a third string argument to control comparison
+of array elements. When we introduced @code{asort()} and @code{asorti()}
+in @ref{String Functions}, we ignored this third argument; however,
+now is the time to describe how this argument affects these two functions.
-@example
-$ @kbd{echo 0123 123 0x123 | gawk --non-decimal-data '}
-> @kbd{@{ print $1, $2, $3}
-> @kbd{print $1 + 0, $2 + 0, $3 + 0 @}'}
-@print{} 0123 123 0x123
-@print{} 83 123 291
-@end example
+Basically, the third argument specifies how the array is to be sorted.
+There are two possibilities. As with @code{PROCINFO["sorted_in"]},
+this argument may be one of the predefined names that @command{gawk}
+provides (@pxref{Controlling Scanning}), or it may be the name of a
+user-defined function (@pxref{Controlling Array Traversal}).
-Because it is common to have decimal data with leading zeros, and because
-using this facility could lead to surprising results, the default is to leave it
-disabled. If you want it, you must explicitly request it.
+In the latter case, @emph{the function can compare elements in any way
+it chooses}, taking into account just the indices, just the values,
+or both. This is extremely powerful.
-@cindex programming conventions, @code{--non-decimal-data} option
-@cindex @option{--non-decimal-data} option, @code{strtonum()} function and
-@cindex @code{strtonum()} function (@command{gawk}), @code{--non-decimal-data} option and
-@quotation CAUTION
-@emph{Use of this option is not recommended.}
-It can break old programs very badly.
-Instead, use the @code{strtonum()} function to convert your data
-(@pxref{String Functions}).
-This makes your programs easier to write and easier to read, and
-leads to less surprising results.
+Once the array is sorted, @code{asort()} takes the @emph{values} in
+their final order and uses them to fill in the result array, whereas
+@code{asorti()} takes the @emph{indices} in their final order and uses
+them to fill in the result array.
-This option may disappear in a future version of @command{gawk}.
+@cindex reference counting, sorting arrays
+@quotation NOTE
+Copying array indices and elements isn't expensive in terms of memory.
+Internally, @command{gawk} maintains @dfn{reference counts} to data.
+For example, when @code{asort()} copies the first array to the second one,
+there is only one copy of the original array elements' data, even though
+both arrays use the values.
@end quotation
-@node Array Sorting
-@section Controlling Array Traversal and Array Sorting
-
-@command{gawk} lets you control the order in which a
-@samp{for (@var{indx} in @var{array})}
-loop traverses an array.
+@c Document It And Call It A Feature. Sigh.
+@cindex @command{gawk}, @code{IGNORECASE} variable in
+@cindex arrays, sorting, and @code{IGNORECASE} variable
+@cindex @code{IGNORECASE} variable, and array sorting functions
+Because @code{IGNORECASE} affects string comparisons, the value
+of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}.
+Note also that the locale's sorting order does @emph{not}
+come into play; comparisons are based on character values only.@footnote{This
+is true because locale-based comparison occurs only when in
+POSIX-compatibility mode, and because @code{asort()} and @code{asorti()} are
+@command{gawk} extensions, they are not available in that case.}
-In addition, two built-in functions, @code{asort()} and @code{asorti()},
-let you sort arrays based on the array values and indices, respectively.
-These two functions also provide control over the sorting criteria used
-to order the elements during sorting.
-
-@menu
-* Controlling Array Traversal:: How to use PROCINFO["sorted_in"].
-* Array Sorting Functions:: How to use @code{asort()} and @code{asorti()}.
-@end menu
-
-@node Controlling Array Traversal
-@subsection Controlling Array Traversal
-
-By default, the order in which a @samp{for (@var{indx} in @var{array})} loop
-scans an array is not defined; it is generally based upon
-the internal implementation of arrays inside @command{awk}.
-
-Often, though, it is desirable to be able to loop over the elements
-in a particular order that you, the programmer, choose. @command{gawk}
-lets you do this.
-
-@ref{Controlling Scanning} describes how you can assign special,
-predefined values to @code{PROCINFO["sorted_in"]} in order to
-control the order in which @command{gawk} traverses an array
-during a @code{for} loop.
-
-In addition, the value of @code{PROCINFO["sorted_in"]} can be a
-function name.@footnote{This is why the predefined sorting orders
-start with an @samp{@@} character, which cannot be part of an identifier.}
-This lets you traverse an array based on any custom criterion.
-The array elements are ordered according to the return value of this
-function. The comparison function should be defined with at least
-four arguments:
+The following example demonstrates the use of a comparison function with
+@code{asort()}. The comparison function, @code{case_fold_compare()}, maps
+both values to lowercase in order to compare them ignoring case.
@example
-function comp_func(i1, v1, i2, v2)
-@{
- @var{compare elements 1 and 2 in some fashion}
- @var{return < 0; 0; or > 0}
-@}
-@end example
-
-Here, @code{i1} and @code{i2} are the indices, and @code{v1} and @code{v2}
-are the corresponding values of the two elements being compared.
-Either @code{v1} or @code{v2}, or both, can be arrays if the array being
-traversed contains subarrays as values.
-(@xref{Arrays of Arrays} for more information about subarrays.)
-The three possible return values are interpreted as follows:
-
-@table @code
-@item comp_func(i1, v1, i2, v2) < 0
-Index @code{i1} comes before index @code{i2} during loop traversal.
-
-@item comp_func(i1, v1, i2, v2) == 0
-Indices @code{i1} and @code{i2}
-come together, but the relative order with respect to each other is undefined.
-
-@item comp_func(i1, v1, i2, v2) > 0
-Index @code{i1} comes after index @code{i2} during loop traversal.
-@end table
-
-Our first comparison function can be used to scan an array in
-numerical order of the indices:
+# case_fold_compare --- compare as strings, ignoring case
-@example
-function cmp_num_idx(i1, v1, i2, v2)
+function case_fold_compare(i1, v1, i2, v2, l, r)
@{
- # numerical index comparison, ascending order
- return (i1 - i2)
-@}
-@end example
-
-Our second function traverses an array based on the string order of
-the element values rather than by indices:
+ l = tolower(v1)
+ r = tolower(v2)
-@example
-function cmp_str_val(i1, v1, i2, v2)
-@{
- # string value comparison, ascending order
- v1 = v1 ""
- v2 = v2 ""
- if (v1 < v2)
+ if (l < r)
return -1
- return (v1 != v2)
+ else if (l == r)
+ return 0
+ else
+ return 1
@}
@end example
-The third
-comparison function makes all numbers, and numeric strings without
-any leading or trailing spaces, come out first during loop traversal:
+And here is the test program for it:
@example
-function cmp_num_str_val(i1, v1, i2, v2, n1, n2)
-@{
- # numbers before string value comparison, ascending order
- n1 = v1 + 0
- n2 = v2 + 0
- if (n1 == v1)
- return (n2 == v2) ? (n1 - n2) : -1
- else if (n2 == v2)
- return 1
- return (v1 < v2) ? -1 : (v1 != v2)
-@}
-@end example
-
-Here is a main program to demonstrate how @command{gawk}
-behaves using each of the previous functions:
+# Test program
-@example
BEGIN @{
- data["one"] = 10
- data["two"] = 20
- data[10] = "one"
- data[100] = 100
- data[20] = "two"
+ Letters = "abcdefghijklmnopqrstuvwxyz" \
+ "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+ split(Letters, data, "")
- f[1] = "cmp_num_idx"
- f[2] = "cmp_str_val"
- f[3] = "cmp_num_str_val"
- for (i = 1; i <= 3; i++) @{
- printf("Sort function: %s\n", f[i])
- PROCINFO["sorted_in"] = f[i]
- for (j in data)
- printf("\tdata[%s] = %s\n", j, data[j])
- print ""
+ asort(data, result, "case_fold_compare")
+
+ j = length(result)
+ for (i = 1; i <= j; i++) @{
+ printf("%s", result[i])
+ if (i % (j/2) == 0)
+ printf("\n")
+ else
+ printf(" ")
@}
@}
@end example
-Here are the results when the program is run:
+When run, we get the following:
@example
-$ @kbd{gawk -f compdemo.awk}
-@print{} Sort function: cmp_num_idx @ii{Sort by numeric index}
-@print{} data[two] = 20
-@print{} data[one] = 10 @ii{Both strings are numerically zero}
-@print{} data[10] = one
-@print{} data[20] = two
-@print{} data[100] = 100
-@print{}
-@print{} Sort function: cmp_str_val @ii{Sort by element values as strings}
-@print{} data[one] = 10
-@print{} data[100] = 100 @ii{String 100 is less than string 20}
-@print{} data[two] = 20
-@print{} data[10] = one
-@print{} data[20] = two
-@print{}
-@print{} Sort function: cmp_num_str_val @ii{Sort all numeric values before all strings}
-@print{} data[one] = 10
-@print{} data[two] = 20
-@print{} data[100] = 100
-@print{} data[10] = one
-@print{} data[20] = two
+$ @kbd{gawk -f case_fold_compare.awk}
+@print{} A a B b c C D d e E F f g G H h i I J j k K l L M m
+@print{} n N O o p P Q q r R S s t T u U V v w W X x y Y z Z
@end example
-Consider sorting the entries of a GNU/Linux system password file
-according to login name. The following program sorts records
-by a specific field position and can be used for this purpose:
-
-@example
-# passwd-sort.awk --- simple program to sort by field position
-# field position is specified by the global variable POS
+@node Two-way I/O
+@section Two-Way Communications with Another Process
-function cmp_field(i1, v1, i2, v2)
-@{
- # comparison by value, as string, and ascending order
- return v1[POS] < v2[POS] ? -1 : (v1[POS] != v2[POS])
-@}
+@c 8/2014. Neither Mike nor BWK saw this as relevant. Commenting it out.
+@ignore
+@cindex Brennan, Michael
+@cindex programmers, attractiveness of
+@smallexample
+@c Path: cssun.mathcs.emory.edu!gatech!newsxfer3.itd.umich.edu!news-peer.sprintlink.net!news-sea-19.sprintlink.net!news-in-west.sprintlink.net!news.sprintlink.net!Sprint!204.94.52.5!news.whidbey.com!brennan
+From: brennan@@whidbey.com (Mike Brennan)
+Newsgroups: comp.lang.awk
+Subject: Re: Learn the SECRET to Attract Women Easily
+Date: 4 Aug 1997 17:34:46 GMT
address@hidden Organization: WhidbeyNet
address@hidden Lines: 12
+Message-ID: <5s53rm$eca@@news.whidbey.com>
address@hidden References: <address@hidden>
address@hidden Reply-To: address@hidden
address@hidden NNTP-Posting-Host: asn202.whidbey.com
address@hidden X-Newsreader: slrn (0.9.4.1 UNIX)
address@hidden Xref: cssun.mathcs.emory.edu comp.lang.awk:5403
address@hidden
- for (i = 1; i <= NF; i++)
- a[NR][i] = $i
address@hidden
+On 3 Aug 1997 13:17:43 GMT, Want More Dates???
+<tracy78@@kilgrona.com> wrote:
+>Learn the SECRET to Attract Women Easily
+>
+>The SCENT(tm) Pheromone Sex Attractant For Men to Attract Women
-END @{
- PROCINFO["sorted_in"] = "cmp_field"
- if (POS < 1 || POS > NF)
- POS = 1
- for (i in a) @{
- for (j = 1; j <= NF; j++)
- printf("%s%c", a[i][j], j < NF ? ":" : "")
- print ""
- @}
address@hidden
address@hidden example
+The scent of awk programmers is a lot more attractive to women than
+the scent of perl programmers.
+--
+Mike Brennan
address@hidden brennan@@whidbey.com
address@hidden smallexample
address@hidden ignore
-The first field in each entry of the password file is the user's login name,
-and the fields are separated by colons.
-Each record defines a subarray,
-with each field as an element in the subarray.
-Running the program produces the
-following output:
address@hidden advanced features, address@hidden communicating with
address@hidden processes, two-way communications with
+It is often useful to be able to
+send data to a separate program for
+processing and then read the result. This can always be
+done with temporary files:
@example
-$ @kbd{gawk -v POS=1 -F: -f sort.awk /etc/passwd}
address@hidden adm:x:3:4:adm:/var/adm:/sbin/nologin
address@hidden apache:x:48:48:Apache:/var/www:/sbin/nologin
address@hidden avahi:x:70:70:Avahi daemon:/:/sbin/nologin
address@hidden
address@hidden example
+# Write the data for processing
+tempfile = ("mydata." PROCINFO["pid"])
+while (@var{not done with data})
+ print @var{data} | ("subprogram > " tempfile)
+close("subprogram > " tempfile)
-The comparison should normally always return the same value when given a
-specific pair of array elements as its arguments. If inconsistent
-results are returned, then the order is undefined. This behavior can be
-exploited to introduce random order into otherwise seemingly
-ordered data:
+# Read the results, remove tempfile when done
+while ((getline newdata < tempfile) > 0)
+ @var{process} newdata @var{appropriately}
+close(tempfile)
+system("rm " tempfile)
address@hidden example
address@hidden
-function cmp_randomize(i1, v1, i2, v2)
address@hidden
- # random order (caution: this may never terminate!)
- return (2 - 4 * rand())
address@hidden
address@hidden example
address@hidden
+This works, but not elegantly. Among other things, it requires that
+the program be run in a directory that cannot be shared among users;
+for example, @file{/tmp} will not do, as another user might happen
+to be using a temporary file with the same name.@footnote{Michael
+Brennan suggests the use of @command{rand()} to generate unique
+@value{FN}s. This is a valid point; nevertheless, temporary files
+remain more difficult to use than two-way pipes.} @c 8/2014
-As already mentioned, the order of the indices is arbitrary if two
-elements compare equal. This is usually not a problem, but letting
-the tied elements come out in arbitrary order can be an issue, especially
-when comparing item values. The partial ordering of the equal elements
-may change the next time the array is traversed, if other elements are added to or
-removed from the array. One way to resolve ties when comparing elements
-with otherwise equal values is to include the indices in the comparison
-rules. Note that doing this may make the loop traversal less efficient,
-so consider it only if necessary. The following comparison functions
-force a deterministic order, and are based on the fact that the
-(string) indices of two elements are never equal:
address@hidden coprocesses
address@hidden input/output, two-way
address@hidden @code{|} (vertical bar), @code{|&} operator (I/O)
address@hidden vertical bar (@code{|}), @code{|&} operator (I/O)
address@hidden @command{csh} utility, @code{|&} operator, comparison with
+However, with @command{gawk}, it is possible to
+open a @emph{two-way} pipe to another process. The second process is
+termed a @dfn{coprocess}, as it runs in parallel with @command{gawk}.
+The two-way connection is created using the @samp{|&} operator
+(borrowed from the Korn shell, @command{ksh}):@footnote{This is very
+different from the same operator in the C shell and in Bash.}
@example
-function cmp_numeric(i1, v1, i2, v2)
address@hidden
- # numerical value (and index) comparison, descending order
- return (v1 != v2) ? (v2 - v1) : (i2 - i1)
address@hidden
-
-function cmp_string(i1, v1, i2, v2)
address@hidden
- # string value (and index) comparison, descending order
- v1 = v1 i1
- v2 = v2 i2
- return (v1 > v2) ? -1 : (v1 != v2)
address@hidden
+do @{
+ print @var{data} |& "subprogram"
+ "subprogram" |& getline results
address@hidden while (@var{data left to process})
+close("subprogram")
@end example
address@hidden Avoid using the term ``stable'' when describing the unpredictable behavior
address@hidden if two items compare equal. Usually, the goal of a "stable algorithm"
address@hidden is to maintain the original order of the items, which is a meaningless
address@hidden concept for a list constructed from a hash.
+The first time an I/O operation is executed using the @samp{|&}
+operator, @command{gawk} creates a two-way pipeline to a child process
+that runs the other program. Output created with @code{print}
+or @code{printf} is written to the program's standard input, and
+output from the program's standard output can be read by the @command{gawk}
+program using @code{getline}.
+As is the case with processes started by @samp{|}, the subprogram
+can be any program, or pipeline of programs, that can be started by
+the shell.
-A custom comparison function can often simplify ordered loop
-traversal, and the sky is really the limit when it comes to
-designing such a function.
+There are some cautionary items to be aware of:
-When string comparisons are made during a sort, either for element
-values where one or both aren't numbers, or for element indices
-handled as strings, the value of @code{IGNORECASE}
-(@pxref{Built-in Variables}) controls whether
-the comparisons treat corresponding upper- and lowercase letters as
-equivalent or distinct.
address@hidden @value{BULLET}
address@hidden
+As the code inside @command{gawk} currently stands, the coprocess's
+standard error goes to the same place that the parent @command{gawk}'s
+standard error goes. It is not possible to read the child's
+standard error separately.
-Another point to keep in mind is that in the case of subarrays,
-the element values can themselves be arrays; a production comparison
-function should use the @code{isarray()} function
-(@pxref{Type Functions})
-to check for this, and choose a defined sorting order for subarrays.
address@hidden deadlocks
address@hidden buffering, input/output
address@hidden @code{getline} command, deadlock and
address@hidden
+I/O buffering may be a problem. @command{gawk} automatically
+flushes all output down the pipe to the coprocess.
+However, if the coprocess does not flush its output,
+@command{gawk} may hang when doing a @code{getline} in order to read
+the coprocess's results. This could lead to a situation
+known as @dfn{deadlock}, where each process is waiting for the
+other one to do something.
address@hidden itemize
-All sorting based on @code{PROCINFO["sorted_in"]}
-is disabled in POSIX mode,
-because the @code{PROCINFO} array is not special in that case.
address@hidden @code{close()} function, two-way pipes and
+It is possible to close just one end of the two-way pipe to
+a coprocess, by supplying a second argument to the @code{close()}
+function of either @code{"to"} or @code{"from"}
+(@pxref{Close Files And Pipes}).
+These strings tell @command{gawk} to close the end of the pipe
+that sends data to the coprocess or the end that reads from it,
+respectively.
-As a side note, sorting the array indices before traversing
-the array has been reported to add a 15% to 20% overhead to the
-execution time of @command{awk} programs. For this reason,
-sorted array traversal is not the default.
address@hidden @command{sort} utility, coprocesses and
+This is particularly necessary in order to use
+the system @command{sort} utility as part of a coprocess;
+@command{sort} must read @emph{all} of its input
+data before it can produce any output.
+The @command{sort} program does not receive an end-of-file indication
+until @command{gawk} closes the write end of the pipe.
address@hidden The @command{gawk}
address@hidden maintainers believe that only the people who wish to use a
address@hidden feature should have to pay for it.
+When you have finished writing data to the @command{sort}
+utility, you can close the @code{"to"} end of the pipe, and
+then start reading sorted data via @code{getline}.
+For example:
address@hidden Array Sorting Functions
address@hidden Sorting Array Values and Indices with @command{gawk}
address@hidden
+BEGIN @{
+ command = "LC_ALL=C sort"
+ n = split("abcdefghijklmnopqrstuvwxyz", a, "")
address@hidden arrays, sorting
address@hidden
address@hidden @code{asort()} function (@command{gawk}), address@hidden sorting
address@hidden
address@hidden @code{asorti()} function (@command{gawk}), address@hidden sorting
address@hidden sort function, arrays, sorting
-In most @command{awk} implementations, sorting an array requires writing
-a @code{sort()} function. This can be educational for exploring
-different sorting algorithms, but usually that's not the point of the program.
address@hidden provides the built-in @code{asort()} and @code{asorti()}
-functions (@pxref{String Functions}) for sorting arrays. For example:
+ for (i = n; i > 0; i--)
+ print a[i] |& command
+ close(command, "to")
address@hidden
address@hidden the array} data
-n = asort(data)
-for (i = 1; i <= n; i++)
- @var{do something with} data[i]
+ while ((command |& getline line) > 0)
+ print "got", line
+ close(command)
address@hidden
@end example
-After the call to @code{asort()}, the array @code{data} is indexed from 1
-to some number @var{n}, the total number of elements in @code{data}.
-(This count is @code{asort()}'s return value.)
address@hidden @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on.
-The default comparison is based on the type of the elements
-(@pxref{Typing and Comparison}).
-All numeric values come before all string values,
-which in turn come before all subarrays.
+This program writes the letters of the alphabet in reverse order, one
+per line, down the two-way pipe to @command{sort}. It then closes the
+write end of the pipe, so that @command{sort} receives an end-of-file
+indication. This causes @command{sort} to sort the data and write the
+sorted data back to the @command{gawk} program. Once all of the data
+has been read, @command{gawk} terminates the coprocess and exits.
address@hidden side effects, @code{asort()} function
-An important side effect of calling @code{asort()} is that
address@hidden array's original indices are irrevocably lost}.
-As this isn't always desirable, @code{asort()} accepts a
-second argument:
+As a side note, the assignment @samp{LC_ALL=C} in the @command{sort}
+command ensures traditional Unix (ASCII) sorting from @command{sort}.
+This is not strictly necessary here, but it's good to know how to do this.
address@hidden
address@hidden the array} source
-n = asort(source, dest)
-for (i = 1; i <= n; i++)
- @var{do something with} dest[i]
address@hidden example
+Be careful when closing the @code{"from"} end of a two-way pipe; in this
+case @command{gawk} waits for the child process to exit, which may cause
+your program to hang. (Thus, this particular feature is of much less
+use in practice than being able to close the @code{"to"} end.)
-In this case, @command{gawk} copies the @code{source} array into the
address@hidden array and then sorts @code{dest}, destroying its indices.
-However, the @code{source} array is not affected.
address@hidden CAUTION
+Normally,
+it is a fatal error to write to the @code{"to"} end of a two-way
+pipe which has been closed, and it is also a fatal error to read
+from the @code{"from"} end of a two-way pipe that has been closed.
-Often, what's needed is to sort on the values of the @emph{indices}
-instead of the values of the elements. To do that, use the
address@hidden()} function. The interface and behavior are identical to
-that of @code{asort()}, except that the index values are used for sorting
-and become the values of the result array:
+You may set @code{PROCINFO["@var{command}", "NONFATAL"]} to
+make such operations become nonfatal. If you do so, you then need
+to check @code{ERRNO} after each @code{print}, @code{printf},
+or @code{getline}.
address@hidden, for more information.
address@hidden quotation
address@hidden
address@hidden source[$0] = some_func($0) @}
address@hidden @command{gawk}, @code{PROCINFO} array in
address@hidden @code{PROCINFO} array, and communications via ptys
+You may also use pseudo-ttys (ptys) for
+two-way communication instead of pipes, if your system supports them.
+This is done on a per-command basis, by setting a special element
+in the @code{PROCINFO} array
+(@pxref{Auto-set}),
+like so:
-END @{
- n = asorti(source, dest)
- for (i = 1; i <= n; i++) @{
- @ii{Work with sorted indices directly:}
- @var{do something with} dest[i]
- @dots{}
- @ii{Access original array via sorted indices:}
- @var{do something with} source[dest[i]]
- @}
address@hidden
address@hidden
+command = "sort -nr" # command, save in convenience variable
+PROCINFO[command, "pty"] = 1 # update PROCINFO
+print @dots{} |& command # start two-way pipe
address@hidden
@end example
-So far, so good. Now it starts to get interesting. Both @code{asort()}
-and @code{asorti()} accept a third string argument to control comparison
-of array elements. When we introduced @code{asort()} and @code{asorti()}
-in @ref{String Functions}, we ignored this third argument; however,
-now is the time to describe how this argument affects these two functions.
-
-Basically, the third argument specifies how the array is to be sorted.
-There are two possibilities. As with @code{PROCINFO["sorted_in"]},
-this argument may be one of the predefined names that @command{gawk}
-provides (@pxref{Controlling Scanning}), or it may be the name of a
-user-defined function (@pxref{Controlling Array Traversal}).
address@hidden
+If your system does not have ptys, or if all the system's ptys are in use,
+@command{gawk} automatically falls back to using regular pipes.
-In the latter case, @emph{the function can compare elements in any way
-it chooses}, taking into account just the indices, just the values,
-or both. This is extremely powerful.
+Using ptys usually avoids the buffer deadlock issues described earlier,
+at some loss in performance. This is because the tty driver buffers
+and sends data line-by-line. On systems with the @command{stdbuf} utility
+(part of the @uref{http://www.gnu.org/software/coreutils/coreutils.html,
+GNU Coreutils package}), you can use that program instead of ptys.
-Once the array is sorted, @code{asort()} takes the @emph{values} in
-their final order and uses them to fill in the result array, whereas
address@hidden()} takes the @emph{indices} in their final order and uses
-them to fill in the result array.
+Note also that ptys are not fully transparent. Certain binary control
+codes, such as @kbd{Ctrl-d} for end-of-file, are interpreted by the tty
+driver and not passed through.
address@hidden reference counting, sorting arrays
address@hidden NOTE
-Copying array indices and elements isn't expensive in terms of memory.
-Internally, @command{gawk} maintains @dfn{reference counts} to data.
-For example, when @code{asort()} copies the first array to the second one,
-there is only one copy of the original array elements' data, even though
-both arrays use the values.
address@hidden CAUTION
+Finally, coprocesses open up the possibility of @dfn{deadlock} between
+@command{gawk} and the program running in the coprocess. This can occur
+if you send ``too much'' data to the coprocess before reading any back;
+each process is blocked writing data with no one available to read what
+they've already written. There is no workaround for deadlock; careful
+programming and knowledge of the behavior of the coprocess are required.
@end quotation
address@hidden Document It And Call It A Feature. Sigh.
address@hidden @command{gawk}, @code{IGNORECASE} variable in
address@hidden arrays, sorting, and @code{IGNORECASE} variable
address@hidden @code{IGNORECASE} variable, and array sorting functions
-Because @code{IGNORECASE} affects string comparisons, the value
-of @code{IGNORECASE} also affects sorting for both @code{asort()} and @code{asorti()}.
-Note also that the locale's sorting order does @emph{not}
-come into play; comparisons are based on character values address@hidden
-is true because locale-based comparison occurs only when in
-POSIX-compatibility mode, and because @code{asort()} and @code{asorti()} are
address@hidden extensions, they are not available in that case.}
address@hidden TCP/IP Networking
address@hidden Using @command{gawk} for Network Programming
address@hidden advanced features, network programming
address@hidden networks, programming
address@hidden TCP/IP
address@hidden @code{/inet/@dots{}} special files (@command{gawk})
address@hidden files, @code{/inet/@dots{}} (@command{gawk})
address@hidden @code{/inet4/@dots{}} special files (@command{gawk})
address@hidden files, @code{/inet4/@dots{}} (@command{gawk})
address@hidden @code{/inet6/@dots{}} special files (@command{gawk})
address@hidden files, @code{/inet6/@dots{}} (@command{gawk})
address@hidden @code{EMRED}
address@hidden
address@hidden
+@code{EMRED}:@*
+@ @ @ @ @i{A host is a host from coast to coast,@*
+@ @ @ @ and nobody talks to a host that's close,@*
+@ @ @ @ unless the host that isn't close@*
+@ @ @ @ is busy, hung, or dead.}
address@hidden Mike O'Brien (aka Mr.@: Protocol)
address@hidden quotation
address@hidden ifnotdocbook
-The following example demonstrates the use of a comparison function with
address@hidden()}. The comparison function, @code{case_fold_compare()}, maps
-both values to lowercase in order to compare them ignoring case.
address@hidden
+<blockquote>
+<attribution>Mike O'Brien (aka Mr. Protocol)</attribution>
+<literallayout class="normal"><literal>EMRED</literal>:
+ <emphasis>A host is a host from coast to coast,</emphasis>
+ <emphasis>and no-one can talk to host that's close,</emphasis>
+ <emphasis>unless the host that isn't close</emphasis>
+ <emphasis>is busy, hung, or dead.</emphasis></literallayout>
+</blockquote>
address@hidden docbook
address@hidden
-# case_fold_compare --- compare as strings, ignoring case
+In addition to being able to open a two-way pipeline to a coprocess
+on the same system
+(@pxref{Two-way I/O}),
+it is possible to make a two-way connection to
+another process on another system across an IP network connection.
-function case_fold_compare(i1, v1, i2, v2, l, r)
address@hidden
- l = tolower(v1)
- r = tolower(v2)
+You can think of this as just a @emph{very long} two-way pipeline to
+a coprocess.
+The way @command{gawk} decides that you want to use TCP/IP networking is
+by recognizing special @value{FN}s that begin with one of @samp{/inet/},
+@samp{/inet4/}, or @samp{/inet6/}.
- if (l < r)
- return -1
- else if (l == r)
- return 0
- else
- return 1
address@hidden
address@hidden example
+The full syntax of the special @value{FN} is
+@file{/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}.
+The components are:
-And here is the test program for it:
address@hidden @var
address@hidden net-type
+Specifies the kind of Internet connection to make.
+Use @samp{/inet4/} to force IPv4, and
+@samp{/inet6/} to force IPv6.
+Plain @samp{/inet/} (which used to be the only option) uses
+the system default, most likely IPv4.
address@hidden
-# Test program
address@hidden protocol
+The protocol to use over IP. This must be either @samp{tcp}, or
+@samp{udp}, for a TCP or UDP IP connection,
+respectively. TCP should be used for most applications.
-BEGIN @{
- Letters = "abcdefghijklmnopqrstuvwxyz" \
- "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
- split(Letters, data, "")
address@hidden local-port
address@hidden @code{getaddrinfo()} function (C library)
+The local TCP or UDP port number to use. Use a port number of @samp{0}
+when you want the system to pick a port. This is what you should do
+when writing a TCP or UDP client.
+You may also use a well-known service name, such as @samp{smtp}
+or @samp{http}, in which case @command{gawk} attempts to determine
+the predefined port number using the C @code{getaddrinfo()} function.
- asort(data, result, "case_fold_compare")
address@hidden remote-host
+The IP address or fully qualified domain name of the Internet
+host to which you want to connect.
- j = length(result)
- for (i = 1; i <= j; i++) @{
- printf("%s", result[i])
- if (i % (j/2) == 0)
- printf("\n")
- else
- printf(" ")
- @}
address@hidden
address@hidden example
address@hidden remote-port
+The TCP or UDP port number to use on the given @var{remote-host}.
+Again, use @samp{0} if you don't care, or else a well-known
+service name.
address@hidden table
-When run, we get the following:
address@hidden @command{gawk}, @code{ERRNO} variable in
address@hidden @code{ERRNO} variable
address@hidden NOTE
+Failure in opening a two-way socket will result in a nonfatal error
+being returned to the calling code. The value of @code{ERRNO} indicates
+the error (@pxref{Auto-set}).
address@hidden quotation
+
+Consider the following very simple example:
@example
-$ @kbd{gawk -f case_fold_compare.awk}
address@hidden A a B b c C D d e E F f g G H h i I J j k K l L M m
address@hidden n N O o p P Q q r R S s t T u U V v w W X x y Y z Z
+BEGIN @{
+ Service = "/inet/tcp/0/localhost/daytime"
+ Service |& getline
+ print $0
+ close(Service)
address@hidden
@end example
address@hidden Two-way I/O
address@hidden Two-Way Communications with Another Process
+This program reads the current date and time from the local system's
+TCP @code{daytime} server.
+It then prints the results and closes the connection.
address@hidden 8/2014. Neither Mike nor BWK saw this as relevant. Commenting it out.
address@hidden
address@hidden Brennan, Michael
address@hidden programmers, attractiveness of
address@hidden
address@hidden Path: cssun.mathcs.emory.edu!gatech!newsxfer3.itd.umich.edu!news-peer.sprintlink.net!news-sea-19.sprintlink.net!news-in-west.sprintlink.net!news.sprintlink.net!Sprint!204.94.52.5!news.whidbey.com!brennan
-From: brennan@@whidbey.com (Mike Brennan)
-Newsgroups: comp.lang.awk
-Subject: Re: Learn the SECRET to Attract Women Easily
-Date: 4 Aug 1997 17:34:46 GMT
address@hidden Organization: WhidbeyNet
address@hidden Lines: 12
-Message-ID: <5s53rm$eca@@news.whidbey.com>
address@hidden References: <address@hidden>
address@hidden Reply-To: address@hidden
address@hidden NNTP-Posting-Host: asn202.whidbey.com
address@hidden X-Newsreader: slrn (0.9.4.1 UNIX)
address@hidden Xref: cssun.mathcs.emory.edu comp.lang.awk:5403
+Because this topic is extensive, the use of @command{gawk} for
+TCP/IP programming is documented separately.
address@hidden
+See
address@hidden, , General Introduction, gawkinet, @value{GAWKINETTITLE}},
address@hidden ifinfo
address@hidden
+See
address@hidden://www.gnu.org/software/gawk/manual/gawkinet/,
address@hidden@value{GAWKINETTITLE}}},
+which comes as part of the @command{gawk} distribution,
address@hidden ifnotinfo
+for a much more complete introduction and discussion, as well as
+extensive examples.
-On 3 Aug 1997 13:17:43 GMT, Want More Dates???
-<tracy78@@kilgrona.com> wrote:
->Learn the SECRET to Attract Women Easily
->
->The SCENT(tm) Pheromone Sex Attractant For Men to Attract Women
address@hidden NOTE
+@command{gawk} can only open direct sockets. There is currently
+no way to access services available over Secure Socket Layer
+(SSL); this includes any web service whose URL starts with @samp{https://}.
address@hidden quotation
-The scent of awk programmers is a lot more attractive to women than
-the scent of perl programmers.
---
-Mike Brennan
address@hidden brennan@@whidbey.com
address@hidden smallexample
address@hidden ignore
address@hidden advanced features, address@hidden communicating with
address@hidden processes, two-way communications with
-It is often useful to be able to
-send data to a separate program for
-processing and then read the result. This can always be
-done with temporary files:
address@hidden Profiling
address@hidden Profiling Your @command{awk} Programs
address@hidden @command{awk} programs, profiling
address@hidden profiling @command{awk} programs
address@hidden @code{awkprof.out} file
address@hidden files, @code{awkprof.out}
address@hidden
-# Write the data for processing
-tempfile = ("mydata." PROCINFO["pid"])
-while (@var{not done with data})
- print @var{data} | ("subprogram > " tempfile)
-close("subprogram > " tempfile)
+You may produce execution traces of your @command{awk} programs.
+This is done by passing the option @option{--profile} to @command{gawk}.
+When @command{gawk} has finished running, it creates a profile of your program in a file
+named @file{awkprof.out}. Because it is profiling, it also executes up to 45% slower than
+@command{gawk} normally does.
-# Read the results, remove tempfile when done
-while ((getline newdata < tempfile) > 0)
- @var{process} newdata @var{appropriately}
-close(tempfile)
-system("rm " tempfile)
address@hidden @option{--profile} option
+As shown in the following example,
+the @option{--profile} option can be used to change the name of the file
+where @command{gawk} will write the profile:
+
address@hidden
+gawk --profile=myprog.prof -f myprog.awk data1 data2
@end example
@noindent
-This works, but not elegantly. Among other things, it requires that
-the program be run in a directory that cannot be shared among users;
-for example, @file{/tmp} will not do, as another user might happen
-to be using a temporary file with the same name.@footnote{Michael
-Brennan suggests the use of @command{rand()} to generate unique
-@value{FN}s. This is a valid point; nevertheless, temporary files
-remain more difficult to use than two-way pipes.} @c 8/2014
+In the preceding example, @command{gawk} places the profile in
+@file{myprog.prof} instead of in @file{awkprof.out}.
address@hidden coprocesses
address@hidden input/output, two-way
address@hidden @code{|} (vertical bar), @code{|&} operator (I/O)
address@hidden vertical bar (@code{|}), @code{|&} operator (I/O)
address@hidden @command{csh} utility, @code{|&} operator, comparison with
-However, with @command{gawk}, it is possible to
-open a @emph{two-way} pipe to another process. The second process is
-termed a @dfn{coprocess}, as it runs in parallel with @command{gawk}.
-The two-way connection is created using the @samp{|&} operator
-(borrowed from the Korn shell, @command{ksh}):@footnote{This is very
-different from the same operator in the C shell and in Bash.}
+Here is a sample session showing a simple @command{awk} program,
+its input data, and the results from running @command{gawk} with the
+@option{--profile} option. First, the @command{awk} program:
@example
-do @{
- print @var{data} |& "subprogram"
- "subprogram" |& getline results
address@hidden while (@var{data left to process})
-close("subprogram")
address@hidden example
+BEGIN @{ print "First BEGIN rule" @}
-The first time an I/O operation is executed using the @samp{|&}
-operator, @command{gawk} creates a two-way pipeline to a child process
-that runs the other program. Output created with @code{print}
-or @code{printf} is written to the program's standard input, and
-output from the program's standard output can be read by the @command{gawk}
-program using @code{getline}.
-As is the case with processes started by @samp{|}, the subprogram
-can be any program, or pipeline of programs, that can be started by
-the shell.
+END @{ print "First END rule" @}
-There are some cautionary items to be aware of:
+/foo/ @{
+ print "matched /foo/, gosh"
+ for (i = 1; i <= 3; i++)
+ sing()
address@hidden
address@hidden @value{BULLET}
address@hidden
-As the code inside @command{gawk} currently stands, the coprocess's
-standard error goes to the same place that the parent @command{gawk}'s
-standard error goes. It is not possible to read the child's
-standard error separately.
address@hidden
+ if (/foo/)
+ print "if is true"
+ else
+ print "else is true"
address@hidden
address@hidden deadlocks
address@hidden buffering, input/output
address@hidden @code{getline} command, deadlock and
address@hidden
-I/O buffering may be a problem. @command{gawk} automatically
-flushes all output down the pipe to the coprocess.
-However, if the coprocess does not flush its output,
address@hidden may hang when doing a @code{getline} in order to read
-the coprocess's results. This could lead to a situation
-known as @dfn{deadlock}, where each process is waiting for the
-other one to do something.
address@hidden itemize
+BEGIN @{ print "Second BEGIN rule" @}
address@hidden @code{close()} function, two-way pipes and
-It is possible to close just one end of the two-way pipe to
-a coprocess, by supplying a second argument to the @code{close()}
-function of either @code{"to"} or @code{"from"}
-(@pxref{Close Files And Pipes}).
-These strings tell @command{gawk} to close the end of the pipe
-that sends data to the coprocess or the end that reads from it,
-respectively.
+END @{ print "Second END rule" @}
address@hidden @command{sort} utility, coprocesses and
-This is particularly necessary in order to use
-the system @command{sort} utility as part of a coprocess;
address@hidden must read @emph{all} of its input
-data before it can produce any output.
-The @command{sort} program does not receive an end-of-file indication
-until @command{gawk} closes the write end of the pipe.
+function sing( dummy)
address@hidden
+ print "I gotta be me!"
address@hidden
address@hidden example
-When you have finished writing data to the @command{sort}
-utility, you can close the @code{"to"} end of the pipe, and
-then start reading sorted data via @code{getline}.
-For example:
+Following is the input data:
@example
-BEGIN @{
- command = "LC_ALL=C sort"
- n = split("abcdefghijklmnopqrstuvwxyz", a, "")
-
- for (i = n; i > 0; i--)
- print a[i] |& command
- close(command, "to")
-
- while ((command |& getline line) > 0)
- print "got", line
- close(command)
address@hidden
+foo
+bar
+baz
+foo
+junk
@end example
-This program writes the letters of the alphabet in reverse order, one
-per line, down the two-way pipe to @command{sort}. It then closes the
-write end of the pipe, so that @command{sort} receives an end-of-file
-indication. This causes @command{sort} to sort the data and write the
-sorted data back to the @command{gawk} program. Once all of the data
-has been read, @command{gawk} terminates the coprocess and exits.
-
-As a side note, the assignment @samp{LC_ALL=C} in the @command{sort}
-command ensures traditional Unix (ASCII) sorting from @command{sort}.
-This is not strictly necessary here, but it's good to know how to do this.
+Here is the @file{awkprof.out} that results from running the
+@command{gawk} profiler on this program and data (this example also
+illustrates that @command{awk} programmers sometimes get up very early
+in the morning to work):
-Be careful when closing the @code{"from"} end of a two-way pipe; in this
-case @command{gawk} waits for the child process to exit, which may cause
-your program to hang. (Thus, this particular feature is of much less
-use in practice than being able to close the @code{"to"} end.)
address@hidden @code{BEGIN} pattern, and profiling
address@hidden @code{END} pattern, and profiling
address@hidden
+ # gawk profile, created Mon Sep 29 05:16:21 2014
address@hidden CAUTION
-Normally,
-it is a fatal error to write to the @code{"to"} end of a two-way
-pipe which has been closed, and it is also a fatal error to read
-from the @code{"from"} end of a two-way pipe that has been closed.
+ # BEGIN rule(s)
-You may set @code{PROCINFO["@var{command}", "NONFATAL"]} to
-make such operations become nonfatal. If you do so, you then need
-to check @code{ERRNO} after each @code{print}, @code{printf},
-or @code{getline}.
address@hidden, for more information.
address@hidden quotation
+ BEGIN @{
+ 1 print "First BEGIN rule"
+ @}
address@hidden @command{gawk}, @code{PROCINFO} array in
address@hidden @code{PROCINFO} array, and communications via ptys
-You may also use pseudo-ttys (ptys) for
-two-way communication instead of pipes, if your system supports them.
-This is done on a per-command basis, by setting a special element
-in the @code{PROCINFO} array
-(@pxref{Auto-set}),
-like so:
+ BEGIN @{
+ 1 print "Second BEGIN rule"
+ @}
address@hidden
-command = "sort -nr" # command, save in convenience variable
-PROCINFO[command, "pty"] = 1 # update PROCINFO
-print @dots{} |& command # start two-way pipe
address@hidden
address@hidden example
+ # Rule(s)
address@hidden
-If your system does not have ptys, or if all the system's ptys are in use,
address@hidden automatically falls back to using regular pipes.
+ 5 /foo/ @{ # 2
+ 2 print "matched /foo/, gosh"
+ 6 for (i = 1; i <= 3; i++) @{
+ 6 sing()
+ @}
+ @}
-Using ptys usually avoids the buffer deadlock issues described earlier,
-at some loss in performance. This is because the tty driver buffers
-and sends data line-by-line. On systems with the @command{stdbuf} utility
-(part of the @uref{http://www.gnu.org/software/coreutils/coreutils.html,
-GNU Coreutils package}), you can use that program instead of ptys.
+ 5 @{
+ 5 if (/foo/) @{ # 2
+ 2 print "if is true"
+ 3 @} else @{
+ 3 print "else is true"
+ @}
+ @}
-Note also that ptys are not fully transparent. Certain binary control
-codes, such as @kbd{Ctrl-d} for end-of-file, are interpreted by the tty
-driver and not passed through.
+ # END rule(s)
address@hidden CAUTION
-Finally, coprocesses open up the possibility of @dfn{deadlock} between
address@hidden and the program running in the coprocess. This can occur
-if you send ``too much'' data to the coprocess before reading any back;
-each process is blocked writing data with no one available to read what
-they've already written. There is no workaround for deadlock; careful
-programming and knowledge of the behavior of the coprocess are required.
address@hidden quotation
+ END @{
+ 1 print "First END rule"
+ @}
address@hidden TCP/IP Networking
address@hidden Using @command{gawk} for Network Programming
address@hidden advanced features, network programming
address@hidden networks, programming
address@hidden TCP/IP
address@hidden @code{/inet/@dots{}} special files (@command{gawk})
address@hidden files, @code{/inet/@dots{}} (@command{gawk})
address@hidden @code{/inet4/@dots{}} special files (@command{gawk})
address@hidden files, @code{/inet4/@dots{}} (@command{gawk})
address@hidden @code{/inet6/@dots{}} special files (@command{gawk})
address@hidden files, @code{/inet6/@dots{}} (@command{gawk})
address@hidden @code{EMRED}
address@hidden
address@hidden
address@hidden:@*
-@ @ @ @ @i{A host is a host from coast to coast,@*
-@ @ @ @ and nobody talks to a host that's close,@*
-@ @ @ @ unless the host that isn't address@hidden
-@ @ @ @ is busy, hung, or dead.}
address@hidden Mike O'Brien (aka Mr.@: Protocol)
address@hidden quotation
address@hidden ifnotdocbook
+ END @{
+ 1 print "Second END rule"
+ @}
address@hidden
-<blockquote>
-<attribution>Mike O'Brien (aka Mr. Protocol)</attribution>
-<literallayout class="normal"><literal>EMRED</literal>:
- <emphasis>A host is a host from coast to coast,</emphasis>
- <emphasis>and no-one can talk to host that's close,</emphasis>
- <emphasis>unless the host that isn't close</emphasis>
- <emphasis>is busy, hung, or dead.</emphasis></literallayout>
-</blockquote>
address@hidden docbook
-In addition to being able to open a two-way pipeline to a coprocess
-on the same system
-(@pxref{Two-way I/O}),
-it is possible to make a two-way connection to
-another process on another system across an IP network connection.
+ # Functions, listed alphabetically
-You can think of this as just a @emph{very long} two-way pipeline to
-a coprocess.
-The way @command{gawk} decides that you want to use TCP/IP networking is
-by recognizing special @value{FN}s that begin with one of @samp{/inet/},
address@hidden/inet4/}, or @samp{/inet6/}.
+ 6 function sing(dummy)
+ @{
+ 6 print "I gotta be me!"
+ @}
address@hidden example
-The full syntax of the special @value{FN} is
address@hidden/@var{net-type}/@var{protocol}/@var{local-port}/@var{remote-host}/@var{remote-port}}.
-The components are:
-
address@hidden @var
address@hidden net-type
-Specifies the kind of Internet connection to make.
-Use @samp{/inet4/} to force IPv4, and
address@hidden/inet6/} to force IPv6.
-Plain @samp{/inet/} (which used to be the only option) uses
-the system default, most likely IPv4.
-
address@hidden protocol
-The protocol to use over IP. This must be either @samp{tcp}, or
address@hidden, for a TCP or UDP IP connection,
-respectively. TCP should be used for most applications.
-
address@hidden local-port
address@hidden @code{getaddrinfo()} function (C library)
-The local TCP or UDP port number to use. Use a port number of @samp{0}
-when you want the system to pick a port. This is what you should do
-when writing a TCP or UDP client.
-You may also use a well-known service name, such as @samp{smtp}
-or @samp{http}, in which case @command{gawk} attempts to determine
-the predefined port number using the C @code{getaddrinfo()} function.
-
address@hidden remote-host
-The IP address or fully qualified domain name of the Internet
-host to which you want to connect.
-
address@hidden remote-port
-The TCP or UDP port number to use on the given @var{remote-host}.
-Again, use @samp{0} if you don't care, or else a well-known
-service name.
address@hidden table
-
address@hidden @command{gawk}, @code{ERRNO} variable in
address@hidden @code{ERRNO} variable
address@hidden NOTE
-Failure in opening a two-way socket will result in a nonfatal error
-being returned to the calling code. The value of @code{ERRNO} indicates
-the error (@pxref{Auto-set}).
address@hidden quotation
-
-Consider the following very simple example:
-
address@hidden
-BEGIN @{
- Service = "/inet/tcp/0/localhost/daytime"
- Service |& getline
- print $0
- close(Service)
address@hidden
address@hidden example
-
-This program reads the current date and time from the local system's
-TCP @code{daytime} server.
-It then prints the results and closes the connection.
-
-Because this topic is extensive, the use of @command{gawk} for
-TCP/IP programming is documented separately.
address@hidden
-See
address@hidden, , General Introduction, gawkinet, @value{GAWKINETTITLE}},
address@hidden ifinfo
address@hidden
-See
address@hidden://www.gnu.org/software/gawk/manual/gawkinet/,
address@hidden@value{GAWKINETTITLE}}},
-which comes as part of the @command{gawk} distribution,
address@hidden ifnotinfo
-for a much more complete introduction and discussion, as well as
-extensive examples.
-
address@hidden NOTE
address@hidden can only open direct sockets. There is currently
-no way to access services available over Secure Socket Layer
-(SSL); this includes any web service whose URL starts with @samp{https://}.
address@hidden quotation
-
-
address@hidden Profiling
address@hidden Profiling Your @command{awk} Programs
address@hidden @command{awk} programs, profiling
address@hidden profiling @command{awk} programs
address@hidden @code{awkprof.out} file
address@hidden files, @code{awkprof.out}
-
-You may produce execution traces of your @command{awk} programs.
-This is done by passing the option @option{--profile} to @command{gawk}.
-When @command{gawk} has finished running, it creates a profile of your program in a file
-named @file{awkprof.out}. Because it is profiling, it also executes up to 45% slower than
address@hidden normally does.
-
address@hidden @option{--profile} option
-As shown in the following example,
-the @option{--profile} option can be used to change the name of the file
-where @command{gawk} will write the profile:
-
address@hidden
-gawk --profile=myprog.prof -f myprog.awk data1 data2
address@hidden example
-
address@hidden
-In the preceding example, @command{gawk} places the profile in
address@hidden instead of in @file{awkprof.out}.
-
-Here is a sample session showing a simple @command{awk} program,
-its input data, and the results from running @command{gawk} with the
address@hidden option. First, the @command{awk} program:
-
address@hidden
-BEGIN @{ print "First BEGIN rule" @}
-
-END @{ print "First END rule" @}
-
-/foo/ @{
- print "matched /foo/, gosh"
- for (i = 1; i <= 3; i++)
- sing()
address@hidden
-
address@hidden
- if (/foo/)
- print "if is true"
- else
- print "else is true"
address@hidden
-
-BEGIN @{ print "Second BEGIN rule" @}
-
-END @{ print "Second END rule" @}
-
-function sing( dummy)
address@hidden
- print "I gotta be me!"
address@hidden
address@hidden example
-
-Following is the input data:
-
address@hidden
-foo
-bar
-baz
-foo
-junk
address@hidden example
-
-Here is the @file{awkprof.out} that results from running the
address@hidden profiler on this program and data (this example also
-illustrates that @command{awk} programmers sometimes get up very early
-in the morning to work):
-
address@hidden @code{BEGIN} pattern, and profiling
address@hidden @code{END} pattern, and profiling
address@hidden
- # gawk profile, created Mon Sep 29 05:16:21 2014
-
- # BEGIN rule(s)
-
- BEGIN @{
- 1 print "First BEGIN rule"
- @}
-
- BEGIN @{
- 1 print "Second BEGIN rule"
- @}
-
- # Rule(s)
-
- 5 /foo/ @{ # 2
- 2 print "matched /foo/, gosh"
- 6 for (i = 1; i <= 3; i++) @{
- 6 sing()
- @}
- @}
-
- 5 @{
- 5 if (/foo/) @{ # 2
- 2 print "if is true"
- 3 @} else @{
- 3 print "else is true"
- @}
- @}
-
- # END rule(s)
-
- END @{
- 1 print "First END rule"
- @}
-
- END @{
- 1 print "Second END rule"
- @}
-
-
- # Functions, listed alphabetically
-
- 6 function sing(dummy)
- @{
- 6 print "I gotta be me!"
- @}
address@hidden example
-
-This example illustrates many of the basic features of profiling output.
-They are as follows:
+This example illustrates many of the basic features of profiling output.
+They are as follows:
@itemize @value{BULLET}
@item
@@ -30516,6 +30222,299 @@ program being debugged, but occasionally it can.
@end itemize
address@hidden name-spaces Name-space Name-spaces}
address@hidden Namespaces
address@hidden Namespaces in @command{gawk}
+
+This @value{CHAPTER} describes a feature that is specific to @command{gawk}.
+
address@hidden
+* Global Namespace:: The global namespace in standard @command{awk}.
+* Qualified Names:: How to qualify names with a namespace.
+* Default Namespace:: The default namespace.
+* Changing The Namespace:: How to change the namespace.
+* Internal Name Management:: How names are stored internally.
+* Namespace Example:: An example of code using a namespace.
+* Namespace Misc:: Namespace notes for developers.
address@hidden menu
+
address@hidden Global Namespace
address@hidden Standard @command{awk}'s Single Namespace
+
+In standard @command{awk}, there is a single, global, @dfn{namespace}.
+This means that @emph{all} function names and global variable names must
+be unique. For example, two different @command{awk} source files cannot
+both define a function named @code{min()}, or define an array named @code{data}.
+
+This situation is okay when programs are small, say a few hundred
+lines, or even a few thousand, but it prevents the development of
+reusable libraries of @command{awk} functions, and can cause
+independently developed library files to accidentally step on each
+other's ``private'' global variables
+(@pxref{Library Names}).
+
+Most other programming languages solve this issue by providing some kind
+of namespace control: a way to say ``this function is in namespace @var{xxx},
+and that function is in namespace @var{yyy}.'' (Of course, there is then
+still a single namespace for the namespaces, but the hope is that there
+are many fewer namespaces in use by any given program, and thus much
+less chance for collisions.) These facilities are sometimes referred
+to as @dfn{packages} or @dfn{modules}.
+
+Starting with @value{PVERSION} @strong{FIXME} 5.0, @command{gawk} provides a
+mechanism to put functions and global variables into separate namespaces.
+
address@hidden Qualified Names
address@hidden Qualified Names
+
+A @dfn{qualified name} is an identifier that includes a namespace
+name and the namespace separator, @code{::}. For example, one
+might have a function named @code{posix::getpid()}. Here, the
+namespace is @code{posix} and the function name within the namespace
+is @code{getpid()}. The namespace and variable or function name are
+separated by a double-colon. Only one such separator is allowed in a
+qualified name.
+
address@hidden NOTE
+Unlike C++, the @code{::} is @emph{not} an operator. No spaces are
+allowed between the namespace name, the @code{::}, and the rest of
+the name.
address@hidden quotation
+
+You must use fully qualified names from one namespace to access variables
+and functions in another. This is especially important when using
+variable names to index the special @code{SYMTAB} array (@pxref{Auto-set}),
+and when making indirect function calls (@pxref{Indirect Calls}).
+
+It is a syntax error to use any @command{gawk} reserved word (such
+as @code{if} or @code{for}), or the name of any built-in function
+(such as @code{sin()} or @code{gsub()}) as the second part of a
+fully qualified name. Using such an identifier as a namespace
+name (currently) @emph{is} allowed, but produces a lint warning.
+
+@command{gawk} predefined variable names may be used:
+@code{NR::NR} is valid, if possibly not all that useful.
+
address@hidden Default Namespace
address@hidden The Default Namespace
+
+The default namespace, not surprisingly, is @samp{awk}.
+All of the predefined @command{awk} and @command{gawk} variables
+are in this namespace, and thus have qualified names like
+@code{awk::ARGC}, @code{awk::NF}, and so on.
+
+Furthermore, even when you have changed the namespace for your
+current source file (@pxref{Changing The Namespace}), @command{gawk}
+forces unqualified identifiers whose names are all uppercase letters
+to be in the @samp{awk} namespace. This makes it possible for you to easily
+reference @command{gawk}'s global variables from different namespaces.
+
+It is a syntax error to use qualified names for function parameter names.
+
address@hidden Changing The Namespace
address@hidden Changing The Namespace
+
+In order to set the current namespace, use an @samp{@@namespace} directive
+at the top level of your program:
+
address@hidden
+@@namespace "passwd"
+
+BEGIN @{ @dots{} @}
address@hidden
address@hidden example
+
+After this directive, all simple identifiers that are not entirely uppercase
+are placed into the @code{passwd} namespace.
+
+You can change the namespace multiple times within a single
+source file, although this is likely to become confusing if you
+do it too much.
+
address@hidden NOTE
+Association of unqualified identifiers to a namespace is handled while
+your program is being parsed by @command{gawk} and before it starts
+to run. There is no concept of a ``current'' namespace once your program
+starts executing. Be sure you understand this.
address@hidden quotation
+
+Each source file for @option{-i} and @option{-f} starts out with
+an implicit @samp{@@namespace "awk"}. Similarly, each chunk of
+command-line code supplied with @option{-e} has such an implicit
+initial statement (@pxref{Options}).
+
+The use of @samp{@@namespace} has no influence upon the order of execution
+of @code{BEGIN}, @code{BEGINFILE}, @code{END}, and @code{ENDFILE} rules.
+
address@hidden Internal Name Management
address@hidden Internal Name Management
+
+For backwards compatibility, all identifiers in the @samp{awk} namespace
+are stored internally as unadorned identifiers. This is mainly relevant
+when using such identifiers as indices for @code{SYMTAB}, @code{FUNCTAB},
+and @code{PROCINFO["identifiers"]} (@pxref{Auto-set}), and for use in
+indirect function calls (@pxref{Indirect Calls}).
+
+In program code, to refer to variables and functions in the @samp{awk}
+namespace from another namespace, you must still use the @samp{awk::}
+prefix. For example:
+
+@example
+@@namespace "awk" @ii{This is the default namespace}
+
+BEGIN @{
+ Title = "My Report" @ii{Fully qualified name is} awk::Title
+@}
+
+@@namespace "report" @ii{Now in} report @ii{namespace}
+
+function compute() @ii{This is really} report::compute()
+@{
+ print awk::Title @ii{But would be} SYMTAB["Title"]
+ @dots{}
+@}
+@end example
+
address@hidden Namespace Example
address@hidden Namespace Example
+
address@hidden
+# FIXME: fix this up for real, dates etc
+#
+# passwd.awk --- access password file information
+#
+# Arnold Robbins, arnold@@skeeve.com, Public Domain
+# May 1993
+# Revised October 2000
+# Revised December 2010
+#
+# Reworked for namespaces May 2017
+
+@@namespace "passwd"
+
+BEGIN @{
+ # tailor this to suit your system
+ Awklib = "/usr/local/libexec/awk/"
address@hidden
+
+function Init( oldfs, oldrs, olddol0, pwcat, using_fw, using_fpat)
address@hidden
+ if (Inited)
+ return
+
+ oldfs = FS
+ oldrs = RS
+ olddol0 = $0
+ using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
+ using_fpat = (PROCINFO["FS"] == "FPAT")
+ FS = ":"
+ RS = "\n"
+
+ pwcat = Awklib "pwcat"
+ while ((pwcat | getline) > 0) @{
+ Byname[$1] = $0
+ Byuid[$3] = $0
+ Bycount[++Total] = $0
+ @}
+ close(pwcat)
+ Count = 0
+ Inited = 1
+ FS = oldfs
+ if (using_fw)
+ FIELDWIDTHS = FIELDWIDTHS
+ else if (using_fpat)
+ FPAT = FPAT
+ RS = oldrs
+ $0 = olddol0
address@hidden
+
+function Getpwnam(name)
address@hidden
+ Init()
+ return Byname[name]
address@hidden
+
+function Getpwuid(uid)
address@hidden
+ Init()
+ return Byuid[uid]
address@hidden
+
+function Getpwent()
address@hidden
+ Init()
+ if (Count < Total)
+ return Bycount[++Count]
+ return ""
address@hidden
+
+function Endpwent()
address@hidden
+ Count = 0
address@hidden
+
+# Compatibility:
+
+@@namespace "awk"
+
+function getpwnam(name)
address@hidden
+ return passwd::Getpwnam(name)
address@hidden
+
+function getpwuid(uid)
address@hidden
+ return passwd::Getpwuid(uid)
address@hidden
+
+function getpwent()
address@hidden
+ return passwd::Getpwent()
address@hidden
+
+function endpwent()
address@hidden
+ passwd::Endpwent()
address@hidden
address@hidden example
+
address@hidden Namespace Misc
address@hidden Miscellaneous Notes
+
+Other notes for reviewers:
+
address@hidden @asis
address@hidden Profiler:
+When profiling, we include the namespace in the @code{Op_Rule}
+and @code{Op_Func} instructions. If the namespace
+is different from the previous
+one, output an @samp{@@namespace} statement. For each identifier,
+if it starts with the current namespace, output only the simple part.
+
address@hidden Debugger:
+Simply print fully qualified names all the time. Maybe allow a
address@hidden @var{xxx}} command in the debugger to set the
+namespace and it will use that to create fully qualified names?
+Have to be careful about all uppercase names though.
+
address@hidden How does this affect @code{@@include}?
+Basically @code{@@include} should push and pop the namespace. Each
address@hidden@@include} saves the current namespace and starts over with
+namespace @samp{awk} until an @code{@@namespace} is seen.
+
address@hidden Extension functions
+Revise the current macros to pass @code{"awk"} as the namespace
+argument and add new macros with @samp{_ns} or some such in the name that
+pass the namespace of the extension. This preserves backwards
+compatibility at the source level while providing access to namespaces
+as needed.
+
+Actually, since we've decided that @code{awk} namespace variables and
+functions are stored unadorned, the current macros that pass @code{""}
+would continue to work. Internally, we need to recognize @code{"awk"} and
address@hidden fully qualify the name before storing it in the symbol table.
address@hidden table
+
@node Arbitrary Precision Arithmetic
@chapter Arithmetic and Arbitrary-Precision Arithmetic with @command{gawk}
@cindex arbitrary precision
-----------------------------------------------------------------------
Summary of changes:
doc/ChangeLog | 4 +
doc/gawk.info | 1231 +++++++++++++++---------------
doc/gawk.texi | 2289 +++++++++++++++++++++++++++----------------------------
doc/gawktexi.in | 2289 +++++++++++++++++++++++++++----------------------------
4 files changed, 2907 insertions(+), 2906 deletions(-)
hooks/post-receive
--
gawk