help-gawk

Re: puzzled by the result of this multi-dimensional array code


From: Neil R. Ormos
Subject: Re: puzzled by the result of this multi-dimensional array code
Date: Sat, 20 Apr 2024 21:31:27 -0500 (CDT)
User-agent: Alpine 2.20 (DEB 67 2015-01-07)

Peter Lindgren wrote:

> I'm trying to write a gawk program to analyze some simple csv data -
> values for 6 variables per 100+ countries. So far, I'm just reading
> in the data and loading it into a 2-dimensional array, called
> "data". The first data line consists of column titles; I split that
> into a separate array called "titles". As a check on what I'm doing
> so far, I chose to dump or show the USA data, and it comes out
> oddly. I was able to cut down the data to two lines - the column
> titles line and the data line for the USA. I changed the actual data
> values, since they're immaterial; nothing secret though - the source
> of the original data is noted in the code and it is publicly
> available on the web.

> test data:
>
> abr;full;item1;item2;item3;item4;item5;item6
> USA;U.S.A.;24;1;27;75;60;19
>
> code:
>
> #!/usr/bin/env -S gawk -f
>
> # program to analyze cultural data from Geert Hofstede's
> #   see the Culture's Consequences book or:
> #   http://www.geerthofstede.nl
>
> BEGIN    {FS=";"}
>
> NR==1    {split($0,titles)
>     for (x in titles) { print x, titles[x] }
>     next
> }
>
> # load the data
>     { for (i=1; i<=NF; i++) {
>         data[$1][titles[i]] = $i
>     }
> }
>
> # process the data
>
>
>
> END    { show("USA")
>     for (x in titles) { print x, titles[x] }
> }
>
> function show(c) {
>     for (x in titles) { print x, titles[x], data[c][titles[x]] }
>     print 8, titles[8],data[c][titles[8]]
> }
>
>
> result of ./countries.awk test.csv
> 1 abr
> 2 full
> 3 item1
> 4 item2
> 5 item3
> 6 item4
> 7 item5
> 8 item6
> 1 abr USA
> 2 full U.S.A.
> 3 item1 24
> 4 item2 1
> 5 item3 27
> 6 item4 75
> 7 item5 60
>  19tem6
>  19tem6
> 1 abr
> 2 full
> 3 item1
> 4 item2
> 5 item3
> 6 item4
> 7 item5
> 8 item6
>

> Question: why doesn't the title for the 8th
> column get printed like the others do, when I'm
> showing a real data line? The titles array
> itself shows up correctly just after being
> saved, and again at END. Hard-coding an 8 in the
> show function doesn't make any difference.

> I am totally ready to believe this is something
> obvious I'm just missing...

Testing with gawk 4.1.4 and gawk 5.2.2, I get the
results below, where the 8th column prints like
the other columns.  However, I had to strip out
many special characters (they looked like
non-breaking spaces) to get the program to run.
Those characters may be artifacts of the MUA
through which you sent your question, or they may
be present in the program source file or the input
data file.  For debugging, you might consider:

(a) using the C locale;

(b) checking the program and the input data for
    characters other than printable ASCII and
    newlines; and

(c) using the --profile option to confirm that the
    parsed version of the program looks as you
    expect.
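
As a rough sketch of suggestion (b), the shell function
below runs awk in the C locale and reports every line
that contains a byte outside the printable ASCII range
(space through tilde).  The name check_ascii is made up
for illustration; it is not part of gawk.  Note that
this version flags tabs too, and it would also flag a
carriage return left behind by CRLF line endings, which
is one kind of stray character that can garble what a
terminal displays.

```shell
# Hypothetical helper (not part of gawk): list the lines of the given
# files that contain any byte other than printable ASCII.
# Newlines are record separators, so they are never flagged.
check_ascii() {
    LC_ALL=C awk '
        # In the C locale the range " -~" covers bytes 0x20 through 0x7e.
        /[^ -~]/ {
            printf "%s:%d: unexpected byte(s)\n", FILENAME, FNR
            found = 1
        }
        # Exit status 1 if anything was flagged, 0 otherwise.
        END { exit (found ? 1 : 0) }
    ' "$@"
}
```

For example, "check_ascii countries.awk test.csv" checks
both the program and the data.  For (a) and (c),
something like
"LC_ALL=C gawk --profile=awkprof.out -f countries.awk test.csv"
should write the parsed program to awkprof.out for
inspection.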

################################################################################
1 abr
2 full
3 item1
4 item2
5 item3
6 item4
7 item5
8 item6
1 abr USA
2 full U.S.A.
3 item1 24
4 item2 1
5 item3 27
6 item4 75
7 item5 60
8 item6 19
8 item6 19
1 abr
2 full
3 item1
4 item2
5 item3
6 item4
7 item5
8 item6
################################################################################


