[Pspp-commits] [SCM] GNU PSPP branch, master, updated. v1.4.1-297-g880924c


From: Ben Pfaff
Subject: [Pspp-commits] [SCM] GNU PSPP branch, master, updated. v1.4.1-297-g880924c
Date: Sat, 9 Jan 2021 19:17:57 -0500 (EST)

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU PSPP".

The branch, master has been updated
       via  880924c2c10557f211e71af79d014cd1fd26607a (commit)
       via  f4cc5814b23b58ba474289e898bc96044c7a50b9 (commit)
       via  7aee1bc71c08f2c5b69243cb1ca792c8e7615faa (commit)
       via  64924ffc331a0aca047bf79d40b454c4da8f9021 (commit)
       via  899e97f2730db331e3bce41c29f09d5f31164463 (commit)
       via  1e823de8e6b08940a6e96e6a209dd4f9e82d058b (commit)
       via  2b29f671e938c39ab6c4f198e685951552b9cae8 (commit)
       via  b0269e285ccdf7acb8d6231f29d85b56cfdd676c (commit)
       via  82eefd6de0852d5ec93771fb06786cdc8fd8ed6f (commit)
       via  db9550227e0861cb1bfe139da6b0c6f389d7b368 (commit)
       via  8491d88610f4a0c48891be493a4bd0522aec297b (commit)
       via  b96adcc0447f11136edb0a4e957fb6bd5b3c0d93 (commit)
      from  def6f19d1b58929ed31ae6a7a90f89054ab3ace7 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 880924c2c10557f211e71af79d014cd1fd26607a
Author: Ben Pfaff <blp@cs.stanford.edu>
Date:   Fri Jan 8 23:02:23 2021 -0800

    spv-light-decoder: Add back character set encoding support.
    
    Originally, SPV light member decoding would obtain the member's declared
    character set encoding and recode all strings from that encoding into
    UTF-8.
    
    The SPV file submitted along with bug #59837, however, showed that SPV
    files actually contain UTF-8 strings despite their declared character
    encoding.  This SPV file declared windows-1253 encoding.  It contained 221
    unique strings, 52 of which contained non-ASCII characters, and all of
    which were valid UTF-8.  Therefore, commit db9a44802bb9
    ("spv-light-decoder: Text strings are all UTF-8 encoded.") changed the
    SPV light member decoder to treat all strings as UTF-8 encoded.
    
    Unfortunately, further examination of the SPV corpus showed that there is
    no consistency.  Some files do contain all UTF-8 despite declaring another
    character set.  For example:
    
    00764b1c.spv: windows-1251: 481 unique strings, 123 non-ASCII, 123 UTF-8.
    009bf8ba.spv: windows-1252: 76 unique strings, 9 non-ASCII, 9 UTF-8.
    00b033b6.spv: windows-1252: 389 unique strings, 16 non-ASCII, 16 UTF-8.
    014fc3df.spv: ISO_8859-1:1987: 81 unique strings, 4 non-ASCII, 4 UTF-8.
    01a20a32.spv: windows-1254: 71 unique strings, 10 non-ASCII, 10 UTF-8.
    01d41135.spv: windows-1251: 142 unique strings, 51 non-ASCII, 51 UTF-8.
    01d7942a.spv: windows-1251: 64 unique strings, 43 non-ASCII, 0 UTF-8.
    0203e88a.spv: windows-1254: 82 unique strings, 4 non-ASCII, 0 UTF-8.
    0247cf5a.spv: windows-1256: 236 unique strings, 5 non-ASCII, 0 UTF-8.
    026777ed.spv: 027b66dd.spv: windows-1252: 224 unique strings, 2 non-ASCII, 0 UTF-8.
    02cc8c22.spv: windows-1254: 79 unique strings, 1 non-ASCII, 1 UTF-8.
    ...
    
    Others are entirely non-UTF-8:
    
    00029fbf.spv: windows-1251: 115 unique strings, 88 non-ASCII, 0 UTF-8.
    00f101f5.spv: windows-1252: 94 unique strings, 1 non-ASCII, 0 UTF-8.
    00ff0628.spv: windows-1252: 112 unique strings, 14 non-ASCII, 0 UTF-8.
    01d7942a.spv: windows-1251: 64 unique strings, 43 non-ASCII, 0 UTF-8.
    0203e88a.spv: windows-1254: 82 unique strings, 4 non-ASCII, 0 UTF-8.
    0247cf5a.spv: windows-1256: 236 unique strings, 5 non-ASCII, 0 UTF-8.
    027b66dd.spv: windows-1252: 224 unique strings, 2 non-ASCII, 0 UTF-8.
    03235aa7.spv: windows-1254: 198 unique strings, 18 non-ASCII, 0 UTF-8.
    07a43c5c.spv: ISO-8859-15: 124 unique strings, 13 non-ASCII, 0 UTF-8.
    07a85498.spv: windows-1254: 86 unique strings, 1 non-ASCII, 0 UTF-8.
    07a91f3e.spv: windows-1252: 111 unique strings, 13 non-ASCII, 0 UTF-8.
    0ad295d8.spv: windows-1252: 81 unique strings, 1 non-ASCII, 0 UTF-8.
    0ceb843b.spv: windows-1252: 3108 unique strings, 392 non-ASCII, 0 UTF-8.
    
    and still others are a mix:
    
    02f274e6.spv: windows-1252: 746 unique strings, 77 non-ASCII, 27 UTF-8.
    0a7be05b.spv: windows-1255: 334 unique strings, 122 non-ASCII, 113 UTF-8.
    4c7a575d.spv: windows-1250: 400 unique strings, 73 non-ASCII, 72 UTF-8.
    785b0737.spv: windows-1250: 322 unique strings, 164 non-ASCII, 158 UTF-8.
    94365e08.spv: windows-1250: 353 unique strings, 77 non-ASCII, 2 UTF-8.
    
    This commit just gives up on finding a consistent rule: it interprets any
    string that is valid UTF-8 as UTF-8 and recodes every other string from
    the declared encoding.  So far, it works in practice.
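
The heuristic described in the commit message above can be illustrated with a
minimal sketch. This is not the code from spv-light-decoder.c. It is a Python
approximation, and the function names (is_valid_utf8, decode_string,
summarize) are hypothetical. It also assumes that the "UTF-8" column in the
per-file statistics counts the non-ASCII strings that are valid UTF-8, which
the message does not state explicitly.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is a well-formed UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_string(raw: bytes, declared_encoding: str) -> str:
    """Prefer UTF-8 when the bytes are valid UTF-8; otherwise fall back
    to the encoding the file declares (hypothetical helper)."""
    if is_valid_utf8(raw):
        return raw.decode("utf-8")
    # Undefined bytes in the declared code page become U+FFFD instead of
    # raising, so one bad string does not abort the whole decode.
    return raw.decode(declared_encoding, errors="replace")


def summarize(strings):
    """Return (unique, non_ascii, utf8) counts for a list of byte strings.

    Assumption: utf8 counts only the non-ASCII strings, since pure ASCII
    is valid in UTF-8 and in every code page listed in the message.
    """
    unique = set(strings)
    non_ascii = [s for s in unique if not s.isascii()]
    utf8 = [s for s in non_ascii if is_valid_utf8(s)]
    return len(unique), len(non_ascii), len(utf8)


if __name__ == "__main__":
    sample = [
        b"Total",
        b"Total",                      # duplicate, counted once
        "Итого".encode("utf-8"),       # non-ASCII, valid UTF-8
        "Итого".encode("cp1251"),      # non-ASCII, not valid UTF-8
    ]
    print(summarize(sample))
    for raw in sorted(set(sample)):
        print(decode_string(raw, "windows-1251"))
```

One limitation of this approach: a legacy-encoded string whose bytes happen to
form valid UTF-8 (for example the two windows-1252 bytes C3 A9) is read as
UTF-8. Such sequences are rare in natural text, which fits the message's
observation that the heuristic works in practice.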

commit f4cc5814b23b58ba474289e898bc96044c7a50b9
Author: Ben Pfaff <blp@cs.stanford.edu>
Date:   Sat Jan 9 11:26:37 2021 -0800

    pspp-output: New "strings" developer command.

commit 7aee1bc71c08f2c5b69243cb1ca792c8e7615faa
Author: Ben Pfaff <blp@cs.stanford.edu>
Date:   Sat Jan 9 11:24:22 2021 -0800

    string-array: New function string_array_uniq().

commit 64924ffc331a0aca047bf79d40b454c4da8f9021
Author: Ben Pfaff <blp@cs.stanford.edu>
Date:   Fri Jan 8 22:20:42 2021 -0800

    pspp-output: Minor coding style, comment fixes.

commit 899e97f2730db331e3bce41c29f09d5f31164463
Author: Ben Pfaff <blp@cs.stanford.edu>
Date:   Fri Jan 8 22:20:26 2021 -0800

    pspp-output: Don't write binary data to a terminal for dump-legacy-data.

commit 1e823de8e6b08940a6e96e6a209dd4f9e82d058b
Author: Ben Pfaff <blp@cs.stanford.edu>
Date:   Fri Jan 8 22:20:07 2021 -0800

    pspp-output: Add --help-developer option.

commit 2b29f671e938c39ab6c4f198e685951552b9cae8
Author: Ben Pfaff <blp@cs.stanford.edu>
Date:   Fri Jan 8 22:18:35 2021 -0800

    pivot-table: Minor coding style improvements.

commit b0269e285ccdf7acb8d6231f29d85b56cfdd676c
Author: Ben Pfaff <blp@cs.stanford.edu>
Date:   Fri Jan 8 22:18:12 2021 -0800

    pivot-table: Tolerate nulls in pivot_value_clone().
    
    These members should not be null, but there is little cost and perhaps some
    benefit in allowing it.

commit 82eefd6de0852d5ec93771fb06786cdc8fd8ed6f
Author: Ben Pfaff <blp@cs.stanford.edu>
Date:   Fri Jan 8 22:17:30 2021 -0800

    spv-legacy-decoder: Initialize all members for SPV_VALUE_TEXT.
    
    These were supposed to always be nonnull but this code didn't do it
    properly.

commit db9550227e0861cb1bfe139da6b0c6f389d7b368
Author: Ben Pfaff <blp@cs.stanford.edu>
Date:   Fri Jan 8 22:16:11 2021 -0800

    spv-legacy-decoder: Set data_index and presentation_index in leaves.
    
    This allows the new assertions in clone_category() to pass.

commit 8491d88610f4a0c48891be493a4bd0522aec297b
Author: Ben Pfaff <blp@cs.stanford.edu>
Date:   Fri Jan 8 22:14:47 2021 -0800

    spv-select: Allow structure_member and png_member to be selected also.

commit b96adcc0447f11136edb0a4e957fb6bd5b3c0d93
Author: Ben Pfaff <blp@cs.stanford.edu>
Date:   Fri Jan 8 21:24:13 2021 -0800

    pivot-table: Fix cut and paste error in pivot_value_clone().

-----------------------------------------------------------------------

Summary of changes:
 doc/dev/spv-file-format.texi        |  57 ++++++-
 src/libpspp/string-array.c          |  19 +++
 src/libpspp/string-array.h          |   1 +
 src/output/pivot-table.c            |  14 +-
 src/output/pivot-table.h            |   4 +
 src/output/spv/spv-legacy-decoder.c |   8 +
 src/output/spv/spv-light-decoder.c  | 327 +++++++++++++++++++++++++++++-------
 src/output/spv/spv-light-decoder.h  |   5 +
 src/output/spv/spv-select.c         |  25 ++-
 utilities/pspp-output.c             | 192 ++++++++++++++++++++-
 10 files changed, 565 insertions(+), 87 deletions(-)


hooks/post-receive
-- 
GNU PSPP
