bug-datamash
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Patches for --vnlog support


From: Erik Auerswald
Subject: Re: Patches for --vnlog support
Date: Wed, 10 Aug 2022 10:00:56 +0200

Hi Tim,

On Tue, Aug 09, 2022 at 09:08:24PM +0000, Tim Rice wrote:
> 
> I've had a play around with the vnlog patches and done some tidying
> up, both to help them make sense to me, and also to make them more
> consistent with GNU and Datamash code style [1].
> 
> [1] https://www.gnu.org/prep/standards/standards.html#Writing-C

Thanks!

> The attached diff is against commit 5fd50ec. The default test suite
> is currently passing.

I've skimmed your patch and would like to add some comments (this is
not exhaustive).

> [...]
> diff --git c/src/datamash.c w/src/datamash.c
> index 8c0d4ce..88b9292 100644
> --- c/src/datamash.c
> +++ w/src/datamash.c
> @@ -76,12 +76,6 @@ static size_t line_number = 0 ;
>  /* Lines in the current group */
>  static size_t lines_in_group = 0 ;
>  
> -/* Print Output Header */
> -static bool output_header = false;
> -
> -/* Input file has a header line */
> -static bool input_header = false;
> -

I think those changes are no longer appropriate.  See also below where
those initializations have been moved to.

>  /* If true, print the entire input line. Otherwise, print only the key 
> fields */
>  static bool print_full_line = false;
>  
> @@ -115,6 +109,7 @@ enum
>    OUTPUT_DELIMITER_OPTION,
>    CUSTOM_FORMAT_OPTION,
>    SORT_PROGRAM_OPTION,
> +  VNLOG_OPTION,
>    UNDOC_PRINT_INF_OPTION,
>    UNDOC_PRINT_NAN_OPTION,
>    UNDOC_PRINT_PROGNAME_OPTION,
> @@ -134,6 +129,8 @@ static struct option const long_options[] =
>    {"header-in", no_argument, NULL, INPUT_HEADER_OPTION},
>    {"header-out", no_argument, NULL, OUTPUT_HEADER_OPTION},
>    {"headers", no_argument, NULL, 'H'},
> +  {"vnlog", no_argument, NULL, VNLOG_OPTION},
> +  {"vnl", no_argument, NULL, VNLOG_OPTION},

There is no need to add a prefix of a long option as a separate
long option.  GNU's getopt_long() will automatically accept any unique
prefix of a long option[0].  You can try this out using "datamash --v",
because "v" is currently a unique prefix of the "version" option.  This,
of course, would change with the addition of --vnlog, but "--ve" and
"--vn" would suffice to select "--version" resp. "--vnlog".  I'd say
it is quite likely that "--vnl" will stay a unique prefix selecting
"--vnlog" for quite some time.

[0] The getopt_long() documentation[1] contains the following:

    "The option name may be abbreviated as long as the abbreviation
    is unique."

[1] https://www.gnu.org/software/libc/manual/html_node/Getopt-Long-Options.html

>    {"full", no_argument, NULL, 'f'},
>    {"filler", required_argument, NULL, 'F'},
>    {"format", required_argument, NULL, CUSTOM_FORMAT_OPTION},
> @@ -253,6 +250,10 @@ which require a pair of fields (e.g. 'pcov 2:6').\n"), 
> stdout);
>  "), stdout);
>        fputs (_("\
>    -H, --headers             same as '--header-in --header-out'\n\
> +"), stdout);
> +      fputs (_("\
> +  --vnl, --vnlog            Reads and writes data in the vnlog format.\n\
> +                              Implies -C -H -W\n\

I think that there should be just one option name, and thus just one
option name documented.

This new option should also be added to the texinfo documentation.

> [...]
> diff --git c/src/text-lines.c w/src/text-lines.c
> index 2d4544b..dbcb892 100644
> --- c/src/text-lines.c
> +++ w/src/text-lines.c
> @@ -34,6 +34,7 @@
>  
>  #include "text-options.h"
>  #include "text-lines.h"
> +#include "die.h"
>  
>  void
>  line_record_init (struct line_record_t* lr)
> @@ -91,21 +92,32 @@ line_record_reserve_fields (struct line_record_t* lr, 
> const size_t n)
>  }
>  
>  static void
> -line_record_parse_fields (struct line_record_t *lr, int field_delim)
> +line_record_parse_fields (/* The buffer. May or may not be the one in the
> +                             following argument */
> +                          const struct linebuffer* lbuf,
> +
> +                          /* Used ONLY for the fields. The buffer is picked 
> up
> +                             from the above argument */
> +                          struct line_record_t *lr,
> +                          int field_delim,
> +                          bool ignore_trailing_comments,
> +                          bool ignore_trailing_whitespace)
>  {
>    size_t num_fields = 0;
>    size_t pos = 0;
> -  const size_t buflen = line_record_length (lr);
> -  const char* fptr = line_record_buffer (lr);
> +  const size_t buflen = lbuf->length;
> +  const char*  fptr   = lbuf->buffer;
> +
> +#define IS_TRAILING_COMMENT (ignore_trailing_comments && (*fptr == '#' || 
> *fptr == ';'))

This is problematic, because vnlog inline comments are always started
with a Number Sign ('#'), not a Semicolon (';').  I would suggest to
remove the "|| *fptr == ';'" part from the macro.

> [...]
> diff --git c/src/text-options.c w/src/text-options.c
> index 7739d8f..1ddb1a7 100644
> --- c/src/text-options.c
> +++ w/src/text-options.c
> @@ -67,6 +67,14 @@ char* missing_field_filler = "N/A";
>     followed by '#' or ';'. See line_record_is_comment.  */
>  bool skip_comments = false;
>  
> +/* Print Output Header */
> +bool output_header = false;
> +
> +/* Input file has a header line */
> +bool input_header = false;

I am quite sure that those variables are only used in src/datamash.c.
Thus I do not see a reason to move the initialization here.

> [...]
> diff --git c/src/text-options.h w/src/text-options.h
> index 4282016..46ac37d 100644
> --- c/src/text-options.h
> +++ w/src/text-options.h
> @@ -68,6 +68,14 @@ extern char* missing_field_filler;
> [...]
> +/* Print Output Header */
> +extern bool output_header;
> +
> +/* Input file has a header line */
> +extern bool input_header;
> +

I am quite sure that those variables are only used in src/datamash.c.
Thus I do not see a reason to move the definition here.

> [...]
> diff --git c/tests/datamash-vnlog.pl w/tests/datamash-vnlog.pl
> new file mode 100755

I'd like to have a test with an input field containing a Semicolon (';')
to ensure this is not treated as a comment, because it is none in vnlog.

Thanks,
Erik



reply via email to

[Prev in Thread] Current Thread [Next in Thread]