bug-coreutils

Re: [PATCH] split: --chunks option


From: Chen Guo
Subject: Re: [PATCH] split: --chunks option
Date: Sun, 20 Dec 2009 06:19:00 -0800 (PST)

Hi guys,

    Below is the source code portion of the patch. It ended up a lot bigger
than I expected; split.c has almost doubled in size. Feedback is welcome,
especially suggestions with regard to parsing or the --help output.

    I've also taken a couple of liberties with the -t option. For example, when
--bytes=SIZE is specified together with an eol char, I decided it would be
logical to interpret that as --line-bytes=SIZE. I hope you guys agree.
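    A minimal sketch of that remapping, assuming the split_type names from
the patch's main(). The helper name apply_term_option is hypothetical; the
patch does the same thing inline right after option parsing. With -t,
--bytes=SIZE becomes --line-bytes=SIZE, the /N and K/N byte-chunk forms
(including -n) become their --lines equivalents, and other modes are left
unchanged.

```c
#include <stdbool.h>

/* Same enumerators as the split_type enum in the patch's main().  */
enum split_type
{
  type_undef, type_bytes, type_byteslines, type_lines, type_digits,
  type_chunk_bytes, type_chunk_eol, type_rr
};

/* Hypothetical helper: return the mode to use once option parsing is
   done.  If -t gave an explicit terminator, byte-oriented modes switch
   to their line-oriented equivalents.  */
static enum split_type
apply_term_option (enum split_type type, bool eol_given)
{
  if (!eol_given)
    return type;

  switch (type)
    {
    case type_bytes:            /* -b SIZE -t C  acts like  -C SIZE -t C */
      return type_byteslines;
    case type_chunk_bytes:      /* -b K/N -t C   acts like  -l K/N -t C  */
      return type_chunk_eol;
    default:                    /* -l, -C, -r, etc. are already eol-aware */
      return type;
    }
}
```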

    I've stopped short of writing documentation and tests because I have a few
questions:

    Documentation-wise: regarding news and thanks/authorship, I'm guessing you
maintainers handle that? As for coreutils.texi, texinfo is new to me, but it
seems I'll only need to know the basics, so hopefully this can be done soon.

    Regarding the test script, how exhaustive do we want to be? I mean, a lot
of stuff is being added, and a thorough list would be quite long:

The minimum would probably entail
  Check chunks by bytes:
    invoke --bytes=/N, diff output files with --bytes=K/N for each K
    cat output files and diff with original.
  Check chunks by eol:
    Like above, but with --lines instead of --bytes.
  Check chunks by round robin distribution:
    Invoke --round-robin=/N, with input file piped in, and diff output files 
with
    --bytes=K/N for each K.
    cat output files and sort, diff with sorted original.
  Check setting common delim characters with -t. '\0' comes to mind.
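For the --lines=/N and --lines=K/N checks, the expected output can be
computed independently of split itself. Below is a sketch of such a
reference model, assuming the rule stated in the lines_chunk_extract
comment: a line belongs to the chunk that contains its first byte, where
chunks are file_size / N bytes long and the last one also takes the
remainder. Both function names are hypothetical test helpers, not part of
the patch.

```c
#include <stddef.h>
#include <string.h>

/* 1-based chunk (of TOTAL) holding byte OFFSET of a FILE_SIZE-byte file.
   Assumes FILE_SIZE >= TOTAL, which the patch enforces before splitting. */
static size_t
chunk_of_offset (size_t offset, size_t total, size_t file_size)
{
  size_t chunk_len = file_size / total;
  size_t k = offset / chunk_len;          /* 0-based, may overshoot */
  return (k < total ? k : total - 1) + 1; /* last chunk takes the rest */
}

/* Copy into OUT the lines of BUF (LEN bytes, terminated by EOL) whose
   first byte lies in chunk K of TOTAL; return the number of bytes copied.
   A final line without a terminator still counts as a line.  OUT must
   have room for LEN bytes.  */
static size_t
expected_line_chunk (const char *buf, size_t len, char eol,
                     size_t k, size_t total, char *out)
{
  size_t n_out = 0;
  size_t start = 0;

  while (start < len)
    {
      const char *nl = memchr (buf + start, eol, len - start);
      size_t end = nl ? (size_t) (nl - buf) + 1 : len; /* one past line */

      if (chunk_of_offset (start, total, len) == k)
        {
          memcpy (out + n_out, buf + start, end - start);
          n_out += end - start;
        }
      start = end;
    }
  return n_out;
}
```

Concatenating expected_line_chunk for K = 1..N reproduces the input byte
for byte, which is the same property the "cat output files and diff with
original" step relies on.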

More thorough ones might be
  Check --number yields identical results as chunking with --bytes.
  Check each of the -t escape sequences, such as -t\\t and \xFF. Checking
    octal and hex sequences would involve several tests each.
  Check that -bN -tC is equivalent to -CN -tC, and -b/N -tC to -l/N -tC
  Check -t with each mode it affects, meaning all 3 variations of --lines,
    -C, and both variations of --round-robin.
  Check that chunking with --bytes and --lines exits gracefully on pipe/stdin
    input.

    ...And so on. If we need to be absolutely thorough, I'll think up the rest
while I'm writing it.

And here's the code.


From: Chen Guo <address@hidden>
Date: Sun, 20 Dec 2009 04:58:20 -0800
Subject: [PATCH] split: divide file into equal sized chunks; add -r and -t options.

Extend --bytes and --lines to divide file into N equal pieces, or
extract Kth of N said pieces. Add -n/--number alias for BSD
compatibility.
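As a sketch of the byte-chunk arithmetic, taken from bytes_chunk_extract in
the diff below (the helper name byte_chunk_bounds is hypothetical): every
chunk is file_size / N bytes, and the last one also takes the
file_size % N leftover bytes. For a 10-byte file and N = 3 that gives
[0,3), [3,6) and [6,10).

```c
#include <stdint.h>

/* Store in *START and *END the half-open byte range [*START, *END) of
   chunk K (1-based) of TOTAL in a file of FILE_SIZE bytes.  */
static void
byte_chunk_bounds (uintmax_t k, uintmax_t total, uintmax_t file_size,
                   uintmax_t *start, uintmax_t *end)
{
  uintmax_t chunk_len = file_size / total;

  *start = (k - 1) * chunk_len;
  /* The last chunk runs to end of file, absorbing the remainder.  */
  *end = (k == total) ? file_size : k * chunk_len;
}
```

Splitting with --bytes=/N yields the same layout: bytes_split is called
with n_bytes = file_size / N and max_files = N, stops opening new files
after the Nth, and lets the remainder run into the last one.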

Add -r/--round-robin option to allow division and extraction of
chunks in round robin fashion, in support of nonseekable files.
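Round-robin distribution needs only a running line counter, never a file
offset, which is why it suits nonseekable input. A minimal sketch of the
extraction rule, mirroring lines_rr_extract but working on an in-memory
buffer; the name round_robin_extract is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copy into OUT every line of BUF (LEN bytes, terminated by EOL) that
   round-robin distribution over TOTAL outputs sends to output K
   (1-based): line 1 to output 1, line 2 to output 2, ..., and line
   TOTAL + 1 back to output 1.  Return the number of bytes copied.
   OUT must have room for LEN bytes.  */
static size_t
round_robin_extract (const char *buf, size_t len, char eol,
                     size_t k, size_t total, char *out)
{
  size_t n_out = 0;
  size_t pos = 0;
  size_t line_no = 1;           /* output the current line goes to */

  while (pos < len)
    {
      const char *nl = memchr (buf + pos, eol, len - pos);
      size_t end = nl ? (size_t) (nl - buf) + 1 : len;

      if (line_no == k)
        {
          memcpy (out + n_out, buf + pos, end - pos);
          n_out += end - pos;
        }
      line_no = (line_no == total) ? 1 : line_no + 1;
      pos = end;
    }
  return n_out;
}
```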

Add -t/--term option to allow user to choose delineation character;
supports parsing C escape sequences such as \n or \xdd.
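A standalone sketch of the escape handling, assuming the same accepted forms
as the patch's eol_parse: the simple C escapes, 1 to 3 octal digits, and \x
followed by 1 or 2 hex digits. The function names are hypothetical, and one
behavior differs on purpose: the patch uses only the first character of a
non-escape argument, while this sketch rejects empty or longer arguments.
Digits are parsed by hand rather than with strtol, so a sign, leading
whitespace or a 0x prefix cannot slip through.

```c
#include <stdbool.h>
#include <string.h>

/* Value of digit C in BASE (8 or 16), or -1 if C is not such a digit.  */
static int
digit_value (char c, int base)
{
  int v;
  if (c >= '0' && c <= '9')
    v = c - '0';
  else if (c >= 'a' && c <= 'f')
    v = c - 'a' + 10;
  else if (c >= 'A' && c <= 'F')
    v = c - 'A' + 10;
  else
    return -1;
  return v < base ? v : -1;
}

/* Parse all of S as 1..MAX_DIGITS digits in BASE; return -1 on error.  */
static int
parse_digits (const char *s, int base, int max_digits)
{
  int val = 0;
  int n = 0;
  for (; *s != '\0'; s++, n++)
    {
      int d = digit_value (*s, base);
      if (d < 0 || n == max_digits)
        return -1;
      val = val * base + d;
    }
  return n == 0 ? -1 : val;
}

/* Hypothetical -t parser: store the terminator byte in *OUT and return
   true, or return false if ARG is not one character or one escape.  */
static bool
parse_term_char (const char *arg, char *out)
{
  static const char simple_from[] = "abfnrtv'\"?\\";
  static const char simple_to[]   = "\a\b\f\n\r\t\v'\"?\\";
  const char *p;
  int val;

  if (arg[0] != '\\')
    {
      if (arg[0] == '\0' || arg[1] != '\0')
        return false;           /* empty or multi-character argument */
      *out = arg[0];
      return true;
    }
  if (arg[1] == '\0')
    return false;               /* lone backslash */

  p = strchr (simple_from, arg[1]);
  if (p != NULL)
    {
      if (arg[2] != '\0')
        return false;
      *out = simple_to[p - simple_from];
      return true;
    }

  if (arg[1] >= '0' && arg[1] <= '7')
    val = parse_digits (arg + 1, 8, 3);       /* \ooo */
  else if (arg[1] == 'x')
    val = parse_digits (arg + 2, 16, 2);      /* \xhh */
  else
    return false;

  if (val < 0 || val > 255)
    return false;
  *out = (char) val;
  return true;
}
```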

src/split.c: (eol): new global variable.
(usage, long_options, main): new options -n/--number, -r, and -t.
(bytes_split): add max_files argument. This allows for trivial
implementation of byte chunking, similar to BSD.
(lines_split, line_bytes_split): delineate line by global eol char
instead of '\n'.
(lines_chunk_split): new function. Split file into eol delineated
chunks.
(bytes_chunk_extract): new function. Extract a chunk of file.
(lines_chunk_extract): new function. Extract an eol delineated chunk
of file.
(of_info): new struct. Used by new functions lines_rr and ofd_check
to keep track of file descriptors associated with output files.
(ofd_check): new function. Shuffle file descriptors in case output
files outnumber available file descriptors.
(lines_rr): new function. Split file into chunks in round-robin
fashion.
(lines_rr_extract): new function. Extract a chunk of file, as if
chunks were created in round-robin fashion.
(chunk_parse): new function. Parses /N and K/N syntax.
(eol_parse): new function. Parses -t option argument.
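A sketch of the /N and K/N parsing, assuming the validation rules of the
patch's chunk_parse: N must be a positive decimal number, K may be omitted
("/N") and must not exceed N, and K == 0 behaves like an omitted K. Unlike
chunk_parse, this hypothetical parse_chunk_spec does not write a NUL over
the slash in its argument, and it does not apply chunk_parse's SIZE_MAX
bound.

```c
#include <ctype.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parser for the "/N" and "K/N" arguments of -b, -l, -n
   and -r.  On success, store K (0 when omitted) and N and return true.
   ARG itself is not modified.  */
static bool
parse_chunk_spec (const char *arg, uintmax_t *k, uintmax_t *n)
{
  const char *slash = strchr (arg, '/');
  char *end;

  if (slash == NULL)
    return false;               /* plain SIZE/NUMBER: not a chunk spec */

  *k = 0;
  if (slash != arg)
    {
      /* K must be all digits up to the slash.  isdigit also rules out
         the sign and leading whitespace that strtoumax would accept.  */
      if (!isdigit ((unsigned char) arg[0]))
        return false;
      errno = 0;
      *k = strtoumax (arg, &end, 10);
      if (errno != 0 || end != slash)
        return false;
    }

  if (!isdigit ((unsigned char) slash[1]))
    return false;
  errno = 0;
  *n = strtoumax (slash + 1, &end, 10);
  if (errno != 0 || *end != '\0')
    return false;

  /* N must be positive and K may not exceed it; K == 0 means "split
     into N files", as in the patch.  */
  return *n != 0 && *k <= *n;
}
```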
---
 src/split.c |  578 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 567 insertions(+), 11 deletions(-)

diff --git a/src/split.c b/src/split.c
index d1a0e0d..3b9abdd 100644
--- a/src/split.c
+++ b/src/split.c
@@ -17,8 +17,7 @@
 /* By address@hidden, with rms.
 
    To do:
-   * Implement -t CHAR or -t REGEX to specify break characters other
-     than newline. */
+   * Extend -t CHAR to -t REGEX */
 
 #include <config.h>
 
@@ -72,6 +71,9 @@ static int output_desc;
    output file is opened. */
 static bool verbose;
 
+/* End of line character */
+static char eol;
+
 /* For long options that have no equivalent short option, use a
    non-character as a pseudo short option, starting with CHAR_MAX + 1.  */
 enum
@@ -84,8 +86,11 @@ static struct option const longopts[] =
   {"bytes", required_argument, NULL, 'b'},
   {"lines", required_argument, NULL, 'l'},
   {"line-bytes", required_argument, NULL, 'C'},
+  {"number", required_argument, NULL, 'n'},
+  {"round-robin", required_argument, NULL, 'r'},
   {"suffix-length", required_argument, NULL, 'a'},
   {"numeric-suffixes", no_argument, NULL, 'd'},
+  {"term", required_argument, NULL, 't'},
   {"verbose", no_argument, NULL, VERBOSE_OPTION},
   {GETOPT_HELP_OPTION_DECL},
   {GETOPT_VERSION_OPTION_DECL},
@@ -116,9 +121,23 @@ Mandatory arguments to long options are mandatory for short options too.\n\
       fprintf (stdout, _("\
   -a, --suffix-length=N   use suffixes of length N (default %d)\n\
   -b, --bytes=SIZE        put SIZE bytes per output file\n\
+  -b, --bytes=/N          generate N output files\n\
+  -b, --bytes=K/N         print Kth of N chunks of file\n\
   -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file\n\
   -d, --numeric-suffixes  use numeric suffixes instead of alphabetic\n\
   -l, --lines=NUMBER      put NUMBER lines per output file\n\
+  -l, --lines=/N          generate N eol delineated output files\n\
+  -l, --lines=K/N         print Kth of N eol delineated chunks\n\
+  -n, --number=N          same as --bytes=/N\n\
+  -n, --number=K/N        same as --bytes=K/N\n\
+  -r, --round-robin=N     generate N eol delineated output files using\n\
+                            round-robin style distribution.\n\
+  -r, --round-robin=K/N   print Kth of N eol delineated chunk as -rN would\n\
+                            have generated.\n\
+  -t, --term=CHAR         specify CHAR as eol. This will also convert\n\
+                            options such as -b to their delineated\n\
+                            equivalent (-l or -C, depending on context). C\n\
+                            escape sequences are accepted.\n\
 "), DEFAULT_SUFFIX_LENGTH);
       fputs (_("\
       --verbose           print a diagnostic just before each\n\
@@ -218,13 +237,14 @@ cwrite (bool new_file_flag, const char *bp, size_t bytes)
    Use buffer BUF, whose size is BUFSIZE.  */
 
 static void
-bytes_split (uintmax_t n_bytes, char *buf, size_t bufsize)
+bytes_split (uintmax_t n_bytes, char *buf, size_t bufsize, uintmax_t max_files)
 {
   size_t n_read;
   bool new_file_flag = true;
   size_t to_read;
   uintmax_t to_write = n_bytes;
   char *bp_out;
+  uintmax_t opened = 1;
 
   do
     {
@@ -251,7 +271,8 @@ bytes_split (uintmax_t n_bytes, char *buf, size_t bufsize)
               cwrite (new_file_flag, bp_out, w);
               bp_out += w;
               to_read -= w;
-              new_file_flag = true;
+              new_file_flag = (opened++ < max_files || !max_files)?
+                              true : false;
               to_write = n_bytes;
             }
         }
@@ -277,10 +298,10 @@ lines_split (uintmax_t n_lines, char *buf, size_t bufsize)
         error (EXIT_FAILURE, errno, "%s", infile);
       bp = bp_out = buf;
       eob = bp + n_read;
-      *eob = '\n';
+      *eob = eol;
       for (;;)
         {
-          bp = memchr (bp, '\n', eob - bp + 1);
+          bp = memchr (bp, eol, eob - bp + 1);
           if (bp == eob)
             {
               if (eob != bp_out) /* do not write 0 bytes! */
@@ -340,7 +361,7 @@ line_bytes_split (size_t n_bytes)
       bp = buf + n_buffered;
       if (n_buffered == n_bytes)
         {
-          while (bp > buf && bp[-1] != '\n')
+          while (bp > buf && bp[-1] != eol)
             bp--;
         }
 
@@ -362,6 +383,330 @@ line_bytes_split (size_t n_bytes)
   free (buf);
 }
 
+/* Split into NUMBER eol chunks. */
+
+static void
+lines_chunk_split (size_t number, char *buf, size_t bufsize, size_t file_size)
+{
+  size_t n_read;
+  size_t chunk_no = 1;
+  off_t chunk_end = file_size / number - 1;
+  off_t offset = 0;
+  bool new_file_flag = true;
+  char *bp, *bp_out, *eob;
+
+  while (offset < file_size)
+    {
+      n_read = full_read (STDIN_FILENO, buf, bufsize);
+      if (n_read == SAFE_READ_ERROR)
+        error (EXIT_FAILURE, errno, "%s", infile);
+      bp = buf;
+      eob = buf + n_read;
+
+      while (1)
+        {
+          /* Begin looking for eol at last byte of chunk. */
+          bp_out = (offset < chunk_end)? bp + chunk_end - offset : bp;
+          if (bp_out > eob)
+            bp_out = eob;
+          bp_out = memchr (bp_out, eol, eob - bp_out);
+          if (!bp_out)
+            {
+              /* Buffer exhausted. */
+              cwrite (new_file_flag, bp, eob - bp);
+              new_file_flag = false;
+              offset += eob - bp;
+              break;
+           }
+          else
+            bp_out++;
+
+          cwrite (new_file_flag, bp, bp_out - bp);
+          chunk_end = (++chunk_no < number)?
+                       chunk_end + file_size / number : file_size;
+          new_file_flag = true;
+          offset += bp_out - bp;
+          bp = bp_out;
+          /* A line could have been so long that it skipped
+             entire chunks. */
+          while (chunk_end < offset)
+            {
+              chunk_end += file_size / number;
+              chunk_no++;
+              /* Create blank file: this ensures NUMBER files are
+                 created. */
+              cwrite (true, bp, 0);
+            }
+        }
+    }
+}
+
+/* Extract Nth of TOTAL chunks. */
+
+static void
+bytes_chunk_extract (size_t n, size_t total, char *buf, size_t bufsize,
+                     size_t file_size)
+{
+  off_t start = (n == 0)? 0 : (n - 1) * (file_size / total);
+  off_t end = (n == total)? file_size : n * (file_size / total);
+  ssize_t n_read;
+  size_t n_write;
+
+  while (1)
+    {
+      n_read = pread (STDIN_FILENO, buf, bufsize, start);
+      if (n_read < 0)
+        error (EXIT_FAILURE, errno, "%s", infile);
+      n_write = (start + n_read <= end)? n_read : end - start;
+      if (full_write (STDOUT_FILENO, buf, n_write) != n_write)
+        error (EXIT_FAILURE, errno, "output error");
+      start += n_read;
+      if (end <= start)
+        return;
+    }
+}
+
+/* Extract lines whose first byte is in the Nth of TOTAL chunks. */
+
+static void
+lines_chunk_extract (size_t n, size_t total, char* buf, size_t bufsize,
+                     size_t file_size)
+{
+  ssize_t n_read;
+  bool end_of_chunk = false;
+  bool skip = true;
+  char *bp = buf, *bp_out = buf, *eob;
+  off_t start;
+  off_t end;
+
+  /* For n != 1, start reading 1 byte before nth chunk of file. This is to
+     detect if the first byte of chunk is the first byte of a line. */
+  if (n == 1)
+    {
+      start = 0;
+      skip = false;
+    }
+  else
+    start = (n - 1) * (file_size / total) - 1;
+  end = (n == total)? file_size - 1 : n * (file_size / total) - 1;
+
+  do
+    {
+      n_read = pread (STDIN_FILENO, buf, bufsize, start);
+      //      fprintf (stderr, "n_read %u\n", n_read);
+      if (n_read < 0)
+        error (EXIT_FAILURE, errno, "%s", infile);
+      bp = buf;
+      bp_out = buf + n_read;
+      eob = bp_out;
+
+      /* Find starting point. */
+      if (skip)
+        {
+          bp = memchr (buf, eol, n_read);
+          if (bp && bp - buf < end - start)
+            {
+              bp++;
+              skip = false;
+            }
+          else if (!bp && start + n_read < end)
+            {
+              start += n_read;
+              continue;
+            }
+          else
+            return;
+        }
+
+      /* Find ending point. */
+      if (end < start + n_read && end == file_size - 1)
+         end_of_chunk = true;
+      else if (start + n_read >= end)
+        {
+          bp_out = (buf + end - start < buf)? buf : buf + end - start;
+          bp_out = memchr (bp_out, eol, eob - bp_out);
+          if (bp_out)
+            {
+              bp_out++;
+              end_of_chunk = true;
+            }
+          else
+            bp_out = eob;
+        }
+
+      //      fprintf (stderr, "wrote %u, %c\n", bp_out - bp, *bp);
+      if (write (STDOUT_FILENO, bp, bp_out - bp) != bp_out - bp)
+        error (EXIT_FAILURE, errno, "output error");
+      start += n_read;
+    }
+  while (!end_of_chunk);
+}
+
+
+
+typedef struct of_info
+{
+  char *of_name;
+  int ofd;
+} of_t;
+
+/* Rotates file descriptors when we're writing to more output files than we
+   have available file descriptors. */
+
+static void
+ofd_check (of_t *ofiles, size_t i, size_t n)
+{
+  if (0 < ofiles[i].ofd)
+    return;
+  else
+    {
+      int fd;
+      int j = i - 1;
+
+      /* Another process could have opened a file in between the calls to
+         close and open, so we should keep trying until open succeeds or
+         we've closed all of our files. */
+      while (1)
+        {
+          /* Attempt to open file. */
+          fd = open (ofiles[i].of_name,
+                     O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
+                     (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP
+                      | S_IROTH | S_IWOTH));
+          if (-1 < fd)
+            break;
+          /* Find an open file to close. */
+          while (ofiles[j].ofd < 0)
+            {
+              if (--j == 0)
+                j = n - 1;
+              /* No more open files to close, exit with failure. */
+              if (j == i)
+                error (EXIT_FAILURE, 0, "%s", ofiles[i].of_name);
+            }
+          close (ofiles[j].ofd);
+        }
+      ofiles[i].ofd = fd;
+    }
+}
+
+/* Divide file into N chunks in round robin fashion. */
+
+static void
+lines_rr (size_t n, char *buf, size_t bufsize)
+{
+  of_t *ofiles = xnmalloc (n, sizeof *ofiles);
+  char *bp, *bp_out, *eob;
+  size_t n_read;
+  bool eof = false;
+  size_t i;
+  bool inc;
+
+  /* Generate output file names. */
+  for (i = 0; i < n; i++)
+    {
+      next_file_name ();
+      ofiles[i].of_name = xmalloc (strlen (outfile) + 1);
+      strcpy (ofiles[i].of_name, outfile);
+      ofiles[i].ofd = -1;
+    }
+  i = 0;
+
+  do
+    {
+      n_read = full_read (STDIN_FILENO, buf, bufsize);
+      if (n_read == SAFE_READ_ERROR)
+        error (EXIT_FAILURE, errno, "%s", infile);
+      if (n_read < bufsize)
+        {
+          if (n_read == 0)
+            break;
+          eof = true;
+        }
+      bp = buf;
+      eob = buf + n_read;
+
+
+      while (bp != eob)
+        {
+          /* Find end of line. */
+          bp_out = memchr (bp, eol, eob - bp);
+          if (bp_out)
+            {
+              bp_out++;
+              inc = true;
+            }
+          else
+            bp_out = eob;
+
+          /* Secure file descriptor. */
+          ofd_check (ofiles, i, n);
+
+          if (full_write (ofiles[i].ofd, bp, bp_out - bp) != bp_out - bp)
+            error (EXIT_FAILURE, errno, "%s", ofiles[i].of_name);
+          if (inc && ++i == n)
+            i = 0;
+          bp = bp_out;
+          inc = false;
+        }
+    }
+  while (!eof);
+
+  /* Close any open file descriptors. */
+  for (i = 0; i < n; i++)
+    if (-1 < ofiles[i].ofd)
+      close (ofiles[i].ofd);
+}
+
+/* Extract Nth of TOT eol delineated, round robin distributed chunks. */
+
+static void
+lines_rr_extract (uintmax_t n, uintmax_t tot, char *buf, size_t bufsize)
+{
+  int line_no = 1;
+  char *bp, *bp_out, *eob;
+  size_t n_read;
+  bool eof = false;
+  bool inc = false;
+
+  do
+    {
+      n_read = full_read (STDIN_FILENO, buf, bufsize);
+      if (n_read == SAFE_READ_ERROR)
+        error (EXIT_FAILURE, errno, "%s", infile);
+      if (n_read != bufsize)
+        {
+          if (n_read == 0)
+            break;
+          eof = true;
+        }
+      bp = buf;
+      eob = buf + n_read;
+
+      while (bp != eob)
+        {
+          /* Find end of line. */
+          bp_out = memchr (bp, eol, eob - bp);
+          if (bp_out)
+            {
+              bp_out++;
+              inc = true;
+            }
+          else
+            bp_out = eob;
+
+          if (line_no == n
+              && full_write (STDOUT_FILENO, bp, bp_out - bp) != bp_out - bp)
+            error (EXIT_FAILURE, errno, "output error");
+          if (inc)
+            line_no = (line_no == tot)? 1 : line_no + 1;
+          bp = bp_out;
+          inc = false;
+        }
+    }
+  while (!eof);
+}
+
 #define FAIL_ONLY_ONE_WAY()                    \
   do                                \
     {                                \
@@ -370,21 +715,140 @@ line_bytes_split (size_t n_bytes)
     }                                \
   while (0)
 
+/* Parse K/N syntax of chunk options. */
+
+static void
+chunk_parse (uintmax_t *m_units, uintmax_t *n_units, char *slash)
+{
+  *slash = '\0';
+  if (slash != optarg
+      && xstrtoumax (optarg, NULL, 10, m_units, "") != LONGINT_OK
+      || SIZE_MAX < *m_units)
+    {
+      error (0, 0, _("%s: invalid chunk number"), optarg);
+      usage (EXIT_FAILURE);
+    }
+  if (xstrtoumax (++slash, NULL, 10, n_units, "") != LONGINT_OK
+      || *n_units == 0 || *n_units < *m_units || SIZE_MAX < *n_units)
+    {
+      error (0, 0, _("%s: invalid number of total chunks"), slash);
+      usage (EXIT_FAILURE);
+    }
+}
+
+/* Parse eol character for -t option.
+   TODO: support octal and hex escape sequences? */
+
+static void
+eol_parse ()
+{
+  if (*optarg == '\\')
+    switch (*(optarg+1))
+      {
+      case 'a':
+        eol = '\a';
+        break;
+
+      case 'b':
+        eol = '\b';
+        break;
+
+      case 'f':
+        eol = '\f';
+        break;
+
+      case 'n':
+        eol = '\n';
+        break;
+
+      case 'r':
+        eol = '\r';
+        break;
+
+      case 't':
+        eol = '\t';
+        break;
+
+      case 'v':
+        eol = '\v';
+        break;
+
+      case '\'':
+        eol = '\'';
+        break;
+
+      case '\"':
+        eol = '\"';
+        break;
+
+      case '?':
+        eol = '\?';
+        break;
+
+      case '\\':
+        eol = '\\';
+        break;
+
+      case '0':
+      case '1':
+      case '2':
+      case '3':
+      case '4':
+      case '5':
+      case '6':
+      case '7':
+        {
+          char *term;
+          long int tmp;
+          if (xstrtol (optarg + 1, &term, 8, &tmp, "") != LONGINT_OK
+              || tmp < 0 || 255 < tmp || 4 + optarg < term || *term != 0)
+            error (EXIT_FAILURE, 0, _("%s: invalid octal escape sequence"),
+                   optarg);
+          eol = (char) tmp;
+          break;
+        }
+
+      case 'x':
+        {
+          char *term;
+          long int tmp;
+          if (xstrtol (optarg + 2, &term, 16, &tmp, "") != LONGINT_OK
+              || tmp < 0 || 255 < tmp || 4 + optarg < term || *term != 0)
+            error (EXIT_FAILURE, 0, _("%s: invalid hex escape sequence"),
+                   optarg);
+          eol = (char) tmp;
+          break;
+        }
+
+      default:
+        error (0, 0, _("%s: invalid escape sequence"), optarg);
+        usage (EXIT_FAILURE);
+      }
+  else
+    eol = *optarg;
+}
+
+
 int
 main (int argc, char **argv)
 {
   struct stat stat_buf;
   enum
     {
-      type_undef, type_bytes, type_byteslines, type_lines, type_digits
+      type_undef, type_bytes, type_byteslines, type_lines, type_digits,
+      type_chunk_bytes, type_chunk_eol, type_rr
     } split_type = type_undef;
   size_t in_blk_size;        /* optimal block size of input file device */
   char *buf;            /* file i/o buffer */
   size_t page_size = getpagesize ();
+  uintmax_t m_units = 0;
   uintmax_t n_units;
   static char const multipliers[] = "bEGKkMmPTYZ0";
   int c;
   int digits_optind = 0;
+  size_t file_size;
+  char *slash;
+  bool eol_char = false;
 
   initialize_main (&argc, &argv);
   set_program_name (argv[0]);
@@ -404,7 +868,7 @@ main (int argc, char **argv)
       /* This is the argv-index of the option we will read next.  */
       int this_optind = optind ? optind : 1;
 
-      c = getopt_long (argc, argv, "0123456789C:a:b:dl:", longopts, NULL);
+      c = getopt_long (argc, argv, "0123456789C:a:b:c:dl:n:r:t:", longopts, NULL);
       if (c == -1)
         break;
 
@@ -426,6 +890,13 @@ main (int argc, char **argv)
         case 'b':
           if (split_type != type_undef)
             FAIL_ONLY_ONE_WAY ();
+          slash = strchr (optarg, '/');
+          if (slash)
+            {
+              split_type = type_chunk_bytes;
+              chunk_parse (&m_units, &n_units, slash);
+              break;
+            }
           split_type = type_bytes;
           if (xstrtoumax (optarg, NULL, 10, &n_units, multipliers) != LONGINT_OK
               || n_units == 0)
@@ -438,6 +909,13 @@ main (int argc, char **argv)
         case 'l':
           if (split_type != type_undef)
             FAIL_ONLY_ONE_WAY ();
+          slash = strchr (optarg, '/');
+          if (slash)
+            {
+              split_type = type_chunk_eol;
+              chunk_parse (&m_units, &n_units, slash);
+              break;
+            }
           split_type = type_lines;
           if (xstrtoumax (optarg, NULL, 10, &n_units, "") != LONGINT_OK
               || n_units == 0)
@@ -459,6 +937,42 @@ main (int argc, char **argv)
             }
           break;
 
+        case 'n':
+          if (split_type != type_undef)
+            FAIL_ONLY_ONE_WAY ();
+          split_type = type_chunk_bytes;
+          slash = strchr (optarg, '/');
+          if (slash)
+            {
+              chunk_parse (&m_units, &n_units, slash);
+              break;
+            }
+          if (xstrtoumax (optarg, NULL, 10, &n_units, "") != LONGINT_OK
+              || n_units == 0 || SIZE_MAX < n_units)
+            {
+              error (0, 0, _("%s: invalid number of chunks"), optarg);
+              usage (EXIT_FAILURE);
+            }
+          break;
+
+        case 'r':
+          if (split_type != type_undef)
+            FAIL_ONLY_ONE_WAY ();
+          split_type = type_rr;
+          slash = strchr (optarg, '/');
+          if (slash)
+            {
+              chunk_parse (&m_units, &n_units, slash);
+              break;
+            }
+          if (xstrtoumax (optarg, NULL, 10, &n_units, "") != LONGINT_OK
+              || n_units == 0 || SIZE_MAX < n_units)
+            {
+              error (0, 0, _("%s: invalid number of chunks"), optarg);
+              usage (EXIT_FAILURE);
+            }
+          break;
+
         case '0':
         case '1':
         case '2':
@@ -492,6 +1006,12 @@ main (int argc, char **argv)
           suffix_alphabet = "0123456789";
           break;
 
+        case 't':
+          eol_parse ();
+          eol_char = true;
+          fprintf (stderr, "%u\n", (uint8_t) eol);
+          break;
+
         case VERBOSE_OPTION:
           verbose = true;
           break;
@@ -505,6 +1025,17 @@ main (int argc, char **argv)
         }
     }
 
+  /* Default eol to \n if none specified. */
+  if (!eol_char)
+    eol = '\n';
+  else
+    {
+      if (split_type == type_chunk_bytes)
+        split_type = type_chunk_eol;
+      if (split_type == type_bytes)
+        split_type = type_byteslines;
+    }
+
   /* Handle default case.  */
   if (split_type == type_undef)
     {
@@ -546,10 +1077,14 @@ main (int argc, char **argv)
   output_desc = -1;
 
   /* Get the optimal block size of input device and make a buffer.  */
-
   if (fstat (STDIN_FILENO, &stat_buf) != 0)
     error (EXIT_FAILURE, errno, "%s", infile);
   in_blk_size = io_blksize (stat_buf);
+  file_size = stat_buf.st_size;
+
+  if (split_type == type_chunk_bytes || split_type == type_chunk_eol)
+    if (file_size < n_units)
+      error (EXIT_FAILURE, 0, "number of chunks exceeds file size");
 
   buf = ptr_align (xmalloc (in_blk_size + 1 + page_size - 1), page_size);
 
@@ -561,13 +1096,34 @@ main (int argc, char **argv)
       break;
 
     case type_bytes:
-      bytes_split (n_units, buf, in_blk_size);
+      bytes_split (n_units, buf, in_blk_size, 0);
       break;
 
     case type_byteslines:
       line_bytes_split (n_units);
       break;
 
+    case type_chunk_bytes:
+      if (m_units == 0)
+        bytes_split (file_size / n_units, buf, in_blk_size, n_units);
+      else
+        bytes_chunk_extract (m_units, n_units, buf, in_blk_size, file_size);
+      break;
+
+    case type_chunk_eol:
+      if (m_units == 0)
+        lines_chunk_split (n_units, buf, in_blk_size, file_size);
+      else
+        lines_chunk_extract (m_units, n_units, buf, in_blk_size, file_size);
+      break;
+
+    case type_rr:
+      if (m_units == 0)
+        lines_rr (n_units, buf, in_blk_size);
+      else
+        lines_rr_extract (m_units, n_units, buf, in_blk_size);
+      break;
+
     default:
       abort ();
     }
-- 
1.6.3.3



