bug-coreutils

Re: [PATCH] split: --chunks option


From: Pádraig Brady
Subject: Re: [PATCH] split: --chunks option
Date: Wed, 16 Dec 2009 02:22:20 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.5) Gecko/20091204 Thunderbird/3.0

On 15/12/09 22:46, Chen Guo wrote:
Hey guys,
     Alright, as of now everything that we originally talked about has been
implemented; tests were done on a 980K ASCII file while limiting buffer
size to 2 bytes (will test on binary later). Everything works great.

great stuff.

As of now, I've got the following syntax:
     -bN or --bytes=N is original usage
     -b/N or --bytes=/N is split into N equal sized files
     -bK/N or --bytes=K/N is extract Kth of N equal chunks to stdout
     -nN is equivalent to -b/N, and -nK/N is equivalent to -bK/N
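
To make the proposed syntax concrete, here is a minimal Python sketch of how the N, /N and K/N forms might be told apart and how the byte range of chunk K could be computed. The helper names are hypothetical, and letting the last chunk absorb the remainder is an assumption; the patch itself may divide the bytes differently.

```python
def parse_chunk_spec(spec):
    """Classify a -b style argument (hypothetical helper, not from split.c).

    'N'   -> ("size", None, N)     original usage: N bytes per file
    '/N'  -> ("split", None, N)    split into N equal sized files
    'K/N' -> ("extract", K, N)     extract Kth of N chunks
    """
    if "/" not in spec:
        return ("size", None, int(spec))
    k_text, n_text = spec.split("/", 1)
    n = int(n_text)
    if n <= 0:
        raise ValueError("number of chunks must be positive")
    if k_text == "":
        return ("split", None, n)
    k = int(k_text)
    if not 1 <= k <= n:
        raise ValueError("chunk number must be between 1 and N")
    return ("extract", k, n)


def chunk_bounds(file_size, k, n):
    """Return (start, end) byte offsets of chunk k (1-based) of n.

    Assumption: every chunk gets file_size // n bytes and the last
    chunk also takes whatever remainder is left over.
    """
    base = file_size // n
    start = (k - 1) * base
    end = file_size if k == n else start + base
    return start, end
```

For a 10 byte file split three ways this gives the ranges 0-3, 3-6 and 6-10, so the final chunk is one byte longer than the others.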

Right. Also, doing -n lines:4 would allow one to specify
a distribution method, which may be required. E.g. this
could be used to specify round-robin distribution of lines:

-n lines-rr:4

I haven't handled the non-seekable file case yet, but yeah this works.
As for extracting byte-chunks to stdout, I see no other way than to
read from the file's start and start outputting when the desired chunk
is read.
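
The read-from-the-start approach described above can be sketched roughly as follows in Python. The function name and the in-memory return value are assumptions for illustration; the start and end offsets are taken as given, and the tiny default buffer mirrors the 2 byte buffer used in the tests mentioned earlier.

```python
import io


def extract_chunk(stream, start, end, bufsize=2):
    """Return bytes [start, end) of stream, reading sequentially from offset 0.

    Nothing is seeked: data before `start` is read and discarded,
    and reading stops once `end` is reached or the input runs out.
    """
    out = bytearray()
    pos = 0  # offset of the next byte to be read
    while pos < end:
        buf = stream.read(min(bufsize, end - pos))
        if not buf:
            break  # input shorter than expected
        # Skip the part of this buffer that lies before `start`.
        lo = max(start - pos, 0)
        if lo < len(buf):
            out += buf[lo:]
        pos += len(buf)
    return bytes(out)


if __name__ == "__main__":
    data = io.BytesIO(b"abcdefghij")
    print(extract_chunk(data, 3, 6))
```

A real implementation would write each buffer to stdout as it goes rather than accumulating the chunk in memory.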

Also specifying other delimiters might be useful like:

-n nul:4

Actually at the top of split.c I see a TODO that talks about a -t option
which specifies a CHAR or REGEX delimiter. REGEX might be
kind of complicated, but a delimiter char stored in a global char eol should
be trivial to implement. We can leave eol = '\n' by default, and the -t
option can override it.

Right, -t is probably more general as it would also support
the existing --lines option.
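
As a rough illustration of the eol idea, this Python sketch splits input into records on a single configurable delimiter byte and then groups a fixed number of records per output, which is roughly what --lines does with '\n'. The function names are hypothetical and the sketch works on an in-memory buffer, which split itself would not do.

```python
def split_records(data, eol=b"\n"):
    """Split bytes into records, each keeping its trailing delimiter.

    `eol` defaults to newline; a -t style option would override it.
    A final record without a trailing delimiter is kept as-is.
    """
    if len(eol) != 1:
        raise ValueError("delimiter must be exactly one byte")
    records = []
    start = 0
    while start < len(data):
        idx = data.find(eol, start)
        end = len(data) if idx == -1 else idx + 1
        records.append(data[start:end])
        start = end
    return records


def group_records(records, per_file):
    """Concatenate records into outputs of `per_file` records each."""
    return [b"".join(records[i:i + per_file])
            for i in range(0, len(records), per_file)]
```

With eol set to b"\0", the same two functions would handle NUL-terminated input such as the output of find -print0.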

But then this raises the question... How would you enter, say, '\0' into
the terminal? And the way I know of entering newline is rather awkward:
-t '
'

I'd probably use escapes like the join command does.
For example, it supports: -t '\0'

For reference, bash and ksh support ANSI C quoting like $'\0',
so you could specify -t $'\0'. More generally one could do
-t $(printf '\0'), though I wouldn't depend on those being available,
and passing NULs through the command line, at least, will be problematic.
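
Handling the escape inside the program avoids the NUL-on-the-command-line problem, since the argument arrives as the two characters backslash and zero. A minimal Python sketch of such parsing follows; only '\0' comes from the discussion above, while the other escapes in the table and the function name are assumptions added for illustration.

```python
# Two-character escape spellings mapped to the byte they stand for.
# Only "\0" is taken from the thread; the rest are assumed extras.
_ESCAPES = {
    "\\0": b"\0",
    "\\n": b"\n",
    "\\t": b"\t",
    "\\\\": b"\\",
}


def parse_delimiter(arg):
    """Turn a -t style argument into a single delimiter byte."""
    if arg in _ESCAPES:
        return _ESCAPES[arg]
    raw = arg.encode()
    if len(raw) != 1:
        raise ValueError("delimiter must be one byte or a known escape")
    return raw
```

This way -t '\0' works in any shell, without relying on $'...' quoting being available.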

And last thing, would I be wrong to say we can't support splitting by
chunks with stdin? Barring of course, the round robin line splitting.

Right, that's all I can see possible for non-seekable files.
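
Round robin works on a pipe because no total size is needed up front: each line goes to the next output in turn as it arrives. A small Python sketch, with in-memory buffers standing in for the output files and a hypothetical function name:

```python
import io


def round_robin_lines(stream, n):
    """Distribute lines from a binary stream over n outputs in turn.

    Line i (0-based) goes to output i % n. The stream is read once,
    front to back, so it does not need to be seekable.
    """
    outputs = [bytearray() for _ in range(n)]
    for i, line in enumerate(stream):
        outputs[i % n] += line
    return [bytes(o) for o in outputs]


if __name__ == "__main__":
    parts = round_robin_lines(io.BytesIO(b"1\n2\n3\n4\n5\n"), 2)
    print(parts)
```

The outputs will generally differ in byte size, which is why this only makes sense for line-based distribution and not for equal byte chunks.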

cheers,
Pádraig.



