help-bash

Re: [Help-bash] Is there a way to read the first empty field in a TSV in


From: Greg Wooledge
Subject: Re: [Help-bash] Is there a way to read the first empty field in a TSV input?
Date: Fri, 6 Oct 2017 08:51:34 -0400
User-agent: NeoMutt/20170113 (1.7.2)

On Thu, Oct 05, 2017 at 11:37:35PM +0200, Felipe Salvador wrote:
> On Thu, Sep 28, 2017 at 12:05:10PM -0400, Greg Wooledge wrote:
> > 2) Convert the separators into some other character that bash treats
> >    the way you want.
> > 
> > For example:
> > 
> > IFS=$'\005' read -r a b c < <(tr $'\t' $'\005' <<< $'\tx\ty')
> 
> 
> IFS=$'\005' read -r A B C D < <(tr $'\t' $'\005' <<< $'\t2\t4')
> 
> Hi,
> I'm a bit confused, is IFS=$'\005' treated as a field?
> If so I would expect $A= ,$B=2,$C= ,$D=4 and so on...

IFS is a list of characters that may act as field separators/terminators.
In this example, IFS has been set to the single character 0x05 (Ctrl-E,
or ASCII "ENQ").  0x05 is the only character that 'read' will use to
separate input fields.

The input that's sent to read is a stream of 5 characters:

0x05 2 0x05 4 \n

Therefore 'read' will split it into fields as follows:

first field empty
second field '2'
third field '4'
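
To see this for yourself, here is a small sketch (assuming bash, and
that od is available for the byte dump) that shows what tr emits and
what read then assigns.  With four variables and only three fields,
the last variable is simply left empty:

```shell
# Dump the bytes that tr hands to read: 0x05, '2', 0x05, '4', newline.
tr '\t' '\005' <<< $'\t2\t4' | od -An -c

# Split on 0x05 only.  A leading 0x05 yields an empty first field.
IFS=$'\005' read -r A B C D < <(tr '\t' '\005' <<< $'\t2\t4')
declare -p A B C D
```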

> But I get $A= ,$B=2,$C=4,$D=

That's correct.

Remember, the entire PURPOSE of this example was to take a tab-separated-
value input file and parse it in such a way that there can be empty
fields before/between the tabs.

The way bash normally handles tabs in IFS is to consider a sequence of
multiple tabs as a single field separator, and to ignore leading and
trailing tabs.
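
For comparison, a short sketch of that default behavior, feeding the
raw tabs straight to read with IFS set to a tab (the variable names
are arbitrary):

```shell
# Tab is IFS whitespace: the leading tab is dropped, so '2' lands in a.
IFS=$'\t' read -r a b c d <<< $'\t2\t4'
declare -p a b c d
# declare -- a="2"
# declare -- b="4"
# declare -- c=""
# declare -- d=""

# Two tabs in a row count as one separator, so no empty middle field.
IFS=$'\t' read -r x y z <<< $'1\t\t3'
declare -p x y z
# declare -- x="1"
# declare -- y="3"
# declare -- z=""
```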

The OP wanted each tab to be significant, as if they were commas or
pipe signs or colons.

The proposed workaround is to transform the tabs into something that
isn't treated as whitespace by bash.

If this example isn't clicking for you, then let's try colons instead
of 0x05 characters.  (This can't be used if there are possibly colons
in the actual input data.)

wooledg:~$ tr '\t' : <<< $'\t2\t4'
:2:4
wooledg:~$ IFS=: read -r a b c d < <(tr '\t' : <<< $'\t2\t4')
wooledg:~$ declare -p a b c d
declare -- a=""
declare -- b="2"
declare -- c="4"
declare -- d=""

Using 0x05 instead of : is exactly the same, except that 0x05 is a bit
less likely to appear in the actual input.
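
The same trick extends to a whole file.  This is only a sketch:
parse_tsv is a made-up helper name, the three column names and the
sample rows are invented for illustration, and it assumes the data
never contains a 0x05 byte:

```shell
# parse_tsv (hypothetical helper): read TSV on stdin, one record per
# line, and print each of three fields in angle brackets.
parse_tsv() {
  local name qty note
  # Turn every tab into 0x05 so that each one is a significant separator.
  tr '\t' '\005' |
  while IFS=$'\005' read -r name qty note; do
    printf '<%s> <%s> <%s>\n' "$name" "$qty" "$note"
  done
}

# Invented sample input: an empty middle field, then an empty first
# and last field.
printf 'apple\t\tred\n\t3\t\n' | parse_tsv
```

For real input you would replace the printf with a redirection from
your file, e.g. parse_tsv < yourfile.tsv.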

> , while echoing \"$'\t2\t4'\"
> return "       2       4" correctly.

What are you trying to do, exactly?


