From: Felipe Salvador
Subject: Re: [Help-bash] Is there a way to read the first empty field in a TSV input?
Date: Sat, 7 Oct 2017 16:39:34 +0200
User-agent: NeoMutt/20170113 (1.7.2)

On Fri, Oct 06, 2017 at 08:51:34AM -0400, Greg Wooledge wrote:
> On Thu, Oct 05, 2017 at 11:37:35PM +0200, Felipe Salvador wrote:
> > On Thu, Sep 28, 2017 at 12:05:10PM -0400, Greg Wooledge wrote:
> > > 2) Convert the separators into some other character that bash treats
> > >    the way you want.
> > > 
> > > For example:
> > > 
> > > IFS=$'\005' read -r a b c < <(tr $'\t' $'\005' <<< $'\tx\ty')
> > 
> > 
> > IFS=$'\005' read -r A B C D < <(tr $'\t' $'\005' <<< $'\t2\t4')
> > 
> > Hi,
> > I'm a bit confused, is IFS=$'\005' treated as a field?
> > If so I would expect $A= ,$B=2,$C= ,$D=4 and so on...
> 
> IFS is a list of characters that may act as field separators/terminators.
> In this example, IFS has been set to the single character 0x05 (Ctrl-E,
> or ASCII "ENQ").  0x05 is the only character that 'read' will use to
> separate input fields.
> 
> The input that's sent to read is a stream of 5 characters:
> 
> 0x05 2 0x05 4 \n
> 
> Therefore 'read' will split it into fields as follows:
> 
> first field empty
> second field '2'
> third field '4'
> 
> > But I get $A= ,$B=2,$C=4,$D=
> 
> That's correct.
> 
> Remember, the entire PURPOSE of this example was to take a tab-separated-
> value input file and parse it in such a way that there can be empty
> fields before/between the tabs.
> 
> The way bash normally handles tabs in IFS is to consider a sequence of
> multiple tabs as a single field separator, and to ignore leading and
> trailing tabs.
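
The difference described above can be seen side by side. This is only a
minimal sketch reusing the sample input from this thread; the variable
names (line, a b c, x y z) are arbitrary and not from the original posts.

```bash
#!/usr/bin/env bash
# Same input both times: an empty first field, then "2", then "4".
line=$'\t2\t4'

# Tab is IFS whitespace: the leading tab is ignored, so the values shift left.
IFS=$'\t' read -r a b c <<< "$line"
printf 'tab IFS:  a=[%s] b=[%s] c=[%s]\n' "$a" "$b" "$c"

# 0x05 is not whitespace: every occurrence ends a field, so the empty
# first field survives.
IFS=$'\005' read -r x y z < <(tr $'\t' $'\005' <<< "$line")
printf '0x05 IFS: x=[%s] y=[%s] z=[%s]\n' "$x" "$y" "$z"
```

With tab in IFS the "2" lands in the first variable; after the conversion
it lands in the second, behind the empty field.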

I was wrongly considering $'\005' as a field rather than a separator,
as below:

$'\005'|  2  |$'\005'|  4  

Now I get it:

$ IFS=$'\005' read -r A B C D E F G < <(tr $'\t' $'\005' <<< $'\t2\t\t4\t\t6\t\t')

$ declare -p A B C D E F G
declare -- A=""
declare -- B="2"
declare -- C=""
declare -- D="4"
declare -- E=""
declare -- F="6"
declare -- G=""
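
The same idea should carry over to input with several lines. The following
is a rough sketch under assumptions: the sample data in "tsv", the column
variables c1 c2 c3 and the "rows" array are all made up for illustration,
and the input is assumed to contain no 0x05 bytes of its own.

```bash
#!/usr/bin/env bash
# Hypothetical sample data: three columns per line, some of them empty.
tsv=$'\tx\ty\np\t\tr\n1\t2\t3'

rows=()
# tr converts the whole stream once; read then splits each line on 0x05.
while IFS=$'\005' read -r c1 c2 c3; do
    rows+=("[$c1|$c2|$c3]")
done < <(tr $'\t' $'\005' <<< "$tsv")

# One bracketed row per input line, columns separated by "|".
printf '%s\n' "${rows[@]}"
```

Feeding the loop through process substitution rather than a pipe keeps
"rows" in the current shell, so it is still set after the loop ends.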



> The OP wanted each tab to be significant, as if they were commas or
> pipe signs or colons.
> 
> The proposed workaround is to transform the tabs into something that
> isn't treated as whitespace by bash.
> 
> If this example isn't clicking for you, then let's try colons instead
> of 0x05 characters.  (This can't be used if there are possibly colons
> in the actual input data.)
> 
> wooledg:~$ tr '\t' : <<< $'\t2\t4'
> :2:4
> wooledg:~$ IFS=: read -r a b c d < <(tr '\t' : <<< $'\t2\t4')
> wooledg:~$ declare -p a b c d
> declare -- a=""
> declare -- b="2"
> declare -- c="4"
> declare -- d=""
> 
> Using 0x05 instead of : is exactly the same, except that 0x05 is a bit
> less likely to appear in the actual input.
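
Since "less likely" is not "never", one possible safeguard is to check the
input for the stand-in character before converting. This is only a sketch:
the function name split_tsv_line is hypothetical and not something proposed
in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split one TSV line into the named variables,
# refusing input that already contains the stand-in separator 0x05.
# Usage: split_tsv_line LINE VAR...
# (Do not pass "line" as a VAR name; it is local to the function.)
split_tsv_line() {
    local line=$1
    shift
    if [[ $line == *$'\005'* ]]; then
        echo "input already contains 0x05; pick another separator" >&2
        return 1
    fi
    IFS=$'\005' read -r "$@" < <(tr $'\t' $'\005' <<< "$line")
}

# Same input as the colon example above.
split_tsv_line $'\t2\t4' a b c d && declare -p a b c d
```

The variables are assigned in the caller's scope because read sets them by
name and the function does not declare them local.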
> 
> > , while echoing \"$'\t2\t4'\"
> > return "       2       4" correctly.
> 
> What are you trying to do, exactly?

I'm trying to learn something more, for the sake of knowledge.

Thank you very much Greg, for your patience and your thorough
explanation.

-- 
Felipe Salvador


