Re: [Help-bash] Is there a way to read the first empty field in a TSV input?
From: Felipe Salvador
Subject: Re: [Help-bash] Is there a way to read the first empty field in a TSV input?
Date: Sat, 7 Oct 2017 16:39:34 +0200
User-agent: NeoMutt/20170113 (1.7.2)
On Fri, Oct 06, 2017 at 08:51:34AM -0400, Greg Wooledge wrote:
> On Thu, Oct 05, 2017 at 11:37:35PM +0200, Felipe Salvador wrote:
> > On Thu, Sep 28, 2017 at 12:05:10PM -0400, Greg Wooledge wrote:
> > > 2) Convert the separators into some other character that bash treats
> > > the way you want.
> > >
> > > For example:
> > >
> > > IFS=$'\005' read -r a b c < <(tr $'\t' $'\005' <<< $'\tx\ty')
> >
> >
> > IFS=$'\005' read -r A B C D < <(tr $'\t' $'\005' <<< $'\t2\t4')
> >
> > Hi,
> > I'm a bit confused, is IFS=$'\005' treated as a field?
> > If so I would expect $A= ,$B=2,$C= ,$D=4 and so on...
>
> IFS is a list of characters that may act as field separators/terminators.
> In this example, IFS has been set to the single character 0x05 (Ctrl-E,
> or ASCII "ENQ"). 0x05 is the only character that 'read' will use to
> separate input fields.
>
> The input that's sent to read is a stream of 5 characters:
>
> 0x05 2 0x05 4 \n
>
> Therefore 'read' will split it into fields as follows:
>
> first field empty
> second field '2'
> third field '4'
>
> > But I get $A= ,$B=2,$C=4,$D=
>
> That's correct.
>
> Remember, the entire PURPOSE of this example was to take a tab-separated-
> value input file and parse it in such a way that there can be empty
> fields before/between the tabs.
>
> The way bash normally handles tabs in IFS is to consider a sequence of
> multiple tabs as a single field separator, and to ignore leading and
> trailing tabs.
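For contrast, here is a minimal sketch of that default behaviour, with IFS
set to a literal tab and the same kind of input. The variable names are
only illustrative.

```shell
#!/usr/bin/env bash
# Tab is an IFS whitespace character, so 'read' strips the leading tab
# and treats the run of two tabs as a single separator.
IFS=$'\t' read -r a b c d <<< $'\t2\t\t4'
declare -p a b c d
```

This leaves a="2", b="4" and c, d empty, so the empty first and third
fields are lost.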
I was wrongly considering $'\005' as a field rather than a separator,
as below:
$'\005'| 2 |$'\005'| 4
Now I get it:
$ IFS=$'\005' read -r A B C D E F G < <(tr $'\t' $'\005' <<< $'\t2\t\t4\t\t6\t\t')
$ declare -p A B C D E F G
declare -- A=""
declare -- B="2"
declare -- C=""
declare -- D="4"
declare -- E=""
declare -- F="6"
declare -- G=""
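The same idea should carry over to input with several lines. The following
is only a sketch: the sample data in 'tsv' and the names c1, c2, c3 and
'rows' are made up for illustration, and it assumes the data never contains
a 0x05 byte.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: two rows of three tab-separated columns,
# each row with one empty field.
tsv=$'\tx\ty\n1\t\t3\n'

rows=()
# Translate every tab to 0x05 so that each one counts as its own
# separator, then read the stream one line at a time.
while IFS=$'\005' read -r c1 c2 c3; do
    rows+=("[$c1][$c2][$c3]")
done < <(printf '%s' "$tsv" | tr '\t' '\005')

printf '%s\n' "${rows[@]}"
```

This prints [][x][y] and [1][][3], so the empty fields keep their positions
in both rows.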
> The OP wanted each tab to be significant, as if they were commas or
> pipe signs or colons.
>
> The proposed workaround is to transform the tabs into something that
> isn't treated as whitespace by bash.
>
> If this example isn't clicking for you, then let's try colons instead
> of 0x05 characters. (This can't be used if there are possibly colons
> in the actual input data.)
>
> wooledg:~$ tr '\t' : <<< $'\t2\t4'
> :2:4
> wooledg:~$ IFS=: read -r a b c d < <(tr '\t' : <<< $'\t2\t4')
> wooledg:~$ declare -p a b c d
> declare -- a=""
> declare -- b="2"
> declare -- c="4"
> declare -- d=""
>
> Using 0x05 instead of : is exactly the same, except that 0x05 is a bit
> less likely to appear in the actual input.
>
> > , while echoing \"$'\t2\t4'\"
> > returns " 2 4" correctly.
>
> What are you trying to do, exactly?
I'm trying to learn something more, for the sake of knowledge.
Thank you very much Greg, for your patience and your thorough
explanation.
--
Felipe Salvador