[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
quoted fields and decimal separators
From: |
code |
Subject: |
quoted fields and decimal separators |
Date: |
Mon, 03 Jan 2022 08:49:07 +0100 |
User-agent: |
Roundcube Webmail/1.2.3 |
Hello,
Thanks for datamash, it totally made my day. I am trying to proof to my
wife that I can do on the commandline what she can do on a spradsheet
and datamash has been of great help.
I have run into difficulties when analyzing some bank statements
unfortunately.
1. They are automatically generated by the bank and therefore all fields
are quoted
(even fields that contain pure numbers).
2. Also (since it is a german bank) it uses "," for a decimal separator
instead of ".".
This is not a german problem though, many countries use commas
instead of dots (see
https://en.wikipedia.org/wiki/Decimal_separator#Usage_worldwide )
At the end of this mail you will find some example csvs to illustrate
the point.
I was able to solve my problems with some sed/tr magic (I just stripped
away the quoting and replaced all ","-characters.)
But it was neither easy, fast, elegant nor stable against more
complicated input.
It would probably have failed completely if the text fields had
contained any of the following characters: ,;"
I therefore would like to see the following features in datamash:
1. allow to specify a quoting character for fields (default none)
2. allow to specify a character that is then used as a decimal separator
(ideally in both input and output)
I would like to hear your thoughts on this.
Thanks a lot,
Johannes
Original Format of the bank statement (heavily censored so that I am
comfortable sharing it, but still formatted like a MT940 CSV).
What I want to do is this to sum up the "Betrag" column, gouped bt the
"Buchungstext": datamash -H groupby 4 sum 9
"Auftragskonto";"Buchungstag";"Valutadatum";"Buchungstext";"Verwendungszweck";"Beguenstigter/Zahlungspflichtiger";"Kontonummer";"BLZ";"Betrag";"Waehrung";"Info"
"DE123456";"07.12.21";"07.12.21";"BARGELDAUSZAHLUNG";"07.12/12.05UHR
Somewhere";"GA NR00000000 BLZ123456
0";"";"7654321";"-50,00";"EUR";"Umsatz gebucht"
"DE123456";"06.12.21";"06.12.21";"GUTSCHR.
UEBERWEISUNG";"SVWZ+Taschengeld";"Johannes";"DE654321";"G123456";"10,00";"EUR";"Umsatz
gebucht"
"DE123456";"03.12.21";"03.12.21";"KARTENZAHLUNG";"SVWZ+2021-12-01T08.37
Something";"VERKEHRS AG
SOMEWHERE";"DE123456";"ZZZYYYXXX";"-21,98";"EUR";"Umsatz gebucht"
"DE123456";"01.12.21";"01.12.21";"ENTGELTABSCHLUSS";"Entgeltabrechnungsiehe
Anlage";"";"";"7654321";"-4,90";"EUR";"Umsatz gebucht"
"DE123456";"04.11.21";"04.11.21";"GUTSCHR.
UEBERWEISUNG";"SVWZ+Taschengeld";"Johannes";"DE654321";"G123456";"10,00";"EUR";"Umsatz
gebucht"
"DE123456";"02.11.21";"01.11.21";"DAUERAUFTRAG";"SVWZ+Spende Lage der
Nation";"Lage der Nation Media GmbH& Co.
KG";"NL123456";"BUNABCDEF";"-5,00";"EUR";"Umsatz gebucht"
"DE123456";"02.11.21";"01.11.21";"DAUERAUFTRAG";"SVWZ+Spende
Qutebrowser";"Qutebrowser";"CH123456";"POABCDEF";"-5,00";"EUR";"Umsatz
gebucht"
"DE123456";"02.11.21";"01.11.21";"DAUERAUFTRAG";"SVWZ+Spende
Netzpolitik.org";"netzpolitik.org e.
V.";"DE123456";"GENABCDEF";"-7,50";"EUR";"Umsatz gebucht"
"DE123456";"02.11.21";"30.10.21";"ENTGELTABSCHLUSS";"Entgeltabrechnungsiehe
Anlage";"";"";"123456";"-4,90";"EUR";"Umsatz gebucht"
A more minimal example (I want datamash -H groupby 1 sum 2):
"method";"money_difference"
"transfer";"10,00"
"credit card";"-4,90"
"transfer";"10,00"
An evil example that makes scripting around more difficult (I want
datamash -H groupby 2 sum 3):
"evil_string";"method";"money_difference"
"say "hello"";"transfer, immediately";"10,00"
";";"credit card, at some point";"-4,90"
",";"transfer, immediately";"10,00"
- quoted fields and decimal separators,
code <=