When I first started poking around the datamash source, I was surprised to see that it uses long double internally for numeric values rather than double. On x86 systems where long double is the 80-bit extended type, arithmetic has to go through the legacy x87 FPU, so compilers can't use SSE/AVX instructions or auto-vectorize loops. On 64-bit ARM (at least on Linux), I think it ends up using software emulation of a 128-bit type, which is even slower. I'm not really convinced the extra precision is worth it...
What's the reasoning for using long double?
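
For reference, here is a minimal C sketch of the trade-off I'm asking about. It is not taken from datamash: the function names (`extra_mantissa_bits`, `sum_double`, `sum_long_double`) are made up for illustration, and it only assumes a hosted C99 compiler. It reports how many extra significand bits long double provides on the current platform and sums the same input with both accumulator types.

```c
#include <float.h>
#include <stddef.h>

/* Extra significand bits long double has over double on this platform.
   Typically 11 for the x87 80-bit type (64 vs 53), 60 for a 128-bit
   quad type (113 vs 53), and 0 where long double is the same as double. */
int extra_mantissa_bits(void)
{
    return LDBL_MANT_DIG - DBL_MANT_DIG;
}

/* Naive running sum with a double accumulator. */
double sum_double(const double *v, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Same naive running sum, but with a long double accumulator
   (hypothetical stand-in for the approach described above). */
long double sum_long_double(const double *v, size_t n)
{
    long double acc = 0.0L;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}
```

Calling both sum functions on something like `{1e16, 1.0, 1.0, 1.0, 1.0}` shows where the extra bits matter: with SSE double arithmetic on x86-64 I'd expect the small terms to be rounded away, while an 80-bit accumulator keeps them. Whether that gain justifies the slower code path is exactly my question.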