Rounding floats from 64bit to 32bit (double to single) with 0.5 rule
From: hale812
Subject: Rounding floats from 64bit to 32bit (double to single) with 0.5 rule
Date: Sun, 25 Dec 2016 23:25:48 -0800 (PST)
It seems that the single() function converts an IEEE 754 double to single
precision by simply dropping the low-order significand bits, without rounding.
This becomes a problem of error accumulation when converting data for a
32-bit DSP with a long computation path.
For better results, the number should be rounded to the single-precision
layout (1 sign bit, 8 exponent bits, 23 significand bits) in its binary
representation before the extra bits are truncated.
Is there a tool in Octave for a rounded conversion to single, or at least for
binary rounding of a double (leaving the bits beyond single precision as
zeroes in the double)?
--
View this message in context:
http://octave.1599824.n4.nabble.com/Rounding-floats-from-64bit-to-32bit-double-to-single-with-0-5-rule-tp4681146.html
Sent from the Octave - General mailing list archive at Nabble.com.