[PATCH] Narrow UTF-16 and UTF-32 to ASCII when loading .mat files.
From: Jason Riedy
Subject: [PATCH] Narrow UTF-16 and UTF-32 to ASCII when loading .mat files.
Date: Thu, 27 Sep 2007 16:12:29 -0700
User-agent: Gnus/5.110007 (No Gnus v0.7) Emacs/23.0.50 (gnu/linux)
This is somewhat nasty, but some way of reading UTF-16 and UTF-32
character data is necessary for loading the UF sparse matrix
collection. Rather than play with iconv and "true" conversions,
just carry along the convention of replacing out-of-ASCII-range
entries with '?'.
Signed-off-by: Jason Riedy <address@hidden>
---
Note: With this patch, the main CVS branch can load every
matrix from the UF collection using Dr. Davis's UFget
interface. Kinda handy.
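For readers who want to try the narrowing convention outside of Octave, here is a minimal standalone sketch. The names narrow_to_ascii and units_to_string are hypothetical and do not appear in the patch; a std::vector<double> stands in for the array the loader fills, on the assumption that each element holds one UTF-16 or UTF-32 code unit.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of Octave: replace every code unit outside
// the 7-bit ASCII range with '?' and report whether anything was replaced.
bool narrow_to_ascii (std::vector<double>& units)
{
  bool found_big_char = false;
  for (std::size_t i = 0; i < units.size (); i++)
    {
      if (units[i] > 127)
        {
          units[i] = '?';
          found_big_char = true;
        }
    }
  return found_big_char;
}

// Hypothetical helper: build a std::string from already-narrowed units.
std::string units_to_string (const std::vector<double>& units)
{
  std::string s;
  for (double u : units)
    s += static_cast<char> (u);
  return s;
}
```

Because the replacement works per code unit, a character stored as a UTF-16 surrogate pair would presumably come out as two '?' rather than one. The caller can use the returned flag to emit a single warning instead of one per character.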
src/ls-mat5.cc | 18 ++++++++++++++++--
1 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/src/ls-mat5.cc b/src/ls-mat5.cc
index a9128ed..57e10e7 100644
--- a/src/ls-mat5.cc
+++ b/src/ls-mat5.cc
@@ -140,6 +140,7 @@ read_mat5_binary_data (std::istream& is, double *data,
read_doubles (is, data, LS_SHORT, count, swap, flt_fmt);
break;
+ case miUTF16:
case miUINT16:
read_doubles (is, data, LS_U_SHORT, count, swap, flt_fmt);
break;
@@ -148,6 +149,7 @@ read_mat5_binary_data (std::istream& is, double *data,
read_doubles (is, data, LS_INT, count, swap, flt_fmt);
break;
+ case miUTF32:
case miUINT32:
read_doubles (is, data, LS_U_INT, count, swap, flt_fmt);
break;
@@ -1251,8 +1253,20 @@ read_mat5_binary_element (std::istream& is, const std::string& filename,
{
if (type == miUTF16 || type == miUTF32)
{
- error ("load: can not read Unicode UTF16 and UTF32 encoded characters");
- goto data_read_error;
+ bool found_big_char = false;
+ for (int i = 0; i < n; i++)
+ {
+ if (re(i) > 127) {
+ re(i) = '?';
+ found_big_char = true;
+ }
+ }
+
+ if (found_big_char)
+ {
+ warning ("load: can not read non-ASCII portions of UTF characters.");
+ warning (" Replacing unreadable characters with '?'.");
+ }
}
else if (type == miUTF8)
{
--
1.5.3.2