bug-gnu-pspp



From: Ben Pfaff
Subject: PSPP-BUG: [bug #61809] Import dialog: Preview shows correct data but import results in empty data view (Text Delimeter)
Date: Sun, 10 Apr 2022 22:29:38 -0400 (EDT)

Follow-up Comment #3, bug #61809 (project pspp):

The assistant also guesses the wrong delimiters for this file.  The heuristic
that the code uses to guess delimiters is to count the number of times each
potential delimiter occurs on each line, then select the delimiter whose
occurrences are most consistent (the one whose most common per-line count is
shared by the most lines).  The idea is that, if some delimiter always
occurs the same number of times, then it's probably the delimiter, even if it
didn't occur the greatest number of times overall.
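The heuristic described above can be sketched in C as follows. This is a hypothetical, simplified reconstruction, not PSPP's code: the names `count_char`, `consistency_score`, and `choose_separator` are invented for illustration. The sketch also makes three assumptions of its own: it picks a single separator, it ignores lines on which a candidate does not occur at all, and it breaks ties in favor of the earlier candidate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the "most consistent count" delimiter heuristic.
   Names are illustrative; this is not PSPP's implementation. */

/* Number of times character C occurs in LINE. */
static int
count_char (const char *line, char c)
{
  int n = 0;
  for (; *line != '\0'; line++)
    if (*line == c)
      n++;
  return n;
}

/* Consistency score for candidate separator C: the number of lines that
   share C's most common nonzero per-line count.  Lines where C does not
   occur are ignored (an assumption of this sketch). */
static size_t
consistency_score (const char *const *lines, size_t n_lines, char c)
{
  size_t best = 0;
  for (size_t i = 0; i < n_lines; i++)
    {
      int ci = count_char (lines[i], c);
      if (ci == 0)
        continue;
      /* How many lines contain C exactly CI times? */
      size_t same = 0;
      for (size_t j = 0; j < n_lines; j++)
        if (count_char (lines[j], c) == ci)
          same++;
      if (same > best)
        best = same;
    }
  return best;
}

/* Index into SEPS of the candidate with the highest consistency score, or
   -1 if no candidate occurs at all.  Ties go to the earlier candidate. */
static int
choose_separator (const char *const *lines, size_t n_lines,
                  const char *seps, size_t n_seps)
{
  assert (lines != NULL || n_lines == 0);
  int best_idx = -1;
  size_t best_score = 0;
  for (size_t k = 0; k < n_seps; k++)
    {
      size_t score = consistency_score (lines, n_lines, seps[k]);
      if (score > best_score)
        {
          best_score = score;
          best_idx = (int) k;
        }
    }
  return best_idx;
}
```

For the lines "a;b;c:d", "a;b:c", and "a;b;c;d:e" with candidates ";:", semicolon scores 1 (its per-line counts 2, 1, 3 all differ) while colon scores 3. Colon therefore wins even though semicolons are twice as numerous, which is the same effect reported here.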

For this file, if we apply the following debugging patch, which prints the
per-line counts for each candidate delimiter,


diff --git a/src/ui/gui/psppire-import-textfile.c b/src/ui/gui/psppire-import-textfile.c
index f73280b66b..a58cddb235 100644
--- a/src/ui/gui/psppire-import-textfile.c
+++ b/src/ui/gui/psppire-import-textfile.c
@@ -145,6 +145,8 @@ choose_likely_separators (PsppireImportAssistant *ia)
           struct separator_count_node *next;
           HMAP_FOR_EACH_SAFE (cn, next, struct separator_count_node, node, &count_map[j])
             {
+              printf ("%d occurrences of %s happened %d times\n",
+                      cn->occurance, separators[j].name, cn->quantity);
               if (largest < cn->quantity)
                 {
                   largest = cn->quantity;


we see that colon occurs the same number of times on every line counted (6
times on each of the 5 lines), so it gets chosen. The file has many more
semicolons, but their per-line count varies slightly, between 87 and 88:


14 occurrences of space happened 2 times
19 occurrences of space happened 2 times
15 occurrences of space happened 1 times
6 occurrences of colon happened 5 times
12 occurrences of comma happened 1 times
9 occurrences of comma happened 2 times
7 occurrences of comma happened 2 times
12 occurrences of hyphen happened 2 times
11 occurrences of hyphen happened 3 times
88 occurrences of semicolon happened 2 times
87 occurrences of semicolon happened 3 times
4 occurrences of slash happened 3 times
5 occurrences of slash happened 2 times
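The choice can be checked directly against this output. The following C sketch is a hypothetical illustration, not PSPP code: `struct tally`, `best_lines_for`, `total_lines_for`, and `pick_separator` are invented names. It encodes the printed counts and applies the rule suggested by the `largest < cn->quantity` comparison in the patch context. For each separator it takes the largest number of lines sharing one count, then picks the separator with the largest such number. Every separator occurs on all five lines counted, but only colon has a single count (6) that covers all five. Semicolon's best count, 87, covers only three.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One line of the debug output: OCCURRENCES of separator SEP per line were
   seen on LINES of the lines counted. */
struct tally
{
  const char *sep;
  int occurrences;
  int lines;
};

/* The counts printed by the debugging patch for this file. */
static const struct tally observed[] = {
  { "space", 14, 2 },     { "space", 19, 2 },     { "space", 15, 1 },
  { "colon", 6, 5 },
  { "comma", 12, 1 },     { "comma", 9, 2 },      { "comma", 7, 2 },
  { "hyphen", 12, 2 },    { "hyphen", 11, 3 },
  { "semicolon", 88, 2 }, { "semicolon", 87, 3 },
  { "slash", 4, 3 },      { "slash", 5, 2 },
};
#define N_OBSERVED (sizeof observed / sizeof *observed)

/* Largest number of lines that share a single per-line count of SEP. */
static int
best_lines_for (const struct tally *t, size_t n, const char *sep)
{
  int best = 0;
  for (size_t i = 0; i < n; i++)
    if (strcmp (t[i].sep, sep) == 0 && t[i].lines > best)
      best = t[i].lines;
  return best;
}

/* Total number of lines on which SEP occurs at all. */
static int
total_lines_for (const struct tally *t, size_t n, const char *sep)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    if (strcmp (t[i].sep, sep) == 0)
      total += t[i].lines;
  return total;
}

/* Separator whose best single count covers the most lines; the entry
   listed first wins a tie (an assumption of this sketch). */
static const char *
pick_separator (const struct tally *t, size_t n)
{
  assert (t != NULL && n > 0);
  const char *best = t[0].sep;
  int best_lines = 0;
  for (size_t i = 0; i < n; i++)
    if (t[i].lines > best_lines)
      {
        best_lines = t[i].lines;
        best = t[i].sep;
      }
  return best;
}
```

Note that the rule only looks at how many lines share one exact count. It has no notion of "nearly constant", so a delimiter that varies by one between lines loses to a rarer one that never varies.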


Maybe we can find a better heuristic.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?61809>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



