guix-devel

Re: Help with sxml simple parser for the quicklisp importer


From: swedebugia
Subject: Re: Help with sxml simple parser for the quicklisp importer
Date: Wed, 23 Jan 2019 17:32:20 +0100

On 2019-01-23 16:58, Ricardo Wurmus wrote:

> swedebugia <address@hidden> writes:
>
>>> The second “link” tag opens but is never closed.  This may be valid
>>> HTML, but it is not valid XML, which is what xml->sxml expects.
>>
>> Thanks for the quick answer!
>> I will try to remove this line before handing the HTML over to the parser.
>
> I would recommend looking for a better source of package information.
> Parsing HTML is not fun and is often brittle.

I understand. Hm. I will try asking the author.
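
Ricardo's point about HTML versus XML can be checked directly. This is a minimal sketch, assuming Guile 2.x with the (sxml simple) module. The helper parses-as-xml? and the two sample strings are hypothetical and exist only for illustration. It shows xml->sxml throwing 'parser-error on an unclosed <link> tag and accepting the self-closing form.

```scheme
(use-modules (sxml simple))

;; Hypothetical helper: #t if xml->sxml accepts STR,
;; #f if it throws 'parser-error (the key seen in the backtrace below).
(define (parses-as-xml? str)
  (catch 'parser-error
    (lambda () (xml->sxml str) #t)
    (lambda (key . args) #f)))

;; HTML allows a bare <link ...> with no closing tag...
(define unclosed
  "<head><link rel=\"stylesheet\" href=\"main.css\"></head>")
;; ...but XML needs either </link> or the self-closing form.
(define self-closing
  "<head><link rel=\"stylesheet\" href=\"main.css\" /></head>")

(display (parses-as-xml? unclosed))
(newline)
(display (parses-as-xml? self-closing))
(newline)
```

The self-closing variant is the same rewrite that sanitize-html performs with regexp substitution.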

I got a little further by adding this:

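(use-modules (ice-9 regex))

(define (sanitize-html html)
  "Make the unclosed tags ending in main.css and utf-8 self-closing,
so that xml->sxml accepts HTML."
  ;; regexp-substitute/global replaces every match and returns HTML
  ;; unchanged when there is none; plain regexp-substitute fails when
  ;; string-match returns #f.
  (let ((html1 (regexp-substitute/global #f "main\\.css\">" html
                                         'pre "main.css\" />" 'post)))
    (regexp-substitute/global #f "utf-8\">" html1
                              'pre "utf-8\" />" 'post)))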

This results in a new error:

Starting download of /tmp/guix-file.uAoKMD
From http://quickdocs.org/1am/...
1am/ 7KiB 2.0MiB/s 00:00 [##################] 100.0%
Backtrace:
          13 (apply-smob/1 #<catch-closure 17a84e0>)
In ice-9/boot-9.scm:
    705:2 12 (call-with-prompt _ _ #<procedure default-prompt-handler (k proc)>)
In ice-9/eval.scm:
    619:8 11 (_ #(#(#<directory (guile-user) 18cc140>)))
In ice-9/boot-9.scm:
   2312:4 10 (save-module-excursion _)
  3831:12  9 (_)
In guix/import/quicklisp.scm:
    239:9  8 (_)
In guix/utils.scm:
    618:8  7 (call-with-temporary-output-file #<procedure 305f440 at guix/import/quicklisp.scm:236:3 (temp port)>)
In sxml/simple.scm:
    143:4  6 (xml->sxml _ #:namespaces _ #:declare-namespaces? _ #:trim-whitespace? _ #:entities _ #:default-entity-handler _ # _)
    143:4  5 (loop #<input: string 24fdaf0> () #f _)
    143:4  4 (loop #<input: string 24fdaf0> () #f _)
    143:4  3 (loop #<input: string 24fdaf0> () #f _)
    143:4  2 (loop #<input: string 24fdaf0> () #f _)
    143:4  1 (loop #<input: string 24fdaf0> () #f _)
    143:4  0 (loop #<input: string 24fdaf0> () #f _)

sxml/simple.scm:143:4: In procedure loop:
Throw to key `parser-error' with args `(#<input: string 24fdaf0> "[wf-entdeclared] broken for " copy)'.
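
The new error names the entity `copy`. XML predefines only lt, gt, amp, apos and quot, so an HTML named entity such as &copy; is undeclared as far as xml->sxml is concerned. The backtrace shows that xml->sxml takes #:entities and #:default-entity-handler keyword arguments. Below is a minimal sketch of both, assuming Guile's (sxml simple) behaves as its manual describes: #:entities is an alist from entity name (a symbol) to replacement text, and the default handler is called with the input port and the entity name. The sample `page` string is hypothetical, not the real quickdocs.org HTML.

```scheme
(use-modules (sxml simple))

;; XML predefines only lt, gt, amp, apos and quot, so an HTML named
;; entity such as &copy; is undeclared and plain (xml->sxml page)
;; throws 'parser-error with "[wf-entdeclared] broken for " copy.
;; PAGE is a hypothetical sample, not the real quickdocs.org HTML.
(define page "<p>&copy; 2019 example</p>")

;; Option 1: declare the entity up front (symbol -> replacement text).
(define with-entities
  (xml->sxml page #:entities '((copy . "\u00a9"))))

;; Option 2: a fallback for any undeclared entity.  The handler receives
;; the input port and the entity name as a symbol and returns the
;; replacement text; unknown entities are kept as their bare name here.
(define with-fallback
  (xml->sxml page
             #:default-entity-handler
             (lambda (port name)
               (case name
                 ((copy) "\u00a9")
                 (else (symbol->string name))))))

(write with-entities)
(newline)
```

For scraped HTML the fallback handler is probably the more robust choice, since a real page may also use entities such as &nbsp; that are not listed up front.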

Any ideas?

--
Cheers Swedebugia

Attachment: quicklisp.scm
Description: Text Data