Re: Help with sxml simple parser for the quicklisp importer
From: swedebugia
Subject: Re: Help with sxml simple parser for the quicklisp importer
Date: Wed, 23 Jan 2019 17:32:20 +0100
On 2019-01-23 16:58, Ricardo Wurmus wrote:
> swedebugia <address@hidden> writes:
>
>>> The second “link” tag opens but is never closed.  This may be valid
>>> HTML, but it is not valid XML, which is what xml->sxml expects.
>>
>> Thanks for the quick answer!
>> I will try to remove this line before handing over to the parser.
>
> I would recommend looking for a better source of package information.
> Parsing HTML is not fun and is often brittle.

I understand. Hm. Will try asking the author.
Got a little further. Added this:
    (define (sanitize-html html)
      "Correct an offending invalid line from the html source"
      (let* ((html1 (regexp-substitute #f (string-match "main.css\">" html)
                                       'pre "main.css\" />" 'post))
             (result (regexp-substitute #f (string-match "utf-8\">" html1)
                                        'pre "utf-8\" />" 'post)))
        result))
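A caveat with the version above: string-match returns #f when the pattern is not found, and as far as I can tell regexp-substitute then fails on the missing match object, so a page that lacks either tag would abort the import. A minimal sketch of a more tolerant variant follows. The names replace-first and sanitize-html* are made up for illustration and are not part of the importer.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: replace the first match of PATTERN in STR with
;; REPLACEMENT, or return STR unchanged when PATTERN does not match.
(define (replace-first pattern replacement str)
  (let ((m (string-match pattern str)))
    (if m
        (regexp-substitute #f m 'pre replacement 'post)
        str)))

;; Hypothetical variant of sanitize-html: self-close the two unclosed
;; tags when they are present, and leave the input alone otherwise.
(define (sanitize-html* html)
  (replace-first "utf-8\">" "utf-8\" />"
                 (replace-first "main.css\">" "main.css\" />" html)))
```

This only patches the first occurrence of each pattern, like the original, so it is still a stopgap rather than real HTML handling.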
Which results in a new error:
Starting download of /tmp/guix-file.uAoKMD
From http://quickdocs.org/1am/...
1am/ 7KiB 2.0MiB/s 00:00
[##################] 100.0%
Backtrace:
          13 (apply-smob/1 #<catch-closure 17a84e0>)
In ice-9/boot-9.scm:
    705:2 12 (call-with-prompt _ _ #<procedure default-prompt-handler (k proc)>)
In ice-9/eval.scm:
    619:8 11 (_ #(#(#<directory (guile-user) 18cc140>)))
In ice-9/boot-9.scm:
   2312:4 10 (save-module-excursion _)
  3831:12  9 (_)
In guix/import/quicklisp.scm:
    239:9  8 (_)
In guix/utils.scm:
    618:8  7 (call-with-temporary-output-file #<procedure 305f440 at guix/import/quicklisp.scm:236:3 (temp port)>)
In sxml/simple.scm:
    143:4  6 (xml->sxml _ #:namespaces _ #:declare-namespaces? _ #:trim-whitespace? _ #:entities _ #:default-entity-handler _ # _)
    143:4  5 (loop #<input: string 24fdaf0> () #f _)
    143:4  4 (loop #<input: string 24fdaf0> () #f _)
    143:4  3 (loop #<input: string 24fdaf0> () #f _)
    143:4  2 (loop #<input: string 24fdaf0> () #f _)
    143:4  1 (loop #<input: string 24fdaf0> () #f _)
    143:4  0 (loop #<input: string 24fdaf0> () #f _)

sxml/simple.scm:143:4: In procedure loop:
Throw to key `parser-error' with args `(#<input: string 24fdaf0> "[wf-entdeclared] broken for " copy)'.
Any ideas?
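
One possible reading of the error, not verified against the actual page: "wf-entdeclared" is the XML "Entity Declared" well-formedness constraint, and the trailing `copy' looks like an entity name, so the HTML probably contains &copy;, which XML does not predefine. xml->sxml accepts an #:entities alist for extra definitions. A minimal sketch, where %extra-entities and html-string->sxml are made-up names and the list contains only the entity named in the error:

```scheme
(use-modules (sxml simple))

;; Assumption: only `copy' is declared here because it is the only
;; entity named in the error; other HTML entities (nbsp, ...) would
;; have to be added the same way if they show up.
(define %extra-entities
  '((copy . "\xa9")))                   ;U+00A9 COPYRIGHT SIGN

;; Hypothetical wrapper: parse STR with the extra entities declared.
(define (html-string->sxml str)
  (xml->sxml str #:entities %extra-entities))
```

xml->sxml also takes a #:default-entity-handler procedure, which may be the easier route if the page uses many different HTML entities.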
--
Cheers Swedebugia
Attachment: quicklisp.scm (Description: Text Data)