[bug#29902] [PATCH] gnu: Add html-xml-utils.
From: Stefan Reichör
Subject: [bug#29902] [PATCH] gnu: Add html-xml-utils.
Date: Sun, 31 Dec 2017 09:22:52 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
Hi Catonano!
Thanks for your review.
> Hi Stefan !
>
> Thanks for contributing !
>
> I linted your patch and I get
>
> gnu/packages/xml.scm:1120:1: address@hidden: line 1153 is way too long
> (96 characters)
I fixed this.
> Also, I couldn't run
>
> ./pre-inst-env guix build --rounds=2 html-xml-utils
>
> it just returns the store item as I had already built it without thinking
> :-/
>
> Apart from this, I'd say it's ok
>
> It builds. I didn't try to run any of these commands.
>
> Can you suggest me a command line and a set of html files to test them ?
I am not aware of a lot of documentation with examples for these tools.
Here is some stuff I found on the web:
http://joeferner.github.io/2015/07/15/linux-command-line-html-and-awk/
https://superuser.com/questions/528709/command-line-css-selector-tool
https://www.joyofdata.de/blog/using-linux-shell-web-scraping/
This is a command line that I use to extract links from h2 elements:
cat ~/tmp/document.html | hxnormalize -x | hxselect -i h2 | hxwls
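For anyone without a suitable HTML file at hand, here is a minimal sketch for trying that pipeline. The sample document and its example.org URLs are made up for illustration, and the check for installed tools is an addition, not part of the original command. The pipeline itself is the one above, reading the file by redirection instead of cat.

```shell
# Write a small sample document to a temporary file.
# The content is hypothetical and only serves as test input.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<html><body>
<h1><a href="https://example.org/top">Top</a></h1>
<h2><a href="https://example.org/one">One</a></h2>
<p><a href="https://example.org/para">Paragraph link</a></p>
<h2><a href="https://example.org/two">Two</a></h2>
</body></html>
EOF

# Run the pipeline only when the html-xml-utils tools are on PATH.
if command -v hxnormalize >/dev/null 2>&1 &&
   command -v hxselect >/dev/null 2>&1 &&
   command -v hxwls >/dev/null 2>&1; then
  # Normalize to XML, keep only the h2 elements, then list their links.
  hxnormalize -x < "$sample" | hxselect -i h2 | hxwls
else
  echo "html-xml-utils not found; skipping the pipeline" >&2
fi
```

If the tools behave as described, only the links inside the two h2 elements should be listed, not the ones in the h1 or the paragraph.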
> Well this is just to be super scrupulous, anyway. If you say this works, I
> believe you
>
> So, as far as I'm concerned: lgtm !
Below is the corrected patch (I also added the missing copyright line).
* gnu/packages/xml.scm (html-xml-utils): New variable.
---
gnu/packages/xml.scm | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/gnu/packages/xml.scm b/gnu/packages/xml.scm
index 344d7c3..548cd1a 100644
--- a/gnu/packages/xml.scm
+++ b/gnu/packages/xml.scm
@@ -18,6 +18,7 @@
;;; Copyright © 2017 Gregor Giesen <address@hidden>
;;; Copyright © 2017 Alex Vong <address@hidden>
;;; Copyright © 2017 Petter <address@hidden>
+;;; Copyright © 2017 Stefan Reichör <address@hidden>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -1116,6 +1117,61 @@ match and extract data, and elements can be added, deleted or modified using
XSLT and EXSLT.")
(license license:x11)))
+(define-public html-xml-utils
+ (package
+ (name "html-xml-utils")
+ (version "7.4")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (string-append
+ "https://www.w3.org/Tools/HTML-XML-utils/html-xml-utils-"
+ version ".tar.gz"))
+ (sha256
+ (base32
+ "04pgrahsfawnzd9pilvirs05pfdgsd7qwvw4dvkb42rgybhw6h95"))))
+ (build-system gnu-build-system)
+ (home-page "https://www.w3.org/Tools/HTML-XML-utils/")
+ (synopsis "Command line utilities to manipulate HTML and XML files")
+ (description "HTML-XML-utils provides a number of simple utilities for
+manipulating and converting HTML and XML files in various ways. The suite
+consists of the following tools:
+
+@itemize
+ @item @command{asc2xml} convert from @code{UTF-8} to @code{&#nnn;} entities
+ @item @command{xml2asc} convert from @code{&#nnn;} entities to @code{UTF-8}
+ @item @command{hxaddid} add IDs to selected elements
+ @item @command{hxcite} replace bibliographic references by hyperlinks
+ @item @command{hxcite-mkbib} expand references and create bibliography
+ @item @command{hxclean} apply heuristics to correct an HTML file
+ @item @command{hxcopy} copy an HTML file while preserving relative links
+ @item @command{hxcount} count elements and attributes in HTML or XML files
+ @item @command{hxextract} extract selected elements
+ @item @command{hxincl} expand included HTML or XML files
+ @item @command{hxindex} create an alphabetically sorted index
+ @item @command{hxmkbib} create bibliography from a template
+ @item @command{hxmultitoc} create a table of contents for a set of HTML files
+ @item @command{hxname2id} move some @code{ID=} or @code{NAME=} from A elements
+ to their parents
+ @item @command{hxnormalize} pretty-print an HTML file
+ @item @command{hxnsxml} convert output of hxxmlns back to normal XML
+ @item @command{hxnum} number section headings in an HTML file
+ @item @command{hxpipe} convert XML to a format easier to parse with Perl or AWK
+ @item @command{hxprintlinks} number links and add table of URLs at end of an HTML file
+ @item @command{hxprune} remove marked elements from an HTML file
+ @item @command{hxref} generate cross-references
+ @item @command{hxselect} extract elements that match a (CSS) selector
+ @item @command{hxtoc} insert a table of contents in an HTML file
+ @item @command{hxuncdata} replace CDATA sections by character entities
+ @item @command{hxunent} replace HTML predefined character entities by @code{UTF-8}
+ @item @command{hxunpipe} convert output of hxpipe back to XML format
+ @item @command{hxunxmlns} replace \"global names\" by XML Namespace prefixes
+ @item @command{hxwls} list links in an HTML file
+ @item @command{hxxmlns} replace XML Namespace prefixes by \"global names\"
+@end itemize
+")
+ (license license:expat)))
+
(define-public xlsx2csv
(package
(name "xlsx2csv")