guix-devel

Re: [PATCH] Add Blast+.


From: Ricardo Wurmus
Subject: Re: [PATCH] Add Blast+.
Date: Tue, 23 Jun 2015 10:06:23 +0200

Hi Mark,

thank you for the review!

>> +     `(;; There are three(!) tests for this massive library, and all fail with
>> +       ;; "unparsable timing stats".
>> +       ;; ERR [127] --  [util/regexp] test_pcre.sh     (unparsable timing stats)
>> +       ;; ERR [127] --  [serial/datatool] datatool.sh     (unparsable timing stats)
>> +       ;; ERR [127] --  [serial/datatool] datatool_xml.sh     (unparsable timing stats)
>> +       #:tests? #f
>
> Just a guess, but maybe this is because you replaced "/bin/date" with
> "echo -n 0".  How about replacing it with "date -d @0" instead?

I tried that but I still get the same problem.  The test script is
generated from a template in common/check/check_make_unix.sh.  We are
replacing "/var/tmp" with "/tmp" and rewriting the "(/usr)/bin" prefixes,
but this should not affect the functionality of the generated script.
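
For reference, here is a minimal sketch of what the suggested replacement
prints.  It assumes GNU date (the "-d @SECONDS" form is a GNU extension),
and the "+%s" argument is only a hypothetical call site; I have not checked
how the template actually invokes date.

```shell
#!/bin/sh
# Sketch only: assumes GNU date; function names are illustrative.

# A hypothetical call such as "/bin/date +%s" would turn into
# "date -d @0 +%s" after the substitution and print the epoch itself.
fixed_epoch() {
    date -d @0 +%s
}

# Without a format argument the text depends on the time zone and
# locale, so both are pinned here to keep the output reproducible.
fixed_stamp() {
    TZ=UTC LC_ALL=C date -d @0
}

fixed_epoch
fixed_stamp
```

Both calls ignore the wall clock, so repeated runs give identical output.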

I'm not sure why it's failing -- it works fine in "guix environment".
It's very time-consuming to recompile the whole thing until the tests
are reached.

> It would be great to get the tests working, even if we have to disable
> some of them.  Otherwise we have no way of knowing that we're not
> distributing broken garbage :)

It's hard to know in this case even with the tests, because it's only
three tests and they seem hardly representative of the library.

>> +
>> +            ;; rewrite "/var/tmp" in check script
>> +            (substitute* "scripts/common/check/check_make_unix.sh"
>> +              (("/var/tmp") (string-append (getcwd) "/build/build")))
>
> Or maybe just "/tmp" ?

Yes, that also works.

> All of these plus the ones for 'sh' could be combined into something
> like this: (untested)

[...]

This works great!  Thank you.

>> +              (("^ *PATH=.*") "")
>> +              (("action=/bin/") "action=")
>> +              (("export PATH") "echo -n 0"))
>
> Why "echo -n 0" here?  Maybe ":" would be better?  It is a no-op
> built-in command in Bourne shell.

Yes, this works.  I needed to replace it with a no-op because "export
PATH" is also found in the middle of a long chain of commands, so
replacing it with the empty string results in a syntax error.
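
To illustrate the syntax problem, here is a small sketch.  The command
chains are made-up stand-ins, not lines from the NCBI scripts, and
"parses" is a hypothetical helper; "sh -n" only parses its input without
running it.

```shell
#!/bin/sh
# Sketch: the chains below are invented examples, not the real scripts.

# Succeed if the given snippet is syntactically valid shell.
# "sh -n" reads the commands from stdin and parses them without executing.
parses() {
    printf '%s\n' "$1" | sh -n 2>/dev/null
}

with_export='cd /tmp && export PATH && echo done'
with_empty='cd /tmp &&  && echo done'     # "export PATH" replaced by ""
with_colon='cd /tmp && : && echo done'    # "export PATH" replaced by ":"

for snippet in "$with_export" "$with_empty" "$with_colon"; do
    if parses "$snippet"; then
        echo "parses:       $snippet"
    else
        echo "syntax error: $snippet"
    fi
done
```

Only the variant with the empty replacement is rejected by the parser,
because "&&" is left without a command on its right-hand side.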

> Is everything in here really in the public domain?  I'd guess that in
> order to make this true, you'd need to remove bzip2 and zlib in a
> snippet, and even then I'd be doubtful :)

I've moved the code to remove the bundled stuff to a snippet.

The NCBI code was released into the public domain.  However, it appears
that some third-party headers and some build scripts are under a
different license:

  * Expat:
    * ncbi-blast-2.2.30+-src/c++/include/util/bitset/
    * ncbi-blast-2.2.30+-src/c++/src/html/ncbi_menu*.js
  * Boost license:
    * ncbi-blast-2.2.30+-src/c++/include/util/impl/floating_point_comparison.hpp
  * LGPL 2+:
    * ncbi-blast-2.2.30+-src/c++/include/dbapi/driver/odbc/unix_odbc/
  * ASL 2.0:
    * ncbi-blast-2.2.30+-src/c++/src/corelib/teamcity_*

I could not find mention of any other licenses.  Is this correct, then:

    ;; Most of the sources are in the public domain, with the following
    ;; exceptions:
    ;; ...(the above list)...
    (license (list license:public-domain
                   license:expat
                   license:boost1.0
                   license:lgpl2.0+
                   license:asl2.0))

What do you think?

~~ Ricardo

From 8d131f66ba0378738e5b837f78c411edb241d35a Mon Sep 17 00:00:00 2001
From: Ricardo Wurmus <address@hidden>
Date: Tue, 16 Jun 2015 16:24:24 +0200
Subject: [PATCH] gnu: Add Blast+.

* gnu/packages/bioinformatics.scm (blast+): New variable.
---
 gnu/packages/bioinformatics.scm | 155 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 155 insertions(+)

diff --git a/gnu/packages/bioinformatics.scm b/gnu/packages/bioinformatics.scm
index 3defac8..4603329 100644
--- a/gnu/packages/bioinformatics.scm
+++ b/gnu/packages/bioinformatics.scm
@@ -31,6 +31,7 @@
   #:use-module (gnu packages base)
   #:use-module (gnu packages boost)
   #:use-module (gnu packages compression)
+  #:use-module (gnu packages cpio)
   #:use-module (gnu packages file)
   #:use-module (gnu packages java)
   #:use-module (gnu packages linux)
@@ -258,6 +259,160 @@ into separate processes; and more.")
     (inputs
      `(("python2-numpy" ,python2-numpy)))))
 
+(define-public blast+
+  (package
+    (name "blast+")
+    (version "2.2.30")
+    (source (origin
+              (method url-fetch)
+              (uri (string-append
+                    "ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/"
+                    version "/ncbi-blast-" version "+-src.tar.gz"))
+              (sha256
+               (base32
+                "0h0fj5cpx6zpfwixgx5f5xbr4rn3cnai0x3j7grrg50vr18jvxr6"))
+              (modules '((guix build utils)))
+              (snippet
+               '(begin
+                  ;; Remove bundled bzip2 and zlib
+                  (delete-file-recursively "c++/src/util/compress/bzip2")
+                  (delete-file-recursively "c++/src/util/compress/zlib")
+                  (substitute* "c++/src/util/compress/Makefile.in"
+                    (("bzip2 zlib api") "api"))
+                  ;; Remove useless msbuild directory
+                  (delete-file-recursively
+                   "c++/src/build-system/project_tree_builder/msbuild")))))
+    (build-system gnu-build-system)
+    (arguments
+     `(;; There are three(!) tests for this massive library, and all fail with
+       ;; "unparsable timing stats".
+       ;; ERR [127] --  [util/regexp] test_pcre.sh     (unparsable timing stats)
+       ;; ERR [127] --  [serial/datatool] datatool.sh     (unparsable timing stats)
+       ;; ERR [127] --  [serial/datatool] datatool_xml.sh     (unparsable timing stats)
+       #:tests? #f
+       #:out-of-source? #t
+       #:parallel-build? #f ; not supported
+       #:phases
+       (modify-phases %standard-phases
+         (add-before
+          'configure 'set-HOME
+          ;; $HOME needs to be set at some point during the configure phase
+          (lambda _ (setenv "HOME" "/tmp") #t))
+         (add-after
+          'unpack 'enter-dir
+          (lambda _ (chdir "c++") #t))
+         (add-after
+          'enter-dir 'fix-build-system
+          (lambda _
+            (define (which* cmd)
+              (cond ((string=? cmd "date")
+                     ;; make call to "date" deterministic
+                     "date -d @0")
+                    ((which cmd)
+                     => identity)
+                    (else
+                     (format (current-error-port)
+                             "WARNING: Unable to find absolute path for ~s~%"
+                             cmd)
+                     #f)))
+
+            ;; Proceed even though the weird build system says that generated
+            ;; files are out of date
+            (setenv "NCBICXX_RECONF_POLICY" "warn")
+
+            ;; Rewrite hardcoded paths to various tools
+            (substitute* (append (find-files "scripts/common/check" "\\.sh$")
+                                 '("scripts/common/impl/if_diff.sh"
+                                   "scripts/common/impl/run_with_lock.sh"
+                                   "src/build-system/Makefile.configurables.real"
+                                   "src/build-system/Makefile.in.top"
+                                   "src/build-system/Makefile.meta.gmake=no"
+                                   "src/build-system/Makefile.meta.in"
+                                   "src/build-system/Makefile.meta_l"
+                                   "src/build-system/Makefile.meta_p"
+                                   "src/build-system/Makefile.meta_r"
+                                   "src/build-system/Makefile.mk.in"
+                                   "src/build-system/Makefile.requirements"
+                                   "src/build-system/Makefile.rules_with_autodep.in"
+                                   "src/build-system/configure"
+                                   "src/build-system/configure.ac"))
+              (("(/usr/bin/|/bin/)([a-z][-_.a-z]*)" all dir cmd)
+               (or (which* cmd) all)))
+
+            ;; Some of the files we're patching are ISO-8859-1-encoded.
+            ;; A default port encoding of #f is equivalent to ISO-8859-1
+            ;; in Guile, so the bytes are preserved.
+            (with-fluids ((%default-port-encoding #f))
+              (substitute* (find-files "src/build-system" "^config.*")
+                (("LN_S=/bin/\\$LN_S") (string-append "LN_S=" (which "ln")))
+                (("^PATH=.*") "")))
+
+            ;; rewrite "/var/tmp" in check script
+            (substitute* "scripts/common/check/check_make_unix.sh"
+              (("/var/tmp") "/tmp"))
+
+            ;; do not reset PATH
+            (substitute* (find-files "scripts/common/impl/" "\\.sh$")
+              (("^ *PATH=.*") "")
+              (("action=/bin/") "action=")
+              (("export PATH") ":"))
+            #t))
+         (replace
+          'configure
+          (lambda* (#:key inputs outputs #:allow-other-keys)
+            (let ((out     (assoc-ref outputs "out"))
+                  (lib     (string-append (assoc-ref outputs "lib") "/lib"))
+                  (include (string-append (assoc-ref outputs "include")
+                                          "/include/ncbi-tools++")))
+              ;; The 'configure' script doesn't recognize things like
+              ;; '--enable-fast-install'.
+              (zero? (system* "./configure.orig"
+                              (string-append "--with-build-root=" (getcwd) "/build")
+                              (string-append "--prefix=" out)
+                              (string-append "--libdir=" lib)
+                              (string-append "--includedir=" include)
+                              (string-append "--with-bz2="
+                                             (assoc-ref inputs "bzip2"))
+                              (string-append "--with-z="
+                                             (assoc-ref inputs "zlib"))
+                              ;; Each library is built twice by default, once
+                              ;; with "-static" in its name, and again
+                              ;; without.
+                              "--without-static"
+                              "--with-dll"))))))))
+    (outputs '("out"       ;  19 MB
+               "lib"       ; 203 MB
+               "include")) ;  32 MB
+    (inputs
+     `(("bzip2" ,bzip2)
+       ("zlib" ,zlib)))
+    (native-inputs
+     `(("cpio" ,cpio)))
+    (home-page "http://blast.ncbi.nlm.nih.gov")
+    (synopsis "Basic local alignment search tool")
+    (description
+     "BLAST is a popular method of performing a DNA or protein sequence
+similarity search, using heuristics to produce results quickly.  It also
+calculates an “expect value” that estimates how many matches would have
+occurred at a given score by chance, which can aid a user in judging how much
+confidence to have in an alignment.")
+    ;; Most of the sources are in the public domain, with the following
+    ;; exceptions:
+    ;;   * Expat:
+    ;;     * ./c++/include/util/bitset/
+    ;;     * ./c++/src/html/ncbi_menu*.js
+    ;;   * Boost license:
+    ;;     * ./c++/include/util/impl/floating_point_comparison.hpp
+    ;;   * LGPL 2+:
+    ;;     * ./c++/include/dbapi/driver/odbc/unix_odbc/
+    ;;   * ASL 2.0:
+    ;;     * ./c++/src/corelib/teamcity_*
+    (license (list license:public-domain
+                   license:expat
+                   license:boost1.0
+                   license:lgpl2.0+
+                   license:asl2.0))))
+
 (define-public bowtie
   (package
     (name "bowtie")
-- 
2.1.0

