[bug#57151] [PATCH 2/2] gnu: tesseract-ocr: Make the default install minimally useful.

guix-patches

From:	Maxim Cournoyer
Subject:	[bug#57151] [PATCH 2/2] gnu: tesseract-ocr: Make the default install minimally useful.
Date:	Fri, 12 Aug 2022 01:07:52 -0400

* gnu/packages/ocr.scm (tesseract-ocr)
[phases]{adjust-TESSDATA_PREFIX-macro}: New phase.
{install-minimal-tessdata}: New phase.
[native-inputs]: Add tesseract-ocr-tessdata-fast.
[native-search-paths]: New field.
[description]: Mention how to add support for more languages.
---
 gnu/packages/ocr.scm | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)
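
As a reading aid for the install-minimal-tessdata phase in the diff below, here is a minimal Guile sketch of the path arithmetic it performs. The `output` value is a made-up store path standing in for `#$output`; it is an assumption for illustration only, not the real build output.

```scheme
;; Hypothetical store path standing in for #$output (assumption).
(define output "/gnu/store/HASH-tesseract-ocr")

;; Same relative file name as in the phase.
(define eng.traineddata
  "/share/tesseract-ocr/tessdata/eng.traineddata")

;; Directory that the phase hands to install-file as the destination.
(define tessdata-dir
  (dirname (string-append output eng.traineddata)))

;; The destination ends with the directory named in the
;; TESSDATA_PREFIX search-path specification.
(define matches-search-path?
  (string-suffix? "share/tesseract-ocr/tessdata" tessdata-dir))

(display tessdata-dir)
(newline)
```

The sketch only shows that the installed file lands in the same `share/tesseract-ocr/tessdata` directory that the search-path specification names.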

diff --git a/gnu/packages/ocr.scm b/gnu/packages/ocr.scm
index e2c9f561cc..21d257ef24 100644
--- a/gnu/packages/ocr.scm
+++ b/gnu/packages/ocr.scm
@@ -132,6 +132,15 @@ (define-public tesseract-ocr
               (substitute* "configure.ac"
                 (("AC_SUBST\\(\\[XML_CATALOG_FILES])")
                  ""))))
+          (add-after 'unpack 'adjust-TESSDATA_PREFIX-macro
+            (lambda _
+              ;; Use a deeper TESSDATA_PREFIX hierarchy so that a more
+              ;; specific search-path than '/share' can be specified.  The
+              ;; build system uses CPPFLAGS for itself, so we can't simply set
+              ;; a make flag.
+              (substitute* "Makefile.am"
+                (("-DTESSDATA_PREFIX='\"@datadir@\"'")
+                 "-DTESSDATA_PREFIX='\"@datadir@/tesseract-ocr\"'"))))
           (add-after 'build 'build-training
             (lambda* (#:key parallel-build? #:allow-other-keys)
               (define n (if parallel-build? (number->string
@@ -140,7 +149,18 @@ (define n (if parallel-build? (number->string
               (invoke "make" "-j" n "training")))
           (add-after 'install 'install-training
             (lambda _
-              (invoke "make" "training-install"))))))
+              (invoke "make" "training-install")))
+          (add-after 'install 'install-minimal-tessdata
+            ;; tesseract-ocr cannot be used without its trained models data;
+            ;; install the English language as a minimal base which can be
+            ;; extended via TESSDATA_PREFIX.
+            (lambda* (#:key native-inputs inputs #:allow-other-keys)
+              (define eng.traineddata
+                "/share/tesseract-ocr/tessdata/eng.traineddata")
+              (install-file (search-input-file (or native-inputs inputs)
+                                               eng.traineddata)
+                            (dirname (string-append #$output
+                                                    eng.traineddata))))))))
     (native-inputs
      (list asciidoc
            autoconf
@@ -152,13 +172,18 @@ (define n (if parallel-build? (number->string
            libtool
            libxml2                      ;for XML_CATALOG_FILES
            libxslt
-           pkg-config))
+           pkg-config
+           tesseract-ocr-tessdata-fast))
     (inputs
      (list cairo
            icu4c
            leptonica
            pango
            python-wrapper))
+    (native-search-paths (list (search-path-specification
+                                (variable "TESSDATA_PREFIX")
+                                (files (list "share/tesseract-ocr/tessdata"))
+                                (separator #f)))) ;single value
     (home-page "https://github.com/tesseract-ocr/tesseract")
     (synopsis "Optical character recognition engine")
     (description
@@ -166,7 +191,9 @@ (define n (if parallel-build? (number->string
 high accuracy.  It supports many languages, output text formatting, hOCR
 positional information and page layout analysis.  Several image formats are
 supported through the Leptonica library.  It can also detect whether text is
-monospaced or proportional.")
+monospaced or proportional.  Support for the English language is included by
+default.  To add support for more languages, the
+@code{tesseract-ocr-tessdata-fast} package should be installed.")
     (license license:asl2.0)))
 
 (define-public gimagereader
-- 
2.36.1
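
The updated description says that further languages come from installing tesseract-ocr-tessdata-fast. A manifest along these lines is one possible way to request both packages together. The file name `manifest.scm` is hypothetical, and this is a sketch rather than part of the patch.

```scheme
;; manifest.scm (hypothetical file name)
;; Requests the OCR engine together with the extra trained models.
(specifications->manifest
 (list "tesseract-ocr"
       "tesseract-ocr-tessdata-fast"))
```

It could be tried with `guix shell -m manifest.scm`. If the native-search-paths field above behaves as intended, TESSDATA_PREFIX should then point at the profile's share/tesseract-ocr/tessdata directory.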