From: Stefan Monnier
Subject: [elpa] externals/eev 92702c7 49/64: Made `find-pdf-text' ignore spurious formfeeds.
Date: Sun, 7 Apr 2019 16:59:11 -0400 (EDT)
branch: externals/eev
commit 92702c742df913d4094cdffe60640a3917bceca5
Author: Eduardo Ochs <address@hidden>
Commit: Eduardo Ochs <address@hidden>
Made `find-pdf-text' ignore spurious formfeeds.
---
ChangeLog | 18 ++++-
VERSION | 4 +-
eev-codings.el | 26 ++++++-
eev-intro.el | 235 ++++++++++++++++++++++++++++++++++-----------------------
eev-pdflike.el | 32 ++++++--
eev-wrap.el | 2 +
6 files changed, 208 insertions(+), 109 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index e153724..9d620b1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,10 +1,24 @@
+2019-03-05 Eduardo Ochs <address@hidden>
+
+ * eev-intro.el (find-eev-quick-intro): added material in the
+ sections about links to PDFs.
+
+2019-03-04 Eduardo Ochs <address@hidden>
+
+ * eev-pdflike.el (ee-pdftotext-replace-bad-ffs): new function.
+ (find-sh-page): use `find-callprocess00' and
+ `ee-pdftotext-replace-bad-ffs'.
+ (ee-find-pdf-text): return a list instead of a string.
+ (ee-find-pdftotext-text): return a list instead of a string.
+
2019-03-03 Eduardo Ochs <address@hidden>
+ * eejump.el: rewrote most comments, deleted some `eejump-<nnn>'s, and made
+ `eejump-6': point to (find-escripts-intro).
+
* eev-elinks.el (ee-find-intro-links): set the correct default
value for `stem'.
- * eejump.el (eejump-6): point to (find-escripts-intro).
-
2019-03-02 Eduardo Ochs <address@hidden>
* eev-anchors.el: converted to utf-8.
diff --git a/VERSION b/VERSION
index 00ed0a0..4a3d8a9 100644
--- a/VERSION
+++ b/VERSION
@@ -1,2 +1,2 @@
-Mon Mar 4 00:46:47 GMT 2019
-Sun Mar 3 21:46:47 -03 2019
+Tue Mar 5 03:20:28 GMT 2019
+Tue Mar 5 00:20:28 -03 2019
diff --git a/eev-codings.el b/eev-codings.el
index 468a905..2f9b0a6 100644
--- a/eev-codings.el
+++ b/eev-codings.el
@@ -19,7 +19,7 @@
;;
;; Author: Eduardo Ochs <address@hidden>
;; Maintainer: Eduardo Ochs <address@hidden>
-;; Version: 2019feb24
+;; Version: 2019mar04
;; Keywords: e-scripts
;;
;; Latest version: <http://angg.twu.net/eev-current/eev-coding.el>
@@ -37,7 +37,7 @@
;; files; the functions defined here make the local variables section
;; trick unnecessary - `ee-format-as-anchor' now uses `ee-tolatin1'
;; to produce a search string that works both unibyte, on UTF-8, on
-;; latin-1 files and some (most?) other encodings.
+;; latin-1 files and some (most of?) other encodings.
;;
;; NOTE: `ee-tolatin1' is a hack! Conversion to latin-1 seems to work in
;; most cases, but I don't understand very well the reasons why... I
@@ -52,6 +52,28 @@
;; http://angg.twu.net/e/emacs.e.html#unibyte-2019-search
;; http://angg.twu.net/e/emacs.e.html#creating-utf8-files
;; http://angg.twu.net/e/emacs.e.html#ee-re-to
+;;
+;;
+;; NOTE 2: Sorry for taking so long!! Here's what happened. This page
+;;
+;; http://angg.twu.net/glyphs.html
+;;
+;; tells a bit about the hacked 256-char fonts that I created many
+;; years before UTF-8 became standard, and that I used for ages in
+;; some of my notes and .tex files... I wanted to maintain
+;; compatibility with the files that used those fonts, and this turned
+;; out to be very hard - these hacked fonts only worked in files and
+;; buffers in which the encoding was "raw-text",
+;;
+;; (find-elnode "Non-ASCII Characters")
+;; (find-elnode "Disabling Multibyte" "unibyte")
+;; (find-elnode "Disabling Multibyte" "raw-text")
+;;
+;; and before 2019 I had a *very* poor understanding of how Emacs
+;; converts between unibyte and multibyte and between raw-text,
+;; latin-1 and utf-8...
+
+
;; «.ee-tolatin1» (to "ee-tolatin1")
;; «.ee-tolatin1-re» (to "ee-tolatin1-re")
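The comments above about unibyte and multibyte strings, and about conversions between latin-1 and utf-8, can be made concrete with a minimal illustration. It uses only the standard functions `encode-coding-string', `decode-coding-string' and `multibyte-string-p'. The `my-' names are hypothetical, and the snippet does not show how `ee-tolatin1' itself is implemented.

```elisp
;; Illustration only: standard Emacs coding-system conversions.
;; This is NOT the implementation of `ee-tolatin1'.

(defconst my-e-acute "\u00e9"
  "A multibyte string holding the single character e-acute (U+00E9).")

(defun my-coding-demo ()
  "Return a plist describing how `my-e-acute' is encoded."
  (let ((latin1 (encode-coding-string my-e-acute 'latin-1))
        (utf8   (encode-coding-string my-e-acute 'utf-8)))
    (list :multibyte-source  (multibyte-string-p my-e-acute)
          :multibyte-latin1  (multibyte-string-p latin1)
          ;; For a unibyte string each element is a byte in 0..255.
          :latin1-bytes      (string-to-list latin1)
          :utf8-bytes        (string-to-list utf8)
          :latin1-round-trip (equal (decode-coding-string latin1 'latin-1)
                                    my-e-acute))))
```

Encoding turns the one-character multibyte string into a unibyte string. Under latin-1 that string is one byte (233). Under utf-8 it is two bytes (195 169).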
diff --git a/eev-intro.el b/eev-intro.el
index 5fb50fe..b7ca4e7 100644
--- a/eev-intro.el
+++ b/eev-intro.el
@@ -20,7 +20,7 @@
;;
;; Author: Eduardo Ochs <address@hidden>
;; Maintainer: Eduardo Ochs <address@hidden>
-;; Version: 2019mar03
+;; Version: 2019mar05
;; Keywords: e-scripts
;;
;; Latest version: <http://angg.twu.net/eev-current/eev-intro.el>
@@ -1396,6 +1396,12 @@ If you run these sexps
then these hyperlinks should work:
+ (find-livesofanimalspage)
+ (find-livesofanimalstext)
+ (find-livesofanimalspage (+ -110 113))
+ (find-livesofanimalstext (+ -110 113))
+ (find-livesofanimalspage (+ -110 113) \"LECTURE I.\")
+ (find-livesofanimalstext (+ -110 113) \"LECTURE I.\")
(find-livesofanimalspage (+ -110 127) \"wrong thoughts\")
(find-livesofanimalstext (+ -110 127) \"wrong thoughts\")
(find-livesofanimalspage (+ -110 132) \"into the place of their victims\")
@@ -1413,7 +1419,46 @@ then these hyperlinks should work:
(find-livesofanimalspage (+ -110 164) \"last common ground\")
(find-livesofanimalstext (+ -110 164) \"last common ground\")
-[To do: explain them]
+The sexps like `(+ -110 113)' are a bit mysterious at first
+sight. We are accessing a PDF that is an excerpt of a book. The
+third page of the PDF has a \"[113]\" at its footer to indicate
+that it is the page 113 of the book. Let's use the terms _page
+number_ and _page label_ to distinguish the two numberings: in
+this case, the page whose page number is 3 is the page whose page
+label is 113. These two sexps
+
+ (find-livesofanimalspage (+ -110 113))
+ (find-livesofanimalspage 3)
+
+are equivalent, but the first one is more human-friendly: the 113
+is a page label, and the -110 is an adjustment (we call it the
+\"offset\") to convert the 113 that humans prefer to see into
+the 3 that xpdf needs to receive.
+
+Note that the sexp
+
+ (find-livesofanimalstext 3)
+
+converts the PDF of the \"Lives of Animals\" book to text and
+goes to \"page 3\" on it by counting formfeeds from the beginning
+of the buffer, as explained here:
+
+ (find-enode \"Pages\" \"formfeed\")
+
+In this pair of sexps,
+
+ (find-livesofanimalspage (+ -110 113) \"LECTURE I.\")
+ (find-livesofanimalstext (+ -110 113) \"LECTURE I.\")
+
+the first one goes to page 3 of the PDF and ignores the string
+\"LECTURE I.\" (which is there just for humans, as a reminder of
+what is important in that page); the second sexp goes to page
+3 of the PDF converted to text, searches for the string \"LECTURE
+I.\" and places the cursor right after the end of it.
+
+In section 10.3 we will see how to generate with just a few
+keystrokes a short hyperlink to a page of a PDF and a short
+hyperlink to a string in a page of a PDF.
@@ -1473,14 +1518,89 @@ that will run something similar to:
(find-einfo-links \"(elisp)Top\")
+The code that produces the short hyperlink to an info node is not
+currently very smart. If you look at the definition of
+`find-elnode' here
+
+ (find-code-c-d \"el\" ee-emacs-lisp-directory \"elisp\")
+
+you will see that it saves the \"el\" and the \"elisp\" in global
+variables by running this:
+
+ (setq ee-info-code \"el\")
+ (setq ee-info-file \"elisp\")
+
+The short hyperlink to an info node is only produced when Info is
+visiting a node in a manual whose name matches the variable
+`ee-info-file'.
+
10.3. Generating short hyperlinks to intros
-------------------------------------------
+Let's see an example. If you follow this link and type `M-h M-h',
+
+ (find-multiwindow-intro)
+
+you will get an \"*Elisp hyperlinks*\" buffer whose last line
+will be:
+
+ # (find-multiwindow-intro)
+
+which is a short hyperlink to the intro.
+
+
+
10.3. Generating short hyperlinks to PDFs
-----------------------------------------
+We saw in sections 9.3 and 9.4 that after the right preparations
+the first of these hyperlinks
+
+ (find-livesofanimalspage (+ -110 134) \"woke up haggard in the mornings\")
+ (find-livesofanimalstext (+ -110 134) \"woke up haggard in the mornings\")
+
+opens a PDF at a certain page using xpdf, and the second one
+opens in an Emacs buffer the result of converting that PDF to
+text, goes to a certain page in it and searches for a string.
+
+It is difficult to make xpdf send information to Emacs, so this
+trick uses the second link. Run this,
+
+ (find-livesofanimalstext (+ -110 134) \"woke up haggard in the mornings\")
+
+mark a piece of text in it - for example, the \"no punishment\"
+at the end of the first paragraph - and copy it to the kill ring
+with `M-w'. Then type `M-h M-p' (`find-pdflike-page-links'); note
+that `M-h M-h' won't work here because `find-here-links' is not
+smart enough to detect that we are on a PDF converted to text.
+You will get an \"*Elisp hyperlinks*\" buffer that contains these
+links:
+
+ # (find-livesofanimalspage 24)
+ # (find-livesofanimalstext 24)
+ # (find-livesofanimalspage (+ -110 134))
+ # (find-livesofanimalstext (+ -110 134))
+
+ # (find-livesofanimalspage 24 \"no punishment\")
+ # (find-livesofanimalstext 24 \"no punishment\")
+ # (find-livesofanimalspage (+ -110 134) \"no punishment\")
+ # (find-livesofanimalstext (+ -110 134) \"no punishment\")
+
+Remember that we called `code-pdf-page' and `code-pdf-text' as:
+
+ (code-pdf-page \"livesofanimals\" l-o-a)
+ (code-pdf-text \"livesofanimals\" l-o-a -110)
+
+The extra argument \"-110\" to `code-pdf-text' tells `M-h M-p' to
+use \"-110\" as the offset.
+
+
+
+
+10.4. Generating short hyperlinks to anchors
+--------------------------------------------
@@ -6024,113 +6144,36 @@ For more information see:
\(Re)generate: (find-templates-intro)
Source code: (find-eev \"eev-intro.el\" \"find-templates-intro\")
More intros: (find-eev-quick-intro)
- (find-eval-intro)
- (find-eepitch-intro)
+ (find-escripts-intro)
+ (find-links-conv-intro)
+ (find-eev-intro)
This buffer is _temporary_ and _editable_.
It is meant as both a tutorial and a sandbox.
-`ee-template0'
-==============
-\(find-efunctiondescr 'ee-template0)
-\(find-efunction 'ee-template0)
-
-
-`ee-H', `ee-S', `ee-HS'
-=======================
-
-
-
-`find-find-links-links'
-=======================
-\(find-links-intro)
-\(find-find-links-links)
-\(find-efunction 'ee-stuff-around-point)
-interactive
-
-
-`find-elinks'
-=============
-\(find-efunction 'find-elinks)
-
-
-
- (find-intro-links)
-\(find-eev \"eev-tlinks.el\" \"find-intro-links\")
-\(find-eevfile \"eev-tlinks.el\")
-
-
-The innards: templates
-======================
-Several functions in eev besides `code-c-d' work by replacing
-some substrings in \"templates\"; they all involve calls to
-either the function `ee-template0', which is simpler, or to
-`ee-template', which is much more complex.
-
-The function `ee-template0' receives a single argument - a
-string, in which each substring surrounded by `{...}'s is to be
-replaced, and replaces each `{...}' by the result of evaluating
-the `...' in it. For example:
-
- (ee-template0 \"a{(+ 2 3)}b\")
- --> \"a5b\"
+This intro is currently GARBAGE.
+It should be rewritten to become a tutorial on:
-Usually the contents of each `{...}' is the name of a variable,
-and when the result of evaluating a `{...}' is a string the
-replacement does not get `\"\"'s.
+ 1) How to use `ee-template0' and `find-elinks':
-The function `ee-template' receives two arguments, a list and a
-template string, and the list describes which `{...}' are to be
-replaced in the template string, and by what. For example, here,
+ (find-eev \"eev-wrap.el\" \"ee-template0\")
+ (find-eev \"eev-elinks.el\" \"find-elinks\")
- (let ((a \"AA\")
- (b \"BB\"))
- (ee-template '(a
- b
- (c \"CC\"))
- \"_{a}_{b}_{c}_{d}_\"))
+ 2) A review of the conventions here:
- --> \"_AA_BB_CC_{d}_\"
+ (find-links-conv-intro)
+ (find-links-conv-intro \"3. Classification\")
-the \"{d}\" is not replaced. Note that the list (a b (c \"CC\"))
-contains some variables - which get replaced by their values -
-and a pair, that specifies explicitly that every \"{c}\" should
-be replaced by \"CC\".
+ 3) How some template functions like these
+ (find-eev \"eev-tlinks.el\" \"find-find-links-links\")
+ (find-eev \"eev-tlinks.el\" \"find-intro-links\")
+ (find-eev \"eev-wrap.el\" \"find-eewrap-links\")
+ are used to create first versions for several functions in
+ eev...
-
-Templated buffers
-=================
-Introduction
-Conventions:
- the first line regenerates the buffer,
- buffer names with \"**\"s,
- (find-evariable 'ee-buffer-name)
- code
-
-`find-elinks'
-=============
-Variant: `find-elinks-elisp'
-
-`find-e*-links'
-===============
-\(find-eev \"eev-elinks.el\")
-
-`find-*-intro'
-==============
-
-`eewrap-*'
-==========
-
-Experiments
-===========
-\(find-efunction 'find-youtubedl-links)
-\(find-efunction 'ee-hyperlinks-prefix)
-\(find-efunction 'find-newhost-links)
-\(find-efunction 'find-eface-links)
- Note that there is no undo.
" rest)))
;; (find-templates-intro)
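The intro text added above explains that `(+ -110 113)' turns the page label 113 into the page number 3. It also shows that `M-h M-p' on PDF page 24 lists links with both the raw page number and the `(+ -110 134)' form. The sketch below shows that arithmetic. The `my-' names are hypothetical. The link strings only imitate the shape of the links listed above; eev's `find-pdflike-page-links' does not produce them.

```elisp
;; Hypothetical helpers, not part of eev: convert between the page
;; labels printed in a book and the page numbers that xpdf receives.

(defun my-label-to-page (label offset)
  "Return the PDF page number of the printed page LABEL, given OFFSET."
  (+ offset label))

(defun my-page-to-label (page offset)
  "Return the printed page label of PDF page number PAGE, given OFFSET."
  (- page offset))

(defun my-page-links (stem page offset)
  "Return short links to PAGE in the PDF called STEM, as strings.
The first two use the raw page number; the last two use the
`(+ OFFSET LABEL)' form that is easier for humans to read."
  (let ((label (my-page-to-label page offset)))
    (list (format "(find-%spage %d)" stem page)
          (format "(find-%stext %d)" stem page)
          (format "(find-%spage (+ %d %d))" stem offset label)
          (format "(find-%stext (+ %d %d))" stem offset label))))
```

(my-page-links "livesofanimals" 24 -110) returns the four strings that appear, each preceded by "# ", in the first group of links shown for `M-h M-p'.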
diff --git a/eev-pdflike.el b/eev-pdflike.el
index 8e193c4..1ce2477 100644
--- a/eev-pdflike.el
+++ b/eev-pdflike.el
@@ -19,7 +19,7 @@
;;
;; Author: Eduardo Ochs <address@hidden>
;; Maintainer: Eduardo Ochs <address@hidden>
-;; Version: 2019mar02
+;; Version: 2019mar04
;; Keywords: e-scripts
;;
;; Latest version: <http://angg.twu.net/eev-current/eev-pdflike.el>
@@ -173,13 +173,25 @@
(t (error "This is not a valid pos-spec: %S" pos-spec)))
(if rest (ee-goto-rest rest))))
+(defun ee-pdftotext-replace-bad-ffs (bigstr)
+"Convert formfeeds that are preceded by non-newline chars into something else.
+Sometimes pdftotext returns \"spurious formfeeds\" that correspond
+not to page breaks but to special printable characters, and these
+spurious formfeeds confuse `ee-goto-position-page'. This function
+finds sequences of spurious formfeeds using a heuristic that works
+in most cases - formfeeds following something that is not a
+newline are spurious - and replaces each such sequence by \"(ff)\"."
+ (replace-regexp-in-string
+ "\\([^\n\f]\\)\\(\f+\\)" "\\1(ff)" bigstr t))
+
;; «find-sh-page» (to ".find-sh-page")
-(defun find-sh-page (command &rest pos-spec-list)
- "Like `find-sh', but interpreting the car of POS-SPEC-LIST as a page."
+(defun find-sh-page (program-and-args &rest pos-spec-list)
+ "Like `find-sh', but interpreting the car of POS-SPEC-LIST as a page number."
(interactive "sShell command: ")
(find-eoutput-reuse
- command
- `(insert (shell-command-to-string ,command)))
+ (ee-unsplit program-and-args)
+ `(insert (ee-pdftotext-replace-bad-ffs
+ (find-callprocess00 ,'program-and-args))))
(apply 'ee-goto-position-page pos-spec-list))
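The docstring of `ee-pdftotext-replace-bad-ffs' says that spurious formfeeds confuse `ee-goto-position-page'. The intro text above says the text-based links find a page by counting formfeeds. The sketch below copies the regexp and replacement from that function into a hypothetical `my-replace-bad-ffs'. It pairs that function with a hypothetical formfeed-based page counter, so you can see the heuristic's effect on a small string.

```elisp
;; Sketch, not eev's code: `my-replace-bad-ffs' uses the same regexp
;; and replacement as `ee-pdftotext-replace-bad-ffs' in the diff above;
;; `my-count-pages' is a hypothetical page counter.

(defun my-replace-bad-ffs (bigstr)
  "Replace each run of formfeeds that follows a non-newline char by \"(ff)\"."
  (replace-regexp-in-string
   "\\([^\n\f]\\)\\(\f+\\)" "\\1(ff)" bigstr t))

(defun my-count-pages (text)
  "Return how many pages TEXT has if every formfeed starts a new page."
  (length (split-string text "\f")))

;; Three real pages, plus one spurious formfeed after "x" on page 2.
(defconst my-sample-text "page1\n\fpage2 x\fstill page2\n\fpage3")
```

Counting formfeeds in `my-sample-text' gives 4 pages. After `my-replace-bad-ffs' it gives 3. A formfeed at the very start of the text, or right after a newline, is left alone. A run of spurious formfeeds becomes a single "(ff)".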
@@ -323,11 +335,17 @@
;; (find-code-xxxpdftext-family "pdf-text")
(code-xxxpdftext-family "pdf-text")
+;; (defun ee-find-pdf-text (fname)
+;; (format "pdftotext -layout -enc Latin1 '%s' -" (ee-expand fname)))
+;;
+;; (defun ee-find-pdftotext-text (fname)
+;; (format "pdftotext -layout -enc Latin1 '%s' -" (ee-expand fname)))
+
(defun ee-find-pdf-text (fname)
- (format "pdftotext -layout -enc Latin1 '%s' -" (ee-expand fname)))
+ `("pdftotext" "-layout" "-enc" "Latin1" ,(ee-expand fname) "-"))
(defun ee-find-pdftotext-text (fname)
- (format "pdftotext -layout -enc Latin1 '%s' -" (ee-expand fname)))
+ `("pdftotext" "-layout" "-enc" "Latin1" ,(ee-expand fname) "-"))
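The ChangeLog says `ee-find-pdf-text' and `ee-find-pdftotext-text' now return a list instead of a string. One likely benefit, which this commit does not state, concerns quoting. A list can be passed to `call-process' as separate arguments, so a file name containing spaces or an apostrophe needs no shell quoting. The old format string wrapped the name in single quotes, which breaks on such names. The sketch below uses hypothetical `my-' names and drops `ee-expand' to stay self-contained. `my-run-argv' needs the program in the car of the list (here pdftotext) to be installed.

```elisp
;; Sketch, not eev's code.  `my-pdftotext-argv' builds the same list
;; as the new `ee-find-pdf-text', minus the call to `ee-expand'.

(defun my-pdftotext-argv (fname)
  "Return the pdftotext command that converts FNAME to text, as a list."
  `("pdftotext" "-layout" "-enc" "Latin1" ,fname "-"))

(defun my-pdftotext-shell-string (fname)
  "Return the same command as one shell string, like the old code."
  (format "pdftotext -layout -enc Latin1 '%s' -" fname))

(defun my-run-argv (argv)
  "Run the program in the car of ARGV, with no shell in between.
Return everything it wrote to the buffer as a string."
  (with-temp-buffer
    ;; DESTINATION t: insert the output into the current buffer.
    (apply #'call-process (car argv) nil t nil (cdr argv))
    (buffer-string)))

;; (my-run-argv (my-pdftotext-argv "/tmp/Bob's notes.pdf"))
```

In the shell string, the apostrophe in "Bob's" closes the single-quoted argument early. In the list, it is just a character inside the fifth element.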
diff --git a/eev-wrap.el b/eev-wrap.el
index cea487a..e0ab95c 100644
--- a/eev-wrap.el
+++ b/eev-wrap.el
@@ -40,6 +40,7 @@
;; «.ee-template0» (to "ee-template0")
;; «.ee-S» (to "ee-S")
;; «.ee-this-line-wrapn» (to "ee-this-line-wrapn")
+;; «.find-eewrap-links» (to "find-eewrap-links")
@@ -554,6 +555,7 @@ cd {dir}"))
{<}(ee-HS `(find-{stem} ,{args})){>}\"))\n")))
+;; «find-eewrap-links» (to ".find-eewrap-links")
;; A more standard way to create `eewrap-*' functions.
;; (find-find-links-links "<none>" "eewrap" "C stem args")
;;
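This commit removes some intro text from `find-templates-intro'. That text described `ee-template0', defined in eev-wrap.el, as follows. It replaces each `{...}' in a string by the result of evaluating the `...', and string results are inserted without quotes. Its example was that (ee-template0 "a{(+ 2 3)}b") gives "a5b". Below is a minimal sketch of that behavior under the hypothetical name `my-template0'. The real function may differ in its details.

```elisp
;; Sketch of the `{...}' substitution described for `ee-template0';
;; not eev's implementation.

(defun my-template0 (str)
  "Replace each {SEXP} in STR by the result of evaluating SEXP.
String results are inserted without quotes; other values are
converted with (format \"%s\" ...)."
  (replace-regexp-in-string
   "{\\([^{}]+\\)}"
   (lambda (match)
     ;; MATCH is the text of one {SEXP}; group 1 is the SEXP itself.
     (format "%s" (eval (read (match-string 1 match)))))
   str t t))

;; A global (special) variable, so that `eval' can see its value.
(defvar my-template0-stem "livesofanimals")
```

The sketch calls `eval' without a lexical environment, so a `{var}' only sees global or dynamically bound variables. A variable bound by `let' in a lexical-binding file would not be visible to it.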