Re: mhfixmsg character set conversion
From: Steven Winikoff
Subject: Re: mhfixmsg character set conversion
Date: Wed, 09 Feb 2022 21:08:37 -0500
>> >I would look at output from mime_helper and see if it's UTF-8.
>>
>> Please forgive me for having to ask this, but how is mime_helper even
>> involved? Isn't that used only when I read the message? It isn't in
>> the procmail chain that saves the original copy, and it's the original
>> copy that we've been looking at.
>
>I don't know how mime_helper might fit in. The lynx invocation is still
>my pick for the root cause but you said you're not clear on how it is
>involved.
I understand how it's involved for reading a message; the part I don't
understand is how it's involved in the sequence of steps that occurs when
a new message is received.
Specifically, to the best of my knowledge:
1) sendmail hands the message off to procmail
2) this procmail recipe is activated:
:0 HBfw
* ^Content-Type:.*text/
| /home/smw/bin/email_decoder
I'll append a copy of email_decoder, but the gist of it is:
- explicitly unset LC_ALL and set LANG to en_CA.UTF-8
- save the incoming standard input in $source (a file in /tmp)
- run ~smw/bin/decode_headers using $source as stdin (this explicitly
decodes headers which are RFC 2047-encoded, and passes the body
through unchanged)
- feed stdout from decode_headers into the same mhfixmsg command
I've already quoted a few times; I'll quote it again here to
keep everything in one place:
mhfixmsg -decodetext 8bit -decodetypes text -textcharset UTF-8 \
-reformat -fixcte -fixboundary -noreplacetextplain \
-fixtype application/octet-stream -noverbose -file - \
-outfile "${tf}.fixed"
...where ${tf}.fixed is another, newly created file in /tmp
- use cmp to compare $source and ${tf}.fixed; if they differ, save
$source as a new message in +reformatted
The file which started this discussion is the one from +reformatted, and
I still can't see how lynx would have been involved in its creation.
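As a side check on the earlier "see if it's UTF-8" suggestion, here is a minimal sketch of how one might test whether a saved file (the copy in +reformatted, or ${tf}.fixed) is valid UTF-8. It assumes iconv is installed; the function name is_utf8 is made up for illustration and is not part of nmh.

```sh
# is_utf8 FILE -- hypothetical helper, not part of nmh.
# Succeeds (exit status 0) if FILE decodes cleanly as UTF-8.
# Assumes an iconv that exits non-zero on invalid input.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

Something like `is_utf8 "${tf}.fixed" || echo "not valid UTF-8"` would then flag a bad file. Note that plain ASCII also passes, and that this says nothing about whether the declared charset in the Content-Type header matches the bytes.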
>I would do this if you haven't already:
>1. download nmh HEAD, build, and install somewhere
I got this far, but I've been unable to proceed since the build failed as
described previously. (To be fair, I also haven't had time to try to get
farther as yet.)
>2. move your $(mhpath +)/mhn.defaults
>3. move your profile and create one with just a Path: entry
>4. run the "mhfixmsg -file original_copy -out -" from 1. and see if the
> output looks good or bad
>
>If it's good, then start adding things back in one at a time in reverse
>order (starting with mhfixmsg switches) until it's bad.
This sounds like an excellent plan, and I intend to follow through with it
on Friday; unfortunately I'll be busy with other things until then.
...although I may need help getting past the build problem.
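For step 3, moving the real profile may not be strictly necessary: nmh reads the profile location from the MH environment variable, so a throwaway profile can be used for a single command. A minimal sketch, assuming a POSIX shell and mktemp; the function name make_min_profile and the scratch layout are made up for illustration, and it does nothing about mhn.defaults (step 2).

```sh
# make_min_profile -- hypothetical helper, not part of nmh.
# Creates a scratch directory holding an empty Mail directory and a
# profile containing only a Path: entry, then prints the profile's path.
make_min_profile() {
    scratch=$(mktemp -d "${TMPDIR:-/tmp}/mhtest.XXXXXX") || return 1
    mkdir "${scratch}/Mail" || return 1
    printf 'Path: %s\n' "${scratch}/Mail" > "${scratch}/profile" || return 1
    printf '%s\n' "${scratch}/profile"
}
```

With that in place, step 4 could be run as `MH="$(make_min_profile)" /path/to/new/bin/mhfixmsg -file original_copy -outfile -`, where /path/to/new/bin is a placeholder for wherever the HEAD build ends up.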
- Steven
8<----------------------------- cut here ---------------------------->8
#!/bin/sh
#
# email_decoder -- rewrite quoted-printable and base64 text in a message
#
# Steven Winikoff
# 2008/09/11
# 2010/01/22 -- use mhshow to decode
# 2014/05/19 -- always exit with status 0 (see note below)
# 2018/01/22 -- rewrite using mhfixmsg to do the heavy lifting
# 2019/10/17 -- ...and use ~smw/bin/decode_headers to decode RFC 2047
# headers (for use with procmail, grep and mairix)
#
# Given an email message on standard input with at least one portion
# containing text encoded in base64 or quoted-printable format, the
# object of the game is to send the same message back to stdout with
# the text part(s) decoded.
#
# A copy of the original message will also be saved in +reformatted
# (AKA ~smw/Mail/reformatted/) unless the -t (test mode) option is
# specified.
#
# This is intended to be invoked in a procmail filter recipe.
#
# Note that this is the reason why we always exit with status 0, even
# when something goes wrong; this prevents procmail from cluttering its
# log with messages similar to these:
#
# procmail: Program failure (3) of "/home/smw/bin/email_decoder"
# procmail: Rescue of unfiltered data succeeded
#
# usage: email_decoder [-t]
#
#--------------------------------------------------------------------------
# setup:
PATH="/local/paths:/bin:/usr/bin:$PATH"
export PATH
unset LC_ALL; LANG="en_CA.UTF-8"; export LC_ALL LANG
tf="/tmp/decoder.`date +%Y%m%d.%H%M%S.$$`"
trap 'rm -rf ${tf}* >/dev/null 2>&1' 1 2 3 15
save_folder="+reformatted"
test_mode=0
#--------------------------------------------------------------------------
# are we operating in test mode?
if [ -n "${1}" ]
then
    # officially test mode is indicated by the -t option, but in
    # practice we'll accept any argument at all to mean test mode
    test_mode=1
fi
#--------------------------------------------------------------------------
# save a copy of the original message:
#
# if any changes are made (and if not operating in test mode), a copy
# of the original will be left in +reformatted -- but we won't know
# whether that's necessary until later
source="${tf}.original"
cat > ${source}
#--------------------------------------------------------------------------
# run the message through decode_headers and mhfixmsg in that order:
#
# notes:
#
# - this relies on mhfixmsg (patched to allow output lines wider
# than 998 characters!) to decode base64 and quoted-printable
# text parts
#
# - the -fixtype option to mhfixmsg (introduced in nmh-1.7) allows
# uninformative MIME types to be replaced by something more
# useful; it can be repeated as many times as necessary, with
# a different type specified each time
#
# - mhfixmsg changes the structure of some messages; for example:
#
# before:
#
#  msg part   type/subtype           size description
#  279        multipart/mixed        540K
#      1      multipart/related       32K
#      1.1    text/html               20K
#      1.2    image/jpeg             2540 image001.jpg
#      2      application/pdf        187K 162160.PDF
#      3      application/pdf        187K 161858.PDF
#
# after:
#
#  msg part   type/subtype           size description
#  280        multipart/mixed        542K
#      1      multipart/related       34K
#      1.1    multipart/alternative   30K
#      1.1.1  text/html               21K
#      1.1.2  text/plain             8829
#      1.2    image/jpeg             2540 image001.jpg
#      2      application/pdf        187K 162160.PDF
#      3      application/pdf        187K 161858.PDF
decode_headers < ${source} | \
mhfixmsg -decodetext 8bit -decodetypes text -textcharset UTF-8 \
-reformat -fixcte -fixboundary -noreplacetextplain \
-fixtype application/octet-stream -noverbose -file - \
-outfile "${tf}.fixed"
#--------------------------------------------------------------------------
# if we didn't actually change anything, just blat the original message to
# stdout; otherwise save the original file (if not in test mode) and send
# the modified version to stdout:
if cmp -s "${source}" "${tf}.fixed"
then
    cat "${source}"
else
    original="`mhpath ${save_folder} new`"
    [ ${test_mode} -lt 1 ] && cat "${source}" > "${original}"
    formail -fA "X-Reformatted-From: ${original}" < "${tf}.fixed"
fi
#--------------------------------------------------------------------------
# done! clean up and exit:
rm -rf ${tf}* >/dev/null 2>&1
exit 0
8<----------------------------- cut here ---------------------------->8
--
___________________________________________________________________________
Steven Winikoff | Sometimes you will never know the value
Montreal, QC, Canada | of a moment until it becomes a memory.
smw@smwonline.ca |
http://smwonline.ca | - Dr. Seuss