help-gnu-emacs

Re: viewing docx files


From: Yuri Khan
Subject: Re: viewing docx files
Date: Mon, 30 Jan 2017 15:30:53 +0700

On Mon, Jan 30, 2017 at 2:21 PM, Jude DaShiell <jdashiel@panix.com> wrote:
> I wonder if the file utility can tell the difference between a docx-utf-8
> file and a docx-non-utf-8 file.  If that can work it may be possible to do a
> little docx inspection to find when to trigger the unzip->iconv->zip process
> and only trigger that process when necessary.

Instead of iconv, use xmllint --encode utf-8. It will extract the
source encoding from the XML declaration at the top of the file, and
reencode from that to UTF-8. Trigger it unconditionally, for each
*.xml file in the archive.
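
A minimal sketch of that loop in Python, in case it helps. The function
names (xmllint_to_utf8, reencode_docx) are made up for illustration, and
it assumes xmllint is on PATH and reads its input from stdin when given
"-" as the file name. It copies every archive member to a new file and
pipes only the *.xml members through xmllint:

```python
import subprocess
import zipfile


def xmllint_to_utf8(data: bytes) -> bytes:
    """Re-encode one XML document to UTF-8 by piping it through xmllint.

    Assumption: xmllint is installed and accepts "-" to mean stdin.
    xmllint takes the source encoding from the XML declaration itself.
    """
    result = subprocess.run(
        ["xmllint", "--encode", "utf-8", "-"],
        input=data,
        stdout=subprocess.PIPE,
        check=True,  # raise if xmllint rejects the document
    )
    return result.stdout


def reencode_docx(src: str, dst: str, reencode=xmllint_to_utf8) -> None:
    """Copy the archive src to dst, re-encoding every *.xml member.

    Runs unconditionally on each *.xml file; all other members
    (images, .rels files, and so on) are copied byte for byte.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info)
            if info.filename.lower().endswith(".xml"):
                data = reencode(data)
            # Reusing the original ZipInfo keeps the member name, timestamp
            # and compression method; size and CRC are recomputed on write.
            zout.writestr(info, data)


# Example call (hypothetical file names):
# reencode_docx("input.docx", "input-utf8.docx")
```

Writing to a separate output file rather than modifying the archive in
place is just a choice made for the sketch; it leaves the original
untouched if xmllint fails on one of the members.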

Consider also trying to persuade Pandoc developers to support
non-UTF-8-encoded XML data.


