Re: viewing docx files
From: Yuri Khan
Subject: Re: viewing docx files
Date: Mon, 30 Jan 2017 15:30:53 +0700
On Mon, Jan 30, 2017 at 2:21 PM, Jude DaShiell <jdashiel@panix.com> wrote:
> I wonder if the file utility can tell the difference between a docx-utf-8
> file and a docx-non-utf-8 file. If that can work it may be possible to do a
> little docx inspection to find when to trigger the unzip->iconv->zip process
> and only trigger that process when necessary.
Instead of iconv, use xmllint --encode utf-8. It will extract the
source encoding from the XML declaration at the top of the file, and
reencode from that to UTF-8. Trigger it unconditionally, for each
*.xml file in the archive.
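
For anyone who would rather not shell out, here is a rough sketch of the same
unzip -> reencode -> zip idea using only the Python standard library. Note
that it swaps xmllint for xml.dom.minidom, which also reads the source
encoding from the XML declaration; the function names are made up for
illustration, and the serialized output may differ from xmllint's in details
such as the exact form of the declaration. Treat it as a starting point, not
a tested converter for every docx out there.

```python
import zipfile
from xml.dom import minidom


def reencode_xml_to_utf8(data: bytes) -> bytes:
    """Parse XML bytes using the declared encoding, then serialize as UTF-8.

    Hypothetical helper: minidom stands in for `xmllint --encode utf-8`.
    """
    doc = minidom.parseString(data)
    try:
        # With an explicit encoding, toxml() returns bytes and writes a
        # matching XML declaration at the top.
        return doc.toxml(encoding="utf-8")
    finally:
        doc.unlink()


def normalize_docx(src, dst) -> None:
    """Copy archive `src` to `dst`, re-encoding every *.xml member.

    `src` and `dst` may be file names or file-like objects. Members that
    do not end in .xml are copied through unchanged.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            data = zin.read(info)
            if info.filename.lower().endswith(".xml"):
                # Unconditional: no attempt to detect whether the member
                # is already UTF-8.
                data = reencode_xml_to_utf8(data)
            zout.writestr(info.filename, data)
```

Usage would be along the lines of normalize_docx("in.docx", "out.docx"),
with the file names being placeholders. Running it on every member without
checking first keeps the logic simple, at the cost of reserializing files
that were already UTF-8.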
Consider also trying to persuade Pandoc developers to support
non-UTF-8-encoded XML data.