[libextractor] Microsoft Office mimetype (OLE2) is not recognized reliable


From: Marc
Subject: [libextractor] Microsoft Office mimetype (OLE2) is not recognized reliable
Date: Sat, 30 Aug 2008 23:44:15 +0200
User-agent: KMail/1.9.9

Hi,

great work on libextractor. I like to learn with it and figure things out; 
I am just starting to learn Python.

One problem I noticed:

I am trying to distinguish the different Microsoft Office file formats 
using the mimetype information provided by libextractor (the files I need to 
investigate have no filename extensions). The problem is that often only a 
generic value such as "application/vnd.ms-office" is extracted. The result 
depends on the specific application that was used when the 
document/spreadsheet/presentation was last saved.

I found that other programs have similar problems with this job:
- In Kubuntu Hardy, the Linux distribution I use, XLS files without a 
filename extension appear as DOC in Konqueror
- Windows XP cannot do it either (in the file manager)
- I also tried NLNZ Metadata Extractor v3.0, without success
- The file command in the shell reports the wrong application type too

OpenOffice, however, can open all of these formats without a filename 
extension and imports them the correct way (Writer/Calc/Impress).
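
That suggests the file contents themselves carry enough information. Below is 
a rough sketch of one possible heuristic; it is not what libextractor does, 
and I have not tested it against real-world files. It assumes that an OLE2 
file starts with the usual compound-file signature and that the main stream 
is named "WordDocument", "Workbook" (or "Book" in older Excel files) or 
"PowerPoint Document", stored as UTF-16LE in the directory. The function name 
guess_ole2_mimetype and the table STREAM_HINTS are made up for this example.

```python
# Signature found at the start of OLE2 compound files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Assumed main-stream names per application, checked in this order.
STREAM_HINTS = [
    ("WordDocument", "application/msword"),
    ("PowerPoint Document", "application/vnd.ms-powerpoint"),
    ("Workbook", "application/vnd.ms-excel"),
    ("Book", "application/vnd.ms-excel"),
]


def guess_ole2_mimetype(data):
    """Guess a mimetype from raw file bytes; None if not an OLE2 file."""
    if not data.startswith(OLE2_MAGIC):
        return None
    for name, mimetype in STREAM_HINTS:
        # Directory names are UTF-16LE and null-terminated. This is a plain
        # byte search, not a real parse of the directory, so it can misfire.
        if (name + "\0").encode("utf-16-le") in data:
            return mimetype
    # OLE2 container, but no known stream name was found.
    return "application/vnd.ms-office"
```

To try it, read a file in binary mode and pass the bytes to 
guess_ole2_mimetype. A proper solution would walk the compound-file directory 
instead of searching the raw bytes, since embedded objects or document text 
could contain the same names.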

I use libextractor 0.5.18a and Python-Extractor 0.5-2. In the ChangeLog I 
did not find any changes regarding the OLE2 plugin since version 0.5.18a.

Has anyone encountered the same problem? How could this be solved?

Best regards,
Marc



