[libextractor] Hachoir project and some comments about libextractor
From: Victor STINNER
Subject: [libextractor] Hachoir project and some comments about libextractor
Date: Wed, 2 Aug 2006 17:16:27 +0200 (CEST)
User-agent: SquirrelMail/1.4.3a

Hi,
I'm one of the authors of the Hachoir project:
http://hachoir.python-hosting.com/
This project is a generic parser for binary (and only binary) files. It
has been in development for 10 months, and it's already worth testing.
I'm writing to you because I wrote a small tool based on Hachoir:
hachoir-metadata, which extracts a lot of information from known files.
"Known" means that the format has both a Hachoir parser and a metadata
extractor. The list of supported files is here:
http://hachoir.python-hosting.com/wiki/Metadata
It's hard to say whether it's fast or not since I don't have a good test,
but on supported files it gives more information than extract. I don't
know if your goal is to extract as much information as possible or just to
extract the information useful for finding a specific file.
We worked on optimisation over the last few weeks. The best result was with
svn revision 479: on one file, Hachoir was just 4 times slower than
extract. The test is "time extract file.png" versus "time hachoir
--metadata file.png". But this test is unfair because Python takes some
milliseconds to load (whereas extract is pure C code).
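
To make the comparison above a bit fairer, one option is to time each
command several times and also time an empty Python program, so that
interpreter startup can be seen separately. The following is only a
rough sketch in present-day Python 3; the helper names best_wall_time
and startup_baseline are made up for illustration and are not part of
Hachoir or libextractor.

```python
import subprocess
import sys
import time


def best_wall_time(argv, repeats=5):
    """Return the fastest wall-clock time (in seconds) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Output is discarded so that terminal I/O does not skew the timing.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        best = min(best, time.perf_counter() - start)
    return best


def startup_baseline(repeats=5):
    """Time an empty Python program to estimate interpreter startup cost."""
    return best_wall_time([sys.executable, "-c", "pass"], repeats)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python bench.py hachoir --metadata file.png
    total = best_wall_time(sys.argv[1:])
    print("command: %.3f s, python startup alone: %.3f s"
          % (total, startup_baseline()))
```

Taking the minimum over a few runs reduces noise from disk caching; the
startup figure is only an estimate, since importing the parser modules
also costs time that an empty program does not pay.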
--
I think that you could use the Hachoir source code to improve your
parsers. For example, your PNG parser is poor: it extracts neither the
creation date nor the comments. You can look at "parser/image/png.py" and
"metadata/image.py".
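
To illustrate where this information lives in a PNG file, here is a
minimal standalone sketch in Python. It does not use the Hachoir API
and is not the code from png.py; the function names iter_chunks and
png_metadata are invented for this example. It assumes the standard PNG
layout: an 8-byte signature followed by chunks made of a 4-byte
big-endian length, a 4-byte type, the payload and a 4-byte CRC (which
this sketch does not verify). Comments are stored in tEXt chunks as
"keyword NUL text" in Latin-1; the tIME chunk holds the last
modification time, while a creation date is conventionally a tEXt entry
with the keyword "Creation Time".

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data):
    """Yield (type, payload) for each chunk of a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        # Skip length (4) + type (4) + payload + CRC (4).
        pos += 12 + length
        if ctype == b"IEND":
            break


def png_metadata(data):
    """Collect tEXt keyword/text pairs and the tIME timestamp, if any."""
    meta = {"text": {}, "time": None}
    for ctype, payload in iter_chunks(data):
        if ctype == b"tEXt":
            keyword, _, text = payload.partition(b"\x00")
            meta["text"][keyword.decode("latin-1")] = text.decode("latin-1")
        elif ctype == b"tIME" and len(payload) == 7:
            # (year, month, day, hour, minute, second)
            meta["time"] = struct.unpack(">H5B", payload)
    return meta
```

Typical use would be png_metadata(open("file.png", "rb").read()).
Compressed (zTXt) and international (iTXt) text chunks are ignored
here.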
To download Hachoir:
svn co https://svn.hachoir.python-hosting.com/hachoir/trunk hachoir
To test Hachoir:
cd <hachoir directory>
export PYTHONPATH=$(cd src; pwd)
script/hachoir-metadata file
script/hachoir-metadata file1 file2 ...
Options:
script/hachoir-metadata --level LEVEL file, filter information
script/hachoir-metadata --mime LEVEL file, just display MIME type
You can also try the file explorer (it needs the Python "urwid" module):
script/hachoir-urwid file
Or you can install it using "./setup.py install" ;-) (but it's broken
right now, I will fix it in the next few hours)
Haypo