[GNUnet-SVN] r9982 - in Extractor: doc src/main src/plugins


From: gnunet
Subject: [GNUnet-SVN] r9982 - in Extractor: doc src/main src/plugins
Date: Wed, 13 Jan 2010 14:42:34 +0100

Author: grothoff
Date: 2010-01-13 14:42:34 +0100 (Wed, 13 Jan 2010)
New Revision: 9982

Added:
   Extractor/src/plugins/id3_extractor.c
Modified:
   Extractor/doc/extractor.texi
   Extractor/doc/version.texi
   Extractor/src/main/extractor.c
   Extractor/src/plugins/Makefile.am
   Extractor/src/plugins/mp3_extractor.c
Log:
adding support for tail extraction, documenting, using it for ID3v1
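
For context on why tail extraction matters for ID3v1: an ID3v1 tag
occupies the final 128 bytes of a file and begins with the three bytes
"TAG", so a plugin that only sees the head of a large file cannot find
it.  The following is a minimal sketch, not the code from
id3_extractor.c.  It assumes only that the last byte of the buffer is
the last byte of the file (which the want-tail documentation in this
revision guarantees); the helper names id3v1_find and id3v1_title are
hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define ID3V1_SIZE 128      /* fixed size of an ID3v1 tag */
#define ID3V1_TITLE_LEN 30  /* title field follows the "TAG" marker */

/* Hypothetical helper: returns a pointer to the ID3v1 tag at the end
   of 'data', or NULL if the buffer is too small or carries no tag.
   Assumes data[size - 1] is the last byte of the file. */
static const char *
id3v1_find (const char *data, size_t size)
{
  const char *tag;

  if (size < ID3V1_SIZE)
    return NULL;
  tag = &data[size - ID3V1_SIZE];
  if (0 != memcmp (tag, "TAG", 3))
    return NULL;
  return tag;
}

/* Hypothetical helper: copies the 30-byte title field into 'out' and
   0-terminates it, since the field itself need not be terminated. */
static void
id3v1_title (const char *tag, char out[ID3V1_TITLE_LEN + 1])
{
  memcpy (out, tag + 3, ID3V1_TITLE_LEN);
  out[ID3V1_TITLE_LEN] = '\0';
}
```

Because the tag sits at a fixed offset from the end, the check is the
same whether the buffer holds the whole file or only its tail.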

Modified: Extractor/doc/extractor.texi
===================================================================
--- Extractor/doc/extractor.texi        2010-01-11 22:13:37 UTC (rev 9981)
+++ Extractor/doc/extractor.texi        2010-01-13 13:42:34 UTC (rev 9982)
@@ -10,9 +10,11 @@
 @c %**end of header
 @copying
 This manual is for GNU libextractor
-(version @value{VERSION}, @value{UPDATED}),
-which is GNU's library for meta data extraction.
+(version @value{VERSION}, @value{UPDATED}).
 
+GNU libextractor is a GNU package.
+
+
 Copyright @copyright{} 2007, 2010 Christian Grothoff
 
 @quotation
@@ -73,7 +75,7 @@
 @code{NULL}
 @end macro
 
-@macro le{}
+@macro gnule{}
 @acronym{GNU libextractor}
 @end macro
 
@@ -84,24 +86,22 @@
 @insertcopying
 @end ifnottex
 
-GNU libextractor is a GNU package.
-
 @menu
-* Introduction::                 What is @le{}.
+* Introduction::                 What is @gnule{}.
 * Preparation::                  What you should do before using the library.
 * Generalities::                 General library functions and data types.
-* Extracting meta data::         How to use @le{} to obtain meta data.
-* Language bindings::            How to use @le{} from languages other than C.
-* Utility functions::            Utility functions of @le{}.
+* Extracting meta data::         How to use @gnule{} to obtain meta data.
+* Language bindings::            How to use @gnule{} from languages other than C.
+* Utility functions::            Utility functions of @gnule{}.
 * Existing Plugins::             What plugins are available.
-* Writing new Plugins::          How to write new plugins for @le{}.
-* Internal utility functions::   Utility functions of @le{} for writing plugins.
+* Writing new Plugins::          How to write new plugins for @gnule{}.
+* Internal utility functions::   Utility functions of @gnule{} for writing plugins.
 * Reporting bugs::               How to report bugs or request new features.
 
 Appendices
 
 * Copying::                     The GNU General Public License says how you
-                                can copy and share some parts of @le{}.
+                                can copy and share some parts of @gnule{}.
 
 Indices
 
@@ -120,7 +120,7 @@
 @chapter Introduction
 
 @cindex error handling
-@le{} is GNU's library for extracting meta data from
+@gnule{} is GNU's library for extracting meta data from
 files.  Meta data includes format information (such as mime type,
 image dimensions, color depth, recording frequency), content
 descriptions (such as document title or document description) and
@@ -128,55 +128,55 @@
 Meta data extraction is an inherently uncertain business --- a parse
 error can be a corrupt file, an incompatibility in the file format
 version, an entirely different file format or a bug in the parser.  As
-a result of this uncertainty, @le{} deliberately
+a result of this uncertainty, @gnule{} deliberately
 avoids to ever report any errors.  Unexpected file contents simply
 result in less or possibly no meta data being extracted.  
 
 @cindex plugin
-@le{} uses plugins to handle various file formats.
+@gnule{} uses plugins to handle various file formats.
 Technically a plugin can support multiple file formats; however, most
 plugins only support one particular format.  By default,
-@le{} will use all plugins that are available and found
+@gnule{} will use all plugins that are available and found
 in the plugin installation directory.  Applications can
 request the use of only specific plugins or the exclusion of
 certain plugins.
 
-@le{} is distributed with the @command{extract} 
+@gnule{} is distributed with the @command{extract} 
 address@hidden distributions ship @command{extract} in a
 separate package.} which is a command-line tool for extracting
 meta data.  @command{extract} is given a list of filenames and 
 prints the resulting meta data to the console.  The @command{extract}
 source code also serves as an advanced example for how to use
-@le{}.  
+@gnule{}.  
 
 This manual focuses on providing documentation for writing software
-with @le{}.  The only relevant parts for end-users
-are the chapter on compiling and installing @le{}
+with @gnule{}.  The only relevant parts for end-users
+are the chapter on compiling and installing @gnule{}
 (@xref{Preparation}.).  Also, the chapter on existing plugins may be of
 interest (@xref{Existing Plugins}.).  Additional documentation for
 end-users can be found in the man page on @command{extract} (using
 @verb{|man extract|}).
 
 @cindex license
-@le{} is licensed under the GNU General Public License.  The
+@gnule{} is licensed under the GNU General Public License.  The
 developers have frequently received requests to license GNU
-libextractor under alternative terms.  However, @le{}
+libextractor under alternative terms.  However, @gnule{}
 borrows plenty of GPL-licensed code from various other projects.
 Hence we cannot change the license (even if we wanted to)address@hidden
 maybe possible to switch to GPLv3 in the future.  For this, an audit
 of the license status of our dependencies would be required.  The new
-code that was developed specifically for @le{} has
+code that was developed specifically for @gnule{} has
 always been licensed under GPLv2 @emph{or any later version}.}
 
 @node Preparation
 @chapter Preparation
 
-Compiling @le{} follows the standard GNU autotools
+Compiling @gnule{} follows the standard GNU autotools
 build process using @command{configure} and @command{make}.  For
 details, read the @file{INSTALL} file and query 
 @verb{|./configure --help|} for additional options.
 
-@le{} has various dependencies, some of which are optional. 
+@gnule{} has various dependencies, some of which are optional. 
 Instead of specifying the names of the software packages, we
 will give the list in terms of the names of the respective
 Debian (unstable) packages that should be installed.
@@ -246,29 +246,29 @@
 supposed to only list direct dependencies, not transitive
 dependencies).
 
-Once you have compiled and installed @le{}, you should have a file
+Once you have compiled and installed @gnule{}, you should have a file
 @file{extractor.h} installed in your @file{include/} directory.  This
 file should be the starting point for your C and C++ development with
-@le{}.  The build process also installs the @file{extract} binary and
-man pages for @file{extract} and @le{}.  The @file{extract} man page
-documents the @file{extract} tool.  The @le{} man page gives a brief
-summary of the C API for @le{}.
+@gnule{}.  The build process also installs the @file{extract} binary and
+man pages for @file{extract} and @gnule{}.  The @file{extract} man page
+documents the @file{extract} tool.  The @gnule{} man page gives a brief
+summary of the C API for @gnule{}.
 
 @cindex packaging
 @cindex directory structure
 @cindex plugin
 @cindex environment variables
 @vindex LIBEXTRACTOR_PREFIX
-When you install @le{}, various plugins will be
+When you install @gnule{}, various plugins will be
 installed in the @file{lib/libextractor/} directory.  The main library
 will be installed as @file{lib/libextractor.so}.  Note that
-@le{} will attempt to find the plugins relative to the
+@gnule{} will attempt to find the plugins relative to the
 path of the main library.  Consequently, a package manager can move
 the library and its plugins to a different location later --- as long
 as the relative path between the main library and the plugins is
 preserved.  As a method of last resort, the user can specify an
 environment variable @verb{|LIBEXTRACTOR_PREFIX|}.  If
-@le{} cannot locate a plugin, it will look in
+@gnule{} cannot locate a plugin, it will look in
 @verb{|LIBEXTRACTOR_PREFIX/lib/libextractor/|}.
 
 @section Note to package maintainers
@@ -304,9 +304,9 @@
 @node Generalities
 @chapter Generalities
 
-Each public symbol exported by @le{} has the prefix
+Each public symbol exported by @gnule{} has the prefix
 @verb{|EXTRACTOR_|}.  All-caps names are used for constants.  For the
-impatient, the minimal C code for using @le{} (on the
+impatient, the minimal C code for using @gnule{} (on the
 executing binary itself) looks like this:
 
 @verbatim
@@ -326,6 +326,13 @@
 @node Extracting meta data
 @chapter Extracting meta data
 
+In order to extract meta data with @gnule{} you first need to
+load the respective plugins and then call the extraction API
+with the plugins and the data to process.  This section
+documents how to load and unload plugins, the various types
+and formats in which meta data is returned to the application
+and finally the extraction API itself.
+
 @menu
 * Plugin management::   How to load and unload plugins
 * Meta types::          About meta types
@@ -350,7 +357,7 @@
 plugin lists and using them concurrently is supported as long as
 the @code{EXTRACTOR_OPTION_IN_PROCESS} option is not used. 
 
-Generally, @le{} is fully thread-safe and mostly reentrant.
+Generally, @gnule{} is fully thread-safe and mostly reentrant.
 All plugin code is required to be reentrant and state-less,
 but due to the extensive use of 3rd party libraries this cannot
 be guaranteed.  Hence plugins are executed (by default) out of
@@ -402,7 +409,7 @@
 @deftypefun {struct EXTRACTOR_PluginList *} EXTRACTOR_plugin_add_defaults (enum EXTRACTOR_Options flags)
 @findex EXTRACTOR_plugin_add_defaults
 
-Loads all of the plugins in the plugin directory.  This function is what most @le{} applications should use to setup the plugins.
+Loads all of the plugins in the plugin directory.  This function is what most @gnule{} applications should use to setup the plugins.
 @end deftypefun
 
 
@@ -414,14 +421,14 @@
 @tindex enum EXTRACTOR_MetaType
 @findex EXTRACTOR_metatype_get_max
 
-@verb{|enum EXTRACTOR_MetaType|} is a C enum which defines a list of over 100 different types of meta data.  The total number can differ between different @le{} releases; the maximum value for the current release can be obtained using the @verb{|EXTRACTOR_metatype_get_max|} function.  All values in this enumeration are of the form @verb{|EXTRACTOR_METATYPE_XXX|}.
+@verb{|enum EXTRACTOR_MetaType|} is a C enum which defines a list of over 100 different types of meta data.  The total number can differ between different @gnule{} releases; the maximum value for the current release can be obtained using the @verb{|EXTRACTOR_metatype_get_max|} function.  All values in this enumeration are of the form @verb{|EXTRACTOR_METATYPE_XXX|}.
 
 @deftypefun {const char *} EXTRACTOR_metatype_to_string (enum EXTRACTOR_MetaType type)
 @findex EXTRACTOR_metatype_to_string
 @cindex gettext
 @cindex internationalization
 
-The function @verb{|EXTRACTOR_metatype_to_string|} can be used to obtain a short English string @samp{s} describing the meta data type.  The string can be translated into other languages using GNU gettext with the domain set to @le{} (@verb{|dgettext("libextractor", s)|}).  
+The function @verb{|EXTRACTOR_metatype_to_string|} can be used to obtain a short English string @samp{s} describing the meta data type.  The string can be translated into other languages using GNU gettext with the domain set to @gnule{} (@verb{|dgettext("libextractor", s)|}).  
 @end deftypefun
 
 @deftypefun {const char *} EXTRACTOR_metatype_to_description (enum EXTRACTOR_MetaType type)
@@ -429,7 +436,7 @@
 @cindex gettext
 @cindex internationalization
 
-The function @verb{|EXTRACTOR_metatype_to_description|} can be used to obtain a longer English string @samp{s} describing the meta data type.  The description may be empty if the short description returned by @code{EXTRACTOR_metatype_to_string} is already comprehensive.  The string can be translated into other languages using GNU gettext with the domain set to @le{} (@verb{|dgettext("libextractor", s)|}).  
+The function @verb{|EXTRACTOR_metatype_to_description|} can be used to obtain a longer English string @samp{s} describing the meta data type.  The description may be empty if the short description returned by @code{EXTRACTOR_metatype_to_string} is already comprehensive.  The string can be translated into other languages using GNU gettext with the domain set to @gnule{} (@verb{|dgettext("libextractor", s)|}).  
 @end deftypefun
 
 
@@ -490,11 +497,11 @@
 @cindex threads
 @cindex thread-safety
 
-This is the main function for extracting keywords with @le{}.  The first argument is a plugin list which specifies the set of plugins that should be used for extracting meta data.  The @samp{filename} argument is optional and can be used to specify the name of a file to process.  If @samp{filename} is NULL, then the @samp{data} argument must point to the in-memory data to extract meta data from.  If @samp{filename} is non-NULL, @samp{data} can be NULL.  If @samp{data} is non-null, then @samp{size} is the size of @samp{data} in bytes.  Otherwise @samp{size} should be zero.  For each meta data item found, GNU libextractor will call the @samp{proc} function, passing @samp{proc_cls} as the first argument to @samp{proc}.  The other arguments to @samp{proc} depend on the specific meta data found.  
+This is the main function for extracting keywords with @gnule{}.  The first argument is a plugin list which specifies the set of plugins that should be used for extracting meta data.  The @samp{filename} argument is optional and can be used to specify the name of a file to process.  If @samp{filename} is NULL, then the @samp{data} argument must point to the in-memory data to extract meta data from.  If @samp{filename} is non-NULL, @samp{data} can be NULL.  If @samp{data} is non-null, then @samp{size} is the size of @samp{data} in bytes.  Otherwise @samp{size} should be zero.  For each meta data item found, GNU libextractor will call the @samp{proc} function, passing @samp{proc_cls} as the first argument to @samp{proc}.  The other arguments to @samp{proc} depend on the specific meta data found.  
 
 @cindex SIGBUS
 @cindex bus error
-Meta data extraction should never really fail --- at worst, @le{} should not call @samp{proc} with any meta data. By design, @le{} should never crash or leak memory, even given corrupt files as input.  Note however, that running @le{} on a corrupt file system (or incorrectly @verb{|mmap|}ed files) can result in the operating system sending a SIGBUS (bus error) to the process.  While @le{} runs plugins out-of-process, it first maps the file into memory and then attempts to decompress it.  During decompression it is possible to encounter a SIGBUS.   @le{} will @emph{not} attempt to catch this signal and your application is likely to crash.  Note again that this should only happen if the file @emph{system} is corrupt (not if individual files are corrupt).  If this is not acceptable, you might want to consider running @le{} itself also out-of-process (as done, for example, by @url{http://grothoff.org/christian/doodle/,doodle}).
+Meta data extraction should never really fail --- at worst, @gnule{} should not call @samp{proc} with any meta data. By design, @gnule{} should never crash or leak memory, even given corrupt files as input.  Note however, that running @gnule{} on a corrupt file system (or incorrectly @verb{|mmap|}ed files) can result in the operating system sending a SIGBUS (bus error) to the process.  While @gnule{} runs plugins out-of-process, it first maps the file into memory and then attempts to decompress it.  During decompression it is possible to encounter a SIGBUS.   @gnule{} will @emph{not} attempt to catch this signal and your application is likely to crash.  Note again that this should only happen if the file @emph{system} is corrupt (not if individual files are corrupt).  If this is not acceptable, you might want to consider running @gnule{} itself also out-of-process (as done, for example, by @url{http://grothoff.org/christian/doodle/,doodle}).
 
 @end deftypefun
 
@@ -509,7 +516,7 @@
 @cindex PHP
 @cindex Ruby
 
-@le{} works immediately with C and C++ code. Bindings for Java, Mono, Ruby, Perl, PHP and Python are available for download from the main @le{} website.  Documentation for these bindings (if available) is part of the downloads for the respective binding.  In all cases, a full installation of the C library is required before the binding can be installed.
+@gnule{} works immediately with C and C++ code. Bindings for Java, Mono, Ruby, Perl, PHP and Python are available for download from the main @gnule{} website.  Documentation for these bindings (if available) is part of the downloads for the respective binding.  In all cases, a full installation of the C library is required before the binding can be installed.
 
 @section Java
 
@@ -571,7 +578,7 @@
 @cindex concurrency
 @cindex threads
 @cindex thread-safety
-This chapter describes various utility functions for @le{} usage. All of the functions are reentrant.
+This chapter describes various utility functions for @gnule{} usage. All of the functions are reentrant.
 
 @menu
 * Utility Constants::
@@ -724,6 +731,115 @@
 plugins.
 
 
address@hidden Example for a minimal extract method
+
+The following example shows how a plugin can return the mime type of
+a file.
address@hidden
+
+int
+EXTRACTOR_mymime_extract
+   (const char *data,
+    size_t data_size,
+    EXTRACTOR_MetaDataProcessor proc,
+    void *proc_cls,
+    const char * options)
+{
+  if (data_size < 4)
+    return 0;
+  if (0 != memcmp (data, "\177ELF", 4))
+    return 0;
+  if (0 != proc (proc_cls, 
+                 "mymime",
+                 EXTRACTOR_METATYPE_MIMETYPE,
+                 EXTRACTOR_METAFORMAT_UTF8,
+                 "text/plain",
+                 "application/x-executable",
+                 1 + strlen("application/x-executable")))
+    return 1;
+  /* more calls to 'proc' here as needed */
+  return 0;
+}
+
address@hidden example
+
address@hidden Plugin execution options
+
+Plugins can request that their execution be done in a particular way.
+For this, the plugin defines a function with the following signature:
+
address@hidden
+const char *
+EXTRACTOR_XXX_options (void);
address@hidden verbatim
+
+The function should return a string with the execution options.
+Individual options in this string should be separated by semicolons.
+Options that are included in the string but not known to the library
+are ignored.  The following options are supported:
+
address@hidden @bullet
address@hidden
address@hidden ensures that the plugin is only run out-of-process; if
+this is not possible, the plugin will not be executed at all if this
+option is set.
+
address@hidden
address@hidden ensures that @code{stderr} is closed during the
+execution of the plugin.  This is useful if the plugin uses libraries
+that write (error) messages to @code{stderr} and where this behavior cannot be 
+turned off.  This option only works if the plugin is executed out-of-process.
+
address@hidden
address@hidden ensures that @code{stdout} is closed during the
+execution of the plugin.  This is useful if the plugin uses libraries
+that write messages to @code{stdout} and where this behavior cannot be 
+turned off.  This option only works if the plugin is executed out-of-process.
+
address@hidden
address@hidden kills and restarts the plugin process for each
+file that is being analyzed.  This is useful if the plugin uses
+libraries that keep global state between runs that is problematic or
+if the plugin uses libraries that are known to have serious resource
+leaks (such as memory leaks).
+
address@hidden
address@hidden 
+In order to limit memory consumption, limit the amount of reading from
+disk and to keep the API simple, the @samp{data} argument passed to
+the @code{EXTRACTOR_XXX_extract} method is bounded (to 32 MB of normal
+data; for compressed data, a limit of 16 MB is imposed)address@hidden
address@hidden was given a pointer to an existing, uncompressed block of
+data in memory, no bound is imposed for plugins executing in-process;
+for out-of-process plugins, a 32 MB limit is still imposed.}  Since
+some file formats contain meta data at the end of the file, this option
+provides a way for plugins to access not the first 16--32 MB of a file
+but instead the last (roughly) 32 MB. 
+
+Note that even for files larger than 32 MB, @samp{size} is not
+guaranteed to be 32 MB since @samp{data} will be aligned to the page
+size of the operating system.  However, the last byte of @samp{data}
+is guaranteed to be the last byte of the file.  Furthermore, if the
+file was large and compressed, unlike in the case of meta data
+extraction from the header, the end of the file will not be
+automatically decompressed by @gnule{}.  
+
address@hidden itemize
+
+Note that using options other than @code{want-tail} is pretty much
+always a kludge and should thus be avoided.
+
address@hidden Example for an options method
+
+The following example shows how a plugin can set some of the options listed above:
address@hidden
+const char *
+EXTRACTOR_id3_options ()
+{
+  return "close-stderr;want-tail";
+}
address@hidden example
+
 @node Internal utility functions
 @chapter Internal utility functions
 
@@ -752,12 +868,12 @@
 @cindex UTF-8
 @cindex character set
 @findex EXTRACTOR_common_convert_to_utf8
-Various @le{} plugins make use of the internal
+Various @gnule{} plugins make use of the internal
 @file{convert.h} header which defines a function
 
 @verb{|EXTRACTOR_common_convert_to_utf8|} which can be used to easily convert text from
 any character set to UTF-8.  This conversion is important since the
-linked list of keywords that is returned by @le{} is
+linked list of keywords that is returned by @gnule{} is
 expected to contain only UTF-8 strings.  Naturally, proper conversion
 may not always be possible since some file formats fail to specify the
 character set.  In that case, it is often better to not convert at
@@ -781,9 +897,9 @@
 @chapter Reporting bugs
 
 @cindex bug
-@le{} uses the @url{http://gnunet.org/bugs/,Mantis bugtracking
+@gnule{} uses the @url{http://gnunet.org/bugs/,Mantis bugtracking
 system}.  If possible, please report bugs there.  You can also e-mail
-the @le{} mailinglist at @url{libextractor@@gnu.org}.
+the @gnule{} mailinglist at @url{libextractor@@gnu.org}.
 
 
 

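The documentation added above describes the options string as a list
of semicolon-separated entries in which unknown entries are ignored.
The extractor.c changes in this revision test for an option with
strstr, which is a substring match.  The following sketch shows a
stricter alternative that compares complete entries; it is an
illustration only, not the library's behavior, and the helper name
has_option is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if 'name' appears as a complete
   semicolon-separated entry in 'options', 0 otherwise. */
static int
has_option (const char *options, const char *name)
{
  size_t nlen;
  const char *pos;
  const char *end;

  if ( (NULL == options) || (NULL == name) )
    return 0;
  nlen = strlen (name);
  pos = options;
  while (1)
    {
      /* 'end' points at the next ';' or at the terminating 0 */
      end = strchr (pos, ';');
      if (NULL == end)
        end = pos + strlen (pos);
      if ( ((size_t) (end - pos) == nlen) &&
           (0 == memcmp (pos, name, nlen)) )
        return 1;
      if ('\0' == *end)
        return 0;
      pos = end + 1;
    }
}
```

With the string "close-stderr;want-tail", has_option matches
"want-tail" but not "tail", whereas a plain strstr test would also
accept "tail".
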
Modified: Extractor/doc/version.texi
===================================================================
--- Extractor/doc/version.texi  2010-01-11 22:13:37 UTC (rev 9981)
+++ Extractor/doc/version.texi  2010-01-13 13:42:34 UTC (rev 9982)
@@ -1,4 +1,4 @@
-@set UPDATED 1 January 2010
+@set UPDATED 13 January 2010
 @set UPDATED-MONTH January 2010
 @set EDITION 0.6.0
 @set VERSION 0.6.0
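
The extractor.c changes in this revision extend the request sent to an
out-of-process plugin from one line to two: the name of the shared
memory segment holding the head of the data, followed by a line that
starts with '!' and carries the name of the tail segment (empty if
there is none).  The child uses the tail segment only if the plugin
asked for want-tail and a tail name was given.  The sketch below
restates that selection on two in-memory lines so it can be tried in
isolation; the helper names chomp and select_segment are hypothetical
and do not appear in the commit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strips one trailing newline, if present. */
static void
chomp (char *line)
{
  size_t len = strlen (line);

  if ( (len > 0) && ('\n' == line[len - 1]) )
    line[len - 1] = '\0';
}

/* Hypothetical helper: given the two request lines, returns the name
   of the segment to open, or NULL if the request is empty or
   malformed.  Modifies both buffers in place. */
static const char *
select_segment (char *head_line, char *tail_line, int want_tail)
{
  chomp (head_line);
  chomp (tail_line);
  if ('\0' == head_line[0])
    return NULL;            /* an empty head line ends the stream */
  if ('!' != tail_line[0])
    return NULL;            /* the tail line must start with '!' */
  if ( want_tail && ('\0' != tail_line[1]) )
    return &tail_line[1];   /* skip the '!' marker */
  return head_line;
}
```

Falling back to the head segment when no tail name is given means a
want-tail plugin still receives data for small inputs.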

Modified: Extractor/src/main/extractor.c
===================================================================
--- Extractor/src/main/extractor.c      2010-01-11 22:13:37 UTC (rev 9981)
+++ Extractor/src/main/extractor.c      2010-01-13 13:42:34 UTC (rev 9982)
@@ -630,6 +630,7 @@
  */
 static void *
 get_symbol_with_prefix(void *lib_handle,
+                      const char *template,
                       const char *prefix,
                       const char **options)
 {
@@ -649,9 +650,9 @@
   dot = strstr (sym, ".");
   if (dot != NULL)
     *dot = '\0';
-  name = malloc(strlen(sym) + 32);
+  name = malloc(strlen(sym) + strlen(template) + 1);
   sprintf(name,
-         "_EXTRACTOR_%s_extract",
+         template,
          sym);
   /* try without '_' first */
   symbol = lt_dlsym(lib_handle, name + 1);
@@ -678,7 +679,8 @@
 #endif
     }
 
-  if (symbol != NULL)
+  if ( (symbol != NULL) &&
+       (NULL != options) )
     {
       /* get special options */
       sprintf(name,
@@ -741,6 +743,7 @@
       return -1;
     }
   plugin->extractMethod = get_symbol_with_prefix (plugin->libraryHandle,
+                                                 "_EXTRACTOR_%s_extract",
                                                  plugin->libname,
                                                  &plugin->specials);
   if (plugin->extractMethod == NULL) 
@@ -1094,10 +1097,9 @@
 
 
 /**
- * 'main' function of the child process.
- * Reads shm-filenames from 'in' (line-by-line) and
- * writes meta data blocks to 'out'.  The meta data
- * stream is terminated by an empty entry.
+ * 'main' function of the child process.  Reads shm-filenames from
+ * 'in' (line-by-line) and writes meta data blocks to 'out'.  The meta
+ * data stream is terminated by an empty entry.
  *
  * @param plugin extractor plugin to use
  * @param in stream to read from
@@ -1108,12 +1110,15 @@
                  int in,
                  int out)
 {
-  char fn[256];
+  char hfn[256];
+  char tfn[256];
+  char *fn;
   FILE *fin;
   void *ptr;
   int shmid;
   struct IpcHeader hdr;
   size_t size;
+  int want_tail;
 #ifdef WINDOWS
   HANDLE map;
 #endif
@@ -1129,8 +1134,15 @@
 #endif
       return;
     }  
+  want_tail = 0;
   if ( (plugin->specials != NULL) &&
        (NULL != strstr (plugin->specials,
+                       "want-tail")) )
+    {
+      want_tail = 1;
+    }
+  if ( (plugin->specials != NULL) &&
+       (NULL != strstr (plugin->specials,
                        "close-stderr")) )
     {
       close (2);
@@ -1144,12 +1156,27 @@
 
   memset (&hdr, 0, sizeof (hdr));
   fin = fdopen (in, "r");
-  while (NULL != fgets (fn, sizeof(fn), fin))
+  while (NULL != fgets (hfn, sizeof(hfn), fin))
     {
-      if (strlen (fn) == 0)
+      if (strlen (hfn) <= 1)
        break;
       ptr = NULL;
-      fn[strlen(fn)-1] = '\0'; /* kill newline */
+      hfn[strlen(hfn)-1] = '\0'; /* kill newline */
+      if (NULL == fgets (tfn, sizeof(tfn), fin))
+       break;
+      if ('!' != tfn[0])
+       break;
+      tfn[strlen(tfn)-1] = '\0'; /* kill newline */
+      if ( (want_tail) &&
+          (strlen (tfn) > 1) )
+       {
+         fn = &tfn[1];
+       }
+      else
+       {
+         fn = hfn;     
+       }
+
 #ifndef WINDOWS
       if ( (-1 != (shmid = shm_open (fn, O_RDONLY, 0))) &&
           (((off_t)-1) != (size = lseek (shmid, 0, SEEK_END))) &&
@@ -1161,12 +1188,13 @@
       if (ptr != NULL)
 #endif
        {
-         if (0 != plugin->extractMethod (ptr,
-                                         size,
-                                         &transmit_reply,
-                                         &out,
-                                         plugin->plugin_options))
-           break;
+         if ( (plugin->extractMethod != NULL) &&
+              (0 != plugin->extractMethod (ptr,
+                                           size,
+                                           &transmit_reply,
+                                           &out,
+                                           plugin->plugin_options)) )
+           break;          
          if (0 != write_all (out, &hdr, sizeof(hdr)))
            break;
        }
@@ -1195,8 +1223,10 @@
   close (out);
 }
 
+
 #ifdef WINDOWS
-static void write_plugin_data (HANDLE h, const struct EXTRACTOR_PluginList *plugin)
+static void 
+write_plugin_data (HANDLE h, const struct EXTRACTOR_PluginList *plugin)
 {
   size_t i;
   DWORD len;
@@ -1217,7 +1247,9 @@
   WriteFile (h, plugin->plugin_options, i, &len, NULL);
 }
 
-static struct EXTRACTOR_PluginList *read_plugin_data (FILE *f)
+
+static struct EXTRACTOR_PluginList *
+read_plugin_data (FILE *f)
 {
   struct EXTRACTOR_PluginList *ret;
   size_t i;
@@ -1239,7 +1271,9 @@
   return ret;
 }
 
-void CALLBACK RundllEntryPoint(HWND hwnd, HINSTANCE hinst, LPSTR lpszCmdLine, int nCmdShow)
+
+void CALLBACK 
+RundllEntryPoint(HWND hwnd, HINSTANCE hinst, LPSTR lpszCmdLine, int nCmdShow)
 {
   int in, out;
 
@@ -1253,6 +1287,7 @@
 }
 #endif
 
+
 /**
  * Start the process for the given plugin.
  */ 
@@ -1331,6 +1366,7 @@
  *
  * @param plugin which plugin to call
  * @param shmfn file name of the shared memory segment
+ * @param tshmfn file name of the shared memory segment for the end of the data
  * @param proc function to call on the meta data
  * @param proc_cls cls for proc
  * @return 0 if proc did not return non-zero
@@ -1338,6 +1374,7 @@
 static int
 extract_oop (struct EXTRACTOR_PluginList *plugin,
             const char *shmfn,
+            const char *tshmfn,
             EXTRACTOR_MetaDataProcessor proc,
             void *proc_cls)
 {
@@ -1347,7 +1384,9 @@
 
   if (plugin->cpid == -1)
     return 0;
-  if (0 >= fprintf (plugin->cpipe_in, "%s\n", shmfn))
+  if (0 >= fprintf (plugin->cpipe_in, 
+                   "%s\n",
+                   shmfn))
     {
       stop_process (plugin);
       plugin->cpid = -1;
@@ -1355,6 +1394,16 @@
        plugin->flags = EXTRACTOR_OPTION_DISABLED;
       return 0;
     }
+  if (0 >= fprintf (plugin->cpipe_in, 
+                   "!%s\n",
+                   (tshmfn != NULL) ? tshmfn : ""))
+    {
+      stop_process (plugin);
+      plugin->cpid = -1;
+      if (plugin->flags != EXTRACTOR_OPTION_DEFAULT_POLICY)
+       plugin->flags = EXTRACTOR_OPTION_DISABLED;
+      return 0;
+    }
   fflush (plugin->cpipe_in);
   while (1)
     {
@@ -1420,33 +1469,108 @@
 
 
 /**
- * Extract keywords from a file using the given set of plugins.
+ * Setup a shared memory segment.
  *
+ * @param ptr set to the location of the shm segment
+ * @param shmid where to store the shm ID
+ * @param fn name of the shared segment
+ * @param fn_size size available in fn
+ * @param size number of bytes to allocate for the segment
+ * @return 0 on success
+ */
+static int
+make_shm (int is_tail,
+         void **ptr,
+#ifndef WINDOWS
+         int *shmid,
+#else
+         HANDLE *mappedFile,
+         HANDLE *map,
+#endif   
+         char *fn,
+         size_t fn_size,
+         size_t size)
+{
+  snprintf (fn,
+           fn_size,
+#ifdef WINDOWS
+           "%TEMP%\\"
+#else
+           "/"
+#endif
+           "libextractor-%sshm-%u-%u",
+           (is_tail) ? "t" : "",
+           getpid(),
+           (unsigned int) RANDOM());
+#ifndef WINDOWS
+  *shmid = shm_open (fn, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
+  *ptr = NULL;
+  if (-1 == (*shmid))
+    return 1;    
+  if ( (0 != ftruncate (*shmid, size)) ||
+       (NULL == (*ptr = mmap (NULL, size, PROT_WRITE, MAP_SHARED, *shmid, 0))) ||
+       (*ptr == (void*) -1) )
+    {
+      close (*shmid);  
+      *shmid = -1;
+      return 1;
+    }
+  return 0;
+#else
+  *mappedFile = CreateFile (fn, 
+                          GENERIC_READ | GENERIC_WRITE,
+                          FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, CREATE_ALWAYS,
+                          FILE_FLAG_DELETE_ON_CLOSE, NULL);
+  *map = CreateFileMapping (*mappedFile, NULL, PAGE_READWRITE, 1, 0, NULL);
+  *ptr = MapViewOfFile (*map, FILE_MAP_READ, 0, 0, 0);
+  if (*ptr == NULL)
+    {
+      CloseHandle (*map);
+      CloseHandle (*mappedFile);
+      return 1;
+    }
+#endif
+  return 0;
+}
+
+
+/**
+ * Extract keywords using the given set of plugins.
+ *
  * @param plugins the list of plugins to use
- * @param filename the name of the file, can be NULL 
  * @param data data to process, never NULL
  * @param size number of bytes in data, ignored if data is NULL
+ * @param tdata end of file data, or NULL
+ * @param tsize number of bytes in tdata
  * @param proc function to call for each meta data item found
  * @param proc_cls cls argument to proc
  */
 static void
 extract (struct EXTRACTOR_PluginList *plugins,
-        const char * filename,
         const char * data,
         size_t size,
+        const char * tdata,
+        size_t tsize,
         EXTRACTOR_MetaDataProcessor proc,
         void *proc_cls) 
 {
   struct EXTRACTOR_PluginList *ppos;
+  enum EXTRACTOR_Options flags;
+  void *ptr;
+  void *tptr;
+  char fn[255];
+  char tfn[255];
+  int want_shm;
+  int want_tail;
 #ifndef WINDOWS
   int shmid;
+  int tshmid;
 #else
-  HANDLE map, mappedFile;
+  HANDLE map;
+  HANDLE mappedFile;
+  HANDLE tmap;
+  HANDLE tmappedFile;
 #endif
-  enum EXTRACTOR_Options flags;
-  void *ptr;
-  char fn[255];
-  int want_shm;
 
   want_shm = 0;
   ppos = plugins;
@@ -1472,100 +1596,106 @@
        }      
       ppos = ppos->next;
     }
+  ptr = NULL;
+  tptr = NULL;
   if (want_shm)
     {
-      snprintf (fn,
-               sizeof(fn),
-#ifdef WINDOWS
-               "%TEMP%\\"
+      if (size > MAX_READ)
+       size = MAX_READ;
+      if (0 == make_shm (0, 
+                        &ptr,
+#ifndef WINDOWS
+                        &shmid,
 #else
-               "/"
+                        &mappedFile,
+                        &map,
 #endif
-               "libextractor-shm-%u-%u",
-               getpid(),
-               (unsigned int) RANDOM());
+                        fn, sizeof(fn), size))
+       {
+         memcpy (ptr, data, size);      
+         if ( (tdata != NULL) &&
+              (0 == make_shm (1,
+                              &tptr,
 #ifndef WINDOWS
-      shmid = shm_open (fn, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
-      ptr = NULL;
-      if (shmid != -1)
-       {
-         if ( (0 != ftruncate (shmid, size)) ||
-              (NULL == (ptr = mmap (NULL, size, PROT_WRITE, MAP_SHARED, shmid, 0))) ||
-              (ptr == (void*) -1) )
+                              &tshmid,
+#else
+                              &tmappedFile,
+                              &tmap,
+#endif
+                              tfn, sizeof(tfn), tsize)) )
            {
-             close (shmid);    
-             shmid = -1;
+             memcpy (tptr, tdata, tsize);      
            }
          else
            {
-             memcpy (ptr, data, size);
+             tptr = NULL;
            }
        }
-#else
-      mappedFile = CreateFile (fn, GENERIC_READ | GENERIC_WRITE,
-          FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, CREATE_ALWAYS,
-          FILE_FLAG_DELETE_ON_CLOSE, NULL);
-      map = CreateFileMapping (mappedFile, NULL, PAGE_READWRITE, 1, 0, NULL);
-      ptr = MapViewOfFile (map, FILE_MAP_READ, 0, 0, 0);
-      if (ptr == NULL)
-        {
-          CloseHandle (map);
-          CloseHandle (mappedFile);
-          map = NULL;
-        }
       else
-        memcpy (ptr, data, size);
-#endif
+       {
+         want_shm = 0;
+       }           
     }
-  else
-#ifndef WINDOWS
-    shmid = -1;
-  if (want_shm && (shmid == -1))
-    _exit(1);
-#else
-    map = NULL;
-  if (want_shm && map == NULL)
-    _exit(1);
-#endif
   ppos = plugins;
   while (NULL != ppos)
     {
       flags = ppos->flags;
-#ifndef WINDOWS
-      if (shmid == -1)
-#else
-      if (map == NULL)
-#endif
+      if (! want_shm)
        flags = EXTRACTOR_OPTION_IN_PROCESS;
       switch (flags)
        {
        case EXTRACTOR_OPTION_DEFAULT_POLICY:
-         if (0 != extract_oop (ppos, fn, proc, proc_cls))
+         if (0 != extract_oop (ppos, fn, 
+                               (tptr != NULL) ? tfn : NULL,
+                               proc, proc_cls))
            return;
          if (ppos->cpid == -1)
            {
              start_process (ppos);
-             if (0 != extract_oop (ppos, fn, proc, proc_cls))
+             if (0 != extract_oop (ppos, fn, 
+                                   (tptr != NULL) ? tfn : NULL,
+                                   proc, proc_cls))
                return;
            }
          break;
        case EXTRACTOR_OPTION_OUT_OF_PROCESS_NO_RESTART:
-         if (0 != extract_oop (ppos, fn, proc, proc_cls))
+         if (0 != extract_oop (ppos, fn,
+                               (tptr != NULL) ? tfn : NULL,
+                               proc, proc_cls))
            return;
          break;
        case EXTRACTOR_OPTION_IN_PROCESS:                 
-         if (NULL == ppos->extractMethod)  
+         want_tail = ( (ppos->specials != NULL) &&
+                       (NULL != strstr (ppos->specials,
+                                        "want-tail")));
+         if (NULL == ppos->extractMethod) 
            plugin_load (ppos);     
          if ( ( (ppos->specials == NULL) ||
                 (NULL == strstr (ppos->specials,
-                                 "oop-only")) ) &&
-              (NULL != ppos->extractMethod) &&
-              (0 != ppos->extractMethod (data, 
-                                         size, 
-                                         proc, 
-                                         proc_cls,
-                                         ppos->plugin_options)) )
-           return;
+                                 "oop-only")) ) )
+           {
+             if (want_tail)
+               {
+                 if ( (NULL != ppos->extractMethod) &&
+                      (tdata != NULL) &&
+                      (0 != ppos->extractMethod (tdata, 
+                                                 tsize, 
+                                                 proc, 
+                                                 proc_cls,
+                                                 ppos->plugin_options)) )
+                   return;
+               }
+             else
+               {
+                 if ( (NULL != ppos->extractMethod) &&
+                      (0 != ppos->extractMethod (data, 
+                                                 size, 
+                                                 proc, 
+                                                 proc_cls,
+                                                 ppos->plugin_options)) )
+                   return;
+               }
+           }
          break;
        case EXTRACTOR_OPTION_DISABLED:
          break;
@@ -1580,10 +1710,21 @@
       if (shmid != -1)
        close (shmid);
       shm_unlink (fn);
+      if (NULL != tptr)
+       munmap (tptr, tsize);
+      if (tshmid != -1)
+       close (tshmid);
+      shm_unlink (tfn);
 #else
       UnmapViewOfFile (ptr);
       CloseHandle (map);
       CloseHandle (mappedFile);
+      if (tptr != NULL)
+       {
+         UnmapViewOfFile (tptr);
+         CloseHandle (tmap);
+         CloseHandle (tmappedFile);
+       }
 #endif
     }
 }
@@ -1595,17 +1736,19 @@
  * contents if they were not compressed).
  *
  * @param plugins the list of plugins to use
- * @param filename the name of the file, can be NULL 
  * @param data data to process, never NULL
- * @param size number of bytes in data, ignored if data is NULL
+ * @param size number of bytes in data
+ * @param tdata end of file data, or NULL
+ * @param tsize number of bytes in tdata
  * @param proc function to call for each meta data item found
  * @param proc_cls cls argument to proc
  */
 static void
 decompress_and_extract (struct EXTRACTOR_PluginList *plugins,
-                       const char * filename,
                        const unsigned char * data,
                        size_t size,
+                       const char * tdata,
+                       size_t tsize,
                        EXTRACTOR_MetaDataProcessor proc,
                        void *proc_cls) {
   unsigned char * buf;
@@ -1838,9 +1981,10 @@
       size = dsize;
     }
   extract (plugins,
-          filename,
           (const char*) data,
           size,
+          tdata, 
+          tsize,
           proc,
           proc_cls);
   if (buf != NULL)
@@ -1908,9 +2052,13 @@
 {
   int fd;
   void * buffer;
+  void * tbuffer;
   struct stat fstatbuf;
   size_t fsize;
+  size_t tsize;
   int eno;
+  off_t offset;
+  long pg;
 
   fd = -1;
   buffer = NULL;
@@ -1941,14 +2089,41 @@
   if ( (buffer == NULL) &&
        (data == NULL) )
     return;
+  /* for footer extraction */
+  tsize = 0;
+  tbuffer = NULL;
+  if ( (data == NULL) &&
+       (fstatbuf.st_size > fsize) &&
+       (fstatbuf.st_size > MAX_READ) )
+    {
+      pg = sysconf (_SC_PAGE_SIZE);      
+      if ( (pg > 0) &&
+          (pg < MAX_READ) )
+       {
+         offset = (1 + (fstatbuf.st_size - MAX_READ) / pg) * pg;
+         if (offset < fstatbuf.st_size)
+           {
+             tsize = fstatbuf.st_size - offset;
+             tbuffer = MMAP (NULL, tsize, PROT_READ, MAP_PRIVATE, fd, offset);
+             if ( (tbuffer == NULL) || (tbuffer == (void *) -1) ) 
+               {
+                 tsize = 0;
+                 tbuffer = NULL;
+               }
+           }
+       }
+    }
   decompress_and_extract (plugins,
-                         filename,
                          buffer != NULL ? buffer : data,
                          buffer != NULL ? fsize : size,
+                         tbuffer,
+                         tsize,
                          proc,
                          proc_cls);
   if (buffer != NULL)
     MUNMAP (buffer, fsize);
+  if (tbuffer != NULL)
+    MUNMAP (tbuffer, tsize);
   if (-1 != fd)
     close(fd);  
 }
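
The footer mapping above has to start at a page boundary, because the offset passed to mmap must be a multiple of the page size. A minimal sketch of that arithmetic follows. It is an illustration only: `tail_offset` and its parameters are hypothetical names that do not exist in libextractor, and the value used for `max_read` in the usage note is an assumption, not the library's actual MAX_READ.

```c
/* Hypothetical helper, not part of libextractor: returns the page-aligned
   offset at which a tail mapping should start, or -1 if no tail mapping
   makes sense for the given sizes. */
static long long
tail_offset (long long file_size, long long max_read, long long page_size)
{
  long long offset;

  /* Same guards as in the commit: a usable page size, and a file that is
     larger than what the head buffer can cover. */
  if ( (page_size <= 0) ||
       (page_size >= max_read) ||
       (file_size <= max_read) )
    return -1;
  /* First page boundary strictly after (file_size - max_read), so the
     tail is never longer than max_read bytes. */
  offset = (1 + (file_size - max_read) / page_size) * page_size;
  if (offset >= file_size)
    return -1;
  return offset;
}
```

With an assumed limit of 32768 bytes and 4096-byte pages, a 100000-byte file gets a tail mapping that starts at offset 69632 and covers the remaining 30368 bytes.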

Modified: Extractor/src/plugins/Makefile.am
===================================================================
--- Extractor/src/plugins/Makefile.am   2010-01-11 22:13:37 UTC (rev 9981)
+++ Extractor/src/plugins/Makefile.am   2010-01-13 13:42:34 UTC (rev 9982)
@@ -86,6 +86,7 @@
   libextractor_flv.la \
   libextractor_gif.la \
   libextractor_html.la \
+  libextractor_id3.la \
   libextractor_id3v2.la \
   libextractor_id3v23.la \
   libextractor_id3v24.la \
@@ -186,6 +187,13 @@
 libextractor_html_la_LIBADD = \
   $(top_builddir)/src/common/libextractor_common.la
 
+libextractor_id3_la_SOURCES = \
+  id3_extractor.c 
+libextractor_id3_la_LDFLAGS = \
+  $(PLUGINFLAGS)
+libextractor_id3_la_LIBADD = \
+  $(top_builddir)/src/common/libextractor_common.la
+
 libextractor_id3v2_la_SOURCES = \
   id3v2_extractor.c 
 libextractor_id3v2_la_LDFLAGS = \
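
The libextractor_id3 plugin added to the build above depends on the "want-tail" special that extract() checks for in-process plugins. The buffer selection can be sketched roughly as below. `struct buffers`, `wants_tail` and `select_buffer` are hypothetical names introduced for illustration; the real code inlines this logic in extract().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two buffers handed to extract(). */
struct buffers
{
  const char *data;   /* start of the file, never NULL */
  size_t size;
  const char *tdata;  /* end of the file, or NULL */
  size_t tsize;
};

/* Returns 1 if the plugin's option string contains "want-tail". */
static int
wants_tail (const char *specials)
{
  return (specials != NULL) && (NULL != strstr (specials, "want-tail"));
}

/* Picks the buffer a plugin should see.  Returns NULL (and a length of 0)
   if the plugin wants the tail but no tail buffer is available; in that
   case the plugin is simply not called. */
static const char *
select_buffer (const struct buffers *b, const char *specials, size_t *len)
{
  if (wants_tail (specials))
    {
      *len = (b->tdata != NULL) ? b->tsize : 0;
      return b->tdata;
    }
  *len = b->size;
  return b->data;
}
```

A plugin that returns "want-tail" from its options function therefore only ever sees the end of the file, and is skipped when the file is too small for a separate tail buffer.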

Added: Extractor/src/plugins/id3_extractor.c
===================================================================
--- Extractor/src/plugins/id3_extractor.c                               (rev 0)
+++ Extractor/src/plugins/id3_extractor.c       2010-01-13 13:42:34 UTC (rev 9982)
@@ -0,0 +1,305 @@
+/*
+     This file is part of libextractor.
+     (C) 2002, 2003, 2004, 2006, 2009, 2010 Vidyut Samanta and Christian Grothoff
+
+     libextractor is free software; you can redistribute it and/or modify
+     it under the terms of the GNU General Public License as published
+     by the Free Software Foundation; either version 2, or (at your
+     option) any later version.
+
+     libextractor is distributed in the hope that it will be useful, but
+     WITHOUT ANY WARRANTY; without even the implied warranty of
+     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+     General Public License for more details.
+
+     You should have received a copy of the GNU General Public License
+     along with libextractor; see the file COPYING.  If not, write to the
+     Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+     Boston, MA 02111-1307, USA.
+
+ */
+
+#include "platform.h"
+#include "extractor.h"
+#include "convert.h"
+#include <string.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <stdlib.h>
+
+typedef struct
+{
+  char *title;
+  char *artist;
+  char *album;
+  char *year;
+  char *comment;
+  const char *genre;
+  unsigned int track_number;
+} id3tag;
+
+static const char *const genre_names[] = {
+  gettext_noop ("Blues"),
+  gettext_noop ("Classic Rock"),
+  gettext_noop ("Country"),
+  gettext_noop ("Dance"),
+  gettext_noop ("Disco"),
+  gettext_noop ("Funk"),
+  gettext_noop ("Grunge"),
+  gettext_noop ("Hip-Hop"),
+  gettext_noop ("Jazz"),
+  gettext_noop ("Metal"),
+  gettext_noop ("New Age"),
+  gettext_noop ("Oldies"),
+  gettext_noop ("Other"),
+  gettext_noop ("Pop"),
+  gettext_noop ("R&B"),
+  gettext_noop ("Rap"),
+  gettext_noop ("Reggae"),
+  gettext_noop ("Rock"),
+  gettext_noop ("Techno"),
+  gettext_noop ("Industrial"),
+  gettext_noop ("Alternative"),
+  gettext_noop ("Ska"),
+  gettext_noop ("Death Metal"),
+  gettext_noop ("Pranks"),
+  gettext_noop ("Soundtrack"),
+  gettext_noop ("Euro-Techno"),
+  gettext_noop ("Ambient"),
+  gettext_noop ("Trip-Hop"),
+  gettext_noop ("Vocal"),
+  gettext_noop ("Jazz+Funk"),
+  gettext_noop ("Fusion"),
+  gettext_noop ("Trance"),
+  gettext_noop ("Classical"),
+  gettext_noop ("Instrumental"),
+  gettext_noop ("Acid"),
+  gettext_noop ("House"),
+  gettext_noop ("Game"),
+  gettext_noop ("Sound Clip"),
+  gettext_noop ("Gospel"),
+  gettext_noop ("Noise"),
+  gettext_noop ("Alt. Rock"),
+  gettext_noop ("Bass"),
+  gettext_noop ("Soul"),
+  gettext_noop ("Punk"),
+  gettext_noop ("Space"),
+  gettext_noop ("Meditative"),
+  gettext_noop ("Instrumental Pop"),
+  gettext_noop ("Instrumental Rock"),
+  gettext_noop ("Ethnic"),
+  gettext_noop ("Gothic"),
+  gettext_noop ("Darkwave"),
+  gettext_noop ("Techno-Industrial"),
+  gettext_noop ("Electronic"),
+  gettext_noop ("Pop-Folk"),
+  gettext_noop ("Eurodance"),
+  gettext_noop ("Dream"),
+  gettext_noop ("Southern Rock"),
+  gettext_noop ("Comedy"),
+  gettext_noop ("Cult"),
+  gettext_noop ("Gangsta Rap"),
+  gettext_noop ("Top 40"),
+  gettext_noop ("Christian Rap"),
+  gettext_noop ("Pop/Funk"),
+  gettext_noop ("Jungle"),
+  gettext_noop ("Native American"),
+  gettext_noop ("Cabaret"),
+  gettext_noop ("New Wave"),
+  gettext_noop ("Psychedelic"),
+  gettext_noop ("Rave"),
+  gettext_noop ("Showtunes"),
+  gettext_noop ("Trailer"),
+  gettext_noop ("Lo-Fi"),
+  gettext_noop ("Tribal"),
+  gettext_noop ("Acid Punk"),
+  gettext_noop ("Acid Jazz"),
+  gettext_noop ("Polka"),
+  gettext_noop ("Retro"),
+  gettext_noop ("Musical"),
+  gettext_noop ("Rock & Roll"),
+  gettext_noop ("Hard Rock"),
+  gettext_noop ("Folk"),
+  gettext_noop ("Folk/Rock"),
+  gettext_noop ("National Folk"),
+  gettext_noop ("Swing"),
+  gettext_noop ("Fast-Fusion"),
+  gettext_noop ("Bebob"),
+  gettext_noop ("Latin"),
+  gettext_noop ("Revival"),
+  gettext_noop ("Celtic"),
+  gettext_noop ("Bluegrass"),
+  gettext_noop ("Avantgarde"),
+  gettext_noop ("Gothic Rock"),
+  gettext_noop ("Progressive Rock"),
+  gettext_noop ("Psychedelic Rock"),
+  gettext_noop ("Symphonic Rock"),
+  gettext_noop ("Slow Rock"),
+  gettext_noop ("Big Band"),
+  gettext_noop ("Chorus"),
+  gettext_noop ("Easy Listening"),
+  gettext_noop ("Acoustic"),
+  gettext_noop ("Humour"),
+  gettext_noop ("Speech"),
+  gettext_noop ("Chanson"),
+  gettext_noop ("Opera"),
+  gettext_noop ("Chamber Music"),
+  gettext_noop ("Sonata"),
+  gettext_noop ("Symphony"),
+  gettext_noop ("Booty Bass"),
+  gettext_noop ("Primus"),
+  gettext_noop ("Porn Groove"),
+  gettext_noop ("Satire"),
+  gettext_noop ("Slow Jam"),
+  gettext_noop ("Club"),
+  gettext_noop ("Tango"),
+  gettext_noop ("Samba"),
+  gettext_noop ("Folklore"),
+  gettext_noop ("Ballad"),
+  gettext_noop ("Power Ballad"),
+  gettext_noop ("Rhythmic Soul"),
+  gettext_noop ("Freestyle"),
+  gettext_noop ("Duet"),
+  gettext_noop ("Punk Rock"),
+  gettext_noop ("Drum Solo"),
+  gettext_noop ("A Cappella"),
+  gettext_noop ("Euro-House"),
+  gettext_noop ("Dance Hall"),
+  gettext_noop ("Goa"),
+  gettext_noop ("Drum & Bass"),
+  gettext_noop ("Club-House"),
+  gettext_noop ("Hardcore"),
+  gettext_noop ("Terror"),
+  gettext_noop ("Indie"),
+  gettext_noop ("BritPop"),
+  gettext_noop ("Negerpunk"),
+  gettext_noop ("Polsk Punk"),
+  gettext_noop ("Beat"),
+  gettext_noop ("Christian Gangsta Rap"),
+  gettext_noop ("Heavy Metal"),
+  gettext_noop ("Black Metal"),
+  gettext_noop ("Crossover"),
+  gettext_noop ("Contemporary Christian"),
+  gettext_noop ("Christian Rock"),
+  gettext_noop ("Merengue"),
+  gettext_noop ("Salsa"),
+  gettext_noop ("Thrash Metal"),
+  gettext_noop ("Anime"),
+  gettext_noop ("JPop"),
+  gettext_noop ("Synthpop"),
+};
+
+#define GENRE_NAME_COUNT \
+    ((unsigned int)(sizeof genre_names / sizeof (const char *const)))
+
+
+
+#define OK         0
+#define INVALID_ID3 1
+
+static void
+trim (char *k)
+{
+  while ((strlen (k) > 0) && (isspace (k[strlen (k) - 1])))
+    k[strlen (k) - 1] = '\0';
+}
+
+static int
+get_id3 (const char *data, size_t size, id3tag * id3)
+{
+  const char *pos;
+
+  if (size < 128)
+    return INVALID_ID3;
+
+  pos = &data[size - 128];
+  if (0 != strncmp ("TAG", pos, 3))
+    return INVALID_ID3;
+  pos += 3;
+
+  id3->title = EXTRACTOR_common_convert_to_utf8 (pos, 30, "ISO-8859-1");
+  trim (id3->title);
+  pos += 30;
+  id3->artist = EXTRACTOR_common_convert_to_utf8 (pos, 30, "ISO-8859-1");
+  trim (id3->artist);
+  pos += 30;
+  id3->album = EXTRACTOR_common_convert_to_utf8 (pos, 30, "ISO-8859-1");
+  trim (id3->album);
+  pos += 30;
+  id3->year = EXTRACTOR_common_convert_to_utf8 (pos, 4, "ISO-8859-1");
+  trim (id3->year);
+  pos += 4;
+  id3->comment = EXTRACTOR_common_convert_to_utf8 (pos, 30, "ISO-8859-1");
+  trim (id3->comment);
+  if ( (pos[28] == '\0') &&
+       (pos[29] != '\0') )
+    {
+      /* ID3v1.1 */
+      id3->track_number = pos[29];
+    }
+  else
+    {
+      id3->track_number = 0;
+    }
+  pos += 30;
+  id3->genre = "";
+  if (pos[0] < GENRE_NAME_COUNT)
+    id3->genre = dgettext (PACKAGE, genre_names[(unsigned) pos[0]]);
+  return OK;
+}
+
+
+#define ADD(s,t) do { if (0 != (ret = proc (proc_cls, "id3", t, EXTRACTOR_METAFORMAT_UTF8, "text/plain", s, strlen(s)+1))) goto FINISH; } while (0)
+
+
+const char *
+EXTRACTOR_id3_options ()
+{
+  return "want-tail";
+}
+
+
+int 
+EXTRACTOR_id3_extract (const char *data,
+                      size_t size,
+                      EXTRACTOR_MetaDataProcessor proc,
+                      void *proc_cls,
+                      const char *options)
+{
+  id3tag info;
+  char track[16];
+  int ret = 0;
+
+  fprintf (stderr, "called with %llu bytes\n", (unsigned long long) size);
+  if (OK != get_id3 (data, size, &info))
+    return 0;
+  if (strlen (info.title) > 0)
+    ADD (info.title, EXTRACTOR_METATYPE_TITLE);
+  if (strlen (info.artist) > 0)
+    ADD (info.artist, EXTRACTOR_METATYPE_ARTIST);
+  if (strlen (info.album) > 0)
+    ADD (info.album, EXTRACTOR_METATYPE_ALBUM);
+  if (strlen (info.year) > 0)
+    ADD (info.year, EXTRACTOR_METATYPE_PUBLICATION_YEAR);
+  if (strlen (info.genre) > 0)
+    ADD (info.genre, EXTRACTOR_METATYPE_GENRE);
+  if (strlen (info.comment) > 0)
+    ADD (info.comment, EXTRACTOR_METATYPE_COMMENT);
+  if (info.track_number != 0)
+    {
+      snprintf(track, 
+              sizeof(track), "%u", info.track_number);
+      ADD (track, EXTRACTOR_METATYPE_TRACK_NUMBER);
+    }
+FINISH:
+  free (info.title);
+  free (info.year);
+  free (info.album);
+  free (info.artist);
+  free (info.comment);
+  return ret; 
+}
+
+/* end of id3_extractor.c */
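
The fixed 128-byte ID3v1 layout that get_id3 walks through (3-byte "TAG" marker, 30-byte title, artist and album, 4-byte year, 30-byte comment, 1-byte genre) can be exercised without the rest of the library. The sketch below is a simplified illustration, not the plugin's code: `struct id3v1_fields`, `copy_field` and `parse_id3v1` are hypothetical names, and unlike the plugin it performs no ISO-8859-1 to UTF-8 conversion and skips the comment text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified result type; strings are NUL-terminated. */
struct id3v1_fields
{
  char title[31];
  char artist[31];
  char album[31];
  char year[5];
  unsigned int track;   /* 0 if no ID3v1.1 track number is present */
  unsigned char genre;  /* raw genre index, not yet mapped to a name */
};

/* Copies a fixed-width field and strips trailing space/NUL padding. */
static void
copy_field (char *dst, const unsigned char *src, size_t len)
{
  memcpy (dst, src, len);
  dst[len] = '\0';
  while ( (len > 0) &&
          ( (dst[len - 1] == ' ') || (dst[len - 1] == '\0') ) )
    dst[--len] = '\0';
}

/* Returns 0 on success, 1 if the last 128 bytes are not an ID3v1 tag. */
static int
parse_id3v1 (const char *data, size_t size, struct id3v1_fields *out)
{
  const unsigned char *pos;

  if (size < 128)
    return 1;
  pos = (const unsigned char *) &data[size - 128];
  if (0 != memcmp (pos, "TAG", 3))
    return 1;
  copy_field (out->title, pos + 3, 30);
  copy_field (out->artist, pos + 33, 30);
  copy_field (out->album, pos + 63, 30);
  copy_field (out->year, pos + 93, 4);
  /* The comment occupies bytes 97..126.  ID3v1.1 stores a track number
     in its last byte when the byte before it is zero. */
  if ( (pos[125] == 0) && (pos[126] != 0) )
    out->track = pos[126];
  else
    out->track = 0;
  out->genre = pos[127];
  return 0;
}
```

Reading the bytes through an unsigned char pointer keeps track numbers and genre indices above 127 from being sign-extended.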

Modified: Extractor/src/plugins/mp3_extractor.c
===================================================================
--- Extractor/src/plugins/mp3_extractor.c       2010-01-11 22:13:37 UTC (rev 9981)
+++ Extractor/src/plugins/mp3_extractor.c       2010-01-13 13:42:34 UTC (rev 9982)
@@ -36,172 +36,6 @@
 #include <unistd.h>
 #include <stdlib.h>
 
-typedef struct
-{
-  char *title;
-  char *artist;
-  char *album;
-  char *year;
-  char *comment;
-  const char *genre;
-  unsigned int track_number;
-} id3tag;
-
-static const char *const genre_names[] = {
-  gettext_noop ("Blues"),
-  gettext_noop ("Classic Rock"),
-  gettext_noop ("Country"),
-  gettext_noop ("Dance"),
-  gettext_noop ("Disco"),
-  gettext_noop ("Funk"),
-  gettext_noop ("Grunge"),
-  gettext_noop ("Hip-Hop"),
-  gettext_noop ("Jazz"),
-  gettext_noop ("Metal"),
-  gettext_noop ("New Age"),
-  gettext_noop ("Oldies"),
-  gettext_noop ("Other"),
-  gettext_noop ("Pop"),
-  gettext_noop ("R&B"),
-  gettext_noop ("Rap"),
-  gettext_noop ("Reggae"),
-  gettext_noop ("Rock"),
-  gettext_noop ("Techno"),
-  gettext_noop ("Industrial"),
-  gettext_noop ("Alternative"),
-  gettext_noop ("Ska"),
-  gettext_noop ("Death Metal"),
-  gettext_noop ("Pranks"),
-  gettext_noop ("Soundtrack"),
-  gettext_noop ("Euro-Techno"),
-  gettext_noop ("Ambient"),
-  gettext_noop ("Trip-Hop"),
-  gettext_noop ("Vocal"),
-  gettext_noop ("Jazz+Funk"),
-  gettext_noop ("Fusion"),
-  gettext_noop ("Trance"),
-  gettext_noop ("Classical"),
-  gettext_noop ("Instrumental"),
-  gettext_noop ("Acid"),
-  gettext_noop ("House"),
-  gettext_noop ("Game"),
-  gettext_noop ("Sound Clip"),
-  gettext_noop ("Gospel"),
-  gettext_noop ("Noise"),
-  gettext_noop ("Alt. Rock"),
-  gettext_noop ("Bass"),
-  gettext_noop ("Soul"),
-  gettext_noop ("Punk"),
-  gettext_noop ("Space"),
-  gettext_noop ("Meditative"),
-  gettext_noop ("Instrumental Pop"),
-  gettext_noop ("Instrumental Rock"),
-  gettext_noop ("Ethnic"),
-  gettext_noop ("Gothic"),
-  gettext_noop ("Darkwave"),
-  gettext_noop ("Techno-Industrial"),
-  gettext_noop ("Electronic"),
-  gettext_noop ("Pop-Folk"),
-  gettext_noop ("Eurodance"),
-  gettext_noop ("Dream"),
-  gettext_noop ("Southern Rock"),
-  gettext_noop ("Comedy"),
-  gettext_noop ("Cult"),
-  gettext_noop ("Gangsta Rap"),
-  gettext_noop ("Top 40"),
-  gettext_noop ("Christian Rap"),
-  gettext_noop ("Pop/Funk"),
-  gettext_noop ("Jungle"),
-  gettext_noop ("Native American"),
-  gettext_noop ("Cabaret"),
-  gettext_noop ("New Wave"),
-  gettext_noop ("Psychedelic"),
-  gettext_noop ("Rave"),
-  gettext_noop ("Showtunes"),
-  gettext_noop ("Trailer"),
-  gettext_noop ("Lo-Fi"),
-  gettext_noop ("Tribal"),
-  gettext_noop ("Acid Punk"),
-  gettext_noop ("Acid Jazz"),
-  gettext_noop ("Polka"),
-  gettext_noop ("Retro"),
-  gettext_noop ("Musical"),
-  gettext_noop ("Rock & Roll"),
-  gettext_noop ("Hard Rock"),
-  gettext_noop ("Folk"),
-  gettext_noop ("Folk/Rock"),
-  gettext_noop ("National Folk"),
-  gettext_noop ("Swing"),
-  gettext_noop ("Fast-Fusion"),
-  gettext_noop ("Bebob"),
-  gettext_noop ("Latin"),
-  gettext_noop ("Revival"),
-  gettext_noop ("Celtic"),
-  gettext_noop ("Bluegrass"),
-  gettext_noop ("Avantgarde"),
-  gettext_noop ("Gothic Rock"),
-  gettext_noop ("Progressive Rock"),
-  gettext_noop ("Psychedelic Rock"),
-  gettext_noop ("Symphonic Rock"),
-  gettext_noop ("Slow Rock"),
-  gettext_noop ("Big Band"),
-  gettext_noop ("Chorus"),
-  gettext_noop ("Easy Listening"),
-  gettext_noop ("Acoustic"),
-  gettext_noop ("Humour"),
-  gettext_noop ("Speech"),
-  gettext_noop ("Chanson"),
-  gettext_noop ("Opera"),
-  gettext_noop ("Chamber Music"),
-  gettext_noop ("Sonata"),
-  gettext_noop ("Symphony"),
-  gettext_noop ("Booty Bass"),
-  gettext_noop ("Primus"),
-  gettext_noop ("Porn Groove"),
-  gettext_noop ("Satire"),
-  gettext_noop ("Slow Jam"),
-  gettext_noop ("Club"),
-  gettext_noop ("Tango"),
-  gettext_noop ("Samba"),
-  gettext_noop ("Folklore"),
-  gettext_noop ("Ballad"),
-  gettext_noop ("Power Ballad"),
-  gettext_noop ("Rhythmic Soul"),
-  gettext_noop ("Freestyle"),
-  gettext_noop ("Duet"),
-  gettext_noop ("Punk Rock"),
-  gettext_noop ("Drum Solo"),
-  gettext_noop ("A Cappella"),
-  gettext_noop ("Euro-House"),
-  gettext_noop ("Dance Hall"),
-  gettext_noop ("Goa"),
-  gettext_noop ("Drum & Bass"),
-  gettext_noop ("Club-House"),
-  gettext_noop ("Hardcore"),
-  gettext_noop ("Terror"),
-  gettext_noop ("Indie"),
-  gettext_noop ("BritPop"),
-  gettext_noop ("Negerpunk"),
-  gettext_noop ("Polsk Punk"),
-  gettext_noop ("Beat"),
-  gettext_noop ("Christian Gangsta Rap"),
-  gettext_noop ("Heavy Metal"),
-  gettext_noop ("Black Metal"),
-  gettext_noop ("Crossover"),
-  gettext_noop ("Contemporary Christian"),
-  gettext_noop ("Christian Rock"),
-  gettext_noop ("Merengue"),
-  gettext_noop ("Salsa"),
-  gettext_noop ("Thrash Metal"),
-  gettext_noop ("Anime"),
-  gettext_noop ("JPop"),
-  gettext_noop ("Synthpop"),
-};
-
-#define GENRE_NAME_COUNT \
-    ((unsigned int)(sizeof genre_names / sizeof (const char *const)))
-
-
 #define MAX_MP3_SCAN_DEEP 16768
 const int max_frames_scan = 1024;
 enum
@@ -270,64 +104,15 @@
 #define SYSERR     1
 #define INVALID_ID3 2
 
-static void
-trim (char *k)
-{
-  while ((strlen (k) > 0) && (isspace (k[strlen (k) - 1])))
-    k[strlen (k) - 1] = '\0';
-}
-
-static int
-get_id3 (const char *data, size_t size, id3tag * id3)
-{
-  const char *pos;
-
-  if (size < 128)
-    return INVALID_ID3;
-
-  pos = &data[size - 128];
-  if (0 != strncmp ("TAG", pos, 3))
-    return INVALID_ID3;
-  pos += 3;
-
-  id3->title = EXTRACTOR_common_convert_to_utf8 (pos, 30, "ISO-8859-1");
-  trim (id3->title);
-  pos += 30;
-  id3->artist = EXTRACTOR_common_convert_to_utf8 (pos, 30, "ISO-8859-1");
-  trim (id3->artist);
-  pos += 30;
-  id3->album = EXTRACTOR_common_convert_to_utf8 (pos, 30, "ISO-8859-1");
-  trim (id3->album);
-  pos += 30;
-  id3->year = EXTRACTOR_common_convert_to_utf8 (pos, 4, "ISO-8859-1");
-  trim (id3->year);
-  pos += 4;
-  id3->comment = EXTRACTOR_common_convert_to_utf8 (pos, 30, "ISO-8859-1");
-  trim (id3->comment);
-  if ( (pos[28] == '\0') &&
-       (pos[29] != '\0') )
-    {
-      /* ID3v1.1 */
-      id3->track_number = pos[29];
-    }
-  else
-    {
-      id3->track_number = 0;
-    }
-  pos += 30;
-  id3->genre = "";
-  if (pos[0] < GENRE_NAME_COUNT)
-    id3->genre = dgettext (PACKAGE, genre_names[(unsigned) pos[0]]);
-  return OK;
-}
-
-
 #define ADDR(s,t) do { if (0 != proc (proc_cls, "mp3", t, EXTRACTOR_METAFORMAT_UTF8, "text/plain", s, strlen(s)+1)) return 1; } while (0)
 
-static int
-mp3parse (const unsigned char *data, size_t size,
-         EXTRACTOR_MetaDataProcessor proc,
-         void *proc_cls)
+/* mimetype = audio/mpeg */
+int 
+EXTRACTOR_mp3_extract (const unsigned char *data,
+                      size_t size,
+                      EXTRACTOR_MetaDataProcessor proc,
+                      void *proc_cls,
+                      const char *options)
 {
   unsigned int header;
   int counter = 0;
@@ -474,50 +259,4 @@
   return 0;
 }
 
-
-#define ADD(s,t) do { if (0 != (ret = proc (proc_cls, "mp3", t, EXTRACTOR_METAFORMAT_UTF8, "text/plain", s, strlen(s)+1))) goto FINISH; } while (0)
-
-
-/* mimetype = audio/mpeg */
-int 
-EXTRACTOR_mp3_extract (const char *data,
-                      size_t size,
-                      EXTRACTOR_MetaDataProcessor proc,
-                      void *proc_cls,
-                      const char *options)
-{
-  id3tag info;
-  char track[16];
-  int ret;
-
-  if (0 != get_id3 (data, size, &info))
-    return 0;
-  if (strlen (info.title) > 0)
-    ADD (info.title, EXTRACTOR_METATYPE_TITLE);
-  if (strlen (info.artist) > 0)
-    ADD (info.artist, EXTRACTOR_METATYPE_ARTIST);
-  if (strlen (info.album) > 0)
-    ADD (info.album, EXTRACTOR_METATYPE_ALBUM);
-  if (strlen (info.year) > 0)
-    ADD (info.year, EXTRACTOR_METATYPE_PUBLICATION_YEAR);
-  if (strlen (info.genre) > 0)
-    ADD (info.genre, EXTRACTOR_METATYPE_GENRE);
-  if (strlen (info.comment) > 0)
-    ADD (info.comment, EXTRACTOR_METATYPE_COMMENT);
-  if (info.track_number != 0)
-    {
-      snprintf(track, 
-              sizeof(track), "%u", info.track_number);
-      ADD (track, EXTRACTOR_METATYPE_TRACK_NUMBER);
-    }
-  ret = mp3parse ((const unsigned char *) data, size, proc, proc_cls);
-FINISH:
-  free (info.title);
-  free (info.year);
-  free (info.album);
-  free (info.artist);
-  free (info.comment);
-  return ret; 
-}
-
 /* end of mp3_extractor.c */




