
Re: [Nuxeo-localizer] Re: Searching multilingual CMF sites


From: Juan David Ibáñez Palomar
Subject: Re: [Nuxeo-localizer] Re: Searching multilingual CMF sites
Date: Mon, 10 Mar 2003 16:57:00 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021226 Debian/1.2.1-9

Greg Ward wrote:

On 07 March 2003, To address@hidden said:
[sorry for posting to two lists, but I'm really not sure if the
Localizer community or the CMF community is the right place to ask!]

What's the best way to implement searching on a multilingual site?  I've
got a CMF site with bilingual content up-and-running thanks to
Localizer, and managed to cobble together a fairly functional "search"
box by stealing some scripts from Plone.  But it gets weird when you
cross language boundaries.

OK, I'm well on the way to solving this problem.  Thought I'd share my
approach for posterity -- future archive-searchers will no doubt thank
me.  ;-)


Maybe a howto ;-)


Turns out this was a CMF/Zope question; Localizer barely enters into it.
(It's only needed to find out the user's current language at search
time.)  Here's what I did:

 * in $portal/portal_catalog, create two new vocabularies:
   vocab_en and vocab_fr

 * then pop over to the "Indexes" tab and create two new indexes:
   SearchableText_en and SearchableText_fr.  Use the corresponding
   language-specific vocabulary in each index.

 * I already had a SearchableText() method in LocDublinCore,
   which all of my content classes inherit from (shamelessly
   stolen from Rainer Thaden's LocCMFProduct); I extended it to
   have a language-neutral mode and language-specific modes,
   then added trivial SearchableText_en() and SearchableText_fr()
   wrappers.  Here's the code:

     def SearchableText (self, language=None):
         words = []
         for pty in self._local_properties.keys():
             pty_val = self._local_properties[pty]
             if language is None:        # index all languages
                 for (lang, val) in pty_val.items():
                     if lang and val:
                         words.append(val)
             else:                       # only index selected language
                 val = pty_val.get(language)
                 if val:
                     words.append(val)

         return " ".join(words)

     def SearchableText_en (self):
         return self.SearchableText(language="en")

     def SearchableText_fr (self):
         return self.SearchableText(language="fr")

   This is fairly evil, since it grubs rudely through data structures
   inherited from LocalPropertyManager (part of Localizer).  I didn't
   see a clean + efficient way to do this, so I went with rude +
   efficient.  ;-(

   Also, hard-coding the set of languages into those two wrapper
   methods is Just Wrong.  I think I can get around that with a clever
   __getattr__() method, but haven't done that yet.


LocalPropertyManager already implements __getattr__: if "title"
is a local property and your object has the languages English
and Spanish, then __getattr__ provides the attributes "title_en"
and "title_es". Maybe you can use them somehow. Or see the LPM
source for the __getattr__ code.
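
For the archives, here is a rough sketch of the __getattr__ idea applied
to the SearchableText_<lang> wrappers, so the language list is no longer
hard-coded. It is a standalone illustration, not Localizer code: the
class name LocSearchable and its constructor are hypothetical, and the
_local_properties dictionary is only a stand-in for what
LocalPropertyManager stores. In a real Zope class the method would also
have to cooperate with acquisition and with LocalPropertyManager's own
__getattr__, which this sketch ignores.

```python
class LocSearchable:
    """Hypothetical stand-in for a Localizer-based content class."""

    _prefix = "SearchableText_"

    def __init__(self, local_properties):
        # Assumed layout: {property name: {language code: value}}
        self._local_properties = local_properties

    def SearchableText(self, language=None):
        words = []
        for pty_val in self._local_properties.values():
            if language is None:            # index all languages
                for lang, val in pty_val.items():
                    if lang and val:
                        words.append(val)
            else:                           # only index selected language
                val = pty_val.get(language)
                if val:
                    words.append(val)
        return " ".join(words)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails,
        # so SearchableText itself is never routed through here.
        prefix = self._prefix
        if name.startswith(prefix) and len(name) > len(prefix):
            language = name[len(prefix):]
            return lambda: self.SearchableText(language=language)
        raise AttributeError(name)


doc = LocSearchable({
    "title": {"en": "Hello", "fr": "Bonjour"},
    "body": {"en": "world", "fr": "monde"},
})
print(doc.SearchableText_fr())
```

With this in place, adding a language means adding an index named
SearchableText_<lang>; no new wrapper method is needed.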


 * finally, I modified the search method to select the index to
   search based on the user's current language.  My search form
   looks (roughly) like this:

     <form name="searchform" action="search"
           tal:attributes="action string:${portal_url}/search" method="GET">
       <input id="searchGadget" name="text" type="text" size="15" value="">
     </form>

   And here's the Python Script that processes this form:

     text = context.REQUEST.get("text")
     if text:
         lang = context.Localizer.get_selected_language()
         key = "SearchableText_%s" % lang
         query = {key : text}
         return context.portal_catalog(query)
     else:
         return []

...and this works fine!  There are only two problems left:

 * search results are shown in the language that was current when
   the object was cataloged, presumably because of the way ZCatalog
   harvests meta-data at catalog-time.  I suspect I can fix this if
   I can persuade ZCatalog to harvest meta-data in all available
   languages.


You can use the "title_en", "title_es", etc. attributes.
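
One possible reading of this suggestion: add "title_en", "title_fr",
etc. as metadata columns in the catalog, then pick the right one when
rendering each result. The helper below is only a sketch under that
assumption; localized_title and FakeBrain are hypothetical names, and
FakeBrain merely imitates a catalog result object that exposes metadata
as attributes.

```python
def localized_title(brain, lang, default_attr="title"):
    """Return brain.title_<lang> if set and non-empty, else brain.title."""
    value = getattr(brain, "title_%s" % lang, None)
    if value:
        return value
    # Fall back to whatever was harvested at catalog time.
    return getattr(brain, default_attr, "")


class FakeBrain:
    """Hypothetical stand-in for a catalog result with metadata columns."""

    def __init__(self, **metadata):
        self.__dict__.update(metadata)


brain = FakeBrain(title="Bienvenue", title_en="Welcome", title_fr="Bienvenue")
print(localized_title(brain, "en"))
```

In the search results template, the language passed in would be the
same value the search script gets from
context.Localizer.get_selected_language().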


 * searching for words with non-ASCII characters is tricky -- IMHO,
   searching for "francais" should yield the same as searching for
   "français", ie. the index should take care of collapsing accented
   characters somehow.  But I'm no linguist -- that might just
   squeak by with accents in French, but whether the same approach
   would work for Nordic å or German ß, I don't know.  Anyways,
   this should be up to either the index or the vocabulary -- it's
   not my problem!
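
As an illustration of what "collapsing accented characters" could mean,
here is a small sketch using Python's standard unicodedata module
(written for Python 3). The function name fold_accents is hypothetical,
and nothing here claims to be what a ZCatalog index or vocabulary does;
the same folding would have to be applied both to the indexed text and
to the query.

```python
import unicodedata


def fold_accents(text):
    """Strip combining marks after compatibility decomposition (NFKD)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# "ç" decomposes into "c" plus a combining cedilla, which is dropped.
assert fold_accents("français") == "francais"
# "å" decomposes into "a" plus a combining ring above.
assert fold_accents("å") == "a"
# "ß" has no decomposition, so this approach leaves it untouched.
assert fold_accents("ß") == "ß"
```

So the decomposition trick covers French accents and mechanically folds
Nordic å to a, but it does nothing for German ß, and whether treating å
as a is acceptable to readers of those languages is a separate question.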



Regards,

--
J. David Ibáñez, http://www.j-david.net
Software Engineer / Ingénieur Logiciel / Ingeniero de Software





